Disambiguator - Approaches and Methods

As in natural language processing generally, there are two main approaches to word sense disambiguation (WSD): deep approaches and shallow approaches.

Deep approaches presume access to a comprehensive body of world knowledge. Knowledge such as "you can go fishing for a type of fish, but not for low-frequency sounds" and "songs have low-frequency sounds as parts, but not types of fish" is then used to determine the sense in which a word is used. These approaches are not very successful in practice, mainly because no such body of knowledge exists in a computer-readable format outside very limited domains. If such knowledge did exist, however, deep approaches would be much more accurate than shallow ones.

There is also a long tradition in computational linguistics of trying such approaches in terms of coded knowledge, and in some cases it is hard to say clearly whether the knowledge involved is linguistic or world knowledge. The first attempt was made by Margaret Masterman and her colleagues at the Cambridge Language Research Unit in England in the 1950s. It used a punched-card version of Roget's Thesaurus, treating the thesaurus's numbered "heads" as indicators of topic, and looked for repetitions in text using a set intersection algorithm. It was not very successful, but it was closely related to later work, especially Yarowsky's machine learning optimisation of a thesaurus method in the 1990s.
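The set-intersection idea behind a thesaurus method can be illustrated with a minimal sketch. It is not a reconstruction of the Cambridge system: the toy thesaurus `HEADS`, its topic labels, and the function `shared_heads` are all hypothetical, and the labels are invented rather than taken from Roget's Thesaurus.

```python
# Hypothetical toy thesaurus: word -> set of head (topic) labels.
# The labels are invented for illustration; they are not Roget's actual heads.
HEADS = {
    "bass": {"animals", "music"},
    "fishing": {"animals", "pursuit"},
    "sea": {"water", "animals"},
    "song": {"music", "poetry"},
}


def shared_heads(target, context, heads=HEADS):
    """Count how often each head of the target word recurs among the context words."""
    counts = {}
    for word in context:
        # Set intersection: heads listed for both the target and this context word.
        for head in heads.get(target, set()) & heads.get(word, set()):
            counts[head] = counts.get(head, 0) + 1
    return counts


fish_context = shared_heads("bass", ["fishing", "sea"])
music_context = shared_heads("bass", ["song"])
```

A head that recurs across the context suggests the topic of the passage, and so the sense of the ambiguous word. Words missing from the thesaurus contribute nothing, which is one reason coverage matters so much for this kind of method.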

Shallow approaches do not try to understand the text. They consider only the surrounding words, using information such as "if bass has the words sea or fishing nearby, it is probably in the fish sense; if bass has the words music or song nearby, it is probably in the music sense." These rules can be derived automatically by the computer from a training corpus of words tagged with their word senses. This approach, while theoretically not as powerful as deep approaches, gives superior results in practice because of the computer's limited world knowledge. However, it can be confused by sentences such as "The dogs bark at the tree", which contains the word bark near both tree and dogs.
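A minimal sketch of such a nearby-word rule follows. The cue lists are written by hand here purely for illustration, whereas a real system would derive them from a sense-tagged corpus. The names `CUES` and `guess_sense` and the sense labels are assumptions.

```python
# Hypothetical cue words for two senses of "bass", taken from the example rule.
CUES = {
    "fish": {"sea", "fishing"},
    "music": {"music", "song"},
}


def guess_sense(tokens, target="bass", cues=CUES):
    """Return the sense whose cue words occur most often in the sentence,
    or None when no cue is present or two senses tie."""
    context = [t.lower() for t in tokens if t.lower() != target]
    scores = {sense: sum(t in words for t in context) for sense, words in cues.items()}
    best = max(scores.values())
    winners = [sense for sense, score in scores.items() if score == best]
    return winners[0] if best > 0 and len(winners) == 1 else None


fish_guess = guess_sense("he went fishing for bass in the sea".split())
music_guess = guess_sense("the bass in this song".split())

# The confusing sentence: cue words for both senses of "bark" are present.
bark_cues = {"tree_covering": {"tree"}, "dog_sound": {"dogs"}}
bark_guess = guess_sense("The dogs bark at the tree".split(), target="bark", cues=bark_cues)
```

In the last call each sense of bark gets one vote, so the sketch declines to answer. This is the kind of confusion that counting nearby words alone cannot resolve.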

There are four conventional approaches to WSD:

  • Dictionary- and knowledge-based methods: These rely primarily on dictionaries, thesauri, and lexical knowledge bases, without using any corpus evidence.
  • Supervised methods: These train on sense-annotated corpora.
  • Semi-supervised or minimally supervised methods: These make use of a secondary source of knowledge such as a small annotated corpus as seed data in a bootstrapping process, or a word-aligned bilingual corpus.
  • Unsupervised methods: These eschew external information (almost) completely and work directly from raw unannotated corpora. These methods are also known as word sense discrimination.

Almost all of these approaches work by defining a window of n content words around each word to be disambiguated in the corpus and statistically analyzing those n surrounding words. Two shallow approaches used to train and then disambiguate are Naïve Bayes classifiers and decision trees. In recent research, kernel-based methods such as support vector machines have shown superior performance in supervised learning. Graph-based approaches have also gained much attention from the research community and currently achieve performance close to the state of the art.
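The window-plus-classifier idea can be sketched with a small Naïve Bayes model in plain Python. Several choices here are assumptions rather than anything the text prescribes: the window takes up to n content words on each side of the target, content words are whatever is left after a tiny hypothetical stop list, and add-one smoothing handles unseen words. The names `STOP`, `window`, and `NaiveBayesWSD` and the four training sentences are invented for illustration.

```python
import math
from collections import Counter, defaultdict

# Tiny hypothetical stop list; a real system would use a fuller one.
STOP = {"the", "a", "an", "of", "in", "on", "at", "for", "to", "and", "he", "she"}


def window(tokens, index, n=3):
    """Return up to n content words on each side of tokens[index]."""
    words = [t.lower() for t in tokens]
    left = [w for w in words[:index] if w not in STOP][-n:]
    right = [w for w in words[index + 1:] if w not in STOP][:n]
    return left + right


class NaiveBayesWSD:
    def __init__(self):
        self.sense_counts = Counter()            # training examples per sense
        self.word_counts = defaultdict(Counter)  # sense -> context word counts
        self.vocab = set()

    def train(self, examples):
        """examples: iterable of (tokens, index_of_target, sense)."""
        for tokens, index, sense in examples:
            self.sense_counts[sense] += 1
            for w in window(tokens, index):
                self.word_counts[sense][w] += 1
                self.vocab.add(w)

    def predict(self, tokens, index):
        total = sum(self.sense_counts.values())
        best_sense, best_score = None, -math.inf
        for sense, count in self.sense_counts.items():
            score = math.log(count / total)  # log prior
            # Add-one smoothing; the extra +1 reserves mass for unseen words.
            denom = sum(self.word_counts[sense].values()) + len(self.vocab) + 1
            for w in window(tokens, index):
                score += math.log((self.word_counts[sense][w] + 1) / denom)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense


# Invented sense-tagged examples: (tokens, position of "bass", sense).
training = [
    ("he caught a bass in the sea".split(), 3, "fish"),
    ("fishing for bass on the lake".split(), 2, "fish"),
    ("the bass line of the song".split(), 1, "music"),
    ("she plays bass in a music group".split(), 2, "music"),
]
model = NaiveBayesWSD()
model.train(training)
```

Calling `model.predict(tokens, index)` on a new sentence scores each sense by its prior and by how often the window words co-occurred with that sense in training. With only four training sentences the probabilities are crude, so the sketch shows the mechanics rather than realistic accuracy.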
