Word-sense Disambiguation - Approaches and Methods

As in all natural language processing, there are two main approaches to WSD – deep approaches and shallow approaches.

Deep approaches presume access to a comprehensive body of world knowledge. Knowledge such as "you can go fishing for a type of fish, but not for low frequency sounds" and "songs have low frequency sounds as parts, but not types of fish" is then used to determine the sense in which a word is used. These approaches are not very successful in practice, mainly because such a body of knowledge does not exist in a computer-readable format outside of very limited domains. If such knowledge did exist, however, deep approaches would be much more accurate than shallow approaches.

There is also a long tradition in computational linguistics of trying such approaches in terms of coded knowledge, and in some cases it is hard to say clearly whether the knowledge involved is linguistic or world knowledge. The first attempt was made by Margaret Masterman and her colleagues at the Cambridge Language Research Unit in England in the 1950s. It used as data a punched-card version of Roget's Thesaurus, with its numbered "heads" serving as indicators of topics, and looked for repetitions in text using a set intersection algorithm. It was not very successful, but it had strong relationships to later work, especially Yarowsky's machine learning optimisation of a thesaurus method in the 1990s.

Shallow approaches do not try to understand the text. They just consider the surrounding words, using information such as "if bass has the words sea or fishing nearby, it is probably in the fish sense; if bass has the words music or song nearby, it is probably in the music sense." These rules can be derived automatically by the computer, using a training corpus of words tagged with their word senses. This approach, while theoretically not as powerful as deep approaches, gives superior results in practice because of the computer's limited world knowledge. However, it can be confused by sentences like "The dogs bark at the tree", which contains the word bark near both tree and dogs.
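The cue-word rule quoted above can be sketched in a few lines of Python. This is only an illustration: the cue sets, the function name guess_sense, and the window size are assumptions made for the example, whereas a real system would derive its cues from a sense-tagged training corpus.

```python
# Hypothetical, hand-written cue words; a real system would learn these
# from a corpus of words tagged with their senses.
BASS_CUES = {
    "fish": {"sea", "fishing", "river", "caught"},
    "music": {"music", "song", "guitar", "band"},
}
BARK_CUES = {
    "dog": {"dog", "dogs"},
    "tree": {"tree", "trunk"},
}

def guess_sense(tokens, target, cues, window=5):
    """Return the sense with the most cue words near `target`.

    Returns None when there is no evidence or when two senses tie.
    """
    tokens = [t.lower() for t in tokens]
    if target not in tokens:
        return None
    i = tokens.index(target)
    # Words within `window` positions on either side of the target.
    context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
    scores = {
        sense: sum(1 for w in context if w in words)
        for sense, words in cues.items()
    }
    best = max(scores.values())
    winners = [sense for sense, score in scores.items() if score == best]
    return winners[0] if best > 0 and len(winners) == 1 else None

fish = guess_sense("he went fishing for bass in the sea".split(), "bass", BASS_CUES)
music = guess_sense("the bass line in this song".split(), "bass", BASS_CUES)
confused = guess_sense("The dogs bark at the tree".split(), "bark", BARK_CUES)
```

With these toy cues the first two sentences are resolved, while the bark sentence scores one cue for each sense and the sketch declines to choose, which mirrors the confusion described above.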

There are four conventional approaches to WSD:

  • Dictionary- and knowledge-based methods: These rely primarily on dictionaries, thesauri, and lexical knowledge bases, without using any corpus evidence.
  • Supervised methods: These train on sense-annotated corpora.
  • Semi-supervised or minimally supervised methods: These make use of a secondary source of knowledge such as a small annotated corpus as seed data in a bootstrapping process, or a word-aligned bilingual corpus.
  • Unsupervised methods: These eschew external information (almost) completely and work directly from raw unannotated corpora. These methods are also known as word sense discrimination.
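As an illustration of the first category, one well-known dictionary-based technique is gloss overlap (often called a simplified Lesk approach): pick the sense whose dictionary definition shares the most words with the context. The sketch below is a minimal version under stated assumptions. The two glosses for bass and the stopword list are invented for the example and are not taken from any real dictionary.

```python
# Toy stopword list and glosses, written for this example only.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "for", "to",
             "and", "or", "with", "is", "by"}

GLOSSES = {
    "fish": "a freshwater or sea fish caught by anglers for food",
    "music": "the lowest range of sound in music or the instrument that plays it",
}

def content_words(text):
    """Lower-case, split on whitespace, and drop stopwords."""
    return {w for w in text.lower().split() if w not in STOPWORDS}

def gloss_overlap(sentence, glosses=GLOSSES):
    """Return (best_sense, overlaps), where overlaps maps each sense to the
    number of content words its gloss shares with the sentence."""
    context = content_words(sentence)
    overlaps = {
        sense: len(context & content_words(gloss))
        for sense, gloss in glosses.items()
    }
    return max(overlaps, key=overlaps.get), overlaps

sense_a, counts_a = gloss_overlap("they caught a huge bass near the sea")
sense_b, counts_b = gloss_overlap("the bass sound in this music is loud")
```

No corpus evidence is used here: the only knowledge source is the dictionary text itself, which is what distinguishes this family from the supervised methods in the list.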

Almost all of these approaches work by defining a window of n content words around each word to be disambiguated in the corpus and statistically analyzing those n surrounding words. Two shallow approaches used to train and then disambiguate are Naïve Bayes classifiers and decision trees. In recent research, kernel-based methods such as support vector machines have shown superior performance in supervised learning. Graph-based approaches have also gained much attention from the research community and currently achieve performance close to the state of the art.
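The window-plus-classifier idea can be sketched with a small Naïve Bayes model in plain Python. Everything specific here is an assumption made for illustration: the class name NaiveBayesWSD, the choice of n words on each side of the target, add-one smoothing, the absence of stopword filtering, and the four-sentence training set, which is far too small for real use.

```python
import math
from collections import Counter, defaultdict

def window(tokens, i, n):
    """Up to n tokens on each side of position i, excluding the target."""
    return tokens[max(0, i - n):i] + tokens[i + 1:i + 1 + n]

class NaiveBayesWSD:
    def __init__(self, n=3, alpha=1.0):
        self.n = n                      # window size on each side
        self.alpha = alpha              # additive smoothing constant
        self.sense_counts = Counter()   # training examples per sense
        self.word_counts = defaultdict(Counter)  # sense -> context word counts
        self.vocab = set()

    def fit(self, examples):
        """examples: iterable of (tokens, target_index, sense)."""
        for tokens, i, sense in examples:
            self.sense_counts[sense] += 1
            for w in window(tokens, i, self.n):
                self.word_counts[sense][w] += 1
                self.vocab.add(w)
        return self

    def predict(self, tokens, i):
        """Return the sense with the highest log posterior score."""
        total = sum(self.sense_counts.values())
        v = len(self.vocab)
        best_sense, best_score = None, -math.inf
        for sense, count in self.sense_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.word_counts[sense].values()) + self.alpha * v
            for w in window(tokens, i, self.n):
                if w in self.vocab:  # skip words never seen in training
                    num = self.word_counts[sense][w] + self.alpha
                    score += math.log(num / denom)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense

# Toy sense-tagged training data: (tokens, index of "bass", sense).
train = [
    ("he caught a bass in the sea".split(), 3, "fish"),
    ("fishing for bass on the lake".split(), 2, "fish"),
    ("the bass line of the song".split(), 1, "music"),
    ("she plays bass in a band".split(), 2, "music"),
]
model = NaiveBayesWSD(n=3).fit(train)
pred_fish = model.predict("we went fishing for bass at sea".split(), 4)
pred_music = model.predict("the band plays a bass song".split(), 4)
```

A decision tree or a support vector machine could be trained on the same window features; only the classifier changes, not the way the context is extracted.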
