P3: Extending non-contextual word embeddings with word-sense disambiguation

RQ: What is the best way of extending a non-contextual word embedding model, such as Word2Vec, with word-sense disambiguation? How can we best represent different meanings of the same word in such an enriched model? What is the best mechanism for determining the meaning of a word in a text from its context? How can we best train such a word-sense disambiguation model?

Description:
Word embeddings offer semantic representations of words in a vector space, where vectors of semantically similar words point in similar directions [1]. They are therefore well suited for synonym, relatedness and analogy detection, as well as for calculating document similarities [2]. However, since the vector representations are learned from word co-occurrences in a training corpus, there is no disambiguation of meanings for homonyms or polysemes. The embedding model represents all meanings of a word by the same vector, which reduces the quality of the learned word embeddings.
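For illustration, a minimal sketch of training such a model and querying nearest neighbors is shown below. It assumes gensim 4.x; the toy corpus and the ambiguous word "bank" are purely hypothetical examples, chosen to show how a homonym ends up with a single vector whose neighbors mix both senses.

```python
# Minimal sketch: training a non-contextual Word2Vec model and querying
# nearest neighbors. Assumes gensim 4.x; the toy corpus is hypothetical.
from gensim.models import Word2Vec

corpus = [
    ["she", "deposited", "money", "at", "the", "bank"],
    ["they", "walked", "along", "the", "river", "bank"],
    ["the", "bank", "approved", "the", "loan"],
    ["fish", "swam", "near", "the", "bank", "of", "the", "river"],
]

model = Word2Vec(sentences=corpus, vector_size=50, window=3, min_count=1, epochs=50)

# All occurrences of "bank" (financial institution and river side) are mapped
# to this single vector, so its nearest neighbors mix both senses.
print(model.wv["bank"].shape)                 # (50,)
print(model.wv.most_similar("bank", topn=5))  # neighbors drawn from both senses
```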

In recent years, contextual embeddings such as ELMo [3] and transformer models like BERT [4] have had great success in computing disambiguated embeddings of words based on their context. However, such models have a few drawbacks: (i) disambiguation based on context is continuous rather than discrete over a set of meanings, (ii) such models do not allow easy synonym detection via nearest neighbors, and (iii) they are usually expensive to train and deploy.

We would therefore like to develop a method that extends non-contextual embeddings with word-sense disambiguation based on context. This would not only improve the quality of contextual synonyms, but also increase the accuracy of document similarities. Concretely, we envision a mechanism that works like a classifier, outputting the intended meaning of an input word based on its context. This enables the embedding model to learn a separate vector for each meaning of a word.
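A minimal sketch of this idea follows. The sense inventory, the placeholder vectors and the word "bank" are hypothetical, and the classifier here is simply cosine similarity against an averaged context; the actual classification mechanism is part of the research question.

```python
# Sketch of the envisioned mechanism (hypothetical sense inventory and
# placeholder vectors): one vector per (word, sense) plus a simple
# context-based classifier that selects which sense vector to use.
import numpy as np

DIM = 50
rng = np.random.default_rng(0)

# One vector per (word, sense) instead of one vector per word.
sense_vectors = {
    ("bank", 0): rng.normal(size=DIM),  # e.g. financial institution
    ("bank", 1): rng.normal(size=DIM),  # e.g. river side
}
# Ordinary single-sense vectors for context words (placeholder values).
context_vectors = {w: rng.normal(size=DIM) for w in ["money", "loan", "river", "shore"]}

def classify_sense(word, context):
    """Pick the sense whose vector is most similar to the averaged context."""
    ctx = np.mean([context_vectors[w] for w in context if w in context_vectors], axis=0)
    candidates = [(s, v) for (w, s), v in sense_vectors.items() if w == word]
    scores = [(s, float(ctx @ v) / (np.linalg.norm(ctx) * np.linalg.norm(v)))
              for s, v in candidates]
    return max(scores, key=lambda t: t[1])[0]

def embed(word, context):
    """Return the sense-specific vector chosen by the classifier."""
    return sense_vectors[(word, classify_sense(word, context))]

print(classify_sense("bank", ["money", "loan"]))  # a sense index, 0 or 1
print(embed("bank", ["river", "shore"]).shape)    # (50,)
```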

Three central goals of this project are thus (1) to determine whether this is the best approach to disambiguation; (2) to determine how such a classifier can best work with a word embedding model and how the two can be trained together; and (3) to compare the performance of the proposed mechanism against contextual embeddings.

Tasks:

  1. Literature review of existing approaches based on non-contextual embeddings.
  2. Review previously developed in-house approaches.
  3. Potentially redesign and improve the above approaches, and test word-sense classification mechanisms that operate on an input word and its context.
  4. Design and test methods for mutually training the embedding model and the classifier (see the sketch after this list).
  5. Compare the developed method against non-contextual embeddings without word-sense disambiguation, as well as contextual/transformer models, over various tasks.
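As a non-authoritative illustration of task 4, the sketch below alternates between a classification step and an embedding-update step in an EM-like fashion. The word "bank", its contexts, the fixed context vectors and the alternation scheme are all assumptions made for the example; they do not prescribe the design to be developed.

```python
# Illustrative EM-style alternation for jointly using a sense classifier and
# sense embeddings (toy baseline, hypothetical data; not the proposed method).
import numpy as np

DIM, N_SENSES, LR = 50, 2, 0.1
rng = np.random.default_rng(0)

# Hypothetical occurrences of the ambiguous word "bank" with their contexts.
corpus = [
    ("bank", ["deposited", "money"]),
    ("bank", ["river", "shore"]),
    ("bank", ["loan", "interest"]),
]
context_vocab = {w for _, ctx in corpus for w in ctx}
context_vec = {w: rng.normal(size=DIM) for w in context_vocab}  # kept fixed for brevity
sense_vec = rng.normal(size=(N_SENSES, DIM))                    # sense vectors of "bank"

def context_embedding(ctx):
    return np.mean([context_vec[w] for w in ctx], axis=0)

for epoch in range(20):
    for _, ctx in corpus:
        c = context_embedding(ctx)
        s = int(np.argmax(sense_vec @ c))        # (a) classify the occurrence into a sense
        sense_vec[s] += LR * (c - sense_vec[s])  # (b) pull that sense vector towards the context

# Each context is routed to whichever sense vector now matches it best.
print(int(np.argmax(sense_vec @ context_embedding(["money", "loan"]))))
print(int(np.argmax(sense_vec @ context_embedding(["river", "shore"]))))
```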

References:
[1] Efficient Estimation of Word Representations in Vector Space.
[2] Word importance-based similarity of documents metric (WISDM).
[3] Deep contextualized word representations.
[4] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.