P3. Enriching an ontology/knowledge base with knowledge from domain-specific word embeddings

February 26, 2021

RQ: How can we best complete the definitions in a prototype ontology by enriching each defined entity with its position in a semantic space?
How can an existing domain-specific ontology best be represented in a semantic space so that mentions of its entities can be detected in text? Once the semantic properties of these entities have been identified through word embeddings, what is the best way to integrate those properties into the ontology?
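As a minimal sketch of the second research question, assume that an entity's position in the semantic space is simply the mean of the word vectors of its label. Mentions can then be detected by comparing every n-gram of a text against the entity vectors using cosine similarity. The entity IDs, labels, 3-dimensional vectors and the 0.9 threshold are hypothetical placeholders. A real setup would use vectors from a model trained on the domain corpus (for example word2vec or fastText) and a tuned threshold.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = List[float]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def phrase_vector(tokens: Sequence[str], emb: Dict[str, Vector]) -> Optional[Vector]:
    """Mean of the token vectors, or None if any token is out of vocabulary."""
    if not tokens or any(t not in emb for t in tokens):
        return None
    return [sum(col) / len(tokens) for col in zip(*(emb[t] for t in tokens))]


def embed_ontology(labels: Dict[str, str], emb: Dict[str, Vector]) -> Dict[str, Vector]:
    """Place each ontology entity in the embedding space via its label."""
    vectors = {}
    for entity_id, label in labels.items():
        vec = phrase_vector(label.lower().split(), emb)
        if vec is not None:  # entities whose labels contain OOV words are skipped
            vectors[entity_id] = vec
    return vectors


def detect_entities(text: str, entity_vecs: Dict[str, Vector], emb: Dict[str, Vector],
                    max_n: int = 2, threshold: float = 0.9) -> List[Tuple[str, str, float]]:
    """Return (span, entity_id, similarity) for every n-gram close to some entity."""
    tokens = text.lower().split()
    hits = []
    for n in range(1, max_n + 1):
        for start in range(len(tokens) - n + 1):
            span = tokens[start:start + n]
            vec = phrase_vector(span, emb)
            if vec is None or not entity_vecs:
                continue
            # Find the single closest entity for this span.
            best_id, best_sim = max(
                ((eid, cosine(vec, evec)) for eid, evec in entity_vecs.items()),
                key=lambda pair: pair[1],
            )
            if best_sim >= threshold:
                hits.append((" ".join(span), best_id, best_sim))
    return hits


# Hypothetical 3-dimensional vectors; a real model would be trained on the domain corpus.
toy_emb = {
    "heart": [0.9, 0.1, 0.0],
    "cardiac": [0.85, 0.15, 0.05],
    "kidney": [0.0, 0.2, 0.95],
    "renal": [0.05, 0.25, 0.9],
    "failure": [0.1, 0.9, 0.1],
}
# Hypothetical entity IDs and labels.
toy_labels = {"ONT:HF": "heart failure", "ONT:RF": "renal failure"}

entity_vecs = embed_ontology(toy_labels, toy_emb)
hits = detect_entities("patient shows cardiac failure", entity_vecs, toy_emb)
print(hits)
```

With these toy vectors, the only span above the threshold is the bigram "cardiac failure", which is linked to ONT:HF even though that entity's label reads "heart failure". An exact string lookup would miss this kind of match. Averaging word vectors is a deliberately simple choice that ignores word order. Learned entity embeddings such as those in [3] are one alternative.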

Description:
Ontologies have long been used in the life sciences to formalize and reason over domain knowledge, and in recent years they have increasingly been incorporated into machine learning models. In computational linguistics, however, constructing an ontology for a specific domain often requires substantial manual annotation and relies heavily on expert knowledge. Extracting information from textual data and injecting it into ontologies could automate this process and enrich ontologies with information that has not been exploited so far.

Various approaches have been proposed to tackle this challenge, and those that leverage word embeddings have attracted considerable interest. For example, recent publications [1, 2] show that domain ontologies can be enriched with new properties derived from domain-specific word embedding models. Another method jointly learns word and entity embeddings from an existing ontology and a text corpus, and then uses these embeddings to create contextual relations within the ontology [3].

In this project, we will investigate ways to construct ontologies automatically and to leverage textual information to enrich existing ones. We will develop a basic ontology skeleton and train word embedding models on the given domain corpora. Then, we will implement a method to inject the information captured by the word embeddings into the constructed ontology. Finally, we will compare and evaluate the enriched ontology against baseline ontologies that are yet to be identified.
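One naive way to inject embedding information is sketched below. It assumes that each ontology class is described by a list of known terms. Each class is placed at the centroid of its term vectors, and the nearest vocabulary words that are not yet in the ontology are proposed as candidate synonyms. This is an illustrative baseline, not the method of [1], [2] or [3]. The class names, vectors, and the k and min_sim values are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def suggest_terms(ontology: Dict[str, List[str]], emb: Dict[str, Vector],
                  k: int = 3, min_sim: float = 0.8) -> Dict[str, List[Tuple[str, float]]]:
    """For each class, rank vocabulary words near the centroid of its known terms.

    `ontology` maps a class name to its known terms (label plus any synonyms).
    Words already attached to any class are never suggested again.
    """
    known = {term for terms in ontology.values() for term in terms}
    suggestions: Dict[str, List[Tuple[str, float]]] = {}
    for cls, terms in ontology.items():
        vecs = [emb[t] for t in terms if t in emb]
        if not vecs:  # class has no in-vocabulary terms, so it cannot be placed
            suggestions[cls] = []
            continue
        centroid = [sum(col) / len(vecs) for col in zip(*vecs)]
        scored = [(w, cosine(centroid, v)) for w, v in emb.items() if w not in known]
        scored = [(w, s) for w, s in scored if s >= min_sim]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        suggestions[cls] = scored[:k]
    return suggestions


# Hypothetical 3-dimensional vectors and a hypothetical two-class ontology skeleton.
toy_emb = {
    "heart": [0.9, 0.1, 0.0],
    "cardiac": [0.85, 0.15, 0.05],
    "kidney": [0.0, 0.2, 0.95],
    "renal": [0.05, 0.25, 0.9],
    "failure": [0.1, 0.9, 0.1],
}
toy_ontology = {"Heart": ["heart"], "Kidney": ["kidney"]}

candidates = suggest_terms(toy_ontology, toy_emb)
for cls, terms in candidates.items():
    print(cls, terms)
```

With the toy data, "cardiac" is proposed for Heart and "renal" for Kidney. Nearest neighbours in an embedding space also include related but non-synonymous words. For that reason, the output is better treated as a list of candidates for expert review than as direct edits to the ontology.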

Tasks:

Research existing methods for enriching ontologies with textual information.
Collect existing ontologies of a selected domain to serve as baselines.
Find and/or create a basic ontology structure.
Train a word embedding model on a corpus from the selected domain.
Develop a method to inject information from the word embeddings into the basic ontology.
Compare and evaluate the newly enriched ontology against the baselines.
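For the final task, one possible metric is precision, recall and F1 of the suggested pairs against a set of reference pairs. This assumes that the baselines provide (class, term) pairs, such as synonym annotations. The pairs below are hypothetical.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]  # (ontology class, term)


def precision_recall_f1(suggested: Set[Pair], reference: Set[Pair]) -> Tuple[float, float, float]:
    """Compare suggested (class, term) pairs against a reference ontology."""
    true_pos = len(suggested & reference)
    precision = true_pos / len(suggested) if suggested else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical data: enrichment suggestions vs. synonym annotations in a baseline ontology.
suggested = {("Heart", "cardiac"), ("Kidney", "renal"), ("Kidney", "failure")}
reference = {("Heart", "cardiac"), ("Kidney", "renal"), ("Kidney", "nephric")}
p, r, f1 = precision_recall_f1(suggested, reference)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")  # prints precision=0.67 recall=0.67 f1=0.67
```

This sketch counts every pair that is absent from the reference as a false positive. If the baseline ontology is itself incomplete, precision will therefore be understated.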

[1] A Web-scale system for scientific knowledge exploration (2018).
[2] A Word Embedding Analysis towards Ontology Enrichment (2019).
[3] Combining Word and Entity Embeddings for Entity Linking (2017).