
P2. Building a classifier to mark words as chemical elements, properties, processes, and products

August 1, 2019

RQ: What is the best method to classify a concept occurring in a text into a fixed set of classes (e.g. “chemical element”, “chemical property”, “process”, …), given its position in the text and its context?

Description:
To build and enrich good knowledge graphs in the chemical domain by text mining scientific documents, it is necessary to relate concepts occurring in the text to the classes in the knowledge graph. This is important for identifying occurrences of knowledge graph elements (nodes) in the text, adding new nodes, and establishing or reinforcing links between existing nodes. Such a node/relation structure often already exists as outside expert knowledge, and the goal is to use text mining of extensive corpora of domain-specific literature to improve and extend such knowledge graphs.

As a side effect, such a classification is also important for building disambiguated embeddings for concepts that can play different roles (i.e. belong to different classes or have different meanings) depending on how they appear in the text. This avenue will not be explored in this project, however; it is simply a valuable by-product of a good classifier.

The goal is to establish a reliable and efficient method for concept classification. As input, the algorithm should take a context window (a paragraph, a sentence, 10 words, etc.), and as output it should predict the correct class of every concept in that context. The classes and the relations between them are given in advance in the form of outside expert knowledge.
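The input/output contract described above could be sketched as follows. This is only an illustration under stated assumptions: the class names in `CLASSES`, the function names, the symmetric token window, and the placeholder rule `toy_rule` are all hypothetical and are not part of the project description.

```python
from typing import Callable, Dict, List

# Hypothetical class inventory; the real one would come from the expert ontology.
CLASSES = ("chemical element", "chemical property", "process", "product")


def context_window(tokens: List[str], index: int, size: int = 5) -> List[str]:
    """Return up to `size` tokens on each side of tokens[index], excluding the token itself."""
    left = tokens[max(0, index - size):index]
    right = tokens[index + 1:index + 1 + size]
    return left + right


def classify_concepts(
    tokens: List[str],
    concept_positions: List[int],
    classify_one: Callable[[str, List[str]], str],
    size: int = 5,
) -> Dict[int, str]:
    """Predict one class per concept position, given the concept and its context window."""
    return {
        i: classify_one(tokens[i], context_window(tokens, i, size))
        for i in concept_positions
    }


def toy_rule(concept: str, context: List[str]) -> str:
    """Placeholder classifier (an assumption for illustration only, not a real model)."""
    return "process" if concept.endswith("ation") else "chemical element"


tokens = "the oxidation of iron produces rust".split()
window = context_window(tokens, 3, size=2)
predictions = classify_concepts(tokens, [1, 3], toy_rule, size=2)
```

Any of the prototypes listed under the tasks could be plugged in as `classify_one`, which keeps the windowing logic independent of the model being evaluated.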

Tasks:

  1. Literature review of existing state-of-the-art methods for concept classification
  2. Build a corpus of annotated documents from a given ontology of concepts, their classes, and the relations between them
  3. Build prototypes of increasing complexity and performance using, among others, word embeddings (Word2Vec), statistical methods (e.g. based on co-occurrences), and RNN- and attention-based methods (LSTM, Transformer, BERT, …)
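As a rough idea of what the simplest prototype in task 3 might look like, the sketch below implements a co-occurrence baseline: each class gets a bag-of-words profile built from the contexts of its annotated concepts, and a new context is assigned to the class whose profile is most similar under cosine similarity. The function names and the four training examples are invented for illustration; a real prototype would train on the annotated corpus from task 2.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def train(examples: Iterable[Tuple[List[str], str]]) -> Dict[str, Counter]:
    """Build one co-occurrence profile (word counts) per class label."""
    profiles: Dict[str, Counter] = defaultdict(Counter)
    for context, label in examples:
        profiles[label].update(word.lower() for word in context)
    return dict(profiles)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict(profiles: Dict[str, Counter], context: List[str]) -> str:
    """Return the class whose profile is most similar to the context."""
    query = Counter(word.lower() for word in context)
    # Sorting the labels makes tie-breaking deterministic.
    return max(sorted(profiles), key=lambda label: cosine(query, profiles[label]))


# Toy annotated contexts (made up for this sketch).
examples = [
    (["atoms", "of", "the", "metal"], "chemical element"),
    (["the", "metal", "reacts", "with", "oxygen"], "chemical element"),
    (["heated", "during", "the", "reaction"], "process"),
    (["the", "reaction", "step", "is", "slow"], "process"),
]
profiles = train(examples)
element_guess = predict(profiles, ["a", "metal", "atom", "reacts"])
process_guess = predict(profiles, ["during", "a", "slow", "reaction"])
```

A baseline of this kind gives a reference point against which the embedding-based and attention-based prototypes can be compared.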