How chemical researchers are applying artificial intelligence

Artificial intelligence is rapidly changing the way we work in various companies and industries around the world, including the chemical industry. Organizations are adopting these technologies to accelerate processes and reduce costs, as well as saving employees from tedious, mundane tasks. 

Accenture suggests that there are three ways of applying artificial intelligence in research across industries: 

  1. Reinventing the process to manage process change, rethinking standardized processes as continuously adaptive, and using AI across multiple processes.
  2. Rethinking human-machine collaboration; how companies can have an AI-enabled culture to reskill employees to work in alliance with machines.
  3. Utilizing data, making use of AI and data to solve previously unsolved problems and reveal hidden patterns.

In this article, we will explain how chemical researchers are applying artificial intelligence.

How chemical researchers are applying AI

There are three categories of chemical research that are affected by AI. The first category is molecule prediction — draw on known properties to predict new behavior. The second category is synthesis models, which predict how to create certain molecules in fewer steps and more reliable processes. The third is handling prior knowledge to make sense of what we already know —starting with data mining to find the right information. 

1. Case studies on molecule predictions

The pharmaceutical industry is one of the front runners in AI. In February 2020 the model in “A Deep Learning Approach to Antibiotic Discovery” was created, a model that translates molecules into vectors. It starts with every atom being represented with a vector of simple properties. This is used to create a fingerprint of the molecule’s structure, which helps the neural network to learn. 

The model was trained on tests with E.coli to see what molecular structures actually were antibiotic. Then it was applied to the Broad Institute’s drug repurposing hub – an open-access library of more than 6000 molecules with known biological activity. As a result, they discovered a compound called Halicin with impressive antibiotic activity, despite having a chemical structure unlike conventional antibiotics.

Following this success, the team applied their AI technique to a database known as ZINC15 — 107 million molecules were manually selected for screening. Based on the deep learning tool’s predictions, 23 compounds were chosen for further investigation. Two of these compounds showed promise against a range of drug-resistant E. coli.

In march 2020 Münster University published A Structure-Based Platform for Predicting Chemical Reactivity. The new tool is based on the assumption that reactivity can be directly derived from a molecule’s structure. It uses an input based on multiple fingerprint features as an overall molecular representation. Organic compounds can be represented as graphs on which simple structural (yes/no) queries can be carried out. Fingerprints are numeric sequences based on a combination of multiple queries. They were developed to search for structural similarities and proved well suited for use in computational models. For the most accurate presentation of the molecular structure of each compound, a large number of different fingerprints are used.

2. Finding the best synthesis method: expert system vs. machine learning

In 2018, The Defense Advanced Research Projects Agency (Darpa), the development agency of the United States Department of Defense, presented  a project where artificial intelligence was used to develop and find the best synthesis methods. The user can input any structure, either known or novel, and then a machine generates thousands or even millions of reaction sequences in order to end up with the final product. Reactions are being ranked and identified based on feasibility, cost, and other factors. Darpa has two ways of doing this. They can apply the expert system, a system based on 60 000 handwritten rules, which is effective but not scalable. Alternatively, they can encode each of the molecules to predict bond changes, using machine learning (much like on molecule predictions). The next step is having a manual help to filter the results and generate a shorter list of top candidates.

There are three fundamental problems in using the machine learning approach as opposed to the expert system. First, the challenges seen in using machine learning, in this case, is the data acquisition. There is missing information and biased reporting due to lacking reports on failed experiments. When it comes to reaction sequences that can be extracted from patents, not all information is going to be reported in the same place.

The second disadvantage is data representation, meaning how this data is presented and explained to a machine in a comprehensive way. The data format needs to be considered and determined — whether the data is presented in formulas, images, features, properties, etc. 

The third problem is the exploration space. That space is so much vaster than the information we have available. That raises questions about how to teach a chemistry engine to invent new potential molecules and pathways when we don’t have data on that at all. 

There is a model called Molecular Transformer: A Model for Uncertainty-Calibrated Chemical Reaction Prediction which can predict the outcome of a chemical reaction with much higher accuracy than trained chemists, and it will suggest ways to make complex molecules. However, it needs a lot of data and in a very specific text-based format called SMILES (simplified molecular-input line-entry system) that has been data mined from patents. In the end, the preparations to use it for a specific use case might not be worth it from a cost perspective. 

3. Organizing knowledge

Artificial intelligence is already used in prior art. There are a few existing and future inventions in that area which will change the current process radically. The first and most basic invention already in use is smarter search. Automated literature reviews is the second step, which we have been working on for the past five years at We’ve gotten to semi-automation, meaning the search needs human-machine collaboration. 

The next frontier that we are working on is identifying specific insights from text. The first step is advanced data extraction and linking, which we have developed in our Extract tool. The PDF to be extracted is sent to the system. This PDF can be a patent, a clinical trial report, a research paper or any other relevant type of scientific content. It can be one simple document at a time, or hundreds or thousands of them in a batch. The engine extracts the text and identifies all the domain specific entities, then locates the tables and extracts the data from rows and columns, and links the data between the text and table. Graphs, figures and other elements go through the same process. Then the engine populates a pre-defined output in a machine readable format; an excel sheet, an integrated lab tool, a database or anywhere else your researchers require. 

What’s important in this step is the self-assessment module which communicates to the human researchers how confident the machine is in its results, to give the human guidance on where to do the most rigorous manual verifications. 

In the long-run, we expect to see developments in hypothesis extraction from the prior art, knowledge validation based on prior art, and lastly, drawing new conclusions and finding new hypotheses from all of the existing prior arts. 

Automating manual tasks vs. rethinking the imaginable 

There are two very different mindsets when it comes to applying AI in your organization. You can replace a human process and have a machine do the same activity but faster, for example, in extracting the data. Willingness to invest time and resources is needed, but there is clear ROI and known outcome and benefits. The second mindset is about activities that cannot be done by a human. For example, a machine can identify new potential application areas, meaning you need willingness to invest as well as rethink and re-imagine what’s possible (ROI will be unknown until you try).

Interpretability and explainability

One of the emerging fields in AI worth mentioning is interpretability or explainability. It is not just AI that tells if something will work or not, but explaining why. For example in molecule prediction, AI can predict that certain actions will cause an activity or property because of a specific area in the molecule or combination. As a result, it gives the chemist an immediate indication of how it could be altered if the reaction is unwanted. Similar to the data extraction tool that is working on, where every row and column will come with a machine-created self-assessment with a percentage of certainty.