In this article we’ll discuss how R&D leaders leverage AI tools in patent landscape analysis to drive competitive advantage. We’ll cover three AI capabilities: discovering patents and literature, narrowing down reading lists, and extracting key data.
The number of patents grows every year. In 2018, there were 3.3 million patent applications, a 5.2% increase on the previous year. That’s equivalent to around 9,000 patents a day, meaning there are tens of millions of published patents and patent application references available to review. Moreover, it’s generally assumed that 80% of the science and technology information contained in patents is not published anywhere else. This makes patent databases a gigantic treasure trove of knowledge.
That’s where patent landscape analysis comes into play, enabling R&D leaders to find “white spaces” and gain a competitive advantage in the market. Patent landscape analysis, or “patent mapping”, is a comprehensive study of a particular field of technology, enabling large businesses, universities, start-ups and research organizations to understand trends and explore rewarding product development opportunities.
Advancements in artificial intelligence (AI) are presenting opportunities for R&D leaders to drive competitive advantage through patent landscape analysis.
Three AI tools to drive competitive advantage
Over the past few years, AI has become far easier to apply and much more accurate at processing scientific documents. The three most relevant capabilities for R&D leaders running patent landscape analyses are:
Discovering patents and papers
Narrowing down huge lists of documents to a succinct reading list
Extracting key data from texts, tables and charts
Discovering literature and patents
Intellectual Property departments often run patentability searches (also known as prior art or novelty searches) on inventions submitted by R&D departments. Using AI technologies, researchers can find similar patents based on a description of the company’s patent idea. The engine reads the provided text, identifies key concepts and ranks existing patents by similarity, saving researchers time: they quickly discover whether an idea already exists. The time regained lets R&D leaders focus on driving competitive advantage and finding a niche for development.
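To make the idea concrete, here is a minimal sketch of similarity-based discovery in Python, using TF-IDF weighting and cosine similarity as a deliberately simple stand-in for the learned concept representations a commercial engine would use. The function names and scoring are illustrative assumptions, not any vendor’s actual API.

```python
import math
from collections import Counter

def vectorize(corpus):
    """Build smoothed TF-IDF vectors (as dicts) for a small corpus of texts.
    A toy stand-in for a real engine's learned concept representations."""
    docs = [d.lower().split() for d in corpus]
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency
    return [{t: c * math.log((1 + n) / (1 + df[t]))
             for t, c in Counter(d).items()} for d in docs]

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_patents(idea, patents):
    """Rank patent texts by similarity to an invention description."""
    vecs = vectorize([idea] + patents)
    query, rest = vecs[0], vecs[1:]
    return sorted(zip(patents, (cosine(query, v) for v in rest)),
                  key=lambda pair: pair[1], reverse=True)
```

A patent sharing distinctive terms with the invention description rises to the top, while unrelated documents score near zero; common words shared by every document contribute nothing thanks to the IDF weighting.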
Narrowing down your reading list
After finding documents, researchers can apply AI to narrow down the list based on relevant and irrelevant concepts. Manually working through large sets of papers and patents is time-consuming, but the engine understands the context of each article and rates its relevance. Moreover, it scores the similarity between each document and the description provided by the researcher. As a result, researchers save up to 80% of their time.
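The filtering step can be sketched very simply: score each document against researcher-supplied lists of relevant and irrelevant concepts and keep only those that clear a threshold. This toy version uses plain keyword matching, where a real engine would rate contextual relevance; all names and thresholds here are assumptions for illustration.

```python
def concept_score(text, relevant, irrelevant):
    """Net score from researcher-supplied concept lists.
    Plain single-word keyword matching; a real engine would
    understand context rather than match literal tokens."""
    words = set(text.lower().split())
    hits = sum(1 for c in relevant if c in words)
    misses = sum(1 for c in irrelevant if c in words)
    return hits - misses

def narrow_down(docs, relevant, irrelevant, min_score=1):
    """Keep only documents whose net concept score clears the threshold."""
    return [d for d in docs
            if concept_score(d, relevant, irrelevant) >= min_score]
```

A document mentioning an irrelevant concept is penalized even if it also matches a relevant one, so the reading list shrinks to the items that are on-topic overall.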
Extracting key data
Many patent analysts, R&D managers, researchers and data scientists go through dozens of patents daily, manually analyzing and drawing out the most important information from text, tables and graphs. The process lends itself to automation, as it is time-consuming and error-prone for humans. AI-enhanced tools find key data points in patents and automatically add the information to a prepared Excel sheet. This helps avoid human error and, most importantly, saves time.
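A stripped-down sketch of that extraction flow: pull a few fields out of patent text and write them into a spreadsheet-ready CSV. The field patterns below are hypothetical hand-written regular expressions; a production tool would rely on trained models, and the field names are assumptions for illustration.

```python
import csv
import io
import re

# Hypothetical field patterns; a real tool would use trained NLP models
# rather than hand-written regular expressions.
FIELDS = {
    "temperature_c": re.compile(r"(\d+(?:\.\d+)?)\s*°?C\b"),
    "yield_pct": re.compile(r"yield of (\d+(?:\.\d+)?)\s*%"),
}

def extract_fields(patent_id, text):
    """Pull the first match for each field out of a patent's text."""
    row = {"patent_id": patent_id}
    for name, pattern in FIELDS.items():
        match = pattern.search(text)
        row[name] = match.group(1) if match else ""
    return row

def to_csv(rows):
    """Write extracted rows into a spreadsheet-ready CSV string."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["patent_id", "temperature_c", "yield_pct"])
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Running hundreds of patents through such a pipeline produces one consistent table instead of hundreds of manual copy-paste steps, which is exactly where the time savings and error reduction come from.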
Key takeaways
The number of patents grows at an unprecedented rate, and it’s time-consuming and difficult to analyze them. AI technology enables R&D leaders and their teams to spend more time analyzing the actual data in patents — as the engine does a lot of the manual work. This includes discovering relevant patents, narrowing down long lists of documents based on specific criteria and lastly extracting all key data points. This way, researchers save time during the patent landscape analysis and drive competitive advantage in their market.
Artificial intelligence is rapidly changing the way we work across companies and industries around the world, including the chemical industry. Organizations are adopting these technologies to accelerate processes, reduce costs and save employees from tedious, mundane tasks.
Accenture suggests that there are three ways of applying artificial intelligence in research across industries:
Reinventing the process to manage process change, rethinking standardized processes as continuously adaptive, and using AI across multiple processes.
Rethinking human-machine collaboration: building an AI-enabled culture and reskilling employees to work in partnership with machines.
Utilizing data: using AI and data to solve previously unsolved problems and reveal hidden patterns.
In this article, we will explain how chemical researchers are applying artificial intelligence.
How chemical researchers are applying AI
There are three categories of chemical research affected by AI. The first is molecule prediction: drawing on known properties to predict new behavior. The second is synthesis models, which predict how to create certain molecules in fewer steps and with more reliable processes. The third is handling prior knowledge to make sense of what we already know, starting with data mining to find the right information.
1. Case studies on molecule predictions
The pharmaceutical industry is one of the front runners in AI. In February 2020, researchers published “A Deep Learning Approach to Antibiotic Discovery”, describing a model that translates molecules into vectors. Every atom is represented by a vector of simple properties, and these are combined into a fingerprint of the molecule’s structure that the neural network learns from.
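The atoms-to-vectors idea can be shown in miniature: give each atom a small vector of simple properties and pool them into one molecule-level descriptor that a network could consume. The property table below is illustrative, not the paper’s actual featurization.

```python
# Toy version of representing each atom as a vector of simple properties
# and pooling them into a molecule-level vector. The chosen properties
# are illustrative assumptions, not the paper's featurization.
ATOM_PROPS = {  # (atomic number, valence electrons, is_heteroatom)
    "H": (1, 1, 0),
    "C": (6, 4, 0),
    "N": (7, 5, 1),
    "O": (8, 6, 1),
}

def molecule_vector(atoms):
    """Sum per-atom property vectors into a single molecule descriptor."""
    total = [0, 0, 0]
    for atom in atoms:
        for i, prop in enumerate(ATOM_PROPS[atom]):
            total[i] += prop
    return total
```

Real models use richer per-atom features and graph-aware pooling, but the principle is the same: the molecule becomes a fixed-length numeric vector a neural network can learn from.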
The model was trained on E. coli growth tests to learn which molecular structures actually had antibiotic activity. It was then applied to the Broad Institute’s drug repurposing hub, an open-access library of more than 6,000 molecules with known biological activity. As a result, the team discovered a compound called Halicin with impressive antibiotic activity, despite its chemical structure being unlike conventional antibiotics.
Following this success, the team applied their AI technique to a database known as ZINC15, selecting 107 million molecules for virtual screening. Based on the deep learning tool’s predictions, 23 compounds were chosen for further investigation. Two of these showed promise against a range of drug-resistant E. coli.
In March 2020, Münster University published “A Structure-Based Platform for Predicting Chemical Reactivity”. The tool is built on the assumption that reactivity can be derived directly from a molecule’s structure. It uses multiple fingerprint features as the overall molecular representation. Organic compounds can be represented as graphs, on which simple structural yes/no queries can be carried out. Fingerprints are numeric sequences based on a combination of many such queries. They were originally developed to search for structural similarities and have proved well suited for use in computational models. For the most accurate representation of each compound’s molecular structure, a large number of different fingerprints are used.
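A toy fingerprint makes the yes/no-query idea tangible: each bit records whether one structural query matches. Real chemistry toolkits match graph substructures rather than raw text, so the substring matching on SMILES strings below is a loud simplification, and the query list is an arbitrary example.

```python
# Toy structural fingerprint: each bit answers one yes/no query.
# Real toolkits match graph substructures, not raw SMILES text,
# but the principle of query-based bits is the same.
QUERIES = [
    "C(=O)O",    # carboxylic-acid-like fragment
    "N",         # contains nitrogen
    "c1ccccc1",  # aromatic benzene ring
    "Cl",        # contains chlorine
]

def fingerprint(smiles):
    """Bit vector with one yes/no structural query per position."""
    return [1 if query in smiles else 0 for query in QUERIES]
```

Combining many such bit vectors gives each compound a numeric signature that computational models can compare and learn from.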
2. Finding the best synthesis method: expert system vs. machine learning
In 2018, the Defense Advanced Research Projects Agency (DARPA), the research and development agency of the United States Department of Defense, presented a project in which artificial intelligence was used to develop and find the best synthesis methods. The user can input any structure, known or novel, and the machine generates thousands or even millions of reaction sequences that end in the target product. Reactions are ranked and filtered based on feasibility, cost and other factors. DARPA has two ways of doing this. One is an expert system based on 60,000 handwritten rules, which is effective but not scalable. The other encodes each molecule to predict bond changes using machine learning, much as in molecule prediction. The final step is manual curation to filter the results into a short list of top candidates.
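The ranking step can be sketched with a simple scoring rule: each candidate route is a sequence of reaction steps with an estimated feasibility and cost, chained feasibilities multiply, and costs add up. The weights and data shapes below are illustrative assumptions, not DARPA’s actual scoring.

```python
# Hypothetical scoring of candidate synthesis routes. Each route is a
# list of reaction steps with an estimated feasibility (0-1) and a
# reagent cost; chaining steps multiplies the overall risk. The
# cost weight is an illustrative assumption.
def route_score(route, cost_weight=0.01):
    feasibility = 1.0
    cost = 0.0
    for step in route:
        feasibility *= step["feasibility"]  # each step adds risk
        cost += step["cost"]
    return feasibility - cost_weight * cost

def rank_routes(routes):
    """Best-scoring route first."""
    return sorted(routes, key=route_score, reverse=True)
```

Even this crude rule captures the trade-off the article describes: a short route with expensive reagents can lose to a longer but cheaper one, depending on how the factors are weighted.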
There are three fundamental problems with the machine learning approach compared with the expert system. The first is data acquisition: information is missing, and reporting is biased because failed experiments are rarely published. And for reaction sequences extracted from patents, not all the information is reported in the same place.
The second is data representation: how the data is presented and explained to a machine in a comprehensible way. The data format needs to be considered and determined, whether formulas, images, features, properties and so on.
The third is the exploration space, which is vastly larger than the information we have available. That raises the question of how to teach a chemistry engine to invent new potential molecules and pathways for which we have no data at all.
There is a model, described in “Molecular Transformer: A Model for Uncertainty-Calibrated Chemical Reaction Prediction”, that can predict the outcome of a chemical reaction with much higher accuracy than trained chemists and can suggest ways to make complex molecules. However, it needs a lot of data in a very specific text-based format called SMILES (simplified molecular-input line-entry system), mined from patents. In the end, the preparation required for a specific use case might not be worth it from a cost perspective.
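To give a feel for the SMILES format and why data preparation matters, here is a simplified regex-based tokenizer of the kind sequence models need before training, splitting a SMILES string into atom and bond symbols. The pattern is deliberately reduced; production pipelines use a much fuller one.

```python
import re

# Simplified regex-based SMILES tokenizer. Two-letter elements come
# first in the alternation so "Cl" is not split into "C" + "l".
# Production pipelines use a much fuller pattern; this one is a sketch.
SMILES_TOKEN = re.compile(r"\[[^\]]*\]|Br|Cl|[BCNOSPFIbcnosp]|[0-9()=#+\-.]")

def tokenize(smiles):
    """Split a SMILES string into atom/bond tokens."""
    tokens = SMILES_TOKEN.findall(smiles)
    assert "".join(tokens) == smiles, "unrecognized characters in SMILES"
    return tokens
```

For example, chloroacetic acid written as `ClCC(=O)O` tokenizes into eight symbols rather than nine characters, because `Cl` is one chemical unit; getting details like this right across millions of mined reactions is exactly the preparation cost the article refers to.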
3. Organizing knowledge
Artificial intelligence is already used in prior art search, and a few existing and emerging tools in that area will change the current process radically. The first and most basic, already in use, is smarter search. Automated literature review is the second step, which we have been working on for the past five years at Iris.ai. We have reached semi-automation, meaning the search still needs human-machine collaboration.
The next frontier we are working on is identifying specific insights from text. The first step is advanced data extraction and linking, which we have developed in our Extract tool. The PDF to be processed is sent to the Iris.ai system; it can be a patent, a clinical trial report, a research paper or any other relevant type of scientific content, one document at a time or hundreds or thousands in a batch. The Iris.ai engine extracts the text and identifies all the domain-specific entities, then locates the tables, extracts the data from rows and columns, and links the data between text and tables. Graphs, figures and other elements go through the same process. The engine then populates a pre-defined output in a machine-readable format: an Excel sheet, an integrated lab tool, a database or anywhere else your researchers require.
What’s important in this step is the self-assessment module, which communicates to the human researchers how confident the machine is in its results, guiding them on where to do the most rigorous manual verification.
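The self-assessment idea can be sketched as attaching a confidence score to each extracted value and flagging low-confidence cells for human review. The heuristics and weights below are illustrative assumptions, not Iris.ai’s actual self-assessment model.

```python
# Sketch of attaching a confidence score to each extracted value so
# researchers know where to verify manually. Heuristics and weights
# are illustrative, not Iris.ai's actual self-assessment model.
def assess(value, corroborations, context_matches):
    """Return the value with a rough confidence score in [0, 1]."""
    confidence = 0.5
    if corroborations > 1:
        confidence += 0.2   # the value appears elsewhere in the document
    if context_matches:
        confidence += 0.25  # surrounding text matches expected wording
    return {"value": value, "confidence": round(min(confidence, 1.0), 2)}

def flag_for_review(cells, threshold=0.7):
    """List extracted cells that fall below the confidence threshold."""
    return [cell for cell in cells if cell["confidence"] < threshold]
```

The payoff is workflow-level: instead of verifying every cell, researchers concentrate their manual checks on the handful the machine itself is unsure about.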
In the long run, we expect to see developments in hypothesis extraction from prior art, knowledge validation against prior art and, lastly, drawing new conclusions and finding new hypotheses from all of the existing prior art.
Automating manual tasks vs. rethinking the imaginable
There are two very different mindsets when it comes to applying AI in your organization. The first replaces a human process: a machine does the same activity, but faster, as in data extraction. This requires a willingness to invest time and resources, but there is a clear ROI with known outcomes and benefits. The second mindset is about activities a human cannot do at all. For example, a machine can identify new potential application areas; this requires a willingness to invest as well as to rethink and re-imagine what is possible, and the ROI is unknown until you try.
Interpretability and explainability
One emerging field in AI worth mentioning is interpretability, or explainability: AI that does not just tell you whether something will work, but explains why. In molecule prediction, for example, AI can predict that certain actions will cause an activity or property because of a specific area of the molecule or a combination of areas, giving the chemist an immediate indication of how the molecule could be altered if the reaction is unwanted. The same idea appears in the data extraction tool Iris.ai is working on, where every row and column comes with a machine-created self-assessment and a percentage of certainty.