The Extract tool

Manually extracting and linking the data you need from PDFs full of free text, tables, graphs, figures and a plethora of layouts takes major effort from highly skilled people. Iris.ai's Extract tool fetches and links all the key data from these documents into a tabular, machine-readable, systematic format. A full month of data extraction work can be done in minutes, at 90% accuracy.


The PDF containing the relevant data points is sent to the system. It can be a patent, a clinical trial report, a research paper or any other relevant type of scientific or technical content. You can send a single document at a time, or hundreds or thousands of them in a batch.

The engine extracts the text and identifies all the domain-specific entities, then locates the tables and extracts the data from rows and columns, and links the data between the text and table. Graphs, figures and other elements go through the same process.

Then the engine populates a pre-defined output in a machine-readable format: an Excel sheet, an integrated lab tool, a database or anywhere else your researchers require.
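As a rough illustration of the steps above (find entities in the text, read the table rows, link the two, write a tabular output), here is a minimal Python sketch. It is not the Extract tool's actual implementation or API: the sample text, the table, the regex-based entity matcher and all function names are hypothetical stand-ins, and the PDF parsing step is assumed to have already happened.

```python
import csv
import io
import re

# Hypothetical inputs standing in for what a PDF parser would return.
TEXT = "Sample A1 was annealed at 450 C. Sample B2 was left untreated."
TABLE = [
    {"sample": "A1", "yield_pct": "87"},
    {"sample": "B2", "yield_pct": "62"},
]


def find_entities(text):
    """Toy entity recognizer: IDs that follow the word 'Sample'."""
    return re.findall(r"\bSample ([A-Z]\d+)\b", text)


def sentence_for(text, entity):
    """Return the first sentence that mentions the entity, or ''."""
    for sentence in re.split(r"(?<=\.)\s+", text):
        if f"Sample {entity}" in sentence:
            return sentence
    return ""


def link(text, table):
    """Attach the matching sentence to each table row whose ID appears in the text."""
    entities = set(find_entities(text))
    return [
        {**row, "context": sentence_for(text, row["sample"])}
        for row in table
        if row["sample"] in entities
    ]


def to_csv(rows):
    """Serialize linked rows into a pre-defined tabular layout."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["sample", "yield_pct", "context"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


linked = link(TEXT, TABLE)
print(to_csv(linked))
```

A production engine would replace the regex with learned entity recognition and disambiguation; the sketch only shows how text mentions and table rows can end up in one linked, machine-readable record.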

WHY IRIS.AI?

Iris.ai has spent the last 5 years building an award-winning AI engine for scientific text understanding. Our algorithms for text similarity, tabular data extraction, domain-specific entity representation learning, and entity disambiguation and linking measure up to the best in the world. On top of that, our engine builds a comprehensive knowledge graph containing all entities and their linkages, so that people can learn from it, use it and give feedback to the system. Applying these techniques to scientific and technical text is a complex challenge that few others can meet.

