At the beginning of 2021 we started an exciting research journey when we received research funding from the Norwegian Research Council under the “BIA” grant scheme. Now this project is coming to an end. We’re very grateful for this opportunity. Let’s reflect on what we accomplished in the last 1.5 years.
The main theme of the BIA project is understanding the knowledge graph, and we divided the work into three collaborative subprojects:
Domain-specific word embeddings
Embeddings evaluation framework
Knowledge graph building
Domain-specific word embeddings
Domain adaptation of embedding models is a proven technique for domains that have insufficient data to train an effective model from scratch. Chemistry is one such domain, where scientific jargon and overloaded terminology inhibit the performance of a general language model. In the past year, we have experimented with spherical embeddings, latent semantic imputation (LSI), and cross-domain knowledge discovery with NLP (a collaboration with CEA Saclay).
We published several research papers about each project:
This ongoing project aims to develop a suite of transferable intrinsic and extrinsic tasks for domain-specific word-embedding evaluation, which can be applied to chemistry. We combine the ideas of an extrinsic test suite, VecEval, and an intrinsic test suite, the LDT toolkit, to design an automated pipeline for evaluating embeddings on various intrinsic and extrinsic tasks. Our current progress includes an implementation of semantic partitioning as one of the intrinsic tasks, and named-entity recognition (NER) and document classification as extrinsic tasks.
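The intrinsic side of such a pipeline can be illustrated with a toy word-pair similarity task: score each pair with the model’s cosine similarity and correlate the scores with human judgments. This is a minimal sketch with invented vectors and judgments, not the framework’s actual implementation.

```python
import math

def cosine(u, v):
    # Cosine similarity between two dense vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def spearman(xs, ys):
    # Spearman rank correlation (no tie handling; enough for a sketch).
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0.0] * len(vals)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical embeddings and human similarity judgments.
emb = {
    "acid": [0.9, 0.1, 0.0],
    "base": [0.8, 0.2, 0.1],
    "poem": [0.0, 0.9, 0.4],
}
pairs = [("acid", "base"), ("acid", "poem"), ("base", "poem")]
human = [0.9, 0.1, 0.15]

model = [cosine(emb[a], emb[b]) for a, b in pairs]
score = spearman(model, human)
```

A higher correlation means the embedding space agrees better with human intuitions about which words are related.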
Moreover, another paper will be published in December 2022 – “Be aware of semantic ‘misuse’: benchmarking distributed representations for triple classification in Chemistry”. You can find our scientific publications at iris.ai/publications/.
“I enjoyed seeing how other research groups operate and bringing in their expertise. It can be quite stimulating and we can get inspiration from different projects. And these collaborations also created opportunities – some of them did turn into research papers and some were extended into new projects. So in general I think it’s great that we maintain these research collaborations with research groups outside of Iris, because collaborations are the key to cross-domain research and the stimulation of new ideas,” says Ronin Wu, AI Research Lead and Head of Research Collaborations at Iris.ai.
We’re extremely grateful for this opportunity and excited for upcoming ones.
On May 25th 2022, we hosted a webinar together with Materiom where we shared details about our collaboration and the project. Here, you can read more about it or watch it below.
Sustainability needs innovation, and innovation needs R&D. However, R&D needs time – the most precious commodity we all have in common. With the world facing challenges such as advancing global warming and too-slow progress on carbon capture, energy insecurity, and food supply crises, researchers need time more than ever to drive innovation for a sustainable future across all industries.
In this webinar we answer:
How can NLP, a subdiscipline of ML, save vastly more time for R&D to innovate?
How can systematized data extraction uncover hidden knowledge scattered across thousands of research papers and contribute to the success of R&D teams and impactful projects like Materiom’s?
Materiom’s mission is to grow a regenerative materials economy. The company is building a suite of solutions that can help holistically to grow the market of petroleum alternatives. Together, we are attempting to pool the world’s data on biomaterial development and performance and make it useful to the world’s entrepreneurs.
Interview with Alysia Garmulewicz, the Co-Founder of Materiom:
How did you get this idea to create such a database?
My co-CEO Liz Corbin and I had a meeting of minds a few years ago upon founding Materiom that really crystallized a lot of our research and doctoral work. I’d been working in the sustainability world for a while, trying to understand the key points that were preventing the cycling of materials. One of the main issues in a centralized manufacturing system, where all of this has to go back into production, is that it’s very difficult to imagine just being able to take up all those little distributed bits and put them back. In nature there is a much more distributed and nested way of cycling nutrients within local and regional ecosystems.
That was a model that we saw as very compelling but the material economy doesn’t actually work that way. The biomaterial world and the way that you can source feedstock locally and put that back into production at local and regional scales was a very compelling approach to unlocking this systemic rigidity.
The main issue in unlocking more local and regional capacity for making materials, and making them effective at those scales, is data and access to information. That was the galvanizing point of developing Materiom. The goal is to create a starting point for R&D and get to market faster. Since then it’s built into having deep dives, talking to entrepreneurs in the field, businesses that are making amazing plastics and biomaterial alternatives, and seeing the challenges they face. It’s deepened over time, but that was the genesis: really understanding that access to information was the missing link if we’re going to have systemic change from a very centralized manufacturing system and material economy to a more distributed one that allows regenerative sources of biomass to be effective.
How did you hear about Iris and why did you choose us as a collaboration partner?
Liz (Co-CEO) was the one who stumbled across Iris. I can’t remember the exact point of reference, but it certainly stood out to us as being a solution that we were looking for specifically focused on the scientific literature. We’ve been aware that the field of natural language processing is growing but Iris.ai’s specific focus on scientific papers and scientific data itself made the most sense so it was an obvious fit from that perspective.
How would you describe our collaboration so far and do you think that we already had some major challenges that we had to overcome together?
One of the most interesting and important challenges that has shaped the first stage of our collaboration has been getting a grip on the nature of the information that we’re trying to extract. I’ve really been grateful for the amount of feedback and transparency in terms of what the challenges are and what the team is working on, and for being able to feed in our domain knowledge when it’s needed. I’m sure there are going to be more challenges along the way, but I think grappling first with the type of information and how it’s portrayed was the main one in this first stage.
We are very happy to be collaborating with you. It’s always very helpful when our clients are responsive, give feedback and ask us questions. Sometimes you’re so deep into a topic that you don’t realize which words the outside world might not understand, so open communication is very helpful. This leads me to my last question – what would you tell someone thinking about collaborating with us or any other AI startup? What should they expect, or keep in the back of their head?
I think one of the main things that I would emphasize is the importance of the journey – being able to learn and iterate and be responsive to the challenges along the way. It’s certainly a field or an area of solution space that really requires a lot of engagement with the topic. It’s not magic that just suddenly will appear at your doorstep. The technology is incredible but it’s even more incredible when you have the partnership between two companies – in our case Iris.ai and Materiom. That can really enrich the process and make the journey enjoyable. The more you put into it, the more you can get out of it. I really enjoy that process of working with the team and I think that it’s a really exciting learning process for us to understand how far you can push the technology and what results you can get and how that fits with the mission and the time frame and the goals that we have. It’s definitely a learning process and I think that should be enjoyed.
Iris.ai has recently partnered with Materiom to help build the world’s largest database and research community of material science knowledge to aid the transition away from petrochemicals. With the automated extraction and systematization of the content, recipes and ensuing properties of materials from more than 50,000 articles, Materiom will have laid a solid foundation for their groundbreaking community of researchers. This database will then be published on Materiom’s platform to speed up R&D processes and the market entry of regenerative materials. This will ultimately lead to reduced plastics pollution, and the creation of a materials economy that benefits ecological regeneration.
“We are excited to collaborate with such a great team of enthusiastic professionals and scientists as Materiom. Seeing how our Extract tool can process 50,000 documents and extract data on a wide range of renewable materials is thrilling. Even more so because, in that way, we are contributing to a more sustainable world,” says Kimberly Holtz, Key Account Manager at Iris.ai.
“Iris.ai is helping us get to scale with our open database, a resource that will accelerate regenerative materials R&D” – Alysia Garmulewicz, Founder and Co-CEO of Materiom, states.
Keep an eye on our blog for project updates and information when the database is available this year. If you think that we can help with your problem, contact us!
Materiom is an open access platform for creating sustainably-sourced biomaterials, made from locally-abundant natural ingredients. The Materiom community includes material scientists, designers, engineers, data scientists, and sustainability experts. The project supports companies, cities, and communities in creating and selecting materials sourced from locally abundant biomass that are part of a regenerative circular economy.
In the last blog post we introduced our shiny new tool – the Researcher Workspace: a place where you decide how to run your research process and which AI-based tools to apply. In this post we want to introduce some concrete client use cases and explain how we set up each Workspace for the client’s unique process and needs.
Food safety reviews
We are working with a niche review team that needs to perform a wide range of food safety-related searches to keep the population safe. The team needed a better way to do full interdisciplinary literature searches on a broad variety of topics, expanding beyond their individual areas of expertise. They have a simple Workspace with an Explore tool, where the reviewer can input a plain-language description of the research without needing comprehensive vocabulary knowledge. Iris.ai uses contextual key terms, synonyms and hypernyms to build a ‘fingerprint’ it can match to other papers, and presents a visual map with an overview of topics and relevant research papers.
Exploratory R&D for a biotech Company
This collaboration is with a biotech company that has extensive R&D efforts in mapping out a knowledge graph of real-world data (e.g. any microorganism they locate in their field or lab work) that will lead to new insights for their product development. Their starting point is an entity (e.g. a bacterium or parasite) in a given context; they then search for all relevant knowledge, analyze all articles found, extract the key data points and summarize the findings, with the goal of connecting the information in a database.
Pharmacovigilance for a Contract Research Organization
This CRO performs regular contract work where they do limited, specific and rapid literature searches for their clients across a variety of medical devices and drugs.
They regularly undertake post-market surveillance and pharmacovigilance studies, with each project typically taking about 80 hours. To reduce the time the team spends on manual work, the CRO has an Iris.ai Researcher Workspace set up where they upload their search results (≈500 articles) and apply a range of smart filters (entities, data points and context descriptions) to instantly narrow the list down to the ≈30 articles of interest. The tool then automatically extracts all clinical data points of relevance into a database – both a short collection of summarizing data points and a comprehensive 500-point collection. All actions are recorded and a final report draft is automatically generated.
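The filtering step can be sketched in a few lines. This is a simplified, hypothetical stand-in for the Workspace’s smart filters: it keeps only the articles whose text mentions every required term (the article texts and filter terms below are invented).

```python
# Toy search results, standing in for the ~500 uploaded articles.
articles = [
    {"id": 1, "text": "Adverse events observed for device X in a cohort study."},
    {"id": 2, "text": "Manufacturing notes for device X, no clinical data."},
    {"id": 3, "text": "Cohort study of device X reports two adverse events."},
]

def matches(article, required_terms):
    # An article passes only if it mentions every required term.
    text = article["text"].lower()
    return all(term in text for term in required_terms)

filters = ["device x", "adverse"]
of_interest = [a["id"] for a in articles if matches(a, filters)]
print(of_interest)  # [1, 3]
```

The real filters work on extracted entities and context descriptions rather than raw substring matches, but the narrowing-down effect is the same.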
Drug repurposing for an interdisciplinary research group
A Californian research group, funded by the Canadian government, is searching broadly for new drug candidates from western, Chinese and herbal medicine for a specific medical situation. Each researcher starts from an idea, doing a PICO style natural language text search and filters, across a variety of western and alternative research articles. Once an interesting approach is found, the articles are collected and all relevant clinical data is extracted and used to populate a database, made openly accessible to researchers across the world.
IP analysis for a steel conglomerate
This world-leading steel manufacturer strategically monitors filed IP to spot new market opportunities, but extracting detailed experiment data from patents is incredibly time consuming. Their Workspace lets researchers provide the system with one or several patents, and uses that input to identify a range of other patents based on their content. The identified patents can then be sent for extraction with one click, where thousands of data points in text and tables are extracted, linked and mapped to the desired format.
Librarians are always searching for better ways to help their students and researchers find the right literature. An academic-oriented Researcher Workspace is connected to all relevant literature sources and allows exploratory searches based on problem descriptions. The tool, with its visual search interface, is especially loved by Master and PhD candidates early in their research careers, with still-unknown topics to explore broadly.
Each presented use case is distinct, but in every instance we found a suitable process to help companies, research institutes and universities with their scientific knowledge processing needs. Our tools are customized and trained on each client’s domain to optimize results. If you think that we can help you out, contact us!
CORE and Iris.ai are extremely pleased to announce the initiation of a new research collaboration funded by the Norwegian Research Council.
Discovering scientific insights about a specific topic is challenging, particularly in an area like chemistry, which is one of the top five most published fields with over 11 million publications and 307,000 patents. The team at Iris.ai has spent the last five years building an award-winning AI engine for scientific text understanding. Their patented algorithms for identifying text similarity, extracting tabular data and creating domain-specific entity representations make them world leaders in this domain.
The AI Chemist project is a collaboration between Iris.ai and The Open University, Oxford University, Trinity College Dublin, and University College London. CORE is a not-for-profit platform delivered by The Open University in cooperation with Jisc that hosts the world’s largest collection of open access scientific articles. As of February 2022, the CORE dataset provides metadata (title, author, abstract, publishing year, etc.) for approximately 210 million articles, and the full text for 29.5 million articles.
Working in partnership with CORE developers and researchers, Iris.ai will now leverage the vast quantities of research papers available in the CORE dataset. This dataset will be employed to improve the quality of text extraction from scientific literature in chemistry-focused domains. The output of this phase will support Iris.ai and The AI Chemist in understanding reasoning and inference across research papers.
Currently, the state of the art in the chemical domain is a combination of direct manual evaluation of text documents, social networks and curated, but incomplete databases. The manual nature of these approaches makes discovery of novel application areas immensely time consuming. The goal is to develop a set of algorithms that can machine read vast amounts of scientific literature and data, discover and detect mentions of entities of interest and their relations (such as chemical products, compounds, properties, processes, applications, etc.) and connect these pieces of information to build an increasingly complex knowledge graph.
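As a rough illustration of that knowledge-graph goal, the sketch below assumes (entity, relation, entity) triples have already been extracted upstream and simply links them so that indirect connections can be followed. The entities and relations are invented for the example; the project’s actual pipeline machine-reads papers to produce them.

```python
from collections import defaultdict

# Hypothetical triples, as a mention-extraction step might emit them.
triples = [
    ("graphene", "has_property", "high conductivity"),
    ("graphene", "used_in", "battery electrodes"),
    ("high conductivity", "enables", "fast charging"),
]

# Adjacency-list representation of the knowledge graph.
graph = defaultdict(list)
for head, rel, tail in triples:
    graph[head].append((rel, tail))

def neighbors(entity, depth=2):
    # Follow edges outward from an entity to surface indirect connections.
    found, frontier = [], [entity]
    for _ in range(depth):
        nxt = []
        for node in frontier:
            for rel, tail in graph.get(node, []):
                found.append((node, rel, tail))
                nxt.append(tail)
        frontier = nxt
    return found

paths = neighbors("graphene")
```

Starting from “graphene”, the two-hop walk also reaches “fast charging” via “high conductivity” — the kind of indirect connection a growing graph is meant to surface.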
Dr Ronin Wu, Research Lead and Head of Research Collaboration at Iris.ai, said: “Iris.ai are extremely pleased to be partnering with CORE on the AI Chemist project and we’re looking forward to seeing some exciting new developments with our AI models”.
Dr. Petr Knoth, Head of CORE and Senior Research Fellow in Text and Data Mining, said: “This cooperative research project will put CORE at the forefront of the global effort to create open scholarly knowledge graphs. As part of this project we will use state-of-the-art machine learning approaches to address problems including topic/theme extraction, affiliation extraction, deduplication and citation function detection. With the demise of Microsoft Academic Graph at the end of 2021, we see on a daily basis how much this is in demand among CORE users.”
In pharmacovigilance, post-market surveillance and many other areas of drug development, highly skilled team members spend an enormous amount of time systematically mapping out publications. This includes going through clinical trial reports and real-world evidence, and extracting key data points such as adverse effects.
In this webinar, we covered how to:
👉 Automatically filter based on entities of interest and specific context.
👉 Extract data such as adverse effects, treatment and patient baseline characteristics.
Starting a PhD degree is daunting. You’re probably beginning your first full-time job in academia and you have to write numerous papers in a research field you’re not fully familiar with. Day-to-day you’ll spend a lot of time discovering new knowledge, writing literature reviews and publishing papers, which will ultimately form the foundation of your PhD degree. But as a new researcher, it’s often hard to know where to start your research discovery.
At Iris.ai, we’ve spoken with our users about their literature review process. They gave us tips and tricks on how to improve the research discovery, which have helped them find more relevant papers, save time and organize research in visual maps.
Traditional keyword-based literature search is constrained to what you already know as a researcher, and finding the most relevant papers becomes an exercise in assembling the perfect combination of words. That isn’t to say keyword search is outdated, but it must be coupled with additional powerful research discovery tools. Only then can you be sure you’ve found the most relevant papers.
4 steps to boost research discovery
So here’s a step-by-step guide on how to research like the pros (aka Irisians aka our users).
1. Find a research paper
Head over to our friends at PubMed and look up an existing research paper or systematic review that covers the topic you want to explore. For example, in the image below I’ve searched for a systematic review on autonomous vehicles.
2. Copy the link
When you have found a paper, copy the URL or DOI of the article.
3. Search in Iris.ai
Go to your account at Iris.ai (available for free), and paste the URL or DOI into Explore. Iris.ai will now look for related articles.
To get a bit techy: Iris.ai builds a fingerprint of your chosen paper based on the most meaning-bearing words in the abstract and contextual synonyms and hypernyms. Then, Iris.ai matches the fingerprint against more than 200 million papers.
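A toy version of that fingerprint idea looks like this. Here the fingerprint is just the set of content-bearing words in an abstract, matched by overlap; Iris.ai’s actual fingerprint also folds in contextual synonyms and hypernyms, which this sketch omits, and the texts below are invented.

```python
# A tiny stopword list; a real system would use a much richer one.
STOPWORDS = {"the", "a", "of", "for", "and", "in", "on", "to", "is"}

def fingerprint(text):
    # Keep only the content-bearing words of the text.
    return {w for w in text.lower().split() if w not in STOPWORDS}

def jaccard(a, b):
    # Overlap between two fingerprints: shared words / all words.
    return len(a & b) / len(a | b)

seed = fingerprint("Planning and control of autonomous vehicles in urban traffic")
corpus = {
    "paper_a": "Motion planning for autonomous vehicles",
    "paper_b": "Protein folding with deep learning",
}
scores = {pid: jaccard(seed, fingerprint(txt)) for pid, txt in corpus.items()}
best = max(scores, key=scores.get)
print(best)  # paper_a
```

Scaled to 200 million papers, this kind of matching needs indexing and smarter representations, but the principle — compare content fingerprints instead of exact keywords — is the same.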
Reading a lot of research papers is daunting (and sometimes boring), we know. That’s why we built Iris.ai — to help researchers get an overview of the seemingly unsurpassable mountain of published papers, and seamlessly identify the relevant ones. Good luck exploring!
The development of innovative technology has reshaped the way we consume, access, process, and distribute information. Academic and research libraries are adopting new technology, seeking to improve their services and competitive advantage. Artificial intelligence has been a major force driving this change, and in this article we are going to answer: how is AI shaping the world of libraries and researchers?
What is AI? And what are the different categories of artificial intelligence?
Artificial intelligence is a wide-ranging branch of computer science that attempts to build smart machines with aspects of human intelligence. It is so far one of the most complex and impressive human inventions, yet the field remains largely unexplored and has huge growth potential. AI is divided into three categories:
Artificial narrow intelligence (ANI)
Referred to as weak AI, with a narrow range of abilities, this is the only type of AI available today. Artificial narrow intelligence is used in facial recognition, speech recognition and voice assistants, and self-driving cars.
Artificial general intelligence (AGI)
Referred to as strong AI, this is AI with the ability to mimic human intelligence or behavior to solve any problem. Strong or deep AI is not yet available, but researchers are working on improving machines’ ability to see, understand, and learn as humans do.
Artificial superintelligence (ASI)
This is the hypothetical AI that surpasses human intelligence and abilities. It has always been a source of inspiration for science fiction in which robots take over the world. Powerful, self-aware super-intelligent machines may be an exciting idea, but their impact on humanity remains uncertain. For now, artificial superintelligence is still many years away from being achievable.
How AI will change the job of librarians
AI has been implemented in more and more libraries, and here are some ways in which AI will have a significant impact:
1. Content indexing
Up until today, indexing has been a tedious and manual task. It is done partly by publishers and partly by authors. Indexing provides an overview of the context in which the book, journal, or paper was originally thought up. However, indexing says very little about, for example, other fields the information could potentially be useful for, and human-made labeling and indexing is hampering interdisciplinary discovery. It also limits the literature’s ability to stay relevant over time because the indexing was done in a specific category in a specific context, and over time that context of what we know about the world will change.
AI tools for indexing will improve consistency and quality. They can identify concepts and assign them corresponding keywords. Index automation will also help readers discover new literature and navigate between disciplines, which is not possible with manual indexing. These AI tools will surpass human capabilities in indexing by providing more specific and accurate material for readers and, as a result, help university librarians do their job better.
2. Document matching
AI machines process documents faster and more accurately than humans. Thanks to proper automatic indexing, AI tools can now identify similarities and differences between documents or patents, matching documents with similar ones or connecting sections that describe the same topics, solutions, or phenomena. When a document is indexed based on its actual content, you can compare the content of thousands of documents that are contextually relevant to the search topic. The comparison can also be limited to sections of a document, such as certain book chapters or research paper sections, so you can find exactly what you’re looking for in the literature rather than relying on a five-keyword summary from the index. This is an essential operation that helps researchers and libraries access knowledge more easily and quickly.
3. Death of citation
The citation system can be perceived as a popularity contest, and it does little more than provide a very biased overview of a researcher network. When doing research landscape mapping and literature reviews, it is clear that using the citation system for snowballing is not an ideal method for covering everything. AI algorithms based on the actual content of papers will create far better maps of the actual research, and be of major help to librarians and researchers alike (as opposed to the network of researchers presented by the citation system).
4. Content summarization
Automatic content summarization condenses documents into a shorter version, without human interference, while preserving the key elements and meaning of the original text. Instead of summarizing a whole article or book, AI tools are able to summarize just a section of a book, or five documents into three sentences. AI tools for content summarization are already available online and gaining popularity, and the machine learning algorithms behind them are continuously improving at this task.
There are two types of automatic summarization: extraction and abstraction.
Extractive summarization depends on extracting sentences from the original text based on a scoring function. It selects the most important sections of the input based on the statistical survey and rearranges them together to produce a new condensed version of the document.
Abstractive summarization uses advanced natural language techniques to produce a new summarized version of the document that differs from the original. It aims to preserve the most important information while rephrasing it, much like a human-written summary.
Most of the summarizations today use the extractive approach as it is easier and requires less linguistic analysis.
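A minimal sketch of the extractive approach described above: score each sentence by the frequency of its words across the document and keep the top-scoring one. Real extractive systems use far richer scoring functions; the document below is invented for illustration.

```python
from collections import Counter

def extractive_summary(text, n_sentences=1):
    # Naive sentence split on periods (fine for this toy example).
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    # Word frequencies over the whole document act as the scoring function.
    freq = Counter(w.strip(".,") for w in text.lower().split())
    def score(sentence):
        return sum(freq[w.strip(".,")] for w in sentence.lower().split())
    ranked = sorted(sentences, key=score, reverse=True)
    # Keep the top sentences, preserving their original order.
    chosen = set(ranked[:n_sentences])
    return ". ".join(s for s in sentences if s in chosen) + "."

doc = ("Graphene conducts electricity well. "
       "Poems rarely mention graphene. "
       "Graphene electrodes improve battery charging, and graphene is light.")
summary = extractive_summary(doc)
```

Because frequent words concentrate in the third sentence, it is the one extracted; an abstractive system would instead generate a new sentence of its own.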
5. Quality of service
AI has penetrated the world of librarians and researchers in the form of chatbots that can answer directional or simple questions, alert users when a new book is published, and direct a customer to specific library resources. The automation of conversations between a user and a machine will enable librarians to focus on more difficult questions and save time on repetitive ones. This will also enable libraries to extend the opening hours of both in-person and online services.
6. The Impact Factor of the Future
The impact factor is a measure of the relative importance and quality of an individual publication, journal, or researcher within the literature. In the future, algorithms will be capable of breaking down scientific research into arguments and validating them against other pieces of research. They could build a truth tree of arguments and evidence for each document, verify each branch, and then compute an overall validity score. Having validated research matters more than its number of readers, as it is solid, validated research that deserves a broader readership.
7. Better Operational Efficiency
Libraries can identify and magnify operational efficiency by improving service effectiveness and reducing operational costs with process automation, optimized research data management, and digital asset management (DAM). Implementation of machine learning in the library’s processes and digital resources can optimize collection analysis, visualization, and preservation, and reduce expenses associated with the provision of services. The adoption of advanced library service platforms can help in the development of operational efficiency.
The road ahead for libraries
Artificial intelligence is changing the information landscape and disrupting librarians’ traditional jobs. Librarians are required to embrace AI not as users but as active leaders, to better serve the upcoming generations. However, some reservations hinder the integration of AI into the world of libraries. The fear of being replaced by AI robots is totally understandable, but we cannot ignore that advanced technologies will open up new horizons for librarians. They will help librarians take on new, innovative positions and roles, solve current challenges, and keep them from becoming old-fashioned. The focus on traditional tasks should shift in a new direction that embraces advanced technologies and assists the upcoming generation with their evolving needs.
“There are many research tools that I recommend to my students, but Iris.ai is maybe the best one.”
Name: Josmel Pacheco Mendoza
Position: Researcher and Veterinarian
University: Universidad San Ignacio de Loyola
Region: Lima, Peru
My name is Josmel and I am currently working with multiple universities, besides being a partner of the editorial team of the Journal of Veterinary Research of Peru (RIVEP).
I found Iris.ai when I needed a tool to help me find the right documents for my COVID-19 research in a short time frame.
In one of the projects where I used Iris.ai, I worked with Mexican universities on two COVID-19 papers. There is a huge number of papers and patents published and available online, which makes it difficult and time consuming for my students and me to find the exact information that is related to my research.
How did Iris.ai help your research?
Iris.ai helped me to find the right collection of papers. One of the most powerful features is how you can add one paper into Iris.ai, and in return the software gives you a map of relevant papers. For me that’s like magic!
The Iris.ai tools also organize my work by clustering and filtering the papers. It is important for my work to have a map that organizes documents by their concepts and content.
Finally, the Iris.ai tools saved me and my students a lot of time, which is an important factor while doing research in medical fields.
It is exciting to use an academic research tool that relies on artificial intelligence to better serve students and researchers. There are many research tools that I recommend to my students, but Iris.ai is maybe the best one.
To get started with Iris.ai, sign up for a free account.
“Thanks to Iris.ai I found useful resources to build the thematic background of my paper. I ended up with around 20 papers that were immensely important and relevant.”
Name: Diego Raza
Position: Lecturer – Research Methodology
University: Universidad Andina Simon Bolivar
Region: Quito, Ecuador
My name is Diego Raza and I am a lecturer at Universidad Andina Simon Bolivar in Quito, Ecuador. I teach subjects related to research methodology and help my students with their main thesis. When I was recently doing research for my literature review, I needed a tool to help me find research papers. I found that Iris.ai has higher quality sources in comparison to other tools on the market, and that it has access to papers that are of better academic quality.
How did Iris.ai help your research?
I was using Iris.ai for my own research in self-efficacy. I began my research by doing a broad exploration of papers in the Explore tool, then I imported the research map into the Focus tool to narrow down the results into a reading list. The tool helped me get an overview of the topic. Thanks to Iris.ai I found useful resources to build the thematic background of my paper. I ended up with around 20 papers that were immensely important and relevant.
Finally, I want to mention that I use Iris.ai to help my students identify literature that they may have overlooked when preparing their main theses.
The biggest value of using Iris.ai is how much time it saves. It does a part of my work for me.
To get started with Iris.ai, sign up for a free account.