Search is not just search.

One issue we bump into on a regular basis is the idea of search engines, or search tools. When I say that, you know very well which tools I mean, right? You have a very clear idea of what I mean when I say search. Except… so did the client we spoke to yesterday, and I promise you, their definition was radically different from yours. Here are some of the things we have learned over the past few years, with a focus on our field of research, as well as AI-enabled search, as always.

Search to find the one thing you’re looking for

When what you need to find is one exact article that you know exists, where you can formulate the approximate title and will recognize the author name when you see it… Well, you’re lucky! There are hundreds of search engines out there, and whether you use Google Scholar or Google Patents, PubMed, Embase or your university’s federated search engine, you might need just 2-3 attempts to formulate the search well enough to find the article you were looking for. Easy-peasy, and no need for fancy or smart tools. In essence, this is a problem of the past, when you had to scour long manual lists, hunt down smart librarians and flick through hundreds of book pages to find the one you were looking for. Thanks to Tim Berners-Lee, the internet, and the digitalization of research, this problem is in essence solved. Yay!

Exploratory search

But what about when you really, really don’t know what you’re looking for yet? You’re searching for something unknown. An idea, an inspiration, an interdisciplinary connection. You suspect it’s out there, but you don’t know the terminology. This might be because you’re new as a researcher, just new to the field you’re currently researching, or perhaps the field of research itself is new. In any case, you cannot yet properly formulate a keyword query; you’re looking for perspectives and mapping out “what’s out there”.

At Iris.ai, we’ve found that contextual “fingerprint” matching of content, combined with visualization tools, is a great way to start this kind of search. The user provides one text, either self-written or the abstract of an article, and the machine builds a visual map that gives an overview of the different concepts in the text, with articles contextually matched to them. The Explore tool can be applied to millions of papers or to a smaller data set, and we have scientifically proven that this method is superior for getting an overview, finding spot-on papers and drawing conclusions, compared to regular search engines designed to “just find the thing you know you’re looking for”.
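To make the idea of matching on context rather than on exact key words a bit more concrete, here is a minimal sketch in Python. It is not how the Explore tool works internally; it assumes a toy “fingerprint” made of plain word counts and compares texts by cosine similarity. The function names and the sample papers are made up for illustration.

```python
import math
import re
from collections import Counter

def fingerprint(text):
    """Toy 'fingerprint' (an assumption): word counts, ignoring very short words."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if len(w) > 3)

def cosine(a, b):
    """Cosine similarity between two word-count fingerprints."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_by_context(seed_text, papers):
    """Return (title, score) pairs, most similar to the seed text first."""
    seed = fingerprint(seed_text)
    scored = [(title, cosine(seed, fingerprint(abstract)))
              for title, abstract in papers.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical mini-collection: title -> abstract
papers = {
    "Battery ageing": "Lithium battery cells degrade when charging cycles repeat.",
    "Bird migration": "Migrating birds navigate using magnetic fields and stars.",
}
ranking = rank_by_context("How do charging cycles degrade lithium cells?", papers)
```

The seed here is a full sentence rather than a key word query, and the ranking comes from how much vocabulary the texts share. A real system would use far richer representations than word counts, but the workflow of “one text in, matched articles out” is the same.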

Search in millions of documents to create a focused collection

Our experience is that, again thanks to the internet, digitalization and the implementation of Boolean logic, the search task of narrowing down a document collection from, say, 50 million to a few thousand documents is also essentially solved. You do not need refined keyword queries; you need the broad terms of what you are looking for and a decently working search engine, and you quickly have a data set that has a lot of noise but should contain every relevant article. Now, one problem with this is that the next step, narrowing down the few thousand documents to just the relevant document set, is incredibly time consuming and manual. That means most research teams will spend a lot more time than needed on their Boolean search queries in order to get down to just a few hundred documents. This process comes not only with the risk that the keyword query excludes papers that would have been relevant, but also with a lot of time wasted on trial, error and experimentation with said queries.
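As a rough illustration of what a broad Boolean query does, the sketch below keeps every document that contains at least one term from each group, i.e. (solar OR photovoltaic) AND (cell OR module). The helper names and the three sample documents are hypothetical, and matching is done on plain substrings, which a real search engine would handle far more carefully.

```python
def matches(text, any_of_groups):
    """True if the text contains at least one term from every group (AND of ORs).

    Note: this uses naive substring matching, so "cell" also matches "cellular".
    """
    lowered = text.lower()
    return all(any(term in lowered for term in group) for group in any_of_groups)

def broad_search(documents, any_of_groups):
    """Return the ids of all documents matching the Boolean query."""
    return [doc_id for doc_id, text in documents.items()
            if matches(text, any_of_groups)]

# Hypothetical collection: id -> title
documents = {
    "d1": "Solar cell efficiency under low light",
    "d2": "Photovoltaic module degradation in deserts",
    "d3": "Wind turbine blade fatigue",
}
# (solar OR photovoltaic) AND (cell OR module)
query = [("solar", "photovoltaic"), ("cell", "module")]
hits = broad_search(documents, query)
```

Every extra AND group shrinks the result set, and with it grows the risk of dropping a relevant paper that happens to use different wording. That is exactly the trade-off described above.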

Search a few thousand documents to make a complete shortlist

Which brings us to one of the trickiest parts of search: getting from a data set that is too large to read manually, but too small for simple keywords to really make sense, to a complete shortlist. Terminology that has changed over time, different researchers who call entities by different names, contexts that are hard to describe in a key term even though you know very well what you are looking for… For some researchers, this part is ridiculously time consuming and tedious, and includes skimming hundreds if not thousands of articles. For others, it simply means it’s not doable, and they will either spend much more time on the step before or simply select the top hits of the search engine and ignore the rest, knowing very well in both scenarios that they are guaranteed to miss relevant content.

At Iris.ai, this is one of our major achievements: giving our users a set of smart filters to rapidly reduce a reading list of up to 10,000 documents down to exactly the relevant ones. This can be done, for example, by running the Topic Analysis and using it to filter the increasingly smaller data set, by selecting Concepts from the Word analysis for inclusion and exclusion, and by using self-written context descriptions for similarity matching to find those contexts that are impossible to turn into one word. All of these together give a rapid narrowing of the list down to something that can be manually cleaned up with little effort.
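The combination of inclusion, exclusion and context matching can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not the actual filters in our tools: concepts are treated as single words, and “similarity” is the Jaccard overlap between the words of an abstract and the words of a self-written context description. All names and sample data are invented for the example.

```python
import re

def words(text):
    """Lower-cased set of words in a text."""
    return set(re.findall(r"[a-z]+", text.lower()))

def filter_reading_list(docs, include, exclude, context, min_overlap):
    """Keep documents that mention an included concept, mention no excluded
    concept, and overlap enough with the context description."""
    context_words = words(context)
    kept = []
    for title, abstract in docs.items():
        doc_words = words(abstract)
        if not include & doc_words:
            continue  # none of the wanted concepts present
        if exclude & doc_words:
            continue  # an unwanted concept is present
        # Jaccard overlap: shared words divided by all distinct words
        overlap = len(doc_words & context_words) / len(doc_words | context_words)
        if overlap >= min_overlap:
            kept.append(title)
    return kept

# Hypothetical reading list: title -> abstract
docs = {
    "A": "Graphene membranes for water desalination",
    "B": "Graphene transistors for flexible electronics",
    "C": "Polymer membranes for gas separation",
}
shortlist = filter_reading_list(
    docs,
    include={"graphene"},
    exclude={"transistors"},
    context="removing salt from water with membranes",
    min_overlap=0.1,
)
```

Note that the context description never uses the word “desalination”, yet document A survives because it shares enough surrounding vocabulary. That is the kind of context that is hard to squeeze into a single key word.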

A note on recommended articles and sorted result lists

Quite a few scientific search providers offer sorted lists. Once you’ve entered your keyword query, they essentially communicate that the top 5-10 hits are the relevant ones, the first page is really all you have to look at, and the next 40+ pages of results are really not worth your time. These recommendations may be built on things like number of citations, number of views, what other people found interesting when searching similar terms, impact factor, and a number of other factors that very quickly become deeply problematic when scrutinized. As a researcher, you want to be bias free, whether it is to keep your academic rigour intact or to avoid looking at exactly the same material as your competitors in an innovation project. At Iris.ai, we believe that recommendations that take anything but the actual content of the article into consideration, without the user explicitly asking for it, are deeply problematic.

Skimming the shortlist for validation

Sooner or later, no matter what, you’re going to have to read the actual articles you have found. We’re all grateful for good abstracts! Unfortunately, not all abstracts are good, and the moment we venture outside of academic rigour, abstracts or summaries are not all that common. At Iris.ai, we’ve solved that by offering a summarization tool that takes one or multiple documents (abstracts only or full text) and writes its own summaries of the most important parts of the documents. While abstractive summarization as a technical discipline, especially of scientific papers, is still in its early days, the summaries are already quite useful for rapid navigation.

Plus, abstractive summaries are a great way to kick off the writing process!

Finding the right data points

At the final stage, there is yet another search that most people do not think about when saying “finding the right information”, because until very recently, before the technology we have developed, no technical solution for it was possible. What we are talking about is stepping into the papers themselves, and finding and fetching the knowledge. The user defines, in a spreadsheet, the data points they are interested in from across relevant articles, and asks the Extract tool to fetch, link, convert and fill in all the relevant data points from across tens, hundreds or even thousands of papers, building a powerful database of scientific experiment data to be further processed. Truly, we believe this is the search of the future.
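To give a feel for what “fetching data points into a spreadsheet” means, here is a deliberately naive sketch. It assumes the only data point of interest is a band gap value written as a number followed by “eV”, finds it with a regular expression, and writes the results as CSV rows. The Extract tool does much more than this (linking and converting values, for instance); the pattern, function names and sample sentences below are purely illustrative.

```python
import csv
import io
import re

# Assumed pattern for this example: a number followed by the unit "eV"
BAND_GAP = re.compile(r"(\d+(?:\.\d+)?)\s*eV\b")

def extract_rows(papers):
    """Collect one row per value found, tagged with the paper it came from."""
    rows = []
    for paper_id, text in papers.items():
        for value in BAND_GAP.findall(text):
            rows.append({"paper": paper_id, "band_gap_eV": float(value)})
    return rows

def to_csv(rows):
    """Render the rows as CSV text, ready to open in a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["paper", "band_gap_eV"],
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# Hypothetical paper snippets: id -> sentence
papers = {
    "p1": "The film shows a band gap of 1.55 eV after annealing.",
    "p2": "We measured 3.2 eV for the oxide and 1.1 eV for silicon.",
}
rows = extract_rows(papers)
table = to_csv(rows)
```

A regular expression like this breaks as soon as authors write values in a table, in another unit or in a figure caption, which is why doing this reliably across thousands of papers is a search problem in its own right.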

So to sum it up…

Search is not just search. Understanding what kind of search you or your team need to do on a regular basis to achieve your goals is a great first step toward understanding whether implementing smarter solutions, such as Iris.ai or others, is right for you.