When a data source is imported, the machine identifies the most meaning-bearing words in the text and enriches them with contextual synonyms and other words that scientists use in the same context. It also enriches the model by adding hypernyms or topic-modeled words (for example, if the text mentions dogs, cats and dolphins → “mammals” might be added). All of this goes into the so-called Fingerprint, a list of words weighted by importance. (Word importance is measured with the term frequency-inverse document frequency (TF-IDF) statistic, which is used to find key terms, words and phrases.)
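To make the idea of a TF-IDF-weighted word list concrete, here is a minimal Python sketch. It is not Iris.ai's implementation: the function names `tokenize` and `tfidf_fingerprint`, the smoothed IDF formula and the `top_k` cut-off are all assumptions chosen for illustration, and the synonym and hypernym enrichment step is left out entirely.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (a simplification)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_fingerprint(document, corpus, top_k=5):
    """Return the top_k (word, weight) pairs of `document`, weighted by TF-IDF.

    `corpus` is a list of reference texts used to estimate how common a word is.
    Hypothetical helper for illustration only.
    """
    doc_tokens = tokenize(document)
    term_counts = Counter(doc_tokens)
    corpus_vocab = [set(tokenize(text)) for text in corpus]
    n_docs = len(corpus)

    weights = {}
    for word, count in term_counts.items():
        tf = count / len(doc_tokens)
        df = sum(1 for vocab in corpus_vocab if word in vocab)
        # Smoothed IDF (an assumed variant): rare words get a higher weight.
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        weights[word] = tf * idf

    # Highest-weighted words first.
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
```

Calling `tfidf_fingerprint(text, reference_texts)` yields a short list of `(word, weight)` pairs; words that are frequent in the input but uncommon in the reference texts end up at the top.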
Once the Fingerprint is ready, the next step is to look for documents similar to the input, a process also known as Fingerprint matching. It is based on the Word importance-based similarity of documents (WISDM) metric, which measures the similarity between the Fingerprint and a set of documents revolving around the topic of interest. (Iris.ai developed the metric, and the research paper can be found here.) The end result of Fingerprint matching is a hierarchical list of documents ordered by their similarity to the input, which allows for a broad definition of every word in the Fingerprint.
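The matching step can be illustrated with a small Python sketch. Note that this does not implement WISDM: as a stand-in it uses plain cosine similarity between a weighted word list and each document's word counts, and the names `cosine_similarity` and `rank_documents`, as well as the dictionary representation of the Fingerprint, are assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (a simplification)."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as {word: weight}."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(fingerprint, documents):
    """Return (score, document) pairs sorted from most to least similar.

    `fingerprint` is a {word: weight} dict; cosine similarity is used here
    as a stand-in for the WISDM metric, which is not reproduced.
    """
    scored = []
    for doc in documents:
        doc_vector = Counter(tokenize(doc))
        scored.append((cosine_similarity(fingerprint, doc_vector), doc))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Given a weighted word list such as `{"dolphins": 0.6, "mammals": 0.3, "swim": 0.1}`, `rank_documents` places documents that share the heavily weighted words first and documents with no overlap last.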
Essentially, the Fingerprint is: