Master Thesis Topic 4: Topic modeling vs Keyword extraction

Topic modeling vs keyword extraction in terms of forming document search queries

RQ: Which of the two approaches, or a combination of them, is better at summarizing a text into keywords to be used in search queries?

Description:
The selected student will be given a corpus of tens to a hundred documents. The documents in the corpus will be on a similar topic, such that they all appear on the first page of Google Scholar results when a manually defined query is executed. The student should develop an algorithm that, given a single document, builds a Google Scholar search query out of keywords taken from the document itself. To do so, the student can use Iris AI algorithms, a combination of them, and/or improvements that she/he proposes. The success of an algorithm will be measured as the percentage of the top Google Scholar results for the machine-defined query that are part of the corpus (the higher the better).
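
As a rough illustration of the "single document in, keyword query out" step, here is a minimal sketch in Python. It uses plain term frequency as a stand-in for a real keyword extraction or topic modeling algorithm; it is not one of the Iris AI algorithms. The names extract_keywords and build_query, the tiny stopword list, and the top_n parameter are all assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption); a real run would use a fuller one.
STOPWORDS = {
    "the", "a", "an", "of", "and", "or", "to", "in", "for", "on",
    "is", "are", "with", "that", "this", "by", "as", "from", "be", "we",
}

def extract_keywords(text, top_n=5):
    """Return the top_n most frequent non-stopword terms in the text.

    Plain term frequency is only a placeholder for a proper keyword
    extraction or topic modeling algorithm.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return [term for term, _ in counts.most_common(top_n)]

def build_query(text, top_n=5):
    """Join the extracted keywords into a single search query string."""
    return " ".join(extract_keywords(text, top_n))

if __name__ == "__main__":
    doc = "graph graph graph neural neural network"
    print(build_query(doc, top_n=2))
```

Swapping extract_keywords for a topic-model-based or combined extractor would leave build_query unchanged, which makes the two categories of algorithms easy to compare under the same setup.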

The goal is to evaluate the quality of the two categories of algorithms, and/or a combination of them, in terms of retrieving papers that are close to a chosen paper.
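
The success measure from the description (the percentage of top search results that are part of the corpus) can be sketched as follows. This is a minimal sketch under stated assumptions: documents are identified by comparable IDs such as normalized titles, retrieving the result list from Google Scholar is left out, and the function name corpus_hit_rate is hypothetical.

```python
def corpus_hit_rate(top_results, corpus):
    """Percentage of the top search results that belong to the corpus.

    top_results: ordered list of document IDs returned for a query
                 (assumed to be fetched elsewhere).
    corpus:      iterable of document IDs in the given corpus.
    """
    if not top_results:
        return 0.0  # no results means no hits
    corpus_set = set(corpus)
    hits = sum(1 for doc_id in top_results if doc_id in corpus_set)
    return 100.0 * hits / len(top_results)

if __name__ == "__main__":
    corpus = {"a", "b", "c"}
    results = ["a", "b", "x", "y"]
    print(corpus_hit_rate(results, corpus))
```

In this toy usage, two of the four results are in the corpus, so the score is 50 percent. Any alternative metric could be written with the same two inputs and compared against this one.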

The student is free to propose new metrics and compare them to the one proposed above. She/he may also suggest different keyword extraction or topic modeling algorithms that could perform better in the presented setting.

Interested? Get in touch!