Master's Thesis Topic 2: Metrics for the quality of keyword extraction

Evaluation of methods and metrics for quality assessment of keyword extraction algorithms

RQ: What are the state-of-the-art metrics for evaluating keyword extraction algorithms, and how well do they capture the notion of a word being a keyword of a text document?

In other words: how can we quantitatively measure whether a set of words covers all key aspects of a text, and what are the advantages and disadvantages of the currently available metrics?
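As a point of reference, a common baseline in the keyword extraction literature compares the extracted keywords against a human-annotated gold set using exact-match precision, recall and F1 over the top k predictions (P@k, R@k, F1@k). The Python sketch below illustrates this baseline only. The function names, the normalization step (lowercasing and whitespace collapsing, no stemming) and the choice to divide precision by the number of keywords actually returned rather than by k are assumptions for illustration, not a description of how Iris AI evaluates its algorithm.

```python
from typing import Sequence


def normalize(keyword: str) -> str:
    # Hypothetical normalization (assumption): lowercase and collapse whitespace.
    # Many published evaluations also apply stemming; that is omitted here.
    return " ".join(keyword.lower().split())


def precision_recall_f1_at_k(
    predicted: Sequence[str], gold: Sequence[str], k: int
) -> tuple[float, float, float]:
    """Exact-match P@k, R@k and F1@k against a set of gold keywords (sketch)."""
    # Keep the first k distinct predictions after normalization, in rank order.
    top_k: list[str] = []
    for kw in predicted:
        norm = normalize(kw)
        if norm not in top_k:
            top_k.append(norm)
        if len(top_k) == k:
            break

    gold_set = {normalize(kw) for kw in gold}
    matches = sum(1 for kw in top_k if kw in gold_set)

    # Assumption: precision is divided by the number of returned keywords, not by k.
    precision = matches / len(top_k) if top_k else 0.0
    recall = matches / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    predicted = ["neural networks", "Keyword Extraction", "graph", "text mining"]
    gold = ["keyword extraction", "text mining", "evaluation"]
    print(precision_recall_f1_at_k(predicted, gold, k=4))
```

In the example, k = 4 yields two matches out of four predictions and three gold keywords, i.e. P = 0.5, R ≈ 0.67 and F1 ≈ 0.57. The sketch also makes one weakness of exact matching visible: a prediction such as "neural network" would receive no credit against a gold keyword "neural networks", which is exactly the kind of gap between the metric and the notion of a keyword that the RQ asks about.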

Sub-RQs:

  • What methods currently exist for measuring the quality of keyword extraction?
  • Which ones are comprehensible to humans and which are hard to interpret?
  • Given a set of keywords extracted from a document, with what confidence can we confirm that this set covers the key aspects of the document?

Students are also encouraged to propose new methods that build on what they find in their study.

The work requires a literature review of existing metrics, followed by an analysis of the Iris AI keyword extraction algorithm. The expected output is an assessment of the metrics, a summary of the results, and a proposal for an automatic quality metric for current and future algorithms: either an existing metric, a new metric, or a combination of existing metrics with suggested improvements.

Interested? Get in touch!