Multimedia Knowledge and Social Media Analytics Laboratory

Query-based Topic Detection Dataset


The dataset contains a collection of text documents retrieved for evaluating the topic detection framework developed within the MULTISENSOR project. The documents were retrieved from the project's database using the following queries:

  • energy crisis
  • energy policy
  • home appliances
  • solar energy

For each query, the retrieved results were clustered into labelled clusters (topics) without knowing the number of clusters a priori. Note that the DBpedia Spotlight online tool was used to extract textual concepts and named entities from each text document; in the dataset, these extracted concepts and named entities replace the raw text of each document.
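
To illustrate what clustering without a predefined number of clusters can look like on documents that consist only of concepts and named entities, here is a minimal sketch. It is not the MULTISENSOR topic detection framework; it swaps in a simple single-pass threshold clustering over Jaccard similarity. The function names, the threshold value, and the sample concept sets are all hypothetical and are not taken from the dataset.

```python
from typing import List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two concept sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def threshold_cluster(docs: List[Set[str]], threshold: float = 0.3) -> List[List[int]]:
    """Single-pass clustering; the number of clusters is not fixed in advance.

    Each document joins the first cluster whose pooled concept set has a
    Jaccard similarity of at least `threshold` with it; otherwise it starts
    a new cluster.
    """
    clusters: List[List[int]] = []  # document indices per cluster
    pooled: List[Set[str]] = []     # union of all concepts seen per cluster
    for i, concepts in enumerate(docs):
        for members, pool in zip(clusters, pooled):
            if jaccard(concepts, pool) >= threshold:
                members.append(i)
                pool |= concepts    # in-place update of the cluster's pooled set
                break
        else:
            # No existing cluster was similar enough: open a new one.
            clusters.append([i])
            pooled.append(set(concepts))
    return clusters


# Hypothetical documents, already reduced to concepts and named entities
# (assumed shape only; the real dataset files may be organised differently).
docs = [
    {"Solar_energy", "Photovoltaics", "Germany"},
    {"Solar_energy", "Photovoltaics", "Solar_panel"},
    {"Refrigerator", "Home_appliance", "Energy_efficiency"},
    {"Home_appliance", "Refrigerator", "Washing_machine"},
]
print(threshold_cluster(docs))  # [[0, 1], [2, 3]]
```

The threshold plays the role that a fixed cluster count would otherwise play: raising it yields more, smaller topics, and lowering it merges them. The result of a single-pass scheme like this also depends on document order, which is one reason it should be read as an illustration rather than as an evaluation baseline.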