Multimedia Knowledge and Social Media Analytics Laboratory

Query-based Topic Detection Dataset

The dataset contains a collection of text documents retrieved to evaluate the topic detection framework developed within the MULTISENSOR project. The documents were retrieved from the project database using the following queries:

  • energy crisis
  • energy policy
  • home appliances
  • solar energy

For each query, the retrieved documents were grouped into labelled clusters (topics) without the number of clusters being known a priori. The DBpedia Spotlight online tool was used to extract textual concepts and named entities from each document, and in the dataset these concepts and named entities replace the document's raw text.
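This description does not state how clusters can be formed without fixing their number in advance. The Python below is a minimal sketch of one such approach, assuming each document is represented as the set of concept and entity strings that replaces its text. It is not the MULTISENSOR framework's method. It links any two documents whose Jaccard similarity reaches a threshold and returns the connected components as clusters. The threshold value, the function names, and the example concept names are hypothetical and do not come from the dataset.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity between two concept sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_threshold(docs: list[set[str]], threshold: float = 0.3) -> list[list[int]]:
    """Hypothetical helper: group documents whose concept sets are similar enough.

    Any pair with Jaccard similarity >= threshold is linked, and the clusters are
    the connected components of that graph (single-linkage cut at a fixed level).
    The number of clusters is therefore an output, not an input.
    """
    parent = list(range(len(docs)))  # union-find forest over document indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i, j in combinations(range(len(docs)), 2):
        if jaccard(docs[i], docs[j]) >= threshold:
            parent[find(i)] = find(j)  # merge the two components

    groups: dict[int, list[int]] = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


# Illustrative documents only; these concept names are invented, not taken from the dataset.
docs = [
    {"Solar_power", "Photovoltaics", "Germany"},
    {"Solar_power", "Photovoltaics", "Feed-in_tariff"},
    {"Refrigerator", "Energy_efficiency", "European_Union"},
    {"Refrigerator", "Energy_efficiency", "Energy_label"},
]
print(cluster_by_threshold(docs, threshold=0.3))  # [[0, 1], [2, 3]]
```

Because the threshold, rather than a preset cluster count, decides how many clusters emerge, raising it tends to yield more and smaller clusters. Single-linkage can also chain loosely related documents into one cluster, which is a known weakness of this simple scheme.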