Social Event Detection 2013 (SED 2013) dataset

Nov 04, 2014

This page makes available, for research purposes, the dataset, challenge definitions, ground truth and corresponding evaluation script that were created and used in the 2013 edition of the Social Event Detection (SED) task of the MediaEval benchmarking activity. By social events, we mean events that are planned by people, attended by people, and illustrated by media captured by people.

The Social Event Detection (SED) task of MediaEval 2013 had two subtasks. The first asked participants to cluster a collection of images so that each cluster corresponds to a distinct social event. The second asked participants to classify each image in a set as either depicting a social event or not and, if it does, to identify the type of event depicted (music, sports, protest, etc.). Each image in the dataset is accompanied by metadata typically found on the social web, including timestamps, tags and, for a small subset of the images, geotags.

For more information on the SED 2013 dataset, challenges and evaluation, please see the following publication. If you use the dataset in your research, please cite it:

T. Reuter, S. Papadopoulos, G. Petkos, V. Mezaris, Y. Kompatsiaris, P. Cimiano, C. M. De Vries, S. Geva: "Social Event Detection at MediaEval 2013: Challenges, Datasets and Evaluation". Proc. MediaEval 2013 Workshop, Barcelona, Spain, October 2013.

How to access:

Different datasets are used for each subtask, and for each subtask both a development set and a test set are provided. The metadata file for the development set of the first subtask is available here (in XML format):

Subtask 1 - Development set - Metadata file

To obtain the image files please use the URL listed for each image in the metadata file. The association of these images to the events is provided in the following file:

Subtask 1 - Development set - Ground truth file
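As a minimal sketch of the download step described above, the Python snippet below collects the image URLs from a metadata file and fetches them. The element and attribute names (`photo`, `id`, `url`) are hypothetical placeholders, not the documented SED 2013 schema: inspect one of the metadata files and adjust the three constants (or read child elements instead of attributes) before relying on it. Saving every image with a `.jpg` extension is likewise an assumption.

```python
import os
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical schema: the real SED 2013 metadata files may use different
# element and attribute names, so adjust these after inspecting a file.
PHOTO_TAG = "photo"
ID_ATTR = "id"
URL_ATTR = "url"


def extract_image_urls(xml_text):
    """Map photo ID -> image URL for every photo element in the metadata XML."""
    root = ET.fromstring(xml_text)
    urls = {}
    # iter() visits the root and all descendants, so nesting depth does not matter.
    for elem in root.iter(PHOTO_TAG):
        photo_id = elem.get(ID_ATTR)
        url = elem.get(URL_ATTR)
        if photo_id and url:  # skip records that lack either field
            urls[photo_id] = url
    return urls


def download_images(urls, out_dir):
    """Fetch each URL into out_dir/<photo_id>.jpg and return the IDs that failed."""
    os.makedirs(out_dir, exist_ok=True)
    failed = []
    for photo_id, url in urls.items():
        target = os.path.join(out_dir, f"{photo_id}.jpg")  # assumes JPEG images
        try:
            urllib.request.urlretrieve(url, target)
        except (OSError, ValueError):  # network/HTTP errors or malformed URLs
            failed.append(photo_id)
    return failed
```

Since some URLs may no longer resolve, `download_images` records the IDs it could not fetch instead of stopping at the first failure, so the missing images can be reported or retried later.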

The metadata file for the test set can be found here:

Subtask 1 - Test set - Metadata file

Again, the image files can be obtained using the URLs in the metadata file. The ground truth (the association of image IDs to events) can be found in the following file:

Subtask 1 - Test set - Ground truth file

For the second subtask, the development set's metadata can be found here:

Subtask 2 - Development set - Metadata file

It is in XML format. Again, the image files can be obtained using the URLs included in the metadata file. The ground truth file for the development set is available here:

Subtask 2 - Development set - Ground truth file

The metadata file for the test set can be found here:

Subtask 2 - Test set - Metadata file

Its corresponding ground truth file is here:

Subtask 2 - Test set - Ground truth

Evaluation:

A script is provided to compute the evaluation measures for your results. You can find it here:

Evaluation resources

The script requires Python 2.7. For instructions on how to run it, run the script with the --help parameter. Please also note that your result files must have the formats described below.

For the first subtask, the file must be a plain ASCII file in which each line associates one image of the test set with a cluster. Each line must have the form "photoID<TAB>eventID", i.e. the photo ID and the event ID separated by a single tab character. The photo IDs must be the same as those in the test set's metadata file, whereas the event ID can be any string that does not contain a tab character. For example, the following would be part of a valid file:

3587419916	7081
3750102971	7028
3740276442	7047
9230492703	122
1820390101	2080
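Before running the evaluation script, it can help to generate and sanity-check a subtask 1 run programmatically. The Python sketch below uses hypothetical function names (they are not part of the official evaluation resources): it formats one `photoID<TAB>eventID` line per image and reports malformed lines, non-ASCII content, IDs that are not in the test set, duplicates, and test images without a cluster. It assumes each test image should appear exactly once, which follows from each line associating one test image with a cluster.

```python
def format_subtask1(assignments):
    """Turn {photo_id: event_id} into 'photoID<TAB>eventID' lines."""
    lines = []
    for photo_id, event_id in assignments.items():
        event_id = str(event_id)  # event IDs may be any string
        # A tab would break the two-column format; a newline would split the line.
        if not event_id or "\t" in event_id or "\n" in event_id:
            raise ValueError(f"invalid event ID for photo {photo_id!r}: {event_id!r}")
        lines.append(f"{photo_id}\t{event_id}\n")
    return lines


def check_subtask1(lines, test_ids):
    """Return a list of problems found in the lines of a subtask 1 run."""
    problems = []
    seen = set()
    for lineno, line in enumerate(lines, start=1):
        if not line.isascii():
            problems.append(f"line {lineno}: contains non-ASCII characters")
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 2 or not all(fields):
            problems.append(f"line {lineno}: expected photoID<TAB>eventID")
            continue
        photo_id = fields[0]
        if photo_id not in test_ids:
            problems.append(f"line {lineno}: {photo_id} is not in the test set")
        elif photo_id in seen:
            problems.append(f"line {lineno}: {photo_id} appears more than once")
        seen.add(photo_id)
    # Every test image should be assigned to some cluster.
    missing = set(test_ids) - seen
    if missing:
        problems.append(f"{len(missing)} test photo(s) have no cluster")
    return problems
```

Writing the run to disk is then a matter of calling `f.writelines(format_subtask1(assignments))` on a file opened with `encoding="ascii"`; `test_ids` would be the set of photo IDs read from the test set's metadata file.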

For the second subtask, the file must also be a plain ASCII file, in which each line contains the ID of an image in the test set followed by the category of the event it depicts (no_event if it does not depict a social event). The following is part of a valid submission:

27293486293864926293_394720 no_event
93027102397017031970_871399 conference
23091720397321092731_837129 sports
93029710297102971029_328193 concert
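A similar hypothetical sketch for subtask 2 follows. The description above does not state which character separates the image ID from the category (the example lines show a space, while subtask 1 uses a tab), so the formatter takes the separator as a parameter and the checker accepts any whitespace. The full set of allowed categories comes from the task definition; the four labels below are only those that appear in the example lines and should be replaced before checking a real run.

```python
# Only the labels visible in the example lines; the full category list is
# defined by the task, so replace this set before checking a real run.
EXAMPLE_CATEGORIES = {"no_event", "conference", "sports", "concert"}


def format_subtask2(labels, sep="\t"):
    """Turn {photo_id: category} into 'photoID<sep>category' lines."""
    lines = []
    for photo_id, category in labels.items():
        # An empty category or one containing whitespace could not be parsed back.
        if not category or any(ch.isspace() for ch in category):
            raise ValueError(f"invalid category for photo {photo_id!r}: {category!r}")
        lines.append(f"{photo_id}{sep}{category}\n")
    return lines


def check_subtask2(lines, test_ids, categories=EXAMPLE_CATEGORIES):
    """Return a list of problems found in the lines of a subtask 2 run."""
    problems = []
    seen = set()
    for lineno, line in enumerate(lines, start=1):
        if not line.isascii():
            problems.append(f"line {lineno}: contains non-ASCII characters")
        fields = line.split()  # accepts either a tab or a space as separator
        if len(fields) != 2:
            problems.append(f"line {lineno}: expected a photo ID and a category")
            continue
        photo_id, category = fields
        if photo_id not in test_ids:
            problems.append(f"line {lineno}: {photo_id} is not in the test set")
        elif photo_id in seen:
            problems.append(f"line {lineno}: {photo_id} appears more than once")
        seen.add(photo_id)
        if category not in categories:
            problems.append(f"line {lineno}: unknown category {category!r}")
    # Every test image should receive a category (possibly no_event).
    missing = set(test_ids) - seen
    if missing:
        problems.append(f"{len(missing)} test photo(s) have no category")
    return problems
```

Keeping the category set as a parameter means the same checker works once the complete list from the task definition is plugged in.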

Many thanks to Timo Reuter for leading the organization of the task! The dataset used for the first subtask has also been released here.

Copyright notice: The images distributed as part of the Social Event Detection 2013 (SED 2013) dataset were collected from Flickr, where they were posted by their respective owners under a Creative Commons license. The Creative Commons attribution licenses allow the images to be used as long as the photographer is credited for the original creation. In some cases, use is granted subject to additional restrictions, but none of these preclude the use of the images for benchmarking purposes. While compiling the SED 2013 dataset, we collected only Creative Commons images, and also collected as much information as possible about the creator of each image. The creator information, the exact license type and other relevant information are included in the image license file, which is distributed together with the images. We would like to take this opportunity to thank the photographers for allowing us to use their pictures: we greatly appreciate this and gladly acknowledge your work. Your names and license details are listed in the image license file. Please let us know if you have special wishes on how you would like to be credited or have additional details that must be incorporated.

MKLab