Annotated dataset for sub-shot segmentation evaluation

Description

This dataset was created for evaluating the performance of the motion-driven approach for fine-grained temporal segmentation of user-generated videos that is reported in [1].

User-generated videos are most commonly captured without interruption using a single camera, and are therefore single-shot videos. Their fine-grained temporal segmentation aims to identify visually coherent parts (called sub-shots) that correspond to the individual actions taking place during the video recording, such as a left camera panning, a camera zoom-out or the tracking of a moving object.

Based on the above, the created dataset contains the ground-truth sub-shot segmentation for 33 single-shot videos. This ground-truth segmentation was created by human annotation of the sub-shot boundaries for each video, where each boundary indicates the end of a visually and temporally contiguous activity of the video recording device and the start of the next one (e.g. a downward camera tilting that is followed by a camera zoom-in). Overall, our dataset contains 674 sub-shot transitions.

The set of videos can be divided into three parts:

  1. Own Videos: 15 single-shot videos with a total duration of 6 minutes, which contain clearly defined fragments corresponding to several video recording activities;
  2. Amateur Videos: 5 single-shot amateur videos with a total duration of 17 minutes, found on the YouTube platform;
  3. Movie Excerpts: 13 single-shot parts of well-known movies with a total duration of 46 minutes, which represent professional video content.

The videos of the first part were recorded in the outdoor spaces of our (CERTH-ITI) facilities using an iPhone 5 smartphone. These videos and their ground-truth data are freely provided through the "Provided files" section below.

The videos of the second and third parts can be found on the YouTube platform (links in the "Video Collection" section below). For these videos we provide only the ground-truth data, again through the "Provided files" section below.

Video Collection

For each video of the dataset, the relevant information (including the YouTube links for the "Amateur Videos" and "Movie Excerpts" parts) is provided in a table that is available here.

Provided files

The files needed for using the dataset can be downloaded from here. After unpacking the compressed file ("dataset.zip"), a structure of directories will be generated. In this structure:

  1. The "videos/own_videos" directory contains: a) the video files of the first part of the dataset, and b) the "list.txt" file which lists the video id, the filename and the frame-rate for each video of the "Own videos" collection, in a tab-separated format.
  2. The "videos/amateur_videos" directory contains the "list.txt" file which lists the video id, the filename and the frame-rate for each video of the "Amateur videos" collection, in a tab-separated format.
  3. The "videos/movie_excerpts" directory contains the "list.txt" file which lists the video id, the filename and the frame-rate for each video of the "Movie excerpts" collection, in a tab-separated format.
  4. The "ground_truth" directory contains 33 txt files (one per video) where each file is named after the ID of the corresponding video and includes the annotation of the sub-shots in the video; each line of this file corresponds to the starting frame of a sub-shot of the video (using a zero-based index of frames).
  5. The "evaluation" directory contains a simple Matlab script, called "eval_segm.m", that was used for evaluating the results of the sub-shot segmentation analysis.
  6. A readme file that contains the documentation of the dataset (i.e. the information that is available on this webpage).
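
The following minimal MATLAB sketch (not part of the dataset) illustrates how the provided files can be read, assuming the formats described above: a tab-separated "list.txt" with the video ID, filename and frame-rate per line, and one zero-based starting frame per line in each ground-truth file. The paths and the "<ID>.txt" naming of the ground-truth files are assumptions used only for illustration.

  % Read a "list.txt" file (video ID, filename, frame-rate; tab-separated)
  listFile = fullfile('videos', 'own_videos', 'list.txt');
  fid = fopen(listFile, 'r');
  entries = textscan(fid, '%s %s %f', 'Delimiter', '\t');
  fclose(fid);

  videoIds   = entries{1};   % video IDs
  fileNames  = entries{2};   % video file names
  frameRates = entries{3};   % frame-rates in frames per second

  % Read the ground-truth boundaries of the first listed video, assuming the
  % ground-truth file is named after the video ID ("<ID>.txt"), and convert
  % the zero-based starting frames to starting times in seconds.
  gtFile = fullfile('ground_truth', [videoIds{1} '.txt']);
  startFrames  = load(gtFile);                  % one starting frame per line
  startSeconds = startFrames / frameRates(1);   % starting times in seconds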
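
The exact evaluation protocol is the one implemented in the provided "eval_segm.m" script. For readers who only want a rough idea of how such a boundary-based evaluation typically works, a hedged MATLAB sketch is given below; the one-to-one matching rule and the "tolerance" parameter are assumptions made for illustration and may differ from the actual script.

  % Illustrative sketch of a boundary-based evaluation (NOT the provided eval_segm.m):
  % a detected sub-shot boundary counts as correct if it lies within "tolerance" frames
  % of a not-yet-matched ground-truth boundary; Precision, Recall and F-score follow.
  function [precision, recall, fscore] = eval_boundaries_sketch(detected, groundTruth, tolerance)
      matchedGT = false(size(groundTruth));   % ground-truth boundaries already matched
      tp = 0;                                 % number of correctly detected boundaries
      for i = 1:numel(detected)
          [minDist, idx] = min(abs(groundTruth - detected(i)));
          if minDist <= tolerance && ~matchedGT(idx)
              tp = tp + 1;
              matchedGT(idx) = true;          % each ground-truth boundary matched at most once
          end
      end
      precision = tp / max(numel(detected), 1);
      recall    = tp / max(numel(groundTruth), 1);
      fscore    = 2 * precision * recall / max(precision + recall, eps);
  end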

Copyright

CERTH-ITI holds the copyright of the “Own videos” part of the dataset. These videos can be freely used for academic/research purposes only and their public playback is not allowed. CERTH-ITI does not own the copyright of the “Amateur Videos” and “Movie Excerpts” parts of the dataset. The use of these videos must abide by the YouTube copyright policy (https://www.youtube.com/yt/about/copyright/). Any users of these videos must accept full responsibility for the use of these parts of the dataset, including but not limited to the use of any copies of copyrighted videos that they may create from the dataset.

Publication

If you use this dataset, please cite the following scientific work:

[1] K. Apostolidis, E. Apostolidis, V. Mezaris, "A motion-driven approach for fine-grained temporal segmentation of user-generated videos", Proc. 24th Int. Conf. on Multimedia Modeling (MMM2018), Bangkok, Thailand, Feb. 2018.

Bib entry:

  @InProceedings{kapost_2018_MMM,
    author    = {Apostolidis, Konstantinos and Apostolidis, Evlampios and Mezaris, Vasileios},
    title     = {A motion-driven approach for fine-grained temporal segmentation of user-generated videos},
    booktitle = {The 24th International Conference on Multimedia Modeling (MMM)},
    month     = {Feb},
    year      = {2018}
  }

Contact

For any queries, please contact the authors of the aforementioned scientific work at: kapost@iti.gr (K. Apostolidis), apostolid@iti.gr (E. Apostolidis), bmezaris@iti.gr (V. Mezaris).

Acknowledgements

This work was supported by the EU’s Horizon 2020 research and innovation programme under grant agreement H2020-732665 EMMA.