The dataset comprises videos from a variety of event categories, such as politics, sports, natural disasters, accidents, and wars. It currently consists of 200 unique debunked videos (for simplicity, also referred to as fake) and 180 unique verified videos (also referred to as real). In particular, several types of fake video are included:
The dataset was initially formed by investigating important and viral events. Fact-checking sites such as snopes.com were consulted both to help annotate the videos and to discover further debunked videos. Another source of content was the Context Aggregation and Analysis service, developed within the InVID project as a tool for video verification. Because it is one of the few publicly available tools for video metadata analysis, the service attracts traffic from verification experts who submit suspicious videos for verification, often through the InVID verification plug-in. The videos submitted to the service between November 2017 and January 2018 formed an initial pool of approximately 1600 videos. This pool was filtered to remove non-user-generated (non-UGV) and other irrelevant content, and the remaining videos were then annotated as real or fake. The dataset contains videos from three major video sharing platforms: YouTube, Facebook, and Twitter.
The extended dataset was formed through a largely automatic, systematic process that combines text search and near-duplicate video retrieval, followed by manual annotation according to a set of guidelines. More specifically:
The overall dataset consists of 3957 videos annotated as fake and 2458 annotated as real (6415 videos in total).
Facebook videos that were relevant to the dataset but were published by individual users (and thus could not be accessed through the API) were excluded from this dataset.
The video URLs, their annotations (fake/real), and the URLs of their near-duplicate videos are provided in CSV files.
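As a minimal sketch, the annotations could be loaded with Python's standard `csv` module and tallied per label. The column names `video_url` and `annotation` are hypothetical placeholders rather than the actual headers of the released files, so check those headers before use; the near-duplicate URL column is not handled here.

```python
import csv
import io
from collections import Counter

# Hypothetical column names: the headers in the released CSV files may differ.
URL_COLUMN = "video_url"
LABEL_COLUMN = "annotation"


def load_annotations(csv_file):
    """Return a list of (url, label) pairs read from a file-like CSV object."""
    reader = csv.DictReader(csv_file)
    pairs = []
    for row in reader:
        url = row[URL_COLUMN].strip()
        label = row[LABEL_COLUMN].strip().lower()
        # Only the two annotation classes described above are expected.
        if label not in {"fake", "real"}:
            raise ValueError(f"unexpected label {label!r} for {url}")
        pairs.append((url, label))
    return pairs


if __name__ == "__main__":
    # Small in-memory sample standing in for one of the dataset's CSV files.
    sample = io.StringIO(
        "video_url,annotation\n"
        "https://www.youtube.com/watch?v=AAAAAAAAAAA,fake\n"
        "https://www.youtube.com/watch?v=BBBBBBBBBBB,real\n"
        "https://twitter.com/example/status/1,fake\n"
    )
    pairs = load_annotations(sample)
    print(Counter(label for _, label in pairs))
```

To read an actual file, pass `open(path, newline="", encoding="utf-8")` instead of the in-memory sample; `newline=""` is what the `csv` module documentation recommends for file objects.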
To obtain metadata about the videos, you may use each platform's API.
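For YouTube videos, a minimal sketch of this step, assuming the YouTube Data API v3 `videos.list` endpoint and a valid API key (`api_key` below is a placeholder), is to extract the video ID from the URL and build the request URL. The helper names are hypothetical, and Facebook and Twitter require their own APIs, which are not covered here.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# YouTube Data API v3 endpoint for the videos.list method.
YOUTUBE_VIDEOS_ENDPOINT = "https://www.googleapis.com/youtube/v3/videos"


def youtube_video_id(url):
    """Extract the video ID from watch?v= and youtu.be URLs, else return None."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    if host == "youtu.be":
        # Short links carry the ID as the path, e.g. https://youtu.be/<id>
        return parsed.path.lstrip("/") or None
    if host in {"youtube.com", "m.youtube.com"} and parsed.path == "/watch":
        # Standard links carry the ID in the "v" query parameter.
        return parse_qs(parsed.query).get("v", [None])[0]
    return None


def videos_list_url(video_id, api_key, parts=("snippet", "statistics")):
    """Build a videos.list request URL asking for the given resource parts."""
    query = urlencode({"part": ",".join(parts), "id": video_id, "key": api_key})
    return f"{YOUTUBE_VIDEOS_ENDPOINT}?{query}"


if __name__ == "__main__":
    vid = youtube_video_id("https://www.youtube.com/watch?v=AAAAAAAAAAA")
    print(vid)
    print(videos_list_url(vid, "YOUR_API_KEY"))
```

The resulting URL can be fetched with `urllib.request.urlopen` and the response parsed with `json.loads`. The set of handled URL forms is an assumption: other forms, such as `/embed/` links, return `None` and would need extra cases.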
If you encounter any issues in this process, please get in touch with Olga Papadopoulou.
The video dataset is provided under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license (CC BY-NC-SA 4.0).
The video dataset is supported by the InVID project, which is funded by the European Commission under contract number 687786.
If you use this video dataset for your research, please include a citation to the following paper: Papadopoulou, O., Zampoglou, M., Papadopoulos, S., & Kompatsiaris, Y. (2018). A Corpus of Debunked and Verified User-Generated Videos. Online Information Review. Accepted for publication.