VERGE

Collections

Internet Archive Book Images (I.A.B.I.)

I.A.B.I. is a dataset of about 12 GB that is still growing. Each entry consists of a photo_id, a JPEG or video URL, and corresponding metadata such as the title, description, camera type, and tags. About 49 million of the photos are also geotagged.
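As a rough illustration of the entry layout described above, one record could be modelled as follows. This is only a sketch: the class name, the field names other than photo_id, the latitude/longitude fields, and the example URL are assumptions made for illustration and are not taken from the dataset's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MediaRecord:
    """Hypothetical in-memory view of one dataset entry."""
    photo_id: str
    media_url: str                      # JPEG or video URL
    title: str = ""
    description: str = ""
    camera_type: str = ""
    tags: List[str] = field(default_factory=list)
    latitude: Optional[float] = None    # assumed geotag representation
    longitude: Optional[float] = None

    @property
    def is_geotagged(self) -> bool:
        # An entry counts as geotagged only when both coordinates are present.
        return self.latitude is not None and self.longitude is not None


# Made-up example entry
record = MediaRecord(
    photo_id="0001",
    media_url="https://example.com/0001.jpg",
    title="Harbour at dusk",
    tags=["harbour", "boats"],
    latitude=40.64,
    longitude=22.94,
)
print(record.is_geotagged)
```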

More info about the dataset is available here

Wikipedia Dataset

Two collections of Wikipedia images were used during the four years of the task: the Wikipedia INEX Multimedia Collection, consisting of 151,519 images, in 2008 and 2009, and the Wikipedia Retrieval 2010 Collection, consisting of 237,434 images, in 2010 and 2011. A number of topics were developed to cover diverse multimedia information needs: 75 topics in 2008, 45 in 2009, 70 in 2010, and 50 in 2011. The ground truth for these topics assumes binary relevance (relevant vs. non-relevant). Only pooled images were assessed, where each year's pool was formed from the top-ranked images retrieved by the runs that participants submitted; a pool depth of 100 was used in 2008, 2010, and 2011, and a pool depth of 50 in 2009.
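The pooling step described above can be sketched as follows: for one topic, the pool is the union of the top-ranked images from every submitted run, cut off at the pool depth. The function name, the run names, and the image IDs below are hypothetical, and this is a minimal sketch of standard depth-k pooling rather than the organizers' actual tooling.

```python
from typing import Dict, List, Set


def build_pool(runs: Dict[str, List[str]], depth: int) -> Set[str]:
    """Return the union of the top-`depth` image IDs from every run."""
    pool: Set[str] = set()
    for ranked_images in runs.values():
        # Only the first `depth` results of each ranked list enter the pool.
        pool.update(ranked_images[:depth])
    return pool


# Hypothetical ranked results of two runs for a single topic
runs = {
    "run_a": ["img3", "img1", "img7", "img2"],
    "run_b": ["img1", "img9", "img3", "img5"],
}

pool = build_pool(runs, depth=2)
print(sorted(pool))  # ['img1', 'img3', 'img9']
```

Only images in the pool are shown to assessors; under this scheme, anything outside the pool is never judged.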

More info about the dataset is available here

BBC EastEnders

Approximately 244 video files (300 GB and 464 hours in total) with associated metadata, each containing a week's worth of BBC EastEnders programs in MPEG-4/H.264 format. This dataset is available only to participants of the TRECVID 2015 INS task.

More info about the dataset is available here