MKLab successfully participated in the Ad-hoc Video Search (AVS) task of TRECVID 2016. The AVS task attempts to model the end-user video search use-case, where the user is looking for video segments containing persons, objects, activities, locations, etc., and combinations thereof. To express this information need, the user provides the retrieval system with a short query in natural language, e.g., “Find shots of a person playing guitar outdoors”. This year’s experiments were performed on a set of Internet Archive videos totalling about 600 hours of video, using 30 different queries. As per the AVS guidelines, the experimental runs could be either fully automatic or manually-assisted; the latter category allows some form of human intervention in the way the retrieval system interprets or extends the original natural language query.
Our fully automatic runs performed very well in this challenging task, compared to the runs of the other participating institutions from all over the world. Specifically, our best run was ranked 2nd, achieving an inferred average precision of 0.051 (compared to 0.054 for the best-performing participant in the fully-automatic category, and 0.040 for the 3rd best-performing one). Interestingly, our fully automatic runs also compared favorably to the manually-assisted runs submitted to AVS: with an inferred average precision of 0.051, our best fully automatic run outperformed the runs of all but one participant in the manually-assisted category. Our participation in the AVS task this year was supported by the EU’s Horizon 2020 research and innovation programme under grant agreements H2020-693092 MOVING and H2020-687786 InVID.
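For readers unfamiliar with the evaluation metric, the sketch below illustrates how (standard, non-inferred) average precision is computed for a single query over a ranked result list; TRECVID’s inferred average precision (infAP) estimates this same quantity from an incomplete, sampled set of relevance judgments, which is not reproduced here. The function name and the toy shot identifiers are illustrative only, not part of the actual evaluation code.

```python
def average_precision(ranked_shots, relevant_shots):
    """Standard average precision for one query.

    ranked_shots: list of shot ids, ordered by the system's ranking.
    relevant_shots: collection of shot ids judged relevant.
    """
    relevant = set(relevant_shots)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, shot in enumerate(ranked_shots, start=1):
        if shot in relevant:
            hits += 1
            # precision at the rank of each relevant hit
            precision_sum += hits / rank
    # normalize by the total number of relevant shots
    return precision_sum / len(relevant)

# Toy example: relevant shots found at ranks 1 and 3
# AP = (1/1 + 2/3) / 2 = 5/6
ap = average_precision(["s1", "s2", "s3"], ["s1", "s3"])
```

In the actual benchmark, the per-query scores are averaged over all 30 queries to obtain the mean (inferred) average precision reported above.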
Indicative results of our system are shown in the following figure: