Semantic Video Analysis Evaluation Framework

Framework description

The aim of this framework is to provide a test bed for evaluating semantic video analysis approaches to: a) motion-based recognition, and b) modality fusion and temporal context exploitation. The task considered is semantic classification of video shots.

In this work, a multi-modal, context-aware approach to semantic video analysis is evaluated. The examined video sequence is first segmented into shots, and appropriate color, motion and audio features are extracted for every resulting shot. Hidden Markov Models (HMMs) are then employed to perform, separately for each modality, an initial association of each shot with the semantic classes of interest. Subsequently, a graphical-modeling-based approach is proposed for jointly performing modality fusion and temporal context exploitation.

The novelties of this work are the combined exploitation of contextual information and multi-modal fusion, and a new representation for providing motion distribution information to HMMs. Specifically, an integrated Bayesian Network (BN) is introduced that simultaneously performs fusion of the individual-modality analysis results and exploitation of temporal context, contrary to the usual practice of performing each task separately. Contextual information takes the form of temporal relations among the supported classes. Additionally, a new, computationally efficient method for providing motion energy distribution-related information to HMMs is presented, which supports the incorporation of motion characteristics from previous frames into the currently examined one. The final outcome of the overall analysis is the association of a semantic class with every shot. The proposed methods are evaluated on four datasets from the domains of tennis, news and volleyball broadcast video.
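The idea of accumulating motion characteristics from previous frames can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the exponential-decay accumulation rule, the fixed-width histogram layout, and the function name and parameters are not taken from the paper, whose exact motion representation differs.

```python
# Hedged sketch: summarizing a shot's motion energy distribution, with
# earlier frames' contributions decayed exponentially. The decay rule and
# histogram binning are illustrative assumptions, not the paper's method.

def motion_energy_histogram(motion_fields, decay=0.5, bins=4, max_mag=10.0):
    """motion_fields: per-frame grids of (dx, dy) motion vectors.
    Returns a normalized histogram of accumulated per-pixel motion
    magnitudes, usable as an observation vector for an HMM."""
    accumulated = None
    for field in motion_fields:
        # Per-pixel motion magnitude for this frame.
        magnitudes = [
            [(dx * dx + dy * dy) ** 0.5 for (dx, dy) in row] for row in field
        ]
        if accumulated is None:
            accumulated = magnitudes
        else:
            # Decay previous frames' energy and add the current frame's.
            accumulated = [
                [decay * a + m for a, m in zip(arow, mrow)]
                for arow, mrow in zip(accumulated, magnitudes)
            ]
    # Quantize accumulated energies into fixed-width bins and normalize.
    counts = [0] * bins
    total = 0
    for row in accumulated:
        for value in row:
            idx = min(int(value / max_mag * bins), bins - 1)
            counts[idx] += 1
            total += 1
    return [c / total for c in counts]
```

For two identical frames whose only moving pixel has magnitude 5, the moving pixel accumulates energy 0.5 * 5 + 5 = 7.5, so the histogram concentrates mass in the static and high-energy bins.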



The provided datasets include: a) the ground truth video annotations at shot level, b) the estimated motion fields, and c) the computed single-modality analysis results.
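The provided single-modality analysis results can be combined with a temporal prior along the lines the framework describes. The sketch below is a deliberately simplified stand-in: it uses a naive-Bayes product of modality likelihoods weighted by a class-transition prior from the previous shot, not the paper's integrated Bayesian Network, and all names and tables in it are hypothetical.

```python
# Hedged sketch: fusing per-modality shot classification scores with a
# temporal transition prior. The naive-Bayes combination and the example
# transition table are illustrative assumptions, not the paper's exact BN.

def fuse_shot_scores(modality_scores, prev_class, transitions):
    """modality_scores: {modality: {class: likelihood}} for one shot;
    transitions: {previous class: {class: prior}} temporal relations.
    Returns the class maximizing the product of the temporal prior and
    all per-modality likelihoods."""
    classes = next(iter(modality_scores.values())).keys()
    best_class, best_score = None, -1.0
    for cls in classes:
        score = transitions[prev_class][cls]  # temporal context term
        for scores in modality_scores.values():
            score *= scores[cls]  # evidence from each modality
        if score > best_score:
            best_class, best_score = cls, score
    return best_class
```

With hypothetical tennis classes, a shot whose color modality weakly favors "serve" can still be labeled "rally" when the motion modality and the transition prior both point the other way.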


Download the readme file.

Download the ground truth annotations and the single-modality analysis results.

Download the motion fields:

tennis1 tennis2 tennis3 tennis4 tennis5 tennis6 tennis7 tennis8

news1 news2 news3 news4 news5 news6 news7 news8 news9

volley1 volley2 volley3 volley4 volley5 volley6 volley7 volley8 volley9 volley10



G. Th. Papadopoulos, V. Mezaris, I. Kompatsiaris and M. G. Strintzis, "Joint Modality Fusion and Temporal Context Exploitation for Semantic Video Analysis", submitted to the EURASIP Journal on Advances in Signal Processing.