Multimedia Knowledge and Social Media Analytics Laboratory


VIDI-Video: Interactive semantic video search with a large thesaurus of machine-learned audio-visual concepts. The VIDI-Video project takes on the challenge of providing substantially enhanced semantic access to video, implemented in a search engine. The engine will boost the performance of video search by building a thesaurus of 1000 elements that detect instances of audio, visual, or mixed-media content. The project's approach is to let the system learn many, possibly weaker, detectors rather than carefully modelling a few.

The concrete output will be a fully implemented audio-visual search engine consisting of two main parts: a learning system and a runtime system. After each round of training and thesaurus updating, the learning system feeds its results into the runtime system. The learning system will consist of software to be developed for overall video processing, visual analysis, audio analysis, integrated feature detection, and multimedia querying and the user interface.

The key objectives of this project are:

  • to build a large-scale thesaurus that is well spread over the semantic clues
  • to design, adapt, and evaluate methods for learning large thesauri of detectors
  • to define and evaluate powerful sets of visual, audio, and cross-modal invariant features
  • to deliver effective interaction with the user
  • to evaluate the approach in relevant application areas