VIDI-Video: Interactive semantic video search with a large thesaurus of machine-learned audio-visual concepts. The VIDI-Video project takes on the challenge of creating substantially enhanced semantic access to video, implemented in a search engine. The engine will boost the performance of video search by building a 1,000-element thesaurus of detectors for instances of audio, visual, or mixed-media content. The project's approach is to let the system learn many, possibly weaker, detectors instead of modelling a few of them carefully.
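The many-weak-detectors idea can be illustrated with a minimal sketch. This is hypothetical code, not the project's implementation: each concept in the thesaurus is modelled as an independently learned linear scorer over a shot's audio-visual feature vector (the detector names, feature dimensionality, and random stand-in weights are all assumptions for illustration).

```python
import random

random.seed(0)

NUM_CONCEPTS = 1000   # thesaurus size targeted by the project
FEATURE_DIM = 8       # hypothetical audio-visual feature dimensionality

# Each "detector" is a weak linear scorer; random weights stand in for
# machine-learned parameters obtained during a training round.
thesaurus = {
    f"concept_{i}": [random.uniform(-1, 1) for _ in range(FEATURE_DIM)]
    for i in range(NUM_CONCEPTS)
}

def detect(weights, features):
    """Score one concept on one shot's feature vector (dot product)."""
    return sum(w * f for w, f in zip(weights, features))

def rank_shots(concept, shots):
    """Rank shots for a semantic query by one detector's score, highest first."""
    weights = thesaurus[concept]
    return sorted(shots, key=lambda s: detect(weights, s["features"]), reverse=True)

# Toy collection of video shots, each with a feature vector.
shots = [{"id": i, "features": [random.uniform(0, 1) for _ in range(FEATURE_DIM)]}
         for i in range(5)]
ranking = rank_shots("concept_42", shots)
print([s["id"] for s in ranking])
```

The design choice the sketch reflects is that each of the many detectors is cheap and trained independently, so the thesaurus can grow to a thousand concepts without per-concept hand modelling; at query time the runtime system only needs to look up and apply the relevant detectors.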
Concrete outputs will be a fully implemented audio-visual search engine consisting of two main parts, viz. a learning system and a runtime system, where the former will feed its results into the latter after each round of training and thesaurus update. The learning system will consist of software to be developed for overall video processing; visual analysis; audio analysis; integrated feature detection; and multimedia query and user interface.
The key objectives of this project are:
- to build a large-scale thesaurus well spread over the semantic clues
- to design, adapt and evaluate methods to learn large thesauri of detectors
- to define and evaluate powerful sets of visual, audio, and cross-modal invariant features
- to deliver effective interaction with the user
- to evaluate the approach in relevant application areas