Verge Search Engine Interface



VERGE comprises several modules, each of which integrates specific technologies. A brief description of the technologies used in each module follows.

Visual Similarity Search by Example Module

Two techniques are applied to retrieve visually similar images. In both, during the query phase a feature vector defined by the technique is extracted from the query image, and a distance metric is computed between the query image's descriptor and the descriptors of the dataset images. The dataset images are then ranked by these distances, so that the images most similar to the query example come first. The two techniques are:
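The query phase described above can be sketched as a simple linear scan. This is only an illustration: the text does not name the distance metric, so the Euclidean distance used here is an assumption, and the function names and toy descriptors are hypothetical.

```python
import math

def euclidean(a, b):
    """L2 distance between two equal-length feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_by_similarity(query_vec, dataset, distance=euclidean):
    """Return (image_id, distance) pairs, most similar image first."""
    scored = [(image_id, distance(query_vec, vec))
              for image_id, vec in dataset.items()]
    return sorted(scored, key=lambda pair: pair[1])

# Toy 2-D descriptors; real ones would come from the MPEG-7 or SIFT pipelines.
dataset = {
    "img_a": [0.0, 0.0],
    "img_b": [3.0, 4.0],
    "img_c": [1.0, 1.0],
}
ranking = rank_by_similarity([0.0, 0.0], dataset)
for image_id, dist in ranking:
    print(image_id, round(dist, 3))
```

A scan like this visits every descriptor, which is why an index structure becomes useful for larger collections.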

MPEG-7 based technique

This technique captures global information such as colour and texture. Two descriptor combinations were selected to represent the visual content: 1) ColorLayout and EdgeHistogram, and 2) ColorLayout and ColorStructure.

SIFT based technique

This technique captures local information by applying Lowe's SIFT transform. The method adopted is an implementation of the bag-of-visual-words approach, in which a large number of local descriptors (training data) is used to learn the visual words. The visual words are obtained by clustering these descriptors into a fixed number of clusters.

In both cases an R-tree structure is constructed to support efficient indexing and fast retrieval.
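The bag-of-visual-words step can be sketched as follows. This is a minimal illustration under stated assumptions: plain k-means stands in for the clustering (the text does not say which algorithm is used), the descriptors are toy 2-D points rather than SIFT vectors, and all function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centres, point):
    """Index of the centre (visual word) closest to point."""
    return min(range(len(centres)), key=lambda i: sq_dist(centres[i], point))

def learn_vocabulary(descriptors, k, iterations=10):
    """Plain Lloyd's k-means; the first k descriptors seed the centres."""
    centres = [list(d) for d in descriptors[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for d in descriptors:
            groups[nearest(centres, d)].append(d)
        for i, group in enumerate(groups):
            if group:  # keep the old centre if a cluster is empty
                centres[i] = [sum(col) / len(group) for col in zip(*group)]
    return centres

def bow_histogram(descriptors, centres):
    """Normalised count of an image's descriptors per visual word."""
    counts = [0] * len(centres)
    for d in descriptors:
        counts[nearest(centres, d)] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]

# Toy training descriptors forming two obvious clusters.
training = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [10.0, 9.0]]
vocabulary = learn_vocabulary(training, k=2)

# Descriptors of one image, quantised against the learned vocabulary.
image_descriptors = [[0.0, 0.5], [0.2, 0.4], [9.5, 9.5], [0.1, 0.1]]
histogram = bow_histogram(image_descriptors, vocabulary)
print(histogram)
```

The resulting histogram is a fixed-length vector per image, which is what makes it possible to compare images with a distance metric and to store them in a spatial index.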


Textual Information Processing Module

The textual query module attempts to exploit the audio information of each shot. This audio information is processed offline by applying Automatic Speech Recognition and Machine Translation to the initial video, so that a specific set of keywords can be assigned to each shot. The textual processing module differs between versions, so the basics of each version are described briefly below.

Version 9.0.

In this version, the text-retrieval code base is rewritten as a set of Perl modules, and the full-text retrieval engine is migrated from KinoSearch to the Lemur Toolkit, an open-source framework designed to facilitate research in language modeling and information retrieval. Term weights for each keyword are still computed using the BM25 text algorithm. The query expansion based on WordNet synsets and the concept suggestions introduced in version 8.0 are preserved. In addition, a semantic similarity measure between query terms and concept terms (broader, narrower and related terms) is introduced in order to achieve better results with the query expansion and concept suggestion functions.

Version 8.0.

In this version, a full-text retrieval engine based on KinoSearch, a Perl search engine library based on Lucene, is introduced to provide indexing and query functions. Term weights for each keyword are computed using the BM25 text algorithm. Moreover, the system's recall is boosted by query expansion, which is implemented by generating a list of synonyms for each query term based on WordNet synsets. Finally, to assist the user in subsequent query iterations, traditional thesaurus-style concept suggestions are generated by mapping WordNet hypernyms to broader terms and hyponyms to narrower terms.
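Synonym-based query expansion of this kind can be sketched as below. The small `SYNONYMS` table is a hypothetical stand-in for WordNet synsets, and `expand_query` is an illustrative name, not part of the module described here.

```python
# Hypothetical stand-in for WordNet synsets; a real system would query WordNet.
SYNONYMS = {
    "car": ["automobile", "auto"],
    "boat": ["ship", "vessel"],
}

def expand_query(terms, synonyms=SYNONYMS):
    """Return the original terms followed by their synonyms, without duplicates."""
    expanded = []
    for term in terms:
        for candidate in [term] + synonyms.get(term, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded

expanded = expand_query(["red", "car"])
print(expanded)
```

Terms without an entry in the table pass through unchanged, so expansion can only widen the set of matching shots, which is how it raises recall.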

Version 7.0.

In this version, the module employs the BM25 text algorithm, which incorporates both normalised document length and term frequency. The module can also provide related keywords to the searcher by processing the text associated with the initial results and extracting the most frequent keywords. In this way the module receives feedback from the results and suggests additional input that the user can submit in similar queries.
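To make the roles of term frequency and normalised document length concrete, here is a minimal sketch of BM25 scoring over tokenised shot keywords. The parameter values k1 = 1.2 and b = 0.75 and the non-negative IDF variant are common defaults assumed for illustration; the text does not state which settings the module uses.

```python
import math
from collections import Counter

def bm25_scores(query_terms, documents, k1=1.2, b=0.75):
    """Score each tokenised document against the query with Okapi BM25."""
    if not documents:
        return []
    n_docs = len(documents)
    avg_len = sum(len(doc) for doc in documents) / n_docs
    # Number of documents that contain each term at least once.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    scores = []
    for doc in documents:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            df = doc_freq[term]
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
            # Length normalisation: longer-than-average documents are penalised.
            norm = k1 * (1.0 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1.0) / (tf[term] + norm)
        scores.append(score)
    return scores

# Toy keyword lists, one per shot.
shots = [
    ["president", "speech", "press"],
    ["football", "match", "goal", "goal"],
    ["press", "conference", "football"],
]
scores = bm25_scores(["football", "goal"], shots)
print(scores)
```

In this toy collection the second shot scores highest because it matches both query terms, while the first shot matches neither and scores zero.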

High Level Concept Retrieval Module

This module lets the user select high-level concepts (e.g. animal, landscape, outdoor). After offline preprocessing, the images are sorted by similarity coefficient for each concept. The extraction of high-level concept information is based on a combination of MPEG-7 and SIFT-based global image features. A set of SVM classifiers is used to create classification models for the MPEG-7 and BoW features, with the first half of the development set used for training. The output of the classification is the Degree of Confidence (DoC) with which the query may be classified as belonging to a particular concept. Once classification is complete for both the MPEG-7 and BoW features, the respective DoCs are used as input to a second SVM. This stage-2 classification is trained on the second half of the development set, with self-optimized parameters, to create a classification model. The results of this 2-stage SVM on the testing set are sorted by DoC, and the 2000 highest-ranked results are indexed in the database to support concept retrieval.
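The data flow of the two-stage scheme, followed by the top-2000 cut, can be sketched as below. The three model functions are hypothetical placeholders for trained SVMs (they simply pass through or average toy values), so the sketch shows only how the stage-1 DoCs feed the second stage and how the ranking is truncated, not how the classifiers are trained.

```python
import heapq

def mpeg7_model(features):
    """Placeholder for the stage-1 SVM on MPEG-7 features; returns a DoC."""
    return features[0]

def bow_model(features):
    """Placeholder for the stage-1 SVM on BoW features; returns a DoC."""
    return features[0]

def fusion_model(stage1_docs):
    """Placeholder for the stage-2 SVM; here it just averages its inputs."""
    return sum(stage1_docs) / len(stage1_docs)

def two_stage_doc(image):
    """Feed both stage-1 DoCs into the stage-2 model and return its DoC."""
    stage1 = [mpeg7_model(image["mpeg7"]), bow_model(image["bow"])]
    return fusion_model(stage1)

def top_ranked(images, limit=2000):
    """Ids of the `limit` images with the highest stage-2 DoC, best first."""
    best = heapq.nlargest(limit, images, key=two_stage_doc)
    return [image["id"] for image in best]

# Toy test-set entries with one-element feature vectors.
images = [
    {"id": "shot1", "mpeg7": [0.2], "bow": [0.4]},
    {"id": "shot2", "mpeg7": [0.9], "bow": [0.7]},
    {"id": "shot3", "mpeg7": [0.6], "bow": [0.2]},
]
indexed = top_ranked(images, limit=2)
print(indexed)
```

Only the ids returned by `top_ranked` would be stored for a concept, which keeps the per-concept index bounded regardless of the size of the testing set.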