Improving Diversity in Image Search via Supervised Relevance Scoring (Dataset)
Results returned by commercial image search engines should include relevant and diversified depictions of queries in order to ensure good coverage of users' information needs. While relevance has drastically improved in recent years, diversity is still an open problem. In this paper we propose a reranking method that could be implemented on top of such engines in order to provide a better balance between relevance and diversity. Our method formulates the reranking problem as an optimization of a utility function that jointly considers relevance and diversity. Our main contribution is the replacement of the unsupervised definition of relevance that is commonly used in this formulation with a supervised classification model that strives to capture a query and application-specific notion of relevance. This model provides more accurate relevance scores that lead to significantly improved diversification performance. Furthermore, we propose a stacking-type ensemble learning approach that allows combining multiple features in a principled way when computing the relevance of an image. An empirical evaluation carried out on the datasets of the MediaEval 2013 and 2014 "Retrieving Diverse Social Images" (RDSI) benchmarks confirms the superior performance of the proposed method compared to other participating systems as well as a state-of-the-art, unsupervised reranking method.
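To make the idea of a utility that jointly considers relevance and diversity more concrete, the following is a minimal sketch of a generic greedy, MMR-style reranker. It is not the exact sMMR formulation of the paper: the function names, the cosine-distance measure, the trade-off weight `w`, and the toy data are all assumptions chosen for illustration. At each step it picks the image that maximizes a weighted sum of its relevance score and its distance to the closest image already selected.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an all-zero vector as maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)


def mmr_rerank(relevance, features, k, w=0.5):
    """Greedily select up to k item indices (hypothetical helper).

    relevance: one relevance score per item (e.g. from a classifier)
    features:  one feature vector per item
    w:         assumed trade-off weight; w=1.0 ranks purely by relevance
    """
    remaining = list(range(len(relevance)))
    if not remaining or k <= 0:
        return []

    # Start from the single most relevant item.
    first = max(remaining, key=lambda i: relevance[i])
    selected = [first]
    remaining.remove(first)

    # min_dist[i] = distance from item i to its closest selected item.
    min_dist = {i: cosine_distance(features[i], features[first]) for i in remaining}

    while remaining and len(selected) < k:
        best = max(remaining, key=lambda i: w * relevance[i] + (1.0 - w) * min_dist[i])
        selected.append(best)
        remaining.remove(best)
        for i in remaining:
            min_dist[i] = min(min_dist[i], cosine_distance(features[i], features[best]))
    return selected


# Toy data (made up): items 0 and 1 are near-duplicates, item 2 is different.
scores = [0.9, 0.85, 0.5]
vectors = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(mmr_rerank(scores, vectors, k=2, w=0.5))
print(mmr_rerank(scores, vectors, k=2, w=1.0))
```

With `w=0.5` the less relevant but visually different item is preferred over the near-duplicate, whereas `w=1.0` reduces to a plain relevance ranking. In this sketch the relevance scores are simply given as input; in the proposed method they would come from the supervised model.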
We make available (here) the version of the MediaEval 2014 Retrieving Diverse Social Images dataset that we used in our experiments. We provide the following data, separately for the development set and the test set of the collection:
- 9 alternative relevance-based orderings of the images of each query location, along with the corresponding relevance scores. These orderings are obtained by applying the 3 variants of the sMMR method (sMMRa, sMMRq, sMMRaq w=1000) with each of 3 different features (vlad, cnn, bow), as described in Section 5 of the paper. The files follow the same format as the submission files requested from the participants of the task.
- The vlad, cnn and bow features for all images of the collection (both the location images and the Wikipedia images). The data files follow the same structure as those provided by the organizers of the task for other types of features.
The data can be used to reproduce the experimental results presented in the paper as well as to try new methodologies based on the provided image features and relevance scores.
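As a starting point for working with the ordering files, here is a small sketch of a parser. It assumes that the task's submission format is the TREC-style layout of six whitespace-separated columns per line (query id, iteration, photo id, rank, score, run name); this layout is an assumption that should be checked against the task documentation and the files themselves. The function name `parse_run` and the sample lines are hypothetical.

```python
from collections import defaultdict


def parse_run(lines):
    """Group ranked photos by query id.

    Assumes six whitespace-separated columns per line:
    query_id iteration photo_id rank score run_name
    Returns {query_id: [(photo_id, score), ...]} ordered by rank.
    """
    per_query = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        query_id, _iteration, photo_id, rank, score, _run_name = parts
        per_query[query_id].append((int(rank), photo_id, float(score)))
    return {
        query_id: [(photo_id, score) for _rank, photo_id, score in sorted(rows)]
        for query_id, rows in per_query.items()
    }


# Made-up sample lines in the assumed format (not taken from the dataset).
sample = [
    "1 0 111 1 0.2 example_run",
    "1 0 222 0 0.9 example_run",
    "",
    "2 0 333 0 0.5 example_run",
]
print(parse_run(sample))
```

To read an actual file, the same function can be given an open file handle, for example `parse_run(open(path))`, where `path` points to one of the downloaded ordering files.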
If you use the dataset in your research, please cite the following paper:
Eleftherios Spyromitros-Xioufis, Symeon Papadopoulos, Alexandru Lucian Ginsca, Adrian Popescu, Yiannis Kompatsiaris, and Ioannis Vlahavas. 2015. Improving Diversity in Image Search via Supervised Relevance Scoring. In Proceedings of the 5th ACM on International Conference on Multimedia Retrieval (ICMR '15). ACM, New York, NY, USA, 323-330. DOI: http://dx.doi.org/10.1145/2671188.2749334
If you have any questions about the dataset, you can contact:
- Eleftherios Spyromitros-Xioufis (firstname.lastname@example.org)
- Symeon Papadopoulos (email@example.com)