Information retrieval effectiveness evaluation typically takes one of two forms: batch experiments based on static test collections, or online experiments tracking users’ interactions with a live system. Test collection experiments are sometimes viewed as introducing too many simplifying assumptions to accurately predict the usefulness of a system to its users. As a result, there is great interest in creating test collections that better model the variability encountered in real-life search scenarios. This includes experimenting over a variety of queries, corpora, or even users and their interactions with the search results. In this talk I will discuss different ways of incorporating user behaviour in batch experimentation, how to model the variance this introduces into effectiveness measurements, and how to extend our arsenal of statistical significance tests for comparing search algorithms.
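
As an illustrative sketch only (not part of the talk): the standard starting point for comparing two search systems in a batch setting is a paired significance test over per-topic effectiveness scores. The scores and topic counts below are hypothetical placeholders; in practice they would come from an evaluation tool such as trec_eval run on a shared topic set.

```python
# Sketch: paired significance testing over per-topic effectiveness scores.
import numpy as np
from scipy import stats

# Hypothetical average-precision scores for two systems on the same 10 topics.
system_a = np.array([0.42, 0.31, 0.55, 0.20, 0.48, 0.37, 0.61, 0.29, 0.44, 0.52])
system_b = np.array([0.39, 0.35, 0.50, 0.22, 0.45, 0.40, 0.58, 0.27, 0.41, 0.49])

# Paired t-test: topics are the random factor; test whether the mean
# per-topic difference is distinguishable from zero.
t_stat, p_value = stats.ttest_rel(system_a, system_b)
print(f"paired t-test: t = {t_stat:.3f}, p = {p_value:.3f}")

# Paired bootstrap: resample topics with replacement to estimate how often
# the sign of the mean difference flips, without normality assumptions.
rng = np.random.default_rng(0)
diffs = system_a - system_b
boot_means = np.array([
    rng.choice(diffs, size=diffs.size, replace=True).mean()
    for _ in range(10_000)
])
p_boot = 2 * min((boot_means <= 0).mean(), (boot_means >= 0).mean())
print(f"paired bootstrap: two-sided p ~= {p_boot:.3f}")
```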

Lecturer: Dr. Evangelos Kanoulas