Lecturer: Dr. Barla Cambazoglu
Large-scale retrieval systems are indispensable tools for accessing the information available on the Web. In practice, these systems face two fundamental challenges: i) achieving high effectiveness when serving user queries and ii) doing so efficiently. In this context, effectiveness refers to understanding users’ information needs, often expressed in only a few query terms, and providing high-quality search results that satisfy these needs. Efficiency refers to the speed at which a retrieval system responds to search requests and its ability to operate under heavy query workloads. This tutorial gives an overview of the techniques that state-of-the-art web retrieval systems employ to tackle these effectiveness and efficiency challenges. The main body of the tutorial is accordingly composed of two parts of roughly equal length. In the first part, we begin with a system-centric view and discuss document indexing and query processing in web retrieval systems, covering many technical issues that may affect the quality of the generated search results, such as spam filtering, deduplication, ranking, result diversification, and snippet generation. We then take a user-centric view and discuss various evaluation metrics used to estimate user satisfaction with search results. In the second part, we shift our focus to efficiency and scalability issues in web retrieval. In particular, we present alternative indexing and search architectures as well as specific optimizations, such as index compression, skipping, early termination, and result caching. Finally, we briefly discuss the implications of efficiency improvements for users’ engagement with the retrieval system.