As new information streams and sources emerge and the volume of information grows, it is increasingly important to have an effective infrastructure that ensures the quality, efficiency and scalability of a search service. Starting from the traditional information retrieval (IR) system architecture, this lecture will provide an overview of important IR advances that enhance a search system in four respects: increasing system efficiency; coping with big data; processing modern real-time data streams (e.g. Twitter); and enabling more effective ranking using state-of-the-art learning-to-rank techniques. In particular, we will discuss how caching strategies, dynamic pruning techniques and index compression can speed up document retrieval. Moreover, we will cover recent advances that enable IR system components to be distributed over multiple machines, for instance indexing using MapReduce, or distributed retrieval architectures. The lecture will also discuss how to develop distributed search infrastructures that can process real-time streams using frameworks such as Storm or Spark. Finally, we will discuss how the introduction of learning-to-rank technologies requires new search infrastructures to both train and deploy effective supervised search systems.

Lecturer: Dr Craig Macdonald