Models for Information Retrieval and Recommendation

Online information services personalize the user experience by applying recommendation systems to identify the information that is most relevant to the user. The question of how to estimate relevance has been central to the field of information retrieval for many years. Not surprisingly, then, the methods used in online recommendation systems turn out to be closely related to the models developed in information retrieval. In this lecture, I present a unified approach to information retrieval and collaborative filtering, and demonstrate how this lets us turn a standard information retrieval system into a state-of-the-art recommendation system.

Lecturer: Prof. Arjen P. de Vries

Prof. Arjen P. de Vries

Prof. Arjen P. de Vries is the head of the Information Access group at the Centrum Wiskunde & Informatica (CWI) in Amsterdam, and a part-time full professor in the area of multimedia data spaces at the Technical University of Delft. Starting with his PhD research into multimedia databases at the University of Twente, De Vries has always been interested in search solutions where the advantages of automated search are combined with user interactions. He has published more than 100 refereed papers, two of which received a best (student) paper award (CIVR 2004 and ECIR 2007), and has participated in several EU FP6 and FP7 research projects, including Vitalas (FP6), PuppyIR (FP7) and COMSODE (FP7).

De Vries (co-)chaired SIGIR Amsterdam in 2007 and has held a number of program chair positions, including CIKM (2012) and ECIR (2012, 2014). He is PC chair of SIGIR 2017 in Japan and the new ACM ICTIR conference in 2015, and has recently been appointed Editor-in-Chief of Springer’s Information Retrieval Journal (IRJ). De Vries is a member of the TREC PC, and a steering committee member of INEX. In November 2009, De Vries co-founded CWI spin-off company Spinque, to bring his “Search by Strategy” approach to the market: an iterative 2-stage search process that separates search strategy definition (the how) from actual searching and browsing the collection (the what).

Lecture: Models for Information Retrieval and Recommendation

Multilingual summarisation

Automatic text summarization is a crucial add-on in the context of Information Retrieval since it allows the user to quickly grasp potentially large amounts of retrieved material and thus assess its relevance to their needs. Especially if the material stems from multilingual sources, its summarization is an asset.

The lecture aims to give an overview of the state of the art in summarization, with a special focus on multilingual techniques. In the first part of the lecture, we will introduce the traditional distinction between extractive and abstractive summarization and present in some depth modern approaches in both paradigms. Both single-document and multi-document summarization will be considered. In the second part, multilingual summarization will be addressed. First, we will elaborate on the language-independent techniques used in state-of-the-art single-document and multi-document summarization. Then, we will discuss how summaries in the user's preferred language can be obtained from multilingual material. The third part will be dedicated to the evaluation measures used to assess the quality of summarization techniques. To conclude, we will discuss how summaries can be exploited in IR itself.

Lecturer: Leo Wanner

Leo Wanner

Leo Wanner is ICREA Research Professor of Computational Linguistics at the Department of Information and Communication Technologies, Pompeu Fabra University (UPF) in Barcelona. He received his Diploma in Computer Science from the University of Karlsruhe (Germany), and his PhD in Computational Linguistics from the University of the Saarland (Germany). Prior to joining ICREA and UPF, Leo held positions at the Institute for Integrated Publication and Information Systems of the Fraunhofer Gesellschaft, Germany, the University of Waterloo, Canada, Information Sciences Institute of the University of Southern California, and the University of Stuttgart, Germany.

Leo’s current research interests revolve around several areas of Natural Language Processing, including multilingual text generation and summarization (both mono- and multi-document), dependency-oriented parsing and semantic content analysis, computational lexicography and visual analytics in linguistic and content-oriented applications. Recently, Leo has been involved in a series of large-scale national and European projects in the field, in six of them as Project Coordinator or as Scientific Coordinator. He has published his work in over 150 papers in refereed journals and conferences. Leo is a member of the editorial board of the Computational Intelligence Journal and serves regularly on Program Committees of leading conferences in Computational Linguistics (ACL, COLING, EMNLP, IJCNLP, LREC, INLG, …).

Lecture: Multilingual summarisation

Information Retrieval Infrastructures

As new information streams/sources emerge and the volume of information grows, it is increasingly important to have an effective infrastructure that ensures the quality, efficiency and scalability of a search service. Starting from the traditional information retrieval system architecture, this lecture will provide an overview of important IR advances that enhance a search system in four respects: increasing system efficiency; ensuring that it copes with big data; processing modern real-time data streams (e.g. Twitter); and facilitating more effective ranking using state-of-the-art learning to rank techniques. In particular, we will discuss how caching strategies, dynamic pruning techniques and index compression can speed up the document retrieval process. Moreover, we will cover recent advances enabling the distribution of IR system components over multiple machines, for instance indexing using MapReduce, or distributed retrieval architectures. The lecture will also discuss how to develop distributed search infrastructures that can process real-time streams using frameworks such as Storm or Spark. Finally, we will discuss how the introduction of learning to rank technologies requires new search infrastructures to both train and deploy effective supervised search systems.
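To make the index compression point concrete, here is a minimal sketch of one classic scheme: variable-byte coding of the gaps between sorted document IDs in a posting list. The abstract does not say which compression schemes the lecture covers, so the choice of scheme is an assumption, and all function names below are hypothetical.

```python
def vbyte_encode(numbers):
    """Encode non-negative integers, 7 payload bits per byte.

    The high bit is set only on the last byte of each number.
    """
    out = bytearray()
    for n in numbers:
        if n < 0:
            raise ValueError("variable-byte coding needs non-negative integers")
        chunk = [n & 0x7F]
        n >>= 7
        while n:
            chunk.append(n & 0x7F)
            n >>= 7
        chunk.reverse()        # most significant 7-bit group first
        chunk[-1] |= 0x80      # mark the final byte of this number
        out.extend(chunk)
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers, n = [], 0
    for byte in data:
        if byte & 0x80:        # final byte: emit the accumulated number
            numbers.append((n << 7) | (byte & 0x7F))
            n = 0
        else:
            n = (n << 7) | byte
    return numbers


def compress_postings(doc_ids):
    """Store a sorted posting list as variable-byte coded gaps."""
    if not doc_ids:
        return b""
    gaps = [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]
    return vbyte_encode(gaps)


def decompress_postings(data):
    """Rebuild document IDs by summing the decoded gaps."""
    doc_ids, total = [], 0
    for gap in vbyte_decode(data):
        total += gap
        doc_ids.append(total)
    return doc_ids


postings = [824, 829, 215406]
compressed = compress_postings(postings)
restored = decompress_postings(compressed)
```

Because gaps between neighbouring document IDs are usually small, most of them fit in one or two bytes; in this example three document IDs take six bytes instead of twelve with fixed 32-bit integers.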

Lecturer: Dr Craig Macdonald

Dr Craig Macdonald

Dr Craig Macdonald is a Lord Kelvin Adam Smith Research Fellow at the University of Glasgow. He has worked on effective and efficient models and approaches for a variety of information retrieval tasks. His early research focused on models for identifying persons with relevant expertise within organisations. More recently, he has embraced research on social media (blogs, tweets, etc.), as well as efficient search, learning to rank and information retrieval evaluation. Dr Macdonald coordinates the development of the Terrier open source information retrieval (IR) platform, which has been downloaded over 32,000 times by academic, government and industrial institutions worldwide. He has disseminated his research at premier international IR conferences (e.g. SIGIR, CIKM, ECIR, WSDM) and in journals (e.g. TOIS, IRJ, IPM). Finally, as part of the Text REtrieval Conference (TREC), a renowned international forum for evaluating search engines, he has co-led the creation of evaluation methodologies and standard testbeds for the Blog (2006-2010), Microblog (2011-2012) and Web (2014) tracks, which are dedicated to blog, real-time and Web search, respectively, in conjunction with the National Institute of Standards and Technology (USA), Univ. of Maryland/Twitter (USA), and Microsoft Research (USA).

Lecture: Information Retrieval Infrastructures

Professor Maarten de Rijke

Maarten de Rijke is full professor of Information Processing and Internet in the Informatics Institute at the University of Amsterdam. He holds MSc degrees in Philosophy and Mathematics (both cum laude), and a PhD in Theoretical Computer Science. He worked as a postdoc at CWI, before becoming a Warwick Research Fellow at the University of Warwick, UK. He joined the University of Amsterdam in 1998, and was appointed full professor in 2004.

De Rijke leads the Information and Language Processing Systems group, one of the world’s leading academic research groups in information retrieval. During the most recent computer science research assessment exercise, the group achieved maximal scores on all dimensions. His research focus is on intelligent information access, with projects on self-learning search engines, semantic search, and social media analytics.

A Pionier personal innovational research incentives grant laureate (comparable to an advanced ERC grant), De Rijke has helped to generate over 40 million euros in project funding. With an h-index of 53, he has published over 650 papers and published or edited over a dozen books. He is the Editor-in-Chief of ACM Transactions on Information Systems and of Springer’s Information Retrieval book series, (associate) editor for various journals and book series, and a current and former coordinator of retrieval evaluation tracks at TREC, CLEF and INEX. He was co-chair for SIGIR 2013, general chair for ECIR 2014, co-chair “web search systems and applications” for WWW 2015, short paper co-chair for SIGIR 2015, and program co-chair for information retrieval for CIKM 2015. He is also general co-chair of WSDM 2017.

He is the director of Amsterdam Data Science and of the University of Amsterdam’s Ad de Jonge Center for Intelligence and Security Studies. He is a former director of the Intelligent Systems Lab (ISLA) and of the Center for Creation, Content and Technology (CCCT).

The retrieval and language technology developed by his research group is being used by organizations around the Netherlands and beyond, and has given rise to various spin-off initiatives.

Lecture: Information Retrieval Foundational Models and Concepts

Information Retrieval Foundational Models and Concepts

To help structure the lectures, I will take the canonical architecture of a modern search engine as my starting point and organize the important concepts, results and key research issues around this architecture. A fundamental distinction that we will encounter is between offline and online stages within this architecture. Stages that receive their input sequentially are said to operate in an online modality. The key difference with the offline (or “batch”) stages is that in online stages we update our knowledge after the arrival of every new datapoint, whereas offline techniques are used when we have access to all training examples at once. Online approaches could be used in the case of a process occurring in time, for example an evolving search session of an individual user, in which case a ranker might update as time goes on and we get more and more samples of the user’s queries and interactions.
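The distinction between the two modalities can be illustrated with a deliberately small sketch: estimating a click-through rate either from all observations at once (offline) or by revising the estimate after every new observation (online). The click data and the names `batch_mean` and `OnlineMean` are hypothetical and only serve the illustration; a real ranker would update far richer state.

```python
def batch_mean(observations):
    """Offline: all observations are available at once."""
    return sum(observations) / len(observations)


class OnlineMean:
    """Online: the estimate is revised after every new data point."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, x):
        # Incremental mean: move the estimate towards x by 1/count.
        self.count += 1
        self.mean += (x - self.mean) / self.count
        return self.mean


# Hypothetical click log for one result: 1 = clicked, 0 = skipped.
clicks = [1, 0, 0, 1, 1]

offline_estimate = batch_mean(clicks)

estimator = OnlineMean()
online_trace = [estimator.update(c) for c in clicks]
```

After the last observation the online estimate coincides with the offline one, but the online estimator had a usable (if noisier) value after every single interaction, which is what a ranker that adapts during a search session relies on.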

The first lecture will focus on the offline stages of the canonical information retrieval architecture. This includes crawling, document enrichment, and aggregation of external sources related to a given document (anchor texts, click features, …).

In the second lecture, I will focus on online stages of the canonical information retrieval architecture. This includes query auto-completion, query understanding, ranking and retrieval, and result page generation. Because the search engine does not know the whole input, during the online stages we are forced to make decisions that may later turn out not to be optimal; a solid experimental framework is essential for informing and controlling this decision-making process.

Lecturer: Professor Maarten de Rijke

Evaluation Metrics for Clustering and Filtering Tasks

Following Stefano Mizzaro’s presentation on Axiometrics, in this talk we will review and compare the most popular evaluation metrics for Clustering and Filtering tasks. In order to compare and assess the adequacy of metrics, we will specify a few intuitive formal constraints for each task, which every suitable metric should satisfy. The analysis leads to some practical conclusions: for the clustering problem, only one metric pair (BCubed Precision and Recall) satisfies all formal constraints. For filtering metrics, we end up distinguishing three metric families which have mutually exclusive properties. The results provide useful guidance for selecting the most adequate evaluation metric for each application scenario.
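For readers unfamiliar with the metric pair named above, the following is a minimal sketch of BCubed Precision and Recall under their usual item-wise definition, assuming each item belongs to exactly one system cluster and one gold category. The abstract itself does not spell out the formulas, and the function name `bcubed` and the toy data are hypothetical.

```python
from collections import Counter


def bcubed(clusters, categories):
    """Return (precision, recall), each averaged over all items.

    clusters[i] is the system cluster of item i,
    categories[i] is its gold-standard category.
    """
    if not clusters or len(clusters) != len(categories):
        raise ValueError("need two non-empty label lists of equal length")

    cluster_sizes = Counter(clusters)
    category_sizes = Counter(categories)
    # Number of items sharing both a cluster and a category.
    overlap = Counter(zip(clusters, categories))

    n = len(clusters)
    pairs = list(zip(clusters, categories))
    # Per item: share of its cluster that has the item's category.
    precision = sum(overlap[(c, g)] / cluster_sizes[c] for c, g in pairs) / n
    # Per item: share of its category that landed in the item's cluster.
    recall = sum(overlap[(c, g)] / category_sizes[g] for c, g in pairs) / n
    return precision, recall


# Toy example: five items, two system clusters, two gold categories.
system = [0, 0, 0, 1, 1]
gold = ["a", "a", "b", "b", "b"]
precision, recall = bcubed(system, gold)
```

Putting every item into a single cluster drives recall to 1 while precision drops, which is why the two values are reported as a pair.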

Lecturer: Dr. Julio Gonzalo

Dr. Julio Gonzalo

Julio Gonzalo (UNED, Madrid, Spain) is head of the UNED research group in Natural Language Processing and IR (nlp.uned.es). He has recently been co-organizer of the RepLab Evaluation Campaign for Online Reputation Management Systems, co-organizer of the WePS evaluation campaign for Web People Search systems, general co-chair of the CLEF conference, and recipient of a Google Faculty Research Award together with Stefano Mizzaro and Enrique Amigó (co-lecturers at ESSIR 2015). His research interests include Entity-Oriented and Semantic Search, Evaluation Methodologies and Metrics in Information Access, and Information Access Technologies for Social Media. A list of his publications can be found at Google Scholar.

Lecture: Evaluation Metrics for Clustering and Filtering Tasks