User-generated content, available in massive amounts on the Internet, comes in many “flavors” (e.g., micro-messages, text documents, images, and video) and is receiving increased attention due to its many potential applications. One important application is the automatic generation of multimedia enrichments concerning users’ topics of interest, and in particular the creation of event summaries using multimedia data. In this talk, an event-based cross-media question answering system, which retrieves and summarizes events on a given user-generated query topic, is proposed. A framework for leveraging social media data to extract and illustrate social events automatically for any given query will be presented. The system operates in four stages. First, the input query is parsed semantically to identify the topic, location, and time information related to the event of interest (news, in the scenario presented here). Second, we use the parsed information to mine the latest and most popular related news from social news web services. Third, to identify unique events, we model the news content with latent Dirichlet allocation (LDA) and cluster the news using the DBSCAN algorithm. Finally, for each event, we retrieve both the textual and visual content of news items that refer to the same event. The resulting documents are shown within a vivid interface featuring an event description, a tag cloud, and a photo collage.
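For concreteness, the event-identification stage (topic modelling with LDA followed by density-based clustering with DBSCAN) could look roughly like the sketch below. The library choice (scikit-learn), parameters, and toy news items are assumptions for illustration; the abstract does not specify the actual implementation.

```python
# Minimal sketch of the LDA + DBSCAN event-identification stage.
# Library, parameters, and data are illustrative assumptions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.cluster import DBSCAN

# Hypothetical news items mined for a query (placeholders).
news_texts = [
    "earthquake strikes city center overnight",
    "strong earthquake shakes downtown buildings",
    "local team wins championship final",
]

# Represent each news item as a bag-of-words count vector.
counts = CountVectorizer(stop_words="english").fit_transform(news_texts)

# Model the news content with LDA; each item becomes a
# distribution over latent topics.
lda = LatentDirichletAllocation(n_components=5, random_state=0)
topic_vectors = lda.fit_transform(counts)

# Cluster the topic distributions with DBSCAN; each cluster
# (label != -1) is treated as one unique event.
labels = DBSCAN(eps=0.3, min_samples=2, metric="cosine").fit_predict(topic_vectors)
print(labels)  # e.g. [0, 0, -1]: two items form one event, one is noise
```

Clustering in topic space rather than raw word space makes near-duplicate reports of the same event fall close together even when their wording differs.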
Popular question answering systems (e.g., Yahoo! Answers) and search engines retrieve documents on the basis of text information alone. The integration of visual information into text-based search for video and image retrieval is still a hot research topic. In the second part of this talk, we propose to use visual information to enrich classic text-based search for video retrieval. With the proposed framework, we endeavor to show experimentally, on a set of real-world scenarios, that visual cues can effectively contribute to significant quality improvements in image/video retrieval. Experimental results show that mapping text-based queries to visual concepts improves the performance of the search system. Moreover, when the relevant visual concepts for a query are appropriately selected, a very substantial improvement in the system’s performance is achieved.
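One simple way to realize such query-to-concept enrichment is a late fusion of text-retrieval scores with the outputs of visual concept detectors, as in the sketch below. The linear fusion, the concept mapping, and all scores are hypothetical illustrations; the abstract does not detail the actual selection or fusion method used in the talk.

```python
# Sketch of enriching text-based video retrieval with visual concepts
# via late fusion. Concept list, scores, and weights are assumptions.
from typing import Dict

def fused_score(text_score: float,
                concept_scores: Dict[str, float],
                query_concepts: Dict[str, float],
                alpha: float = 0.7) -> float:
    """Linearly combine the text-retrieval score with evidence from
    visual concept detectors deemed relevant to the query."""
    # Visual evidence: weighted sum of detector outputs for the concepts
    # the query was mapped to (e.g. "beach" -> beach/sunset detectors).
    visual = sum(w * concept_scores.get(c, 0.0)
                 for c, w in query_concepts.items())
    return alpha * text_score + (1.0 - alpha) * visual

# Hypothetical example: query "sunset at the beach" mapped to two concepts.
query_concepts = {"beach": 0.6, "sunset": 0.4}
video_concept_scores = {"beach": 0.9, "sunset": 0.7, "indoor": 0.1}
print(fused_score(text_score=0.5,
                  concept_scores=video_concept_scores,
                  query_concepts=query_concepts))  # 0.596
```

The point of the fusion weight is that a video matching the query text only weakly can still rank highly when its visual content strongly supports the query’s concepts, and vice versa.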
Based on the various results presented in this talk, we argue that question answering, among other applications, can greatly benefit from cross-media analysis, to the advantage of users.
by Benoit Huet, EURECOM, France.
Dr. Benoit Huet is an Assistant Professor in the Data Science Department of EURECOM (France). He received his BSc degree in computer science and engineering from the École Supérieure de Technologie Électrique (Groupe ESIEE, France) in 1992. In 1993, he was awarded the MSc degree in Artificial Intelligence from the University of Westminster (UK) with distinction, where he then spent two years working as a research and teaching assistant. He received his DPhil degree in Computer Science from the University of York (UK) for his research on the topic of object recognition from large databases. He was awarded the HDR (Habilitation to Direct Research) from the University of Nice Sophia Antipolis, France, in October 2012 on the topic of Multimedia Content Understanding: Bringing Context to Content. He is an associate editor for IEEE MultiMedia magazine, IEEE Transactions on Multimedia, Multimedia Tools and Applications (Springer), and Multimedia Systems (Springer), and has been a guest editor for a number of special issues (EURASIP Journal on Image and Video Processing, IEEE MultiMedia, etc.). He regularly serves on the technical program committees of the top conferences of the field (ACM MM/ICMR, IEEE ICME/ICIP). He chaired the international conferences MMM2019 and PCM2014, as well as multiple workshops, and is Technical Program co-Chair of the upcoming ACM Multimedia conference. He chaired the IEEE MMTC Interest Group on Visual Analysis, Interaction and Content Management (VAIG) from 2010 to 2014. He has co-authored over 150 papers in books, journals, and international conferences. His current research interests include large-scale multimedia content analysis, mining, and indexing; multimodal fusion; and socially-aware multimedia.