Multimedia content is ubiquitous and immense in volume, and retrieving it is an important and hard challenge. This talk will focus on the analysis and retrieval of multimedia content, most notably images and videos. It will start by considering which data granularities are useful for retrieval, and will discuss methods for the temporal segmentation of video into shots and scenes. It will continue by discussing the problem of image and video indexing with both low-level features and high-level concepts (semantic indexing). For the latter case, the use of Web-based retrieval services for automatically generating training corpora will also be briefly examined. The talk will then proceed to the indexing of video with complex event labels, using either training video samples or simply a short textual description of each sought event. Throughout the talk, the machine learning methods that are at the core of multimedia indexing with concept and event labels will also be sketched. Finally, ideas for future research will be discussed.