Rapid advances in digital technologies, together with the progress and wide availability of digital cameras and sensors, have resulted in a great increase in multimedia data production worldwide. This is also the case for multimedia data that describe the state of the environment, which include huge volumes of data streams from model systems, dedicated stations and amateur sensors, as well as visual environmental information such as heatmaps and forest satellite images. In parallel, the success of citizen science and social networking tools has fostered the emergence of large and structured communities of nature observers (e.g. e-bird, xeno-canto, Tela Botanica), who have started to produce outstanding collections of biodiversity multimedia records. Citizens have become increasingly aware of the important role that environmental data (e.g. weather forecasts, air quality, species distributions) play in health issues (e.g. allergies), as well as in a variety of other human activities (e.g. agriculture, trip planning). In addition, such data are very important for understanding environmental issues and phenomena such as the greenhouse effect, global warming and climate change.

Therefore, there is an increasing need for advanced techniques for analysing, interpreting and aggregating environmental data provided in multimedia formats. Such techniques will allow for the generation of reliable measurements, as well as for the development of personalised applications that take into account both the state of the environment and the user's personal health conditions and preferences. In addition, they will make it possible to produce more accurate and timely knowledge of other living species, which is essential for the sustainable development of humanity and for biodiversity conservation.

Although a large number of multimedia analysis techniques have been developed specifically for extracting events and behaviours in human-centred and general-purpose applications, such as sports, movies and surveillance, relatively little attention has been paid to the analysis, retrieval and interpretation of environmental information from multimedia content. Only very recent projects, such as PESCaDO, Pl@ntNet and PASODOBLE, have dealt with developing innovative services that take environmental information into account, and have investigated the extraction, fusion and semantic interpretation of environmental information encoded in multimedia formats, such as weather, air quality and pollen forecasts or citizens' multimedia records.