PatMedia
PatMedia Search Engine
PatMedia is a hybrid retrieval engine for patent multimedia content developed by the Multimedia Knowledge Laboratory at the Informatics and Telematics Institute. The search engine can retrieve content in four modes:
- Patent Browsing: based on patent information. Further options such as direct download of the patent PDF document and page-by-page browsing of the drawing section are supported.
- Visual Search: based on an advanced algorithm that extracts a feature for binary image retrieval, the Adaptive Hierarchical Density Histograms (AHDH), in order to find visually similar content. Because patent figures are binary images, color and texture information is not available, so the algorithm employs a feature vector that represents an image by its geometry. This non-segmentation, point-density-oriented technique appears to combine high accuracy with low computational cost.
- Text Retrieval: based on the textual description of the figures in the patent document. The textual description is extracted and processed with the aid of text analysis, while the text is matched to the images by identifying the figure label with OCR techniques.
- Hybrid Retrieval: combines the aforementioned search options with filtering based on figure textual description and category. Category information is extracted after performing text analysis on the figure descriptions.
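As a rough illustration of the geometry-based idea behind Visual Search, the sketch below computes a simplified point-density feature for a binary image: each region is split into four quadrants at the centroid of its foreground points, and the fraction of points falling in each quadrant is recorded, level by level. This is not the published AHDH algorithm. The function names, the number of levels and the L1 distance used for comparison are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Image = Sequence[Sequence[int]]  # 1 = foreground (ink), 0 = background
Point = Tuple[int, int]


def density_features(img: Image, levels: int = 2) -> List[float]:
    """Simplified centroid-split density feature (illustrative, not the real AHDH)."""
    points: List[Point] = [
        (r, c) for r, row in enumerate(img) for c, v in enumerate(row) if v
    ]
    features: List[float] = []
    regions: List[List[Point]] = [points]
    for _ in range(levels):
        next_regions: List[List[Point]] = []
        for pts in regions:
            if not pts:
                # Empty region: contribute zeros and four empty sub-regions.
                features.extend([0.0] * 4)
                next_regions.extend([[], [], [], []])
                continue
            # Centroid of the foreground points in this region.
            cr = sum(r for r, _ in pts) / len(pts)
            cc = sum(c for _, c in pts) / len(pts)
            quads: List[List[Point]] = [[], [], [], []]
            for r, c in pts:
                quads[(2 if r >= cr else 0) + (1 if c >= cc else 0)].append((r, c))
            # Relative density: share of the region's points in each quadrant.
            features.extend(len(q) / len(pts) for q in quads)
            next_regions.extend(quads)
        regions = next_regions
    return features


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed dissimilarity measure between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))
```

With `levels=2` the vector has 4 + 16 = 20 entries. Visually similar figures would be expected to yield a small `l1_distance` between their vectors, so ranking a collection by this distance gives a basic form of query-by-example.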
PatMedia is the retrieval module of an integrated patent image retrieval framework that supports automatic extraction, processing and indexing of patent figures and related metadata from patent documents. The design and implementation of this framework are tailored to the special nature of patents: it builds upon advanced techniques from image analysis and content-based retrieval to enhance the performance of patent image search.
PatMedia Concept Extraction and Retrieval Framework
The PatMedia concept extraction framework is based on supervised machine learning. Its core consists of a Support Vector Machine (SVM) structure. Specifically, an individual SVM classifier is trained for every defined high-level concept to detect the corresponding instances based on the generated low-level descriptors.
Currently, six concepts are supported, all found in images belonging to patents of the A43B IPC class: Cleat, Ski Boot, High Heel, Lacing Closure, Spring Heel, Tongue.
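The one-classifier-per-concept arrangement can be sketched as follows. The text does not specify the SVM kernel or training procedure, so this example assumes a minimal linear SVM trained by sub-gradient descent on the hinge loss. The helper names (`train_linear_svm`, `train_concept_detectors`, `score`) and the hyperparameters are hypothetical, and a production system would more plausibly rely on an established SVM library.

```python
from typing import Dict, List, Sequence, Set, Tuple

Model = Tuple[List[float], float]  # (weights, bias)

CONCEPTS = ["Cleat", "Ski Boot", "High Heel", "Lacing Closure", "Spring Heel", "Tongue"]


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     epochs: int = 200, lr: float = 0.1, lam: float = 0.01) -> Model:
    """Minimal linear SVM (hinge loss, sub-gradient descent); labels are -1 or +1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # L2 regularisation shrinks the weights on every step.
            w = [wj * (1 - lr * lam) for wj in w]
            if margin < 1:
                # Sample violates the margin: move the hyperplane towards it.
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def score(model: Model, x: Sequence[float]) -> float:
    """Signed confidence score; positive means the concept is detected."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_concept_detectors(X: Sequence[Sequence[float]],
                            labels: Sequence[Set[str]],
                            concepts: Sequence[str] = CONCEPTS) -> Dict[str, Model]:
    """Train one binary classifier per concept; labels[i] lists concepts in image i."""
    return {
        c: train_linear_svm(X, [1 if c in present else -1 for present in labels])
        for c in concepts
    }
```

Training each concept independently means an image can be assigned several concepts at once, and a new concept can be added without retraining the existing detectors.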
Four different approaches were considered:
- Visual: based on visual information. Specifically, the Adaptive Hierarchical Density Histograms (AHDH) [1] were extracted as feature vectors.
- Extended Visual: based on visual information. The scores provided by the classifiers employed in the visual training case formed a score vector, which was fed to a final SVM classifier structure to generate the final confidence score.
- Textual: based on textual information. A bag-of-words representation was implemented. Lemur [2] was employed for indexing the textual information.
- Visual + Textual: based on visual and textual information. The feature vector generated was a concatenation of the AHDH and bag-of-words vectors.
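The way the four approaches build their classifier inputs can be sketched as below. This is a minimal illustration under stated assumptions: the bag-of-words vector uses plain whitespace tokenisation over a fixed vocabulary rather than Lemur, and the names `bag_of_words`, `visual_plus_textual` and `score_vector` are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Sequence


def bag_of_words(text: str, vocabulary: Sequence[str]) -> List[float]:
    """Textual: term counts over a fixed vocabulary (whitespace tokens, lowercased)."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def visual_plus_textual(visual: Sequence[float], text: str,
                        vocabulary: Sequence[str]) -> List[float]:
    """Visual + Textual: concatenate the visual vector and the bag-of-words vector."""
    return list(visual) + bag_of_words(text, vocabulary)


def score_vector(visual: Sequence[float],
                 detectors: Sequence[Callable[[Sequence[float]], float]]) -> List[float]:
    """Extended Visual: first-stage concept scores become the input of a final classifier."""
    return [detect(visual) for detect in detectors]
```

For example, with the vocabulary `["heel", "sole", "lace"]`, the description "Heel and sole of the heel" maps to `[2.0, 1.0, 0.0]`, and appending that to a visual vector gives the input for the Visual + Textual case. In the Extended Visual case, the list returned by `score_vector` would be passed to one further classifier to produce the final confidence score.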
PatMedia concept extraction demo.
Conferences and Events
PatMedia was presented and evaluated at the following patent-related events and conferences:
- Gerard Ypma, "Evaluation of Patent Image Retrieval", Information Retrieval Facility Symposium 2010 (IRFS 2010), Vienna, Austria, June 1-4, 2010.
- Jane List, "Review of ITI Approach for Searching Non-Text Information in Patents", Information Retrieval Facility Symposium 2010 (IRFS 2010), Vienna, Austria, June 1-4, 2010.
- Stefanos Vrochidis, "Towards Patent Image Retrieval", International Patent Information Conference & Exposition, IPI-ConfEx 2009, Venice, Italy, March 1-4, 2009.
- Stefanos Vrochidis, "Patent Image Retrieval", Information Retrieval Facility Symposium 2008 (IRFS 2008), Vienna, Austria, November 5-7, 2008.
- Joan Codina, Emanuele Pianta, Stefanos Vrochidis and Symeon Papadopoulos, "Integration of Semantic, Metadata and Image search engines with a text search engine for patent retrieval", Semantic Search 2008 Workshop, Tenerife, Spain, June 2, 2008.
This work was supported by the projects PATExpert and CHORUS ("Coordinated approacH to the EurOpean effoRt on aUdiovisual Search engines"), both funded by the European Commission.
PatMedia on the Web
PatMedia is listed in the Information Retrieval Facility Prototype Portal.
PatMedia has been reviewed by Intellogist, a place for finding patent searching expertise.
Testimonials
"This tool could speed up the searches of my patent examiners", Mr Benno Penzkofer, Director of the European Patent Office (EPO) sector on Image Processing and Computer Games, after the presentation of PatMedia during his visit to ITI in Thessaloniki.
"This is certainly something for the toolbox of every patent searcher. Even at the current state PatMedia already has added value for my job. I really hope that this is picked up and developed further and/or integrated in existing search tools.", Mr Gerard Ypma, Senior Patent Searcher at ASML, during his presentation of the PatMedia evaluation at IRFS 2010 in Vienna.
References
[1] P. Sidiropoulos, S. Vrochidis, I. Kompatsiaris, "Content-Based Binary Image Retrieval using the Adaptive Hierarchical Density Histogram", Pattern Recognition, Elsevier, Volume 44, Issue 4, pp. 739-750, April 2011.
[2] Lemur project: http://www.lemurproject.org/