Patent Image Retrieval Based on Classification and High-Level Concepts

Problem Statement

Vast numbers of patent documents are submitted to patent offices worldwide to describe and protect innovative artifacts, algorithms and other inventions. Patents almost always contain figures, drawings and diagrams that further specify the objects and ideas to be patented. Examining these images helps patent experts understand the contents of a patent in depth and find related inventions. A tool that supports efficient patent image retrieval based on high-level, human-understandable concepts would therefore be of great help to them.

The primary challenge in dealing with patent images is the inherent difficulty of indexing them reliably. Patent images are by nature bi-level (black and white), since they depict technical information in diagrammatic form. Existing content-based image retrieval systems rely heavily on colour and texture features, which are completely absent from patent figures. To compensate, one needs to extract features that quantify the geometry of a figure, e.g. the shapes depicted within it and the spatial relations between them, and to associate these features with high-level concepts.
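
As an illustration of a descriptor that uses neither colour nor texture, the sketch below computes a simple grid-density feature: the fraction of black pixels in each cell of a coarse grid laid over a bi-level image. The descriptor itself, the function name grid_density_features and the toy image are assumptions made for this example; no specific feature is prescribed here.

```python
def grid_density_features(image, rows=2, cols=2):
    """Return the fraction of black pixels in each cell of a rows x cols grid.

    `image` is a list of equal-length rows holding 1 (black) or 0 (white).
    Cells are visited row by row, left to right.
    """
    height, width = len(image), len(image[0])
    features = []
    for r in range(rows):
        top, bottom = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            cell = [image[y][x]
                    for y in range(top, bottom)
                    for x in range(left, right)]
            # Guard against empty cells when the grid is finer than the image.
            features.append(sum(cell) / len(cell) if cell else 0.0)
    return features


# Toy 4x4 bi-level image (hypothetical): a filled box in the top-left
# corner and a single black pixel in the bottom-right corner.
image = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]
features = grid_density_features(image)
print(features)  # → [1.0, 0.0, 0.0, 0.25]
```

A vector of this kind captures only the coarse spatial layout of a figure; relating individual shapes to high-level concepts would likely require richer shape descriptors.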

Description of Work

The objective of this work is to automatically classify patent images into general categories and to investigate the potential of extracting high-level concept information from them using content-based information. The task involves applying well-established machine learning methods for classification and concept detection, together with low-level visual features that describe patent images efficiently. More specifically, a framework of Support Vector Machine (SVM) classifiers will be trained on a manually annotated golden set to classify binary patent images into categories and to provide a degree of confidence for each concept. This requires manually annotating a patent database with predefined general categories (e.g. flowcharts, circuits) and human-understandable high-level concepts (e.g. dipole, resistance, laser beam). Optimization experiments will be performed with the SVM framework by selecting alternative forms of the feature vector. The result of this task should be a standalone module that can be integrated into existing patent indexing and retrieval frameworks.
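
The classification step can be sketched as follows. This is a minimal illustration, not the framework itself: it trains one bias-free linear SVM per category (one-vs-rest) with Pegasos-style subgradient steps, and maps each raw score to a value in (0, 1) with a logistic function as an uncalibrated stand-in for a degree of confidence. The function names, the hyperparameters and the two-dimensional toy feature vectors are all hypothetical.

```python
import math


def train_linear_svm(samples, labels, lam=0.01, steps=1000):
    """Train a bias-free linear SVM with Pegasos-style subgradient steps.

    `labels` hold +1 or -1. Samples are visited in a fixed cycle rather
    than at random, so that the sketch is deterministic.
    """
    weights = [0.0] * len(samples[0])
    for t in range(1, steps + 1):
        i = (t - 1) % len(samples)
        x, y = samples[i], labels[i]
        margin = y * sum(w * v for w, v in zip(weights, x))
        eta = 1.0 / (lam * t)  # decaying step size
        # Shrink the weights (L2 regularisation term).
        weights = [w * (1.0 - eta * lam) for w in weights]
        if margin < 1.0:  # hinge loss is active for this sample
            weights = [w + eta * y * v for w, v in zip(weights, x)]
    return weights


def train_one_vs_rest(samples, categories):
    """Train one binary SVM per category (that category vs. all others)."""
    models = {}
    for category in sorted(set(categories)):
        labels = [1 if c == category else -1 for c in categories]
        models[category] = train_linear_svm(samples, labels)
    return models


def confidences(models, x):
    """Squash each category's SVM score into (0, 1) with a logistic function."""
    result = {}
    for category, weights in models.items():
        score = sum(w * v for w, v in zip(weights, x))
        result[category] = 1.0 / (1.0 + math.exp(-score))
    return result


# Hypothetical 2-D feature vectors standing in for real image descriptors.
train_x = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.0], [0.0, 0.8]]
train_y = ["flowchart", "circuit", "flowchart", "circuit"]

models = train_one_vs_rest(train_x, train_y)
scores = confidences(models, [0.9, 0.1])
best = max(scores, key=scores.get)
print(best, scores)  # "flowchart" receives the higher confidence here
```

In practice a library SVM implementation would normally replace the training loop, and the confidence values would be calibrated on held-out annotated images (for example with Platt scaling) rather than taken directly from the raw scores. Trying alternative forms of the feature vector then amounts to changing how train_x is built while keeping the rest unchanged.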