MDCA - Programme "Masse de Données - Connaissances Ambiantes" 2006

Automatic annotation and Visual concept Extraction for Image Retrieval – AVEIR

Submission summary

Context and motivation

Retrieving images in very large databases has been an active field for several years now. Image retrieval systems roughly fall into two categories: content-based image retrieval (CBIR) and retrieval using manual keyword annotation. For CBIR, queries are images, image parts or sometimes a mixture of drawings and image characteristics. This approach has never succeeded in closing the semantic gap between the user's information need and the limited expressiveness of query-by-sample techniques in the image domain. Web search engines (e.g. Google, Yahoo) have developed image retrieval techniques relying on keyword annotations of images, which are limited to simple keyword queries.

Both approaches have so far failed to reduce the well-known semantic gap between user expectations and image expressive power. CBIR is mostly limited to (sometimes complex) comparisons based on low-level image features. Retrieval by text is limited by its weak recall: only images that were indexed with high confidence can be accessed, while the others are ignored. Besides, such search engines fail completely whenever the user is interested in the visual aspects of the image itself.

An emerging and perhaps more challenging field in this domain is automatic concept recognition from visual features. It relies on two key issues: "feature detection and rich image representation and indexing" and robust and accurate "image annotation". The project targets these two specific problems and proposes new and original solutions. The overall goal of the project is to enrich image retrieval systems with semantic indexing and annotation and with symbolic relational descriptions, all automatically extracted and built from the textual and image content of documents and web pages. This semantic and symbolic information will be used to reduce the visual ambiguity in images and to enhance the retrieval of images from large databases.

As for the target application, we will consider in this project multi-thematic general families of images such as those found on web pages, in documents and in professional collections like the classical Corel database. The project will develop 3 research axes.

The first axis is focused on image analysis, feature extraction and visual feature representations. Most annotation systems divide images into blobs and annotate the collection of blobs. The originality of our proposal is to bypass this baseline approach and to develop rich image representations. First, state-of-the-art image segmentation algorithms, chosen for the robustness of the segmentation, will be used to identify salient components of the image, and the spatial relations between them (geometry, topology, adjacency) will be extracted; both will be embedded in a high-level attributed graph representation. Second, the representation will rely on multiple views (facets) of the image.

The second axis is concerned with the automatic labeling of image components or objects with textual concepts. Labeling is formulated here as a classification problem where the labels are noisy and imprecisely defined. Labels are often defined at the global image level (not at the targeted component level) and with uncertainty. We propose to explore different formal statistical settings developed in the machine learning (ML) community and to adapt some ML paradigms to the annotation problem in order to make this labeling task fully automatic. The techniques we propose to use rely heavily on state-of-the-art and new machine learning methods.

The third axis considers image retrieval and the evaluation of the proposed algorithms. Retrieval will offer the possibility of using the rich image representations developed in the first axis, allowing the user to formulate high-level semantic queries. Fusion of visual and semantic queries will be studied in this axis.
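To make the attributed graph representation of the first axis more concrete, here is a minimal sketch in Python. It is an illustration only, not the project's method: it assumes that segmentation has already produced named regions with bounding boxes, and the names Region, RegionGraph and spatial_relations, as well as the relation labels, are hypothetical choices made for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Region:
    """A segmented image component (hypothetical, for illustration)."""
    name: str
    box: tuple  # (x_min, y_min, x_max, y_max) in pixels, y grows downward


def spatial_relations(a, b):
    """Return the set of relations that hold from box a to box b."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    rels = set()
    if ax1 <= bx0:
        rels.add("left_of")
    if ay1 <= by0:
        rels.add("above")
    if ax0 <= bx0 and ay0 <= by0 and ax1 >= bx1 and ay1 >= by1:
        rels.add("contains")
    # Boxes whose extents meet on both axes either touch or overlap.
    if ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1:
        rels.add("touches_or_overlaps")
    return rels


@dataclass
class RegionGraph:
    """Attributed graph: nodes are regions, directed edges carry relations."""
    nodes: dict = field(default_factory=dict)
    edges: dict = field(default_factory=dict)  # (name_a, name_b) -> set of relations

    def add_region(self, region):
        self.nodes[region.name] = region

    def relate_all(self):
        """Compute relations for every ordered pair of distinct regions."""
        for a in self.nodes:
            for b in self.nodes:
                if a == b:
                    continue
                rels = spatial_relations(self.nodes[a].box, self.nodes[b].box)
                if rels:
                    self.edges[(a, b)] = rels


# Toy scene: a sky band above a sea band, with a boat inside the sea.
graph = RegionGraph()
graph.add_region(Region("sky", (0, 0, 100, 40)))
graph.add_region(Region("sea", (0, 40, 100, 80)))
graph.add_region(Region("boat", (30, 50, 50, 70)))
graph.relate_all()

for (a, b), rels in sorted(graph.edges.items()):
    print(a, "->", b, sorted(rels))
```

A real system would attach richer attributes to the nodes (colour, texture, shape descriptors) and would derive the relations from region contours rather than bounding boxes. The point of the sketch is only that a query such as "a boat contained in the sea, below the sky" becomes a match on labeled edges rather than on an unordered collection of blobs.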
Tests will first be performed on classical benchmarks, then on different multimedia document collections. Specific annotated corpora will be developed in the project, released as project deliverables and made available to the community.

Four academic teams cooperate in the project. They have complementary skills, as indicated below:
ENST: image analysis, image representation and modeling, data fusion
CLIPS: multimedia information retrieval
LIP6: machine learning
LSIS: retrieval and integration of heterogeneous information, image annotation techniques

Expected scientific and technical outcomes

The main results expected at the end of the AVEIR project are:
definition of a model that represents different facets (views) of the images;
definition of probabilistic approaches for the automatic annotation of images according to the image content and the text describing the images;
definition of a set of test collections for the evaluation of image annotation and retrieval;
a prototype image retrieval system based on the different advances of AVEIR.

Multi-facet descriptions help reduce image ambiguity and open promising perspectives for querying large image databases. The semantic labeling of complex image descriptions is, however, an open problem. So far, only simple blob-like representations have been used for automatic annotation. Adapting complex representations to general families of image databases is also challenging. We believe that the proposed approach has the potential to meet these challenges and so to bypass the limitations of current approaches. The project handles both very practical problems (design of efficient and expressive image search engines) and open theoretical problems in the domains of visual concept representation, semantic concept extraction and machine learning.

Expected industrial outcomes

Developing robust and accurate solutions for the automatic semantic annotation of images has important consequences for many applications in the multimedia domain. The project will provide principled methods for this problem, which could be developed into large-scale applications through future industrial collaboration. This project may have a strong impact on the development of national and European R&D projects.

Université

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.

CENTRE NATIONAL DE LA RECHERCHE SCIENTIFIQUE - DELEGATION REGIONALE ILE-DE-FRANCE SECTEUR PARIS A

ANR funding: 372,917 euros
Start date and duration of the scientific project: - 36 months
