Blanc SIMI 2 - Computer science and applications

Similarity of Locally Structured Data in Computer Vision – SoLStiCe

Submission summary

SoLStiCe is a fundamental research project that aims to design new models and tools for representing and managing images and videos in order to, e.g., retrieve images or videos that are similar to a query image or video; recognize objects in images; track objects in videos; or detect typical activities in videos. A major current trend for tackling these applications is to use bag-of-visual-words (BoVW) models, whose basic idea is to extract local features from small image regions so that images are mapped into a vector space of visual words. However, BoVW models, like many other global models proposed in the literature, do not integrate structural information such as the spatial or temporal relationships between local features. This lack of structural information can be an advantage, as it makes the models easier to render invariant to a large class of transformations. The drawback is that such models cannot represent the geometrical and temporal relationships between parts of objects and actions, which hinders their applicability to realistic, complex problems that require high discriminative power.
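To make the BoVW idea concrete, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not the project's implementation: the function name `bovw_histogram`, the toy two-dimensional descriptors and the three-word codebook are all hypothetical. Each local descriptor is assigned to its nearest visual word and the image is summarized by a normalized histogram of word counts.

```python
import numpy as np

def bovw_histogram(descriptors, codebook):
    """Map local descriptors (n, d) to an L1-normalized histogram
    over the visual words of a codebook (k, d)."""
    # Squared Euclidean distance from every descriptor to every visual word.
    diffs = descriptors[:, None, :] - codebook[None, :, :]
    dists = (diffs ** 2).sum(axis=2)
    # Index of the nearest visual word for each descriptor.
    words = dists.argmin(axis=1)
    counts = np.bincount(words, minlength=len(codebook)).astype(float)
    return counts / counts.sum()

# Hypothetical toy data: 3 visual words and 4 local descriptors in 2-D.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
descriptors = np.array([[0.1, -0.1], [0.9, 1.2], [1.1, 0.8], [4.8, 5.1]])

hist = bovw_histogram(descriptors, codebook)

# Reordering the descriptors (i.e. ignoring where they came from in the
# image) leaves the histogram unchanged: no structural information is kept.
hist_shuffled = bovw_histogram(descriptors[::-1], codebook)
```

In this sketch `hist` and `hist_shuffled` are identical, which illustrates both the invariance and the loss of spatial relationships discussed above.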

In this project, we propose to explore locally structured data (LSD), which combine visual features (such as interest points, segmented regions or visual words) with discrete structures (such as strings, trees, combinatorial maps or, more generally, graphs) in order to model the local (spatio-temporal) relationships between these features. Using LSD for classification, recognition or indexing tasks leads us to study three main issues:

- Extracting LSD from images and videos: We extract relevant visual features and structure them w.r.t. spatial and temporal relationships.
- Measuring the similarity of LSD: We design relevant similarity measures for comparing LSD, and efficient algorithms for computing these measures.
- Mining LSD: We characterize LSD by means of frequently (or infrequently) occurring patterns (itemsets, sequences or graphs) and use them to create discriminative features for solving computer vision tasks.
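The three issues above can be illustrated with a deliberately simplified sketch in Python. Everything here is an assumption made for illustration: an LSD is reduced to a small graph whose nodes carry visual-word labels and whose edges stand for spatial or temporal adjacency; similarity is a multiset Jaccard index over labeled edges; and mining is a plain support count. The helper names and the labels (`"wheel"`, `"frame"`, `"seat"`) are hypothetical and do not come from the project.

```python
from collections import Counter

def edge_patterns(labels, edges):
    """Multiset of unordered label pairs linked by an edge of the structure."""
    return Counter(tuple(sorted((labels[u], labels[v]))) for u, v in edges)

def pattern_similarity(p, q):
    """Multiset Jaccard index between two pattern multisets (in [0, 1])."""
    inter = sum((p & q).values())
    union = sum((p | q).values())
    return inter / union if union else 1.0

def frequent_patterns(pattern_sets, min_support):
    """Patterns that occur in at least `min_support` of the given structures."""
    support = Counter()
    for p in pattern_sets:
        support.update(set(p))  # count each pattern once per structure
    return {pat for pat, s in support.items() if s >= min_support}

# Hypothetical structures: node id -> visual-word label, plus adjacency edges.
labels_a = {0: "wheel", 1: "wheel", 2: "frame"}
edges_a = [(0, 2), (1, 2)]
labels_b = {0: "wheel", 1: "frame", 2: "seat"}
edges_b = [(0, 1), (1, 2)]

patterns_a = edge_patterns(labels_a, edges_a)
patterns_b = edge_patterns(labels_b, edges_b)

similarity = pattern_similarity(patterns_a, patterns_b)
frequent = frequent_patterns([patterns_a, patterns_b], min_support=2)
```

Unlike a plain histogram of visual words, this representation distinguishes two images that contain the same words arranged differently, at the cost of a more expensive comparison. Real LSD similarity measures and pattern-mining algorithms would be considerably richer than this edge-level approximation.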

Two main issues in computer vision motivate our use of LSD: the need to deal with occlusions and with non-rigid objects. We propose to validate our new models and tools on three computer vision applications that share this need, namely the recognition of human actions and events in videos, the tracking of objects in videos, and the recognition of objects in 3D scenes from 2D images or videos. These applications remain open problems and have complementary constraints, mainly due to the different media types addressed (2D (+t), 3D and 3D+t).

Marc SEBBAN (Laboratoire Hubert-Curien) – marc.sebban@univ-st-etienne.fr

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.

LIRIS - CNRS Laboratoire d'InfoRmatique en Image et Systèmes d'information
LaHC Laboratoire Hubert-Curien

ANR funding: 274,064 euros
Beginning and duration of the scientific project: January 2014 - 48 Months
