The goal of this project is to detect noticeable events in video sequences online. By "noticeable event" we mean any event that draws attention by its spatial and/or temporal behavior. We wish to develop a general-purpose detection framework, which rules out supervised learning-based algorithms that require specific training data. Instead, we propose to characterize noticeable events as break points with respect to their context, which is better suited to our goal. We are interested not only in studying the actors of the scene (i.e., any entity whose dynamics or visual aspect draws attention) and their individual behaviors, but also in studying their behaviors with respect to their environment. Each actor will be observed in a so-called "observation region" determined using local or semi-local low-level generic primitives (points of interest, edgels, areas, etc.), still to be defined, together with clustering methods applied to these primitives. The relationships between actors, or between actors and the context, will be modeled by spatial, temporal and spatio-temporal relations. These will be defined over observation regions or over sets of descriptors to yield compact descriptions; they will then be compared and their evolution over time studied. Modeling relations between scattered sets of visual primitives (including points of interest) as well as spatio-temporal relations, and defining measures to compare them, is one of the novel contributions of this project. For this purpose, we propose to rely on both mathematical morphology and fuzzy sets. In addition, observation regions will be accurately tracked by an advanced tracking module based on a particle filter relying on a dynamic Bayesian network.
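To give an intuition of fuzzy spatial relations between regions, the sketch below computes a directional fuzzy landscape ("to the right of" a reference region) and the degree to which a target region satisfies it. This is only an illustrative, brute-force sketch of the general idea; the function names, the linear angle-based membership function, and the aggregation by mean over the target are assumptions, not the project's actual formulation.

```python
import numpy as np

def fuzzy_landscape(ref_mask, direction=(0.0, 1.0)):
    """Directional fuzzy landscape: each pixel gets a degree in [0, 1]
    that decreases linearly with the smallest angle between `direction`
    and the vector from some reference pixel to that pixel.
    Brute-force over reference pixels; fine for small masks."""
    h, w = ref_mask.shape
    ys, xs = np.nonzero(ref_mask)
    ref_pts = np.stack([ys, xs], axis=1).astype(float)
    u = np.asarray(direction, dtype=float)
    u /= np.linalg.norm(u)
    landscape = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            if ref_mask[y, x]:
                landscape[y, x] = 1.0   # inside the reference region
                continue
            v = np.array([y, x], dtype=float) - ref_pts
            norms = np.linalg.norm(v, axis=1)
            cos = (v @ u) / norms
            ang = np.arccos(np.clip(cos, -1.0, 1.0))
            best = ang.min()            # most favourable reference pixel
            landscape[y, x] = max(0.0, 1.0 - 2.0 * best / np.pi)
    return landscape

def relation_degree(landscape, target_mask):
    """Degree to which the target region satisfies the relation:
    mean landscape value over the target's pixels (one possible choice)."""
    return float(landscape[target_mask].mean())
```

Comparing such degrees from frame to frame is one way the evolution of a relation over time could be quantified.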
The advantage of exploiting this module is twofold: first, the optimal filter will be guided in its prediction and estimation processes between consecutive time slices by the previously computed spatial relations, resulting in better tracking; second, these estimates will be exploited to improve the determination of the descriptors and the observation regions in the next time slice. As observation regions and their relations evolve over time (their number may, for instance, vary), we propose to make the structure of the dynamic Bayesian network evolve accordingly (such a network is called non-stationary). All the features of the studied scene can thus be integrated online into the dynamic Bayesian network, which is another original aspect of the project. Noticeable event detection will be performed on either a temporal or a spatial basis. In the former case, the non-stationary dynamic Bayesian network will allow us to detect break points through shifts in its structure over time. In addition, measuring the quality of the estimates provided by the particle filter will allow us to detect inaccurate tracking, which may indicate contextual break points revealed when the spatial relations misguide the prediction process. Finally, measuring the evolution of the spatial relations over time will also provide evidence of local break points. Our approach will be tested on different kinds of scenarios (somebody leaving an object, somebody moving very differently from the surrounding crowd, etc.). It will then be validated on databases of sequences dedicated to event detection, for which we will construct the ground truth.
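The interplay between relation-guided prediction and tracking-quality monitoring can be sketched with a minimal bootstrap particle filter. Everything here is a simplified assumption: a 1-D state, a random-walk motion model, a hypothetical `relation_score` prior standing in for the spatial relations, and the normalized effective sample size as the quality measure.

```python
import numpy as np

rng = np.random.default_rng(0)

def pf_step(particles, weights, observation, relation_score,
            motion_std=1.0, obs_std=1.0):
    """One bootstrap particle-filter step on a 1-D state (illustrative).
    `relation_score(x)` maps states to degrees in [0, 1] and modulates
    the update, mimicking relation-guided estimation; the returned
    quality is the normalized effective sample size in (0, 1]."""
    # Predict: random-walk motion model
    particles = particles + rng.normal(0.0, motion_std, size=particles.shape)
    # Update: Gaussian observation likelihood times the relation prior
    lik = np.exp(-0.5 * ((particles - observation) / obs_std) ** 2)
    weights = weights * lik * relation_score(particles)
    total = weights.sum()
    if total == 0.0:   # degenerate case: relations and data fully disagree
        weights = np.full_like(weights, 1.0 / len(weights))
    else:
        weights = weights / total
    # Quality: effective sample size divided by the number of particles
    quality = 1.0 / (len(weights) * np.sum(weights ** 2))
    # Resample (multinomial, to keep the sketch short)
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles)), quality
```

In this toy setting, a quality value that stays low over several steps would play the role of the inaccurate-tracking indicator mentioned above: the relation prior is pulling the prediction away from what the observations support.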
Madame Séverine DUBUISSON (Laboratoire d'Informatique de Paris 6) – email@example.com
The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility regarding this content.
LIP6 Laboratoire d'Informatique de Paris 6
LTCI Laboratoire Traitement et Communication de l’Information
IGN Institut National de l'Information Géographique et Forestière
ANR grant: 278,271 euros
Beginning and duration of the scientific project: December 2012 - 36 months