CE22 - Sociétés urbaines, territoires, constructions et mobilité 2021

Event camera for the perception of fast objects around autonomous vehicles – CERBERE

CERBERE: Event camera for fast object perception around the autonomous vehicle

The project addresses the limitations of conventional perception systems in dynamic or low-light conditions. Event-based cameras offer high temporal resolution and low power consumption but remain underexploited. The project aims to develop new methods and multimodal approaches to fully leverage this sensing modality and enable more robust, responsive, and energy-efficient perception systems.

Context and objectives

In recent years, research and experimentation on autonomous vehicles have multiplied: the autonomous vehicle is one of the major challenges of tomorrow's mobility. In the near future, users will have access to fleets of shared autonomous vehicles that can be booked at any time via a smartphone, while reducing the risks associated with human driving, since more than 90% of accidents are related to human error.

One of the main technological challenges for the autonomous vehicle is understanding its environment, which is usually perceived through sensors such as lidars, radars, and cameras. The main objective of this project is to exploit a sensor that breaks with existing solutions for autonomous vehicle perception: the event camera.

The event camera is a bio-inspired sensor that, instead of capturing static images of a dynamic scene at a fixed frequency, measures changes in illumination asynchronously and at the pixel level. This property makes it particularly interesting for autonomous vehicles, since it can address the remaining challenges of autonomous driving scenarios: scenes with a high dynamic range (e.g. tunnel exits) and the latency and speed of obstacle detection (other vehicles, pedestrians), while respecting the constraints of limited computing power and data flow imposed by the vehicle.

Using event cameras requires the development of new algorithms, since classical computer vision algorithms are not suited to the fundamentally different data these sensors provide. The application context (perception for autonomous vehicles) is also radically different from most existing work, which uses either a moving event camera in a static scene or a static event camera observing a dynamic scene. In this project, the objective is to exploit a camera mounted on the vehicle and observing a dynamic scene: the events it generates are due both to its own motion and to the motion of objects in the scene, and separating the two remains an open challenge. This change of application context raises a number of new scientific questions that the project set out to address.

Perception for the autonomous vehicle must be three-dimensional in order to localize the different entities (other vehicles, motorcycles, cyclists, pedestrians) and to determine whether the situation is dangerous or normal. This is why we are particularly interested in the innovative theme of event-based 3D perception for autonomous vehicles.

In addition to the detection and 3D reconstruction of moving objects, a recognition step is also necessary so that the autonomous vehicle can make the most appropriate decision for the situation. The most efficient approaches on classical images are currently those based on convolutional neural networks (CNNs); given the structure of the data provided by the event camera, this type of network is not directly applicable, and new approaches must be found.

The real-time aspect of the solution is essential to preserve the advantages of the event camera. An important part of the project is therefore dedicated to Algorithm-Architecture Adequacy (AAA), so that the developed algorithms can be integrated into the smart camera proposed by the industrial partner of the project.
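To make the event-based sensing principle described above more concrete, the sketch below simulates the standard event-generation model from a sequence of frames: a pixel emits an event (t, x, y, polarity) whenever its log-intensity has changed by more than a contrast threshold since its last event. This is a generic, frame-based approximation for illustration only; the threshold value and the simulation itself are assumptions, not the behaviour of any specific sensor used in the project.

import numpy as np

def simulate_events(frames, timestamps, contrast_threshold=0.2):
    """Approximate event generation from a frame sequence.

    frames: (N, H, W) array of intensities in [0, 1]
    timestamps: (N,) array of frame times in seconds
    Returns a list of events (t, x, y, polarity) with polarity in {-1, +1}.
    """
    eps = 1e-3                                # avoid log(0)
    ref = np.log(frames[0] + eps)             # per-pixel reference log-intensity
    events = []
    for frame, t in zip(frames[1:], timestamps[1:]):
        log_i = np.log(frame + eps)
        diff = log_i - ref
        fired = np.abs(diff) >= contrast_threshold   # pixels whose change exceeds the threshold
        ys, xs = np.nonzero(fired)
        for x, y in zip(xs, ys):
            events.append((t, int(x), int(y), 1 if diff[y, x] > 0 else -1))
        ref[fired] = log_i[fired]             # update the reference only where events fired
    return events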

The project developed a comprehensive set of methods leveraging event-based cameras to enhance perception in dynamic driving scenarios. For moving-object detection, a new RGB–event fusion architecture, RENet, was introduced. It combines a multi-scale temporal aggregation module with a bidirectional calibration mechanism, enabling the system to exploit the high temporal resolution of event streams. This significantly improves detection performance in challenging situations such as low light, abrupt illumination changes, and fast object motion.
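As a rough illustration of this kind of RGB-event fusion (a minimal sketch, not the actual RENet implementation), the code below assumes the event stream has already been converted into voxel-grid tensors. It shows a simple temporal aggregation of several event voxel grids followed by a bidirectional channel-attention calibration between the RGB and event feature maps; the module names, channel sizes, and attention formulation are illustrative assumptions.

import torch
import torch.nn as nn

class TemporalAggregation(nn.Module):
    """Aggregate a short sequence of event voxel grids into one feature map."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.encode = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.fuse = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, event_voxels):          # (B, T, C, H, W)
        feats = [self.encode(event_voxels[:, t]) for t in range(event_voxels.shape[1])]
        return self.fuse(torch.stack(feats, dim=0).mean(dim=0))

class BidirectionalCalibration(nn.Module):
    """Let each modality re-weight the channels of the other one."""
    def __init__(self, ch):
        super().__init__()
        self.rgb_gate = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(ch, ch, 1), nn.Sigmoid())
        self.evt_gate = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(ch, ch, 1), nn.Sigmoid())

    def forward(self, rgb_feat, evt_feat):
        rgb_out = rgb_feat * self.evt_gate(evt_feat)   # events calibrate the RGB channels
        evt_out = evt_feat * self.rgb_gate(rgb_feat)   # RGB calibrates the event channels
        return torch.cat([rgb_out, evt_out], dim=1)    # fused feature for a detection head

# Example: fuse RGB backbone features with aggregated event features
agg = TemporalAggregation(in_ch=5, out_ch=64)
fusion = BidirectionalCalibration(ch=64)
fused = fusion(torch.randn(2, 64, 128, 128), agg(torch.randn(2, 4, 5, 128, 128)))  # (2, 128, 128, 128)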

 

For moving-object segmentation, the EmoFormer model adopts an original strategy in which event data are used only during training, while inference relies solely on RGB images. This “asymmetric” fusion allows the model to benefit from the temporal richness of event signals while simplifying deployment, as no additional sensor is required during operation.
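EmoFormer itself is a transformer-based model; the sketch below only illustrates the asymmetric training idea in a generic convolutional form. An auxiliary event encoder guides the RGB branch through an extra feature-alignment loss during training and is simply dropped at inference. The network bodies, the L2 alignment loss, and its weight are assumptions made for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RGBSegmenter(nn.Module):
    """RGB-only segmentation network used at inference time."""
    def __init__(self, num_classes=2, feat_ch=32):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(3, feat_ch, 3, padding=1), nn.ReLU(),
                                      nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(feat_ch, num_classes, 1)

    def forward(self, rgb):
        feat = self.backbone(rgb)
        return self.head(feat), feat

# Auxiliary event encoder: only used during training, never deployed.
event_encoder = nn.Sequential(nn.Conv2d(5, 32, 3, padding=1), nn.ReLU(),
                              nn.Conv2d(32, 32, 3, padding=1))
model = RGBSegmenter()
optimizer = torch.optim.Adam(list(model.parameters()) + list(event_encoder.parameters()), lr=1e-4)

def training_step(rgb, event_voxels, mask, align_weight=0.1):
    logits, rgb_feat = model(rgb)
    seg_loss = F.cross_entropy(logits, mask)
    evt_feat = event_encoder(event_voxels)
    align_loss = F.mse_loss(rgb_feat, evt_feat)   # push RGB features toward the event branch's temporal cues
    loss = seg_loss + align_weight * align_loss
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

# Inference uses RGB only: logits, _ = model(rgb_frame)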

 

Three complementary approaches were explored for 3D reconstruction, another major axis of the project. A first, geometric approach adapts the Disparity Space Image to asynchronous event streams, enabling the generation of denser depth maps despite the sparse nature of the data. A second approach is built around a full event-based tracking and mapping pipeline capable of producing coherent reconstructions in highly dynamic environments. Finally, a deep-learning–based method uses spatio-temporal fusion through a Mamba module to identify and aggregate the most informative events, improving both the accuracy and temporal consistency of depth estimation.
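As a simplified illustration of the disparity-space-image idea (not the project's actual pipeline, which handles the asynchronous stream from a moving camera), the sketch below builds time surfaces from two rectified event streams, computes a matching cost for every disparity hypothesis, and keeps the best disparity per pixel. The time-surface decay, patch size, and disparity range are illustrative assumptions.

import numpy as np
from scipy.ndimage import convolve

def time_surface(events, shape, t_ref, tau=0.03):
    """Exponentially decayed map of the most recent event time at each pixel."""
    last_t = np.full(shape, -np.inf)
    for t, x, y, _ in events:
        last_t[y, x] = max(last_t[y, x], t)
    surface = np.zeros(shape)
    valid = np.isfinite(last_t)
    surface[valid] = np.exp(-(t_ref - last_t[valid]) / tau)
    return surface

def event_dsi_disparity(left_events, right_events, shape, t_ref, max_disp=48, patch=5):
    """Fill a disparity space image from two event streams and keep the best disparity per pixel."""
    left = time_surface(left_events, shape, t_ref)
    right = time_surface(right_events, shape, t_ref)
    h, w = shape
    dsi = np.full((max_disp, h, w), np.inf)
    kernel = np.ones((patch, patch)) / (patch * patch)
    for d in range(max_disp):
        shifted = np.zeros_like(right)
        shifted[:, d:] = right[:, :w - d]                # disparity hypothesis: shift the right view
        cost = (left - shifted) ** 2
        dsi[d] = convolve(cost, kernel, mode="nearest")  # aggregate the cost over a small patch
    return np.argmin(dsi, axis=0)                        # disparity map; convert to depth with calibration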

 

For object recognition, a knowledge-distillation method was designed in which an RGB model acts as a teacher for an event-based student model, which then relies solely on event data at inference time. This strategy achieves high detection accuracy even in conditions where conventional images perform poorly.
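A minimal sketch of this cross-modal distillation idea is given below, assuming simple classification networks and events pre-converted to voxel grids; the temperature, loss weighting, and network bodies are illustrative assumptions rather than the project's exact formulation. The RGB teacher, assumed already trained, provides softened targets that the event-based student imitates in addition to the hard labels.

import torch
import torch.nn as nn
import torch.nn.functional as F

def make_cnn(in_ch, num_classes):
    return nn.Sequential(nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                         nn.Linear(32, num_classes))

teacher = make_cnn(in_ch=3, num_classes=10)   # RGB model, assumed already trained
student = make_cnn(in_ch=5, num_classes=10)   # event model (voxel-grid input), trained below
teacher.eval()
optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)

def distillation_step(rgb, event_voxels, labels, temperature=4.0, alpha=0.5):
    with torch.no_grad():                                  # teacher sees the synchronized RGB frame
        teacher_logits = teacher(rgb)
    student_logits = student(event_voxels)                 # student sees only events
    hard_loss = F.cross_entropy(student_logits, labels)
    soft_loss = F.kl_div(F.log_softmax(student_logits / temperature, dim=1),
                         F.softmax(teacher_logits / temperature, dim=1),
                         reduction="batchmean") * temperature ** 2
    loss = alpha * hard_loss + (1 - alpha) * soft_loss
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

# At inference time the teacher is discarded: predictions = student(event_voxels).argmax(dim=1)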

 

These advances were made possible by the development of a complete acquisition system used to build the multimodal SPECTRA dataset—combining event cameras, RGB cameras, LiDAR, and GNSS—as well as by extending the DSEC dataset with new annotations dedicated to moving-object detection and segmentation.
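The structure of such a multimodal recording can be pictured with the small sketch below, which associates each RGB frame with the event slice, LiDAR scan, and GNSS fix closest in time. The field names, the fixed event window, and the nearest-timestamp strategy are illustrative assumptions and do not describe the actual SPECTRA format.

from dataclasses import dataclass
from bisect import bisect_left
import numpy as np

@dataclass
class Sample:
    t: float                 # reference timestamp (RGB frame time, seconds)
    rgb: np.ndarray          # (H, W, 3) image
    events: np.ndarray       # (N, 4) array of (t, x, y, polarity) within the frame interval
    lidar: np.ndarray        # (M, 4) point cloud (x, y, z, intensity)
    gnss: tuple              # (latitude, longitude, altitude)

def nearest(timestamps, t):
    """Index of the timestamp closest to t (timestamps must be sorted)."""
    i = bisect_left(timestamps, t)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(timestamps)]
    return min(candidates, key=lambda j: abs(timestamps[j] - t))

def build_sample(t_rgb, rgb, event_array, lidar_scans, lidar_times, gnss_fixes, gnss_times, window=0.05):
    mask = (event_array[:, 0] >= t_rgb - window) & (event_array[:, 0] < t_rgb)
    return Sample(t=t_rgb, rgb=rgb,
                  events=event_array[mask],
                  lidar=lidar_scans[nearest(lidar_times, t_rgb)],
                  gnss=gnss_fixes[nearest(gnss_times, t_rgb)])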

The project has delivered significant results that reinforce the relevance of event-based cameras for perception in dynamic driving environments. Several major methodological contributions were achieved, including RENet for moving-object detection, which demonstrates a clear improvement in performance under challenging conditions thanks to its fine-grained fusion of event streams and RGB images. Segmentation also benefited from substantial progress with EmoFormer, whose multimodal training strategy enhances accuracy while simplifying inference. In the realm of 3D reconstruction, the geometric, event-based, and deep-learning approaches developed within the project enabled the generation of denser, more coherent depth maps better suited to the dynamics of urban scenes. Object recognition was likewise advanced through a knowledge-distillation method that allows inference to rely solely on event data without any notable loss in precision. Experimentally, the project led to the creation of the multimodal SPECTRA dataset—now one of the most comprehensive resources dedicated to event-based perception for driving—as well as the enrichment of the DSEC dataset with new annotations for moving-object detection and segmentation. Collectively, these results position the project as a reference point for the integration of event-based sensing into autonomous vehicle perception.

 

All results obtained within the project have been disseminated through scientific publications in leading international conferences and journals. The methods, datasets, and detailed experimental evaluations are therefore fully accessible through these works, ensuring the dissemination, transparency, and reproducibility of the project’s contributions.

The challenges identified throughout the project naturally point to several promising directions for future research, opening the way to both scientific and technological extensions. A first perspective is to integrate event-based cameras into a broader smart-city context, where these sensors would be deployed not only on vehicles, as demonstrated in CERBERE, but also within the roadside infrastructure. Such a distributed perception framework would enable richer scene understanding and support more advanced safety mechanisms.

 

Furthermore, the progress achieved in CERBERE paves the way for exploring application domains beyond the automotive sector. The unique properties of event-based cameras—high temporal resolution, low latency, and robustness under challenging conditions—make them highly relevant for industrial robotics, autonomous drones, and advanced surveillance systems, where fast reaction times and energy-efficient processing are essential.

To appear


Project coordination

Rémi BOUTTEAU (LABORATOIRE D'INFORMATIQUE, DE TRAITEMENT DE L'INFORMATION ET DES SYSTÈMES - EA 4108)

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for it.

Partnership

LITIS LABORATOIRE D'INFORMATIQUE, DE TRAITEMENT DE L'INFORMATION ET DES SYSTÈMES - EA 4108
MIS MODÉLISATION, INFORMATION ET SYSTÈMES - UR UPJV 4290
YUMAIN
ImViA Imagerie et Vision Artificielle - EA 7535

ANR grant: 656,718 euros
Beginning and duration of the scientific project: January 2022 - 48 months

Useful links

Explore our database of funded projects

 

 

ANR makes its datasets on funded projects available; click here to find out more.
