ANR-NSF (Mathématiques et Sciences du numérique) - Appel à projets générique 2022 - NSF Lead Agency

Small Omnidirectional BatVision: Learning How to Navigate from Cell Phone Audios – Omni-Batvision

Submission summary

Recent progress in audio-visual signal processing makes it possible to extract new information from sound. Robots in simulation can perceive floorplans around corners, estimate depth more accurately, or navigate to fire alarms. Depth ahead of a binaural microphone can be estimated from echoes, with synchronized camera images as cross-modal supervision. Even a rover on Mars uses ambient noise to map the planet’s subsurface layers, and objects thrown around in a plastic box can be recognized by their rattling sounds. Despite the potential of sound as a sensor modality, it is unclear how far methods can be improved, if at all, to solve useful tasks in real environments.

The goal of this project is real-time 3D scene reconstruction from audio-visual data for safe navigation. If the sensors of modern smartphones provide enough spatial information, we envision that visually impaired persons could use them like an intelligent cane. Audio in particular arrives from all directions (360°) and propagates around corners. The phone could provide detailed indoor and outdoor layouts, traversable paths, and moving entities.

A sensor rig with a binaural microphone, a speaker and an RGB-D stereo camera will be built to collect audio-visual data while traversing different environments. An attached smartphone will record the same scene, time-synchronized with the rig. The speaker will emit signals to exploit the echolocation principle, but part of the data will contain only naturally occurring sounds. As a starting point, the authors’ depth prediction method will be adapted to predict the scene as an occupancy map using cross-modal supervision. Offline 3D models provide supervision in unseen regions. The project will be considered a success if it yields a proof of concept whose reconstruction quality allows collision-free navigation.
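The echolocation principle mentioned above can be illustrated with a minimal sketch. This is not the project's method, only the underlying time-of-flight idea: the delay of an echo gives the round-trip travel time, and half of that time multiplied by the speed of sound gives the distance to the reflector. The function name `echo_distance`, the chirp parameters and the synthetic recording are assumptions made for illustration; the sketch also assumes that the direct speaker-to-microphone sound has already been removed, so the strongest correlation peak is the echo.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at about 20 °C


def echo_distance(emitted, recorded, sample_rate):
    """Estimate the distance to a reflector from the strongest echo.

    Assumes `recorded` is at least as long as `emitted` and contains no
    direct-path copy of the emitted signal (illustrative simplification).
    """
    # Cross-correlate the recording with the emitted signal; the index of
    # the strongest peak is the round-trip delay in samples.
    corr = np.correlate(recorded, emitted, mode="valid")
    delay_samples = int(np.argmax(np.abs(corr)))
    round_trip_s = delay_samples / sample_rate
    # The sound travels to the reflector and back, hence the division by 2.
    return SPEED_OF_SOUND * round_trip_s / 2.0


# Synthetic example: a 5 ms linear chirp and a single attenuated echo.
sample_rate = 48_000
t = np.arange(240) / sample_rate                         # 5 ms of samples
chirp = np.sin(2 * np.pi * (2_000 * t + 0.5 * 2e6 * t**2))  # 2 kHz to 12 kHz sweep
recording = np.zeros(4_800)
recording[280:280 + chirp.size] += 0.5 * chirp           # echo after 280 samples

distance_m = echo_distance(chirp, recording, sample_rate)
print(f"Estimated distance: {distance_m:.2f} m")
```

In this synthetic case the echo arrives 280 samples after emission, which corresponds to a reflector roughly one metre away. Real recordings contain many overlapping echoes and noise, which is why the project relies on learned models with cross-modal supervision rather than a single correlation peak.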

The project’s outcome would enable navigational aids for visually impaired persons. Audio information can complement failing visual sensors, e.g. for search and rescue robots or firefighters who need orientation in smoke and darkness. Listening cars could hear pedestrians around corners. Overlaid swarm-recorded audio-visual mappings, e.g. from a park, could even provide a public 3D map. Ultimately, this could complement mobile near-field LiDAR solutions to allow everybody to become a 3D content creator.

Project coordination

Sascha Hornauer (ARMINES - Association pour la Recherche et le Développement des Méthodes et Processus Industriels)

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.


ARMINES Association pour la Recherche et le Développement des Méthodes et Processus Industriels
ICSI International Computer Science Institute

ANR funding: 527,886 euros
Start and duration of the scientific project: October 2022 - 36 months
