Learning Generalizable 3D-based Robotic Manipulation Policies – 3D-GEM
Robotic manipulation in unstructured environments is a long-standing goal with the potential for significant societal and economic impact in areas such as manufacturing and healthcare. However, current approaches generalize poorly to novel environments, objects, and tasks, a capability that is essential for real-world applications. Most learning-based methods are trained and evaluated on a narrow range of tasks (typically basic pick-and-place skills) and rely on 2D images, so they lack crucial 3D understanding. The 3D-GEM project aims to develop robotic manipulation systems that leverage recent breakthroughs in artificial intelligence, particularly large language models and vision foundation models, to build 3D-based robotic manipulation foundation models. The project will establish a modular framework to tackle critical challenges, including data scarcity, generalization, dexterity, and efficiency. It involves three key thrusts: (1) significantly increasing the scale and quality of robot datasets; (2) advancing 3D embodied perception and task planning to comprehend complex 3D scenes and generate high-level grounded plans; (3) learning generalist 3D motion planning policies using multimodal sensors and model predictive control. These high-level and low-level modules will operate in a closed loop to enable efficient task execution across diverse scenarios, ultimately improving the versatility and effectiveness of robotic systems.
Project coordination
Shizhe Chen (Centre Inria de Paris)
The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.
Partnership
Centre Inria de Paris
ANR funding: 325,745 euros
Beginning and duration of the scientific project: September 2025, 48 months