CONTINT - Contenus numériques et interactions (Digital Content and Interactions)

Semantic visual analysis and 3D reconstruction of urban environments – SEMAPOLIS

Semantic visual analysis and semantized 3D reconstruction of urban environments

The increasing availability of images and 3D data of cities suggests many uses. However, these applications require richer, higher-level information: semantic (the nature and relationships of observed objects) and/or geometric (3D models). The techniques for producing this rich information remain limited, constrained by the small amount of annotated data and by the specificities of the urban objects to be reconstructed in 3D. Semapolis contributed to addressing these scientific issues.

Automatically discovering semantic information in images and 3D data captured in urban settings opens up a wide range of applications

Digital city models have applications in many fields: construction and renovation (with energy concerns related to insulation, solar cells, lighting, etc.), traffic and navigation (acoustic impact, GPS, etc.), health and environment (diffusion of pollutants, microclimates, etc.), risk management (ageing structures, floods, etc.), entertainment (video games, films), education (virtual tourism), architecture (style study), etc.

However, existing models are most often coarse because they are handmade, or made of heavy meshes, textured simply from images. This excludes most uses that require visual analyses, simulations or optimizations, and leaves only indicative virtual navigation and qualitative studies as the main applications. Even so, geometry and texture are often wrong when areas are invisible (e.g., occlusions by a tree or a vehicle) or reflective (e.g., windows, car bodywork).

The Semapolis project aimed at developing advanced image analysis and learning techniques for semantization, navigation and enhanced reconstruction of 3D models of urban environments, with improved visual rendering.

To achieve its application goals, the Semapolis project had to develop new large-scale and weakly supervised learning methods: methods that can exploit large quantities of visual data even when the data is raw, i.e., unannotated or poorly annotated, as annotation is often unavailable or too expensive.

Semapolis was designed in 2012-2013 on the basis of methods that were either well established at the time but still offering interesting prospects for improvement (e.g., graphical models), or relatively new and promising (e.g., image-based rendering, parsing with shape grammars using reinforcement learning techniques).

However, the successes and development of deep learning that followed the initial design of the project led us to significantly reshape the set of relevant methodological tools, without altering the purpose of the project. Semapolis researchers thus quickly began to explore general deep learning techniques and their uses for semantic urban analysis and reconstruction.

As for visual navigation in virtual 3D environments, the project remained focused on image-based rendering (IBR), in particular with the use of rich inferred semantic information.

Datasets: We created large-scale datasets of georeferenced ground-level images with weak geographic annotations, and a dataset on Art Deco facade semantization.

3D Reconstruction: We developed methods for combining heterogeneous and multi-resolution data sources to reconstruct more accurate 3D models. We improved 3D reconstruction with high-level primitives and for surfaces with little or no texture.

Low-level Semantic Segmentation: We developed methods for pixelwise semantic segmentation of images or videos, with applications to facades and urban settings.
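To make "pixelwise semantic segmentation" concrete, the sketch below illustrates only the final per-pixel decision step: turning per-pixel class scores (which a trained CNN would output in a real system) into a label map. The class list and the scores are hypothetical, not the project's actual label set or models.

```python
import numpy as np

# Hypothetical facade classes; the project's actual label set may differ.
CLASSES = ["wall", "window", "door", "sky"]

def label_map(scores: np.ndarray) -> np.ndarray:
    """Turn per-pixel class scores (H, W, C) into a label map (H, W).

    In a real pipeline the scores come from a trained CNN; here we only
    illustrate the final per-pixel decision (argmax over classes).
    """
    assert scores.shape[-1] == len(CLASSES)
    return scores.argmax(axis=-1)

# Toy example: a 2x2 "image" with synthetic class scores.
scores = np.array([
    [[0.7, 0.1, 0.1, 0.1], [0.1, 0.8, 0.05, 0.05]],
    [[0.2, 0.1, 0.6, 0.1], [0.1, 0.1, 0.1, 0.7]],
])
labels = label_map(scores)
# labels reads as: wall, window / door, sky
```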

Learning, with Weak, Little or No Supervision: We developed methods on learning with little or no supervision, and learning on synthetic data with domain adaptation, while offering a better understanding of CNN learning.

2D-2D/3D Correspondence and Alignment: We developed methods for extracting features from images of urban scenes, with application to place recognition, allowing operations even on non-photorealistic representations.

Structured Semantic Segmentation: We developed a method for learning shape grammars from examples. We studied inference with grammar variants and effective relaxations of grammatical constraints, e.g., to manage occlusions. We also carried out related work on procedural modeling.
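A minimal, hypothetical split grammar can give an idea of what such structured parsing manipulates: a facade is split vertically into floors, and each floor horizontally into tiles, yielding labeled terminal regions. The deterministic wall/window alternation below is only a stand-in for the stochastic rules the project learns from examples.

```python
# Toy split grammar for facades (illustrative rules, not the learnt ones).

def split(extent, n):
    """Split an interval (start, size) into n equal sub-intervals."""
    start, size = extent
    step = size / n
    return [(start + i * step, step) for i in range(n)]

def parse_facade(width, height, n_floors, n_tiles):
    """Derive a flat list of (label, x, y, w, h) terminal regions."""
    terminals = []
    for (y, h) in split((0.0, height), n_floors):
        for j, (x, w) in enumerate(split((0.0, width), n_tiles)):
            # Alternate wall/window tiles as a stand-in for a stochastic rule.
            label = "window" if j % 2 == 1 else "wall"
            terminals.append((label, x, y, w, h))
    return terminals

tiles = parse_facade(12.0, 9.0, n_floors=3, n_tiles=4)
```

The terminal regions tile the facade exactly, which is what makes grammar-based parses well suited to enforcing structural constraints (alignment, repetition) that pixelwise methods cannot express.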

Object Detection: We developed an algorithm to learn architectural styles correlated with building construction dates from weakly labeled urban data. We can also identify visual differences between objects over time. We developed methods for detecting objects with accurate localization in images and videos, as well as object alignment with 3D models.

Image-Based Rendering: We developed a real-time rendering technique using depth synthesis and warping/blending, allowing execution on a mobile device. We proposed a multiview approach to hole image filling (inpainting) and methods to manage the difficult case of reflective surfaces (e.g., car windows and bodywork) and to treat interiors and thin structures.
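The geometric core of depth-based warping can be sketched for a single pixel as follows: back-project with the known depth, move to the target camera, and project back to the image. The intrinsics and poses below are assumed values for illustration; a real image-based rendering system warps whole images and blends several source views.

```python
import numpy as np

def warp_pixel(u, v, depth, K, R, t):
    """Reproject pixel (u, v) with known depth from a source camera into a
    target camera with rotation R and translation t (source-to-target).

    K is the shared 3x3 intrinsics matrix. Returns target pixel coordinates.
    """
    # Back-project to a 3D point in the source camera frame.
    p_src = depth * np.linalg.inv(K) @ np.array([u, v, 1.0])
    # Move to the target camera frame and project back to the image plane.
    p_tgt = K @ (R @ p_src + t)
    return p_tgt[:2] / p_tgt[2]

# Assumed intrinsics: focal length 500 px, principal point (320, 240).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
# Identity rotation with a small sideways translation: pure horizontal parallax.
uv = warp_pixel(320, 240, 2.0, K, np.eye(3), np.array([0.1, 0.0, 0.0]))
# Horizontal disparity is f * tx / depth = 500 * 0.1 / 2 = 25 px.
```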

Thanks to the innovative techniques developed in the project, a system can be trained on poorly annotated visual urban data to identify architectural objects (e.g., windows, doors, balconies) and other elements of the urban landscape. Even in the absence of annotations (unsupervised learning), corresponding elements of different urban landscapes can be automatically related.

This semantic information is an essential ingredient that prefigures the production of semantic urban 3D models. As of now, it already greatly improves the quality of navigation in virtual cities, with visual rendering specific to urban objects. A startup on this topic is in the making at Inria Sophia Antipolis. The work also serves as the basis for a large open source project on image-based rendering; a public release is planned for early 2020.

Semapolis results also opened new research perspectives, contributing to several other ongoing funded projects, in particular the ERC Advanced grant of George Drettakis for his FUNGRAPH project (A New Foundation for Computer Graphics with Inherent Uncertainty, 2017-2022), the ANR Jeune Chercheur grant (JCJC) of Mathieu Aubry for his EnHerit project (Enhancing Heritage Image Databases, 2018-2022, ANR-17-CE23-0008) and the ANR grant for the BIOM project (Building Indoor/Outdoor Modeling, 2018-2022, ANR-17-CE23-0003).

Regarding academic criteria, Semapolis was a massive success in terms of publications: 39 articles, mostly in top-tier journals (IJCV, PAMI, TOG...) and conferences (CVPR, ICCV, SIGGRAPH...), including 19 papers with international collaborators and 22 papers with publicly available code and data (from the project web site). Several of these articles have had a significant impact on the scientific community: 300-600 citations for the 4 most cited papers (citations on Google Scholar in September 2019). Besides, a large open source project on image-based rendering for high-quality virtual navigation, building on the work in the project, is to be released in early 2020.

Regarding more industrial criteria, Acute3D significantly improved its technology for 3D model reconstruction, in particular regarding the merging of heterogeneous data (lidar, aerial images, street-level pictures).

The goal of the SEMAPOLIS project is to develop advanced large-scale image analysis and learning techniques to semantize city images and produce semantized 3D reconstructions of urban environments, including proper rendering.

Geometric 3D models of existing cities have a wide range of applications, such as navigation in virtual environments and realistic sceneries for video games and movies. A number of players (Google, Microsoft, Apple) have started to produce such data. However, the models feature only plain surfaces, textured from available pictures. This limits their use in urban studies and in the construction industry, excluding in practice applications to diagnosis and simulation. Besides, geometry and texturing are often wrong when there are invisible or discontinuous parts, e.g., with occluding foreground objects such as trees, cars or lampposts, that are pervasive in urban scenes.

We wish to go beyond by producing semantized 3D models, i.e., models which are not bare surfaces but which identify architectural elements such as windows, walls, roofs, doors, etc. The semantic priors we use to analyze images will also let us reconstruct plausible geometry and rendering for invisible parts. Semantic information is useful in a larger number of scenarios, including diagnosis and simulation for building renovation projects, accurate shadow impact taking into account actual window location, and more general urban planning and studies such as solar cell deployment. Another line of applications concerns improved virtual cities for navigation, with object-specific rendering, e.g., specular surfaces for windows. Models can also be made more compact, encoding object repetition (e.g., windows) rather than instances, and replacing actual textures with more generic ones according to semantics; this makes possible cheap and fast transmission over low-bandwidth mobile phone networks, and efficient storage in GPS navigation devices.
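The compactness argument can be illustrated with a back-of-the-envelope sketch: storing one window template plus per-instance placements instead of one full mesh copy per window. All sizes below are hypothetical, chosen only to show the shape of the trade-off.

```python
# Compactness through instancing (all sizes hypothetical).
TEMPLATE_SIZE = 2000   # bytes for one window mesh
TRANSFORM_SIZE = 12    # bytes per placement (x, y, z as 3 floats)

def model_size(n_windows, instanced):
    """Storage cost of a facade model with n_windows windows."""
    if instanced:
        # One shared template plus one small transform per instance.
        return TEMPLATE_SIZE + n_windows * TRANSFORM_SIZE
    # Naive model: a full mesh copy per window.
    return n_windows * TEMPLATE_SIZE

naive = model_size(100, instanced=False)
compact = model_size(100, instanced=True)
```

With these assumed numbers, 100 windows shrink from 200,000 bytes to 3,200 bytes; the saving grows linearly with the number of repeated instances.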

The primary goal of the project is to make significant contributions and advance the state-of-the-art in the following areas:

- Learning for visual recognition: Novel large-scale machine learning algorithms will be developed to recognize various types of architectural elements and styles in images. These methods will be able to fully exploit very large amounts of image data while at the same time requiring a minimum amount of user annotation (weakly supervised learning).

- Shape grammar learning: Techniques will be developed to learn stochastic shape grammars from examples, together with the corresponding architectural style. Learnt grammars will be able to rapidly adapt to a wide variety of specific building types without the cost of manual expert design. Learnt grammar parameters will also lead to better parsing: faster, more accurate and more robust.

- Grammar-based inference: Innovative energy minimization approaches will be developed, leveraging bottom-up cues, to efficiently cope with the exponential number of grammar interpretations, in particular in the context of grammars featuring rich architectural elements. A principled aggregation of the statistical visual properties will be designed to accurately score parsing trials.

- Semantized 3D reconstruction: Robust original techniques will be developed to synchronize multiple-view 3D reconstruction with the semantic analysis, preventing inconsistencies such as unaligned roof and windows at facade angles.

- Semantic-aware rendering: Image-based rendering techniques will be developed that benefit from semantic classification to greatly improve visual quality: improved depth synthesis, adaptive warping and blending, hole filling and region completion.

To validate our research, we will run experiments based on various kinds of data concerning Paris (large-scale panoramas, smaller scale but denser and geo-referenced terrestrial and aerial images, cadastral maps, construction date database), reconstructing and rendering an entire neighborhood.

Project coordination

Renaud Marlet (Laboratoire d'Informatique Gaspard Monge)

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its content.


Inria Paris - Rocquencourt Institut national de recherche en informatique et automatique
Inria Sophia-Antipolis Institut National de la Recherche en Informatique et en Automatique - Centre de Recherche Sophia Antipolis-Méditerranée - REVES
GREYC Groupe de Recherche en Informatique, Image, Automatique et Instrumentation de Caen
Acute3D Acute3D
LIGM Laboratoire d'Informatique Gaspard Monge

ANR grant: 791,399 euros
Beginning and duration of the scientific project: September 2013 - 42 Months
