Learning to synthesize 3D dynamic human motion – 3DMOVE
It has recently become possible to capture time-varying 3D point clouds at high spatial and temporal resolution, which in particular allows for high-quality acquisitions of human motion. Currently, the first tools to process and analyze these data in a robust and automatic way are being developed. Such tools are critical to learning generative models of dynamic human motion. The objective of 3DMOVE is to compute high-quality generative models from a database of dense 3D motion sequences of humans.
Synthesizing high-quality human motions
It has recently become possible to capture time-varying 3D point clouds at high spatial and temporal resolution. This allows in particular for high-quality acquisitions of human bodies in motion. However, tools to process and analyze these data robustly and automatically are still missing. Such tools are critical to learning generative models of dynamic human motion, which can in turn be leveraged to create plausible synthetic human motion sequences. This has the potential to influence virtual reality applications such as virtual change rooms or crowd simulations, where plausible synthesis helps create realism. The main objective of 3DMOVE is to automatically compute high-quality generative models from a database of raw dense 3D motion sequences of humans.
The key idea of 3DMOVE is to leverage 4D motion sequences of humans, captured densely in space and time, to learn suitable generative models using recent machine learning techniques. In particular, this makes it possible to learn low-dimensional representations of motion sequences that decouple different factors of variation (such as body shape and motion).
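To make the idea of a low-dimensional representation concrete, here is a minimal numpy sketch using linear PCA as a stand-in for a learned encoder. All sizes, the random data, and the `encode`/`decode` helpers are hypothetical; the project's actual models are learned neural networks, not PCA.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a motion database: each row flattens one sequence of
# T frames x V vertices x 3 coordinates (all sizes hypothetical).
T, V, D = 8, 10, 3
n_sequences = 50
data = rng.normal(size=(n_sequences, T * V * D))

# Centre the data and compute a linear low-dimensional embedding via PCA,
# a simple stand-in for the learned encoder of a generative model.
mean = data.mean(axis=0)
centred = data - mean
_, _, vt = np.linalg.svd(centred, full_matrices=False)
k = 5                       # latent dimension (hypothetical)
basis = vt[:k]              # k principal directions

def encode(x):
    return (x - mean) @ basis.T   # sequence -> low-dimensional latent vector

def decode(z):
    return z @ basis + mean       # latent vector -> approximate sequence

z = encode(data[0])
recon = decode(z)
print(z.shape)              # (5,)
```

A learned model replaces the linear maps above with non-linear networks, and can additionally structure the latent vector so that, e.g., some coordinates capture body shape and others capture motion.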
To demonstrate the value of the developed generative models, they will be applied to synthesize new motions by transferring animations between characters. Since evaluation methods for synthesized human motion sequences based on geometric errors are known to be limited because of the uncanny valley effect, 3DMOVE will place special emphasis on evaluating the results through perceptual user studies.
We developed a framework to generate temporally and spatially dense 4D human body motion. On the one hand, generative modeling has been extensively studied as a per-time-frame static fitting problem for dense 3D models such as mesh representations, where the temporal aspect is left out of the generative model. On the other hand, temporal generative models exist for sparse human models such as marker-based capture representations, but to our knowledge have not been extended to dense 3D shapes. We proposed to bridge this gap with a generative auto-encoder-based framework, which encodes morphology, global locomotion including translation and rotation, and multi-frame temporal motion as a single latent space vector. To assess its generalization and factorization abilities, we trained our model on a cyclic locomotion subset of AMASS, leveraging the dense surface models it provides for an extensive set of motion captures. Our results validate the ability of the model to reconstruct 4D sequences of human locomotion within a low error bound, and the meaningfulness of latent space interpolation between latent vectors representing different multi-frame sequences and locomotion types. We also illustrate the benefits of the approach for 4D human motion prediction of future frames from initial human locomotion frames, showing promising abilities of our model to learn realistic spatio-temporal features of human motion. Finally, we show that our model allows for data completion of both spatially and temporally sparse data.
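The latent-space interpolation mentioned above can be sketched as follows. This is a toy linear auto-encoder in numpy, purely illustrative: the weights are random, the sequence and latent sizes are hypothetical, and the actual framework uses learned encoder and decoder networks.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy linear auto-encoder (random weights, purely illustrative); in the
# actual framework, encoder and decoder are learned neural networks.
dim, k = 120, 8        # flattened sequence size and latent size (hypothetical)
W = rng.normal(size=(k, dim)) / np.sqrt(dim)

def encode(x):
    return W @ x       # flattened 4D sequence -> latent vector

def decode(z):
    return W.T @ z     # latent vector -> approximate sequence

seq_walk = rng.normal(size=dim)   # stand-in for a walking sequence
seq_run = rng.normal(size=dim)    # stand-in for a running sequence

# Interpolating between the two latent codes yields an intermediate motion;
# in a well-structured latent space, decoding it produces a plausible blend.
z_mid = 0.5 * encode(seq_walk) + 0.5 * encode(seq_run)
blend = decode(z_mid)
print(blend.shape)     # (120,)
```

Because the whole multi-frame sequence is a single latent vector, the same mechanism supports prediction and completion: encode the observed part of a sequence, then decode the full sequence from the latent code.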
Future work includes extending these results to more general motions by learning to automatically segment the motions during training. In particular, given a set of arbitrary human motion sequences, the idea is to simultaneously learn a segmentation and a representation.
We further plan to evaluate the generative models with user studies, as evaluating synthetic human motions using purely geometric measures is known to be problematic: synthetic human motions can be numerically close to captured ones while appearing highly unrealistic to human observers, a phenomenon known as the uncanny valley. We therefore plan to evaluate the synthetic motion sequences drawn from our generative model through perceptual user studies.
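To illustrate what a purely geometric measure looks like, here is a minimal numpy sketch of a mean per-vertex error between a synthesized and a captured sequence. The function name and the toy data are hypothetical; the point is that a low value of such a metric does not guarantee perceptual plausibility, which is why user studies are needed.

```python
import numpy as np

def mean_vertex_error(pred, target):
    """Mean Euclidean distance between corresponding vertices,
    averaged over all frames (arrays of shape frames x vertices x 3)."""
    return np.linalg.norm(pred - target, axis=-1).mean()

# Toy example: two sequences of 4 frames with 6 vertices each.
rng = np.random.default_rng(2)
gt = rng.normal(size=(4, 6, 3))
pred = gt + 0.01 * rng.normal(size=gt.shape)  # numerically very close

err = mean_vertex_error(pred, gt)
print(err)  # small geometric error, yet the motion could still look "off"
```

A perceptual study instead asks human observers to rate or compare sequences, capturing exactly the realism that this number misses.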
Mathieu Marsot, Stefanie Wuhrer, Jean-Sébastien Franco, Stephane Durocher. Multi-frame sequence generator of 4D human body motion. Research report, 2021. hal.archives-ouvertes.fr/hal-03250297v2
It is now possible to capture time-varying 3D point clouds at high spatial and temporal resolution. This allows for high-quality acquisitions of human bodies and faces in motion. However, tools to process and analyze these data robustly and automatically are missing. Such tools are critical to learning generative models of human motion, which can be leveraged to create plausible synthetic human motion sequences. This has the potential to influence virtual reality applications such as virtual change rooms or crowd simulations. Developing such tools is challenging due to the high variability in human shape and motion and due to significant geometric and topological acquisition noise present in state-of-the-art acquisitions. The main objective of 3DMOVE is to automatically compute high-quality generative models from a database of raw dense 3D motion sequences for human bodies and faces. To achieve this objective, 3DMOVE will leverage recently developed deep learning techniques.
Ms Stefanie Wuhrer (Centre de Recherche Inria Grenoble - Rhône-Alpes)
The author of this summary is the project coordinator, who is responsible for the content of this summary. The ANR declines any responsibility as for its contents.
Inria GRA (Centre de Recherche Inria Grenoble - Rhône-Alpes)
Help of the ANR: 303,264 euros
Beginning and duration of the scientific project: February 2020 - 48 months