Discovering Relevant Dimensions in Epidemic Dynamics for Data-Efficient Predictive Modeling – DiscoReel
Mathematical models of infectious disease dynamics have reached unprecedented resolution, integrating heterogeneities in contact patterns, mobility, susceptibility, and behavior at fine spatial and temporal scales. These developments, accelerated by the needs of precision public health, rely heavily on large, high-resolution data streams tracking the heterogeneous dynamics of pathogens and hosts (humans) at fine spatiotemporal scales.
This raises two challenges. First, data availability is determined by opportunity rather than by public health needs, which creates inequities in the ability to support evidence-based public health decisions and increases the risk of personal data misuse. Second, these complex, massively data-driven models excel at explaining what is observed but struggle to extrapolate across time and space: their predictive power is often confined to the contexts and datasets they were trained on, limiting their usefulness in scenario evaluation and policy planning.
This project proposes a paradigm shift: instead of building ever more detailed models to absorb ever more data, we need substantial theoretical progress to uncover the low-dimensional, general rules that govern epidemic dynamics across space, transmission routes, and host behavior, to extract robust knowledge from the data already available.
We hypothesize that, like many complex systems, epidemic evolution can be described by a small number of effective degrees of freedom and dynamical laws. The goal is to extract these quantities and make them usable for predictive modeling in real-world, possibly data-scarce settings.
The project combines two complementary methodological tracks. The first is founded in complex systems science and builds on renormalization group techniques, adapted to epidemic processes on structured populations. It will analyze how details in disease natural history, the scale at which host behavior is described, and spatial resolution affect the laws describing epidemic spread. The renormalization group flow will then be used to extract the relevant degrees of freedom, match them to measurable indicators and identify the laws describing their evolution. The second track relies on machine and deep learning. We will train variational autoencoders on epidemic data to identify latent variables that encode the system’s behavior. Then, we will use symbolic regression to uncover the equations that govern the evolution of these latent states and match them to observations from epidemic surveillance.
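As a rough illustration of the low-dimensional hypothesis behind the second track, the sketch below applies principal component analysis, a linear stand-in for the variational autoencoders named above, to snapshots of a toy epidemic model. Everything in it is an assumption made for illustration and not the project's method: the patch-based SIR model, its parameter values, and the function names `simulate_patches`, `explained_variance`, and `components_needed`. It counts how many linear components are needed to capture most of the variance in patch-level prevalence across several transmission rates.

```python
import numpy as np


def simulate_patches(beta, gamma, n_patches=20, coupling=0.3, steps=200, dt=0.5):
    """Toy deterministic SIR model on coupled patches (explicit Euler steps).

    Hypothetical model: each patch mixes locally and with the average
    prevalence over all patches. Returns an array (steps, n_patches)
    of infectious fractions.
    """
    s = np.full(n_patches, 1.0)
    i = np.zeros(n_patches)
    i[0], s[0] = 1e-3, 1.0 - 1e-3  # seed the outbreak in a single patch
    history = np.empty((steps, n_patches))
    for t in range(steps):
        # Prevalence each patch is exposed to: local share plus global share
        mixed = (1.0 - coupling) * i + coupling * i.mean()
        new_infections = beta * s * mixed * dt
        recoveries = gamma * i * dt
        s = s - new_infections
        i = i + new_infections - recoveries
        history[t] = i
    return history


def explained_variance(snapshots):
    """Fraction of variance carried by each principal component."""
    centered = snapshots - snapshots.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variance = singular_values ** 2
    return variance / variance.sum()


def components_needed(fractions, threshold=0.99):
    """Smallest number of leading components whose variance reaches the threshold."""
    count = int(np.searchsorted(np.cumsum(fractions), threshold)) + 1
    return min(count, len(fractions))


# Pool snapshots from runs with different (assumed) transmission rates
betas = np.linspace(0.3, 0.6, 7)
snapshots = np.vstack([simulate_patches(b, gamma=0.2) for b in betas])

fractions = explained_variance(snapshots)
k = components_needed(fractions)
print(f"{k} of {snapshots.shape[1]} components reach 99% of the variance")
```

A linear projection can only find linear structure. A variational autoencoder, as proposed in the project, could in principle find nonlinear latent variables that this sketch would miss.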
Our scientific objectives are: (1) to develop the theory to extract low-dimensional representations of epidemic dynamics; (2) to match these representations to observable quantities from surveillance, host mobility and mixing, or environmental data; (3) to use them for principled upscaling and extrapolation; and (4) to test their predictive and generalization power through case studies. These case studies will focus on two major public health threats to Europe: directly transmitted respiratory pathogens (e.g., flu, SARS-CoV-2) and mosquito-borne pathogens (dengue). They will estimate the effect of extreme climatic events on outbreak emergence and persistence, and assess whether the inferred low-dimensional descriptors can generate weather-driven perturbations for which no data are available.
Expected results include the formal modeling schemes in the two approaches considered, their computational implementation, and pre-trained models to generate scenarios in the case studies. Everything – publications, models, code – will be available in open source and open access.
By clarifying how complex epidemic dynamics admit a low-dimensional representation, and how to extract and use such representations, this project aims to contribute to a more interpretable, generalizable, and equitable modeling ecosystem for epidemic preparedness and evidence-based public health policies.
Project coordination
Eugenio Valdano (SORBONNE UNIVERSITÉ)
The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.
Partnership
iPLESP SORBONNE UNIVERSITÉ
ANR funding: 374,567 euros
Beginning and duration of the scientific project: December 2025 - 36 months