COVID-19 - Coronavirus disease 2019

Integrating sequence and incidence data to analyse and manage virus outbreaks – PHYEPI

Submission summary

Understanding the spread of viral infectious diseases is central to informing public health decisions. Classical mathematical epidemiology primarily relies on incidence data (number of new cases per week) and contact tracing data. However, thanks to technological advances made in recent years, viral genetic sequence data from infected patients can now be obtained readily and quickly. These sequences contain rich information about the transmission network, as studied by the emerging field of phylodynamics.

Currently, most approaches, whether in mathematical epidemiology or in phylodynamics, use only part of the available data: the former field ignores sequence data, while the latter tends to neglect incidence data. From a public health perspective, it is important to extract as much information as possible by combining heterogeneous data. This matters all the more because each type of data has its own strengths and weaknesses. For example, incidence data is easy to collect, but it is often aggregated and highly sensitive to sampling bias. Conversely, sequence data is more expensive to generate but is rich in information and somewhat less sensitive to sampling bias.

We propose to extend a method that we have already validated and that is based on regression-Approximate Bayesian Computation (ABC) to combine heterogeneous data, in particular genetic sequences and incidence, to analyze viral epidemics.

Our preliminary results show that this project is feasible. From a conceptual point of view, since our method is based on summary statistics, combining heterogeneous data is easy as long as it is possible to simulate in silico data with the same structure as the biological data. From a technical point of view, we have already developed an R package that quickly simulates phylogenies and time series for any compartmental model.
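To make the idea concrete, here is a minimal sketch in Python of a regression-adjusted ABC that combines two summary statistics, one computed from incidence and one from transmission structure. It is an illustration under strong assumptions and not the project's method or R package: the toy branching-process simulator, the two statistics (`growth` and `dead_ends`), the uniform prior, and all function names are hypothetical choices made for this example, and the regression step is an unweighted linear adjustment.

```python
import math
import random
import statistics


def poisson(lam, rng):
    """Draw a Poisson variate (Knuth's method; adequate for small lam)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate(r0, rng, n_gen=5, n_seed=5):
    """Toy branching process (an assumption, not a compartmental model).

    Returns the incidence per generation and the number of secondary
    cases caused by every case that had a chance to transmit.
    """
    incidence, offspring = [n_seed], []
    for _ in range(n_gen):
        counts = [poisson(r0, rng) for _ in range(incidence[-1])]
        offspring.extend(counts)
        incidence.append(sum(counts))
    return incidence, offspring


def summaries(incidence, offspring):
    """Two hypothetical summary statistics, one per data type."""
    # Incidence-based: average log growth per generation.
    growth = (math.log(incidence[-1] + 1) - math.log(incidence[0])) / (len(incidence) - 1)
    # Transmission-structure-based: share of cases that infected nobody.
    dead_ends = sum(1 for c in offspring if c == 0) / len(offspring)
    return growth, dead_ends


def cov(a, b):
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / len(a)


def regression_abc(s_obs, prior, n_sim=1000, n_keep=100, seed=1):
    rng = random.Random(seed)
    thetas = [rng.uniform(*prior) for _ in range(n_sim)]
    stats = [summaries(*simulate(t, rng)) for t in thetas]
    # Scale each statistic by its spread so that neither dominates the distance.
    scales = [statistics.pstdev(col) for col in zip(*stats)]

    def dist(s):
        return math.hypot(*((a - b) / sc for a, b, sc in zip(s, s_obs, scales)))

    # Rejection step: keep the simulations closest to the observed statistics.
    kept = sorted(range(n_sim), key=lambda i: dist(stats[i]))[:n_keep]
    th = [thetas[i] for i in kept]
    x1 = [stats[i][0] - s_obs[0] for i in kept]
    x2 = [stats[i][1] - s_obs[1] for i in kept]
    # Regression step: least-squares slopes of the parameter on both statistics.
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    c1, c2 = cov(x1, th), cov(x2, th)
    det = s11 * s22 - s12 * s12
    b1 = (c1 * s22 - c2 * s12) / det
    b2 = (c2 * s11 - c1 * s12) / det
    # Shift each accepted parameter to where it would sit at the observed statistics.
    adjusted = [t - b1 * u - b2 * v for t, u, v in zip(th, x1, x2)]
    return th, adjusted


true_r0 = 1.8  # used only to generate stand-in "observed" data
observed = summaries(*simulate(true_r0, random.Random(42)))
accepted, adjusted = regression_abc(observed, prior=(1.0, 3.0))
print(f"rejection only:   R0 ~ {statistics.fmean(accepted):.2f}")
print(f"after adjustment: R0 ~ {statistics.fmean(adjusted):.2f}")
```

Adding another data type only requires the simulator to produce it and `summaries` to return one more statistic, which is why an approach based on summary statistics lends itself to combining heterogeneous data. With more than two statistics, the closed-form 2x2 solve used here would need to be replaced by a general least-squares fit.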

By analyzing the COVID-19 epidemic in different contexts, including that of France, we will be able to validate this combination of heterogeneous data for the analysis of viral epidemic outbreaks. We will also obtain more precise estimates of the epidemiological parameters of the epidemic (notably the basic reproduction number R0) and of biological parameters such as the length of the infectious period or the heterogeneity between infections.

From a more applied point of view, we will develop a pipeline that ensures both the repeatability of the analyses and their transposability to different contexts. This will be implemented via a dedicated R package.

Project coordination

Samuel ALIZON (Maladies Infectieuses et Vecteurs : Ecologie, Génétique, Evolution et Contrôle)

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.


PCCI Pathogenèse et contrôle des infections chroniques
MIVEGEC Maladies Infectieuses et Vecteurs : Ecologie, Génétique, Evolution et Contrôle
MIVEGEC Maladies Infectieuses et Vecteurs : Ecologie, Génétique, Evolution et Contrôle
MIVEGEC Maladies Infectieuses et Vecteurs : Ecologie, Génétique, Evolution et Contrôle

ANR funding: 63,616 euros
Start and duration of the scientific project: June 2020, 18 months
