DS0706 -

Big dataset, Big simulations, Big bang, Big problems: Algorithms of Bayesian reconstruction constrained by physics, application to cosmological data analysis – BIG4

Development of a complete Bayesian analysis chain, guided by physics and robust to systematics

The BIG4 project aims to develop new algorithms for the statistical reconstruction of fields on grids, such as images or density fields, and to provide an analysis environment built on Web technologies. It will create a synergistic, high-resolution platform for analyzing Big Data. It will rely both on the physics of the phenomena, to reduce uncertainties, and on the scaling properties of likelihood functions, to speed up computations. The techniques used are drawn from Hamiltonian sampling of parameter spaces with millions of dimensions. The project aims in particular to revolutionize data analysis in astronomy, a field facing an increasing deluge of data, notably from the Euclid mission and the LSST project. It is nevertheless anchored in real data, since the techniques are applied immediately to current surveys such as SDSS3, SDSS4 and CosmicFlows-3.

This project will impact many scientific fields, notably medical imaging, seismology and climatology. All of these fields face the same problem of reconstructing and visualizing data whose relation to the underlying physical fields is non-linear.

Finally, we will develop and provide to the community new tools for online visualization of these fields, relying on cutting-edge WebGL technologies and modules developed by the community. These tools will make it possible to visualize probability density distributions that depend on millions of parameters without having to download the raw reconstruction data locally.

This project relies on the most recent advances in statistical, algorithmic and numerical techniques to build the ultimate inference machine for complex physical fields from noisy and sparse data. To that end, we start with methods based on Hamiltonian Markov chains fed by N-body simulations, neural networks with a deep learning architecture, and massive parallelization.
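As an illustration of the Hamiltonian sampling idea mentioned above, here is a minimal sketch in Python with NumPy. It is not the BIG4 or ARES implementation. It assumes a toy model chosen only for this example: a field on a small grid with a unit Gaussian prior, observed through a mask with Gaussian noise. All function names (`neg_log_post`, `grad_neg_log_post`, `hmc_step`, `run_demo`) and parameter values are hypothetical. A real analysis would replace the toy posterior with a physical forward model, for instance one driven by N-body dynamics.

```python
import numpy as np


def neg_log_post(s, data, mask, noise_var):
    """Negative log-posterior of the toy model (up to a constant)."""
    resid = (s - data)[mask]
    return 0.5 * np.sum(s**2) + 0.5 * np.sum(resid**2) / noise_var


def grad_neg_log_post(s, data, mask, noise_var):
    """Gradient of neg_log_post with respect to the field s."""
    grad = s.copy()                                   # unit Gaussian prior term
    grad[mask] += (s[mask] - data[mask]) / noise_var  # likelihood term, observed pixels only
    return grad


def hmc_step(s, rng, data, mask, noise_var, eps=0.1, n_leap=20):
    """One HMC transition: leapfrog trajectory followed by a Metropolis test."""
    p = rng.standard_normal(s.shape)                  # fresh Gaussian momenta
    h_old = neg_log_post(s, data, mask, noise_var) + 0.5 * np.sum(p**2)

    s_new = s.copy()
    p_new = p - 0.5 * eps * grad_neg_log_post(s_new, data, mask, noise_var)
    for i in range(n_leap):
        s_new = s_new + eps * p_new
        grad = grad_neg_log_post(s_new, data, mask, noise_var)
        # full momentum step inside the trajectory, half step at its end
        p_new = p_new - (eps if i < n_leap - 1 else 0.5 * eps) * grad

    h_new = neg_log_post(s_new, data, mask, noise_var) + 0.5 * np.sum(p_new**2)
    if np.log(rng.random()) < h_old - h_new:          # accept with prob min(1, exp(-dH))
        return s_new, True
    return s, False


def run_demo(seed=0, shape=(16, 16), noise_var=0.25, n_samples=1500, n_burn=300):
    """Sample the toy posterior; all settings here are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    truth = rng.standard_normal(shape)
    data = truth + np.sqrt(noise_var) * rng.standard_normal(shape)
    mask = rng.random(shape) < 0.7                    # roughly 70% of pixels observed

    s = np.zeros(shape)
    kept, n_accept = [], 0
    for it in range(n_samples):
        s, accepted = hmc_step(s, rng, data, mask, noise_var)
        n_accept += accepted
        if it >= n_burn:
            kept.append(s)                            # arrays are never mutated in place
    return np.array(kept), n_accept / n_samples, data, mask, noise_var


if __name__ == "__main__":
    samples, rate, *_ = run_demo()
    print("kept samples:", samples.shape[0], "acceptance rate:", round(rate, 3))
```

For this toy model the posterior is Gaussian, so the sample mean can be compared with the analytic value: `data / (1 + noise_var)` on observed pixels and zero elsewhere. The gradient-guided trajectories are what allow the chain to move efficiently in many dimensions, which is the property the project relies on at much larger scale.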

In parallel, we have begun to explore methods for representing these distributions, which evolve in parameter spaces of millions of dimensions.

We have produced new fundamental results in statistics (automatic physical inference; Charnock et al. 2018, PRD), the modeling of systematic effects in large cosmological surveys (Jasche & Lavaux, A&A, 2017), and the detailed fitting of N-body simulations to cosmological data (Jasche & Lavaux, 2018, submitted to A&A).

Additionally, the core of our inference software is now publicly available (https://bitbucket.org/bayesian_lss_team/ares/). A server for submitting interactive requests on the developed models has been set up (https://cosmicflows.iap.fr/).

Many developments will follow from these first results. We are now working to greatly improve the speed of predictions of non-linear models using machine learning on examples that require only limited, local information. In parallel, we are developing the possibility of letting a neural network with a very specific, physically motivated architecture float freely to fit observations. In addition to these techniques adapted from industry, we will address, in the next period, the hierarchical representation of initial conditions and the simplification of the dynamical model.

* Jasche & Lavaux, A&A, 2017, 606, A37
* Charnock, Lavaux, Wandelt, Phys. Rev. D, 2018, 97, 083004
* Hutschenreuter, S. et al., CQG, 2018 (accepted)
* Porqueres, Jasche, Enßlin, Lavaux, A&A, 2018, 612, A31
* Desmond, Ferreira, Lavaux,

The BIG4 project aims to develop new algorithms for the statistical reconstruction of fields on structured grids, such as images and density fields, and to provide an analysis environment based on Web technologies. The project will rely both on the physics of the phenomena, to reduce uncertainties, and on the scaling properties of the likelihood, to increase computational speed. By employing techniques from the family of Hamiltonian sampling methods, we are able to solve problems that include non-linear physical dynamics with millions of parameters. This project intends to revolutionize data analysis in astronomy, where the data flow keeps increasing, notably with the Euclid mission and the LSST project. Nevertheless, BIG4 will be rooted in existing and forthcoming data from surveys such as SDSS3, SDSS4 and CosmicFlows-3.

The project will also impact other scientific fields, such as medical imaging, seismology and climatology. All of these fields face problems of reconstructing and visualizing data that are related to the underlying physical fields in a non-linear way.

Finally, we will develop new tools for visualizing reconstructed density fields and provide them to the community. These tools will rely on the latest WebGL technology and the modules developed for it by the community. They will facilitate the visualization of probability distributions that depend on millions of parameters without the need to download the raw reconstruction data.

Project coordinator

Mr Guilhem Lavaux (INSTITUT D'ASTROPHYSIQUE DE PARIS)

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.

Partner

IAP INSTITUT D'ASTROPHYSIQUE DE PARIS

ANR funding: 316,278 euros
Beginning and duration of the scientific project: December 2016 - 48 Months
