JCJC - Jeunes chercheuses & jeunes chercheurs (Young Researchers programme)

Advanced methodologies for the modeling of interdependent systems - applications in experimental physics - METAMODEL

Submission summary

The objective of this research project is to develop the mathematical and algorithmic tools needed to study, model, and train systems of interdependent sub-systems. The project is motivated by data analysis problems in experimental physics. We propose to support these experiments by introducing state-of-the-art data analysis techniques to them and, at the same time, to develop general machine learning methodologies that are applicable in domains sharing similar features (in particular, text mining, bioinformatics, and robotics). The project will form an important part of the research of the newly created AppStat (Machine Learning and Applied Statistics) group at the Laboratoire de l'Accélérateur Linéaire (LAL), led by the project coordinator. We also foresee that the interdisciplinary nature of the project will foster fruitful collaborations between the statistics/machine learning community and the natural sciences communities of the University Paris Sud campus.

Within this broad domain, we have identified three research themes together with three applications in experimental particle physics. On the one hand, the applications will profit directly from the solutions to the proposed problems; on the other hand, the themes will contribute to the solution of important machine learning problems in general. The three themes are grouped around the computational and methodological questions of aggregating and training a system of interdependent black-box-type sub-systems.

Objectives (themes)

A) Modeling systems of systems. This theme will develop the mathematical framework in which a system of interdependent sub-systems can be described and analyzed. We propose two approaches. In the first, we amalgamate the sub-systems under a global Bayes net model and use standard probabilistic tools to train the system and to carry out inference.
The second approach is based on AdaBoost, one of the most influential supervised learning algorithms of the last decade.

B) Hyper-parameter optimization using Gaussian processes. In a system with a large number of black-box-type sub-systems, one of the most difficult problems will be the simultaneous optimization of a large number of hyper-parameters (parameters that determine the complexity of the individual sub-systems). The goal of this theme is to develop principled methods for this optimization task, based on recent work by one of the team members on optimizing objective functions that are expensive to evaluate.

C) Grid implementations. In this theme we propose to parallelize the methods developed in A) and B) and to implement them on the EGEE grid (a grid of several tens of thousands of CPUs, set up in part to satisfy the data analysis and storage needs of the Large Hadron Collider (LHC) project at CERN).

Applications

1) The Pierre Auger experiment. The objective of this experiment is to study the properties of ultra-high-energy cosmic ray particles by observing the atmospheric "particle showers" they generate. For this purpose, a large observatory of 1600 water tanks and four fluorescence detectors was built on the Pampas of Argentina. The methodologies developed in A) and B) will provide a framework to formalize the statistical models and to optimize the estimation procedure, and the tools developed in C) will make the procedure computationally feasible.

2) The ATLAS experiment. ATLAS is one of the experiments installed on the LHC. Its objective is to explore the fundamental nature of matter and the basic forces that shape our universe by observing the debris of head-on collisions of protons. We will use the modeling framework proposed in A) to combine models of the four different detector mechanisms with the theoretical models of the various events.
The enormous size of the data (petabytes per year) will make it unavoidable to use large computational resources and to adapt the statistical techniques to the parallel processing environment, along the lines of the work proposed in C).

3) The MEMPHYS experiment. The goal of the MEMPHYS project is to design and build a large water Cherenkov detector to study proton decay and neutrino physics. The first task in this project will be the development of intelligent triggers that separate noise from events. We will use the methodologies developed in A) to combine hardware and software trigger components with theoretical models describing the events to be detected. We will also use the technique proposed in B) in the simulation-based optimization of the physical parameters of the detector.
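Theme A's second approach builds on AdaBoost. As a purely illustrative sketch of how boosting aggregates simple black-box sub-systems (here, one-dimensional decision stumps stand in for the sub-systems; all names and toy data are ours, not the project's), discrete AdaBoost can be written as follows:

```python
import numpy as np

def train_stump(x, y, w):
    """Weak learner: pick the threshold and polarity minimizing weighted error."""
    best = (0.0, 1, np.inf)  # (threshold, polarity, weighted error)
    for thr in np.unique(x):
        for pol in (1, -1):
            pred = pol * np.sign(x - thr + 1e-12)
            err = w[pred != y].sum()
            if err < best[2]:
                best = (thr, pol, err)
    return best

def adaboost(x, y, n_rounds=5):
    """Discrete AdaBoost over decision stumps; labels y must be in {-1, +1}."""
    w = np.full(len(x), 1.0 / len(x))      # start with uniform example weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        thr, pol, err = train_stump(x, y, w)
        err = max(err, 1e-12)              # guard against division by zero
        alpha = 0.5 * np.log((1.0 - err) / err)
        pred = pol * np.sign(x - thr + 1e-12)
        w = w * np.exp(-alpha * y * pred)  # up-weight misclassified examples
        w = w / w.sum()
        stumps.append((thr, pol))
        alphas.append(alpha)
    return stumps, alphas

def predict(stumps, alphas, x):
    """Weighted vote of the trained stumps."""
    score = sum(a * p * np.sign(np.atleast_1d(x) - t + 1e-12)
                for (t, p), a in zip(stumps, alphas))
    return np.sign(score)

# Toy data: a single threshold separates the two classes.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])
stumps, alphas = adaboost(x, y)
```

In the project's setting each stump would be replaced by a full black-box sub-system; only the re-weighting loop and the weighted vote carry over.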
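Theme B refers to optimizing objective functions that are expensive to evaluate; in current terminology this is Gaussian-process-based (Bayesian) optimization. The sketch below is a minimal illustration assuming an RBF kernel and the expected-improvement acquisition; the toy objective and all names are our assumptions, not taken from the project:

```python
import numpy as np
from math import erf

# Standard normal CDF, vectorized over numpy arrays.
_Phi = np.vectorize(lambda z: 0.5 * (1.0 + erf(z / np.sqrt(2.0))))

def rbf(a, b, length=1.0):
    """Squared-exponential kernel between two 1-D input arrays."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-6):
    """Posterior mean and standard deviation of a GP with an RBF kernel."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_train, x_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = 1.0 - np.sum(v ** 2, axis=0)  # the diagonal of the RBF kernel is 1
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best_y):
    """EI for minimization: E[max(best_y - f(x), 0)] under the GP posterior."""
    z = (best_y - mu) / sigma
    phi = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)  # standard normal pdf
    return (best_y - mu) * _Phi(z) + sigma * phi

# Toy demo: the cheap parabola stands in for an expensive validation error.
f = lambda x: (x - 0.3) ** 2
x_obs = np.array([0.0, 0.5, 1.0])
y_obs = f(x_obs)
grid = np.linspace(0.0, 1.0, 101)
for _ in range(5):                      # budget of five "expensive" evaluations
    mu, sigma = gp_posterior(x_obs, y_obs, grid)
    x_next = grid[np.argmax(expected_improvement(mu, sigma, y_obs.min()))]
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, f(x_next))
```

The acquisition step is what makes the budget small: each new evaluation is placed where the posterior predicts the largest expected gain over the best value seen so far, rather than on a uniform grid.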

Project coordination

Balázs KEGL (Research organization)

The author of this summary is the project coordinator, who is responsible for the content of this summary. The ANR declines all responsibility for its contents.

Partner

ANR funding: 149,946 euros
Beginning and duration of the scientific project: - 36 months
