Bayesian hierarchical inversion for mass spectrometry. Application to discovery and validation of new protein biomarkers. – BHI-PRO
Scientific and economic goals
The general scientific and economic goals are to improve the efficiency of the discovery, validation and marketing of protein biomarkers. Mass spectrometry (MS) based studies significantly shorten the delay compared with standard immunoassay approaches and give direct access to multidimensional biomarker profiles. However, only a few clinical proteomic studies using MS technologies have successfully identified new robust markers, which raises questions about the reproducibility of MS-based proteomics. Two crucial parameters have not been taken into account in current studies: the ratio between technological and biological variability, and the concept of statistical power. For a fixed number of patients, the technological variability has to be decreased in order to increase the statistical power of the studies. We propose to control the technological variability with an innovative adaptive Bayesian statistical inversion algorithm that recovers the protein content and the clinical status of the sample.
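The link between technological variability and statistical power can be illustrated with a minimal sketch. It assumes a two-sided two-sample z-test, equal group sizes, and independent biological and technological noise whose variances add; the function name and all numerical values are hypothetical and are not taken from the project.

```python
from math import sqrt
from statistics import NormalDist


def two_group_power(delta, sigma_bio, sigma_tech, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test.

    delta       : true mean difference between the two clinical groups
    sigma_bio   : biological standard deviation (assumed known)
    sigma_tech  : technological standard deviation (assumed known)
    n_per_group : number of patients in each group
    """
    # Assumption: the two noise sources are independent, so variances add.
    sigma_total = sqrt(sigma_bio ** 2 + sigma_tech ** 2)
    standard_error = sigma_total * sqrt(2.0 / n_per_group)
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    shift = abs(delta) / standard_error
    # Probability that the test statistic falls in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Hypothetical values: same cohort size, decreasing technological noise.
    for sigma_tech in (2.0, 1.0, 0.5):
        power = two_group_power(delta=1.0, sigma_bio=1.0,
                                sigma_tech=sigma_tech, n_per_group=50)
        print(f"sigma_tech={sigma_tech}: power={power:.3f}")
```

With the number of patients held fixed, the computed power rises as `sigma_tech` falls, which is the effect the paragraph above relies on.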
Large efforts are dedicated worldwide to develop mass spectrometry based analytical chains for the discovery, the validation and the quantification of protein biomarkers in complex matrices like urine or blood. However, mastering the technological variability on these analytical chains is a critical point. Adequate information processing is mandatory for data analysis to take into account the complexity of the analysed mixture, to improve the measurement reliability and to make the technology easier to use.
A proteomic analytical chain is a cascade of molecular events which can be depicted by a graph structure, each node being associated with an analytical level within the chain. Each branch of the graph corresponds to a molecular decomposition. In terms of molecular quantities, this molecular graph defines a hierarchical mixture model. In this BHI-PRO project, we propose to introduce a relevant hierarchical modelling of the MALDI and SRM/MRM3 chains. The new Bayesian Hierarchical Inversion algorithms will rely on two advances: the first is related to "proteomics and inverse problems", the second, more challenging, to "inverse problems and stochastic sampling". The proposed strategy relies on Bayesian statistics and stochastic sampling algorithms.
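As an illustration of the graph view described above, the following sketch propagates a molecular quantity from a root node down to the leaves of a small tree. The node names, the yield coefficients and the purely linear propagation are assumptions made for the example; they do not describe the project's actual instrument model.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One analytical level in the chain (protein, peptide, fragment...)."""
    name: str
    # Each child is a (yield_coefficient, Node) pair; coefficients are hypothetical.
    children: list = field(default_factory=list)


def propagate(node, quantity, leaves=None):
    """Push a molecular quantity down the graph and collect leaf quantities."""
    if leaves is None:
        leaves = {}
    if not node.children:
        # Leaf: accumulate what reaches the measured level.
        leaves[node.name] = leaves.get(node.name, 0.0) + quantity
        return leaves
    for coefficient, child in node.children:
        # Each branch is a molecular decomposition with its own yield.
        propagate(child, coefficient * quantity, leaves)
    return leaves


def build_example_chain():
    """Hypothetical chain: one protein, two peptides, three fragments."""
    peptide_1 = Node("peptide_1", [(0.6, Node("fragment_1a")),
                                   (0.3, Node("fragment_1b"))])
    peptide_2 = Node("peptide_2", [(0.5, Node("fragment_2a"))])
    return Node("protein", [(0.9, peptide_1), (0.8, peptide_2)])


if __name__ == "__main__":
    for leaf, amount in propagate(build_example_chain(), 10.0).items():
        print(leaf, round(amount, 3))
```

Inversion then means recovering the root quantity from noisy measurements of the leaves, which is the inverse problem the paragraph above refers to.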
For biostatistics, one advantage of using proteomics to discover and use biomarkers is the ability to test many biomarkers simultaneously to improve both sensitivity and specificity. However, as the number of variables increases, so does the likelihood of finding results that appear statistically significant by chance. Within this project, we propose to evaluate the statistical power of discrimination tests in the developed Bayesian framework.
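The growth of chance findings with the number of tested variables can be made concrete with a short sketch. It assumes independent tests, each run at the same significance level, and uses the Bonferroni correction as one standard remedy; neither assumption is stated in the project description.

```python
def familywise_error(n_tests, alpha=0.05):
    """Probability of at least one false positive among n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance level that keeps the family-wise error below alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    for n_tests in (1, 10, 100):
        raw = familywise_error(n_tests)
        corrected = familywise_error(n_tests, bonferroni_threshold(n_tests))
        print(f"{n_tests} tests: uncorrected={raw:.3f}, Bonferroni={corrected:.3f}")
```

The correction controls false positives at the price of reduced power per test, which is why power has to be evaluated explicitly when many candidate biomarkers are screened.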
The main deliverables will be 2 versions of Bayesian Hierarchical Inversion software, the first dedicated to the MALDI platform in discovery mode, the second to the SRM/MRM3 platform in validation mode, and one biostatistical guideline report.
A first version of the Bayesian Hierarchical Inversion software in SRM (Selected Reaction Monitoring) mode has been developed and tested on experimental data. On synthetic samples with relative knowledge of the protein concentration, the coefficient of variation of the estimated concentration is lower than 5% for certain proteins. On blood samples from a cohort of 203 patients in a colorectal cancer study, the quantitative performance has been evaluated by comparison with an ELISA test. A correlation coefficient of 0.83 has been observed for the LFABP protein, demonstrating a good correlation. These first results were presented at the Research in Computational Molecular Biology Satellite Conference on Computational Proteomics (RECOMB CP) in San Diego (USA) on April 7th 2012.
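The two kinds of figure quoted above, a dispersion measure on repeated estimates and a correlation against a reference assay, can be computed as in the sketch below. The function names and the sample values are hypothetical and serve only to show the arithmetic.

```python
from math import sqrt
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (dimensionless)."""
    return stdev(values) / mean(values)


def pearson_r(x, y):
    """Pearson correlation coefficient between two paired series."""
    mean_x, mean_y = mean(x), mean(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


if __name__ == "__main__":
    # Hypothetical repeated estimates of one concentration (arbitrary units).
    replicates = [10.1, 9.8, 10.3, 9.9, 10.0]
    print("CV:", coefficient_of_variation(replicates))

    # Hypothetical paired MS and reference-assay measurements.
    ms_values = [1.0, 2.1, 2.9, 4.2, 5.1]
    reference_values = [1.2, 1.9, 3.1, 3.9, 5.3]
    print("r:", pearson_r(ms_values, reference_values))
```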
From an experimental viewpoint, for the MALDI acquisition chain, analyses are in progress to set the experimental plan. Two data sets will be acquired by the end of 2012. In parallel, some tests have been carried out to initiate the model validation, which has yielded an experimental data set.
For the SRM and MRM3 acquisition chain, a new experimental campaign has been started to evaluate the performance of the SRM/MRM3 software developed in BHI-PRO and to compare it with existing methods. The experimental plan has been constructed. The first part of these experiments began in June 2012. The second part, dedicated to MRM3, is planned for autumn 2012.
For the development of the MRM software, a new version in classification mode is currently being evaluated. It includes a learning step on a cohort with M classes, followed by the classification of a new sample into one of these classes. For the MRM3 mode, the inversion software should be finalized by the end of September 2012. The performance of the software will be evaluated on the new experimental campaign.
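The two-stage structure described above (learning on a labelled cohort, then assigning a new sample to one of M classes) can be sketched with a deliberately simple stand-in: a Gaussian classifier with independent features. This is not the project's hierarchical Bayesian method; the class labels, feature values and function names are hypothetical.

```python
from math import log, pi
from statistics import mean, pvariance


def learn_classes(cohort):
    """Learning step: estimate prior, mean and variance for each class.

    cohort maps a class label to a list of feature vectors (lists of floats).
    """
    n_total = sum(len(samples) for samples in cohort.values())
    model = {}
    for label, samples in cohort.items():
        columns = list(zip(*samples))
        model[label] = {
            "prior": len(samples) / n_total,
            "mean": [mean(column) for column in columns],
            # Floor the variance to avoid division by zero on constant features.
            "var": [max(pvariance(column), 1e-9) for column in columns],
        }
    return model


def classify(model, sample):
    """Classification step: return the label with the highest log-posterior."""
    def log_posterior(params):
        log_likelihood = sum(
            -0.5 * log(2.0 * pi * v) - (x - m) ** 2 / (2.0 * v)
            for x, m, v in zip(sample, params["mean"], params["var"])
        )
        return log(params["prior"]) + log_likelihood

    return max(model, key=lambda label: log_posterior(model[label]))


if __name__ == "__main__":
    # Hypothetical cohort with M = 2 classes and two measured features.
    cohort = {
        "control": [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1]],
        "case": [[3.0, 4.0], [3.2, 3.9], [2.9, 4.2]],
    }
    model = learn_classes(cohort)
    print(classify(model, [1.1, 2.0]))
```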
Regarding statistical studies, the development of statistical analysis software for both the MALDI and MRM modes will mainly take place in the second part of the project. Current studies are dedicated to the definition of the experimental plan for the MRM experiment and to the statistical analysis of the results, including in particular the use of regression methods with imperfect references in order to compare the performance of MRM measurements with that obtained by ELISA.
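One common way to regress against an imperfect reference is Deming regression, which allows measurement error on both axes. The text does not say which method the project uses, so the sketch below is only an example of the family; the error-variance ratio `delta` and the data are assumptions.

```python
from math import sqrt
from statistics import mean


def deming_fit(x, y, delta=1.0):
    """Return (slope, intercept) of a Deming regression of y on x.

    delta is the ratio of the error variance of y to the error variance of x.
    It must be supplied by the user; delta = 1 gives orthogonal regression.
    """
    n = len(x)
    mean_x, mean_y = mean(x), mean(y)
    s_xx = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    s_yy = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    if s_xy == 0:
        raise ValueError("x and y are uncorrelated; the slope is undefined")
    diff = s_yy - delta * s_xx
    slope = (diff + sqrt(diff ** 2 + 4.0 * delta * s_xy ** 2)) / (2.0 * s_xy)
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    # Hypothetical paired measurements: reference assay (x) and MS estimate (y).
    reference = [1.0, 2.0, 3.0, 4.0, 5.0]
    estimate = [1.1, 2.3, 2.9, 4.2, 4.8]
    slope, intercept = deming_fit(reference, estimate)
    print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

Unlike ordinary least squares, which attributes all error to the y axis, this fit does not systematically flatten the slope when the reference itself is noisy.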
1. Gerfault L., Szacherski P., Giovannelli J.-F., Charrier J.-P., Mahe P., Grangeat P. (2012), "A hierarchical SRM acquisition chain model for improved protein quantification in serum samples", Research in Computational Molecular Biology (RECOMB) Satellite Conference on Computational Proteomics 2012 (RECOMB CP), San Diego, USA, 6-8 April 2012.
hal.archives-ouvertes.fr/hal-00676587
2. Szacherski P., Giovannelli J.-F., Grangeat P. (2011), "Apprentissage supervisé robuste de caractéristiques de classes. Application en protéomique", XXIIIème Colloque GRETSI, Bordeaux, France, 5-8 September 2011.
hal.archives-ouvertes.fr/hal-00585531
3. Poster presented at the 5th Journées Collaboratives Lyonbiopôle, 7 October 2011: "BHI-PRO: Bayesian hierarchical inversion for mass spectrometry. Application to discovery and validation of new protein biomarkers".
Large efforts are dedicated worldwide to develop mass spectrometry based analytical chains for the discovery, the validation and the quantification of protein biomarkers in complex matrices like urine or blood. The challenge is to combine high sensitivity to detect very small amounts of proteins, and a large separation capacity to reject the high content of background proteins and separate the signature of the targeted ones. However, mastering the technological variability on these analytical chains is a critical point to get significant results with an acceptable cost, analytical time and number of samples. Adequate information processing is mandatory for data analysis to take into account the complexity of the analysed mixture, to improve the measurement reliability and to make the technology easier to use.
A proteomic analytical chain is a cascade of molecular events which can be depicted by a graph structure, each node being associated with an analytical level within the chain. Each branch of the graph corresponds to a molecular decomposition. In terms of molecular quantities, this molecular graph defines a hierarchical mixture model. In this BHI-PRO project, we propose to introduce a relevant hierarchical modelling of the MALDI and MRM3 chains. The new Bayesian Hierarchical Inversion algorithms will rely on two advances. The first is related to "proteomics and inverse problems". The challenge is to develop an instrument model that jointly accounts for the physical phenomena involved in the measurement process. This leads to a direct model that involves the relevant parameters and is organised in a hierarchical structure. The second challenging task is related to "inverse problems and stochastic sampling". It requires the development of a detection-estimation methodology for MALDI and an estimation methodology for MRM. The proposed strategy relies on Bayesian statistics, and the posterior distribution will be explored with Markov chain Monte Carlo (MCMC) sampling algorithms.
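The stochastic sampling step can be illustrated with a random-walk Metropolis-Hastings sampler on a toy direct model. The sketch assumes a single unknown concentration, linear gains towards each measured signal, Gaussian noise of known standard deviation and an exponential prior; all of these, and every name and number below, are assumptions for the example rather than the project's model.

```python
import math
import random


def log_posterior(concentration, signals, gains, sigma, prior_rate):
    """Unnormalised log-posterior of the concentration given the signals."""
    if concentration <= 0.0:
        return -math.inf  # Concentrations must be positive.
    # Direct model (assumed): signal_i = gain_i * concentration + Gaussian noise.
    residual = sum((y - g * concentration) ** 2 for y, g in zip(signals, gains))
    log_likelihood = -residual / (2.0 * sigma ** 2)
    log_prior = -prior_rate * concentration  # Exponential prior (assumed).
    return log_likelihood + log_prior


def metropolis_hastings(signals, gains, sigma=0.5, prior_rate=0.01,
                        n_iter=5000, step=0.3, seed=0):
    """Random-walk Metropolis-Hastings; returns the chain of sampled values."""
    rng = random.Random(seed)
    current = 1.0
    current_lp = log_posterior(current, signals, gains, sigma, prior_rate)
    samples = []
    for _ in range(n_iter):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_posterior(proposal, signals, gains, sigma, prior_rate)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


if __name__ == "__main__":
    # Hypothetical noise-free signals generated with a concentration of 4.0.
    gains = [1.0, 0.5, 2.0]
    signals = [4.0, 2.0, 8.0]
    chain = metropolis_hastings(signals, gains)
    kept = chain[1000:]  # Discard burn-in.
    print("posterior mean estimate:", sum(kept) / len(kept))
```

The chain yields samples from which a point estimate and an uncertainty on the concentration can both be read, which is the practical benefit of exploring the posterior rather than optimising it.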
This BHI-PRO project involves 3 signal processing research teams (CEA-LETI, CEA-LIST, IMS), 2 biostatistical research teams (LBS, CLIPP) and 2 proteomics platforms (CLIPP for MALDI and bioMérieux for MRM3). It is the first opportunity to combine in a single research project Bayesian inversion, biostatistics, and proteomics platforms in order to study the technological variability on 2 proteomics analytical chains: matrix assisted laser desorption ionization (MALDI) by CLIPP applied to a proteomic discovery model and Multiple Reaction Monitoring (MRM) mass spectrometry by bioMérieux on a colorectal cancer model including 8 candidate proteins for MRM validation mode.
The main deliverables will be 2 versions of Bayesian Hierarchical Inversion software, the first dedicated to the MALDI platform in discovery mode, the second to the MRM3 platform in validation mode, and one biostatistical guideline report.
The dissemination plan targets 4 relevant publications and participation in international conferences. Valorisation includes the free-access release of a Bayesian Hierarchical Inversion software package dedicated to MALDI MS acquisition, the transfer to bioMérieux of the Bayesian Hierarchical Inversion software package dedicated to MRM MS acquisition, and the publication of biostatistical guidelines for using optimized protocols and defining best-practice rules when operating the proposed Bayesian hierarchical inversion software for mass spectrometry data analysis.
Project coordination
Pierre GRANGEAT (CEA - CENTRE DE GRENOBLE)
The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.
Partnership
CEA CEA - CENTRE DE GRENOBLE
bMx BIOMERIEUX SA
IMS CNRS - DELEGATION AQUITAINE LIMOUSIN
LBS CNRS - DELEGATION REGIONALE RHONE-AUVERGNE
CLIPP CHU DIJON
ANR funding: 820,000 euros
Beginning and duration of the scientific project:
- 36 Months