MN - Modèles Numériques

Data mining for assessing and monitoring the hydrobiologic quality of running waters – Fresqueau

The objective of preserving or restoring the good status of water bodies, required by the European Water Framework Directive (2000), underlines the need for operational tools that help interpret the complex information about running waters and their functioning. To this end, the FRESQUEAU project aims to develop new methods for studying, comparing and exploiting all available parameters on the status of running waters.

Defining tools for assessing the hydrobiologic quality of running waters

The project will help address two specific issues: (1) deepening the understanding of how running waters function through the analysis of the taxa underlying biological indices; (2) connecting sources of pressure on the environment to the physicochemical and biological quality of running waters. To this end, an integrated database was built from public and research databases. The data cover the physicochemical and biological quality of running waters, hydrology, sampling and measurement stations, and context and forcing variables. All these data are highly heterogeneous and complex, both in their form and in their spatial and temporal structure.

To exploit these data we adopted a knowledge discovery process. We first worked on data structuring and preparation, then explored various data mining approaches and how to make them work together, with expert assessment at every stage. Moreover, the participation of two consulting firms in the project provides the means for field validation. The final platform will include a data warehouse, a typology of stations, and a set of analysis and data mining methods. The typology of stations will serve as a means to guide the analysis and interpretation of the measurements made at the stations, in relation to the hydrobiological functioning of running waters and the observed pressures, with the aim of evaluating their overall status. Five steps are planned to evaluate and combine these techniques according to the different data exploration strategies adopted, up to the exploration of the whole database.

The first step of the project made it possible to collect the datasets describing the water bodies and their environment for the Rhine-Meuse district on the one hand and the Rhone-Mediterranean and Corsica district on the other. Finer datasets (temporally, spatially and semantically) were also acquired for the Saone basin and the Alsace plain. The integration of the data into a dedicated database relied on the design of a data model. Information on data quality was also collected (domain knowledge and statistical or topological measurements). An integrated database was developed and shared among the partners. In addition, we built two data warehouses (OLAP) for exploring the physicochemical records on the one hand and the biological records on the other, along various thematic, temporal and spatial dimensions.

Then, a set of operational questions was established, each question being specified by a dataset from the database and by the data mining methods to be tested. Several methods were studied and implemented: pattern mining in temporal sequences, supervised learning on relational tables, relational concept analysis, and spatial statistics. Possible combinations of these methods are also being studied. The results are being interpreted with the hydrologists and hydrobiologists involved in the project.
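To give a concrete idea of what pattern mining in temporal sequences involves, the following minimal Python sketch computes the support of one sequential pattern over per-station sequences of observations. It is an illustration only and not the project's algorithms: the station identifiers, the event labels (such as nitrate_high or bio_index_low) and the function names are all hypothetical.

```python
# Hypothetical data: for each station, a time-ordered list of itemsets,
# each itemset being the set of discretised observations made at one date.
# Station names and event labels are illustrative, not FRESQUEAU data.
STATIONS = {
    "S1": [{"nitrate_high"}, {"bio_index_low", "nitrate_high"}, {"bio_index_low"}],
    "S2": [{"nitrate_low"}, {"bio_index_high"}],
    "S3": [{"nitrate_high", "phosphate_high"}, {"bio_index_high"}, {"bio_index_low"}],
    "S4": [{"bio_index_low"}, {"nitrate_high"}],
}


def contains(sequence, pattern):
    """Return True if `pattern` occurs in `sequence` in the same order.

    Each itemset of the pattern must be a subset of a distinct itemset of
    the sequence, and the matched itemsets must appear in increasing order.
    """
    pos = 0  # index of the next pattern itemset to match
    for itemset in sequence:
        if pos < len(pattern) and pattern[pos] <= itemset:
            pos += 1
    return pos == len(pattern)


def support(sequences, pattern):
    """Fraction of sequences (here, stations) that contain the pattern."""
    matches = sum(contains(seq, pattern) for seq in sequences.values())
    return matches / len(sequences)


# Pattern: a high nitrate reading followed later by a low biological index.
PATTERN = [{"nitrate_high"}, {"bio_index_low"}]
pattern_support = support(STATIONS, PATTERN)
```

A pattern miner would enumerate candidate patterns and keep those whose support exceeds a chosen threshold; the sketch only shows the support test for a single, hand-written pattern.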

The last step of the project will focus on developing an operational tool that includes the database, the mining methods, and interfaces for querying the data and visualising the analysis results. The tool will make it possible (1) to locate anomalies and defects in the data; (2) to support the comparison and interpretation of data on water bodies; (3) to test and apply methods for diagnosing the status of a water body and its dynamics.

Lalande N. Impacts multi-échelles de l'occupation du sol sur l'état écologique des cours d'eau: élaboration et test d'un cadre d'analyse et de modélisation. Thèse AgroParisTech, 2013.
Fabrègue M., A. Braud, S. Bringay, F. Le Ber, M. Teisseire. OrderSpan: Mining Closed Partially Ordered Patterns. The Twelfth International Symposium on Intelligent Data Analysis (IDA 2013), London, United Kingdom, pp. 186-197, 2013.
Dolques X., F. Le Ber, M. Huchard. AOC-posets: a scalable alternative to Concept Lattices for Relational Concept Analysis. CLA 2013: 10th International Conference on Concept Lattices and Their Applications, La Rochelle, France, pp. 129-140, 2013.
Dolques X., F. Le Ber, M. Huchard, C. Nebut. Analyse Relationnelle de Concepts pour l'exploration de données relationnelles. EGC'2013: 13e Conférence Francophone sur l'Extraction et la Gestion des Connaissances, Toulouse, France. Hermann-Éditions, pp. 121-132, Revue des Nouvelles Technologies de l'Information, 2013.
Wiederkehr J., M. Fabrègue, B. Fontan, C. Grac, F. Labat, F. Le Ber, M. Trémolières. Multi index assessment of streams and associated uncertainties: application to macrophytes. 8th Symposium for European Freshwater Sciences, Münster, Germany, 2013.
Lalande N., L. Berrahou, G. Molla, et al. Feedbacks on data collection, data modeling and data integration of large datasets: application to Rhine-Meuse and Rhone-Mediterranean districts (France). 8th Symposium for European Freshwater Sciences, Münster, Germany, 2013.
Fabrègue M., A. Braud, S. Bringay, F. Le Ber, M. Teisseire. Including spatial relations and scales within sequential pattern extraction. DS'2012: 15th International Conference on Discovery Science, Lyon, France. LNAI 7569, pp. 209-223, 2012.

The objective of preserving or restoring the good status of water bodies, required by the European Water Framework Directive (2000), underlines the need for operational tools that help interpret the complex information about running waters and their functioning, and that help assess the effectiveness of ongoing action programmes. To this end, the FRESQUEAU project aims to develop new methods for studying, comparing and exploiting all available parameters on the status of running waters, as well as the information describing the uses and the measures taken. The tools developed will be integrated into an open source platform that supports the interpretation of how running waters function. The originality of the proposed approach lies in linking structural data and functional data through a set of innovative methods, and thus in setting up a genuine process for assisting knowledge discovery. Different approaches to knowledge discovery will be tested and combined. To achieve this goal, the consortium brings together experts in knowledge structuring and knowledge discovery in databases (in the four research laboratories involved, LHyGeS, TETIS, LSIIT and LIRMM), and experts in hydroecology (LHyGeS, TETIS and the consulting firms that are partners of the project, AQUASCOP and AQUABIO).

More precisely, the project will help address two specific issues: (1) deepening the understanding of how running waters function through the analysis of the taxa underlying biological indices; (2) connecting sources of pressure on the environment to the physicochemical and biological quality of running waters. To this end, the project relies on physicochemical and biological data produced by the Water Agencies and the French National Agency for Water and Aquatic Environments (ONEMA), supplemented by the fine-scale measurements carried out by LHyGeS. Data describing the hydrographic network, land use and water-treatment plants, supplemented locally by surveys of agricultural activities and restoration actions and by detailed maps of the river corridors produced by TETIS, will also be available to the project. All these data are highly heterogeneous and complex, both in their form and in their spatial and temporal structure.

To exploit these data we will adopt a knowledge discovery process. We will first work on data structuring and preparation, then explore various data mining approaches and how to make them work together, with expert assessment at every stage. Moreover, the participation of two consulting firms in the project provides the means for field validation. The final platform will include a data warehouse, a typology of stations, and a set of analysis and data mining methods. The typology of stations will serve as a means to guide the analysis and interpretation of the measurements made at the stations, in relation to the hydrobiological functioning of running waters and the observed pressures, with the aim of evaluating their overall status. Five steps are planned to evaluate and combine these techniques according to the different data exploration strategies adopted, up to the exploration of the whole database.

The challenge is thus both applied and theoretical. The work consists of (1) developing a tool for evaluating the overall functioning of running waters on the basis of the various compartments of the ecosystem; (2) improving methods of knowledge discovery from large amounts of heterogeneous, temporal and spatial data. These methods will be generic, and will be tested and validated within this particularly interesting application framework. The consortium brings together the skills needed to tackle the scientific challenges underlying the project and thus to ensure its success.

Project coordination

Florence Le Ber (ECOLE NATIONALE DU GENIE DE L'EAU ET DE L'ENVIRONNEMENT DE STRASBOURG) – florence.leber@engees.unistra.fr

The author of this summary is the project coordinator, who is responsible for its content. The ANR accepts no responsibility for its contents.

Partners

AQUABIO AQUABIO
LSIIT UNIVERSITE DE STRASBOURG
TETIS CEMAGREF - CENTRE DE MONTPELLIER
UM2-LIRMM UNIVERSITE DE MONTPELLIER II [SCIENCES TECHNIQUES DU LANGUEDOC]
AQUASCOP AQUASCOP BIOLOGIE
LHYGES ECOLE NATIONALE DU GENIE DE L'EAU ET DE L'ENVIRONNEMENT DE STRASBOURG

ANR funding: 813,493 euros
Start and duration of the scientific project: September 2011, 39 months
