CE23 - Data, Knowledge, Big Data, Multimedia Content, Artificial Intelligence

Data to Knowledge in Agriculture and Biodiversity – D2KAB

Data to Knowledge in Agronomy and Biodiversity (D2KAB)

D2KAB implements processes to extract semantically rich, interoperable and open knowledge from agronomy/agriculture and biodiversity/ecology data, and to formalize it (data to knowledge). The project also studies scientific methods and tools to exploit and disseminate this knowledge in various agriculture and biodiversity scenarios.

Use of Semantic Web and linked data technologies to “transform” data on the major challenges of agronomy and biodiversity into reusable and actionable knowledge.

Agronomy and biodiversity research communities must address several major societal, economic and environmental challenges. However, data are now produced in such volume and at such a pace that our ability to transform them into actionable and reusable knowledge is called into question.

In D2KAB, we adopt an interdisciplinary approach combining data science and semantics to provide the means (ontologies, knowledge graphs) to produce and exploit FAIR data (Findable, Accessible, Interoperable, and Re-usable). To do so, we develop original methods and algorithms that address the specificities of our domains of interest, while also relying on existing tools and methods from the Semantic Web field.

D2KAB brings together a multidisciplinary (and international) consortium of three computer science laboratories (UM-LIRMM, CNRS-I3S, STANFORD-BMIR), four applied informatics labs in agronomy or agriculture (INRAE-URGI, INRAE-MaIAGE, INRAE-IATE, INRAE-TSCF), two labs in ecology and ecosystems (CNRS-CEFE, INRAE-URFM), INRAE’s scientific & technical information and open science department (INRAE-DipSO) and one association of agriculture stakeholders (ACTA). IRD is also a collaborator, as is the SME Elzeard. The consortium’s informatics expertise ranges from ontologies and metadata, the Semantic Web, linked data, ontology alignment, and knowledge reasoning and extraction to natural language processing and bioinformatics. Our application scenarios relate to food packaging, wheat phenotyping data integration, semantic exploitation of Plant Health Bulletins, ecosystem data management, and the analysis of plant trait/environment relationships.

The project is structured into three research and development work-packages in informatics and two work-packages of driving scenarios. WP1 focuses on ontologies/vocabularies and develops AgroPortal into an international reference platform for sharing and serving semantic resources in the agri-food domain. WP2 focuses on the critical issue of ontology alignment and the linking of semantic resources, driven by the project's use cases. WP3, starting from the heterogeneous data provided by the scenarios, develops the methods and deploys the means needed to build a distributed and federated knowledge graph for agronomy and biodiversity, and to exploit it through innovative modes of visualization, navigation and search.
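
A federated knowledge graph implies querying several SPARQL endpoints together. The sketch below illustrates one standard mechanism for this, the SERVICE keyword of SPARQL 1.1 Federated Query, and only composes the query text. It is a minimal sketch under stated assumptions, not necessarily the federation method developed in D2KAB: the function name `build_federated_query` and the endpoint URLs are hypothetical.

```python
from typing import Sequence

SKOS_PREFIX = "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>"


def build_federated_query(endpoints: Sequence[str], pattern: str, limit: int = 100) -> str:
    """Build a SPARQL 1.1 query that evaluates the same graph pattern on each
    remote endpoint (one SERVICE block per endpoint) and merges results with UNION.
    Hypothetical helper, for illustration only."""
    if not endpoints:
        raise ValueError("at least one endpoint is required")
    # Each branch reads: { SERVICE <endpoint> { pattern } }
    branches = [f"  {{ SERVICE <{url}> {{ {pattern} }} }}" for url in endpoints]
    body = "\n  UNION\n".join(branches)
    return f"{SKOS_PREFIX}\nSELECT ?s ?label WHERE {{\n{body}\n}}\nLIMIT {limit}"


if __name__ == "__main__":
    # Hypothetical endpoint URLs, not the project's real endpoints.
    endpoints = [
        "https://example.org/bsv/sparql",
        "https://example.org/weather/sparql",
    ]
    print(build_federated_query(endpoints, "?s skos:prefLabel ?label ."))
```

Executing the resulting query requires a SPARQL 1.1 engine that supports SERVICE. UNION simply merges the results from each endpoint; a join across endpoints would instead place the SERVICE blocks side by side in the same group.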

WP4 includes four driving scenarios in agronomy/agriculture. A first development, for example, concerns the design of an ontology-based decision support system to either formulate a bio-sourced, biodegradable composite packaging or select the most appropriate food packaging for a given use. Another example is an augmented semantic browser for Plant Health Bulletins, focusing on cereals, vines (in partnership with IFV) and market gardening (in partnership with Elzeard), capable of searching a set of bulletins while displaying additional sources of information (weather archives, etc.). We also contribute to the development of a unique scientific knowledge base for wheat phenotypes, used by the international wheat information system WheatIS. WP5 develops semantic resources for annotating data from ecosystem experiments on the one hand and from observations in functional biogeography on the other. A study combining data sources on community ecology, plant traits and environmental factors is underway to understand the effects of climate change on vegetation (especially olive trees) in the Mediterranean Basin.

* Development of new functionalities (SKOS support, instance management, etc.) and maintenance of AgroPortal, the ontology and semantic resource repository: agroportal.lirmm.fr
* Hosting and management of metadata for 145 semantic resources.
* Design and development of a FAIRness assessment method for semantic resources, called O’FAIRe: github.com/agroportal/fairness
* Generalization of our work on ontology repositories within the OntoPortal Alliance: ontoportal.org
* Development (and/or update) of a dozen semantic resources (ontologies, thesauri) related to our scenarios and available on AgroPortal: CROPUSAGE, PPDO, ANAEETHES, TAXREF, INRAETHES, E-PHY, PO2, C3PO, etc.
* Development of knowledge representation models and production of multiple knowledge graphs for the data of our scenarios: annotations of agricultural alert bulletins, weather data, observation data, annotations of scientific corpora, ecosystem data, manufacture of bio-based biodegradable packaging, etc.
* Development of an index of the project's knowledge graphs and of federated query methods over the project's distributed SPARQL endpoints.
* Development of methods for visualizing project knowledge graphs.
* PhD thesis on data linking for AgroLD data: agrold.southgreen.fr/agrold/
* PhD thesis on hybridizing symbolic/semantic and machine learning methods for knowledge graphs, in partnership with the SME Elzeard.
* Analysis of alignments between domain ontologies and production of richly documented alignments (in SSSOM format) between several semantic resources related to our scenarios.
* Development of integrated pipelines of knowledge extraction methods applied to textual data, producing sets of annotations, e.g., for a corpus of Plant Health Bulletins (crop, phenological stage, weather) or for a scientific corpus on soft wheat (varieties, genes, traits and phenotypes).
* Alignment of the semantic resources in our scenarios (crops and their uses in the Plant Health Bulletins, or BSV; wheat traits and phenotypes) for the integration of heterogeneous data.
* Extension of the @Web platform for managing SHACL constraints on data.
* Acquisition, curation and consolidation of a corpus of trait-environment relationship data for the Mediterranean basin.
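
The alignments above are published in SSSOM, a tab-separated format for mappings preceded by a commented YAML metadata block. As a minimal sketch of how such a file can be consumed, the Python below parses a tiny SSSOM-style table with the standard library and filters mappings by predicate. The columns used (`subject_id`, `predicate_id`, `object_id`, `mapping_justification`, plus labels) are standard SSSOM slots, but every identifier and label in the sample is a hypothetical placeholder, not an actual D2KAB mapping, and the helper names are illustrative only.

```python
import csv
import io

# Minimal SSSOM-style table: '#'-prefixed YAML metadata, then tab-separated rows.
# All CURIEs and labels are hypothetical placeholders, not actual D2KAB mappings.
SSSOM_TSV = (
    "#curie_map:\n"
    "#  ex1: http://example.org/onto1/\n"
    "#  ex2: http://example.org/onto2/\n"
    "subject_id\tsubject_label\tpredicate_id\tobject_id\tobject_label\tmapping_justification\n"
    "ex1:0001\twheat\tskos:exactMatch\tex2:A12\twheat\tsemapv:ManualMappingCuration\n"
    "ex1:0002\tvine\tskos:closeMatch\tex2:B07\tgrapevine\tsemapv:LexicalMatching\n"
)


def read_sssom(text: str) -> list[dict[str, str]]:
    """Drop the commented metadata block and parse the TSV body, one dict per mapping."""
    body = "\n".join(line for line in text.splitlines() if not line.startswith("#"))
    return list(csv.DictReader(io.StringIO(body), delimiter="\t"))


def pairs_with_predicate(rows: list[dict[str, str]], predicate: str) -> list[tuple[str, str]]:
    """Return (subject_id, object_id) pairs whose mapping predicate equals `predicate`."""
    return [(r["subject_id"], r["object_id"]) for r in rows if r["predicate_id"] == predicate]


if __name__ == "__main__":
    rows = read_sssom(SSSOM_TSV)
    print(pairs_with_predicate(rows, "skos:exactMatch"))  # [('ex1:0001', 'ex2:A12')]
```

Keeping the mapping predicate and justification as explicit columns is what lets a consumer separate, say, exact from close matches, or curated from lexical ones, before integrating heterogeneous data.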

(to come)

D2KAB has produced around thirty scientific publications, a dozen semantic resources, several datasets in RDF or other standard formats, and numerous new open-source software components. More details at www.d2kab.org

D2KAB is involved in, or associated with, multiple actions and dissemination, communication and training events, where we use our scenarios as demonstrators of the potential of semantic technologies in agronomy and biodiversity.

Agronomy and biodiversity must address several major societal, economic and environmental challenges. However, data are produced in such volume and at such a pace that our ability to transform them into knowledge is called into question, as is our ability to enable, for instance, translational agriculture, i.e., rapidly and efficiently transferring results from agronomy research to farms (“bench to farmside”). Semantic interoperability enables data integration and fosters new scientific discoveries by exploiting data acquired from different perspectives and domains.

D2KAB’s primary objective is to create a framework to turn agronomy and biodiversity data into semantically described, interoperable, actionable and open knowledge, and to investigate scientific methods and tools to exploit this knowledge for applications in science and agriculture. We will adopt an interdisciplinary semantic data science approach that will provide the means (ontologies and linked open data) to produce and exploit FAIR (Findable, Accessible, Interoperable, and Re-usable) data. To do so, we will develop original approaches and algorithms that address the specificities of our domains of interest, while also relying on existing tools and methods.

D2KAB involves a multidisciplinary (and international) research consortium of three computer science labs (UM-LIRMM, CNRS-I3S, STANFORD-BMIR), four bioinformatics, biology, agronomy and agriculture labs (INRA-URGI, INRA-MaIAGE, INRA-IATE, IRSTEA-TSCF), two ecology and ecosystems labs (CNRS-CEFE, INRA-URFM), one scientific & technical information unit (INRA-DIST), and one association of agriculture stakeholders (ACTA). The consortium’s expertise ranges from ontologies and metadata, semantic Web, linked data, ontology alignment, knowledge reasoning and extraction, natural language processing to bioinformatics, agronomy, food science, ecosystems, biodiversity and agriculture.

The project is structured into three research and development work-packages in informatics and two work-packages of driving scenarios. WP1 will focus on ontologies/vocabularies and turn the AgroPortal prototype into a reference platform that addresses the community's needs and reaches a high level of quality in both content and services offered, e.g., SKOS compliance, semantic search over linked data, text annotation, and interoperability with other repositories. WP2 will focus on the critical issue of ontology alignment and develop new functionalities and state-of-the-art algorithms in AgroPortal, using background-knowledge methods validated in agronomy and biodiversity. WP3 will design methods and tools to reconcile the scenarios' heterogeneous agronomy and biodiversity data sources and turn them into linked data within the D2KAB distributed knowledge graph. It will also investigate the exploitation of this graph through novel visualization, navigation and search methods.
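
SKOS compliance, listed among WP1's quality goals, can be checked against the integrity conditions of the W3C SKOS Reference. As one hedged illustration, the Python below tests a single condition, S14 (a resource has no more than one `skos:prefLabel` per language tag), over plain tuples standing in for parsed RDF triples. This is a minimal sketch, not AgroPortal's implementation; the helper name `s14_violations` and the concept IRIs are hypothetical, and a real checker would parse RDF with a dedicated library.

```python
from collections import defaultdict
from collections.abc import Iterable

SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"

# A triple with a language-tagged literal object:
# (subject IRI, predicate IRI, lexical form, language tag).
LabelTriple = tuple[str, str, str, str]


def s14_violations(triples: Iterable[LabelTriple]) -> list[tuple[str, str]]:
    """Return sorted (concept, language) pairs carrying more than one skos:prefLabel,
    i.e. violations of SKOS integrity condition S14. Hypothetical helper."""
    counts: dict[tuple[str, str], int] = defaultdict(int)
    for subject, predicate, _lexical, lang in triples:
        if predicate == SKOS_PREF_LABEL:
            # Language tags are case-insensitive, so normalize them before counting.
            counts[(subject, lang.lower())] += 1
    return sorted(key for key, n in counts.items() if n > 1)


if __name__ == "__main__":
    ex = "http://example.org/concept/"  # hypothetical concept namespace
    triples = [
        (ex + "wheat", SKOS_PREF_LABEL, "wheat", "en"),
        (ex + "wheat", SKOS_PREF_LABEL, "blé", "fr"),
        (ex + "vine", SKOS_PREF_LABEL, "vine", "en"),
        (ex + "vine", SKOS_PREF_LABEL, "grapevine", "en"),
    ]
    print(s14_violations(triples))  # [('http://example.org/concept/vine', 'en')]
```

Here "wheat" passes because its two preferred labels use different languages, while "vine" fails because it has two English preferred labels; the second one would typically become a `skos:altLabel`.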

WP4 includes four interdisciplinary research driving scenarios implementing translational agriculture, for instance an ontology-driven decision support system to select the most appropriate food packaging, or an augmented semantic reader for Plant Health Bulletins. We will provide a unique scientific knowledge base for wheat phenotypes and offer the first agricultural data resource empowered by linked open data. WP5 will develop semantic resources for annotating ecosystem experiment data and functional biogeography observations. A study of plant trait-environment relationships will be conducted to understand the impacts of climate change on the vegetation of the Mediterranean Basin.

Within a dedicated work-package, we will focus on maximizing the impact of our research. Each of the project's driving scenarios will produce concrete outcomes for the agronomy and biodiversity scientific communities and for agricultural stakeholders. We have planned multiple dissemination actions and events where we will use our driving scenarios as demonstrators of the potential of semantic technologies in agronomy and biodiversity.

Project coordination

Clement Jonquet (Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier)

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility as to its contents.

Partner

INRA-URFM Ecologie des Forêts Méditerranéennes
UM-LIRMM Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier
STANFORD-BMIR Stanford University / Stanford Center for Biomedical Informatics Research
CNRS-I3S Laboratoire informatique, signaux systèmes de Sophia Antipolis
IRSTEA-TSCF Technologies et Systèmes d'Information pour les Agrosystèmes
CNRS-CEFE Centre d'Ecologie Fonctionnelle et Evolutive
ACTA ASSOCIATION COORDINATION TECHNIQUE AGRICOLE
INRA-DIST Délégation Information Scientifique et Technique
INRA-MaIAGE Mathématiques et Informatique Appliquée du Génome à l'Environnement Unité de recherche
INRA-URGI Unité de Recherche Génomique-Info
INRA-IATE Ingénierie des Agropolymères et Technologies Emergentes

ANR funding: 951,176 euros
Beginning and duration of the scientific project: May 2019 - 48 Months
