JCJC SIMI 2 - Computer science and applications

Semantic Indexing of French Biomedical Data Resources – SIFR

Semantic Indexing of French Biomedical Data Resources (www.lirmm.fr/sifr)

The volume of data in biomedicine is constantly increasing. Despite the wide adoption of English in science, a significant share of these data is in French. Biomedical data integration and semantic interoperability are necessary to enable the new scientific discoveries that could be made by merging the different data available. A key way to address these issues is to use terminologies and ontologies as a common denominator to structure biomedical data and make them interoperable.

Scientific and technical challenges in building ontology-based services to leverage biomedical ontologies and terminologies in indexing, mining and retrieval of French biomedical data.

The community has turned toward ontologies to design semantic indexes of data that leverage medical knowledge for better information mining and retrieval. However, while various tools exist for English, there are considerably fewer ontologies available in French, and there is a severe lack of related tools and services to exploit them. This lack is out of proportion with the huge amount of biomedical data produced in French, especially in the clinical world (e.g., electronic health records).

The Semantic Indexing of French Biomedical Data Resources (SIFR) project proposes to investigate the scientific and technical challenges in building ontology-based services to leverage biomedical ontologies and terminologies in indexing, mining and retrieval of French biomedical data. Our main goal is to enable straightforward use of ontologies, freeing health researchers from knowledge engineering issues so that they can concentrate on the biological and medical challenges.

The SIFR project brings together several young researchers at LIRMM to achieve this objective. Dr. Clement Jonquet, assistant professor at University of Montpellier since 2010, coordinates the project and capitalizes on strong experience in the field acquired during a 3-year postdoc at Stanford. He is accompanied by 2 young researchers (HDR): Dr. Sandra Bringay and Dr. Mathieu Roche, both experts in biomedical data/text mining. In addition, highly qualified and experienced partners are associated with the project: (i) Stanford BMIR, a worldwide leader providing (English-)ontology-based services to assist health professionals and researchers in the use of ontologies to design biomedical knowledge-based systems; (ii) the TETIS group, a joint applied research unit (AgroParisTech, Irstea, Cirad) specialized in geographic information, environment and agriculture; (iii) the Computational Biology Institute (IBC) of Montpellier.

Usually, the content of biomedical resources is indexed to enable querying with keywords. However, there are obvious limits to keyword-based indexing: use of synonyms, polysemy, lack of domain knowledge. One way of using ontologies is by means of creating semantic annotations. When doing ontology-based indexing, we use these annotations to “bring together” the data elements from biomedical resources.
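The contrast with keyword-based indexing can be illustrated with a minimal sketch. It is not the SIFR implementation: the terminology, concept identifiers (`C001`, `C002`), documents and function names below are all hypothetical, and matching is a naive exact lookup of labels and synonyms. It only shows how annotations with concept identifiers bring together documents that use different wordings for the same concept.

```python
import re
from collections import defaultdict

# Hypothetical toy terminology: concept ID -> French labels and synonyms.
# IDs and labels are invented for illustration only.
TERMINOLOGY = {
    "C001": ["infarctus du myocarde", "crise cardiaque"],
    "C002": ["hypertension artérielle", "hta"],
}


def normalize(text):
    """Lowercase the text and keep only word tokens, separated by single spaces."""
    return " ".join(re.findall(r"\w+", text.lower()))


def annotate(text, terminology=TERMINOLOGY):
    """Return the set of concept IDs whose label or synonym occurs in the text."""
    padded = f" {normalize(text)} "  # padding ensures whole-token matches
    found = set()
    for concept_id, labels in terminology.items():
        if any(f" {normalize(label)} " in padded for label in labels):
            found.add(concept_id)
    return found


def build_index(documents, terminology=TERMINOLOGY):
    """Map each concept ID to the sorted list of document IDs that mention it."""
    index = defaultdict(list)
    for doc_id in sorted(documents):
        for concept_id in annotate(documents[doc_id], terminology):
            index[concept_id].append(doc_id)
    return dict(index)


# Hypothetical documents: doc1 and doc2 use different wordings for concept C001.
DOCUMENTS = {
    "doc1": "Patient admis pour une crise cardiaque.",
    "doc2": "Antécédent d'infarctus du myocarde et HTA.",
    "doc3": "Aucun antécédent notable.",
}

INDEX = build_index(DOCUMENTS)
```

A keyword search for "infarctus" would only retrieve doc2, whereas looking up concept `C001` in `INDEX` retrieves both doc1 and doc2. A real annotator would also need to handle inflection, polysemy and disambiguation, which this sketch ignores.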

Up to now, the prevalent paradigm in the use of ontologies has been that of manual annotation and curation. However, researchers have called for automated annotation methods and for leveraging natural language processing tools in the curation process. While this issue is currently being addressed for English, French is not in the same situation: there is little readily available (i.e., “off-the-shelf”) technology that allows the use of ontologies uniformly in various annotation and curation pipelines with minimal effort.

Within SIFR, we build an ontology-based indexing workflow (i.e., French Annotator) similar to what exists for English resources but dedicated to and specialized for French. This service is available within a portal of ~10 French biomedical ontologies/terminologies, which reuses the BioPortal technology developed at Stanford University. Ontologies have been provided by the CISMeF group from Rouen University Hospital, taken from the UMLS, or directly uploaded by users. The SIFR BioPortal was released in June 2015: bioportal.lirmm.fr

Within the project, we work on several research questions in semantic indexing, text mining, terminology extraction, ontology enrichment, disambiguation, multilingualism in ontologies and semantic annotation, in order to offer the community services and applications capable of leveraging biomedical ontologies in their data workflows.

• We completed an exhaustive comparison of CISMeF HMTP and NCBO BioPortal, including a comparison of their annotation workflows, and made CISMeF terminologies exportable.
• We develop a French biomedical ontology portal including the SIFR/French Annotator (http://bioportal.lirmm.fr/annotator), a service that, for a given piece of text, returns biomedical ontology concepts either directly mentioned in the text or obtained by semantic expansion.
• We developed the BioTex methodology and tool (http://tubo.lirmm.fr/biotex) for automatic extraction of biomedical terms from plain text using existing extraction methods (e.g., C-Value) as well as keyword-based indexing methods (e.g., Okapi, Tf-Idf) usually employed in information retrieval.
• We developed a proxy service for the NCBO Annotator (http://bioportal.lirmm.fr/ncbo_annotatorplus) that gives access to new features that have been investigated and implemented within SIFR.
• We work on the reconciliation and creation of multilingual mappings between French and English biomedical ontologies/terminologies.
• We work on automatic detection of emotion in public health forums using text mining techniques. We are also building a patient vocabulary out of public patient-written resources.
• We work on semantic indexing and knowledge representation within the Viewpoints project with the goal of capturing formal data and informal contributions into an evolutionary knowledge graph.
• We kicked off the AgroPortal project and platform (http://agroportal.lirmm.fr), whose goal is to offer a reference ontology repository for the agronomic/plant domain.
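As an illustration of the keyword-based weighting mentioned in the BioTex item above, here is a minimal Tf-Idf sketch. It is not BioTex itself: the corpus, function names and the exact weighting variant (term frequency normalized by document length, unsmoothed logarithmic inverse document frequency) are assumptions chosen for simplicity, and only single-word candidates are scored.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def tf_idf_scores(documents):
    """Return one dict per document mapping each term to tf * idf.

    tf  = occurrences of the term / number of tokens in the document
    idf = log(number of documents / number of documents containing the term)
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def top_terms(documents, doc_index, k=3):
    """Return the k best-scoring terms of one document (ties broken alphabetically)."""
    scores = tf_idf_scores(documents)[doc_index]
    return sorted(scores, key=lambda term: (-scores[term], term))[:k]


# Hypothetical three-document corpus, for illustration only.
CORPUS = [
    "hypertension artérielle sévère chez le patient",
    "diabète chez le patient",
    "le patient est stable",
]

BEST = top_terms(CORPUS, 0, k=3)
```

Words that occur in every document ("le", "patient") receive a score of zero, while words specific to the first document rank highest. A term extraction tool would typically combine such weights with multi-word candidate patterns, which this sketch leaves out.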

We plan to capitalize upon the work already accomplished in the last 16 years in France. In addition, SIFR enables the emergence of new research domains and applications at LIRMM and materializes an important international collaboration with Stanford BMIR. SIFR will offer the French biomedical community (e.g., clinicians, health professionals, researchers) highly valuable ontology-based indexing services that will enhance their data production and consumption workflows. The results of the project are not limited to French (they also cover English and Spanish), and we are transferring our results to the agronomic domain in the context of the new AgroPortal project (http://agroportal.lirmm.fr). The project will put France in a key position to lead future European projects related to multilingual data issues in biomedicine and other domains.

• General Web page: www.lirmm.fr/sifr
• Publications: bit.ly/194ImnR
• Code repository: github.com/sifrproject

The volume of data in biomedicine is constantly increasing. Despite the wide adoption of English in science, a significant share of these data is in French. Usually, the content of the resources is indexed to enable querying with keywords. However, there are obvious limits to keyword-based indexing: use of synonyms, polysemy, lack of domain knowledge. Biomedical data integration and semantic interoperability are necessary to enable the new scientific discoveries that could be made by merging the different data available (i.e., translational research). A key aspect in addressing semantic interoperability for life sciences is the use of terminologies and ontologies as a common denominator to structure biomedical data and make them interoperable. In particular, the community has turned toward ontologies to design semantic indexes of data that leverage medical knowledge for better information mining and retrieval. However, while various tools exist for English, there are considerably fewer ontologies available in French, and there is a severe lack of related tools and services to exploit them. This lack is out of proportion with the huge amount of biomedical data produced in French, especially in the clinical world (e.g., electronic health records).

The Semantic Indexing of French Biomedical Data Resources (SIFR) project proposes to investigate the scientific and technical challenges in building ontology-based services to leverage biomedical ontologies and terminologies in indexing, mining and retrieval of French biomedical data. We will build an ontology-based indexing workflow (i.e., French Annotator) similar to what exists for English resources but dedicated to and specialized for French. This will prepare the creation, in a future research project, of an index allowing semantic and multilingual search and mining of biomedical data resources. Within SIFR, we will follow the translational bioinformatics and semantic Web visions to discover new knowledge by recombining existing knowledge. Our main goal is to enable straightforward use of ontologies, freeing health researchers from knowledge engineering issues so that they can concentrate on the biological and medical challenges.

The SIFR project brings together several young researchers at LIRMM to achieve this objective. Dr. Clement Jonquet, 31, assistant professor at University of Montpellier since 2010, will coordinate the project and capitalize on strong experience in the field acquired during a 3-year postdoc at Stanford. He will be accompanied by 3 young assistant professors: Dr. Francois Scharffe (semantic Web), Dr. Sandra Bringay (data mining) and Dr. Mathieu Roche (NLP). In addition, highly qualified and experienced partners will be associated with the project: (i) Stanford BMIR, a worldwide leader providing (English-)ontology-based services to assist health professionals and researchers in the use of ontologies to design biomedical knowledge-based systems; (ii) the CISMeF group, the national leader in providing French health terminology-based services. Furthermore, other academic and industrial partners have also been identified (e.g., Ontologos Corp, CNRS-INIST) and will collaborate to illustrate concrete valorization of the project outcomes in terms of scientific and economic impact.

We plan to capitalize upon the work already accomplished in the last 16 years in France, especially by the CISMeF group. In addition, SIFR will enable the emergence of new research domains and applications at LIRMM and will set up a strong collaboration with a leading international lab, Stanford BMIR. SIFR will offer the French biomedical community (e.g., clinicians, health professionals, researchers) highly valuable ontology-based indexing services that will enhance their data production and consumption workflows. The project will put France in a key position to lead future European projects related to multilingual data issues in biomedicine and other domains.

Project coordination

Clement Jonquet (Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier, Université Montpellier 2) – jonquet@lirmm.fr

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.

Partner

UM2-LIRMM Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier, Université Montpellier 2

ANR funding: 276,640 euros
Beginning and duration of the scientific project: February 2013 - 48 Months
