DS04 - Vie, santé et bien-être 2017

Communication, Literacy, Education, Accessibility, Readability – CLEAR

CLEAR - Making medical information easier to understand

Patients often have difficulty understanding the medical information that concerns them, such as diagnoses and treatments. The CLEAR project arises from this context, which carries major scientific and societal needs. Indeed, there was almost no research on the simplification of medical texts in French, and our project aimed to fill this gap. The main emphasis was placed on the creation of linguistic resources dedicated to simplification, because such resources play an important role in this

Which issues are relevant for the simplification of medical texts?

The project addresses several issues: (1) conducting research on patient needs, (2) processing large amounts of heterogeneous and unstructured data, (3) adapting automatic methods to the medical domain, (4) creating resources for explaining medical terms in French. The results of the project can be used by medical professionals, by institutions and associations, and by patients. In particular, patients gain access to information and knowledge about disorders and their treatments. This supports better management of the healthcare process and fuller participation in social life despite the disease.

The CLEAR project uses methods from Natural Language Processing (NLP) and from AI, with dedicated methods for each task.

For instance, one of the objectives is to build a dictionary with explanations of medical terms, such as {myocardium, heart muscle}. Several methods are proposed for this task, all related to information extraction. Most of them are rule-based, which lets them work on non-annotated corpora, because the rules describe the language structures of interest. For example, we exploit definitions (Myocardium is a heart muscle, whose function is to...), reformulations (Myocardium, in other words heart muscle, plays an important role in...), the morphological structure of terms (myo (muscle) + cardium (heart) = heart muscle), etc.
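The three kinds of cues mentioned above (definitions, reformulations, morphological structure) can be illustrated with a minimal sketch. The regular expressions, the `COMPONENTS` mini-lexicon, and the function names are assumptions made for illustration on English examples; they are not the project's actual rules, which target French and are not detailed in this summary.

```python
import re

# Hypothetical patterns for illustration only; the project's real rules are not given here.
TERM = r"(?P<term>[A-Z][a-z]+)"
PATTERNS = [
    # Definition cue: "Myocardium is a heart muscle, ..."
    re.compile(TERM + r" is (?:a|an|the) (?P<expl>[a-z ]+?)(?=[,.])"),
    # Reformulation cue: "Myocardium, in other words heart muscle, ..."
    re.compile(TERM + r", in other words (?P<expl>[a-z ]+?)(?=[,.])"),
]

# Hypothetical mini-lexicon of morphological components and their glosses.
COMPONENTS = {"myo": "muscle", "cardium": "heart"}


def extract_pairs(text):
    """Return sorted, de-duplicated (term, explanation) pairs found by the patterns."""
    pairs = set()
    for pattern in PATTERNS:
        for match in pattern.finditer(text):
            pairs.add((match.group("term").lower(), match.group("expl").strip()))
    return sorted(pairs)


def gloss_morphology(term):
    """Split a term into known components and gloss them, last component first."""
    glosses = []
    rest = term.lower()
    while rest:
        for component, gloss in COMPONENTS.items():
            if rest.startswith(component):
                glosses.append(gloss)
                rest = rest[len(component):]
                break
        else:
            return None  # an unknown component: no gloss can be proposed
    return " ".join(reversed(glosses))


text = ("Myocardium is a heart muscle, whose function is to pump blood. "
        "Myocardium, in other words heart muscle, plays an important role.")
print(extract_pairs(text))
print(gloss_morphology("myocardium"))
```

Because the patterns only describe surface structures, this kind of extraction needs no annotated corpus, which is the property the paragraph above highlights.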

Supervised learning methods from AI are also exploited. These methods rely on annotated reference data, usually produced by manual annotation, to build specific models and recognize the targeted information. For instance, we use supervised learning to detect sentences that are semantically close and parallel. The sentences in each pair differ in their technicality and readability {difficult sentence, simplified sentence}. These sentence pairs are then used to simplify documents: difficult technical sentences are transformed or rewritten into simplified sentences.
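As a rough illustration of learning from annotated sentence pairs, the sketch below trains a deliberately tiny stand-in model: a one-feature decision stump that learns a threshold on word overlap (Jaccard similarity). The summary does not name the classifier or features actually used, so the feature, the toy training pairs, and all function names are assumptions.

```python
import re


def tokens(sentence):
    """Lowercased set of word tokens."""
    return set(re.findall(r"\w+", sentence.lower()))


def jaccard(a, b):
    """Word-overlap similarity between two sentences, in [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def train_threshold(examples):
    """Pick the similarity threshold that maximizes accuracy on labeled pairs.

    examples: list of (sentence_a, sentence_b, label), label 1 = parallel pair.
    """
    scored = [(jaccard(a, b), label) for a, b, label in examples]
    values = sorted({score for score, _ in scored})
    # Candidate thresholds: midpoints between consecutive distinct scores.
    candidates = [(lo + hi) / 2 for lo, hi in zip(values, values[1:])] or values

    def accuracy(threshold):
        return sum((s >= threshold) == bool(y) for s, y in scored) / len(scored)

    return max(candidates, key=accuracy)


def is_parallel(a, b, threshold):
    return jaccard(a, b) >= threshold


# Toy annotated data (invented for this sketch).
training = [
    ("The myocardium is affected by ischemia.",
     "The heart muscle is affected by a lack of blood.", 1),
    ("Hypertension increases cardiovascular risk.",
     "High blood pressure increases the risk of heart disease.", 1),
    ("The myocardium is affected by ischemia.",
     "Vaccines protect children against measles.", 0),
    ("Hypertension increases cardiovascular risk.",
     "The patient is seen by a nurse.", 0),
]
threshold = train_threshold(training)
print(is_parallel("Myocardium ischemia is serious.",
                  "A lack of blood in the heart muscle is serious.", threshold))
```

A realistic model would combine many more features (or learned representations) and far more annotated pairs; the point here is only the train-then-predict pattern over {difficult sentence, simplified sentence} candidates.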

The evaluation performed with real users follows sociological methods.

 

The CLEAR project provides several results:

- a comparable corpus, with pairs of documents differentiated by their technicality and readability. The documents belong to three types: (1) encyclopedia articles from Wikipedia and Vikidia (2*3,815 documents, 14M occurrences), (2) abstracts from scientific literature from the Cochrane collaboration (2*575 documents, 8M occurrences), (3) drug leaflets written for patients and the corresponding information for medical staff (2*11,800 documents, 278M occurrences).

- a parallel corpus with almost 11,000 aligned sentence pairs drawn from medical texts. A subset of these sentences is aligned manually; the rest is aligned automatically through supervised learning. The sentences within each pair differ in their readability.

- the Wikilarge-FR corpus, with almost 300,000 aligned sentence pairs from Wikipedia. This corpus was created in English by other researchers and translated into French as part of the CLEAR project.

- a lexicon, with almost 8,000 term pairs in this format {myocardium, heart muscle}: medical terms are associated with their explanations.

- a typology of language transformations occurring during the simplification of medical texts. The most frequent transformations include, for instance, substitution by synonyms and the use of more general terms.

- 16 texts, general and medical, manually simplified. These texts belong to three types: encyclopedia articles on common topics, encyclopedia articles on medical topics, clinical cases.

- a corpus of 16 texts annotated with eye-tracking indicators. The eye-tracking experiments were conducted with almost 90 participants. This corpus provides several indicators, such as fixations (stops on words), saccades (movements between the stops) and regressions (backward movements).

- datasets created for the DEFT competition in 2019 and 2020 for the tasks on semantic similarity.
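The eye-tracking indicators named in the list above (fixations, saccades, regressions) can be derived from a chronological fixation trace. The sketch below assumes a hypothetical trace format, a list of (word index, duration in milliseconds) tuples; the actual format of the project's eye-tracking corpus is not described in this summary.

```python
def reading_indicators(fixations):
    """Compute simple reading indicators from a chronological fixation trace.

    fixations: list of (word_index, duration_ms) tuples (hypothetical format).
    """
    # A saccade is the movement between two consecutive fixations.
    saccades = list(zip(fixations, fixations[1:]))
    # A regression is a saccade that moves backward in the text.
    regressions = [(a, b) for a, b in saccades if b[0] < a[0]]
    total_ms = sum(duration for _, duration in fixations)
    return {
        "fixations": len(fixations),
        "saccades": len(saccades),
        "regressions": len(regressions),
        "mean_fixation_ms": total_ms / len(fixations) if fixations else 0.0,
    }


# Invented trace: the reader jumps to word 3, then goes back to word 2.
trace = [(0, 200), (1, 180), (3, 250), (2, 300), (4, 210)]
indicators = reading_indicators(trace)
print(indicators)
```

More regressions and longer fixations on a passage are commonly read as signs of reading difficulty, which is why such indicators are useful when comparing original and simplified texts.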

 

The project also proposed two approaches to simplification: one rule-based, the other based on supervised learning.

 

The project opens several perspectives, mainly related to the creation of resources and tools dedicated to simplification.

We consider the creation of resources (lexica, corpora, annotations) to be a very important task. On the one hand, such resources make it possible to better describe the needs and specificities of simplification and its evaluation. On the other hand, they make it possible to develop simplification tools adapted to a given population.

 

The CLEAR project proposes innovative methods for creating linguistic resources and software dedicated to the simplification of medical texts written in French. The software is expected to act as a mediator in the communication between patients and medical professionals. The project addresses several challenges, such as research on patient needs, the processing of large corpora of heterogeneous and unstructured data, the adaptation of automatic methods to the medical field, and the creation of a knowledge base suited to explaining medical terms in French. The project will produce resources that medical professionals can use to improve their interactions with patients. Patients, for their part, would gain a tool giving them access to knowledge about pathologies and their treatment, enabling them to manage their pathologies better and to participate more fully in social activities despite their disease.

Project coordination

Natalia GRABAR (Maison Européenne des Sciences de l'Homme et de la Société Lille Nord-de-France/STL-Savoirs, Textes, Langage)

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.

Partnership

MESHS - STL UMR8163 Maison Européenne des Sciences de l'Homme et de la Société Lille Nord-de-France/STL-Savoirs, Textes, Langage
LISN Laboratoire Interdisciplinaire des Sciences du Numérique
LEPS EA 3412 LABORATOIRE EDUCATIONS ET PRATIQUES EN SANTÉ
AFH ASSOCIATION DES HEMOPHILES
SYNAPSE SYNAPSE DEVELOPPEMENT

ANR funding: 610,853 euros
Beginning and duration of the scientific project: - 36 months
