DS0707 - Human-machine interaction, connected objects, digital content, big data and knowledge

Generating and Answering Ontological Queries over Semi-structured Medical Data – GoAsQ

Submission summary

More and more information on individuals (e.g., persons, events, biological objects) is available electronically in structured or semi-structured form. However, manually selecting the individuals in such data that satisfy certain constraints is a complex, error-prone, time-consuming and personnel-intensive effort. Tools therefore need to be developed that can answer questions over the available data automatically or semi-automatically. Simple questions can be expressed and answered directly with natural-language keywords. Complex questions that refer to type and relational information, however, increase the precision of the retrieved results and thus reduce the effort needed for subsequent manual verification. One example of this situation is using electronic patient records to find patients who satisfy non-trivial combinations of properties, such as the eligibility criteria for clinical trials. Another example, which will also be considered as a use case in this project, is a student asking the examination office questions about study and examination regulations. In both cases, the original question is formulated in natural language.

In the GoAsQ project, we will investigate, compare, and finally combine two different approaches for answering natural-language questions over textual, semi-structured, and structured data. The first approach, text-based question answering, answers natural-language questions directly using natural language processing and information extraction techniques. The second translates natural-language questions into formal, database-like queries and then answers these formal queries with respect to a domain-dependent ontology using database techniques. The automatic translation is necessary because it would be very hard for the people asking the questions (e.g., medical doctors, students) to formulate them as formal queries themselves. The ontology makes it possible to overcome the potential semantic mismatch between the person producing the source data (e.g., the GPs writing the clinical notes) and the person formulating the question (e.g., the researcher formulating the trial criteria). GoAsQ can thus leverage recent advances made by the ontology community in accessing data through ontologies, known as ontology-based query answering (OBQA). More precisely, in Task 1 we will investigate the two use cases mentioned above (eligibility criteria; study regulations). In Task 2, we will introduce and analyze the extensions to existing formal query languages that these use cases require. Task 3 will develop techniques for extracting formal queries from textual queries, and Task 4 will evaluate the resulting approach, compare it with text-based question answering, and develop a hybrid approach that combines the advantages of both.
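To illustrate how an ontology can bridge the vocabulary used in the data and the vocabulary used in a question, here is a minimal Python sketch of query rewriting under strong simplifying assumptions. The ontology contains only atomic concept inclusions ("every X is a Y"), the data is a list of (patient, diagnosis) facts, and the query asks for all instances of a single concept. The concept names, patient identifiers, and the functions `rewrite` and `answer` are hypothetical and are not taken from the project; real OBQA systems support much richer ontology and query languages.

```python
from collections import defaultdict

# Hypothetical toy ontology: pairs (sub, sup) meaning "every sub is a sup".
TBOX = [
    ("Type2Diabetes", "Diabetes"),
    ("Type1Diabetes", "Diabetes"),
    ("Diabetes", "MetabolicDisorder"),
]

# Hypothetical patient data, e.g. diagnoses extracted from clinical notes.
ABOX = [
    ("p1", "Type2Diabetes"),
    ("p2", "Hypertension"),
    ("p3", "Type1Diabetes"),
]


def rewrite(query_concept, tbox):
    """Rewrite an atomic query into the set of all concepts subsumed by it.

    Any individual recorded under one of these concepts is a certain
    answer to the original query, given the ontology.
    """
    children = defaultdict(set)
    for sub, sup in tbox:
        children[sup].add(sub)
    result, stack = {query_concept}, [query_concept]
    while stack:  # depth-first traversal of the inverse subsumption graph
        for sub in children[stack.pop()]:
            if sub not in result:
                result.add(sub)
                stack.append(sub)
    return result


def answer(query_concept, tbox, abox):
    """Evaluate the rewritten query as a plain lookup over the data."""
    concepts = rewrite(query_concept, tbox)
    return sorted({patient for patient, concept in abox if concept in concepts})


if __name__ == "__main__":
    print(answer("Diabetes", TBOX, ABOX))
```

For example, `answer("Diabetes", TBOX, ABOX)` returns `['p1', 'p3']`: neither record mentions "Diabetes" literally, but the ontology states that both recorded diagnoses are kinds of diabetes. Rewriting the query rather than modifying the data lets the final step run as an ordinary database lookup.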

Project coordination

Yue MA (Université Paris Sud/Laboratoire de Recherche en Informatique)

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.


LIMSI Laboratoire d'informatique pour la mécanique et les sciences de l'ingénieur
UPSUD/LRI Université Paris Sud/Laboratoire de Recherche en Informatique

ANR funding: 271,133 euros
Start and duration of the scientific project: November 2015, 36 months
