A generic solution for the extraction of semantic information from medical data for use in epidemiology
Computerized medical records are a potentially very important source of information for fields ranging from decision support and evidence-based medicine to epidemiological surveillance. As much of this data is available as unstructured text, natural language processing techniques can be used to process and interpret it. The aim of this project is to develop a generic solution that extracts and structures medical data in order to make it usable in epidemiological research or medical decision-making.
The solution will be as domain-independent as possible, so that any new user can write their own expert rules, whatever their domain of medical expertise. The quality of the data extracted by this solution will be evaluated within an epidemiological use case. Performance will be evaluated in two application scenarios: healthcare-associated infections and cancer.
From a technological point of view, the project will involve the development of a semantic analysis engine that is both robust and accurate, able to deal with a variety of complex linguistic phenomena, including the detection of temporal expressions and negation, and coupled with a multi-terminology server. The SYNODOS solution will make a clear distinction between linguistic rules and expert rules, making each module independent and thus allowing users who are not programmers to generate their own rules and query the processed data. Expected results are an operational system and production environment that combine the technological tools described above and that will be made available on a commercial basis.
We propose the following approach:
• Identify the additions and adaptations to the linguistic, terminological and ontological resources needed to deal with the two use cases (detection of healthcare-associated infections and diagnostic management of colon cancer).
• Design a tool that allows medical professionals to write their own “expert rules”. These rules allow new “facts” to be inferred from those derived from the document analysis (base facts) or from probabilistic and heuristic methods applied by medical experts.
• Implement the different modules as Web services.
• Integrate the different modules into the general SYNODOS system, including a user interface allowing the user to access documents, write new “expert rules”, analyze documents and query the knowledge base to carry out epidemiological research.
• Evaluate the methods and tools based on two use cases in order to gauge their performance.
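The way expert rules infer new facts from base facts can be illustrated with a minimal forward-chaining sketch. Everything in it is an assumption made for illustration: the `(patient_id, label)` fact shape, the `ExpertRule` class, the `infer` function and the example labels are hypothetical, and the rule contents are not clinical definitions. It is not the SYNODOS rule language.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set, Tuple

# Hypothetical fact shape: (patient_id, label).
Fact = Tuple[str, str]


@dataclass(frozen=True)
class ExpertRule:
    """Illustrative rule: if all required labels hold, conclude a new label."""
    name: str
    requires: FrozenSet[str]
    concludes: str


def infer(base_facts: Set[Fact], rules: List[ExpertRule]) -> Set[Fact]:
    """Apply the rules repeatedly until no new fact appears (forward chaining)."""
    facts = set(base_facts)
    changed = True
    while changed:
        changed = False
        for pid in {p for p, _ in facts}:
            labels = {label for p, label in facts if p == pid}
            for rule in rules:
                new_fact = (pid, rule.concludes)
                if rule.requires <= labels and new_fact not in facts:
                    facts.add(new_fact)
                    changed = True
    return facts


# Base facts as they might come out of document analysis (invented examples).
base = {
    ("p1", "recent_surgery"),
    ("p1", "positive_wound_culture"),
    ("p1", "fever"),
    ("p2", "fever"),
}

# Two chained rules: the second one fires on a fact inferred by the first.
rules = [
    ExpertRule("suspicion", frozenset({"recent_surgery", "positive_wound_culture"}),
               "suspected_infection"),
    ExpertRule("review", frozenset({"suspected_infection", "fever"}),
               "flag_for_review"),
]

derived = infer(base, rules)
```

Here patient `p1` ends up with both inferred labels, while `p2` gains none. Keeping rules as plain data rather than code is one way a non-programmer could author them through a user interface.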
Electronic medical records (EMRs) contain data that are crucial for biomedical research. In recent years there has been an exponential increase in scientific publications on the text processing of medical data in fields as diverse as medical decision support, epidemiological studies, and data and semantic mining. The ALADIN project (ANR TecSan, n° ANR-08-TECS-001) demonstrated the feasibility and good performance of this type of approach through the development of a semantic analysis tool to detect hospital-acquired infections. That project also highlighted scientific and technological challenges that SYNODOS will address. SYNODOS brings together two academic partners, one expert in medical terminology (CISMeF) and the other in epidemiology (LBBE), and two industrial partners, one specializing in software development and language resources (CELI) and the other in the integration of business intelligence solutions and web technologies (VISEO).
The purpose of our project is to develop a generic solution for extracting semantic information from medical data and organizing it so that it can be used to support epidemiological studies or medical decisions. The solution is generic in that medical staff will be able to write their own expert rules, whatever their specialty.
The project will also evaluate the quality of the extracted information in the context of epidemiological studies. System performance will be evaluated in two domains: hospital-acquired infections and cancer.
From a technological standpoint, the project objectives are: the development of fine-grained linguistic rules to extract temporal expressions; an interface between the semantic analyzer and the multi-terminology server, upstream in the extraction phase; and an interface between the linguistic engine and the knowledge representation. For the generation of expert system rules, SYNODOS proposes a general modular architecture that makes a clear distinction between linguistic rules and expert system rules, allowing medical users to write their own decision rules and enabling semantic queries on the extracted information. The project outcome will be an operational system integrating the technological modules described above.
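To make the linguistic layer concrete, the sketch below shows how a sentence might be turned into base facts carrying a negation flag and a date. It is a deliberately crude illustration under stated assumptions: the cue list, the ISO date pattern, the hard-coded `TERMS` tuple (a stand-in for a terminology lookup) and the names `BaseFact` and `extract_base_facts` are all hypothetical, the input is English for readability, and none of it reflects the actual SYNODOS analyzer.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical negation cues; a real analyzer would use far richer rules.
NEGATION_CUES = re.compile(r"\b(no|not|without|denies|absence of)\b", re.IGNORECASE)
# Only ISO-style dates (YYYY-MM-DD) are recognized in this sketch.
DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
# Stand-in for a terminology lookup (assumption, not a real server call).
TERMS = ("fever", "infection")


@dataclass(frozen=True)
class BaseFact:
    term: str
    negated: bool
    date: Optional[str]


def extract_base_facts(sentence: str) -> List[BaseFact]:
    """Return one fact per known term found in the sentence."""
    date_match = DATE.search(sentence)
    date = date_match.group(0) if date_match else None
    lowered = sentence.lower()
    facts = []
    for term in TERMS:
        pos = lowered.find(term)
        if pos == -1:
            continue
        # Crude scope: any cue earlier in the same sentence negates the term.
        negated = NEGATION_CUES.search(sentence[:pos]) is not None
        facts.append(BaseFact(term, negated, date))
    return facts
```

Because the function only emits facts and knows nothing about decision rules, the expert layer can be changed without touching it, which is the separation the architecture aims for.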
Marie-Hélène METZGER (Laboratoire de Biométrie et Biologie Evolutive) – email@example.com
The author of this summary is the project coordinator, who is responsible for its content. The ANR accepts no responsibility for its contents.
LBBE - UCBL Laboratoire de Biométrie et Biologie Evolutive
CISMeF Equipe Catalogue et Index des Sites Médicaux et Francophones et Groupe Gestion de la Connaissance et Système d'Information de Santé
ANR funding: 785,183 euros
Start date and duration of the scientific project: September 2012, 36 months