Evaluation and development of text analysis and comprehension systems – Readers
The project proposes new unsupervised computational models that automatically extract background knowledge from large amounts of unstructured text. The extracted knowledge takes the form of classes, categorized entities, and predicates whose arguments are characterized by probability distributions over classes. The classes themselves will be automatically organized into taxonomies related to the predicates in which they participate. In this way, new methods and models based on extensional definitions of concepts are developed for the automatic creation of knowledge bases that stay closely related to textual representations, so as to enable textual inferences. The extracted knowledge will also be linked to external human-made resources such as Freebase, DBpedia and WordNet, and the knowledge bases will be interfaced with several engines for disambiguation, relation extraction, relatedness and expansion. All these resources and tools will be available for the development of a reading machine as part of the project.
The purpose of the reading machine is to answer questions about a given text. Texts are never self-contained, and their interpretation always requires recovering large amounts of background knowledge. Thus, the Machine Reading technology under development must incorporate the recovery and use of such knowledge into language processing. This technology will be evaluated through Multiple-Choice Reading Comprehension (MRC) tests, written by humans over documents the machine has not seen before. MRC tests enable objective and reproducible evaluation experiments and are fully reusable as benchmarks available to the international community. In addition, the industrial partner in charge of developing the Machine Reading system will apply the technology in reverse to automatically generate MRC tests for the automatic assessment of children's reading abilities.
This reading machine will work with at least two languages, English and French. The proposal also includes the support and coordination of an international evaluation campaign for Machine Reading in multiple languages (English, Spanish, French, German, Italian, Romanian, Bulgarian and Arabic). This campaign will serve to measure the progress of the project's technology in a comparative and competitive setting.
Mr. Peñas ANSELMO (Universidad Nacional de Educación a Distancia) – email@example.com
The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.
UPV/EHU Universidad del País Vasco
SYN Synapse Développement
U.E. University of Edinburgh
UNED Universidad Nacional de Educación a Distancia
ANR funding: 232,138 euros
Start date and duration of the scientific project: October 2012, 42 months