The main goal of this project is to produce software, based on language processing and artificial intelligence, that detects potential risks of different kinds (health, ecological, economic, etc.) in technical documents. We will concentrate on procedural documents, which are by and large the main type of technical document. Given a set of procedures (e.g. production launch, maintenance) produced by a company for a given domain, and possibly some domain knowledge (an ontology or a terminology), the goal is to process these procedures and annotate them wherever potential risks are identified. Procedure authors are then invited to revise these documents.
Risk analysis is based on three types of considerations:
(1) Inappropriate ways of writing: complex expressions, implicit elements, gaps, inappropriate granularity level, etc.,
(2) Incoherence among procedures: detection of unusual ways of carrying out an action (e.g. unusual instrument, temperature, length of treatment, etc.) with respect to similar actions in other procedures,
(3) Domain requirements that are not followed in a procedure, which therefore lead to risks.
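As an illustration of point (2), the following is a minimal sketch of how an unusual parameter value might be flagged by comparing it with the same action in other procedures. The data layout, the function name `flag_unusual` and the thresholds are assumptions made for this example; they are not taken from the project, which does not describe its detection method at this level of detail.

```python
from collections import defaultdict
from statistics import median

# Hypothetical extracted data: (procedure id, action, parameter, value).
observations = [
    ("P1", "heat", "temperature_c", 80.0),
    ("P2", "heat", "temperature_c", 82.0),
    ("P3", "heat", "temperature_c", 79.0),
    ("P4", "heat", "temperature_c", 150.0),
    ("P1", "rinse", "duration_min", 5.0),
    ("P2", "rinse", "duration_min", 6.0),
]


def flag_unusual(observations, tolerance=0.25, min_samples=3):
    """Return values whose relative deviation from the median of the
    same (action, parameter) pair across procedures exceeds `tolerance`.
    Both thresholds are illustrative assumptions."""
    groups = defaultdict(list)
    for proc, action, param, value in observations:
        groups[(action, param)].append((proc, value))

    flagged = []
    for (action, param), items in groups.items():
        # Too few similar actions: no reliable reference value.
        if len(items) < min_samples:
            continue
        reference = median(value for _, value in items)
        for proc, value in items:
            if reference != 0 and abs(value - reference) / abs(reference) > tolerance:
                flagged.append((proc, action, param, value, reference))
    return flagged


for proc, action, param, value, reference in flag_unusual(observations):
    print(f"{proc}: {action} {param}={value} (median elsewhere: {reference})")
```

The median is used here as the reference because a single extreme value shifts it less than it would shift a mean; any other robust statistic could be substituted.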
This software will be a prototype offering the main functions. It will be based on an existing stable prototype: <TextCoop>, which is dedicated to text semantics: given a text grammar and lexical data, it tags the corresponding structures in XML. <TextCoop> is in particular dedicated to procedure analysis (titles, instructions, prerequisites, warnings, explanations, etc.). <TextCoop> is ‘just’ a language processing platform. To have real operational value, it needs to be paired with application-level functionalities. One of the most crucial is risk analysis and prevention in industrial processes via procedure analysis. We propose to pair <TextCoop> with add-ons to fulfill this task:
- a SAT4J solver (to check for coherence and completeness),
- some domain knowledge specified via a dedicated interface (Arias domain knowledge base),
- a rewriting system to produce formal language expressions from natural language statements,
- two engines to handle the three types of considerations above, together with interfaces and basic functionalities (display, reporting, etc.).
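To make the coherence check concrete, here is a small sketch of how rewritten statements could be tested against domain requirements once both are expressed as propositional clauses. It uses a brute-force satisfiability test written for this example, not SAT4J itself, and the variable encoding and the two requirements are hypothetical. Clauses follow the integer-literal convention of the DIMACS CNF format (a negative number denotes a negated variable).

```python
from itertools import product


def is_satisfiable(clauses):
    """Brute-force satisfiability test over CNF clauses.
    Each clause is a list of non-zero ints; -n means 'not n'.
    Only suitable for a handful of variables; a real deployment
    would hand the clauses to a dedicated solver such as SAT4J."""
    variables = sorted({abs(lit) for clause in clauses for lit in clause})
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(
            any(assignment[abs(lit)] == (lit > 0) for lit in clause)
            for clause in clauses
        ):
            return True
    return False


# Hypothetical encoding of two propositions.
GLOVES_WORN, VALVE_OPEN = 1, 2

# Hypothetical domain requirements: gloves are worn, the valve is closed.
requirements = [[GLOVES_WORN], [-VALVE_OPEN]]

# Hypothetical instruction after rewriting: "the valve is open".
procedure = [[VALVE_OPEN]]

coherent = is_satisfiable(requirements + procedure)
print("coherent" if coherent else "conflict between procedure and requirements")
```

In this toy case the procedure contradicts a requirement, so the combined clause set is unsatisfiable and a conflict is reported.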
This project combines efforts in language processing, artificial intelligence and cognitive ergonomics to better meet user needs and to make the software acceptable to its users.
In terms of project management, a large user group is planned to provide us with feedback and to ensure that we are on the right track.
In terms of valorization, our vision is to offer a freeware engine with some limited linguistic data to show users the services we offer. The added value, which produces the return on investment, comes from the fact that industrial deployment and integration require a lot of know-how and language resources, which will be ‘sold’ to our customers. Our strategy is to enrich the product, application and services catalogue of one or more companies, since we need industrial support to integrate this prototype into industrial processes. For that purpose, we will develop a methodology for the industrial deployment of this type of technology. The number of potential customers around the world is very large, since every activity that involves risks should be interested in this work: energy, transportation, health care, chemistry, etc.
In terms of innovation, there are some products for requirement editing, but none that address the second stage of the process: procedure content analysis with respect to form (language) and contents (domain knowledge and requirements). Our proposal therefore fills a major gap.
Mr Patrick SAINT-DIZIER (UNIVERSITE TOULOUSE III [PAUL SABATIER]) – email@example.com
The author of this summary is the project coordinator, who is responsible for its content. The ANR accepts no responsibility for its contents.
VALO-PRES Université de Toulouse
UPS-IRIT UNIVERSITE TOULOUSE III [PAUL SABATIER]
CRTD CONSERVATOIRE NATIONAL DES ARTS ET METIERS (CNAM)
ANR funding: 248,947 euros
Beginning and duration of the scientific project: - 24 Months