SMILK is a joint laboratory (Labcom) between the WIMMICS lab (Web-Instrumented Man-Machine Interactions, Communities and Semantics, INRIA, I3S) and the Research and Innovation unit of the company VISEO.
The purpose of this Labcom is to develop research and technologies that, on the one hand, retrieve, analyze, and reason about linked data extracted from textual Web resources and, on the other hand, use open Web data, taking into account social structures and interactions, in order to improve the analysis and understanding of textual resources.
Natural Language Processing (NLP), open Web data (Linked Open Data, LOD) and social networks (SN) are the three topics of this Labcom, together with their coupling, which is studied in three ways: texts and linked data, linked data and social resources, and texts and social resources.
Our first prototype is a Web browser plugin. When the user visits a web page, the plugin detects named entities in the field of cosmetics and highlights them in different colors depending on their type (product names, product ranges, cosmetic group, division of a group). Each named entity is disambiguated and linked to the corresponding DBpedia resource, from which data is extracted. When the user clicks on a highlighted entity, a graph is built on the fly to visualize the links between all the information that has been extracted from the text, enriched with information from DBpedia and social networks (opinion analysis, word clouds, ...).
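As an illustration only, the detect, highlight and link steps described above can be sketched in a few lines of Python. This is not the plugin's actual implementation: the gazetteer lookup stands in for the real named-entity recognition and disambiguation, and the entries in `GAZETTEER`, the `COLORS` palette, the `entity-*` CSS class names and the helper names `dbpedia_uri` and `annotate` are all assumptions made for the example. The DBpedia URIs are built from the resource name and are not checked against DBpedia.

```python
import html
import re
from urllib.parse import quote

DBPEDIA_PREFIX = "http://dbpedia.org/resource/"

# Hypothetical gazetteer: surface form -> (entity type, DBpedia resource name or None).
# A real system would use NER and disambiguation rather than a fixed lookup table.
GAZETTEER = {
    "L'Oréal": ("group", "L'Oréal"),
    "L'Oréal Luxe": ("division", None),  # assumed to have no linked resource
}

# Hypothetical color per entity type.
COLORS = {"group": "#ffd54f", "division": "#81d4fa", "product": "#a5d6a7"}


def dbpedia_uri(resource_name):
    """Build a DBpedia-style resource URI (percent-encoded, spaces as underscores)."""
    return DBPEDIA_PREFIX + quote(resource_name.replace(" ", "_"))


def annotate(text):
    """Return (highlighted HTML, list of (surface form, type, URI) links)."""
    # Try longer surface forms first so "L'Oréal Luxe" wins over "L'Oréal".
    forms = sorted(GAZETTEER, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(form) for form in forms))

    parts, links, pos = [], [], 0
    for match in pattern.finditer(text):
        surface = match.group(0)
        etype, resource = GAZETTEER[surface]
        # Copy the untouched text before the entity, escaped for HTML.
        parts.append(html.escape(text[pos:match.start()], quote=False))
        # Wrap the entity in a span colored according to its type.
        parts.append(
            '<span class="entity-{}" style="background:{}">{}</span>'.format(
                etype, COLORS[etype], html.escape(surface, quote=False)
            )
        )
        if resource is not None:
            links.append((surface, etype, dbpedia_uri(resource)))
        pos = match.end()
    parts.append(html.escape(text[pos:], quote=False))
    return "".join(parts), links


if __name__ == "__main__":
    marked, links = annotate("L'Oréal Luxe is a division of L'Oréal.")
    print(marked)
    for surface, etype, uri in links:
        print(surface, etype, uri)
```

The list of links returned by `annotate` is the kind of data from which a graph of entities and their DBpedia resources could then be built when the user clicks on a highlighted entity.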
The next step is to run evaluation campaigns with real users (VISEO customers) in a specific domain, using the first prototype.
In parallel, we will work on adapting NLP technology to automatically extract structured data from social media, and in particular on adapting NLP to the analysis of degraded texts.
SMILK is a joint laboratory (Labcom) between the WIMMICS team (Web-Instrumented Man-Machine Interactions, Communities and Semantics, INRIA, I3S) and the Research and Innovation unit of VISEO.
Natural Language Processing (NLP), Linked Open Data (LOD) and Social Networks (SN), as well as the links between them, are at the core of this Labcom. We propose three ways of looking at these issues: texts and linked data, linked data and social resources, and texts and social networks. The purpose of SMILK is both to develop research and technologies to retrieve, analyze, and reason about textual data coming from Web sources, and to make use of LOD and of social network structures and interactions in order to improve the analysis and understanding of textual resources.
The availability of a large amount of public data (open data) and the application of Web principles to linking datasets together (linked data) have given rise to new opportunities for ongoing research, and have opened new scientific challenges due to data heterogeneity and potential interlinking.
Topics covered by SMILK include: the use of data and vocabularies published on the Web to search, analyze, disambiguate and structure textual knowledge in a smart way, and also to feed internal information sources; reasoning over the combination of internal and public data and schemas; and the querying and presentation of data and inferences in natural formats.
SMILK proposes to explore different scientific tracks, such as a strong coupling between algorithms and models at the linguistic and semantic levels, extraction and disambiguation guided by LOD, and the combination of different ways of reasoning (e.g. logical inferences, approximation and similarities).
SMILK offers WIMMICS the opportunity to apply its research to concrete industrial scenarios, and offers VISEO the opportunity to complement its expertise in NLP with WIMMICS's expertise in the Semantic Web, optimizing the costs of research and innovation and expanding its Business Intelligence offer with new ideas, solutions and services.
Mr. Fabien Gandon (Institut de Recherche en Informatique et Automatique) – firstname.lastname@example.org
The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.
INRIA Institut de Recherche en Informatique et Automatique
ANR funding: 300,000 euros
Beginning and duration of the scientific project: January 2014 - 36 Months