CE38 - Digital revolution: relationship to knowledge and culture

French-Speaking Digital Literature: identification, indexing and analysis of works of digital literature – LIFRANUM

Submission summary

The LIFRANUM project aims to identify and structure the corpus of digital
literature (sites, blogs, social networks) in the French-speaking world. This heritage dimension is
coupled with an epistemological inquiry into the literariness of the identified content and the
dynamics of new forms of sociability. After identifying the relevant URLs, we will launch
web crawls to retrieve large sets of data. These will be stored in a data lake whose
consistency will rely on a simple indexing system, derived from a taxonomy developed,
in particular, from the user experience.
For content indexing, we will use data lakes, which store documents in their original format
while allowing them to be queried efficiently through a metadata management system. The data
lake will then serve as the source for data mining tools, which will make it possible to highlight
original topics and calculate similarities between entities (e.g., documents, websites, authors).
In this project we aim to make these techniques interpretable so that end users can understand
the suggested structures.
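As an illustration only, the following Python sketch shows one way the two ideas above could fit together: a small metadata catalog pointing at documents kept in their original form, and a cosine similarity computed over term-frequency vectors. The record fields, the sample texts, the URLs and the function names are all hypothetical; they are not taken from the LIFRANUM project, which may rely on entirely different tools.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class LakeEntry:
    """Hypothetical catalog record: raw content stays untouched, metadata sits beside it."""
    url: str
    raw: str  # document in its original form (plain text here, for simplicity)
    metadata: dict = field(default_factory=dict)


def term_frequencies(text: str) -> Counter:
    """Lowercase the text and count word occurrences."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine between two term-frequency vectors (0.0 if either vector is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def query(catalog: list[LakeEntry], **criteria) -> list[LakeEntry]:
    """Return the entries whose metadata match every given key/value pair."""
    return [
        entry for entry in catalog
        if all(entry.metadata.get(key) == value for key, value in criteria.items())
    ]


# Invented sample data, for illustration only.
catalog = [
    LakeEntry("https://example.org/a", "poème numérique sur la ville", {"type": "blog"}),
    LakeEntry("https://example.org/b", "poème sur la ville et la nuit", {"type": "blog"}),
    LakeEntry("https://example.org/c", "recette de cuisine", {"type": "site"}),
]

blogs = query(catalog, type="blog")
score = cosine_similarity(term_frequencies(blogs[0].raw), term_frequencies(blogs[1].raw))
```

Because the score is built from shared word counts, the terms two documents have in common can be listed alongside it, which is one simple way to keep such a similarity measure readable for end users.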
The results of these analyses will help specify and enrich the taxonomy. Describing
these original literary entities requires developing a tool based on both digital
structures and user perception. We postulate that these two approaches, far from being
irreconcilable, are complementary, and even that the robustness of an analytical tool
rests on this dual scientific grounding. This taxonomy will help develop a simple
ontology (modeled on bibliographic ontologies) from which we can derive a set of
metadata usable to characterize web entities. Beyond building the corpus of a new
and fundamental dimension of contemporary literature, our project, by exploiting data lakes
and data mining, revolutionizes the methods and means of documentary description. The
coherence of the project rests, among other things, on the articulation between different
sciences and methods for constructing and using an object (in this case a corpus): analysis of
the practices, uses and reception of these objects, together with data analysis and text mining,
all used to structure a language (taxonomy, ontology and metadata set) that gives
diverse users access to literary creations while ensuring a rigorous
characterization of the objects and their structure.
The project relies on two laboratories (literature, information-communication; computer
science) and on the BnF; it is supported by the International Institute of Francophonie. The
collaboration between these partners, already under way, has allowed us to begin the
research and to test empirically the risks and solutions related to this project.
The project's objective therefore concerns the literary community but, beyond that, aims
to make a large-scale corpus and an innovative methodology available to all disciplinary
fields. By depositing the corpus in a storage space of the
HUMA-NUM infrastructure, in agreement with the MSH, we intend to produce a tool available for
scientific approaches and broad uses: linguistics, statistics, computer science, information
retrieval, natural language processing, among others.
We are putting in place, with the appropriate partners, pedagogical uses for audiences of
researchers, teachers and documentalists as well as high school and university
students. The challenge is considerable: it involves supporting editorial practice (through
collaborative writing and writing workshops) as well as contributing to the analysis of new
literacies. We foresee deliverables varied in both form and medium, intended to provide
methodological support for the use of this corpus.

Gilles Bonnet (MARGE)

This summary was written by the project coordinator, who is responsible for its content. The ANR accepts no responsibility for its contents.

MARGE MARGE
ERIC Entrepôts, Représentation et Ingénierie des connaissances
BNF Bibliothèque Nationale de France

ANR funding: 380,052 euros
Beginning and duration of the scientific project: December 2019 - 48 Months
