Innovative Techniques for the Advanced Learning Of Distributional Compositionality – ITALODISCO
This project aims to model semantic compositionality in a fully automatic and unsupervised way. Until now, most work on the automatic acquisition of semantics has dealt only with individual words. The modeling of meaning beyond the level of individual words, i.e. the combination of words into larger units, has been much less thoroughly explored. This project proposes a data-driven approach that combines a number of innovative techniques. First, we rely on mathematical objects called tensors, the generalization of matrices to higher orders, in order to adequately model the multi-way co-occurrences that come into play when dealing with compositionality. In combination with a latent factorization model, tensors make it possible to induce latent semantics from multi-way co-occurrences, which can subsequently be used for the modeling of compositional expressions. Second, we combine the tensor-based approach with advanced machine learning techniques, notably neural networks. Neural network techniques have recently shown impressive performance in a number of natural language processing tasks; by integrating them with our tensor-based approach, we aim to capture the multi-way interaction between the words of a compositional expression more fully. Third, we aim to combine the strengths of distributional and formal semantics within one integrated, complementary framework, and we expect this to yield algorithms that capture the meaning of larger textual units in greater depth. The proposed model aims to provide an implementation of compositionality that is entirely data-driven: the model is automatically constructed from large text corpora, and its performance is evaluated quantitatively.
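To make the idea of inducing latent semantics from multi-way co-occurrences more concrete, here is a minimal sketch. The summary does not say which factorization model the project uses, so the sketch assumes a plain CP (canonical polyadic) decomposition fitted by alternating least squares on a subject-verb-object count tensor. The toy triples, the vocabulary, and the function names `build_tensor`, `khatri_rao` and `cp_als` are all invented for illustration and are not taken from the project.

```python
import numpy as np

def build_tensor(triples, subjects, verbs, objects):
    """Count (subject, verb, object) co-occurrences into a 3-way tensor."""
    s_idx = {w: i for i, w in enumerate(subjects)}
    v_idx = {w: i for i, w in enumerate(verbs)}
    o_idx = {w: i for i, w in enumerate(objects)}
    X = np.zeros((len(subjects), len(verbs), len(objects)))
    for s, v, o in triples:
        X[s_idx[s], v_idx[v], o_idx[o]] += 1.0
    return X

def khatri_rao(P, Q):
    """Column-wise Kronecker product; row p * Q.shape[0] + q holds P[p] * Q[q]."""
    return np.einsum("pr,qr->pqr", P, Q).reshape(-1, P.shape[1])

def cp_als(X, rank, n_iter=50, seed=0):
    """Fit X[i, j, k] ~ sum_r A[i, r] * B[j, r] * C[k, r] by alternating least squares."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A, B, C = rng.random((I, rank)), rng.random((J, rank)), rng.random((K, rank))
    X1 = X.reshape(I, -1)                      # rows: subjects
    X2 = X.transpose(1, 0, 2).reshape(J, -1)   # rows: verbs
    X3 = X.transpose(2, 0, 1).reshape(K, -1)   # rows: objects
    for _ in range(n_iter):
        # Each update is the least-squares solution for one factor,
        # with the other two factors held fixed.
        A = X1 @ np.linalg.pinv(khatri_rao(B, C)).T
        B = X2 @ np.linalg.pinv(khatri_rao(A, C)).T
        C = X3 @ np.linalg.pinv(khatri_rao(A, B)).T
    return A, B, C

# Toy data (assumption: hand-made triples, not extracted from a corpus).
subjects = ["dog", "cat", "student"]
verbs = ["chases", "eats", "reads"]
objects = ["ball", "fish", "book"]
triples = [
    ("dog", "chases", "ball"), ("dog", "chases", "ball"),
    ("cat", "chases", "ball"), ("cat", "eats", "fish"),
    ("dog", "eats", "fish"), ("student", "reads", "book"),
    ("student", "reads", "book"), ("student", "eats", "fish"),
]

X = build_tensor(triples, subjects, verbs, objects)
A, B, C = cp_als(X, rank=2)

# Reconstruct the tensor from the latent factors and report the relative error.
X_hat = np.einsum("ir,jr,kr->ijk", A, B, C)
print("relative reconstruction error:", np.linalg.norm(X - X_hat) / np.linalg.norm(X))
```

Each row of `A`, `B` and `C` is a low-dimensional latent representation of a subject, a verb or an object. In a realistic setting the counts would come from a parsed corpus and would usually be reweighted (for example with pointwise mutual information) before factorization; those steps are left out here.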
Project coordination
Tim Van De Cruys (Institut de Recherche en Informatique de Toulouse)
The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.
Partner
IRIT Institut de Recherche en Informatique de Toulouse
ANR funding: 158,222 euros
Beginning and duration of the scientific project:
September 2014 - 36 months