We propose to tackle the ambiguity of visual and textual content by learning their representations and then combining them. As a final use case, we propose to address a new scientific task, namely Multimedia Question Answering, which requires relying on three different sources of information to answer a (textual) question with regard to visual data as well as an external knowledge base containing millions of unique entities, each represented by textual and visual content as well as links to other entities. In practice, we focus on four types of entities: persons, organisations (companies, NGOs, intergovernmental organisations...), geographical points of interest (tourist sites, remarkable buildings...) and objects (commercial products...). Achieving this objective requires progress on the disambiguation of each modality with respect to the other and to the knowledge base. We also propose to merge the representations into a common tri-modal space. An important part of the work will deal with the representation of a particular entity in this common space, where one must determine which content to associate with an entity to represent it adequately with regard to its type (person, object, organisation, place). Since such an entity can be associated with several vectors, each corresponding to data that may originally come from a different modality, the challenge consists in defining a representation that is compact (for performance) while still expressive enough to reflect the potential links of the entity with a variety of other ones. The project has a potential economic impact in the field of data intelligence, including applications in marketing, security, tourism and cultural heritage. If successful, the output of the MEERQAT project could directly contribute to improving chatbots.
During the project, the direct output will be mainly academic, namely scientific articles with the corresponding material to reproduce the experiments. We also plan to release a new benchmark for the proposed task, in the context of an international evaluation campaign.
Mr. Hervé Le Borgne (Laboratoire d'Intégration des Systèmes et des Technologies)
The author of this summary is the project coordinator, who is responsible for its content. The ANR accepts no responsibility for its contents.
LIST Laboratoire d'Intégration des Systèmes et des Technologies
IRIT Institut de Recherche en Informatique de Toulouse
Inria Rennes Bretagne - Atlantique Centre de Recherche Inria Rennes - Bretagne Atlantique
LIMSI Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur
ANR funding: 674,270 euros
Start and duration of the scientific project: March 2020, 42 months