Adada: Adaptive Datasets for Enhancing Reasoning in Large Language Models – Adada
Large language models (LLMs) have achieved remarkable success in various natural language processing tasks, but their ability to perform complex reasoning often falls short. To tackle implicit reasoning problems that arise in everyday scenarios, from interpreting rules in texts to evaluating products against specifications, LLMs need to go beyond linguistic fluency and acquire logical precision and multi-step problem-solving skills.
The Adada project proposes a novel framework to distill modern symbolic reasoning into LLMs through evolving synthetic datasets. By generating machine-annotated tasks (MATs) tailored to specific downstream applications, Adada aims to continuously enhance LLMs for reasoning-intensive use cases such as technical documentation understanding, commonsense reasoning, and legal analysis.
Adada will develop a scalable, modular framework for syntax-guided and value-guided problem generation. The framework will integrate diverse MATs, including non-classical logics, induction, planning, and constraint satisfaction, by representing each task with a formal grammar, a solver, and a verbalization into natural language. An adversarial methodology will iteratively generate datasets that expose limitations in LLMs' reasoning abilities, favoring problems that are concise, diverse, and challenging.
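To make the grammar/solver/verbalization triple and the adversarial loop more concrete, here is a minimal Python sketch. It rests on assumptions that do not come from the project: it uses propositional satisfiability as a stand-in MAT, and every name in it (`generate`, `solve`, `verbalize`, `adversarial_round`, `Problem`, `always_yes`) is hypothetical. The grammar `F ::= var | not F | F and F | F or F` drives syntax-guided sampling, a brute-force truth-table check serves as the solver, and `verbalize` renders each formula as English. The adversarial step keeps only problems a model answers incorrectly. It deduplicates them by text as a crude proxy for diversity and sorts them by formula size as a proxy for concision. Value-guided generation is not modeled, and the `always_yes` baseline merely stands in for an LLM.

```python
import itertools
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

# Hypothetical toy MAT: propositional formulas over three variables.
# A formula is "p" | ("not", f) | ("and", f, g) | ("or", f, g).
Formula = Union[str, tuple]
VARS = ["p", "q", "r"]


def generate(rng: random.Random, depth: int) -> Formula:
    """Sample from the grammar F ::= var | not F | F and F | F or F."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(VARS)
    op = rng.choice(["not", "and", "or"])
    if op == "not":
        return ("not", generate(rng, depth - 1))
    return (op, generate(rng, depth - 1), generate(rng, depth - 1))


def evaluate(f: Formula, env: Dict[str, bool]) -> bool:
    """Truth value of f under one assignment of the variables."""
    if isinstance(f, str):
        return env[f]
    if f[0] == "not":
        return not evaluate(f[1], env)
    if f[0] == "and":
        return evaluate(f[1], env) and evaluate(f[2], env)
    return evaluate(f[1], env) or evaluate(f[2], env)


def solve(f: Formula) -> bool:
    """Solver: f is satisfiable iff some truth assignment makes it true."""
    return any(
        evaluate(f, dict(zip(VARS, values)))
        for values in itertools.product([False, True], repeat=len(VARS))
    )


def verbalize(f: Formula) -> str:
    """Render a formula as (stilted) English."""
    if isinstance(f, str):
        return f"{f} holds"
    if f[0] == "not":
        return f"it is not the case that ({verbalize(f[1])})"
    return f"({verbalize(f[1])}) {f[0]} ({verbalize(f[2])})"


def size(f: Formula) -> int:
    """Number of nodes in the formula tree, used as a concision measure."""
    return 1 if isinstance(f, str) else 1 + sum(size(g) for g in f[1:])


@dataclass(frozen=True)
class Problem:
    text: str
    answer: bool
    n_nodes: int


def adversarial_round(model: Callable[[str], bool], rng: random.Random,
                      n_candidates: int = 200, depth: int = 4) -> List[Problem]:
    """Sample candidates, keep unique ones the model gets wrong, shortest first."""
    failures: Dict[str, Problem] = {}
    for _ in range(n_candidates):
        f = generate(rng, depth)
        text = f"Can this be true? {verbalize(f)}."
        gold = solve(f)
        if model(text) != gold and text not in failures:
            failures[text] = Problem(text, gold, size(f))
    return sorted(failures.values(), key=lambda p: p.n_nodes)


if __name__ == "__main__":
    # Hypothetical stand-in for an LLM: always answers "yes, satisfiable".
    def always_yes(_text: str) -> bool:
        return True

    for problem in adversarial_round(always_yes, random.Random(0))[:3]:
        print(problem.n_nodes, problem.text)
```

Against the `always_yes` baseline, only unsatisfiable formulas survive the filter. In an iterative setting, the retained failures would presumably feed the next training round before sampling again; the sketch shows a single filtering round only.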
The project will investigate transfer learning between MATs and human-annotated tasks (HATs), providing insights into the relationships between different reasoning formalisms and their impact on natural language understanding. Adada will evaluate the enhanced LLMs on a suite of reasoning-intensive HATs spanning legal reasoning, medical question answering, and contradiction detection.
Project coordination
Damien Sileo (Institut national de la recherche en informatique et automatique)
The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.
Partnership
Institut national de la recherche en informatique et automatique
ANR funding: 279,437 euros
Beginning and duration of the scientific project: September 2024, 48 months