CE23 - Intelligence Artificielle 2020

Emergent communication through curiosity-driven multi-agent reinforcement learning – ECOCURL

The ECOCURL project proposes to leverage recent contributions in Multi-Agent Reinforcement Learning (MARL) to study how compositional communication systems can emerge in artificial agent populations and support the open-ended discovery of increasingly complex cooperative strategies. The results of the project will eventually be demonstrated in a rich 3D environment and disseminated to the scientific community and at high-visibility outreach events.

Hypotheses and objectives of the project

The ECOCURL project is grounded in the following hypotheses:
• H1: Intrinsically-motivated learning can encourage emergent communication in cooperative multi-agent environments by guiding the agents towards the autonomous discovery of a diverse set of skills for improving their control over the environment (Moulin-Frier et al., 2014; Oudeyer & Smith, 2016).
• H2: The structure of an emergent communication system, in particular its compositional nature, is shaped by constraints on the structure of the environment and the agents’ cognitive architectures (Battaglia et al., 2018; Kirby et al., 2014; Nowak et al., 2000).
• H3: Compositional communication systems can support the acquisition of increasingly complex cooperative skills, paving the way towards open-ended cultural evolution in artificial agent populations (Colas et al., 2020; Tomasello et al., 2005; Vygotsky, 1978).

The ECOCURL project (Emergent Communication through Curiosity-driven Multi-Agent Reinforcement Learning) addresses key issues of the Artificial Intelligence scientific evaluation committee related to reinforcement learning, multi-agent systems and representation learning through the following objectives:
• O1 (addressing H1 and implemented in WP1): To design and implement a novel MARL algorithm combining intrinsically-motivated learning with compositional goal imagination through factored representations.
• O2 (addressing H2 and implemented in WP2): To evaluate the role of the structure of the environment and of the agents’ cognitive architecture in the emergence of compositional communication.
• O3 (addressing H3 and implemented in WP3): To evaluate how compositional communication can in turn support the open-ended discovery of increasingly complex cooperative strategies in a mixed cooperative-competitive scenario.
• O4 (implemented in WP3 and exploited in WP4): To leverage the achievement of the above objectives to build an integrated demonstrator in a rich 3D environment showing how agent populations can co-acquire an open-ended repertoire of cooperative and communicative strategies (WP3), and to use this demonstrator to disseminate the project results at high-visibility outreach events (WP4).

Methods relevant to the ECOCURL project, emphasizing the contributions of the project beyond the state of the art.

Contribution 1. The ECOCURL project will extend the concept of modular goal spaces proposed in the CURIOUS algorithm to a multi-agent setting and study how it can encourage the discovery of complex cooperative and communicative strategies.

Contribution 2. The ECOCURL project will extend the concept of language-based goal imagination proposed in IMAGINE to an emergent communication setup where agents learn both to perceive and to generate sequential communication signals. The prediction is that goal imagination can support both compositional generalization (as recently demonstrated in IMAGINE) and the open-ended discovery of cooperative skills in multi-agent environments.

Contribution 3. The algorithm developed in the ECOCURL project will integrate learned factored representations through GNNs. Introducing relational inductive biases within deep learning architectures will support relational reasoning (involving world entities and other agents) and compositional generalization which are key to the ECOCURL project.

Contribution 4. The MARL algorithm developed in the ECOCURL project will be able to operate both in a centralized-learning, decentralized-execution mode and in a fully decentralized mode. The first mode will serve as a target upper bound for the second when evaluating the performance of the algorithm. A prediction is that exploration based on learning progress, which can autonomously re-explore forgotten tasks as demonstrated in CURIOUS, will provide a generic solution to the non-stationarity problem in decentralized MARL.

Contribution 5. The ECOCURL project will improve the state of the art in MARL-based Emergent Communication by evaluating the joint role of intrinsic motivation, goal imagination, relational inductive bias and environmental dynamics in the emergence of compositional communication systems.

Contribution 6. The ECOCURL project will study how compositional communication can foster the open-ended discovery of increasingly complex behaviors in mixed cooperative-competitive environments and will demonstrate the mechanism in a rich 3D environment with complex intrinsic dynamics.
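
The learning-progress-based exploration mentioned in Contribution 4 can be illustrated with a minimal single-agent sketch. It is not the CURIOUS implementation nor the algorithm developed in the project: the class name, the window size, the epsilon mixing parameter and the use of binary success outcomes are all assumptions chosen for illustration. The idea shown is that a task is sampled more often when its recent success rate is changing, whether it is rising (the task is being learned) or falling (the task is being forgotten).

```python
import random
from collections import deque


class LearningProgressSampler:
    """Hypothetical helper: sample tasks in proportion to absolute learning progress."""

    def __init__(self, n_tasks, window=20, epsilon=0.2, seed=0):
        self.n_tasks = n_tasks
        self.epsilon = epsilon  # share of probability mass spread uniformly
        self.rng = random.Random(seed)
        # One bounded history of binary outcomes (1.0 = success) per task.
        self.history = [deque(maxlen=window) for _ in range(n_tasks)]

    def update(self, task, success):
        """Record the outcome of one attempt at the given task."""
        self.history[task].append(1.0 if success else 0.0)

    def learning_progress(self, task):
        """Absolute difference in success rate between the two halves of the window."""
        outcomes = list(self.history[task])
        if len(outcomes) < 2:
            return 0.0
        half = len(outcomes) // 2
        older, recent = outcomes[:half], outcomes[half:]
        return abs(sum(recent) / len(recent) - sum(older) / len(older))

    def probabilities(self):
        """Mix a uniform distribution with one proportional to learning progress."""
        progress = [self.learning_progress(t) for t in range(self.n_tasks)]
        total = sum(progress)
        uniform = 1.0 / self.n_tasks
        if total == 0.0:
            return [uniform] * self.n_tasks
        return [
            self.epsilon * uniform + (1.0 - self.epsilon) * value / total
            for value in progress
        ]

    def sample(self):
        """Draw the index of the next task to practice."""
        weights = self.probabilities()
        return self.rng.choices(range(self.n_tasks), weights=weights)[0]
```

Taking the absolute value is the design choice that matters here: a task whose success rate drops receives as much sampling probability as one whose success rate rises by the same amount, so a forgotten task is revisited without any dedicated mechanism. The uniform share keeps every task reachable even when its measured progress is zero.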

Results

During the first 18 months of the project, we have made the following contributions:

Social Network Structure Shapes Innovation: Experience-sharing in RL with SAPIENS (Nisioti et al., 2022; paper submitted to NeurIPS 2022; contributes to T1.3, T2.1, T2.3, T3.1, T3.2). This contribution makes significant progress towards O1 (T1.1 and T1.3) by proposing a novel MARL algorithm that promotes collective innovation in RL agent populations through sequential experience sharing. It also addresses parts of O2 (T2.1 and T2.3) by comparing the role of experience sharing in diverse environments, defined as different hierarchical innovation tasks. Finally, it contributes to O3 by proposing measures of open-ended innovation in those environments.

Socially Supervised Representation Learning: the Role of Subjectivity in Learning Efficient Representations (Taylor et al., 2022; paper published at AAMAS 2022; contributes to T2.2, T3.1). This contribution addresses O2 by proposing that multi-agent environments, where agents do not have access to the observations of others but can communicate within a limited range, guarantee a common context that can be leveraged in individual representation learning. It contributes to O1 (T1.1) by proposing a cognitive architecture composed of a population of autoencoders, and to O2 (T2.2) by defining multiple loss functions that capture different aspects of effective communication and examining their effect on the learned representations.

Plasticity and evolvability under environmental variability: the joint role of fitness-based selection and niche-limited competition (Nisioti & Moulin-Frier, 2022; paper published at GECCO 2022; contributes to T2.1, T2.3, T3.1, T3.2). This contribution addresses O2 by studying the interplay between environmental dynamics and adaptation in a model of the evolution of plasticity and evolvability. We experiment with different types of environments characterized by the presence of niches and a climate function that determines the fitness landscape. We show empirically that environmental dynamics affect plasticity and evolvability differently and that the presence of diverse ecological niches favors adaptability even in stable environments.
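
The idea of experience sharing over a social network structure, as in the first contribution above, can be sketched as follows. This is an illustrative toy and not the SAPIENS implementation: the function names, the two topologies, the list-based buffers and the uniform sampling of shared experiences are assumptions. Each agent keeps its own buffer of experiences and, in every sharing round, sends a small random sample to its neighbors in the network.

```python
import random


def ring_topology(n_agents):
    """Each agent is connected to its two nearest neighbors on a ring."""
    return {
        i: sorted({(i - 1) % n_agents, (i + 1) % n_agents} - {i})
        for i in range(n_agents)
    }


def fully_connected_topology(n_agents):
    """Each agent is connected to every other agent."""
    return {i: [j for j in range(n_agents) if j != i] for i in range(n_agents)}


def share_experiences(buffers, topology, n_shared, rng):
    """Run one sharing round: every agent sends a random sample to each neighbor.

    buffers maps an agent index to its list of experiences and is updated in place.
    """
    incoming = {agent: [] for agent in buffers}
    for sender, neighbors in topology.items():
        k = min(n_shared, len(buffers[sender]))
        for receiver in neighbors:
            incoming[receiver].extend(rng.sample(buffers[sender], k))
    # Deliver only after all sampling, so a round moves experiences by one hop.
    for receiver, batch in incoming.items():
        buffers[receiver].extend(batch)
    return buffers
```

With a ring, an experience needs several rounds to reach distant agents, whereas a fully connected network spreads it to the whole population in a single round. Comparing such topologies is one way to study how network structure affects what a population collectively discovers.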

We have also published several opinion and position papers explaining the positioning of the project with respect to the literature (Moulin-Frier & Oudeyer, 2021; Nisioti et al., 2021; Ten et al., 2022).

We have started an international collaboration with Microsoft Research, leading to the submission of (Nisioti et al., 2022) to the NeurIPS 2022 conference.

We have co-organized the second and third SMILES workshops: sites.google.com/view/smiles-workshop/

Prospects

We are now integrating the findings from these three contributions and extending them towards an integrated MARL algorithm combining intrinsically-motivated learning with compositional goal selection. We evaluate this novel algorithm in a framework called “multi-agent autotelic learning”, which is both multi-task and multi-agent: agents select their own goals in a complex environment filled with various types of objects. This offers the agents a large goal space, in which some goals require a certain level of cooperation to be achieved. Agents need to learn to sample goals from this space and to agree, through communication, on which goal to pursue in each episode. This will achieve O1 and O2. We plan to submit a paper on this work by the end of 2022.

Scientific productions and patents

Demirel, B., Moulin-Frier, C., Arsiwalla, X. D., Verschure, P. F. M. J., & Sánchez-Fibla, M. (2021). Distinguishing Self, Other, and Autonomy From Visual Feedback: A Combined Correlation and Acceleration Transfer Analysis. Frontiers in Human Neuroscience, 15. www.frontiersin.org/articles/10.3389/fnhum.2021.560657

Moulin-Frier, C., & Oudeyer, P.-Y. (2021). Multi-Agent Reinforcement Learning as a Computational Tool for Language Evolution Research: Historical Context and Future Challenges. Challenges and Opportunities for Multi-Agent Reinforcement Learning (COMARL), AAAI Spring Symposium Series, Stanford University, Palo Alto, California, USA. arxiv.org/abs/2002.08878

Nisioti, E., Jodogne-del Litto, K., & Moulin-Frier, C. (2021, December). Grounding an Ecological Theory of Artificial Intelligence in Human Evolution. NeurIPS 2021 - Conference on Neural Information Processing Systems / Workshop: Ecological Theory of Reinforcement Learning. hal.archives-ouvertes.fr/hal-03446961

Nisioti, E., Mahaut, M., Oudeyer, P.-Y., Momennejad, I., & Moulin-Frier, C. (2022). Social Network Structure Shapes Innovation: Experience-sharing in RL with SAPIENS (arXiv:2206.05060). arXiv. doi.org/10.48550/arXiv.2206.05060

Nisioti, E., & Moulin-Frier, C. (2022). Plasticity and evolvability under environmental variability: The joint role of fitness-based selection and niche-limited competition. Proceedings of the 2022 Genetic and Evolutionary Computation Conference (GECCO 2022). arxiv.org/abs/2202.08834

Taylor, J., Nisioti, E., & Moulin-Frier, C. (2022). Socially Supervised Representation Learning: The Role of Subjectivity in Learning Efficient Representations. International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS 2022). arxiv.org/abs/2109.09390

Ten, A., Oudeyer, P.-Y., & Moulin-Frier, C. (2022). Curiosity-driven exploration: Diversity of mechanisms and functions. In The Drive for Knowledge: The Science of Human Information Seeking. Cambridge University Press. hal.inria.fr/hal-03447896

Submission summary

What are the conditions for communication systems to emerge in populations of artificial agents? How can emergent communication systems in turn support the acquisition of an open-ended repertoire of cooperative skills? These questions are currently gaining considerable interest in the AI community due to recent advances in multi-agent reinforcement learning. Recent contributions have shown how simple communication can emerge in agent populations learning to solve a cooperative task. However, these contributions do not take advantage of recent deep reinforcement learning algorithms that allow the autonomous discovery and learning of multiple tasks in parallel. The ECOCURL project will extend such algorithms to realistic multi-agent cooperative environments, showing how a compositional communication system can be co-acquired by the agents to support the acquisition of an open-ended repertoire of cooperative skills.

Clément Moulin-Frier (Centre de Recherche Inria Bordeaux - Sud-Ouest)

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.

INRIA Centre de Recherche Inria Bordeaux - Sud-Ouest

ANR funding: 258,120 euros
Beginning and duration of the scientific project: January 2021 - 48 Months
