Natural Language Programming for Conversational Cobots – COCOBOTS
COCOBOTS (Conversational Cobots)
The main goal is to develop a multimodal discourse model that would allow a human to use situated dialogue to teach a robot new concepts or actions as needed, without requiring recourse to a roboticist, a programmer, or any specialized training methods.
Natural language toolkit for conversational assistants and cobots
The general goal of COCOBOTS is to contribute to the development of models of situated conversation that would allow a human to collaborate effectively, through conversation, with a conversational cobot to perform a task. An important aspect of this objective is the ability to use situated dialogue to teach the robot new concepts or actions when necessary, without requiring recourse to a roboticist, a programmer, or any specialized training methods. The targeted method is one in which a robot is taught simple programs through conversation and then learns to combine these programs into increasingly complex ones. The principal challenge for our proposal is to ensure that meaning is built up in a compositional fashion, from the syntactic level to the discourse level, so as to yield clean programs while preserving the relation between linguistic expressions and the objects or actions to which they refer. Doing this will require investigating the use of a symbolic skeleton that we fill in with subsymbolic pairings of nonlinguistic content with linguistic expressions. Another central challenge is to account for the different ways in which the environment can ground and contribute to discourse meaning, and for how conversational grounding contributes to learning and grounding new concepts and actions.
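As a minimal illustrative sketch (all names here are hypothetical, not part of any COCOBOTS codebase), the idea of combining conversationally taught simple programs into more complex ones can be pictured as a growing vocabulary of named action sequences:

```python
# Hypothetical sketch: a cobot's action vocabulary grows by composing
# previously taught programs into new, named ones. Atomic actions are
# stand-ins for real robot primitives.

class ActionVocabulary:
    def __init__(self):
        # Atomic actions the robot already knows how to execute.
        self.programs = {
            "pick": lambda obj: f"pick({obj})",
            "place": lambda obj: f"place({obj})",
        }

    def teach(self, name, steps):
        """Define a new program as a sequence of known (program, argument) steps."""
        unknown = [s for s, _ in steps if s not in self.programs]
        if unknown:
            # In dialogue, this is where a clarification question would be asked.
            raise KeyError(f"unknown sub-programs: {unknown}")
        self.programs[name] = lambda: [self.programs[s](a) for s, a in steps]

    def run(self, name, *args):
        return self.programs[name](*args)

vocab = ActionVocabulary()
# "To stack a block, pick it up and place it on the tower."
vocab.teach("stack", [("pick", "block"), ("place", "tower")])
print(vocab.run("stack"))  # → ['pick(block)', 'place(tower)']
```

The point of the sketch is only the compositional structure: a taught program refers to earlier programs by name, so the mapping from linguistic expressions to actions stays explicit at every level.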
- SDRT: A theory of discourse structure and relations (Correction, Question/Answer pairs, Explanation, ...). This theory has been developed over the course of 30 years. It has been widely used to recover discourse structure in texts and, more recently, in situated chats in the context of an online game.
- Language models to automatically parse conversations for discourse structure. We will extend earlier work on discourse parsing to exploit more recent neural language models.
- Simulated builder/constructor/robot. We use an existing neural builder to test our parses of the Minecraft corpus. We have developed a simulated environment in Webots for the COCOBOTS-specific data set, as well as a simulator for collecting new data.
- UR3/UR5 robot arms. Both Potsdam and Synergeticon have UR arms that we will use to test transfer from the simulated environment to the real world.
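To make the SDRT component concrete, here is a toy sketch (illustrative only, not the project's parser or data format) of a discourse structure as a labelled graph over elementary discourse units (EDUs), using relation names in the SDRT style:

```python
# Illustrative sketch: an SDRT-style discourse structure for a short
# builder/architect exchange, represented as EDUs plus labelled edges.

edus = {
    1: "Place a red block on the blue one.",
    2: "Which blue one?",
    3: "The one on the left.",
    4: "No, on the right.",
}

# (source, target, relation) edges; relation names follow SDRT conventions.
relations = [
    (1, 2, "Clarification_question"),
    (2, 3, "Question_answer_pair"),
    (3, 4, "Correction"),
]

def attachments(edu_id):
    """Return the relations in which a given EDU participates."""
    return [(s, t, r) for s, t, r in relations if edu_id in (s, t)]

print(attachments(3))  # → [(2, 3, 'Question_answer_pair'), (3, 4, 'Correction')]
```

A discourse parser's job, in these terms, is to predict the edge set (attachment points and relation labels) from the raw turns; the robot then needs the Correction edge to know that "on the right" overrides the earlier answer.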
So far, we have nearly completed annotating an existing corpus of interactions between an architect and a builder in a simplified Minecraft world. We have also begun collecting data for a spin-off corpus designed specifically for COCOBOTS.
We have also trained a discourse parser to identify discourse relations in the Minecraft data, and have published work on the logical concepts understood by large language models and on semantic grounding.
We expect the resulting data sets to be useful for future researchers.
The model of situated conversation will be the first designed for the kind of collaborative dialogues one can imagine taking place between a human and a collaborative robot.
We will also be the first to connect this kind of discourse analysis with automatic/neural builders.
3 papers at international conferences/workshops (with others submitted or to be submitted soon)
The goal of COCOBOTS is to develop conversational assistants and cobots capable of interacting with human coworkers in sophisticated ways. One crucial such way is through the development of a natural language programming toolkit that will allow human users to teach new actions to conversational cobots and to construct joint actions with them interactively through natural conversation. Programming through conversation would allow a human user without sophisticated programming skills or access to massive amounts of training data to program an assistant or cobot on the spot, in the way that we teach other humans, without having to rely on an expert programmer to intervene. One could try out an idea with the robot and then modify it, just as one would with another human when teaching or developing a joint action. Such a toolkit would open up a wide range of new markets for companies, such as LINAGORA, that specialize in the development of conversational assistants or cobots that need to perform actions such as alerting workers on an assembly line to malfunctioning equipment or physically intervening to fix that equipment. It would also bring increased value to companies, such as Airbus, that seek to boost their manufacturing output by adding cobots to assembly lines or maintenance tasks.
For the moment, the utility of conversational assistants and cobots is limited to carrying out commands and performing actions that are pre-defined via hard-coding or, in the case of robots, learned through demonstration or manual manipulation. A natural language programming toolkit would give a user without programming expertise the power to adapt their assistant to their needs via on-the-spot training.
To demonstrate the efficacy of our toolkit, COCOBOTS will develop a proof of concept featuring a simulated assembly cobot that is able to learn new concepts associated with manufacturing, such as a Torx, and new actions, such as how to build a certain kind of bridge, by stringing together atomic actions as instructed by a human user. Bringing together conversational models with the capacity of cobots to physically interact with their environments will be crucial for testing our approach, as we think that the capacity to understand situated conversation (and in fact, any conversation at all) is greatly enhanced by physical interaction with the outside world. Observing a robot's interaction with objects in a physical environment, or its ability to string together primitive actions based on multimodal, conversational instructions, will also provide clear criteria for evaluating our approach and for showing that following a program specified via natural language is more effective than hard-coding, demonstration, or manual manipulation of a robot.
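The concept-learning half of the proof of concept can be sketched as follows. This is a deliberately simplified, hypothetical illustration (the feature names and matching rule are invented): a new word taught in conversation is paired with observed exemplar features, so that later references can be grounded against objects in the scene.

```python
# Hypothetical sketch of on-the-spot concept teaching: pair a new word
# with observed exemplar features, then ground later uses of the word
# by picking the scene object that best matches the exemplars.

concepts = {}  # word -> list of exemplar feature dicts

def teach_concept(word, exemplar_features):
    concepts.setdefault(word, []).append(exemplar_features)

def ground(word, candidates):
    """Return the candidate object most similar to the taught exemplars."""
    exemplars = concepts.get(word, [])
    def score(obj):
        # Count feature agreements between the object and each exemplar.
        return sum(
            sum(1 for k, v in ex.items() if obj.get(k) == v)
            for ex in exemplars
        )
    return max(candidates, key=score)

# "This is a torx." (the user points at a star-socket screw)
teach_concept("torx", {"head": "star", "kind": "screw"})

scene = [
    {"id": "s1", "head": "flat", "kind": "screw"},
    {"id": "s2", "head": "star", "kind": "screw"},
]
print(ground("torx", scene)["id"])  # → s2
```

In the actual project, the subsymbolic side of this pairing would come from learned perceptual representations rather than hand-written feature dictionaries; the sketch only shows the symbolic interface between teaching and grounding.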
To develop the model of multimodal dialogue needed to make cobots truly conversational, we will build on a solid foundation of expertise of COCOBOTS members in semantic grounding (ANITI/CerCo, Airbus), dialogue models (University of Potsdam, LINAGORA, ANITI/IRIT), conversational assistants (LINAGORA) and robotics (ANITI/LAAS, Airbus). The novelty of our approach will lie in bringing together work on semantic and conversational grounding, which is generally pursued by separate communities, to develop a hybrid model that exploits the way these processes influence each other. Our approach will require us to overcome three major challenges. First, we will need to bring the compositionality of referential meaning to bear on the semantic grounding of complex expressions using a hybrid AI approach. Second, we will need to account for the different ways that the nonlinguistic environment can ground and contribute to discourse meaning. Third, we will need to develop a model of situated discourse that provides a symbolic skeleton that we can then flesh out with subsymbolic pairings of nonlinguistic content with linguistic expressions.
Project coordination
Julie Hunter (LINAGORA GRAND SUD OUEST)
The author of this summary is the project coordinator, who is responsible for the content of this summary. The ANR declines all responsibility for its content.
Partnership
LINAGORA GRAND SUD OUEST
IRIT/ANITI
Airbus Defence and Space GmbH
University of Potsdam
ANR grant: 419,799 euros
Beginning and duration of the scientific project:
September 2021
- 36 months