perSon recOgnition in Debate and broAdcast news – SODA
The REPERE call for projects is an evaluation campaign for people recognition technologies applied to French audio-visual TV shows. Competitors will have to propose systems that use the various information sources present in the shows to determine who appears in the images, who speaks, which people's names are pronounced or appear on screen, and to which people those names correspond. Addressing all these points will require a mix of skills: speaker and face recognition (detection, segmentation, clustering) to determine people's biometric characteristics (voice, face); speech recognition; character recognition; and natural language processing, in order to extract people's names and correctly associate them with people.
Two laboratories are involved in this proposal: the Laboratoire d’Informatique de l’Université du Maine (LIUM), which will be the coordinator, and the Swiss research institute IDIAP. Together, the partners' expertise covers all the topics of the challenge. Since 2004, LIUM has been developing a powerful speaker diarization system (2nd in the ESTER 2005 evaluation campaign, 1st in the ESTER 2 campaign in 2008). Since 2006, LIUM has also been working on speaker identification that uses speech transcriptions to extract names from the recordings. For its part, IDIAP has for many years developed expertise in the automatic processing of audio and video data. Within this project, IDIAP will focus mainly on people detection and recognition, as well as on character recognition (OCR). IDIAP has regularly taken part in NIST evaluation campaigns. It organized and participated in one of the tasks of the CLEAR (Classification of Events, Activities, and Relationships) evaluation in 2006 and 2007, and is currently in charge of a face and speaker recognition evaluation campaign at the international conference ICPR 2010.
The project will benefit from results previously obtained by the partners, but it will require integration efforts to build a complete system that follows the main speakers throughout a show and names them. Research will focus on combining the information coming from the various sources (acoustic signal, images, words and text). In addition, an original aspect of this project will be the recognition of people's roles (presenter, journalist or moderator, guest), of their relations and interactions (for example, who talks to whom), as well as the exploitation of these data to improve the performance of speaker/face diarization. For example, the interactions will help link the speaker to the people on screen, while the roles will facilitate speaker identification by making it possible to target the regions that contain the most useful information.
Project coordination
Sylvain Meignier (UNIVERSITE DU MAINE)
The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.
IDIAP Idiap Research Institute
ANR funding: 308,880 euros
Beginning and duration of the scientific project:
- 36 Months