DATA - Open Science Flash Call: Research Practices and Open Data

Open data, tools and challenges for speaker anonymization – Harpocrates

Submission summary

Recordings of our voice are increasingly being captured, stored and processed when we interact with voice-driven interfaces and other automated services. For the most part, this is done without any nefarious intent. Even so, voice recordings inherently contain sensitive personal information that should not be entrusted lightly to others. Examples include health and socio-economic status, geographical background, ethnicity, personality and emotion, as well as information about our social circles, family and relatives. Since this information can be exploited for ethically reprehensible purposes, safeguards are required to prevent privacy infringements.

There are two general strategies: data protection and anonymization. While the best solution naturally depends on the application, anonymization can be more flexible and cost-efficient. Anonymization techniques can strip speech signals of their personally identifiable information while retaining intelligibility and quality. Anonymized speech signals can then be processed and stored without the risk that information gleaned from them is linked back to the speaker. Unfortunately, few solutions exist, and progress in the field is hampered by the lack of open datasets and open tools. These are critical for assessing and meaningfully comparing competing approaches. Since assessing anonymization itself requires pattern recognition techniques, the lack of such datasets and tools is a significant barrier.

Harpocrates will form a working group and implementation network that will not only collect and share the very first open datasets and open tools of this kind, but will also launch, as part of the project, the first open challenges in anonymization. Open challenges will fuel progress and ensure that emerging technologies are transferred rapidly to industry, which must meet increasingly stringent privacy-preservation requirements. Anonymization will then become a key component of regulatory compliance and a central consideration in privacy-by-design methodologies.

Project coordination


The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.


LIA Laboratoire d'Informatique d'Avignon
Inria Centre de Recherche Inria Nancy - Grand Est

ANR funding: 97,026 euros
Start and duration of the scientific project: September 2019, 18 months
