CE23 - Artificial intelligence and data science 2023

Multiple-attribute disentanglement and semantic privacy – SpeechPrivacy

Submission summary

As defined by the General Data Protection Regulation (GDPR), speech data falls within the scope of personal data. Its use in applications is furthermore considered high risk in the forthcoming AI Act. Indeed, speech recordings contain much more than the spoken content (words): they also convey, e.g., the speaker's identity, sex, age and regional accent. Potentially, all such personal, private information can be estimated from speech data and then used for nefarious or unwanted purposes. Privacy-enhancing technologies are needed to protect users of speech technology by preventing the use of speech recordings for purposes to which the user has not consented.
The protection of privacy for speech data is still underdeveloped on account of numerous challenges. The SpeechPrivacy project envisions an approach that goes far beyond existing solutions. It mitigates the need to trust SLT service providers and gives users full control of their privacy, so that they can choose for themselves the privacy-sensitive attributes to which a service provider will have access. It will deliver a flexible solution to privacy preservation based on isolated/disentangled representations and the obfuscation of selected individual attributes reflecting distinct sources of personal information. The use-case scenarios are manifold: witness protection; privacy preservation for teenagers online and on social media; and the storage of health data recordings (removing patients' personal information while preserving quality for research access).
The objectives of SpeechPrivacy are: (1) specific, optimised solutions to disentangle age, sex, voice identity and regional accent, among other attributes in speech; (2) a solution able to disentangle multiple attributes, including the above, and to assess the impact of their obfuscation upon other attributes and on utility for a given use case; (3) a solution for the robust detection of sensitive words, as well as speaker age/sex information, in the linguistic content, and for their substitution in the speech signal; (4) a user interface to give users full control of the privacy vs. utility trade-off.
The research is organised into a set of distinct work packages to reach our objectives. First, we will develop solutions to obfuscate specific, single speech attributes, in an everything-but-one modelling approach. We will then extend this work to the consideration of multiple attributes and disentanglement, in a joint and adversarial learning approach. We will develop a framework in which distinct attributes are explicitly modelled by specific dimensions of a common representation, which allows selected, multiple attributes to be obfuscated simultaneously. The third work package involves semantic privacy and will address the protection of sensitive named entities and of age and sex attributes in the linguistic content, as well as their obfuscation in the acoustic signal. Finally, the design of databases, protocols and metrics, common to all work packages, will be studied in a fourth work package, together with demonstration activities designed to ensure dissemination and impact at the international level.

Vincent COLOTTE (Laboratoire lorrain de recherche en informatique et ses applications)

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.

LORIA Laboratoire lorrain de recherche en informatique et ses applications
LIA Laboratoire d'Informatique d'Avignon
EURECOM EURECOM

ANR funding: 732,726 euros
Start date and duration of the scientific project: January 2024 - 48 months
