Probabilistic Epistemic Logic Applied to Privacy – PRELAP
With the General Data Protection Regulation (GDPR), privacy enforcement is becoming critical for companies. It is also a major scientific challenge: it involves modelling the knowledge a curious attacker can obtain and determining what they can infer from it. This project uses probabilistic epistemic logic and its proof theory to automatically prove either that the privacy ensured by a system can be violated by an attacker, or that a system design protects privacy.
Project’s objectives
Estimating privacy exposure implies being able to 1) model the knowledge an attacker can obtain, and 2) determine what they can infer from it. Logical and computational methods are prime candidates for this task. To formally verify a property, logical methods require designing bespoke logics and using them to formally describe the system. In the case of probabilistic and statistical knowledge, probabilistic epistemic logics are the key tool for designing such bespoke logics. Probabilistic epistemic logics are specifically designed to describe and reason about probabilistic knowledge such as “the attacker knows with probability p that agent A knows S”. Combining them with imprecise probabilities makes it possible to model statistical knowledge. These new logics describing statistical knowledge are the necessary mathematical tools to model and analyse complex real-life examples such as the privacy exposure of users on social networks.

Our goal is to develop a family of probabilistic epistemic logics suitable for reasoning about privacy. This implies tackling four major scientific challenges:
(1) Modelling privacy case studies with epistemic logics. Our case studies are:
- MinExp: minimal exposure of personal data while collecting data through forms.
- Privacy in social networks: developing a framework to describe social networks, privacy policies on them, and the knowledge that can be inferred from them.
(2) Developing the semantics of Dynamic Epistemic Logic (DEL) with imprecise probabilities, in order to model more accurately the statistical knowledge gained via data collection when attacking privacy.
(3) Developing the proof theory of probabilistic logics. This is the scientific ground of the algorithms implemented in the software tools.
(4) Creating software tools oriented toward automated proofs. We implement our new results in the Calculus Toolbox and improve its proof-search efficiency.
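The idea of statistical knowledge expressed via imprecise probabilities can be illustrated with a small sketch. This is not project code: the representation below (a finite set of possible worlds, and an attacker's uncertainty as a credal set, i.e. a set of probability distributions rather than a single one) is a standard textbook simplification, and all names and numbers are hypothetical.

```python
# Illustrative sketch (not project code): an attacker's statistical knowledge
# as *bounds* on a probability, computed from a credal set over possible worlds.

# Possible worlds, each assigning truth values to atomic facts.
worlds = {
    "w1": {"votes_republican": True},
    "w2": {"votes_republican": False},
    "w3": {"votes_republican": False},
}

# Imprecise probabilities: the attacker's uncertainty is a set of
# probability distributions over worlds (a credal set), not a single one.
credal_set = [
    {"w1": 0.3, "w2": 0.4, "w3": 0.3},
    {"w1": 0.4, "w2": 0.3, "w3": 0.3},
]

def probability_bounds(credal_set, worlds, fact):
    """Lower and upper probability of a fact across all distributions."""
    probs = [
        sum(p for w, p in dist.items() if worlds[w][fact])
        for dist in credal_set
    ]
    return min(probs), max(probs)

lo, hi = probability_bounds(credal_set, worlds, "votes_republican")
print(f"P(votes_republican) lies in [{lo}, {hi}]")  # an interval, not a point
```

A single distribution would yield point-valued knowledge ("probability 1/3"); the credal set yields interval-valued statements ("probability between 0.3 and 0.4"), which is the kind of statistical knowledge the logics above are designed to express.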
In order to develop logics suitable for formalising a problem, we build our research on case studies designed with experts in the domain (here, privacy). For our case studies, Imprecise Probabilistic Epistemic Logic (IPEL) is our base logic to describe statistical knowledge.
To develop the semantics of IPEL, we use a methodology recently developed and applied to Probabilistic Intuitionistic Dynamic Epistemic Logic. The strategy is to first define the algebraic semantics of the logic (here, epistemic logic with imprecise probabilities), then use duality theory to derive its relational semantics.
To develop the proof theory of the logics, we use sequent calculi (more specifically, display calculi) and the so-called multi-type methodology, which was successfully used to design display calculi for Dynamic Epistemic Logics, Propositional Dynamic Logic and the Logic of Resources and Capabilities. This methodology enables us to combine well-behaved display calculi in order to obtain a proof system for a more complex logic. In our case, we already have display calculi for Dynamic Epistemic Logic; hence, the key step is to develop display calculi for imprecise probabilities. The multi-type methodology will provide a framework to develop display calculi for classical imprecise probabilities and to combine them with the existing calculi for Dynamic Epistemic Logic.
During the whole process, privacy experts are required to validate the semantics implied by the model.
We have already published work on imprecise probabilities:
[1] presents the theoretical foundations of the project: a modular two-layered modal logic to formalise reasoning based on incomplete, inconsistent, probabilistic information.
[2] develops Dempster-Shafer theory for all finite lattices. This result lays the groundwork for developing Dempster-Shafer theory for substructural logics, including paraconsistent logics.
[3] lays the groundwork for developing many-valued logics able to formalise, among other things, non-classical reasoning based on belief functions.
We are currently working on:
- implementing the proof of concept for our formalisation of the minimal exposure case study;
- finishing the article on tableaux for the logics introduced in [1]. Tableaux provide an effective algorithm to check whether a formula is valid in a given logic;
- studying the proof theory of the logics introduced in [1], more specifically we are designing display calculi for them;
- developing Dempster-Shafer theory and conditional probabilities in the context of paraconsistent reasoning.
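To give a concrete idea of the tableau method mentioned above, here is a minimal sketch for classical propositional logic only; the project's tableaux target the two-layered logics of [1], which are substantially richer. A tableau expands a set of formulas by decomposition rules, splitting into branches on disjunctions; a branch closes when it contains a contradiction, and the formulas are jointly satisfiable iff some branch stays open.

```python
# Minimal tableau-style satisfiability check for classical propositional
# logic (illustration only). Formulas are nested tuples, e.g.
# ("and", ("atom", "p"), ("not", ("atom", "p"))).

def satisfiable(branch):
    """Expand a branch (list of formulas) tableau-style.

    Returns True iff some fully expanded branch is free of contradictions,
    i.e. the conjunction of the input formulas is satisfiable."""
    for i, f in enumerate(branch):
        op = f[0]
        rest = branch[:i] + branch[i + 1:]
        if op == "and":                       # both conjuncts on the same branch
            return satisfiable(rest + [f[1], f[2]])
        if op == "or":                        # the branch splits in two
            return satisfiable(rest + [f[1]]) or satisfiable(rest + [f[2]])
        if op == "not" and f[1][0] == "not":  # double negation elimination
            return satisfiable(rest + [f[1][1]])
        if op == "not" and f[1][0] == "and":  # De Morgan: push negation inward
            return satisfiable(rest + [("or", ("not", f[1][1]), ("not", f[1][2]))])
        if op == "not" and f[1][0] == "or":
            return satisfiable(rest + [("and", ("not", f[1][1]), ("not", f[1][2]))])
    # Only literals remain: the branch closes iff it contains both p and not-p.
    atoms = {f[1] for f in branch if f[0] == "atom"}
    negated = {f[1][1] for f in branch if f[0] == "not"}
    return atoms.isdisjoint(negated)

p = ("atom", "p")
print(satisfiable([("and", p, ("not", p))]))  # contradiction: False
print(satisfiable([("or", p, ("not", p))]))   # tautology: True
```

Probabilistic and two-layered logics additionally require rules handling the probability operators and side conditions on the numeric constraints, which is precisely what the tableaux article develops.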
[1] Marta Bílková, Sabine Frittella, Ondrej Majer, Sajad Nazari: Belief Based on Inconsistent Information. DaLí 2020: 68-86. arxiv.org/abs/2003.12906
[2] Sabine Frittella, Krishna Manoorkar, Alessandra Palmigiano, Apostolos Tzimoulis, Nachoem Wijnberg: Toward a Dempster-Shafer theory of concepts, International Journal of Approximate Reasoning, Volume 125, 2020, Pages 14-25, ISSN 0888-613X, doi.org/10.1016/j.ijar.2020.05.004.
[3] Willem Conradie, Sabine Frittella, Krishna Manoorkar, Sajad Nazari, Alessandra Palmigiano, Apostolos Tzimoulis, Nachoem M. Wijnberg: Rough concepts, Information Sciences, 2020, ISSN 0020-0255, doi.org/10.1016/j.ins.2020.05.074.
Privacy is a key concern in the digital age. As witnessed by the recent Cambridge Analytica scandal during the last US elections, privacy leaks can have far-reaching consequences. Before the elections, the company retrieved private data from a personality questionnaire filled in by social network users. Those users did not fully appreciate the consequences of disclosing private information: giving access to their private data also gave access to private data others shared with them. Not only did the experts from the company exploit the raw information given by the users, but also the knowledge they inferred from it: probabilistic knowledge such as “Tom votes Republican with a probability of 1/3” and statistical knowledge such as “the group of people formed by Tom’s friends votes Democrat with a probability between 0.9 and 1”. The company used this knowledge to identify specific groups of citizens and influence them via targeted advertisement. There is currently no means to anticipate and prevent such privacy exposure. Data policies exist to enforce privacy, but none of them addresses this kind of attack, which combines personal data with prior statistical knowledge. The goal of this project is to provide mathematical and software tools to study such attacks by estimating possible privacy exposure.
Estimating privacy exposure implies being able to 1) model the knowledge a curious attacker can obtain, and 2) determine what they can infer from it. Logical and computational methods are prime candidates for this task. Indeed, they are standard tools to automatically determine whether a protocol, a policy or a piece of software satisfies certain key properties such as “Agent A can infer the voting intentions of Agent B from dataset D”. To formally verify a property, logical methods require designing bespoke logics and using them to formally describe the system. In the case of probabilistic and statistical knowledge, probabilistic epistemic logics are the key tools for designing such bespoke logics. They are specifically designed to reason about probabilistic knowledge such as “the attacker knows with probability p that agent A knows S”. Combining them with imprecise probabilities makes it possible to model statistical knowledge. These new logics describing statistical knowledge are the necessary mathematical tools to model and analyse complex real-life examples such as the privacy exposure of users on social networks.
Combining epistemic logics with imprecise probabilities means integrating logical and probabilistic reasoning; this has been a major item on the research agenda in mathematical logic since Boole’s Laws of Thought. Since the 1980s, various probabilistic logics have been introduced to fulfil this purpose, and they have been invaluable in areas such as decision theory and the representation and update of knowledge and belief. However, some problems remain unsolved: (1) there is no epistemic logic for statistical knowledge, and (2) no proof theory exists for probabilistic epistemic logics, so proof automation is impossible.
The project unites a unique team of experts in mathematical logic, proof theory, formal methods and privacy to tackle these problems. We are going to:
(1) develop epistemic logics able to model statistical knowledge;
(2) design proof systems for probabilistic logics;
(3) implement proof assistants based on these proof systems;
(4) study two real-life case studies, in order to be able to propose adequate privacy protecting algorithms and policies. Our case studies are limited data collection via forms and privacy policies on social networks.
Project coordination
Sabine Frittella
The author of this summary is the project coordinator, who is responsible for the content of this summary. The ANR declines any responsibility as for its contents.
Partnership
LIFO Sabine Frittella
ANR funding: 205,986 euros
Beginning and duration of the scientific project: December 2019 - 48 months