Devising Intermediate representations and Semantics of COncurrency for Verified compilERs
Pervasive in modern software and hardware systems, shared-memory concurrency has often been ignored in safety-critical system designs. This is mainly due to the complexity of its semantics, that takes root in the aggressiveness of the optimizations that compilers and hardware perform on the code. DISCOVER leverages recent foundational work on formal verification and proof assistants to design, implement and verify compilation techniques used for high-level concurrent and managed programming languages. The correctness proof of the compilation from source to target dramatically changes the safety-critical application landscape, relieving the need for costly manual inspection of source and binary. The long-term goal of DISCOVER is to devise new formalisms and proof techniques able to scale to the mechanized correctness proof of a compiler involving a richer class of optimizations, leading to efficient and scalable applications, written in higher-level languages than those currently used.
The distinguishing approach of DISCOVER lies in devising appropriate abstraction layers, relying heavily on language-based techniques and intermediate representations of programs for designing semantic models, program and program transformation verification techniques. We argue that this is the corner-stone of scalable proof techniques.
The CompCertSSA middle-end, based on CompCert (a C compiler verified in Coq), is the experimental platform in which we have integrated several of our main results: a dominance-based proof technique providing a toolbox of lemmas for reasoning by induction on the control-flow paths taken by executions, as well as a verified fast dominance test, and a semantic property of non-interference between program variables, enabling variable reuse for program optimization when going out of SSA.
We have mechanized a correctness proof for a Concurrent Garbage Collector, programmed in an IR with adequate abstraction mechanisms to reason about the runtime environment. The proof has been done using a Rely-Guarantee program logic, that we designed to be concise, partly automated, as well as incremental.
We defined formal semantic models for program IR that are more flexible than the traditional control-flow graphs (Sea-of-Nodes and Monadic Gated SSA), that allow to abstract away non-essential dependencies between instructions, in particular, the order in which instructions are written in the program.
On the whole, we achieved several important results on program intermediate representations for compiler optimizations, as well as tools to reason about the functional correctness of important artefacts of concurrent runtime environments. In particular, the formal proof of the concurrent garbage collector is a major achievement. Our formalization of the Sea-of-Node is also an important stepping stone towards semantics models that support both aggressive optimizations and lightweight reasoning principles.
We are expecting that our semantic study of dependency-based IRs – both Sea-of-Nodes and Monadic Gated SSA - and of memory access optimizations on top of these, will give us a deeper understanding of the kind of compiler-induced relaxations one can observe in a shared-memory setting.
- Yannick Zakowski, David Cachera, Delphine Demange, Gustavo Petri, David Pichardie, et al.. Verifying a Concurrent Garbage Collector with a Rely-Guarantee Methodology. Journal of Automated Reasoning, Springer Verlag, 2018
- Delphine Demange, Yon Ferna´ndez de Retana, David Pichardie. Semantic reasoning about the sea of nodes. CC 2018 - 27th International Conference on Compiler Construction, Feb 2018, Vienna, Austria.
- Yannick Zakowski, David Cachera, Delphine Demange, David Pichardie. Verified Compilation of Linearizable Data Structures: Mechanizing Rely Guarantee for Semantic Refinement. SAC 2018 - The 33rd ACM/SIGAPP Symposium On Applied Computing, Apr 2018, Pau, France.
- Yannick Zakowski, David Cachera, Delphine Demange, Gustavo Petri, David Pichardie, et al.. Verifying a Concurrent Garbage Collector using a Rely-Guarantee Methodology. ITP 2017 - 8th International Conference on Interactive Theorem Proving, Sep 2017, Brasilia, Brazil.
- Delphine Demange, Yon Fernandez de Retana, Mechanizing conventional SSA for a verified destruction with coalescing. 25th International Conference on Compiler Construction, Mar 2016, Barcelona, Spain.
- Delphine Demange, David Pichardie, Léo Stefanesco. Verifying Fast and Sparse SSA-based Optimizations in Coq. 24th International Conference on Compiler Construction, CC 2015, 2015, London, United Kingdom.
- Sandrine Blazy, Delphine Demange, David Pichardie. Validating Dominator Trees for a Fast, Verified Domainance Test. Dominance Test. Interactive Theorem Proving, Aug 2015, Nanjing, China.
- Gurvan Cabon, David Cachera, David Pichardie. An extended Buffered Memory Model with Full Reorderings. Reorderings. FtFjp - Ecoop workshop, Jul 2016, Rome, Italy.
- Yon Fernandez de Retana. Vers la compilation vérifiée de Sea-of-Nodes : propriétés et raisonnement sémantiques. Manuscrit de thèse, Université Rennes 1, 2018. Français.
Shared-memory concurrency is pervasive in modern software and multiprocessors hardware systems, yet it remains a challenge to program. Standard multiprocessors (x86, Power, ARM etc) and mainstream programming languages (C, C++, Java) have subtle, weakly consistent, memory models, exposing complex and counter-intuitive behaviors to the programmer. This complexity is rooted in the aggressiveness of the optimizations performed by compilers and hardware, which make programs behave as if some of the instructions would execute out-of-order. This complex semantics makes programming in such a setting particularly error-prone. Hence the need to establish program correctness is very strong.
In this context, traditional techniques, such as the one used in the safety-critical industry hardly apply: manual inspection of binary and source code would be too costly, and testing multithreaded application is particularly hard, as relaxations exposed by the weak memory models occur with a very low frequence, and are almost never reproducible. Thus, proving program correctness statically, by means of static analysis, deductive program proof or model checking, becomes an attractive approach due to its exhaustive cover of possible executions. Recent years have witnessed considerable progress in the formalization of weak memory models for both hardware and languages, and verification techniques, based on program logic, start to emerge. However, these verification tools are usually applied at the source level of programs, and miscompilation (i.e. a bug in the compiler) could invalidate those hardwon guarantees.
Compiler correctness for shared-memory concurrency remains a wide-open problem, compared to the sequential case, and Leroy et al.'s ground-breaking achievements with the CompCert verified C compiler. Indeed, besides the challenges to verification that application-level concurrency introduces, components of the implementation may also run concurrently with application threads, improving scalability and performance, at the cost of additional complexity. The interaction among components is typically regulated by compiler-injected code that maintains and updates information useful to the runtime system during program execution. Examples of such code include handlers for memory allocation fast paths, read and write barriers for memory management, fences, etc. These concurrent snippets are sophisticated, can contain data-races (i.e. simultaneous, conflicting accesses to shared-memory), and must operate correctly in an environment subject to program transformations.
The DISCOVER project aims at leveraging recent foundational work on formal verification and proof assistants to design, implement and verify compilation techniques used for high-level concurrent and managed programming languages. The ultimate goal of DISCOVER is to devise new formalisms and proof techniques able to scale to the mechanized correctness proof of a compiler involving a rich class of optimizations, leading to efficient and scalable applications, written in higher-level languages than those currently handled by cutting-edge verified compilers.
In the light of recent work in optimizations techniques used in production compilers of high-level languages, control-flow-graph based intermediate representations seems too rigid. Indeed, the analyses and optimizations in these compilers work on more abstract representations, where programs are represented with data and control dependencies. The most representative representation is the sea-of-nodes form, used in the Java Hotspot Server Compiler, and which is the rationale behind the highly relaxed definition of the Java memory model. DISCOVER proposes to tackle the problem of verified compilation for shared-memory concurrency with a resolute language-based approach, and to investigate the formalization of adequate program intermediate representations and associated correctness proof techniques.
Madame Delphine Demange (Institut de recherche en informatique et systèmes aléatoires / Université de Rennes 1)
The author of this summary is the project coordinator, who is responsible for the content of this summary. The ANR declines any responsibility as for its contents.
IRISA / UR1 Institut de recherche en informatique et systèmes aléatoires / Université de Rennes 1
Help of the ANR 301,649 euros
Beginning and duration of the scientific project: September 2014 - 48 Months