\lettrine[nindent=0ex]{H}{igh-level} programming languages, in particular the lazy, pure, functional kind, liberate the programmer from having to think about the low-level details of how his code is executed, and give the compiler extra leeway in optimising the program. This distance to the actual machine makes it harder to reason about the effect of the compiler’s transformations on the program’s performance. Therefore, such transformations are often evaluated only empirically, by measuring the performance of a few benchmark programs. This yields useful evidence, but not universal assurance. Formal semantics of programming languages can serve as guide rails for the implementation of a compiler, and formal proofs can universally show that the compiler does not inadvertently change the meaning of a program. Can they also be used effectively to establish that a program transformation performed by the compiler is indeed an optimisation? In this thesis, I answer this question in three steps: I develop a new compiler transformation; I build the tools to analyse it in an interactive theorem prover; finally, I prove the safety of the transformation, i.e.\@ that the transformed program -- in a suitable abstract sense -- performs at least as well as the original one.

\medskip

My compiler transformation and accompanying program analysis, \emph{Call Arity}, which now ships with the Haskell compiler GHC, solves a long-standing problem with the \emph{list fusion} program transformation: accumulator-passing list consumers like \hi{foldl} and \hi{sum} would, if they were allowed to take part in list fusion, produce badly performing code; a sketch further below illustrates the problem and the fix. Call Arity empowers the compiler to rewrite such code further, by eta-expanding function definitions, into a form that runs efficiently again. The key ingredient is a novel cardinality analysis based on the notion of \emph{co-call graphs}, which can detect whether a variable is used at most once, even in the presence of recursion. I provide empirical evidence that my analysis indeed solves the problem: list fusion can now provide significant improvements in these cases. The measurements also show that there are instances besides list fusion where the transformation fires and improves the program. No program in the benchmark suite regressed as a result of introducing Call Arity.

\medskip

In order to verify these statements formally, I formalise Launchbury’s natural semantics for lazy evaluation in the interactive theorem prover Isabelle. As Launchbury’s semantics is a successful and widely accepted semantics for the lambda calculus with mutually recursive let-bindings that models lazy evaluation, it is a natural choice for this endeavour. My formalisation uses nominal logic, in the form of the Isabelle package Nominal2, to handle the issue of names and binders, which is generally one of the main hurdles in any formalisation work on programming languages. It is one of the largest Isabelle developments using this method, and the first to combine it effectively with the HOLCF package for domain theory. My first attempt at this combination turned out to be a dead end; I explain why it failed and how I eventually overcame the challenges.
%\enlargethispage{1cm}
Furthermore, I give the first rigorous adequacy proof of Launchbury’s semantics. The proof sketch given by Launchbury has resisted past attempts to complete it; by deviating slightly from his outline, I found a more elegant and direct proof.
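For orientation, the heart of Launchbury’s semantics is a judgement $\Gamma : e \Downarrow \Delta : v$, stating that the expression $e$, evaluated in the heap $\Gamma$, reduces to the value $v$ and the modified heap $\Delta$. In the notation of Launchbury’s original paper (my formalisation differs in details, e.g.\@ in the treatment of names, and I elide Launchbury’s renaming of the result), the rules for variables and let-bindings read roughly
\[
\frac{\Gamma : e \Downarrow \Delta : v}
     {(\Gamma, x \mapsto e) : x \Downarrow (\Delta, x \mapsto v) : v}
\qquad
\frac{(\Gamma, x_1 \mapsto e_1, \ldots) : e \Downarrow \Delta : v}
     {\Gamma : \mathsf{let}\ x_1 = e_1, \ldots\ \mathsf{in}\ e \Downarrow \Delta : v}.
\]
The variable rule captures both laziness and sharing: the binding for $x$ is removed from the heap while its right-hand side is evaluated, and the heap is updated with the resulting value afterwards, so the evaluation happens at most once.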
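To make the problem with accumulator-passing consumers concrete, here is the sketch promised above. Under foldr/build list fusion, \hi{foldl} can be made a good consumer via the well-known reformulation in terms of \hi{foldr}; after fusion, this leaves a recursive function that takes one argument and returns a function, which the compiler turns into code allocating a function closure per list element. The following is a simplified illustration of the shapes involved, not actual GHC Core:
\begin{verbatim}
-- foldl as a good consumer, defined via foldr:
foldl k z xs = foldr (\x fn -> \acc -> fn (k acc x)) id xs z

-- After fusing sum [0..n], the code has roughly this shape:
-- a recursive function of arity one that returns a lambda.
let go x = \acc -> if x > n then acc else go (x + 1) (acc + x)
in  go 0 0

-- Eta-expanding go to arity two, which Call Arity establishes
-- to be safe, removes the intermediate closures:
let go x acc = if x > n then acc else go (x + 1) (acc + x)
in  go 0 0
\end{verbatim}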
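The co-call graph domain deserves a word of explanation. Its elements are undirected graphs on variables, where an edge between $x$ and $y$ records that $x$ and $y$ may both be called during a single evaluation; in particular, a variable without a self-edge is called at most once. In simplified notation (the thesis gives the precise equations), writing $G(e)$ for the co-call graph of $e$ and $V(e)$ for the set of variables that $e$ may call, the analysis combines intermediate results for instance as
\[
G(e_1\,e_2) = G(e_1) \cup G(e_2) \cup \bigl(V(e_1) \times V(e_2)\bigr),
\]
\[
G(\mathsf{if}\ e\ \mathsf{then}\ e_1\ \mathsf{else}\ e_2) = G(e) \cup G(e_1) \cup G(e_2) \cup \bigl(V(e) \times (V(e_1) \cup V(e_2))\bigr),
\]
where $\times$ denotes the complete bipartite graph between two sets of variables. Because the two branches of a conditional are not crossed with each other, a variable occurring in both branches still carries no self-edge and is thus recognised as being called at most once.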
\medskip

Equipped with this formalisation, I model the Call Arity analysis and transformation in Isabelle and prove that it does not degrade program performance. My abstract measure of performance is the number of allocations performed by the program; I explain why this is a suitable choice for my use case. The proof is modular and introduces \emph{trace trees} -- sketched at the end of this overview -- as a suitable domain for abstract cardinality analyses. Every formal development, whether machine-checked or not, has a formalisation gap between the model and the modelled artefact. I discuss the breadth of this gap, and in particular its limits, given that Call Arity is but one part of a large, real-world compiler.
%I give examples of bugs that were found in the formalisation process, and an example of one that slipped through the formalisation gap.

\medskip

All in all, I present novel program analyses that solve an open problem with \emph{list fusion} and generally improve the compiler, and I demonstrate how formal methods can be used to prove an operational property -- safety -- at this high level.
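To close, here is a flavour of the trace tree domain mentioned above, rendered as a Haskell datatype purely for illustration; the thesis defines the domain in Isabelle/HOLCF, so the names here are hypothetical:
\begin{verbatim}
-- Illustrative only: a trace tree records the possible sequences
-- of calls an expression may perform. Each edge is labelled with
-- the variable called at that step; every path from the root is
-- one possible call sequence, and trees may be infinite.
newtype TTree v = TTree [(v, TTree v)]

-- A tree for code that either calls f once, or calls g twice:
example :: TTree String
example = TTree [ ("f", TTree [])
                , ("g", TTree [("g", TTree [])]) ]
-- On every path, f occurs at most once; g does not.
\end{verbatim}
A cardinality analysis can then be phrased abstractly over this domain: a variable is called at most once if it occurs at most once on every path of the tree.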