Typed Self-Evaluation via Intensional Type Functions

1. Introduction
2. Step 1: Installation
3. Step 2: Evaluation

The full paper is available here. The appendix contains proofs of all theorems stated in the paper.

1 Introduction

This page describes the software artifact for the paper Typed Self-Evaluation via Intensional Type Functions by Matt Brown and Jens Palsberg and accepted to POPL'17.

The paper presents an F$_\omega^{\mu i}$, an extension of F$_\omega$ with recursive types and a Typecase type operator that enables intensional type functions. We use these features to define a GADT-style typed self-representation and program typed self-evaluators for F$_\omega^{\mu i}$. The purpose of the artifact is to provide tools that help build, type check, and experiment with representations, our self-evaluators, and other benchmark meta-programs. The artifact includes:

An implementation of F$_\omega^{\mu i}$ in Haskell that includes a parser, type checker, quoter, evaluator, and beta-equivalence checker. The parser includes syntactic sugar for defining abbreviations, loading definitions from other files, building representations, and normalizing terms.
Source code in our concrete syntax for all the operations described in the paper.
A suite a F$_\omega^{\mu i}$ test files.

2 Step 1: Installation

2.1 Requirements

The tests should be run on a modern Unix-like system (e.g. Linux or Mac OS X) with at least 8 Gigabytes of RAM. We assume that the Haskell build tool stack is installed. Click here for stack installation instructions.

2.2 Get the sources

Download from http://compilers.cs.ucla.edu/popl17/popl17-artifact.zip

Run the commands:

$ unzip popl17-artifact.zip
$ cd popl17-artifact/

2.3 Build the F$_\omega^{\mu i}$ implementation

Once stack is installed, he following commands will download and compile all the Haskell dependencies, and build and test the F$_\omega^{\mu i}$ implementation. Make sure to run these from within the popl17-artifact directory.

$ stack setup
$ stack build
$ stack test

3 Step 2: Evaluation

3.1 Description of concrete syntax

Our parser supports the declaration of abbreviations via the decl directive. For example, the following example (taken from fomega/lib/Id.fw) declares id to be the polymorphic application function, and Id to be its type.

decl Id : *  = ∀A:*. A → A;
decl id : Id = ΛA:*. λx:A. x;

We can load all the declarations from a file in fomega/lib/ with the load directive. For example, the following example (taken from fomega/id.fw) loads the above declarations from fomega/lib/Id.fw and then applies id to itself.

load "Id";
id Id id

Note that fomega/id.fw ends with a term, while fomega/lib/Id.fw does not. The files in fomega/lib are called libraries and consist of zero or more load and decl statements. The files in fomega/ are called programs and consist of zero or more load and decl statements followed by a term to be evaluated.

Another important difference between the artifact and the paper is the notation for representations of terms. The paper uses an overline notation $\overline{\cdot}$ to denote the representations of terms and types, while the artifact uses $[\cdot]$. The artifact does not have syntax for pre-representations of terms.

	Syntax in Paper	Syntax in Artifact
Representation of a type	$\overline{\forall \alpha:*. \alpha \to \alpha}$	`[∀A:*. A → A]`
Representation of a term	$\overline{\Lambda \alpha:*. \lambda x:\alpha. x}$	`[ΛA:*. λx:A. x]`

3.2 The Test Types

This artifact includes three types of test. Each type of test is to be run by the corresponding test runner. The types are:

Typecheck tests, which check if a program typechecks successfully. These tests are run with the bin/runTypecheckTest.sh test runner. Examples:
```
$ bin/runTypecheckTest.sh fomega/typecheckTestSuccess.fw
Succeeded.
$ bin/runTypecheckTest.sh fomega/typecheckTestFailure.fw
runTypecheckTest: unexpected app type: 
l = Var (Der (Var (Der (Star (Der (Box TypeOfBox)))) A0)) x0
ty l = Var () A0
```
The second example attempts to typecheck the term \A:*. \x:A. x x, which fails. This demonstrates that the test runner does in fact detect and report type errors (i.e. it doesn't simply output "Succeeded." for all inputs). It also demonstrates that our type errors are not user-friendly. In this artifact, we will not use the error messages. We use typecheck tests only to determine whether a program type checks.
An equivalence test, which checks if two terms or types are $β$-equivalent. The program should be a Church pair or terms or types. The runner extracts each component, normalizes them, and tests that the normal forms are $α$-equivalent. These tests are run with the bin/runEquivTest.sh test runner.
```
$ bin/runEquivTest.sh fomega/equivTestSuccess.fw
Succeeded.
$ bin/runEquivTest.sh fomega/equivTestFailure.fw
Failed.
Not alpha-equivalent: 
   fold (λNatF2001:* → ...
   fold (λNatF2079:* → ...
```
Again, the second test demonstrates that our equivalence test runner does catch and report failures when the two terms are not equivalent. In this case, it fails because the encoding of 0 is not equivalent to the encoding of 1. Again, we will not use these error messages.
A normalization test, which tests that a particular term normalizes. These tests are run with the bin/runNormalizationTest.sh test runner.
```
$ bin/runNormalizeTest.sh fomega/normalizeTestSuccess.fw
Normalizing...
Succeeded.
$ bin/runNormalizeTest.sh fomega/normalizeTestFailure.fw 
Normalizing...
runNormalizeTest: Heap exhausted;
Current maximum heap size is 2147483648 bytes (2048 MB);
use `+RTS -M<size>' to increase it.
```
The first test normalizes the term eqNat (plus one one) two, which involves recursion but is normalizing. The terms eqNat, plus, one and two are defined in fomega/lib/Nat.fw. The second test normalizes eqNat itself, which is not normalizing. It is a recursive function, and the normalizer keeps unrolling the fixpoint combinator until it eventually runs out of memory.

3.3 The Tests

We now describe the various test files included in the artifact, and relate them to the results in the paper. You can run each test individually, or run them all via the command:

$ bin/runAllTests.sh

3.3.1 Section 5

Theorem 5.1 states that any closed and well-typed term can be represented, and if the original term has type T, then its representation will have type Exp T. The following test includes several terms of increasing size and complexity.

$ bin/runTypecheckTest.sh fomega/Theorem5-1.fw

Theorem 5.2 states that all representations are strongly normalizing, including those of non-normalizing terms. The test fomega/Theorem5-2.fw involves normalizing the representations of several non-normalizing terms.

$ bin/runNormalizeTest.sh fomega/Theorem5-2.fw

3.3.2 Section 6

In Section 6 we discuss our self-evaluators: a weak-head normal form evaluator, a self-step left-most beta-reducer, and an implementation normalization by evaluation (NbE). In this section, we show some of the tests that demonstrate their correctness.

Weak head normal evaluation
The first test shows that the weak head normal form evaluator type checks with the type ∀T:*. Exp T → Exp T.
```
$ bin/runTypecheckTest.sh fomega/whnf.fw
```
Next, we show some behavioral tests. The terms weak head normal forms are abstractions over terms or types, or fold expressions. The evaluator does not reduce under the abstraction or fold. These tests demonstrate this.
```
$ bin/runEquivTest.sh fomega/whnf-term-abstraction.fw
$ bin/runEquivTest.sh fomega/whnf-type-abstraction.fw
$ bin/runEquivTest.sh fomega/whnf-fold.fw
```
The next set of unit tests show the WHNF evaluator performing some reductions.
```
$ bin/runEquivTest.sh fomega/whnf-term-application.fw
$ bin/runEquivTest.sh fomega/whnf-type-application.fw
$ bin/runEquivTest.sh fomega/whnf-unfold.fw
```
The next test is a larger test that exercises all of the evaluator.
```
$ bin/runEquivTest.sh fomega/whnf-fact.fw
```
Single-step left-most reduction
Our next self-evaluator performs a single step of left-most reduction.

The first test shows that the evaluator has the type (∀T:*. Exp T → Exp T).
```
$ bin/runTypecheckTest.sh fomega/step.fw
```
The first set of behavioral tests shows that a term in normal form is returned unchanged.
```
$ bin/runEquivTest.sh fomega/step-nf1.fw
$ bin/runEquivTest.sh fomega/step-nf2.fw
$ bin/runEquivTest.sh fomega/step-nf3.fw
```
The second set of behavioral tests demonstrate which redex step reduces.
```
$ bin/runEquivTest.sh fomega/step-redex1.fw
$ bin/runEquivTest.sh fomega/step-redex2.fw
$ bin/runEquivTest.sh fomega/step-redex3.fw
```
For a larger test, we use step to repeatedly reduce a term until a normal form is reached.
```
$ bin/runEquivTest.sh fomega/stepNorm-plus.fw
```
Normalization by Evaluation
The first test shows that our implementation of Normalization by Evaluation has the type (∀T:*. Exp T → Exp T).
```
$ bin/runTypecheckTest.sh fomega/nbe.fw
```
Next, we test the behavior of NbE by comparing its output with the expected results. To do this, we combine our syntax for building representations [ ... ] with the syntax for normalizing terms (using the meta-level evaluator) < ... >. For example, [<plus two three>] first normalizes plus two three, and then builds the representation of the normal form.
```
$ bin/runEquivTest.sh fomega/nbe-plus.fw
$ bin/runEquivTest.sh fomega/nbe-times.fw
$ bin/runEquivTest.sh fomega/nbe-fact.fw
```

3.3.3 Section 7

In Section 7 we discuss our benchmark meta-programs and the self-application of our self-evaluators. In this section we provide tests to demonstrate each of these points.

Benchmark meta-programs

We re-implement the five benchmark self-applicable meta-programs from our POPL'16 paper. They are the self-interpreter unquote that recovers a term from its representation, a continuation-passing-style transformation cps, a measure of the size of the term, a function nf that checks if its input represents a term in normal form, and a simple syntactic check isAbs. All except nf are implemented as a fold over the representation. It is possible to nf as a fold, but then it would only work on representations of closed terms. This version supports representations of open or closed terms.
1. foldExp
  Figure 15 in the paper shows the type of foldExp, the fold function used for these benchmarks. The following test verify that foldExp type checks with that type.
```
$ bin/runTypecheckTest.sh fomega/foldExp.fw
```
  The following tests that each of the 5 benchmark meta-programs type checks with the expected type.
```
$ bin/runTypecheckTest.sh fomega/popl16-benchmarks.fw
```
2. unquote
  We show four tests of the behavior of unquote. The test fomega/unquote-id.fw unquotes the polymorphic identity function. The test fomega/unquote-plus.fw unquotes the representation of plus and applies the result. We can't compare the unquoted plus with the original directly, since they are non-normalizing terms. The third unquotes the representation application of times two three. The fourth unquotes the representation of unquote itself, and uses the result to unquote the representation of times two three.
```
$ bin/runEquivTest.sh fomega/unquote-id.fw
$ bin/runEquivTest.sh fomega/unquote-plus.fw
$ bin/runEquivTest.sh fomega/unquote-times.fw
$ bin/runEquivTest.sh fomega/unquote-selfApply.fw
```
3. CPS
  We use a series of equivalent tests to demonstrate the behavior of cps on the representation of the polymorphic identity function ΛA:*. λx:A. x. Each test in the series simplifies the expected result.
```
$ bin/runEquivTest.sh fomega/cpsId1.fw
$ bin/runEquivTest.sh fomega/cpsId2.fw
$ bin/runEquivTest.sh fomega/cpsId3.fw
$ bin/runEquivTest.sh fomega/cpsId4.fw
```
  Here's another example on a slightly larger input:
```
$ bin/runEquivTest.sh fomega/cpsIdId1.fw
$ bin/runEquivTest.sh fomega/cpsIdId2.fw
$ bin/runEquivTest.sh fomega/cpsIdId3.fw
$ bin/runEquivTest.sh fomega/cpsIdId4.fw
$ bin/runEquivTest.sh fomega/cpsIdId5.fw
$ bin/runEquivTest.sh fomega/cpsIdId6.fw
$ bin/runEquivTest.sh fomega/cpsIdId7.fw
$ bin/runEquivTest.sh fomega/cpsIdId8.fw
```
  The following test type checks the self-application of cps. We do not check the result of the self-application, since it is non-normalizing.
```
$ bin/runTypecheckTest.sh fomega/cps-selfApply.fw
```
4. nf
  We test that our normal-form checker on several non-normal form terms, and on several normal form terms. In fomega/nf-false-3.fw we test that the term plus three four is not normal. However, it is normalizing, and in fomega/nf-true-3.fw we use our meta-level syntax for normalizing to test that it does evaluate to a normal form.
```
$ bin/runEquivTest.sh fomega/nf-false-1.fw
$ bin/runEquivTest.sh fomega/nf-false-2.fw
$ bin/runEquivTest.sh fomega/nf-false-3.fw
$ bin/runEquivTest.sh fomega/nf-true-1.fw
$ bin/runEquivTest.sh fomega/nf-true-2.fw
$ bin/runEquivTest.sh fomega/nf-true-3.fw
```
  The following test self-applies nf:
```
$ bin/runEquivTest.sh fomega/nf-selfApply.fw
```
5. size
  Our size operation counts the number of variables, (term or type) abstractions, (term or type) applications, folds, or unfolds.
```
$ bin/runEquivTest.sh fomega/size1.fw
$ bin/runEquivTest.sh fomega/size2.fw
$ bin/runEquivTest.sh fomega/size3.fw
```
  We can use our size measure to show that plus three four is at least as large as its normal form:
```
$ bin/runEquivTest.sh fomega/size-nbe.fw
```
  The application of size to itself is well-typed and normalizing. We do not check the result of the self-application, as it is not practical to express the expected result.
```
$ bin/runNormalizeTest.sh fomega/size-selfApply.fw
```
6. isAbs
  The following tests demonstrate the behavior of isAbs on each syntactic form of terms, other than variables. We can't test it on variables because only closed terms can be represented.
```
$ bin/runEquivTest.sh fomega/isAbs-abs.fw
$ bin/runEquivTest.sh fomega/isAbs-app.fw
$ bin/runEquivTest.sh fomega/isAbs-tabs.fw
$ bin/runEquivTest.sh fomega/isAbs-tapp.fw
$ bin/runEquivTest.sh fomega/isAbs-fold.fw
$ bin/runEquivTest.sh fomega/isAbs-unfold.fw
```
  The following test shows that isAbs can be self-applied.
```
$ bin/runEquivTest.sh fomega/isAbs-selfApply.fw
```
Self-application tests
At the end of Section 7 we discuss how each of our self-evaluators can be applied to its own representation. In each case, the result is a representation. We test that the result has the same behavior as the original self-evaluator by unquoting it and running it on an example term. The term fact five exercises all parts of the language.
```
$ bin/runEquivTest.sh fomega/whnf-selfApply.fw
$ bin/runEquivTest.sh fomega/step-selfApply.fw
$ bin/runEquivTest.sh fomega/nbe-selfApply.fw
```

3.3.4 Correctness of weak head-normal evaluator

We included as supplementary material a correctness proof of our weak head normal form evaluator. Several lemmas in the proof are machine-checked by our equivalence checker. The following tests will check those steps.

For these tests, we make use of an additional meta-level feature, unevaluated constants, which are implemented as free variables. We support both type and term constants. The declaration const A : *; declares a type constant A of kind star; The declaration const fix : (∀A:*. (A → A) → A); declares a term constant fix of type (∀A:*. (A → A) → A). We use fix constants to test the behavior of our evaluator, which as a recursive function does not normalize. In particular, we use combinators fix1 and fix2, each of which has type (∀A:*. (A → A) → A) → (∀A:*. (A → A) → A), to unfold a fixpoint combinator one or two times respectively. This enables us to control how many times a recursive function like our evaluator is unfolded.

$ bin/runEquivTest.sh fomega/whnfCorrectEvalEvalV.fw
$ bin/runEquivTest.sh fomega/whnfCorrectAbs.fw
$ bin/runEquivTest.sh fomega/whnfCorrectAppAbs.fw
$ bin/runEquivTest.sh fomega/whnfCorrectTAbs.fw
$ bin/runEquivTest.sh fomega/whnfCorrectTAppTAbs.fw
$ bin/runEquivTest.sh fomega/whnfCorrectFold.fw
$ bin/runEquivTest.sh fomega/whnfCorrectUnfoldFold.fw