# Typed Self-Evaluation via Intensional Type Functions

## Table of Contents

The full paper is available here. The appendix contains proofs of all theorems stated in the paper.

## 1 Introduction

This page describes the software artifact for the paper Typed Self-Evaluation via Intensional Type Functions by Matt Brown and Jens Palsberg and accepted to POPL'17.

The paper presents an F\(_\omega^{\mu i}\), an extension of F\(_\omega\) with recursive types and a Typecase type operator that enables intensional type functions. We use these features to define a GADT-style typed self-representation and program typed self-evaluators for F\(_\omega^{\mu i}\). The purpose of the artifact is to provide tools that help build, type check, and experiment with representations, our self-evaluators, and other benchmark meta-programs. The artifact includes:

- An implementation of F\(_\omega^{\mu i}\) in Haskell that includes a parser, type checker, quoter, evaluator, and beta-equivalence checker. The parser includes syntactic sugar for defining abbreviations, loading definitions from other files, building representations, and normalizing terms.
- Source code in our concrete syntax for all the operations described in the paper.
- A suite a F\(_\omega^{\mu i}\) test files.

## 2 Step 1: Installation

### 2.1 Requirements

The tests should be run on a modern Unix-like system (e.g. Linux or
Mac OS X) with at least 8 Gigabytes of RAM. We assume that the Haskell
build tool `stack`

is installed. Click here for `stack`

installation
instructions.

### 2.2 Get the sources

- Download from http://compilers.cs.ucla.edu/popl17/popl17-artifact.zip
Run the commands:

$ unzip popl17-artifact.zip $ cd popl17-artifact/

### 2.3 Build the F\(_\omega^{\mu i}\) implementation

Once `stack`

is installed, he following commands will download and
compile all the Haskell dependencies, and build and test the
F\(_\omega^{\mu i}\) implementation. Make sure to run these from within
the `popl17-artifact`

directory.

$ stack setup $ stack build $ stack test

## 3 Step 2: Evaluation

### 3.1 Description of concrete syntax

Our parser supports the declaration of abbreviations via the `decl`

directive. For example, the following example (taken from
`fomega/lib/Id.fw`

) declares `id`

to be the polymorphic application
function, and `Id`

to be its type.

decl Id : * = ∀A:*. A → A; decl id : Id = ΛA:*. λx:A. x;

We can load all the declarations from a file in `fomega/lib/`

with the
`load`

directive. For example, the following example (taken from
`fomega/id.fw`

) loads the above declarations from `fomega/lib/Id.fw`

and then applies `id`

to itself.

load "Id"; id Id id

Note that `fomega/id.fw`

ends with a term, while `fomega/lib/Id.fw`

does not. The files in `fomega/lib`

are called *libraries* and consist
of zero or more `load`

and `decl`

statements. The files in `fomega/`

are called *programs* and consist of zero or more `load`

and `decl`

statements followed by a term to be evaluated.

Another important difference between the artifact and the paper is the notation for representations of terms. The paper uses an overline notation \(\overline{\cdot}\) to denote the representations of terms and types, while the artifact uses \([\cdot]\). The artifact does not have syntax for pre-representations of terms.

Syntax in Paper | Syntax in Artifact | |
---|---|---|

Representation of a type | \(\overline{\forall \alpha:*. \alpha \to \alpha}\) | `[∀A:*. A → A]` |

Representation of a term | \(\overline{\Lambda \alpha:*. \lambda x:\alpha. x}\) | `[ΛA:*. λx:A. x]` |

### 3.2 The Test Types

This artifact includes three types of test. Each type of test is to be
run by the corresponding *test runner*. The types are:

Typecheck tests, which check if a program typechecks successfully. These tests are run with the

`bin/runTypecheckTest.sh`

test runner. Examples:$ bin/runTypecheckTest.sh fomega/typecheckTestSuccess.fw Succeeded. $ bin/runTypecheckTest.sh fomega/typecheckTestFailure.fw runTypecheckTest: unexpected app type: l = Var (Der (Var (Der (Star (Der (Box TypeOfBox)))) A0)) x0 ty l = Var () A0

The second example attempts to typecheck the term

`\A:*. \x:A. x x`

, which fails. This demonstrates that the test runner does in fact detect and report type errors (i.e. it doesn't simply output "Succeeded." for all inputs). It also demonstrates that our type errors are not user-friendly. In this artifact, we will not use the error messages. We use typecheck tests only to determine whether a program type checks.An equivalence test, which checks if two terms or types are $β$-equivalent. The program should be a

*Church pair*or terms or types. The runner extracts each component, normalizes them, and tests that the normal forms are $α$-equivalent. These tests are run with the`bin/runEquivTest.sh`

test runner.$ bin/runEquivTest.sh fomega/equivTestSuccess.fw Succeeded. $ bin/runEquivTest.sh fomega/equivTestFailure.fw Failed. Not alpha-equivalent: fold (λNatF2001:* → ... fold (λNatF2079:* → ...

Again, the second test demonstrates that our equivalence test runner does catch and report failures when the two terms are not equivalent. In this case, it fails because the encoding of

`0`

is not equivalent to the encoding of`1`

. Again, we will not use these error messages.A normalization test, which tests that a particular term normalizes. These tests are run with the

`bin/runNormalizationTest.sh`

test runner.$ bin/runNormalizeTest.sh fomega/normalizeTestSuccess.fw Normalizing... Succeeded. $ bin/runNormalizeTest.sh fomega/normalizeTestFailure.fw Normalizing... runNormalizeTest: Heap exhausted; Current maximum heap size is 2147483648 bytes (2048 MB); use `+RTS -M<size>' to increase it.

The first test normalizes the term

`eqNat (plus one one) two`

, which involves recursion but is normalizing. The terms`eqNat`

,`plus`

,`one`

and`two`

are defined in`fomega/lib/Nat.fw`

. The second test normalizes`eqNat`

itself, which is not normalizing. It is a recursive function, and the normalizer keeps unrolling the fixpoint combinator until it eventually runs out of memory.

### 3.3 The Tests

We now describe the various test files included in the artifact, and relate them to the results in the paper. You can run each test individually, or run them all via the command:

$ bin/runAllTests.sh

#### 3.3.1 Section 5

Theorem 5.1 states that any closed and well-typed term can be
represented, and if the original term has type `T`

, then its
representation will have type `Exp T`

. The following test includes
several terms of increasing size and complexity.

$ bin/runTypecheckTest.sh fomega/Theorem5-1.fw

Theorem 5.2 states that all representations are strongly normalizing,
including those of non-normalizing terms. The test
`fomega/Theorem5-2.fw`

involves normalizing the representations of
several non-normalizing terms.

$ bin/runNormalizeTest.sh fomega/Theorem5-2.fw

#### 3.3.2 Section 6

In Section 6 we discuss our self-evaluators: a weak-head normal form evaluator, a self-step left-most beta-reducer, and an implementation normalization by evaluation (NbE). In this section, we show some of the tests that demonstrate their correctness.

- Weak head normal evaluation
The first test shows that the weak head normal form evaluator type checks with the type

`∀T:*. Exp T → Exp T`

.$ bin/runTypecheckTest.sh fomega/whnf.fw

Next, we show some behavioral tests. The terms weak head normal forms are abstractions over terms or types, or fold expressions. The evaluator does not reduce under the abstraction or fold. These tests demonstrate this.

$ bin/runEquivTest.sh fomega/whnf-term-abstraction.fw $ bin/runEquivTest.sh fomega/whnf-type-abstraction.fw $ bin/runEquivTest.sh fomega/whnf-fold.fw

The next set of unit tests show the WHNF evaluator performing some reductions.

$ bin/runEquivTest.sh fomega/whnf-term-application.fw $ bin/runEquivTest.sh fomega/whnf-type-application.fw $ bin/runEquivTest.sh fomega/whnf-unfold.fw

The next test is a larger test that exercises all of the evaluator.

$ bin/runEquivTest.sh fomega/whnf-fact.fw

- Single-step left-most reduction
Our next self-evaluator performs a single step of left-most reduction.

The first test shows that the evaluator has the type

`(∀T:*. Exp T → Exp T)`

.$ bin/runTypecheckTest.sh fomega/step.fw

The first set of behavioral tests shows that a term in normal form is returned unchanged.

$ bin/runEquivTest.sh fomega/step-nf1.fw $ bin/runEquivTest.sh fomega/step-nf2.fw $ bin/runEquivTest.sh fomega/step-nf3.fw

The second set of behavioral tests demonstrate which redex

`step`

reduces.$ bin/runEquivTest.sh fomega/step-redex1.fw $ bin/runEquivTest.sh fomega/step-redex2.fw $ bin/runEquivTest.sh fomega/step-redex3.fw

For a larger test, we use

`step`

to repeatedly reduce a term until a normal form is reached.$ bin/runEquivTest.sh fomega/stepNorm-plus.fw

- Normalization by Evaluation
The first test shows that our implementation of Normalization by Evaluation has the type

`(∀T:*. Exp T → Exp T)`

.$ bin/runTypecheckTest.sh fomega/nbe.fw

Next, we test the behavior of NbE by comparing its output with the expected results. To do this, we combine our syntax for building representations

`[ ... ]`

with the syntax for normalizing terms (using the meta-level evaluator)`< ... >`

. For example,`[<plus two three>]`

first normalizes`plus two three`

, and then builds the representation of the normal form.$ bin/runEquivTest.sh fomega/nbe-plus.fw $ bin/runEquivTest.sh fomega/nbe-times.fw $ bin/runEquivTest.sh fomega/nbe-fact.fw

#### 3.3.3 Section 7

In Section 7 we discuss our benchmark meta-programs and the self-application of our self-evaluators. In this section we provide tests to demonstrate each of these points.

- Benchmark meta-programs
We re-implement the five benchmark self-applicable meta-programs from our POPL'16 paper. They are the self-interpreter

`unquote`

that recovers a term from its representation, a continuation-passing-style transformation`cps`

, a measure of the`size`

of the term, a function`nf`

that checks if its input represents a term in normal form, and a simple syntactic check`isAbs`

. All except`nf`

are implemented as a fold over the representation. It is possible to`nf`

as a fold, but then it would only work on representations of closed terms. This version supports representations of open or closed terms.`foldExp`

Figure 15 in the paper shows the type of

`foldExp`

, the fold function used for these benchmarks. The following test verify that`foldExp`

type checks with that type.$ bin/runTypecheckTest.sh fomega/foldExp.fw

The following tests that each of the 5 benchmark meta-programs type checks with the expected type.

$ bin/runTypecheckTest.sh fomega/popl16-benchmarks.fw

`unquote`

We show four tests of the behavior of unquote. The test

`fomega/unquote-id.fw`

unquotes the polymorphic identity function. The test`fomega/unquote-plus.fw`

unquotes the representation of`plus`

and applies the result. We can't compare the unquoted`plus`

with the original directly, since they are non-normalizing terms. The third unquotes the representation application of`times two three`

. The fourth unquotes the representation of`unquote`

itself, and uses the result to unquote the representation of`times two three`

.$ bin/runEquivTest.sh fomega/unquote-id.fw $ bin/runEquivTest.sh fomega/unquote-plus.fw $ bin/runEquivTest.sh fomega/unquote-times.fw $ bin/runEquivTest.sh fomega/unquote-selfApply.fw

`CPS`

We use a series of equivalent tests to demonstrate the behavior of

`cps`

on the representation of the polymorphic identity function`ΛA:*. λx:A. x`

. Each test in the series simplifies the expected result.$ bin/runEquivTest.sh fomega/cpsId1.fw $ bin/runEquivTest.sh fomega/cpsId2.fw $ bin/runEquivTest.sh fomega/cpsId3.fw $ bin/runEquivTest.sh fomega/cpsId4.fw

Here's another example on a slightly larger input:

$ bin/runEquivTest.sh fomega/cpsIdId1.fw $ bin/runEquivTest.sh fomega/cpsIdId2.fw $ bin/runEquivTest.sh fomega/cpsIdId3.fw $ bin/runEquivTest.sh fomega/cpsIdId4.fw $ bin/runEquivTest.sh fomega/cpsIdId5.fw $ bin/runEquivTest.sh fomega/cpsIdId6.fw $ bin/runEquivTest.sh fomega/cpsIdId7.fw $ bin/runEquivTest.sh fomega/cpsIdId8.fw

The following test type checks the self-application of

`cps`

. We do not check the result of the self-application, since it is non-normalizing.$ bin/runTypecheckTest.sh fomega/cps-selfApply.fw

- nf
We test that our normal-form checker on several non-normal form terms, and on several normal form terms. In

`fomega/nf-false-3.fw`

we test that the term`plus three four`

is not normal. However, it is normalizing, and in`fomega/nf-true-3.fw`

we use our meta-level syntax for normalizing to test that it does evaluate to a normal form.$ bin/runEquivTest.sh fomega/nf-false-1.fw $ bin/runEquivTest.sh fomega/nf-false-2.fw $ bin/runEquivTest.sh fomega/nf-false-3.fw $ bin/runEquivTest.sh fomega/nf-true-1.fw $ bin/runEquivTest.sh fomega/nf-true-2.fw $ bin/runEquivTest.sh fomega/nf-true-3.fw

The following test self-applies

`nf`

:$ bin/runEquivTest.sh fomega/nf-selfApply.fw

- size
Our

`size`

operation counts the number of variables, (term or type) abstractions, (term or type) applications, folds, or unfolds.$ bin/runEquivTest.sh fomega/size1.fw $ bin/runEquivTest.sh fomega/size2.fw $ bin/runEquivTest.sh fomega/size3.fw

We can use our

`size`

measure to show that`plus three four`

is at least as large as its normal form:$ bin/runEquivTest.sh fomega/size-nbe.fw

The application of

`size`

to itself is well-typed and normalizing. We do not check the result of the self-application, as it is not practical to express the expected result.$ bin/runNormalizeTest.sh fomega/size-selfApply.fw

- isAbs
The following tests demonstrate the behavior of

`isAbs`

on each syntactic form of terms, other than variables. We can't test it on variables because only closed terms can be represented.$ bin/runEquivTest.sh fomega/isAbs-abs.fw $ bin/runEquivTest.sh fomega/isAbs-app.fw $ bin/runEquivTest.sh fomega/isAbs-tabs.fw $ bin/runEquivTest.sh fomega/isAbs-tapp.fw $ bin/runEquivTest.sh fomega/isAbs-fold.fw $ bin/runEquivTest.sh fomega/isAbs-unfold.fw

The following test shows that

`isAbs`

can be self-applied.$ bin/runEquivTest.sh fomega/isAbs-selfApply.fw

- Self-application tests
At the end of Section 7 we discuss how each of our self-evaluators can be applied to its own representation. In each case, the result is a representation. We test that the result has the same behavior as the original self-evaluator by unquoting it and running it on an example term. The term

`fact five`

exercises all parts of the language.$ bin/runEquivTest.sh fomega/whnf-selfApply.fw $ bin/runEquivTest.sh fomega/step-selfApply.fw $ bin/runEquivTest.sh fomega/nbe-selfApply.fw

#### 3.3.4 Correctness of weak head-normal evaluator

We included as supplementary material a correctness proof of our weak head normal form evaluator. Several lemmas in the proof are machine-checked by our equivalence checker. The following tests will check those steps.

For these tests, we make use of an additional meta-level feature,
unevaluated constants, which are implemented as free variables. We
support both type and term constants. The declaration `const A : *;`

declares a type constant `A`

of kind star; The declaration ```
const
fix : (∀A:*. (A → A) → A);
```

declares a term constant `fix`

of type
`(∀A:*. (A → A) → A)`

. We use `fix`

constants to test the behavior of
our evaluator, which as a recursive function does not normalize. In
particular, we use combinators `fix1`

and `fix2`

, each of which has
type `(∀A:*. (A → A) → A) → (∀A:*. (A → A) → A)`

, to unfold a
fixpoint combinator one or two times respectively. This enables us to
control how many times a recursive function like our evaluator is
unfolded.

$ bin/runEquivTest.sh fomega/whnfCorrectEvalEvalV.fw $ bin/runEquivTest.sh fomega/whnfCorrectAbs.fw $ bin/runEquivTest.sh fomega/whnfCorrectAppAbs.fw $ bin/runEquivTest.sh fomega/whnfCorrectTAbs.fw $ bin/runEquivTest.sh fomega/whnfCorrectTAppTAbs.fw $ bin/runEquivTest.sh fomega/whnfCorrectFold.fw $ bin/runEquivTest.sh fomega/whnfCorrectUnfoldFold.fw