Typed Self-Evaluation via Intensional Type Functions
Table of Contents
The full paper is available here. The appendix contains proofs of all theorems stated in the paper.
1 Introduction
This page describes the software artifact for the paper Typed Self-Evaluation via Intensional Type Functions by Matt Brown and Jens Palsberg and accepted to POPL'17.
The paper presents an F\(_\omega^{\mu i}\), an extension of F\(_\omega\) with recursive types and a Typecase type operator that enables intensional type functions. We use these features to define a GADT-style typed self-representation and program typed self-evaluators for F\(_\omega^{\mu i}\). The purpose of the artifact is to provide tools that help build, type check, and experiment with representations, our self-evaluators, and other benchmark meta-programs. The artifact includes:
- An implementation of F\(_\omega^{\mu i}\) in Haskell that includes a parser, type checker, quoter, evaluator, and beta-equivalence checker. The parser includes syntactic sugar for defining abbreviations, loading definitions from other files, building representations, and normalizing terms.
- Source code in our concrete syntax for all the operations described in the paper.
- A suite a F\(_\omega^{\mu i}\) test files.
2 Step 1: Installation
2.1 Requirements
The tests should be run on a modern Unix-like system (e.g. Linux or
Mac OS X) with at least 8 Gigabytes of RAM. We assume that the Haskell
build tool stack is installed. Click here for stack installation
instructions.
2.2 Get the sources
- Download from http://compilers.cs.ucla.edu/popl17/popl17-artifact.zip
Run the commands:
$ unzip popl17-artifact.zip $ cd popl17-artifact/
2.3 Build the F\(_\omega^{\mu i}\) implementation
Once stack is installed, he following commands will download and
compile all the Haskell dependencies, and build and test the
F\(_\omega^{\mu i}\) implementation. Make sure to run these from within
the popl17-artifact directory.
$ stack setup $ stack build $ stack test
3 Step 2: Evaluation
3.1 Description of concrete syntax
Our parser supports the declaration of abbreviations via the decl
directive. For example, the following example (taken from
fomega/lib/Id.fw) declares id to be the polymorphic application
function, and Id to be its type.
decl Id : * = ∀A:*. A → A; decl id : Id = ΛA:*. λx:A. x;
We can load all the declarations from a file in fomega/lib/ with the
load directive. For example, the following example (taken from
fomega/id.fw) loads the above declarations from fomega/lib/Id.fw
and then applies id to itself.
load "Id"; id Id id
Note that fomega/id.fw ends with a term, while fomega/lib/Id.fw
does not. The files in fomega/lib are called libraries and consist
of zero or more load and decl statements. The files in fomega/
are called programs and consist of zero or more load and decl
statements followed by a term to be evaluated.
Another important difference between the artifact and the paper is the notation for representations of terms. The paper uses an overline notation \(\overline{\cdot}\) to denote the representations of terms and types, while the artifact uses \([\cdot]\). The artifact does not have syntax for pre-representations of terms.
| Syntax in Paper | Syntax in Artifact | |
|---|---|---|
| Representation of a type | \(\overline{\forall \alpha:*. \alpha \to \alpha}\) | [∀A:*. A → A] |
| Representation of a term | \(\overline{\Lambda \alpha:*. \lambda x:\alpha. x}\) | [ΛA:*. λx:A. x] |
3.2 The Test Types
This artifact includes three types of test. Each type of test is to be run by the corresponding test runner. The types are:
Typecheck tests, which check if a program typechecks successfully. These tests are run with the
bin/runTypecheckTest.shtest runner. Examples:$ bin/runTypecheckTest.sh fomega/typecheckTestSuccess.fw Succeeded. $ bin/runTypecheckTest.sh fomega/typecheckTestFailure.fw runTypecheckTest: unexpected app type: l = Var (Der (Var (Der (Star (Der (Box TypeOfBox)))) A0)) x0 ty l = Var () A0
The second example attempts to typecheck the term
\A:*. \x:A. x x, which fails. This demonstrates that the test runner does in fact detect and report type errors (i.e. it doesn't simply output "Succeeded." for all inputs). It also demonstrates that our type errors are not user-friendly. In this artifact, we will not use the error messages. We use typecheck tests only to determine whether a program type checks.An equivalence test, which checks if two terms or types are $β$-equivalent. The program should be a Church pair or terms or types. The runner extracts each component, normalizes them, and tests that the normal forms are $α$-equivalent. These tests are run with the
bin/runEquivTest.shtest runner.$ bin/runEquivTest.sh fomega/equivTestSuccess.fw Succeeded. $ bin/runEquivTest.sh fomega/equivTestFailure.fw Failed. Not alpha-equivalent: fold (λNatF2001:* → ... fold (λNatF2079:* → ...
Again, the second test demonstrates that our equivalence test runner does catch and report failures when the two terms are not equivalent. In this case, it fails because the encoding of
0is not equivalent to the encoding of1. Again, we will not use these error messages.A normalization test, which tests that a particular term normalizes. These tests are run with the
bin/runNormalizationTest.shtest runner.$ bin/runNormalizeTest.sh fomega/normalizeTestSuccess.fw Normalizing... Succeeded. $ bin/runNormalizeTest.sh fomega/normalizeTestFailure.fw Normalizing... runNormalizeTest: Heap exhausted; Current maximum heap size is 2147483648 bytes (2048 MB); use `+RTS -M<size>' to increase it.
The first test normalizes the term
eqNat (plus one one) two, which involves recursion but is normalizing. The termseqNat,plus,oneandtwoare defined infomega/lib/Nat.fw. The second test normalizeseqNatitself, which is not normalizing. It is a recursive function, and the normalizer keeps unrolling the fixpoint combinator until it eventually runs out of memory.
3.3 The Tests
We now describe the various test files included in the artifact, and relate them to the results in the paper. You can run each test individually, or run them all via the command:
$ bin/runAllTests.sh
3.3.1 Section 5
Theorem 5.1 states that any closed and well-typed term can be
represented, and if the original term has type T, then its
representation will have type Exp T. The following test includes
several terms of increasing size and complexity.
$ bin/runTypecheckTest.sh fomega/Theorem5-1.fw
Theorem 5.2 states that all representations are strongly normalizing,
including those of non-normalizing terms. The test
fomega/Theorem5-2.fw involves normalizing the representations of
several non-normalizing terms.
$ bin/runNormalizeTest.sh fomega/Theorem5-2.fw
3.3.2 Section 6
In Section 6 we discuss our self-evaluators: a weak-head normal form evaluator, a self-step left-most beta-reducer, and an implementation normalization by evaluation (NbE). In this section, we show some of the tests that demonstrate their correctness.
- Weak head normal evaluation
The first test shows that the weak head normal form evaluator type checks with the type
∀T:*. Exp T → Exp T.$ bin/runTypecheckTest.sh fomega/whnf.fw
Next, we show some behavioral tests. The terms weak head normal forms are abstractions over terms or types, or fold expressions. The evaluator does not reduce under the abstraction or fold. These tests demonstrate this.
$ bin/runEquivTest.sh fomega/whnf-term-abstraction.fw $ bin/runEquivTest.sh fomega/whnf-type-abstraction.fw $ bin/runEquivTest.sh fomega/whnf-fold.fw
The next set of unit tests show the WHNF evaluator performing some reductions.
$ bin/runEquivTest.sh fomega/whnf-term-application.fw $ bin/runEquivTest.sh fomega/whnf-type-application.fw $ bin/runEquivTest.sh fomega/whnf-unfold.fw
The next test is a larger test that exercises all of the evaluator.
$ bin/runEquivTest.sh fomega/whnf-fact.fw
- Single-step left-most reduction
Our next self-evaluator performs a single step of left-most reduction.
The first test shows that the evaluator has the type
(∀T:*. Exp T → Exp T).$ bin/runTypecheckTest.sh fomega/step.fw
The first set of behavioral tests shows that a term in normal form is returned unchanged.
$ bin/runEquivTest.sh fomega/step-nf1.fw $ bin/runEquivTest.sh fomega/step-nf2.fw $ bin/runEquivTest.sh fomega/step-nf3.fw
The second set of behavioral tests demonstrate which redex
stepreduces.$ bin/runEquivTest.sh fomega/step-redex1.fw $ bin/runEquivTest.sh fomega/step-redex2.fw $ bin/runEquivTest.sh fomega/step-redex3.fw
For a larger test, we use
stepto repeatedly reduce a term until a normal form is reached.$ bin/runEquivTest.sh fomega/stepNorm-plus.fw
- Normalization by Evaluation
The first test shows that our implementation of Normalization by Evaluation has the type
(∀T:*. Exp T → Exp T).$ bin/runTypecheckTest.sh fomega/nbe.fw
Next, we test the behavior of NbE by comparing its output with the expected results. To do this, we combine our syntax for building representations
[ ... ]with the syntax for normalizing terms (using the meta-level evaluator)< ... >. For example,[<plus two three>]first normalizesplus two three, and then builds the representation of the normal form.$ bin/runEquivTest.sh fomega/nbe-plus.fw $ bin/runEquivTest.sh fomega/nbe-times.fw $ bin/runEquivTest.sh fomega/nbe-fact.fw
3.3.3 Section 7
In Section 7 we discuss our benchmark meta-programs and the self-application of our self-evaluators. In this section we provide tests to demonstrate each of these points.
- Benchmark meta-programs
We re-implement the five benchmark self-applicable meta-programs from our POPL'16 paper. They are the self-interpreter
unquotethat recovers a term from its representation, a continuation-passing-style transformationcps, a measure of thesizeof the term, a functionnfthat checks if its input represents a term in normal form, and a simple syntactic checkisAbs. All exceptnfare implemented as a fold over the representation. It is possible tonfas a fold, but then it would only work on representations of closed terms. This version supports representations of open or closed terms.foldExpFigure 15 in the paper shows the type of
foldExp, the fold function used for these benchmarks. The following test verify thatfoldExptype checks with that type.$ bin/runTypecheckTest.sh fomega/foldExp.fw
The following tests that each of the 5 benchmark meta-programs type checks with the expected type.
$ bin/runTypecheckTest.sh fomega/popl16-benchmarks.fw
unquoteWe show four tests of the behavior of unquote. The test
fomega/unquote-id.fwunquotes the polymorphic identity function. The testfomega/unquote-plus.fwunquotes the representation ofplusand applies the result. We can't compare the unquotedpluswith the original directly, since they are non-normalizing terms. The third unquotes the representation application oftimes two three. The fourth unquotes the representation ofunquoteitself, and uses the result to unquote the representation oftimes two three.$ bin/runEquivTest.sh fomega/unquote-id.fw $ bin/runEquivTest.sh fomega/unquote-plus.fw $ bin/runEquivTest.sh fomega/unquote-times.fw $ bin/runEquivTest.sh fomega/unquote-selfApply.fw
CPSWe use a series of equivalent tests to demonstrate the behavior of
cpson the representation of the polymorphic identity functionΛA:*. λx:A. x. Each test in the series simplifies the expected result.$ bin/runEquivTest.sh fomega/cpsId1.fw $ bin/runEquivTest.sh fomega/cpsId2.fw $ bin/runEquivTest.sh fomega/cpsId3.fw $ bin/runEquivTest.sh fomega/cpsId4.fw
Here's another example on a slightly larger input:
$ bin/runEquivTest.sh fomega/cpsIdId1.fw $ bin/runEquivTest.sh fomega/cpsIdId2.fw $ bin/runEquivTest.sh fomega/cpsIdId3.fw $ bin/runEquivTest.sh fomega/cpsIdId4.fw $ bin/runEquivTest.sh fomega/cpsIdId5.fw $ bin/runEquivTest.sh fomega/cpsIdId6.fw $ bin/runEquivTest.sh fomega/cpsIdId7.fw $ bin/runEquivTest.sh fomega/cpsIdId8.fw
The following test type checks the self-application of
cps. We do not check the result of the self-application, since it is non-normalizing.$ bin/runTypecheckTest.sh fomega/cps-selfApply.fw
- nf
We test that our normal-form checker on several non-normal form terms, and on several normal form terms. In
fomega/nf-false-3.fwwe test that the termplus three fouris not normal. However, it is normalizing, and infomega/nf-true-3.fwwe use our meta-level syntax for normalizing to test that it does evaluate to a normal form.$ bin/runEquivTest.sh fomega/nf-false-1.fw $ bin/runEquivTest.sh fomega/nf-false-2.fw $ bin/runEquivTest.sh fomega/nf-false-3.fw $ bin/runEquivTest.sh fomega/nf-true-1.fw $ bin/runEquivTest.sh fomega/nf-true-2.fw $ bin/runEquivTest.sh fomega/nf-true-3.fw
The following test self-applies
nf:$ bin/runEquivTest.sh fomega/nf-selfApply.fw
- size
Our
sizeoperation counts the number of variables, (term or type) abstractions, (term or type) applications, folds, or unfolds.$ bin/runEquivTest.sh fomega/size1.fw $ bin/runEquivTest.sh fomega/size2.fw $ bin/runEquivTest.sh fomega/size3.fw
We can use our
sizemeasure to show thatplus three fouris at least as large as its normal form:$ bin/runEquivTest.sh fomega/size-nbe.fw
The application of
sizeto itself is well-typed and normalizing. We do not check the result of the self-application, as it is not practical to express the expected result.$ bin/runNormalizeTest.sh fomega/size-selfApply.fw
- isAbs
The following tests demonstrate the behavior of
isAbson each syntactic form of terms, other than variables. We can't test it on variables because only closed terms can be represented.$ bin/runEquivTest.sh fomega/isAbs-abs.fw $ bin/runEquivTest.sh fomega/isAbs-app.fw $ bin/runEquivTest.sh fomega/isAbs-tabs.fw $ bin/runEquivTest.sh fomega/isAbs-tapp.fw $ bin/runEquivTest.sh fomega/isAbs-fold.fw $ bin/runEquivTest.sh fomega/isAbs-unfold.fw
The following test shows that
isAbscan be self-applied.$ bin/runEquivTest.sh fomega/isAbs-selfApply.fw
- Self-application tests
At the end of Section 7 we discuss how each of our self-evaluators can be applied to its own representation. In each case, the result is a representation. We test that the result has the same behavior as the original self-evaluator by unquoting it and running it on an example term. The term
fact fiveexercises all parts of the language.$ bin/runEquivTest.sh fomega/whnf-selfApply.fw $ bin/runEquivTest.sh fomega/step-selfApply.fw $ bin/runEquivTest.sh fomega/nbe-selfApply.fw
3.3.4 Correctness of weak head-normal evaluator
We included as supplementary material a correctness proof of our weak head normal form evaluator. Several lemmas in the proof are machine-checked by our equivalence checker. The following tests will check those steps.
For these tests, we make use of an additional meta-level feature,
unevaluated constants, which are implemented as free variables. We
support both type and term constants. The declaration const A : *;
declares a type constant A of kind star; The declaration const
fix : (∀A:*. (A → A) → A); declares a term constant fix of type
(∀A:*. (A → A) → A). We use fix constants to test the behavior of
our evaluator, which as a recursive function does not normalize. In
particular, we use combinators fix1 and fix2, each of which has
type (∀A:*. (A → A) → A) → (∀A:*. (A → A) → A), to unfold a
fixpoint combinator one or two times respectively. This enables us to
control how many times a recursive function like our evaluator is
unfolded.
$ bin/runEquivTest.sh fomega/whnfCorrectEvalEvalV.fw $ bin/runEquivTest.sh fomega/whnfCorrectAbs.fw $ bin/runEquivTest.sh fomega/whnfCorrectAppAbs.fw $ bin/runEquivTest.sh fomega/whnfCorrectTAbs.fw $ bin/runEquivTest.sh fomega/whnfCorrectTAppTAbs.fw $ bin/runEquivTest.sh fomega/whnfCorrectFold.fw $ bin/runEquivTest.sh fomega/whnfCorrectUnfoldFold.fw