Typed Self-Evaluation via Intensional Type Functions
Table of Contents
The full paper is available here. The appendix contains proofs of all theorems stated in the paper.
1 Introduction
This page describes the software artifact for the paper Typed Self-Evaluation via Intensional Type Functions by Matt Brown and Jens Palsberg and accepted to POPL'17.
The paper presents an F\(_\omega^{\mu i}\), an extension of F\(_\omega\) with recursive types and a Typecase type operator that enables intensional type functions. We use these features to define a GADT-style typed self-representation and program typed self-evaluators for F\(_\omega^{\mu i}\). The purpose of the artifact is to provide tools that help build, type check, and experiment with representations, our self-evaluators, and other benchmark meta-programs. The artifact includes:
- An implementation of F\(_\omega^{\mu i}\) in Haskell that includes a parser, type checker, quoter, evaluator, and beta-equivalence checker. The parser includes syntactic sugar for defining abbreviations, loading definitions from other files, building representations, and normalizing terms.
- Source code in our concrete syntax for all the operations described in the paper.
- A suite a F\(_\omega^{\mu i}\) test files.
2 Step 1: Installation
2.1 Requirements
The tests should be run on a modern Unix-like system (e.g. Linux or
Mac OS X) with at least 8 Gigabytes of RAM. We assume that the Haskell
build tool stack
is installed. Click here for stack
installation
instructions.
2.2 Get the sources
- Download from http://compilers.cs.ucla.edu/popl17/popl17-artifact.zip
Run the commands:
$ unzip popl17-artifact.zip $ cd popl17-artifact/
2.3 Build the F\(_\omega^{\mu i}\) implementation
Once stack
is installed, he following commands will download and
compile all the Haskell dependencies, and build and test the
F\(_\omega^{\mu i}\) implementation. Make sure to run these from within
the popl17-artifact
directory.
$ stack setup $ stack build $ stack test
3 Step 2: Evaluation
3.1 Description of concrete syntax
Our parser supports the declaration of abbreviations via the decl
directive. For example, the following example (taken from
fomega/lib/Id.fw
) declares id
to be the polymorphic application
function, and Id
to be its type.
decl Id : * = ∀A:*. A → A; decl id : Id = ΛA:*. λx:A. x;
We can load all the declarations from a file in fomega/lib/
with the
load
directive. For example, the following example (taken from
fomega/id.fw
) loads the above declarations from fomega/lib/Id.fw
and then applies id
to itself.
load "Id"; id Id id
Note that fomega/id.fw
ends with a term, while fomega/lib/Id.fw
does not. The files in fomega/lib
are called libraries and consist
of zero or more load
and decl
statements. The files in fomega/
are called programs and consist of zero or more load
and decl
statements followed by a term to be evaluated.
Another important difference between the artifact and the paper is the notation for representations of terms. The paper uses an overline notation \(\overline{\cdot}\) to denote the representations of terms and types, while the artifact uses \([\cdot]\). The artifact does not have syntax for pre-representations of terms.
Syntax in Paper | Syntax in Artifact | |
---|---|---|
Representation of a type | \(\overline{\forall \alpha:*. \alpha \to \alpha}\) | [∀A:*. A → A] |
Representation of a term | \(\overline{\Lambda \alpha:*. \lambda x:\alpha. x}\) | [ΛA:*. λx:A. x] |
3.2 The Test Types
This artifact includes three types of test. Each type of test is to be run by the corresponding test runner. The types are:
Typecheck tests, which check if a program typechecks successfully. These tests are run with the
bin/runTypecheckTest.sh
test runner. Examples:$ bin/runTypecheckTest.sh fomega/typecheckTestSuccess.fw Succeeded. $ bin/runTypecheckTest.sh fomega/typecheckTestFailure.fw runTypecheckTest: unexpected app type: l = Var (Der (Var (Der (Star (Der (Box TypeOfBox)))) A0)) x0 ty l = Var () A0
The second example attempts to typecheck the term
\A:*. \x:A. x x
, which fails. This demonstrates that the test runner does in fact detect and report type errors (i.e. it doesn't simply output "Succeeded." for all inputs). It also demonstrates that our type errors are not user-friendly. In this artifact, we will not use the error messages. We use typecheck tests only to determine whether a program type checks.An equivalence test, which checks if two terms or types are $β$-equivalent. The program should be a Church pair or terms or types. The runner extracts each component, normalizes them, and tests that the normal forms are $α$-equivalent. These tests are run with the
bin/runEquivTest.sh
test runner.$ bin/runEquivTest.sh fomega/equivTestSuccess.fw Succeeded. $ bin/runEquivTest.sh fomega/equivTestFailure.fw Failed. Not alpha-equivalent: fold (λNatF2001:* → ... fold (λNatF2079:* → ...
Again, the second test demonstrates that our equivalence test runner does catch and report failures when the two terms are not equivalent. In this case, it fails because the encoding of
0
is not equivalent to the encoding of1
. Again, we will not use these error messages.A normalization test, which tests that a particular term normalizes. These tests are run with the
bin/runNormalizationTest.sh
test runner.$ bin/runNormalizeTest.sh fomega/normalizeTestSuccess.fw Normalizing... Succeeded. $ bin/runNormalizeTest.sh fomega/normalizeTestFailure.fw Normalizing... runNormalizeTest: Heap exhausted; Current maximum heap size is 2147483648 bytes (2048 MB); use `+RTS -M<size>' to increase it.
The first test normalizes the term
eqNat (plus one one) two
, which involves recursion but is normalizing. The termseqNat
,plus
,one
andtwo
are defined infomega/lib/Nat.fw
. The second test normalizeseqNat
itself, which is not normalizing. It is a recursive function, and the normalizer keeps unrolling the fixpoint combinator until it eventually runs out of memory.
3.3 The Tests
We now describe the various test files included in the artifact, and relate them to the results in the paper. You can run each test individually, or run them all via the command:
$ bin/runAllTests.sh
3.3.1 Section 5
Theorem 5.1 states that any closed and well-typed term can be
represented, and if the original term has type T
, then its
representation will have type Exp T
. The following test includes
several terms of increasing size and complexity.
$ bin/runTypecheckTest.sh fomega/Theorem5-1.fw
Theorem 5.2 states that all representations are strongly normalizing,
including those of non-normalizing terms. The test
fomega/Theorem5-2.fw
involves normalizing the representations of
several non-normalizing terms.
$ bin/runNormalizeTest.sh fomega/Theorem5-2.fw
3.3.2 Section 6
In Section 6 we discuss our self-evaluators: a weak-head normal form evaluator, a self-step left-most beta-reducer, and an implementation normalization by evaluation (NbE). In this section, we show some of the tests that demonstrate their correctness.
- Weak head normal evaluation
The first test shows that the weak head normal form evaluator type checks with the type
∀T:*. Exp T → Exp T
.$ bin/runTypecheckTest.sh fomega/whnf.fw
Next, we show some behavioral tests. The terms weak head normal forms are abstractions over terms or types, or fold expressions. The evaluator does not reduce under the abstraction or fold. These tests demonstrate this.
$ bin/runEquivTest.sh fomega/whnf-term-abstraction.fw $ bin/runEquivTest.sh fomega/whnf-type-abstraction.fw $ bin/runEquivTest.sh fomega/whnf-fold.fw
The next set of unit tests show the WHNF evaluator performing some reductions.
$ bin/runEquivTest.sh fomega/whnf-term-application.fw $ bin/runEquivTest.sh fomega/whnf-type-application.fw $ bin/runEquivTest.sh fomega/whnf-unfold.fw
The next test is a larger test that exercises all of the evaluator.
$ bin/runEquivTest.sh fomega/whnf-fact.fw
- Single-step left-most reduction
Our next self-evaluator performs a single step of left-most reduction.
The first test shows that the evaluator has the type
(∀T:*. Exp T → Exp T)
.$ bin/runTypecheckTest.sh fomega/step.fw
The first set of behavioral tests shows that a term in normal form is returned unchanged.
$ bin/runEquivTest.sh fomega/step-nf1.fw $ bin/runEquivTest.sh fomega/step-nf2.fw $ bin/runEquivTest.sh fomega/step-nf3.fw
The second set of behavioral tests demonstrate which redex
step
reduces.$ bin/runEquivTest.sh fomega/step-redex1.fw $ bin/runEquivTest.sh fomega/step-redex2.fw $ bin/runEquivTest.sh fomega/step-redex3.fw
For a larger test, we use
step
to repeatedly reduce a term until a normal form is reached.$ bin/runEquivTest.sh fomega/stepNorm-plus.fw
- Normalization by Evaluation
The first test shows that our implementation of Normalization by Evaluation has the type
(∀T:*. Exp T → Exp T)
.$ bin/runTypecheckTest.sh fomega/nbe.fw
Next, we test the behavior of NbE by comparing its output with the expected results. To do this, we combine our syntax for building representations
[ ... ]
with the syntax for normalizing terms (using the meta-level evaluator)< ... >
. For example,[<plus two three>]
first normalizesplus two three
, and then builds the representation of the normal form.$ bin/runEquivTest.sh fomega/nbe-plus.fw $ bin/runEquivTest.sh fomega/nbe-times.fw $ bin/runEquivTest.sh fomega/nbe-fact.fw
3.3.3 Section 7
In Section 7 we discuss our benchmark meta-programs and the self-application of our self-evaluators. In this section we provide tests to demonstrate each of these points.
- Benchmark meta-programs
We re-implement the five benchmark self-applicable meta-programs from our POPL'16 paper. They are the self-interpreter
unquote
that recovers a term from its representation, a continuation-passing-style transformationcps
, a measure of thesize
of the term, a functionnf
that checks if its input represents a term in normal form, and a simple syntactic checkisAbs
. All exceptnf
are implemented as a fold over the representation. It is possible tonf
as a fold, but then it would only work on representations of closed terms. This version supports representations of open or closed terms.foldExp
Figure 15 in the paper shows the type of
foldExp
, the fold function used for these benchmarks. The following test verify thatfoldExp
type checks with that type.$ bin/runTypecheckTest.sh fomega/foldExp.fw
The following tests that each of the 5 benchmark meta-programs type checks with the expected type.
$ bin/runTypecheckTest.sh fomega/popl16-benchmarks.fw
unquote
We show four tests of the behavior of unquote. The test
fomega/unquote-id.fw
unquotes the polymorphic identity function. The testfomega/unquote-plus.fw
unquotes the representation ofplus
and applies the result. We can't compare the unquotedplus
with the original directly, since they are non-normalizing terms. The third unquotes the representation application oftimes two three
. The fourth unquotes the representation ofunquote
itself, and uses the result to unquote the representation oftimes two three
.$ bin/runEquivTest.sh fomega/unquote-id.fw $ bin/runEquivTest.sh fomega/unquote-plus.fw $ bin/runEquivTest.sh fomega/unquote-times.fw $ bin/runEquivTest.sh fomega/unquote-selfApply.fw
CPS
We use a series of equivalent tests to demonstrate the behavior of
cps
on the representation of the polymorphic identity functionΛA:*. λx:A. x
. Each test in the series simplifies the expected result.$ bin/runEquivTest.sh fomega/cpsId1.fw $ bin/runEquivTest.sh fomega/cpsId2.fw $ bin/runEquivTest.sh fomega/cpsId3.fw $ bin/runEquivTest.sh fomega/cpsId4.fw
Here's another example on a slightly larger input:
$ bin/runEquivTest.sh fomega/cpsIdId1.fw $ bin/runEquivTest.sh fomega/cpsIdId2.fw $ bin/runEquivTest.sh fomega/cpsIdId3.fw $ bin/runEquivTest.sh fomega/cpsIdId4.fw $ bin/runEquivTest.sh fomega/cpsIdId5.fw $ bin/runEquivTest.sh fomega/cpsIdId6.fw $ bin/runEquivTest.sh fomega/cpsIdId7.fw $ bin/runEquivTest.sh fomega/cpsIdId8.fw
The following test type checks the self-application of
cps
. We do not check the result of the self-application, since it is non-normalizing.$ bin/runTypecheckTest.sh fomega/cps-selfApply.fw
- nf
We test that our normal-form checker on several non-normal form terms, and on several normal form terms. In
fomega/nf-false-3.fw
we test that the termplus three four
is not normal. However, it is normalizing, and infomega/nf-true-3.fw
we use our meta-level syntax for normalizing to test that it does evaluate to a normal form.$ bin/runEquivTest.sh fomega/nf-false-1.fw $ bin/runEquivTest.sh fomega/nf-false-2.fw $ bin/runEquivTest.sh fomega/nf-false-3.fw $ bin/runEquivTest.sh fomega/nf-true-1.fw $ bin/runEquivTest.sh fomega/nf-true-2.fw $ bin/runEquivTest.sh fomega/nf-true-3.fw
The following test self-applies
nf
:$ bin/runEquivTest.sh fomega/nf-selfApply.fw
- size
Our
size
operation counts the number of variables, (term or type) abstractions, (term or type) applications, folds, or unfolds.$ bin/runEquivTest.sh fomega/size1.fw $ bin/runEquivTest.sh fomega/size2.fw $ bin/runEquivTest.sh fomega/size3.fw
We can use our
size
measure to show thatplus three four
is at least as large as its normal form:$ bin/runEquivTest.sh fomega/size-nbe.fw
The application of
size
to itself is well-typed and normalizing. We do not check the result of the self-application, as it is not practical to express the expected result.$ bin/runNormalizeTest.sh fomega/size-selfApply.fw
- isAbs
The following tests demonstrate the behavior of
isAbs
on each syntactic form of terms, other than variables. We can't test it on variables because only closed terms can be represented.$ bin/runEquivTest.sh fomega/isAbs-abs.fw $ bin/runEquivTest.sh fomega/isAbs-app.fw $ bin/runEquivTest.sh fomega/isAbs-tabs.fw $ bin/runEquivTest.sh fomega/isAbs-tapp.fw $ bin/runEquivTest.sh fomega/isAbs-fold.fw $ bin/runEquivTest.sh fomega/isAbs-unfold.fw
The following test shows that
isAbs
can be self-applied.$ bin/runEquivTest.sh fomega/isAbs-selfApply.fw
- Self-application tests
At the end of Section 7 we discuss how each of our self-evaluators can be applied to its own representation. In each case, the result is a representation. We test that the result has the same behavior as the original self-evaluator by unquoting it and running it on an example term. The term
fact five
exercises all parts of the language.$ bin/runEquivTest.sh fomega/whnf-selfApply.fw $ bin/runEquivTest.sh fomega/step-selfApply.fw $ bin/runEquivTest.sh fomega/nbe-selfApply.fw
3.3.4 Correctness of weak head-normal evaluator
We included as supplementary material a correctness proof of our weak head normal form evaluator. Several lemmas in the proof are machine-checked by our equivalence checker. The following tests will check those steps.
For these tests, we make use of an additional meta-level feature,
unevaluated constants, which are implemented as free variables. We
support both type and term constants. The declaration const A : *;
declares a type constant A
of kind star; The declaration const
fix : (∀A:*. (A → A) → A);
declares a term constant fix
of type
(∀A:*. (A → A) → A)
. We use fix
constants to test the behavior of
our evaluator, which as a recursive function does not normalize. In
particular, we use combinators fix1
and fix2
, each of which has
type (∀A:*. (A → A) → A) → (∀A:*. (A → A) → A)
, to unfold a
fixpoint combinator one or two times respectively. This enables us to
control how many times a recursive function like our evaluator is
unfolded.
$ bin/runEquivTest.sh fomega/whnfCorrectEvalEvalV.fw $ bin/runEquivTest.sh fomega/whnfCorrectAbs.fw $ bin/runEquivTest.sh fomega/whnfCorrectAppAbs.fw $ bin/runEquivTest.sh fomega/whnfCorrectTAbs.fw $ bin/runEquivTest.sh fomega/whnfCorrectTAppTAbs.fw $ bin/runEquivTest.sh fomega/whnfCorrectFold.fw $ bin/runEquivTest.sh fomega/whnfCorrectUnfoldFold.fw