Register Allocation by Puzzle Solving

Experimental Results


In order to show that puzzles can be effective and useful to register allocation, we have implemented a register allocator based on puzzles. We have implemented our register allocator in the LLVM compiler framework, version 1.9. Our tests are executed on a 32-bit x86 Intel(R) Xeon(TM), with a 3.06GHz cpu clock, 3GB of free memory and 512KB L1 cache running Red Hat Linux 3.3.3-7. Currently our allocator passes all the tests that LLVM passes using its default allocator. In this page we show results of our experiments. We have been able to compile and run over 1.3 million lines of code, and the results are encouraging. The next sequence of tables use the following key:

The LLVM test suite provides a large number of benchmark applications. In addition to these benchmarks, we have also compiled SPEC CPU 2000. Unfortunately, one of the benchmarks: eon, cannot be run in our programming environment - even when compiled with gcc. The other floating point benchmarks, e.g wupwise, swim, mgrid, applu, etc, cannot be compiled because they are FORTRAN applications, and our compiler only handles C/C++ code. Nevertheless, we could compile all the integer benchmarks, and all the four floating-point applications implemented in C. This is the data that we have used in our paper. Due to space constraints, we could not include all the other benchmarks in the LLVM test suite in the paper, so we show it in this page. In general, the results for SPEC 2000 mirror the general behavior of the other benchmarks. The table below contains links to all the raw data that we have been gathering while implementing and debugging the puzzle solver. The rows are chronologically ordered, so they give a good idea of how our implementation has evolved.

We compare our puzzle solver with three other register allocators, all implemented in LLVM 1.9 and all compiling and running the same benchmark suite of 1.3 million lines of C code. The first is LLVM's default algorithm, which is an industrial-strength version of linear scan. The algorithm does aggressive coalescing before register allocation and handles holes in live ranges by filling them with other variables whenever possible. We use ELS (Extended Linear Scan) to denote this register allocator.

The second register allocator is the iterated register coalescing of George and Appel with the extensions by Smith, Ramsey, and Holloway presented in PLDI 2004 for handling register aliasing. We use EIRC (Extended Iterated Register Coalescing) to denote this register allocator. The third register allocator is based on partitioned Boolean quadratic programming (PBQP). The algorithm runs in worst-case exponential time and does optimal spilling with respect to a set of Boolean constraints generated from the program text. We use this algorithm to gauge the potential for how good a register allocator can be. Lang Hames and Bernhard Scholz produced the implementations of EIRC and PBQP that we are using.

Raw data produced for LLVM test suite
File Date Comments
Download it Jul 7, 2007 The first tests with the puzzle solver! None many files could be compiled then...
Download it Jul 8, 2007
Download it Jul 9, 2007
Download it Jul 10, 2007 A test with LLVM's default algorithm. Only to know how many programs are compiled with LLVM.
Download it Jul 14, 2007 The puzzle solver again.
Download it Jul 19, 2007
Download it Jul 21, 2007
Download it Jul 27, 2007
Download it Jul 28, 2007
Download it Jul 31, 2007
Download it Aug 15, 2007
Download it Aug 20, 2007
Download it Aug 23, 2007
Download it Sep 5, 2007
Download it Sep 14, 2007 An early experience with a different algorithm to split live ranges. The basic register allocator is still the puzzle solver.
Download it Sep 14, 2007 Second run with the new algorithm to split critical edges. This algorithm was only used twice.
Download it Sep 18, 2007 Graph coloring versus PBQP. LLC runs graph coloring. The beta compiler runs PBQP.
Download it Sep 18, 2007 LLC running PBQP.
Download it Sep 18, 2007 The puzzle solver.
Download it Sep 20, 2007
Download it Sep 23, 2007
Download it Sep 24, 2007 LLVM's default versus the puzzle solver, with no instruction folding allowed to either allocator. LLC-BETA is the puzzle solver.
Download it Sep 24, 2007 LLC running PBQP.
Download it Sep 24, 2007 Puzzles versus LLVM's default. LLC is running puzzles.
Download it Sep 28, 2007 LLVM's default versus puzzle in the full LLVM test suite. LLC is running LLVM's default.
Download it Oct 17, 2007
Download it Oct 18, 2007 Graph coloring (LLC) versus PBQP (LLC-BETA).
Download it Oct 29, 2007 LLVMS's default (LLC) versus puzzles.
Download it Oct 30, 2007 Graph coloring (LLC) versus PBQP.
Download it Oct 31, 2007 LLVMS's default (LLC) versus puzzles.
Download it Nov 3, 2007 LLVMS's default (LLC) versus puzzles.
Download it Nov 4, 2007 Graph coloring (LLC) versus PBQP.
Download it Nov 4, 2007
Download it Nov 4, 2007 Graph coloring (LLC) versus PBQP.
Download it Nov 5, 2007 LLVMS's default (LLC) versus puzzles.
Download it Nov 6, 2007 LLVMS's default (LLC) versus puzzles.
Download it Nov 8, 2007 LLVMS's default (LLC) versus puzzles.
Download it Nov 8, 2007 Graph coloring (LLC) versus puzzles.
Download it Nov 9, 2007 Graph coloring (LLC) versus PBQP.
Download it Nov 9, 2007 PBQP
Download it Nov 10, 2007 Graph coloring (LLC) versus PBQP.
Download it Nov 11, 2007 Graph coloring (LLC) versus PBQP.
Download it Nov 12, 2007 Graph coloring (LLC) versus LLVM's default.
Download it Nov 13, 2007 Graph coloring (LLC) versus LLVM's default.
Download it Dec 1, 2007 Puzzles versus LLVM's default in LLVM's benchmark suite.
Download it Dec 1, 2007 Puzzles versus LLVM's default in LLVM's benchmark suite.
Download it Dec 1, 2007 Puzzles versus LLVM's default in LLVM's benchmark suite.
Download it Dec 1, 2007 Puzzles versus LLVM's default in LLVM's benchmark suite.
Download it Dec 1, 2007 Puzzles versus LLVM's default in LLVM's benchmark suite.

Spec CPU 2000

We have used SPEC CPU 2000 in most of our experiments. And some of the results we have found are summarized in the next charts. Notice that we do not present data for ten of the floating point programs. These benchmarks could not be compiled by LLVM in our machine. This is not a problem with the puzzle solver, because these floating point benchmarks could not be compiled even with the default allocator of LLVM. In fact, LLVM with its default allocator, or with the puzzle solver, pass the same amount of tests. I think the problem is support for sse floating point instructions in LLVM. Yet, the set of spec benchmarks that we use are the same that has been used in the paper "Demand-Driven Alias Analysis for C", by Zheng and Rugina.

Stack Size

This chart compares the maximum amount of space that each assembly program reserves on its call stack. The stack size gives an estimate of how many different variables are being spilled by each allocator. The puzzle solver and extended linear scan (LLVM's default) tend to spill more variables than the other two algorithms.

Spill Code

Memory Accesses

This chart shows the amount of spill code produced by each allocator when compiling SPEC2000. The puzzle solver inserts marginally fewer memory-access instructions than PBQP, 1.2 fewer memory-access instructions than EIRC, and 9.6% fewer memory-access instructions than extended linear scan (LLVM's default). Note that although the puzzle solver spills more variables than the other allocators, it removes only part of the live range of a spilled variable. The bars are normalized to the puzzle solver.

Spill Code

Compilation time

The chart below compares the register allocation time and the total compilation time of the puzzle solver and extended linear scan (LLVM's default). On average, extended linear scan (LLVM's default) is less than 1 faster than the puzzle solver. The total compilation time of LLVM with the default allocator is less than 3% faster than the total compilation time of LLVM with the puzzle solver. We note that LLVM is industrial-strength and highly tuned software, in contrast to our puzzle solver.

Spill Code

Size of Assembly Per Compilation Time

The next chart compares the ratio between the size of the assembly code produced by the compiler and the compilation time of the register assignment pass. We only compare two algorithm: the puzzle solver, and extended linear scan, the default algorithm of LLVM. The final column in the chart gives the standard deviation found for each algorithm. We see that the puzzle solver has a much more predictable behavior than LLVM's default, that is, the time spent by the puzzle solver in the register allocation process is much more proportional to the size of the assembly code produced than the time spent by ELS. The last column of the chart gives the standard deviation for each algorithm. The standard deviation for the puzzle solver is 66467.11244, whereas the standard deviation of LLVM's default is 386426.5572. In other words, the deviance of ELS is 5.83 times bigger than that of the puzzle solver.

Spill Code

Running Time

This chart gives the run time of each benchmark compiled with different algorithms. The run time is normalized to gcc -O3, and it is the average of five runs of each benchmark. Considering all the benchmarks, the four allocators produce faster code than gcc; the fractions are: puzzle solver 0.944, extended linear scan (LLVM's default) 0.991, EIRC 0.954 and PBQP 0.929. If we remove the floating point benchmarks, i.e., mesa, amp, art, equake, then gcc -O3 is faster. The fractions are: puzzle Solver 1.015, extended linear scan (LLVM's default) 1.059, EIRC 1.025 and PBQP 1.008.

Spill Code

Number of move/swap instructions inserted per total of puzzles

The chart below reports on the number of move and swap instructions necessary to implement parallel copies, fixing the coloring between different puzzles. As the chart shows, a very small number of moves is necessary between consecutive puzzles. This is mostly due to our allocation order, that starts at the root of the dominator tree, and goes down toward the leaves. Move moves are necessary to implement phi and pi functions than internal parallel copies, although there exist many more of these parallel copies than phi and pi functions.

Spill Code