Register Allocation by Puzzle Solving

We have shown that register allocation can be viewed as solving a collection of puzzles. We model the register file as a puzzle board and the program variables as puzzle pieces; pre-coloring and register aliasing fit in naturally. For architectures such as x86, SPARC V8, and StrongARM, we can solve the puzzles in polynomial time, and we have augmented the puzzle solver with a simple heuristic for spilling. For SPEC CPU2000, our implementation is as fast as the extended version of linear scan used by LLVM, which is the JIT compiler in the openGL stack of Mac OS 10.5. Our implementation produces Pentium code that is of similar quality to the code produced by the slower, state-of-the-art iterated register coalescing algorithm with an extension by Smith, Ramsey, and Holloway from 2004. A tutorial on Register Allocation by Puzzle Solving is available at tutorial.

*Full version of the paper:*
	This is the full version of our paper. The appendices contain the proofs of the theorems stated in the main body of the paper, plus more detailed pseudo-code for the register assignment phase, with spilling and coalescing.

I will use this page to share our "discoveries" with other researchers. Mostly I will provide links for the documents that we write, and for part of the implementation. The code of our puzzle solver is available in this page. The puzzle based register allocator is able to pass all the tests that LLVM's default algorithm does. My implementation needs some more fine tuning though. There are room for optimizations in the spilling and coalescing heuristics, and also I believe it is possible to speed up the implementation a little bit.

Well, so, why should one see register allocation as a puzzle? Because that is a very elegant and powerful abstraction! It allows the register allocation to find an exact register assignment for an architecture as complex as x86! That is, if it is possible to find a register assignment that does not cause spilling, a puzzle based allocator will find one. That was an open problem until now. We found the largest known class of graphs that admit solution for classic NP-complete problems, such as pre-coloring extension and 2-weighted coloring. Furthermore, I really do not think anyone has ever studied the problem of register allocation at this level of granularity. Appel and George[Appel01] described a live range splitting strategy that allows optimal register assignment. The same has been done by Sarkar and Barik[Sarkar07]: they defined an optimal linear scan algorithm that relies on live range splitting. However, Appel and Sarkar do not describe any treatment for registers of different sizes. Our result is even more surprising if one consider that given an interval graph, the problem of finding an optimal 2-weighted coloring for it is NP-complete.

And, what can be accomplished from puzzles? Many things, I would say. The optimal assignment is very good at keeping variables in registers. The number of heuristics that can be added on top of puzzles is virtually unbounded, and this formalism is easy to understand. Currently we have one particular implementation. It does not mean that our algorithm is the definitive one. Puzzles can be used, for example, to increase the precision of the ILP model of Appel and George, or the register assignment strategy of Sarkar and Barik. Furthermore, a puzzle solver can be very fast. I am not an expert in C++, but our implementation of the puzzle solver, at least when compiling big programs, has about the same compilation time than LLVM's default algorithm, and mind that this is a JIT compiler used in the industry! I really think that a more careful implementation, aware of good data structures and special cases could boost the velocity of the current program that we have. Not all the patterns that can make puzzles occur in real architectures. I believe that, if the implementation is aware of these forbidden cases, it can be coded to run faster.

One feature that was remarkably helpful during the implementation was the type-system for register allocators that we submit to SAS'07. The debugger is really, really helpful. I think that without the type-system, I would never be able to finish the implementation. I have the debugger implemented in java, and there is a jar file available here. You can use it with the command

java -classpath=TypeChecker.jar typeChecker.TypeCheckApp < file

And if you want to know more about the type inference engine that we use, you can read our paper. In order to use the debugger, you must convert the output of your allocator to sMIRA, a language for describing register allocation directives. It is really worth the trouble: whoever has implemented a register allocator knows how hard is to find errors when they are present. The result of a bad allocation can surface many instructions past the problematic point, and the developer has no clues of what is going on with the code. Finding such errors is a nightmare. But with the debugger, it is as easy as it can be: the type-checker points the problematic instructions, and gives some hints of what may be happening, e.g a register is being redefined while the value that it uses is still alive, etc. But, be aware: the debugger is not yet 100% done. I still have to improve the error messages, to make them more intuitive and helpful. Yet, the debugger was really useful to me.

References

Download it [Appel01] Andrew W. Appel and Lal George Optimal Spilling for CISC Machines with Few Registers. PLDI. 243 - 253. ACM Press. 2001.
[Sarkar07] Vivek Sarkar and Rajkishore Barik Extended Linear Scan: an Alternate Foundation for Global Register Allocation. CC. 141 - 155. LNCS. 2007.

	[Appel01] Andrew W. Appel and Lal George Optimal Spilling for CISC Machines with Few Registers. PLDI. 243 - 253. ACM Press. 2001.
	[Sarkar07] Vivek Sarkar and Rajkishore Barik Extended Linear Scan: an Alternate Foundation for Global Register Allocation. CC. 141 - 155. LNCS. 2007.

Register Allocation by Puzzle Solving.

References