A new register allocation algorithm for LLVM.




I am currently implementing the register allocation algorithm described in the paper "Register Allocation via Coloring of Chordal Graphs, Pereira and Palsberg, APLAS'05"[Pereira05], using LLVM as the basic framework. The proposed algorithm relies on the fact that the interference graph of programs in SSA form are chordal. Such property allows to find the minimal number of registers necessary to compile a program in polynomial time. This project has been accepted by the Google Summer of Code initiative. I am logging my progress here.




What must be done

Two important modifications must be performed in the LLVM's code:
  1. A new register allocation pass must be implemented.
  2. The PHI deconstruction algorithm must be replaced.

1 - New register allocation algorithm

The register allocation algorithm uses the liveness analysis information to build the interference graph for the procedure where register allocation is been performed. The interference graph of a program in SSA form is always chordal, and can be optimally colored in polynomial time. The SSA structure also allows to implement the spilling phase before the graph coloring phase in a very efficient way, although it is non-optimal. The good results of the chordal based register allocation have been published in "Register Allocation via Coloring of Chordal Graphs, APLAS'05" [Pereira05]. The theoretical background is given in [Hack06b].

2 - New PHI deconstruction algorithm

Although SSA-form programs can be colored in polynomial time, care must be taken when deconstructing PHI functions, otherwise a new NP-complete problem will be found, as published in "Register Allocation after Classical SSA elimination is NP-complete, FOSSACS'06"[Pereira06]. In order to avoid the exponential problem, PHI functions cannot be replaced by copy instructions, as it is normally done. A simple, and elegant solution for the PHI deconstruction question has been published in "Optimal register allocation for SSA-form programs in polynomial time" [Hack06b], and also in "Register Allocation for Programs in SSA-Form, CC 2006"[Hack06a]. The published algorithm uses XOR instructions instead of copy instructions to destroy PHI functions. Basically, this algorithm takes into consideration the results of the register allocation phase, and adds one permutation of registers to each edge that reaches the basic block where PHI functions were destroyed. Permutations are implemented as sequences of swap operations. A swap operation exchanges the value of two variables without demanding a third store location. It can be implemented with three XOR instructions or with three subtractions, as the following example shows:

Example:

Let the following group of PHI functions be present at the header of block B_d. Assume that the register allocation phase has already been performed:

a(r1) := PHI(a1(r1), a2(r2))
b(r2) := PHI(b1(r2), b2(r1))

In this example, a, a1, a2, b, b1, and b2 are virtual registers, and r1 and r2 are physical registers. Assume that the live range of a2 and b2 reach block B_d coming from block B_s. In order to make this allocation semantically consistent, it is necessary to swap the values in registers a2 and b2, before reaching block B_d. This can be accomplished by inserting the permutation (r1, r2) := perm (r2, r1) in the edge that join B_s and B_d. Such permutation can be implemented as a sequence of three XOR instructions: r1 := (r1 xor r2); r2 := (r1 xor r2); r1 := (r1 xor r2).

Expected Contributions

Currently, LLVM provides three different options of register allocation: a simple algorithm that sends all the operands to memory, the local algorithm that uses interferences between live ranges to decide which registers to use, and the linear scan implementation. The proposed algorithm is considerably different than all these three implementations, and it produces results comparable to the iterative register coalescing algorithm, from George and Appel [George96], in a faster and more consistent way. The chordal approach will not perform as fast as the linear scan algorithm; however, it is likely to generate better results, because, it can always determine the minimum number of registers necessary to compile a program in the absence of spills.

My Background

I have implemented this algorithm for the JoeQ compiler in the Java language, and I have written two papers about register allocation: [Pereira05], and [Pereira06]. In order to get acquainted with LLVM, I have developed a small pass to generate the control flow graph with register allocation information. Such program can be found here. Given a C program such as this, it produces a graph such as this one here. This pass can be plugged in the llc tool. My other projects with register allocation can be found here.

References

Download it [Hack06a] Sebastian Hack, Daniel Grund, Gerhard Goos Register Allocation for Programs in SSA-Form. Compiler Construction. 247 - 262. LNCS. 2006.
Download it [Hack06b] Sebastian Hack, Gerhard Goos Optimal register allocation for SSA-form programs in polynomial time. Information Processing Letters. 98(4): 150 - 155. 2006.
Download it [George96] Lal George and Andrew W Appel Iterated Register Coalescing. Transactions on Programming Languages and Systems (TOPLAS). 18(3): 300 - 324. ACM Press. 1996.
Download it [Pereira06] Fernando Magno Quintão Pereira, Jens Palsberg. Register Allocation After Classical SSA Elimination is NP-Complete. Foundations of Software Science and Computation Structures. 79 - 93. LNCS. 2006.
Download it [Pereira05] Fernando Magno Quintão Pereira, Jens Palsberg. Register Allocation via Coloring of Chordal Graphs. The Third Asian Symposium on Programming Languages and Systems. 315 - 329. Spring. 2005.



Write to me

Last update: May 6th, 2006.