Wednesday, August 30, 2006

Documentation

I am back working on LLVM. This time I decided to write some documentation about the infrastructure that LLVM provides for register allocation. Check the html.

Thursday, August 24, 2006

Short break

I will not be working on the Summer of Code project on the next four days, i.e, there will not be any posting on this period. On this Tuesday night I return working on the project. I have many ideas of how to optimize the code that is been produced.

Tuesday, August 22, 2006

post X pre deconstruction

I finished compiling one comparison between my algorithm, and LLVM's linear scan algorithm. The results are not very good yet. But I plan to improve them. I already have some ideas of how to do this. Check the comparative table here. In this table, linear scan is the beta compiler, which is producing (unfortunately for me) faster code. By the way, I hope the window of your browser is big, otherwise it will be a bit painful to see all the results nicely aligned into columns.

Monday, August 21, 2006

Final code

With the new C sources, my compiler could pass all the expected tests on C files. I could not find all the C++ libraries, so, it is not compiling all the .cpp files. But the new table is much better than the old one. The final code is here. I still plan to carry on many optimizations on this code, but I think the post-deconstruction of phi-functions will not add any benefits to the register allocator. What should be done is a graph based algorithm that uses maximum cardinality search to find a good coloring. This is my next step, and my project for this September.

Saturday, August 19, 2006

stdlib.h

I've discovered why so many tests were failing: my cfrontend was wrong. My version of LLVM does not compile anything that imports stdlib.h. I sort of fixed this by copying the sources from the gcc compiler in my OS (gcc 4.0.1) to
cfrontend/ ppc/ llvm-gcc/ lib/gcc/ powerpc-apple-darwin8.5.0/ 3.4-llvm/ old_include/ stdlib.h
. It seems that it solved this problem, but there are some other problems when compiling cpp files. I tried many tests by hand, using this script. My compiler passed all of them, with the new gcc front end. Also, I discovered another bug: I am not handling the translation of two address instructions. Now I think I am handling it, at least for the Power PC architecture. I implemented a small pass to make this conversion, after register allocation has been performed. The code is here: .cpp and .h. Actually, my compiler did not pass these tests: 2006-01-23-InitializedBitField.c, 2006-01-23-UnionInit.c, but linear scan failed too, so I will assume that it is a bug in the platform.

I just remember that this may be important: besides my own source code, I am changing the following files from LLVM:

include/llvm/Passes.h
include/llvm/Target/MRegisterInfo.h
lib/Target/X86/X86RegisterInfo.h
lib/Target/X86/X86RegisterInfo.h
lib/Target/PPC/PPCRegisterInfo.h
lib/Target/PPC/PPCRegisterInfo.h
lib/Target/PowerPC/PPCTargetMachine.cpp
lib/CodeGen/Passes.cpp

Friday, August 18, 2006

Night tests

Yesterday I tried to run the night test, but something went quite wrong, and it end up stuffing 2 giga of data into my university account. Well, the computer complained all right, and it did not worked out. Anyways, it produced this table. I decided to upgrade my version of LLVM, and tried the whole thing again. The tests are running in this very moment, while I am writing this report. My command line was:
$HOME/Programs/ppc_llvm/utils/NewNightlyTest.pl -parallel -nickname pronesto :pserver:anon@llvm-cvs.cs.uiuc.edu:/var/cvs/llvm $HOME/Programs/ppc_llvm/projects/fer_tests $HOME/Programs/ppc_llvm/projects/fer_results.

Thursday, August 17, 2006

More about time.

Today I changed a little bit the structure of the compiler, in order to improve the velocity of the register allocation pass. Also, I run some more tests, mostly concerning floating point numbers. The algorithm is now much faster than one week ago, but it is still slower than linear scan. A comparison between the time of both algorithms is given here, for this program. However, linear scan is producing better code. This is due to two main reasons: 1) the excessive number of instructions inserted by the phi deconstruction algorithm, 2) the spilling strategy. I will try to optimize the spilling strategy tomorrow. About the phi deconstruction algorithm, the extra instructions may be a problem, because each swap demands three instructions, instead of two that would be used in simple move insertion. I will have to look more into this problem. By the way, the most recent version of my code can be obtained here.

Wednesday, August 16, 2006

The bug killer!

Good: now I can use valgrind. How? I had to port everything to a Linux machine. This brought endless sadness, so, I hope I don't have to do this again. Anyways, to tell LLVM on the X86 to generate code for Power PC is very easy: valgrind -v llc -march=ppc32 -f u8.bc -o ch.s. With valgrind I found the unmerciful error in my previous code; that error had costed me more than four hours of debugging. I think I fixed most of the memory errors, although I still may be having problems. I got this message here from valgrind. Do you know what it means? Well, I sort of know, but I am too busy with the other benchmarks, and I will look into it later. I still have to be able to use the automatic testing tool that comes with LLVM. Hum... there is something trick though: I cannot implement floating point swaps with XOR instructions... so, I am using three operations to swap a and b: sSUB a, b, a, SUB b, b, a, ADD a, b, a. It worked fine with all the benchmarks that I have tested, but... if someone uses financial applications, please, don't use this compiler. Actually, if you use financial applications, don't use IEEE floating point numbers.

Tuesday, August 15, 2006

Dangling pointers

Today I tried to install valgrind, but I discover that it works only on Linux. I tried for a while to put the linux working in the Mac, but it was bring more pain than gain, and I gave up. I discovered one bug in the phi deconstruction algorithm. It is very weird: the memory slot @0x8d29fdc is been overwritten by an operating that has nothing to do if that memory location. I wonder how many dangling pointers I have in the code...

Saturday, August 12, 2006

Latest time improvements

I've got some more time improvements in the implementation of the register allocation algorithm. Now I am using better data structures to encode the live ranges of variables. I still feel that it would have been better to use the data structures of linear scan, right in the beginning. However, by developing my own gadgets, I acquired more experience with C++, and a better understanding of compilers. I will put some more effort in improving the time of my implementation. The greatest problem now is the interference graph that I am using to know conflicts between pre-colored registers and the virtual registers. In order to implement a better algorithm, I am marking the conflicts between physical registers that are already present in the code, and the virtual registers. The construction of this interval graph is taking a lot of time. I am seriously thinking about removing this graph altogether, and spilling registers every time it is necessary. Anyways, check the time improvements here.

Wednesday, August 09, 2006

Speeding up

Good news! I am folding instructions, and I've greatly improved the time of the edge liveness analysis algorithm: more than 10 times speed up! The problem is my inexperience in C++. I was passing data structures by copy... thousands and thousands of times. The new algorithm is much, much faster, and takes about the time that the interval analysis used in LLVM does. Check the new code here, and take a look into the time improvement.

Sunday, August 06, 2006

Statistics

Today I tried to run LLVM tests, but I could not do it automatically. I think it has to do with some configuration issues. I run some tests manually, and the results are not good. Firstly, my algorithm does not produces good code (it is producing around 10% more instructions than linear scan); second, it is very slow. The edge analysis is taking most of the time. Take a look into the comparative result generated during the compilation of this program. This coming week I will try to optimize the algorithm. Today I started with some simple coalescing: when necessary to find a register for a definition, if it is a copy instruction, I try to reuse the register in the used virtual, if its live range is dead. I think that the greatest optimization will be instruction folding. I will try to look into this tomorrow.

Saturday, August 05, 2006

More tests

Today I finished some more tests, and all of them were success! Check my list of benchmarks here. All of these are small programs, that manipulate mostly integer data. Right now I am running LLVM tests, but it seems to be lasting forever, and I don't see any thing pointing to an end. More news tomorrow.

Friday, August 04, 2006

More programs compiled!

Today I finishes some fixes in the phi-deconstruction algorithm. Due to spilling, it is now quite different from the elegant algorithm firstly proposed. The register allocator has passed many more tests. Check this directory to see the programs that I have successfully compiled. Tomorrow I plan to look into the bug found in this assembly.

Wednesday, August 02, 2006

Sad truth

Today I had to face a frustrating truth: it is not possible to assign the same memory address to phi related virtuals. Wanna know why? Look into this simple sketch of a control flow graph. I will have to change the phi deconstruction algorithm again. The problem is that I did not thought about how spills would affect the algorithm. The problem now is that the parameter and the definition of the same phi function can be spilled values. If the register pressure is big enough, there will be no register to perform the memory transfer.

Tuesday, August 01, 2006

Phi-related virtuals

Today I implemented a pass to find which virtuals are related by the phi-functions. Phi related virtuals represent the same variable in the original program. Many things can be accomplished with this information. For example, it is possible to map all these variables to the same stack slot, in order to minimize the stack size. For example, these are the phi related virtuals for this control flow. The code of the pass is here: .cpp, and .h.

Ah, if you ever have problems such as Pass existing, but not found, remember to include llvm/PassSuport.h in the source code of your pass.