Numera: An effort to build a compiler
For a while now, I have been incrementally (and slowly) building a compiler for my language numera, which is heavily based on the examples of the “dragon book” written by Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman. While the project is still in progress, I have managed to enter a stage where the compiler produces x86_64
assembly language that executes correctly. This post will go over some details and design choices made during the development.
Grammar
The grammar of the language is shown above.
program -> functions
functions -> functions function | ε
function -> define type id ( params) block
block -> { decls stmts }
decls -> decls decl | ε
decl -> type id;
stmts -> stmts stmt | ε
stmt -> type id = bool;
| type id [ num ];
| loc = bool;
| if ( bool ) stmt
| if ( bool ) stmt else stmt
| while ( bool ) stmt
| return expr;
| break;
| block
params | ε | type id, params
args | ε | id, args
loc -> loc [ bool ] | id | id ( args )
bool -> bool || join | join
join -> join && equality | equality
equality -> equality == rel | equality != rel | rel
rel -> expr < expr | expr <= expr | expr >= expr |
expr > expr | expr
expr -> expr + term | expr - term | term
term -> term * unary | term / unary | unary
unary -> ! unary | - unary | factor
factor -> ( bool ) | loc | num | real | true | false
It is a procedual language consisting of functions that call each other. All variables are defined within a function. This means that global variables are not permitted (at least for now). This decision was made to simplify the memory management and run-time environment. When all the variables are local they are associated with only one procedure, simplifying the run-time stack management. Another decision made in an effort to simplify the run-time stack management was to prohibit nested functions.
All the variables of a function need to be declared before assigning values to them, meaning that the grammar does not allow e.g. int a = 1
. All declarations also need to be declared before the statements associated with the function. This approach makes it easier to allocate memory for the variables on the run-time stack.
Types
The primitive types of the language are:
int
(64 bits)float
(64 bits)bool
(64 bits, acceptstrue
orfalse
)
In order to simplify the x86_64
code generation the size of all types was set to 64 bits.
In addition to the primitive types the following types are also supported:
[]int
[]float
[]bool
Runtime environment
The runtime environment of the compiler uses static allocation, meaning that all the (local) variables of the function are allocated space on the run-time stack. All the information needed to execute the function is stored in an activation record. An activation record holds the memory addresses of all the local variables and temporary values in a case there is any. If another function is called within the function, the parameters of the callee function are stored in the activation record of the caller. Activation records of a program recide on the run-time stack. The activation record of the function being executed resides on top of the stack. Once the execution of the function is complete, its activation record is popped from the stack. The activation record of the caller now resides on top of the stack and control is returned to the caller. Below is an illustration of this process where currently executing
function g(x, y)
is called by function f()
.
-----------
|f |
|integer a |
|integer b |
|integer x |
|integer y |
----------- <- pointer to the beginning of the activation
|g | record of g stored in register R
|integer c |
|integer d |
# We can access the values of the parameters using register R.
# The value of x is stored in <address stored in R> + 16 and
# the value of y is stored in <address stored in R> + 8.
# The values of c and d are stored in
# <address stored in R> - 8 and <address stored in R> - 16
# respectively. Note that stack grows towards lower addresses.
----------- <- pointer to the end of activation record of
g stored in the stack pointer SP
After we are done with function g(x, y)
we increase the address stored in SP
by 16 bytes (it now points to the end of activation record of f()
which is in control of the execution after done calling g(x, y)
).
Code generator
The target chosen for the compiler was x86_64
. The compiler uses the following registers:
%rax
, reserved for handling return values (also used as scratch register)%rbp
, reserved, points to the beginning of the activation record of the executing function%rsp
, reserved, points to the top of the stack (end of the activation record of the execution function)%rcx
, reserved, used as a scratch register for arithmetic and logical operations%rdx
, reserved, used as a scratch register for arithmetic and logical operations%rsi
, used freely%rdi
, used freely%r8
, used freely%r9
, used freely%r10
, used freely%r11
, used freely%r12
, used freely%xmm0
, reserved for handling return values (also used as scratch register)%xmm1
, reserved, used as a scratch register for arithmetic and relational operations%xmm2
, reserved, used as a scratch register for arithmetic and relational operations%xmm3
, used freely%xmm4
, used freely%xmm5
, used freely%xmm6
, used freely%xmm7
, used freely%xmm8
, used freely%xmm9
, used freely%xmm10
, used freely%xmm11
, used freely%xmm12
, used freely%xmm13
, used freely%xmm14
, used freely%xmm15
, used freely
An example program and generated x86_64 assembly is displayed below.
// program
define int fib(int i) {
if (i == 0) return 1;
if (i == 1) return 1;
return fib(i-1)+fib(i-2);
}
define int main() {
int[10] n; int i;
i = 0;
while (i < 10) {
n[i] = fib(i);
i = i + 1;
}
return 0;
}
// generated code
.global main
fib:
pushq %rbp
movq %rsp, %rbp
subq $64, %rsp
L1:
movq 16(%rbp), %rdx
movq $0, %rcx
cmpq %rcx, %rdx
sete %al
movzx %al, %rcx
movq %rcx, -8(%rbp)
movq -8(%rbp), %r12
movq %r12, %rcx
test %rcx, %rcx
je L3
L4:
movq $1, %rax
jmp L2
L3:
movq 16(%rbp), %rdx
movq $1, %rcx
cmpq %rcx, %rdx
sete %al
movzx %al, %rcx
movq %rcx, -16(%rbp)
movq -16(%rbp), %r11
movq %r11, %rcx
test %rcx, %rcx
je L5
L6:
movq $1, %rax
jmp L2
L5:
movq 16(%rbp), %rcx
movq $1, %rdx
subq %rdx, %rcx
movq %rcx, -24(%rbp)
movq -24(%rbp), %r10
pushq %r10
call fib
movq %rax, -32(%rbp)
addq $8, %rsp
movq 16(%rbp), %rcx
movq $2, %rdx
subq %rdx, %rcx
movq %rcx, -40(%rbp)
movq -40(%rbp), %r12
pushq %r12
call fib
movq %rax, -48(%rbp)
addq $8, %rsp
movq -32(%rbp), %r12
movq -48(%rbp), %r11
movq %r12, %rcx
movq %r11, %rdx
addq %rdx, %rcx
movq %rcx, -56(%rbp)
movq -56(%rbp), %rax
jmp L2
L2:
addq $64, %rsp
popq %rbp
ret
main:
pushq %rbp
movq %rsp, %rbp
subq $112, %rsp
L7:
movq $0, %rcx
movq %rcx, -88(%rbp)
L9:
movq -88(%rbp), %rdx
movq $10, %rcx
cmpq %rcx, %rdx
setl %al
movzx %al, %rcx
movq %rcx, -96(%rbp)
movq -96(%rbp), %r12
movq %r12, %rcx
test %rcx, %rcx
je L10
L11:
movq -88(%rbp), %rcx
movq $8, %rdx
imulq %rcx, %rdx
movq %rdx, -104(%rbp)
pushq -88(%rbp)
call fib
movq %rax, -112(%rbp)
addq $8, %rsp
movq -112(%rbp), %r12
movq -104(%rbp), %r11
movq %r12, %rcx
movq %rcx, -80(%rbp, %r11)
L12:
movq -88(%rbp), %rcx
movq $1, %rdx
addq %rdx, %rcx
movq %rcx, -88(%rbp)
jmp L9
L10:
movq $0, %rax
jmp L8
L8:
addq $112, %rsp
popq %rbp
ret
From the example it can be seen that no optimizations have yet been implemented in the code generator. The example displays low hanging fruits such as
//...
jmp L2
L2:
//...
and
//...
movq %rcx, -96(%rbp)
movq -96(%rbp), %r12
movq %r12, %rcx
//...
There compiler does not yet have any I/O implemented, so I have been using gdb
to inspect the results of the compiled program. The results of the example program can be inspected with gdb
by setting a breakpoint to L8. Once the breakpoint is reach, use info register rbp
to obtain the address correspointing to the base of the activation record of main()
, 0x7fffffffdcc0
in my case for example. By executing the command x/20x (<value stored in %rbp> - 80)
we obtain all the values stored in int[10] n
:
(gdb) x/20x (0x7fffffffdcc0 - 80)
0x7fffffffdc70: 0x00000001 0x00000000 0x00000001 0x00000000
0x7fffffffdc80: 0x00000002 0x00000000 0x00000003 0x00000000
0x7fffffffdc90: 0x00000005 0x00000000 0x00000008 0x00000000
0x7fffffffdca0: 0x0000000d 0x00000000 0x00000015 0x00000000
0x7fffffffdcb0: 0x00000022 0x00000000 0x00000037 0x00000000
Intermediate representation
The compiler also allows inspection of three-address code representation of the program. Three-address code of the program displayed in the previous chapter is shown below.
fib:
begin 0
L1:
t1 = i == 0
iffalse t1 goto L3
L4:
ret 1
goto L2
L3:
t2 = i == 1
iffalse t2 goto L5
L6:
ret 1
goto L2
L5:
t3 = i - 1
param t3
t4 = call fib 1
t5 = i - 2
param t5
t6 = call fib 1
t7 = t4 + t6
ret t7
goto L2
L2:
end
main:
begin 88
L7:
i = 0
L9:
t8 = i < 10
iffalse t8 goto L10
L11:
t9 = i * 8
param i
t10 = call fib 1
n [t9] = t10
L12:
i = i + 1
goto L9
L10:
ret 0
goto L8
L8:
end
Next steps
The planning to continue the incremental development of the compiler by implementing the following steps:
- Peephole optimization
- Testing and profiling environment
- I/O
- Switch statements
- More convenient alternatives for array initialization
- Size of boolean to 8 bits
- 32-bit types (current int -> long, current float -> double)