For a while now, I have been incrementally (and slowly) building a compiler for my language numera, which is heavily based on the examples of the “dragon book” written by Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman. While the project is still in progress, I have managed to enter a stage where the compiler produces x86_64 assembly language that executes correctly. This post will go over some details and design choices made during the development.

Grammar

The grammar of the language is shown above.

program     ->      functions
functions   ->      functions function | ε
function    ->      define type id ( params) block
block       ->      { decls stmts }
decls       ->      decls decl | ε
decl        ->      type id;
stmts       ->      stmts stmt | ε
stmt        ->      type id = bool;
            |       type id [ num ];
            |       loc = bool;
            |       if ( bool ) stmt
            |       if ( bool ) stmt else stmt
            |       while ( bool ) stmt
            |       return expr;
            |       break;
            |       block
params      |       ε | type id, params
args        |       ε | id, args
loc         ->      loc [ bool ] | id | id ( args )
bool        ->      bool || join | join
join        ->      join && equality | equality
equality    ->      equality == rel | equality != rel | rel
rel         ->      expr < expr | expr <= expr | expr >= expr |
                    expr > expr | expr
expr        ->      expr + term | expr - term | term
term        ->      term * unary | term / unary | unary
unary       ->      ! unary | - unary | factor
factor      ->      ( bool ) | loc | num | real | true | false

It is a procedual language consisting of functions that call each other. All variables are defined within a function. This means that global variables are not permitted (at least for now). This decision was made to simplify the memory management and run-time environment. When all the variables are local they are associated with only one procedure, simplifying the run-time stack management. Another decision made in an effort to simplify the run-time stack management was to prohibit nested functions.

All the variables of a function need to be declared before assigning values to them, meaning that the grammar does not allow e.g. int a = 1. All declarations also need to be declared before the statements associated with the function. This approach makes it easier to allocate memory for the variables on the run-time stack.

Types

The primitive types of the language are:

  • int (64 bits)
  • float (64 bits)
  • bool (64 bits, accepts true or false)

In order to simplify the x86_64 code generation the size of all types was set to 64 bits.

In addition to the primitive types the following types are also supported:

  • []int
  • []float
  • []bool

Runtime environment

The runtime environment of the compiler uses static allocation, meaning that all the (local) variables of the function are allocated space on the run-time stack. All the information needed to execute the function is stored in an activation record. An activation record holds the memory addresses of all the local variables and temporary values in a case there is any. If another function is called within the function, the parameters of the callee function are stored in the activation record of the caller. Activation records of a program recide on the run-time stack. The activation record of the function being executed resides on top of the stack. Once the execution of the function is complete, its activation record is popped from the stack. The activation record of the caller now resides on top of the stack and control is returned to the caller. Below is an illustration of this process where currently executing
function g(x, y) is called by function f().

 -----------
|f          |
|integer a  |
|integer b  |
|integer x  |
|integer y  |
 -----------    <- pointer to the beginning of the activation
|g          |      record of g stored in register R
|integer c  |
|integer d  |
# We can access the values of the parameters using register R.
# The value of x is stored in <address stored in R> + 16 and
# the value of y is stored in <address stored in R> + 8.
# The values of c and d are stored in
# <address stored in R> - 8 and <address stored in R> - 16
# respectively. Note that stack grows towards lower addresses.
 -----------    <- pointer to the end of activation record of
                   g stored in the stack pointer SP

After we are done with function g(x, y) we increase the address stored in SP by 16 bytes (it now points to the end of activation record of f() which is in control of the execution after done calling g(x, y)).

Code generator

The target chosen for the compiler was x86_64. The compiler uses the following registers:

  • %rax, reserved for handling return values (also used as scratch register)
  • %rbp, reserved, points to the beginning of the activation record of the executing function
  • %rsp, reserved, points to the top of the stack (end of the activation record of the execution function)
  • %rcx, reserved, used as a scratch register for arithmetic and logical operations
  • %rdx, reserved, used as a scratch register for arithmetic and logical operations
  • %rsi, used freely
  • %rdi, used freely
  • %r8, used freely
  • %r9, used freely
  • %r10, used freely
  • %r11, used freely
  • %r12, used freely
  • %xmm0, reserved for handling return values (also used as scratch register)
  • %xmm1, reserved, used as a scratch register for arithmetic and relational operations
  • %xmm2, reserved, used as a scratch register for arithmetic and relational operations
  • %xmm3, used freely
  • %xmm4, used freely
  • %xmm5, used freely
  • %xmm6, used freely
  • %xmm7, used freely
  • %xmm8, used freely
  • %xmm9, used freely
  • %xmm10, used freely
  • %xmm11, used freely
  • %xmm12, used freely
  • %xmm13, used freely
  • %xmm14, used freely
  • %xmm15, used freely

An example program and generated x86_64 assembly is displayed below.

// program
define int fib(int i) {
    if (i == 0) return 1;
    if (i == 1) return 1;
    return fib(i-1)+fib(i-2);
}

define int main() {
    int[10] n; int i;
    i = 0;
    while (i < 10) {
        n[i] = fib(i);
        i = i + 1;
    }
    return 0;
}
// generated code
.global main
fib:
	pushq %rbp
	movq %rsp, %rbp
	subq $64, %rsp
L1:
	movq 16(%rbp), %rdx
	movq $0, %rcx
	cmpq %rcx, %rdx
	sete %al
	movzx %al, %rcx
	movq %rcx, -8(%rbp)
	movq -8(%rbp), %r12
	movq %r12, %rcx
	test %rcx, %rcx
	je L3
L4:
	movq $1, %rax
	jmp L2
L3:
	movq 16(%rbp), %rdx
	movq $1, %rcx
	cmpq %rcx, %rdx
	sete %al
	movzx %al, %rcx
	movq %rcx, -16(%rbp)
	movq -16(%rbp), %r11
	movq %r11, %rcx
	test %rcx, %rcx
	je L5
L6:
	movq $1, %rax
	jmp L2
L5:
	movq 16(%rbp), %rcx
	movq $1, %rdx
	subq %rdx, %rcx
	movq %rcx, -24(%rbp)
	movq -24(%rbp), %r10
	pushq %r10
	call fib
	movq %rax, -32(%rbp)
	addq $8, %rsp
	movq 16(%rbp), %rcx
	movq $2, %rdx
	subq %rdx, %rcx
	movq %rcx, -40(%rbp)
	movq -40(%rbp), %r12
	pushq %r12
	call fib
	movq %rax, -48(%rbp)
	addq $8, %rsp
	movq -32(%rbp), %r12
	movq -48(%rbp), %r11
	movq %r12, %rcx
	movq %r11, %rdx
	addq %rdx, %rcx
	movq %rcx, -56(%rbp)
	movq -56(%rbp), %rax
	jmp L2
L2:
	addq $64, %rsp
	popq %rbp
	ret
main:
	pushq %rbp
	movq %rsp, %rbp
	subq $112, %rsp
L7:
	movq $0, %rcx
	movq %rcx, -88(%rbp)
L9:
	movq -88(%rbp), %rdx
	movq $10, %rcx
	cmpq %rcx, %rdx
	setl %al
	movzx %al, %rcx
	movq %rcx, -96(%rbp)
	movq -96(%rbp), %r12
	movq %r12, %rcx
	test %rcx, %rcx
	je L10
L11:
	movq -88(%rbp), %rcx
	movq $8, %rdx
	imulq %rcx, %rdx
	movq %rdx, -104(%rbp)
	pushq -88(%rbp)
	call fib
	movq %rax, -112(%rbp)
	addq $8, %rsp
	movq -112(%rbp), %r12
	movq -104(%rbp), %r11
	movq %r12, %rcx
	movq %rcx, -80(%rbp, %r11)
L12:
	movq -88(%rbp), %rcx
	movq $1, %rdx
	addq %rdx, %rcx
	movq %rcx, -88(%rbp)
	jmp L9
L10:
	movq $0, %rax
	jmp L8
L8:
	addq $112, %rsp
	popq %rbp
	ret

From the example it can be seen that no optimizations have yet been implemented in the code generator. The example displays low hanging fruits such as

//...
	jmp L2
L2:
//...

and

//...
	movq %rcx, -96(%rbp)
	movq -96(%rbp), %r12
	movq %r12, %rcx
//...

There compiler does not yet have any I/O implemented, so I have been using gdb to inspect the results of the compiled program. The results of the example program can be inspected with gdb by setting a breakpoint to L8. Once the breakpoint is reach, use info register rbp to obtain the address correspointing to the base of the activation record of main(), 0x7fffffffdcc0 in my case for example. By executing the command x/20x (<value stored in %rbp> - 80) we obtain all the values stored in int[10] n:

(gdb) x/20x (0x7fffffffdcc0 - 80)
0x7fffffffdc70:	0x00000001	0x00000000	0x00000001	0x00000000
0x7fffffffdc80:	0x00000002	0x00000000	0x00000003	0x00000000
0x7fffffffdc90:	0x00000005	0x00000000	0x00000008	0x00000000
0x7fffffffdca0:	0x0000000d	0x00000000	0x00000015	0x00000000
0x7fffffffdcb0:	0x00000022	0x00000000	0x00000037	0x00000000

Intermediate representation

The compiler also allows inspection of three-address code representation of the program. Three-address code of the program displayed in the previous chapter is shown below.

fib:
	begin 0
L1:
	t1 = i == 0
	iffalse t1 goto L3
L4:
	ret 1
	goto L2
L3:
	t2 = i == 1
	iffalse t2 goto L5
L6:
	ret 1
	goto L2
L5:
	t3 = i - 1
	param t3
	t4 = call fib 1
	t5 = i - 2
	param t5
	t6 = call fib 1
	t7 = t4 + t6
	ret t7
	goto L2
L2:
	end
main:
	begin 88
L7:
	i = 0
L9:
	t8 = i < 10
	iffalse t8 goto L10
L11:
	t9 = i * 8
	param i
	t10 = call fib 1
	n [t9] = t10
L12:
	i = i + 1
	goto L9
L10:
	ret 0
	goto L8
L8:
	end

Next steps

The planning to continue the incremental development of the compiler by implementing the following steps:

  • Peephole optimization
  • Testing and profiling environment
  • I/O
  • Switch statements
  • More convenient alternatives for array initialization
  • Size of boolean to 8 bits
  • 32-bit types (current int -> long, current float -> double)