Mlaskal compiler support classes
Assignment 5

Assignment 4 describes the basics of the symbol tables. In Assignments 5 and 6, the symbol tables are accessed as read-only (except for definition/usage checking of labels in Assignment 6).

This section describes the principles of code generation and certain symbol table details required to handle variable and constant references in the code.

Code generation

During generation, intermediate code is represented by code fragments, which are held by the smart pointer mlc::icblock_pointer. Each fragment contains a sequence of instructions; in addition, positions in the sequence can be marked with labels (which will be used to handle branching, loops, and gotos in Assignment 6).

The function mlc::icblock_create creates an empty block; the function mlc::icblock_merge_and_kill is used to concatenate two or more blocks (note that it does the concatenation by moving all the instructions to a newly created block; i.e. the source blocks are emptied).
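The move semantics of the concatenation can be illustrated with a minimal sketch. It does not use the real mlc API: toy_block, toy_create, and toy_merge_and_kill are hypothetical stand-ins (instructions are modelled as plain strings), written only to show that the result receives the instructions in order and that the source blocks end up empty.

```cpp
#include <iterator>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a code fragment; NOT the real mlc block type.
struct toy_block {
    std::vector<std::string> instructions;
};
using toy_block_pointer = std::shared_ptr<toy_block>;

// Models the creation of an empty block.
toy_block_pointer toy_create()
{
    return std::make_shared<toy_block>();
}

// Moves every instruction of src to the end of dst and empties src.
void toy_drain_into(toy_block& dst, toy_block& src)
{
    dst.instructions.insert(dst.instructions.end(),
        std::make_move_iterator(src.instructions.begin()),
        std::make_move_iterator(src.instructions.end()));
    src.instructions.clear();
}

// Models the concatenation: a new block receives the instructions of a,
// then those of b; both source blocks are left empty.
toy_block_pointer toy_merge_and_kill(const toy_block_pointer& a,
                                     const toy_block_pointer& b)
{
    toy_block_pointer result = toy_create();
    toy_drain_into(*result, *a);
    toy_drain_into(*result, *b);
    return result;
}
```

The practical consequence is that a source block should not be appended to, or merged again, in the expectation that it still holds its old instructions.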

Instructions may be appended only at the end of a block, using the function mlaskal::labeled_icblock::append_instruction.

The complete code fragment corresponding to a procedure or function shall be passed to the function mlc::symbol_tables::set_subprogram_code. For the main block, the corresponding function is mlc::symbol_tables::set_main_code. Both functions require the name of the subroutine or the name of the program (from the program IDENT; clause).

These functions will automatically add the necessary prologue and epilogue code (allocating/releasing local variables; return instructions).

Creating instructions

Each instruction is represented by an object owned by the pointer mlaskal::ai_ptr; the instruction objects are created by calling the template function mlaskal::make_ai. The template argument is a C++ type which represents the instruction mnemonic; the namespace ai holds all these types.

The following example appends a HALT instruction to a code fragment:

void add_halt(mlc::icblock_pointer ib)
{
    ib->append_instruction( mlaskal::make_ai< ai::HALT>());
}

This could be abbreviated using the template function mlaskal::labeled_icblock::append:

void add_halt(mlc::icblock_pointer ib)
{
    ib->append< ai::HALT>();
}

Loading constants

Instructions which load a constant require an index to the corresponding lexical table of constants, e.g.:

{
    ib->append_instruction( mlaskal::make_ai< ai::LDLITI>( i));
}

This could be abbreviated as:

{
    ib->append< ai::LDLITI>( i);
}

The instruction LDLITB accepts the bool value directly.

Accessing variables

Instructions which access variables carry an address:

Variable addresses are automatically computed by symbol tables when variable entries are created (note that variable entries representing function parameters are generated by the symbol tables upon entering the function body). The address is returned by the function mlc::variable_symbol::address which is inherited by the following symbol table entries:

The following example generates code which sums the values of a global and a local variable and stores the result into a parameter passed by reference:

void example(mlc::icblock_pointer ib,
{
    ib->append< ai::GLDI>(gv->address()); // read from the global variable
    ib->append< ai::LLDI>(lv->address()); // read from the local variable
    ib->append< ai::ADDI>(); // add the values
    ib->append< ai::LLDP>(rv->address()); // read the address held in the by-reference parameter
    ib->append< ai::XSTI>(); // write to the actual parameter
}

Array handling

For the [] operator, the left operand must be of a type represented by mlc::array_type. The corresponding element and index types may then be determined by element_type and index_type. The index type shall be an mlc::range_type, which provides the lowerBound and upperBound functions returning indexes of constants.
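The arithmetic implied by the range bounds can be sketched as follows. This is only an illustration of the offset computation, under the assumption that every element occupies a fixed number of stack cells; toy_element_offset and its parameters are hypothetical names, and in the real compiler the index is usually a run-time value, so the same computation has to be emitted as instructions rather than evaluated at compile time.

```cpp
#include <stdexcept>

// Hypothetical helper: zero-based offset (in cells) of element `index`
// in an array declared with the range lower_bound..upper_bound, assuming
// each element occupies `element_size` cells.
long toy_element_offset(long index, long lower_bound, long upper_bound,
                        long element_size)
{
    if (index < lower_bound || index > upper_bound)
        throw std::out_of_range("array index outside the declared range");
    // The lower bound is subtracted first because Pascal-style ranges
    // need not start at zero.
    return (index - lower_bound) * element_size;
}
```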

Array access requires pointer arithmetic, consisting of the following instructions:

Function return values

Function return values are defined by assignment statements with the function name on the left-hand side. This situation is recognized by the kind of the symbol entry for the left-hand-side identifier; however, correctness must be verified using nested and mlc::symbol_tables::my_function_name. If the assignment is correct, the return value is defined by storing (using LSTI etc.) to the address returned by mlc::symbol_tables::my_return_address. Note that Mlaskal functions cannot return arrays.

Calling functions and procedures

Functions and procedures are called by the CALL instruction. Their symbol table entries (mlc::function_symbol and mlc::procedure_symbol) inherit the function code which returns a (symbolic) address of the subprogram code. The CALL instruction is generated as follows:

{
    ib->append< ai::CALL>(sp->code());
}

Before the CALL instruction, the actual arguments shall be stored on the stack. The actual arguments must be compared to the formal argument list: differences in number or types must be reported as errors, but minor type differences (integer vs. real) must be resolved by inserting a conversion instruction (CVRTIR or CVRTRI). In addition, the formal arguments decide whether an argument is passed by reference (in this case, no type difference is allowed). (You may use mlc::identical_type to compare types, but you must handle the integer vs. real cases specially.)

The formal argument list is represented by the container mlc::parameter_list_body returned by the function mlc::subprogram_symbol::parameters. The container has begin, end, and size methods like any C++ container.

Each argument is described by a mlc::parameter_entry object containing the following data members:

In the case of arguments passed by reference (Assignment 6), the address of the actual argument must be passed (see Array handling for the pointer arithmetic instructions).
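The argument checks described in this section can be sketched as a small decision function. The model below is deliberately simplified and does not use the real symbol-table classes: toy_type, toy_formal, arg_action, check_argument, and check_call are all hypothetical names. Mapping the two conversion actions to CVRTIR and CVRTRI follows the reading of the mnemonics (I to R, R to I), which is an assumption here.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical simplified type model; the real compiler compares mlc types.
enum class toy_type { integer, real, boolean, string };

// Hypothetical description of one formal argument.
struct toy_formal {
    toy_type type;
    bool by_reference;
};

// What the caller has to do for one actual argument.
enum class arg_action { pass, convert_int_to_real, convert_real_to_int, error };

arg_action check_argument(const toy_formal& formal, toy_type actual)
{
    if (formal.type == actual)
        return arg_action::pass;
    if (formal.by_reference)
        return arg_action::error; // no type difference allowed by reference
    if (formal.type == toy_type::real && actual == toy_type::integer)
        return arg_action::convert_int_to_real; // presumably CVRTIR
    if (formal.type == toy_type::integer && actual == toy_type::real)
        return arg_action::convert_real_to_int; // presumably CVRTRI
    return arg_action::error;
}

// Returns true when the call is acceptable; `actions` then holds one entry
// per argument. A differing argument count is an error by itself.
bool check_call(const std::vector<toy_formal>& formals,
                const std::vector<toy_type>& actuals,
                std::vector<arg_action>& actions)
{
    actions.clear();
    if (formals.size() != actuals.size())
        return false;
    bool ok = true;
    for (std::size_t i = 0; i < formals.size(); ++i) {
        arg_action a = check_argument(formals[i], actuals[i]);
        actions.push_back(a);
        if (a == arg_action::error)
            ok = false;
    }
    return ok;
}
```

Checking all arguments instead of stopping at the first mismatch lets the compiler report every faulty argument of a call in one pass.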

Calling convention

The calling convention is dictated by the symbol tables which compute the addresses of arguments and return values as seen by the called function. Therefore, the calling code must adhere to these rules:

When calling a function, space for the return value must be created before storing the actual arguments, using an instruction like INITI.

The arguments must be pushed to the stack in left-to-right order, i.e. the rightmost argument will appear at the top of the stack at the moment of the CALL.

After the CALL instruction, space occupied by the arguments must be freed by instructions like DTORI. Note that these instructions must be appropriately typed.
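The resulting instruction order for a function call can be sketched as below. Instructions are modelled as plain strings rather than real instruction objects, and toy_arg and toy_function_call are hypothetical names. Two assumptions are made: the typed variants follow the naming pattern of INITI and DTORI (a type letter appended to INIT or DTOR), and the arguments are released from the top of the stack downwards, i.e. in right-to-left order.

```cpp
#include <string>
#include <vector>

// Hypothetical description of one actual argument.
struct toy_arg {
    std::vector<std::string> code; // instructions that push the argument
    char type_suffix;              // type letter, e.g. 'I' (assumed pattern)
};

// Emits: return-value slot, arguments left to right, CALL, then one
// typed DTOR per argument, starting with the rightmost one.
std::vector<std::string> toy_function_call(char return_suffix,
                                           const std::vector<toy_arg>& args,
                                           const std::string& callee)
{
    std::vector<std::string> out;
    out.push_back(std::string("INIT") + return_suffix);
    for (const toy_arg& a : args)
        out.insert(out.end(), a.code.begin(), a.code.end());
    out.push_back("CALL " + callee);
    for (auto it = args.rbegin(); it != args.rend(); ++it)
        out.push_back(std::string("DTOR") + it->type_suffix);
    return out;
}
```

After this sequence, only the return value is left on the stack. For a procedure call the same sketch would apply without the leading INIT instruction.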