Compiler Principles (NSWI098)

points	mark
600 or more	1 (excellent)
450-599	2 (well done)
350-449	3 (OK)
349 or less	failed

Read the Using your Cecko repository section before cloning the skeleton. The framework and skeleton files may be updated for various reasons, so you will need a way to merge the updates into your project.

You will need to prepare your building environment as described in Prerequisites.

Assignment 1

You may use the C++ header/source files for C++ routines called from calexer.lex; in this case, add #include "casem.hpp" to calexer.lex.

Assignments 2 to 5

Test files

The Cecko source files have the usual suffix .c; the corresponding expected outputs are in the .gold files.

The test files whose names end with "n" contain intentional semantic errors and are expected to fail during compilation.

Submitting and evaluation

Note: For registering in Recodex, you need a working email address registered in CAS.

Recodex will compile your solution using GCC 12.2. Your solution should not use any 3rd party libraries or code (except those specified in the section Building and running Cecko). Only standard C++20 libraries should be used.

Recodex will compile your solution with the same library and will apply the same tests as in the framework. (However, Recodex will use a different build system.)

Recodex will insist on strict equivalence of your solution outputs to the gold files. However, we will manually inspect results of all tests which can compile and run but fail in the Recodex judge phase. Therefore, you may be assigned some points even if Recodex assigns zero.

Each home assignment has a maximum number of achievable points (100 points for assignments #1-#2, 150+ points for assignments #3-#5). Any difference (including your debug messages) between your compiler output and the provided correct output is penalized. The penalty depends on the severity of the problem.

(1) Lexical analysis

Extend the calexer.lex file so that the resulting scanner is capable of recognizing all the lexical elements described in doc/selected-grammar.md and returns the objects which represent them at the interface between the lexer and the parser.
All objects representing tokens are created using a call to a function like cecko::parser::make_WHILE(...), whose declarations are generated in <build-folder>/fmwk/ckdumper.hpp by the build system for cecko1. All these functions have a parameter location_type l; pass ctx->line() here.
The list of token names (and additional parameter types) may be found in ckdumper.y (it has to be the same as in caparser.y, which is not used when building cecko1).
Some tokens (like <, >, <=, >= or +, -) are grouped together. The object representing the token is shared across each group (it will become the same terminal in the grammar); individual tokens are distinguished using an additional parameter of the object (an enumeration type associated with the group, like cecko::gt_cmpo::LT; see ckgrptokens.hpp).
Identifiers and literals have an additional parameter of the object, the text or value; pass the value of the correct type here.
String literals shall not contain an end-of-line (non-escaped \n). You will need the 'start conditions' feature of Flex to implement escape sequences. The effective value (with quotes removed and escapes replaced) shall be passed to make_STRLIT as std::string (cecko::CIName is just an alias).
Character literals shall not contain an end-of-line (non-escaped \n), and in Cecko they always represent exactly one char. Remember, character literals have type int in C.
Multiline comments are enclosed in /* and */ (not inside a string or one-line comment). Comments can contain any sequence of any characters. Multi-line comments can be nested - starts (/*) and ends (*/) have to be paired correctly. You will certainly need 'start conditions' to handle this.
One-line comments start with // (not inside a string or multi-line comment) and end with end of the line. Comments can contain any sequence of any characters except the end-of-line. You will certainly need 'start conditions' to handle this.
Call ctx->incline() whenever you meet end-of-line - this increments ctx->line(). The first line has number 1 (ctx->line() is already initialized to 1). (You must not use %option yylineno to count lines.)
Detection of and recovery from lexical errors, like end-of-line or end-of-file in the middle of a string (report EOLINSTRCHR or EOFINSTRCHR and terminate the string as if the terminating quote was present), end-of-file in comment (EOFINCMT).
Detection of malformed lexical elements, like 12xy or 0xcxy (it is not an INTLIT followed by an IDF). You shall report an error (BADINT), consume the whole malformed lexical element and produce one INTLIT token (with the value of 12) in this case.
Detection of too large integer literals (larger than MAX_INT=0x7FFFFFFF; Cecko behavior is slightly different from the standard here). You shall report an error (INTOUTRANGE) and produce the INTLIT token with a value of MAX_INT. (Beware: the atoi function has undefined behavior for integers greater than MAX_INT; you will need strtol, stoi, or a hand-made implementation.)
Detection of too large hexadecimal escape sequences in char and string literals (larger than 0xFF). You shall report an error (BADESCAPE) and produce the INTLIT or STRLIT with \xFF substituted for the overflowed escape sequence.
If, after a backslash, you encounter a character other than specified in simple-escape-sequence, report the error (BADESCAPE), ignore the backslash and use the character after the backslash.
Use code like ctx->message(cecko::errors::EOFINCMT, ctx->line()) to signal an error. All error codes are declared in fmwk/ckcontext.hpp, the message texts are in fmwk/ckcontext.cpp. The err_def_s messages have an additional argument of type std::string_view
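
The rules above for malformed and too large integer literals can be sketched as a standalone helper. This is only an illustration under stated assumptions: IntLitResult and convert_intlit are hypothetical names that are not part of the Cecko framework, only decimal and 0x-prefixed hexadecimal forms are handled, and the matched text is assumed to contain at least one digit. In calexer.lex, the flags would be turned into ctx->message(...) calls and the value passed to the INTLIT token.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical result type for this sketch. A real calexer.lex action would
// instead call ctx->message(...) and pass the value to the INTLIT token.
struct IntLitResult {
    int value;          // value to attach to the INTLIT token
    bool bad_int;       // true: report BADINT
    bool out_of_range;  // true: report INTOUTRANGE
};

constexpr std::int64_t kMaxInt = 0x7FFFFFFF;  // MAX_INT from the assignment text

// Converts the whole matched text (e.g. "12xy", "0xcxy", "2147483648").
// Assumption: only decimal and 0x-prefixed hexadecimal forms are handled.
inline IntLitResult convert_intlit(std::string_view text) {
    assert(!text.empty());
    IntLitResult r{0, false, false};
    int base = 10;
    std::size_t i = 0;
    if (text.size() >= 2 && text[0] == '0' && (text[1] == 'x' || text[1] == 'X')) {
        base = 16;
        i = 2;
    }
    std::int64_t acc = 0;
    for (; i < text.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(text[i]);
        int digit;
        if (std::isdigit(c)) {
            digit = c - '0';
        } else if (base == 16 && std::isxdigit(c)) {
            digit = std::tolower(c) - 'a' + 10;
        } else {
            r.bad_int = true;  // trailing garbage such as "xy": stop converting
            break;
        }
        if (!r.out_of_range) {
            acc = acc * base + digit;
            if (acc > kMaxInt) {  // clamp once and ignore the remaining digits
                r.out_of_range = true;
                acc = kMaxInt;
            }
        }
    }
    r.value = static_cast<int>(acc);
    return r;
}
```

Accumulating in a 64-bit integer keeps the overflow check simple and avoids the undefined behavior of atoi; a solution based on strtol or stoi would be equally valid.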

Example input (*.c) and outputs (*.gold) can be found in the tests directory.
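
The escape-sequence rules from the list above (unknown escapes and hexadecimal escapes larger than 0xFF) can be illustrated by a small decoder that works on an already collected literal body. This is a sketch, not the framework's method: Decoded, hex_value and decode_escapes are hypothetical names, the set of simple escapes is assumed to be the one from standard C (check doc/selected-grammar.md for the exact set), and treating \x without digits as an unknown escape is an assumption. A Flex implementation would typically do the same work incrementally inside a start condition.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical result type for this sketch: the decoded text plus how many
// times a real lexer action would have reported BADESCAPE.
struct Decoded {
    std::string value;
    int bad_escapes = 0;
};

// Numeric value of one hexadecimal digit.
inline unsigned hex_value(char c) {
    const unsigned char u = static_cast<unsigned char>(c);
    assert(std::isxdigit(u));
    return static_cast<unsigned>(std::isdigit(u) ? u - '0' : std::tolower(u) - 'a' + 10);
}

// Decodes the body of a string literal (quotes already removed).
inline Decoded decode_escapes(std::string_view body) {
    Decoded out;
    for (std::size_t i = 0; i < body.size(); ++i) {
        if (body[i] != '\\' || i + 1 == body.size()) {
            out.value += body[i];  // ordinary character
            continue;
        }
        const char e = body[++i];  // character after the backslash
        switch (e) {
        case 'n': out.value += '\n'; break;
        case 't': out.value += '\t'; break;
        case 'r': out.value += '\r'; break;
        case 'a': out.value += '\a'; break;
        case 'b': out.value += '\b'; break;
        case 'f': out.value += '\f'; break;
        case 'v': out.value += '\v'; break;
        case '\\': case '\'': case '"': case '?': out.value += e; break;
        case 'x': {
            unsigned v = 0;
            std::size_t digits = 0;
            bool overflow = false;
            // A hexadecimal escape consumes all following hex digits.
            while (i + 1 < body.size() &&
                   std::isxdigit(static_cast<unsigned char>(body[i + 1]))) {
                v = v * 16 + hex_value(body[++i]);
                if (v > 0xFF) { overflow = true; v = 0xFF; }  // substitute \xFF
                ++digits;
            }
            if (digits == 0) {  // assumption: handled like an unknown escape
                ++out.bad_escapes;
                out.value += 'x';
            } else {
                if (overflow) ++out.bad_escapes;
                out.value += static_cast<char>(v);
            }
            break;
        }
        default:  // unknown escape: drop the backslash, keep the character
            ++out.bad_escapes;
            out.value += e;
            break;
        }
    }
    return out;
}
```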

(2) Parsing

Rewrite the grammar in doc/selected-grammar.md to Bison rules in caparser.y.
The declarative part of the caparser.y file is already prepared and you should not change it.
Remove all shift/reduce and reduce/reduce conflicts. Warning: bison does not report conflicts to stdout. CMake projects instruct bison to report the conflicts in <build-folder>/stud-sol/caparser.y.output, but you have to examine this file manually.
Extend calexer.lex with TYPEIDF token detection. The lexical element identifier corresponds to two tokens, IDF and TYPEIDF, depending on the content of the compiler tables. Use if(ctx->is_typedef(yytext)) to detect a type identifier. Specifically for this assignment, the identifier FILE is predefined as a type identifier.
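
The IDF/TYPEIDF decision described above can be sketched outside the framework as follows. MockContext, TokenKind and classify_identifier are hypothetical stand-ins introduced only for this illustration; in calexer.lex the real ctx->is_typedef(yytext) call would decide which of the token-creating functions to use.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <unordered_set>

enum class TokenKind { IDF, TYPEIDF };

// Hypothetical stand-in for the compiler tables behind ctx->is_typedef(...).
// FILE is present from the start, mirroring the predefined type identifier.
struct MockContext {
    std::unordered_set<std::string> typedefs{"FILE"};
    bool is_typedef(std::string_view name) const {
        return typedefs.count(std::string(name)) != 0;
    }
};

// The same identifier text yields a different token depending on the tables.
inline TokenKind classify_identifier(const MockContext& ctx, std::string_view text) {
    assert(!text.empty());
    return ctx.is_typedef(text) ? TokenKind::TYPEIDF : TokenKind::IDF;
}
```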

(3) Declarations

Implement semantic analysis for declarations.

100 points:

void, _Bool, char, int, TYPEIDF, const
pointer and function declarators
variable and function declarations and definitions
return statement containing an INTLIT

+10 points:

typedef definitions

+10 points:

array declarators (with an INTLIT in brackets)

+20 points:

enum declarations and definitions

+20 points:

struct declarations and definitions

You will not be awarded any bonus points if your solution receives less than 50 points from the 100-point base part.

(4) Expressions

Implement semantic analysis for expressions.

100 points:

Integer, character, and string literals
Implicit conversions:

Array to pointer (string literals are of type char[N])
_Bool/char to int, int to char

Function calls (including variadic functions like printf)

Arguments and return values of types char/int/pointer

int operators:

Unary +, -
Binary +, -, *, /, %

Pointer operators:

Unary *, &

Assignment operator (into char/int/pointer L-values)
Return statement

+10 points:

sizeof operator

+10 points:

int ++, --, +=, -=, *=, /=, %= operators
pointer ++, --, +=, -= operators

+20 points:

[] operator
Pointer+int, int+pointer, pointer-int operators

+10 points:

.,-> operators
Assignment, function arguments and return values of type struct

You will not be awarded any bonus points if your solution receives less than 50 points from the 100-point base part.
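
The result types of the additive operators listed in this assignment follow the usual C rules and can be sketched as below. Ty, promote, add_type and sub_type are hypothetical names for this illustration and do not correspond to the Cecko type API; pointer-pointer is deliberately not modelled because it is listed under assignment 5.

```cpp
// Hypothetical, simplified type categories for this sketch.
enum class Ty { Bool, Char, Int, Pointer, Error };

// _Bool and char operands are implicitly converted to int.
constexpr Ty promote(Ty t) {
    return (t == Ty::Bool || t == Ty::Char) ? Ty::Int : t;
}

// Result type of lhs + rhs.
constexpr Ty add_type(Ty lhs, Ty rhs) {
    const Ty l = promote(lhs);
    const Ty r = promote(rhs);
    if (l == Ty::Int && r == Ty::Int) return Ty::Int;
    if (l == Ty::Pointer && r == Ty::Int) return Ty::Pointer;  // pointer+int
    if (l == Ty::Int && r == Ty::Pointer) return Ty::Pointer;  // int+pointer
    return Ty::Error;                                          // e.g. pointer+pointer
}

// Result type of lhs - rhs (pointer-pointer is not modelled here).
constexpr Ty sub_type(Ty lhs, Ty rhs) {
    const Ty l = promote(lhs);
    const Ty r = promote(rhs);
    if (l == Ty::Int && r == Ty::Int) return Ty::Int;
    if (l == Ty::Pointer && r == Ty::Int) return Ty::Pointer;  // pointer-int
    return Ty::Error;                                          // e.g. int-pointer
}
```

Note the asymmetry: int+pointer is valid, while int-pointer is not.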

(5) Control-flow statements

Implement semantic analysis for control-flow statements and Boolean expressions.

100 points:

Implicit conversions:

char/int/pointer to _Bool

_Bool operations:

Unary !
Arguments and return values of type _Bool
Assignment operator into _Bool

int operators:

==,!=,<,>,<=,>=

if and if-else statements
while and do-while statements

+10 points:

for statement

+10 points:

pointer operators:

==,!=,<,>,<=,>=
pointer-pointer

+30 points:

&&, || operators with shortcut evaluation

+6 points:

Complex mixture of the elements above

You will not be awarded any bonus points if your solution receives less than 50 points from the 100-point base part.
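
Shortcut evaluation of && and || is commonly implemented by translating a Boolean expression into conditional jumps, so that the right operand is reached only when the left operand does not already decide the result. The sketch below shows this textbook technique on a hypothetical mini-AST that emits textual pseudo-instructions; Expr, Kind, emit_cond and the "br" notation are assumptions made for the illustration, not the Cecko or LLVM interface.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Kind { Var, And, Or };

// Hypothetical mini-AST for Boolean expressions.
struct Expr {
    Kind kind;
    std::string name;           // used by Kind::Var only
    const Expr* lhs = nullptr;  // used by And/Or
    const Expr* rhs = nullptr;  // used by And/Or
};

// Emits pseudo-instructions that jump to on_true or on_false.
inline void emit_cond(const Expr& e, const std::string& on_true,
                      const std::string& on_false,
                      std::vector<std::string>& out, int& next_label) {
    if (e.kind == Kind::Var) {
        out.push_back("br " + e.name + ", " + on_true + ", " + on_false);
        return;
    }
    assert(e.lhs && e.rhs);
    // Fresh label for the code that evaluates the right operand.
    const std::string rhs_label = "L" + std::to_string(next_label++);
    if (e.kind == Kind::And) {
        // lhs false: the whole expression is false, rhs is skipped.
        emit_cond(*e.lhs, rhs_label, on_false, out, next_label);
    } else {
        // lhs true: the whole expression is true, rhs is skipped.
        emit_cond(*e.lhs, on_true, rhs_label, out, next_label);
    }
    out.push_back(rhs_label + ":");
    emit_cond(*e.rhs, on_true, on_false, out, next_label);
}
```

For a && b with targets T and F, the sketch emits a branch on a to L0 or F, the label L0, and a branch on b to T or F.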

Date	Topic	Slides
2.10.2024	Compiler - an introduction	01-comp.pptx
9.10.2024	Lexical analysis	02-la.pptx
16.10.2024	Syntax analysis - an introduction	03-sxa.pptx
23.10.2024	Syntax analysis - top-down parsing	03-sxa.pptx
30.10.2024	Syntax analysis - bottom-up parsing	03-sxa.pptx
6.11.2024	Semantic analysis	05-saatr.pptx
13.11.2024	Intermediate code	04-ic.pptx
20.11.2024	Intermediate code generation	06-genic.pptx
27.11.2024	High-level optimizations	07-opt.pptx
4.12.2024	CPU architectures	08-cpu.pptx
11.12.2024	Code generation	08-cpu.pptx
18.12.2024	Runtime	09-rt.pptx
8.1.2025	Interpreted languages	10-interp.pptx

x64-Debug (Visual Studio CMake configuration)	x64-Release (Visual Studio CMake configuration)	CMAKE_BUILD_TYPE	Download
(default)	(incompatible)	Debug	llvm-17.0.1-install-x86_64-msvc-19.36-debug.zip (1.2 GB)
(incompatible)	(default)	RelWithDebInfo	llvm-17.0.1-install-x86_64-msvc-19.36-relwithdebinfo.zip (651 MB)
(incompatible)	(works)	Release	llvm-17.0.1-install-x86_64-msvc-19.36-release.zip (332 MB)
