covid19 Course info for the distance learning

About

This is an informational web page for the Compiler Principles course (NSWI098), which is being taught at the Faculty of Mathematics and Physics, Charles University. The lectures are being held in S3, from 15:40 to 17:10 every Tuesday. The seminars are in S3 after the lecture (from 17:20).

You can find here:

Contact

The course is currently led by Jakub Yaghob and David Bednárek. Please direct inqueries of a specific topic (or home assignment) to the teacher who presented the topic.

Latest Updates

25.9.2020 Academic year 2020/2021 preparation.

Grading

The grading criteria for this course are summarized in the following. Please, read them carefully. In case any rules are not clear, contact Dr. Jakub Yaghob.

Home Assignments

There will be 5 home assignments which are designed to incrementally build a compiler for simplified C (called Cecko).

You may receive up to 100 points for assignments #1-#2 and more than 150 points for assignments #3-#5 (100 base points and more then 50 bonus points). You only get bonus points for assignments #3-#5 if you earn at least 50 base points in the same assignment. The sum of your points will determine your final mark as follows:

pointsmark
600 or more1 (excellent)
450-5992 (well done)
350-4493 (OK)
349 or lessfailed

Furthermore, 350 points are required to receive a credit for the seminar (which is also required). If you are not satisfied with your mark, but you have at least 350 points (and thus the credit), you may request a written examination where your mark will be determined. Once subscribing for the exam, your mark received from the assignments is no longer valid — i.e., you may fail the exam even if you have enough points from the assignments.

Each assignment has a strict deadline. Once the deadline is reached, all assignments are checked and graded. You may submit your solution after the deadline and ask for (re)evaluation, but you will receive penalty 10 points per day for late submission, which is additionally subtracted from points obtained for the assignment. This penalty is applied as the last points adjustment, i.e. after the decision on the allocation of bonus points.

Lectures

The domain of compilers is relatively steady in the computer science, at least for the compiler front-end. If you have any questions or suggestions related to the lectures, please contact the teachers.

The presented schedule is only informational. Teaching time for a matter may vary slightly depending on the student's grasp.

29.9.2020 Compiler - an introduction PPTX File Icon 01-comp.pptx
6.10.2020 Lexical analysis PPTX File Icon 02-la.pptx
13.10.2020 Syntax analysis - an introduction PPTX File Icon 03-sxa.pptx
20.10.2020 Syntax analysis - top-down parsing PPTX File Icon 03-sxa.pptx
27.10.2020 Syntax analysis - bottom-up parsing PPTX File Icon 03-sxa.pptx
3.11.2020 Semantic analysis PPTX File Icon 05-saatr.pptx
10.11.2020 Intermediate code PPTX File Icon 04-ic.pptx
24.11.2020 Intermediate code generation PPTX File Icon 06-genic.pptx
1.12.2020 High-level optimizations PPTX File Icon 07-opt.pptx
8.12.2020 CPU architectures PPTX File Icon 08-cpu.pptx
15.12.2020 Code generation PPTX File Icon 08-cpu.pptx
22.12.2020 Runtime PPTX File Icon 09-rt.pptx
5.1.2021 Interpreted languages PPTX File Icon 10-interp.pptx

Assignments

Motto: Kde to žije, tam je Mlaskal. (Kate) (not valid anymore, left here as a reminder)

Global assignment

The Cecko (simplified C) language is defined by a set of grammar rules. You may find them in the file doc/selected-grammar.md in the skeleton repository. The grammar is exactly in the same format as in the C programming language specification (working draft) Annex A, it is just simplified as Cecko is a subset of C.

Please note some further remarks:

Developing and testing

Read the Using your Cecko repository section before cloning the skeleton. The framework and skeleton files may be updated for various reasons, therefore, you will need a method to merge the updates into your project.

The skeleton framework contains the following folders:

You will need to prepare your building environment as described in Prerequisites.

Refer to Building and running Cecko for instructions on building and running the code.

Assignment 1

For the first assignment, you will submit the following files:

You may use the C++ header/source files for C++ routines called from calexer.lex - in this case, add #include "casem.hpp" to calexer.lex.

Assignments 2 to 5

For the last four assignments, you will submit the following files:

Test files

The Cecko source files have the usual suffix .c, the corresponding expected outputs are in the .gold files.

The test files whose names end with "n" contain intentional semantic errors and are expected to fail during compilation.

Submitting and evaluation

You shall submit your solution into Recodex.

Note: For registering in Recodex, you need a working email address registered in CAS.

Recodex will compile it using GCC 10.2. Your solution should not use any 3rd party libraries or code (except those specified in the section Building and running Cecko). Only standard C++17 libraries should be used.

Recodex will compile your solution with the same library and will apply the same tests as in the framework. (However, Recodex will use a different build system.)

Compilation and tests in Recodex takes some time - please be patient.

Recodex will insist on strict equivalence of your solution outputs to the gold files. However, we will manually inspect results of all tests which can compile and run but fail in the Recodex judge phase. Therefore, you may be assigned some points even if Recodex assigns zero.

Each home assignment has a maximal achievable points (100 points for assignments #1-#2, 150+ points for assignments #3-#5). Any difference (including your debug messages) between your compiler output and a provided correct output is penalized. The penalization depends on the severity of the problem.

(1) Lexical analysis

  • Extend the calexer.lex file so that the resulting scanner is capable of recognizing all the lexical elements described in doc/selected-grammar.md and returns the objects which represent them at the interface between the lexer and the parser.
  • All objects representing tokens are created using a call to a function like cecko::parser::make_WHILE(...) whose declarations are generated in <build-folder>/fmwk/ckdumper.hpp by the build system for cecko1. All these functions have a parameter location_type l, pass ctx->line() here.
  • The list of token names (and additional parameter types) may be found in ckdumper.y (it has to be the same as in caparser.y which is not used when building cecko1)
  • Some tokens (like >,<,>,<=,>= or +,-) are grouped together. The object representing the token is shared across each group (it will become the same terminal in the grammar); individual tokens are distinguished using an additional parameter of the object (an enumeration type associated to the group, like cecko::gt_cmpo::LT - see ckgrptokens.hpp).
  • Identifiers and literals have an additional parameter of the object - the text or value - pass the value of the correct type here
  • String literals shall not contain end-of-line (non-escaped \n). You will need 'start conditions' feature of FLEX to implement escape sequences. The effective value (with quotes removed and escapes replaced) shall be passed to make_STRLIT as std::string (cecko::CIName is just an alias)
  • Character literals shall not contain end-of-line (non-escaped \n) and they represent always only one char in Cecko. Remember, characters have type int in C.
  • Multiline comments are enclosed in /* and */ (not inside a string or one-line comment). Comments can contain any sequence of all characters. Multi-line comments can be nested - starts (/*) and ends (*/) have to be paired correctly. You will certainly need 'start conditions' to handle this.
  • One-line comments start with // (not inside a string or multi-line comment) and end with end of the line. Comments can contain any sequence of all characters. You will certainly need 'start conditions' to handle this.
  • Call ctx->incline() whenever you meet end-of-line - this increments ctx->line(). The first line has number 1 (ctx->line() is already initialized to 1). (You must not use %option yylineno to count lines.)
  • Detection and recovery of lexical errors, like end-of-line in the middle of a string, end-of-file in comment, incorrectly paired /* */, too large integers (store only lowest 32 bits - note that behaviour of atoi function is undefined for integers greater than MAX_INT), ...
  • Detection of malformed lexical elements, like 12xy, which is not INTLIT and IDF, but incorrectly written INTLIT (value 12), because there is no space between 12 and xy. You should read the whole malformed lexical element, but the returned value is a convertible value for the given lexical element, in our case it is 12. If an overflow of the convertible value occurs, report the malformed input error first, then overflow error.
  • Use code like ctx->message(cecko::errors::EOFINCMT, ctx->line()) to signal an error. All error codes are declared in fmwk/ckcontext.hpp, the message texts are in fmwk/ckcontext.cpp. The err_def_s messages have an additional argument of type std::string_view

Example input (*.c) and outputs (*.gold) can be found in the tests directory.

(2) Parsing

  • Rewrite the grammar in doc/selected-grammar.md to Bison rules in caparser.y.
  • The declarative part of the caparser.y file is already prepared and you should not change it.
  • Remove all shift/reduce and reduce/reduce conflicts. Warning: bison does not report conflicts to stdout. CMake projects instruct bison to report the conflicts in <build_folder>/stud-sol/caparser.y.output, but you have to examine this file manually.
  • Extend calexer.lex with TYPEIDF token detection. The lexical element identifier corresponds to two tokens IDF and TYPEIDF depending on the content of compiler tables. Use if(ctx->is_typedef(yytext)) for detecting type identifier using compiler tables. Specifically for this assignment, we have predefined FILE identifier as a type identifier.

(3) Declarations

Implement semantic analysis for declarations.

  • 100 points:
    • void, _Bool, char, int, TYPEIDF, const
    • pointer and function declarators
    • variable and function declarations and definitions
    • return statement containing an INTLIT
  • +10 points:
    • typedef definitions
  • +10 points:
    • array declarators (with an INTLIT in brackets)
  • +20 points:
    • enum declarations and definitions
  • +20 points:
    • struct declarations and definitions

You will not be awarded any bonus points if your solution receives less than 50 points from the 100-point base part.

(4) Expressions

Implement semantic analysis for expressions.

  • 100 points:
    • Integer, character, and string literals
    • Implicit conversions:
      • Array to pointer (string literals are of type char[N])
      • _Bool/char to int, int to char
      • char/int/pointer to _Bool
    • Function calls (including variadic functions like printf)
      • Arguments and return values of types _Bool/char/int/pointer
    • int operators:
      • Unary +, -
      • Binary +, -, *, /, %
    • Pointer operators:
      • Unary *, &
    • Assignment operator (into _Bool/char/int/pointer L-values)
    • Return statement
  • +10 points:
    • sizeof operator
  • +10 points:
    • ++, --, +=, -=, *=, /=, %= operators
  • +20 points:
    • [] operator
    • Pointer+int, int+pointer, pointer-int, pointer-pointer operators
  • +10 points:
    • .,-> operators

You will not be awarded any bonus points if your solution receives less than 50 points from the 100-point base part.

Prerequisites

Platform

Recodex uses Linux and GCC to compile and run your submissions. It may trigger errors which escaped detection in other compile/run environments. For such cases, it is advantageous to have a GCC-based build environment prepared beforehand.

On the other hand, the Cecko sources are intended to be multi-platform and the skeleton is regularly tested with both GCC and Microsoft Visual C++. Therefore, development using MSVC is possible (and encouraged).

Windows Subsystem for Linux

For developers equipped with Windows 10, Windows Subsystem for Linux (WSL) is recommended as the best approach to multi-platform development. Compiling, running and debugging in the WSL-GCC environment is integrated in Visual Studio 2019 (as well as VS Code) and fully supported by Cecko projects.

Note: Due to the small size of Cecko, there is no significant build-performance difference neither between WSL 1 and WSL 2 nor between Windows and Linux filesystems. However, if you plan to build LLVM on your own (instead of downloading binaries), placing the working folder in the Linux filesystem of WSL 2 may shorten the build time by an hour. Ubuntu (currently 20.04 Focal) is the recommended distribution for WSL (and the default supplied by Microsoft).

Git

Make yourself familiar with the principles of git.

In Linux, you will probably use command-line git which is present in all modern Linux distributions.

In Windows, TortoiseGit is the recommended option.

You may also try using the Git plugins in Visual Studio 2019 or VS Code; however, be aware that some advanced commands may not be available there.

To access gitlab.mff.cuni.cz, you will first need to login (using your SIS credentials) in its web interface and register your SSH keys there (if you have not done so yet).

Build tools

cmake

CMake 3.13 or newer is required. It is packaged in sufficiently new distributions of Linux (including WSL), installed by something like:

sudo apt install cmake

For Windows, cmake (and ninja) is installed with Visual Studio 2019 when C++ CMake tools for Windows is enabled in the Visual Studio Installer. C++ CMake tools for Linux is required when Visual Studio is used to build in WSL or remote Linux.

When using VS Code, cmake (as well as a C++ compiler) must be installed as an external tool.

ninja and zlib

If you use Visual Studio support for Linux, you may need to manually install ninja build system like:

sudo apt install ninja-build

In some Linux distributions, you may encounter linker complaining about -lz. The remedy is installing libz like:

sudo apt install zlib1g-dev

bison and flex

bison 3.4 and flex 2.6.4 or newer are required. They are present in any (sufficiently new) distribution of Linux (including WSL), installed by something like:

sudo apt install bison flex

For Windows, they may be downloaded from lexxmark/winflexbison. Unpack the .zip to a suitable location and add the folder to the Path variable using the Edit the system environment variables control panel dialog. This package contains win_bison.exe and win_flex.exe - these names are recognized by cmake.

If you are unable to install bison/flex into system folders or if cmake does not find (the correct version of) bison or flex, you may specify it explicitly when invoking cmake:

cmake -DBISON_ROOT=<bison_install_folder> -DFLEX_ROOT=<flex_install_folder> ...

Note that the bison executable must be located in <bison_install_folder>/bin etc.

Within Visual Studio, you may specify the BISON_ROOT and FLEX_ROOT in the CMake Settings.

Note: Although Chocolatey contains a winflexbison3 package, it (as of July 2020) does not satisfy our version requirements.

Note: You cannot use WSL installations of bison/flex for MSVC builds (and vice versa) because cmake would not invoke them properly. Therefore you will probably end with two bison/flex installations if you do both MSVC and GCC builds, similarly to having two installations of cmake in this case.

C++17 compiler

MSVC 19.26 (part of Visual Studio 2019) and gcc 9.3.0 are verified to be sufficient for the compilation of Cecko.

LLVM

Cecko uses the intermediate code (IR) and JIT x86 code generator (MCJIT) from the LLVM framework. Although the LLVM team recommends building LLVM from sources, it may be well beyond the capability and patience of most Cecko developers. Therefore, downloading a binary version is recommended.

Note: Cecko requires only headers and static libraries; however, all binary distributions of LLVM include also many executables and dynamic libraries. Some downloads/packages include only the executables which is insufficient for Cecko. This also applies to some downloads at LLVM Pre-Built Binaries

Linux (including WSL) - using system-wide installation

The simplest way to install the required LLVM modules is from a Linux distribution channel, like:

sudo apt install llvm-10

cmake will automatically find LLVM if installed in this way.

Note: Although these libraries are compatible with Debug mode, they do not contain full debug info, for example, stepping into LLVM procedures during debugging of Cecko may be impossible.

Beware: The machines at u-pl*.ms.mff.cuni.cz are equipped with a binary distribution of LLVM 9.0.0 which is insufficient to build Cecko (the build will fail at linker phase). You must use a private LLVM copy there as described in the following section. Be careful when specifying LLVM_DIR for cmake - in case of any mistake, cmake silently reverts to the system instalation.

Linux - installing and using private LLVM copy

Pre-built LLVM binaries for Cecko are available at gitlab.mff.cuni.cz. The version for Linux gcc is located at the branch gcc-x64-Release. Clone the branch into <llvm-install-folder> using:

git clone -b gcc-x64-Release git@gitlab.mff.cuni.cz:teaching/nswi098/cecko/llvm-install.git <llvm-install-folder>

The cloning may take some time, the download size is about 200 MB.

Note: Because publishing incomplete binaries may interfere with the LLVM licence, this gitlab repository is available only for logged-on users.

When building Cecko, cmake must be instructed to find LLVM by setting the LLVM_DIR cmake variable to the value <llvm-install-folder>/lib/cmake/llvm as follows:

cmake -DLLVM_DIR=<llvm-install-folder>/lib/cmake/llvm <...other-arguments...>

Windows binaries

Beware: The Windows versions of Pre-Built Binaries published by LLVM (as of July 2020) are not complete enough for Cecko.

Pre-built LLVM binaries for Cecko are available at gitlab.mff.cuni.cz. The version for Windows MSVC is located at the branch MSVC-x64-Debug. Clone the branch into <llvm-install-folder> using:

git clone -b MSVC-x64-Debug git@gitlab.mff.cuni.cz:teaching/nswi098/cecko/llvm-install.git <llvm-install-folder>

The cloning may take some time, the download size is about 600 MB because of full debug info.

Note: Because publishing incomplete binaries may interfere with the LLVM licence, this gitlab repository is available only for logged-on users.

When building Cecko, cmake must be instructed to find LLVM by setting the LLVM_DIR cmake variable to the value <llvm-install-folder>/lib/cmake/llvm.

In Visual Studio 2019, the LLVM_DIR variable is set in the CMake variables and cache section of the CMake Settings (CMakeSettings.json); you may run Generate Cache to pre-create this variable before setting it to the correct value.

In VS Code, the LLVM_DIR variable is located in the cmake.configureSettings entry of .vscode/settings.json.

When cmake is invoked from command-line, it may look like:

cmake -DLLVM_DIR=<llvm-install-folder>/lib/cmake/llvm <...other-arguments...>	

Building LLVM (advanced)

If you decide to ignore binary distributions and build LLVM on your own (beware, it may require several hours of downloading and building), the simplest approach is cloning llvm-project into a <llvm-project-folder>:

git clone https://github.com/llvm/llvm-project.git <llvm-project-folder>	

To build and install, create a <llvm-build-folder> and a <llvm-install-folder>, then configure, build and install using:

cd <llvm-build-folder>
cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX=<llvm-install-folder> <llvm-project-folder>/llvm
CMAKE_BUILD_PARALLEL_LEVEL=8 cmake --build .
cmake --build . --target install

When building Cecko with the home-built LLVM, define the LLVM_DIR variable as <llvm-install-folder>/lib/cmake/llvm.

See also Building LLVM with CMake.

Using your Cecko repository

For each student registered for this course, a repository was created in gitlab.mff.cuni.cz/teaching/nswi098/2020.

If you cannot see your repository there, it is probably because your login was not registered in gitlab when the repository was created. Ask your teachers in this case.

Your remote repository shall remain private even after your work is finished and points awarded, otherwise you may be deemed guilty if a plagiarism case appears.

You shall clone your repository to a local folder as usual:

git clone <your-repository-url> <local-folder>

Each repository was already initialized by a copy of the skeleton. However, it will not automatically import any changes made to the skeleton after the creation.

Merging updates of the skeleton into your code

Configuration

At each local folder you use, you need to setup the upstream remote:

git remote add upstream git@gitlab.mff.cuni.cz:teaching/nswi098/cecko/skeleton.git

In TortoiseGit, the remote is created in the config accessible from the Git Sync… dialog through the Manage button. The resulting config shall look like this:

Merging

Whenever you think the skeleton repository may have been updated, execute a pull from upstream:

git pull upstream master

In TortoiseGit, the Pull command may be executed from the Git Sync… dialog after selecting the upstream remote as shown here:

If you encounter conflicts on an upstream pull, it probably means that you added your code to an unexpected place in the skeleton files. Try to resolve it carefully, if in doubts, ask your teachers.

Your work shall be pushed to/pulled from the default origin remote.

Building and running Cecko

Building

Command-line

Create a <build-folder>, by convention as <cecko-folder>/build. Then configure using these commands:

cd <build-folder>
cmake -DCMAKE_BUILD_TYPE=Debug -DLLVM_DIR=<llvm-install-folder>/lib/cmake/llvm <cecko-folder>

The -DLLVM_DIR option is usually not required if LLVM was installed as a system package.

Building is then done by invoking cmake --build . or make in <build-folder>.

The resulting executables are located in <build-folder>/stud-main and invoked like:

cd <build-folder>
<build-folder>/stud-main/cecko1 <cecko-folder>/test/test1.c

Visual Studio 2019

Open the <cecko-folder> with Visual Studio.

When opened for the first time, use PROJECT/CMake Settings to adjust the cmake configuration, the LLVM_DIR variable in particular (see Prerequisites). The default x64-Debug configuration is created automatically; to use WSL, add the WSL-GCC-Debug configuration, adjust it if necessary and switch to it.

See also the related Visual Studio Documentation.

Visual Studio Code

In Windows, install the Visual C++ compiler first. Then start an x64 Developer Command Prompt (which defines the environment where the compiler is visible), then run code from the command prompt. Then follow the documentation for the CMake Tools extension.

Running

Command-line

The resulting executables are located in <build-folder>/stud-main and invoked like:

cd <build-folder>
<build-folder>/stud-main/cecko1 <cecko-folder>/test/test1.c

Visual Studio 2019

Open DEBUG -> Debug and Launch Settings for … (launch.vs.json) and add the following settings into the x64 configuration(s) in use:

"currentDir": ".",
"args": [ "test\\test1.c" ]

For WSL configuration(s), use:

"cwd": ".",
"args": [ "test/test1.c" ]

For remote Linux configuration(s), adjust remoteCMakeListsRoot in CMakeSettings.json so that it does not use the $HOME variable, then use:

"cwd": "${cmake.remoteCMakeListsRoot}",
"args": [ "test/test1.c" ]

Note that WSL and remote Linux use the same debug config ("type": "cppgdb"); therefore, there is probably no setting that works simultaneously in both cases.

See also the related Visual Studio Documentation.

Visual Studio Code

T.B.D.

Distance learning

Course information for the distance learning

General information

The distance learning does not interrupt the course. We will use our already proven methods for distance learning.

We will use the ZOOM communication platform. Lectures will use one recurring meeting, therefore Meeting ID is the same for all lecture meetings. Each exercise will be scheduled and provided as a separate ZOOM meeting. Below is a timetable with corresponding ZOOM meeting IDs. Meeting ID is used for joining the lecture/exercise. Meeting password is placed in SIS and will be sent by an email.

Lectures and exercises will be recorded for anyone not able to attend an online lecture/exercise. All available recordings from lectures and exercises are available on the CUNI stream channel.

Lectures/exercises will take place according to the schedule in SIS. We will start with exercises 13.10.2020.

Rules for meetings

Lectures

All slides for lectures are available here.

Exercises

Exercises are used to specify precisely and deeply explain the homework assignments. You may find them here. We will use ReCodex for submitting and checking your solutions. Remember, exercises start 13.10.2020.

Timetable

DateMeeting IDMeeting linkRecorded lectureAdditional materials
29.9.2020 988 5131 0061 Join the lecture Recorded lecture
6.10.2020 988 5131 0061 Join the lecture Recorded lecture
13.10.2020 988 5131 0061 Join the lecture Recorded lecture
13.10.2020 975 9229 0201 Join the seminar Recorded seminar
20.10.2020 988 5131 0061 Join the lecture Recorded lecture
27.10.2020 988 5131 0061 Join the lecture Recorded lecture
27.10.2020 994 6377 9468 Join the seminar Recorded seminar PPTX File Icon HA2-Bison-SXA.pptx
3.11.2020 988 5131 0061 Join the lecture Recorded lecture
10.11.2020 988 5131 0061 Join the lecture Recorded lecture
10.11.2020 961 6255 3267 Join the seminar Recorded seminar PPTX File Icon nswi098-assignments.en.pptx framework doxymentation
24.11.2020 988 5131 0061 Join the lecture Recorded lecture
24.11.2020 961 6255 3267 Join the seminar Recorded seminar PPTX File Icon nswi098-assignment4.en.pptx framework doxymentation
1.12.2020 988 5131 0061 Join the lecture Recorded lecture
8.12.2020 988 5131 0061 Join the lecture Recorded lecture
8.12.2020 961 6255 3267 Join the seminar Recorded seminar
15.12.2020 988 5131 0061 Join the lecture Recorded lecture
22.12.2020 988 5131 0061 Join the lecture Recorded lecture
5.1.2021 988 5131 0061 Join the lecture Recorded lecture