About

This is an informational web page for the Compiler Principles course (NSWI098), which is being taught at the Faculty of Mathematics and Physics, Charles University. The lectures are being held in S5, from 12:20 to 13:50 every Wednesday. The seminars are in S5 after the lecture (from 14:00).

You can find here:

Contact

The course is currently led by Jakub Yaghob and David Bednárek. Please direct inqueries of a specific topic (or home assignment) to the teacher who presented the topic.

Latest Updates

18.9.2017 Academic year 2017/2018 preparation.
23.7.2016 A new web page was created.

Grading

The grading criteria for this course are summarized in the following. Please, read them carefully. In case any rules are not clear, contact Dr. Jakub Yaghob.

Home Assignments

There will be 6 home assignments which are designed to incrementally build a compiler for simplified Pascal (called Mlaskal).

You may receive up to 100 points for assignments #1-#3 and up to 150 points for assignments #4-#6. The sum of your points will determine your final mark as follows:

pointsmark
700 or more1 (excellent)
550-6992 (well done)
450-5493 (OK)
449 or lessfailed

Furthermore, 450 points are required to receive a credit for the seminar (which is also required). If you are not satisfied with your mark, but you have at least 450 points (an thus the credit), you may request a written examination where your mark will be determined. Once subscribing for the exam, your mark received from the assignments is no longer valid — i.e., you may fail the exam even if you have enough points from the assignments.

Each assignment has a strict deadline. Once the deadline is reached, all assignments are collected and graded. You may submit your solution after the deadline and ask for (re)evaluation, but you will receive penalty 10 points per day for late submission, which is additionally subtracted from points obtained for the assignment.

Lectures

The domain of compilers is relatively steady in the computer science, at least for the compiler front-end. If you have any questions or suggestions related to the lectures, please contact the teachers.

The presented schedule is only informational. Teaching time for a matter may vary slightly depending on the student's grasp.

3.10.2017 Compiler - an introduction, lexical analysis PPTX File Icon 01-comp.pptx PPTX File Icon 02-la.pptx
10.10.2017 Canceled
17.10.2017 Syntax analysis - an introduction PPTX File Icon 03-sxa.pptx
24.10.2017 Syntax analysis - top-down parsing PPTX File Icon 03-sxa.pptx
31.10.2017 Syntax analysis - bottom-up parsing PPTX File Icon 03-sxa.pptx
7.11.2017 Intermediate code PPTX File Icon 04-ic.pptx
14.11.2017 Semantic analysis PPTX File Icon 05-saatr.pptx
21.11.2017 Intermediate code generation PPTX File Icon 06-genic.pptx
28.11.2017 High-level optimizations PPTX File Icon 07-opt.pptx
5.12.2017 CPU architectures PPTX File Icon 08-cpu.pptx
12.12.2017 Code generation PPTX File Icon 08-cpu.pptx
19.12.2017 Runtime PPTX File Icon 09-rt.pptx
9.1.2018 Interpreted languages PPTX File Icon 10-interp.pptx

Assignments

Motto: Kde to žije, tam je Mlaskal. (Kate)

Global assignment

The Mlaskal (simplified Pascal) language is defined by a set of syntax diagrams:

Program JPEG File Icon mlaskal-prog.jpeg
Block JPEG File Icon mlaskal-block.jpeg
Procedure, function JPEG File Icon mlaskal-proc.jpeg
Type JPEG File Icon mlaskal-type.jpeg
Special types JPEG File Icon mlaskal-spectype.jpeg
Statement JPEG File Icon mlaskal-stmt.jpeg
Variable JPEG File Icon mlaskal-var.jpeg
Expression JPEG File Icon mlaskal-expr.jpeg
Constants JPEG File Icon mlaskal-const.jpeg
Numbers JPEG File Icon mlaskal-nums.jpeg
Identifiers JPEG File Icon mlaskal-id.jpeg

Please note some further remarks:

Links to the required software

Homepages:

Windows native ports:

Testing

Each home assignment has a set of tests. Tests are executed on different OS (Windows, Linux) with different compilers (MSVC 14, GCC 4.8/6.4/7.2) and on different environments (32- and 64-bits). Your solution should not use any 3rd party libraries or code. Only standard C++/C libraries available in all testing environments should be used.

Each home assignments has a maximal achievable points (100 points for assignments 1-3, 150 points for assignments 4-6). Any difference (including your debug messages) between your compiler output and a provided correct output is penalized. The penalization depends on a severity of a problem.

(1) Lexical analysis I

  • Look at the diagrams and determine, what is a lexical element and what is not.
  • Extend the du1l.lex file, so that the resulting scanner is capable of recognizing all correct lexical elements and returns the objects which represent them at the interface between the lexer and the parser.
  • In this task, you don't need to handle strings and comments and you don't have to handle values of identifiers and numeric literals.
  • Names of all tokens are listed in dummyg.y.
  • All objects representing tokens are created using a call to a function like yy::mlaskal_parser::make_BEGIN(...). These functions are generated from dummyg.y by bison. All these functions have a parameter - a line number - pass ctx->curline here.
  • Some tokens (like >,<,<>,<=.>= or +,-) are grouped together. The object representing the token is shared across each group (it will become the same terminal in the grammar); individual tokens are distinguished using an additional parameter of the object (an enumeration type associated to the group, like mlc::DUTOKGE_OPER_REL - see dutable.hpp).
  • Identifier and literals have an additional parameter of the object - the text or value - pass the default value of the type here, like mlc::ls_id_index()

Example input (*.mls) and outputs (*.out) can be found in the tests directory.

(2) Lexical analysis II

  • Recognize strings as lexical elements. Strings may contain two quotation marks ('') which should be returned as one quote character ' in the result. Strings can not contain end of the line (\n). You will probably need 'start conditions' feature of FLEX to complete this task.
  • Handling of comments. Comments are enclosed in { and }. Comments can contain any sequence of all characters. Comments can be nested (unlike traditional Pascal) - parentheses have to be paired correctly. You will certainly need 'start conditions' to handle this.
  • Maintain line numbers. You must not use %option yylineno. Use ctx->curline as the line counter, just ensure that it is incremented on every end-of-line and pass the value to every make_ function. The first line is number 1 (ctx->curline is already initialized to 1).
  • Store integral and real constants, strings and identifiers into the tables of the compiler. Functions like ctx->tab->ls_int().add() are used to add the values into the corresponding tables; there are four such tables for integers (ls_int), reals (ls_real), strings (ls_str), and identifiers (ls_id). The functions return a magic "index" that should be passed to the corresponding make_ functions. Identifiers must be converted to uppercase before add().
  • Detection and recovery of lexical errors, like end of line in the middle of a string, end of file in comment, incorrectly paired { }, too large integers (store only the lowest 31 bits of the number - note that behaviour of atoi function is undefined for integers greater than MAX_INT),...
  • Detection of malformed lexical elements, like 12xy, which is not UINT and IDENTIFIER, but incorrectly written UINT (value 12), because there is no space between 12 and xy. You should read the whole malformed lexical element, but the returned value is a convertible value for the given lexical element, in our case it is 12. The same situation goes for REAL. If an overflow of the convertible value occurs, report the malformed input error first, then overflow error.

(3) Parsing

  • Rewrite syntactic diagrams to Bison rules in du3g.y. Ignore the parts in green rectangles (range type, array type, and array indexing).
  • The declarative part of the du3g.y file is already prepared and you should not change it.
  • Remove all shift/reduce and reduce/reduce conflicts.

(4) Semantic analysis - declarations

  • Add C++ code fragments into du4g.y to decode declarations and store the extracted information to system tables.
  • 100 points: Declaration of procedures, functions, their parameters, variables (only simple types) and labels.
  • 150 points: All declarations, i.e., in addition, constant declarations, type declarations ( including record types ), and variables of record types .

(5) Semantic analysis - expressions and functions

    Implement semantic analysis for simple statements (see below) and generate stack oriented intermediate code from them.

  • 100 points:
    • Assignment statement (to a scalar variable or to the identifier of the current function)
    • Procedure call (with parameters passed by value)
    • Sequence of such statements
    These statements contain expressions built from these elements:
    • Name of a scalar variable (or function argument)
    • Boolean, number, or string constant
    • Function call (with parameters passed by value)
    • Unary + and - for int (instruction MINUSI) and real (instruction MINUSR)
    • Binary + for int-int (instruction ADDI), real-real (instruction ADDR), real-int, int-real (instructions CVRTIR and ADDR), and str-str (string concatenation - instruction ADDS)
    • Binary - and * for int-int, real-real, real-int, int-real (instructions SUBI, SUBR, MULI, MULR, possibly CVRTIR)
    • Binary / for int-int, real-real, real-int, int-real with real result (instruction DIVR, possibly CVRTIR)
    • Binary mod, div for int-int with int result (instructions MODI, DIVI)
  • 150 points: In addition, named constants in expressions and the operator . for accessing record fields . Assignment statements, argument passing, and return values operate on scalars only (within assignment 5).

The compiler now outputs intermediate code. This code is executed by the icm virtual machine which is available in source code as part of the du5.zip.

File suffix conventions:

  • .mo - Object - binary intermediate code, executed by icm
  • .moa - Assembly - textual intermediate code
  • .out - Compiler diagnostic output
  • .stcd - Symbol table dump in canonical order
  • .icmin - Run-time input file (read by READx procedures)
  • .icmout - Run-time output file (produced by WRITEx procedures)

(6) Semantic analysis - control-flow statements, Boolean expressions, pass by reference

Complete the semantic analysis and code generation so that the language is completely covered.

  • 100 points:
    • Passing arguments by reference (using GREF, LREF, LLDP, XLD* and XST* instructions)
    • if-then[-else], while-do, repeat-until, and for-[to|downto]-do statements (using JF, JT, JMP instructions)
    • goto and labeled statements (using JMP)
    • Unary Boolean not (using NOT)
    • Binary Boolean and/or (using AND/OR - no short-circuit evaluation)
    • Comparison operators =, <>, <, >, <=, >= over two Boolean, int, real, and string arguments or one int and one real arguments (using EQ*, NE*, LF*, GT*, LE*, GE* and CVRTIR)
  • 150 points: In addition, passing arrays (including arrays which are parts of larger arrays) in the following contexts:
    • Subprogram arguments passed by value
    • Subprogram arguments passed by reference
    • Assignment statement