Data Science

NDBI048 - WS 2021

The updated study plans covering the course "Data Science" are already available on the faculty web pages.

General information:

  • Annotation (source: SIS)
  • Time and place of the lectures: Thursday 12:20 S5
  • Time and place of the practicals: Thursday 14:00 SW1
  • Lecturers: Irena Holubová and representatives of the company Profinit (see below)
  • Details:
    • Basic knowledge of the following subjects is assumed (passing the exams is not required):
      • Database systems (NDBI025)
      • Probability and Statistics I (NMAI059)
      together with basic knowledge of programming in Python.
    • The lectures will be given in Czech, the slides will be in English.
    • Consultations (in English) will take place upon request.
    • In the practicals we will use the following tools:
    • If the situation requires it, the lectures and practicals will be held online using the Zoom platform.
      • Meeting ID: 958 1593 5208
      • In this case, the lectures and practicals will be recorded. The recordings will be available here.
      • The login for access is 'student'; the password (the same as for the Zoom meeting) was sent in a bulk e-mail. If you did not receive it, send me an e-mail.
    • Credits:
      • During the practicals, the currently discussed topic from the lecture will be demonstrated using a "school" dataset.
      • At the beginning of the semester, each student chooses their own dataset (either from the Datasets Overview/Datasets Instructions or one of their own, subject to approval). The student then gradually applies the tasks specified in the practicals (which correspond to the topics discussed in the lectures) to this dataset.
      • In the middle and at the end of the semester, each student submits a report describing the results.
        • First report (what we know about the data) - continuous quality evaluation: -5 to +10 points (non-submission: -5 points)
        • Second report (what we found out from the data) - final quality evaluation: 0 to 40 points
      • Minimum for obtaining credits: 35 points
      • Overview of Points
    • Exam:
      • It will take the form of a written test (approximately 10 theoretical and practical questions covering the whole semester), from which it is possible to obtain up to 50 points.
      • Points from the practicals beyond the limit will be added to the points obtained from the exam test.
      • Conversion of points to a mark: > 25 points = mark 3, > 30 points = mark 2, > 40 points = mark 1
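
The point arithmetic above can be illustrated with a small sketch. It rests on two assumptions not stated explicitly in the rules: that "the limit" means the 35-point minimum for credits, and that a total of 25 points or fewer means a failed exam (returned here as mark 4). The function name final_mark is hypothetical.

```python
def final_mark(exam_points: float, practical_points: float,
               credit_minimum: float = 35) -> int:
    """Return the mark: 1 is best, 3 is the lowest mark listed in the rules."""
    # Assumption: only practical points above the credit minimum carry over.
    bonus = max(0, practical_points - credit_minimum)
    total = exam_points + bonus
    # The thresholds are strict ("> 40"), as written in the conversion rule.
    if total > 40:
        return 1
    if total > 30:
        return 2
    if total > 25:
        return 3
    return 4  # assumption: 25 points or fewer is a fail


# 28 exam points plus 6 surplus practical points give 34 points in total.
print(final_mark(exam_points=28, practical_points=41))  # prints 2
```

Under these assumptions, a student with exactly 35 points from the practicals gets no bonus, so the mark depends on the exam test alone.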

Organisation of Lectures and Practicals:

  • 30.9. 2021 - Course organization, credits/exam requirements. What is data science? Typical use cases, an overview of related methods and technologies. Map of follow-up lectures. Introduction of the sample data sets.
  • 7.10. 2021 - Technologies for data science I: Overview and comparison of technologies.
  • 13.10. 2021 - choice of your own dataset (send me your choice and a possible alternative by e-mail; for the current status, see Overview of Points)

  • 14.10. 2021 - Phases of a data science project, CRISP-DM methodology. Business understanding, data understanding.
  • 21.10. 2021 - Methods of data exploration and visualization
  • 28.10. 2021 - lecture and practicals are cancelled (national holiday)

  • 4.11. 2021 - Creation of an understandable report
    • Lecturer: Dominik Matula
    • Resources: Reporting
    • Practicals: Creation of the first report (interpretation of results from previous practicals, questions)
  • 10.11. 2021 - submission of the first report (by e-mail)

  • 11.11. 2021 - Data preparation (cleaning, transformation, feature extraction, ...)
  • 18.11. 2021 - Modeling I: Basic statistical models and performance evaluation
  • 25.11. 2021 - Modeling II: Applied Bayesianism
    • Lecturer: Petr Paščenko
    • Resources: Modeling II
    • Practicals: Text classification using the Bayesian model - example and own extension
      • Lecturer of the practicals: Petr Paščenko
      • Resources: Naive Bayes
  • 2.12. 2021 - Technologies for data science II: MLOps versioning, documentation
    • Lecturer: Sergej Stamenov
    • Resources: ...
    • Practicals: MLflow versioning
      • Lecturer of the practicals: Sergej Stamenov
      • Resources: ...
  • 9.12. 2021 - Introduction to modern database systems
    • Lecturer: Irena Holubová
    • Resources: ...
    • Practicals: Linking data in another format to your own dataset, extending it with new information using modern database systems
      • Lecturer of the practicals: Irena Holubová
      • Resources: ...
  • 16.12. 2021 - Big Data science, MapReduce/Apache Spark and Data Science
    • Lecturer: Irena Holubová
    • Resources: ...
    • Practicals: Application of MapReduce to your own dataset
      • Lecturer of the practicals: Irena Holubová
      • Resources: ...
  • 6.1. 2022 - Limits of statistical methods, distortion. Managerial view of a data science project.
    • Lecturer: Petr Paščenko
    • Resources: ...
    • Practicals: cancelled (time reserved for finalizing the report)
  • 13.1. 2022 - submission of the second report (by e-mail)