Advanced Practical Course Data Science (Winter 2021/2022)

Note: The primary platform for communication in this course will be StudIP. All materials will be uploaded there.


Details

Workload/ECTS Credits: 180h, 6 ECTS
Module: M.Inf.1800 Fortgeschrittenen Praktikum Computernetzwerke
Lecturer: Prof. Xiaoming Fu; MSc. Fabian Wölk
Teaching assistant: TBD
Time: Friday 16:00 - 18:00
Place: 2.101 (online)
UniVZ [1]


Course Organization

In this course, you will complete several practical tasks in the realm of data analysis. These tasks can include both exploratory (descriptive) data analysis as well as the application of machine learning algorithms to specific datasets.

Although the course is strongly practical, lectures in its early stages will support students by covering different aspects of practical machine learning, including:

  • Introduction to the practical machine learning pipeline
  • Exploratory data analysis
  • The Python Data Science stack
  • How to deal with unbalanced data
  • Advanced algorithms for Data Science (an overview of competition-winning algorithms)
  • Parameter tuning for predictive models
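
As a small illustration of the "unbalanced data" topic listed above, the following is a minimal sketch of one common heuristic: weighting each class inversely to its frequency. It is not taken from the course materials; the function name `balanced_class_weights` and the toy labels are hypothetical and only serve the example.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {label: weight} with weight = n_samples / (n_classes * count).

    Rare classes receive larger weights, so errors on them count more
    when the weights are passed to a learner that supports them.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Hypothetical toy labels (assumption): 8 negatives and 2 positives.
labels = [0] * 8 + [1] * 2
weights = balanced_class_weights(labels)
print(weights)  # {0: 0.625, 1: 2.5}
```

Such weights are one option among several; resampling the data or choosing evaluation metrics that are robust to class imbalance are alternatives.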

Students must submit their solutions to the tasks by specific deadlines throughout the course, so the course requires continuous effort over the whole semester. The solution for each task has to be presented in class. A final report must be submitted at the end of the semester (September 30).

Prerequisites

  • You are strongly advised to have completed a course on Data Science (e.g., "Data Science and Big Data Analytics" taught by Dr. Steffen Herbold or the course "Machine Learning" by Stanford University) before entering this course. You need to be familiar with basic statistics (distributions, p/t/z-tests, etc.), a range of machine learning algorithms (linear/logistic/lasso regression, k-means clustering, k-NN classification, etc.), computer networking, and mobile communications.
  • Knowledge of any of the following languages: Python (course language), R, Java, MATLAB, or any other language with proper machine learning libraries

Schedule

When? What?
15.04.2021 Lecture 1: Introduction & The Data Science Pipeline
22.04.2021 No lecture (Girls Day)
29.04.2021 Lecture 2: The Python Data Science Stack - Task 1: Release
06.05.2021 Task 1: Intermediate meeting
13.05.2021 No lecture (Ascension Day)
20.05.2021 Lecture 3: Advanced Algorithms for Data Science // Task 1 report submission // Task 2: release
27.05.2021 Lecture 4: Evaluation and Tuning of Models
03.06.2021 No lecture
10.06.2021 No lecture
17.06.2021 No lecture
24.06.2021 Task 3: release // Task 2 report submission
01.07.2021 No lecture
08.07.2021 Task 3: Intermediate meeting
15.07.2021 Final Presentation (TBD)
22.07.2021 Final Presentation (TBD)
30.09.2021 Final report deadline (including report and code)


Announcement

05/12/2021: There will be no lecture today. Task 1 will be released before 5 pm.

Because of the ongoing Covid-19 situation, new information will be posted here as it becomes available. Please check this page regularly for updates.

General Description

The Computer Networks Group at the Institute of Computer Science, Universität Göttingen, is setting up this course in collaboration with Göttinger Verkehrsbetriebe GmbH (represented by Dipl. Anne-Katrin Engelmann).

Grading

  • Participation: 50%
    • Task 1: 20%
    • Task 2: 30%
  • Presentation: 20%
    • Present your work to the audience with slides (in English).
    • 20 minutes of presentation followed by 10 minutes of Q&A for a single student.
    • 30 minutes of presentation followed by 15 minutes of Q&A for a team of two students.

Suggestions for preparing the slides: Help your audience quickly understand the general idea. Figures, tables, and animations work better than sentences. Include a summary of your ideas and contributions. All quoted images, tables, and text must indicate their source. Note: A team must clearly explain how the work was divided, and both team members must present their respective parts and answer questions.

  • Final report: 30%

The report must be written in English and follow common guidelines for scientific papers. It should contain 6-8 pages of content for a single student and 12-16 pages for a team (excluding the bibliography, etc.), in double-column LaTeX. You must not copy content directly from papers or webpages; this is plagiarism and will be treated seriously. All quoted images and tables must indicate their source. The source code, the data (or a URL to the data), and a manual should be uploaded with the report.

Schedule

Time Topic Output
w1 Lecture I: No
w2 Lecture II:
w3-4 No
w5-8 Task 1:
w8 (9th June) Discussion on Task 1 No
w9-13 Task 2 Report
17.08 Final presentations
24.08 Final report