Advanced Practical Course Data Science (Winter 2021/2022): Difference between revisions

(Created page with "{{Announcement|Note: The primary platform for communication in this course will be StudIP. All materials will be uploaded there.}} == Details == {{CourseDetails |credits=180...")
 
 
(20 intermediate revisions by the same user not shown)
Line 4: Line 4:
== Details ==
== Details ==
{{CourseDetails
{{CourseDetails
|credits=180h, 5-6 ECTS
|credits=180h, 6 ECTS
|module=M.Inf.1222 (Specialisation Computer Networks, 5 ECTS) or M.Inf.1129 (Social Networks and Big Data Methods, 5 ECTS) or M.Inf.1800 (Practical Course Advanced Networking, 6 ECTS)
|module=M.Inf.1800 Fortgeschrittenen Praktikum Computernetzwerke
|lecturer=[http://134.76.18.81/?q=people/prof-dr-xiaoming-fu Prof. Xiaoming Fu]
|lecturer=[http://134.76.18.81/?q=people/prof-dr-xiaoming-fu Prof. Xiaoming Fu]; [http://www.net.informatik.uni-goettingen.de/?q=people/weijun-wang MSc. Weijun Wang]
|ta= [http://www.net.informatik.uni-goettingen.de/?q=people/weijun-wang MSc. Weijun Wang]; [http://www.net.informatik.uni-goettingen.de/?q=people/fabian-wölk MSc. Fabian Wölk]
|ta=Guanxiong Luo, Weijun Wang
|time=Thurs. 14:00-16:00  
|time=Friday 16:00 - 18:00
|place= mostly online
|place=(online)
|univz= Lunivz link [https://univz.uni-goettingen.de/qisserver/rds?state=verpublish&status=init&vmfile=no&publishid=282662&moduleCall=webInfo&publishConfFile=webInfo&publishSubDir=veranstaltung&k_semester.semid=20211&idcol=k_semester.semid&idval=20211&getglobal=semester]
|univz=[https://univz.uni-goettingen.de/qisserver/rds?state=verpublish&status=init&vmfile=no&publishid=267540&moduleCall=webInfo&publishConfFile=webInfo&publishSubDir=veranstaltung]
}}
}}


==Announcement==
==Course Organization==
'''05/12/2021: There will be no lecture today. Task 1 will be released before 5 pm.'''
In this course, you will complete several practical tasks in the realm of data analysis. These tasks can include both exploratory (descriptive) data analysis as well as the application of machine learning algorithms to specific datasets.  


'''Due to the evolving Covid-19 situation, new information will be posted here as it becomes available. Please check this webpage periodically for the latest updates.
While the focus of the course is strongly practical, it supports students with lectures on different aspects of practical machine learning in its early stages, including:
'''


==General Description==
* Introduction to the practical machine learning pipeline
The Computer Networks Group at the Institute of Computer Science, Universität Göttingen, is collaborating with Göttinger Verkehrsbetriebe GmbH (represented by Dipl. Anne-Katrin Engelmann) to set up this exciting course.
* Exploratory data analysis
* The Python Data Science stack
* How to deal with unbalanced data
* Advanced algorithms for Data Science (an overview of competition winning algorithms)
* Parameter tuning for predictive models


This course covers two aspects of Smart Cities in the context of public transport: event monitoring and passenger counting.  
Students need to submit their solutions to tasks by specific deadlines throughout the course. Note that this course thus requires a continuous effort throughout the whole semester.
Solutions for each task have to be presented in class. A final report needs to be submitted at the end of the semester (September 30).
 
In Data Science for Smart City, we focus on one specific type of data: visual data (images and videos). We try to build a system that uses data analysis methods to extract useful information. This part is a collaboration with the Göttingen government and the Göttingen bus company.


The goal of this course is to:
The goal of this course is to:


* Help students deepen their understanding of computer networks and data science.
* Help students deepen their understanding of computer networks and data science.
 
* Help students use computer science knowledge to build a practical AI system.
* Help students use computer science knowledge to build a practical AI system.
 
* Guide students in applying this knowledge to improve the performance of the system.
* Guide students in applying this knowledge to improve the performance of the system.


Line 35: Line 39:


* Read state-of-the-art papers.
* Read state-of-the-art papers.
* Use programming to build systems, including computer vision algorithms, embedded programs, and socket network programs.
* Use programming to build systems, including computer vision algorithms, embedded programs, and socket network programs.
* Learn how to analyze city public transport sensor data.
* Learn how to analyze city public transport sensor data.


For the project, we will design, implement, and deploy the system on several buses at specific positions, with sub-systems consisting of:
The final task of students and implementation plan
 
The students will be divided into 2-person teams. Each group will be responsible for reimplementing (and possibly adopting) a different existing software architecture for all the bus lines used in our project. Within each group, two of the 2-person teams will work independently on each specific sub-task (in case one team cannot complete it). The teams within a group will therefore have to cooperate.
* Depth camera (e.g. Intel RealSense D435)
 
* On-board computers (e.g. Raspberry Pi Zero, NVIDIA Jetson AGX Xavier)
 
* Power supply (e.g. EC Technology Powerbank)
 
All these sub-systems in each bus will be combined into one system which shall be deployed for ideally an initial period of 2 months, thus obtaining sufficient data patterns for further analysis.
 
Tasks of students and implementation plan
The students will be divided into 2 groups consisting of six 2-person teams. Each group will be responsible for reimplementing (and possibly adapting) a different existing software architecture for all the bus lines used in our project. Within each group, two of the 2-person teams will work independently on each specific sub-task (in case one team cannot complete it). The teams within a group will therefore have to cooperate.
Note that we will give a default version of each module to guarantee the basic operation of the whole system.
Note that we will give a default version of each module to guarantee the basic operation of the whole system.
The main tasks are as follows:
1. Periodically collect the video data from the depth cameras via a predefined interface or a preinstalled SD card.
2. Label the corresponding objects/events in the videos to build the dataset.
3. Reimplement an existing video analytics architecture (using open-source code from papers) with the collected depth image video.
(We split the architecture into modules. Each 2-person team takes care of one module, then the group combines the modules.)
4. Based on the implemented architecture, each team should develop an idea to improve it, then implement a demo, deploy it in the bus system, show the collected results, and present them in the final Smart City report.
a) The idea can be a new application.
b) The idea can also be an algorithm or module that improves the performance of the architecture.
Learning about such a fast-moving field is an exciting opportunity, but covering it in a traditional course setting comes with some caveats you should be aware of.
* No canonical curriculum: Many topics in mathematics and computer science, such as linear algebra, real analysis, discrete mathematics, data structures and algorithms, etc., come with well-established curricula; courses on such subjects can be found at most universities, and they tend to cover similar topics in a similar order. This is not the case for emerging research areas like deep learning: the set of topics to be covered, as well as the order and way of thinking about each topic, has not yet been perfected.
* Few learning materials: There are very few high-quality textbooks or other learning materials that synthesize or explain much of the content we will cover. In many cases, '''the research paper that introduced an idea is the best or only resource for learning about it'''.
* Theory lags experiments: At present, '''video analytics is primarily an empirically driven research field'''. We may use mathematical notation to describe or communicate our algorithms and ideas, and many techniques are motivated by some mathematical or computational intuition, but in most cases, we rely on experiments rather than formal proofs to determine the scenarios where one technique might outperform another. This can sometimes be unsettling for students, as the question “why does that work?” may not always have a precise, theoretically-grounded answer.
* Things will change: If you were to study deep learning ten years from now, it would very likely look quite different from today. There may be new fundamental discoveries or new ways of thinking about things we already know; some ideas we think are important today may turn out in retrospect not to have been. There may be similarly impactful results lurking right around the corner.


==Prerequisites==
==Prerequisites==
*You are ''highly recommended'' to have completed a course on Data Science (e.g., [https://www.swe.informatik.uni-goettingen.de/lectures/data-science-and-big-data-analytics-ws2015 "Data Science and Big Data Analytics" taught by Dr. Steffen Herbold] or the course "Machine Learning" by Stanford University) before entering this course. You need to be familiar with computer networking and mobile communications.
*You are ''highly recommended'' to have completed a course on Data Science (e.g., [https://www.swe.informatik.uni-goettingen.de/lectures/data-science-and-big-data-analytics-ws2015 "Data Science and Big Data Analytics" taught by Dr. Steffen Herbold] or the course "Machine Learning" by Stanford University) before entering this course. You need to be familiar with basic statistics (distributions, p/t/z-tests, etc.), a range of machine learning algorithms (linear/logistic/lasso regression, k-means clustering, k-NN classification, etc.), computer networking, and mobile communications.
*Knowledge of any of the following languages: Python (course language), R, Java, MATLAB, or any other language with mature machine learning libraries
*Knowledge of any of the following languages: Python (course language), R, Java, MATLAB, or any other language with mature machine learning libraries
==Grading==
* Participation: 50%
** Task 1: 20%
** Task 2: 30%
* Presentation: 20%
**Present your work to the audience with slides (in English).
**20 minutes of presentation followed by 10 minutes of Q&A for a single student.
**30 minutes of presentation followed by 15 minutes of Q&A for a team of two students.
Suggestions for preparing the slides: Help your audience quickly grasp the general idea. Figures, tables, and animations work better than sentences. Don't forget a summary of your ideas and contributions.
All quoted images, tables, and text must indicate their source.
Note: Each team must clearly explain how the work was divided, and both team members must present their respective parts and answer questions.
* Final report: 30%
The report must be written in English according to common guidelines for scientific papers, with 6-8 pages of content for a single student and 12-16 pages for a team (excluding bibliography, etc.), in double-column LaTeX.
Please note that you cannot copy content directly from papers or webpages; this will be considered plagiarism and will be treated seriously. All quoted images and tables must indicate their source.
The source code, data (or URL of the data), and a manual should be uploaded with the report.


==Schedule==
==Schedule==
{| {{Prettytable|width=}}
{| {{Prettytable|width=}}
|-
|-
|{{Hl2|width =0.2}} |'''Time'''
|{{Hl2}} |'''When?'''
|{{Hl2|width =0.5}} |'''Topic'''
|{{Hl2}} |'''What?'''
|{{Hl2}} |'''Output'''
|-
| align="right" | 29.10.2021
| Lecture 1: Introduction
|-
| align="right" | 05.11.2021
| Lecture 2: The Data Science Pipeline
|-
| align="right" | 12.11.2021
| No Lecture
|-
| align="right" | 19.11.2021
| Lecture 3: The Python Data Science Stack - Task 1: Release
|-
| align="right" | 26.11.2021
| No lecture
|-
| align="right" | 03.12.2021
| Lecture 4: Video analysis in smart city - Task 2: Release
|-
|-
| align="right"|
| align="right" | 10.12.2021
w1
| TBD
| Lecture I:
| No
|-
|-
| align="right"|
| align="right" | 17.12.2021
w2
| TBD
| Lecture II:
|
|-
|-
| align="right"|
| align="right" | 24.12.2021
w3-4
| No lecture
|
| No
|-
|-
| align="right"|
| align="right" | 31.12.2021
  w5-8
| No lecture  
|
Task 1:
|
|-
|-
| align="right"|
| align="right" | 07.01.2022
w8 (9th June)
| No lecture
|
Discussion on Task 1
|NO
|-
|-
| align="right"|
| align="right" | 14.01.2022
w9-13
| Task 3 released.
|Task 2
|Report
|-
|-
| align="right" |
| align="right" | 23-25.02.2022
17.08
| Final Presentation
| Final presentations
|
|-
|-
| align="right" |
| align="right" | 28.02.2022
24.08
| Final Report deadline (Including report and code)
| Final report
|
|-
|-
|}
|}
==Grading==
* Participation:
** Task 1: 
** Task 2:
** Task 3:
* Presentation:
**Present your work to the audience with slides (in English).
**20 minutes of presentation followed by 10 minutes of Q&A for a single student.
**30 minutes of presentation followed by 15 minutes of Q&A for a team of two students.
Suggestions for preparing the slides: Help your audience quickly grasp the general idea. Figures, tables, and animations work better than sentences. Don't forget a summary of your ideas and contributions.
All quoted images, tables, and text must indicate their source.
Note: Each team must clearly explain how the work was divided, and both team members must present their respective parts and answer questions.
* Final report:
The report must be written in English according to common guidelines for scientific papers, with 6-8 pages of content for a single student and 12-16 pages for a team (excluding bibliography, etc.), in double-column LaTeX.
Please note that you cannot copy content directly from papers or webpages; this will be considered plagiarism and will be treated seriously. All quoted images and tables must indicate their source.
The source code, data (or URL of the data), and a manual should be uploaded with the report.