STATISTICAL LEARNING FOR DATA SCIENCE - 1
- Academic year
- 2020/2021
- Official course title
- STATISTICAL LEARNING FOR DATA SCIENCE - 1
- Course code
- EM1401 (AF:336205 AR:176802)
- Modality
- On campus classes
- ECTS credits
- 6 out of 12 of STATISTICAL LEARNING FOR DATA SCIENCE
- Degree level
- Master's Degree Programme (DM270)
- Educational sector code
- SECS-S/01
- Period
- 1st Term
- Course year
- 1
- Where
- VENEZIA
Contribution of the course to the overall degree programme goals
data and solve forecasting and classification problems occurring in a wide variety of fields, including business, economics and technology.
Expected learning outcomes
independent research activities will enable students to:
1. (knowledge and understanding)
- acquire knowledge and understanding of advanced statistical learning methods for synthesis, prediction and classification, including for data with complex structures and high dimensionality
2. (applying knowledge and understanding)
- pre-process a dataset and prepare it for further analysis
- independently apply advanced statistical methods to synthesize information and to make predictions and classifications using high-dimensional data
- independently use statistical software for the analysis of high-dimensional data
3. (making judgements)
- make independent judgements about the validity and feasibility of different statistical techniques and understand their effects on the outcomes of the analyses
- present the results clearly and concisely, using tools for reproducible reports and research
Pre-requirements
are expected to possess knowledge of statistics at STAT-100 level.
Contents
reproducible research. Such tools will be applied in the second part of the course, on statistical learning.
Tools for data science and reproducible research
- Introduction to R and RStudio
- Writing reports using R Markdown
- Data visualization, data wrangling, data tidying
Statistical Inference
- Sampling
- Estimation
- Hypothesis testing
Statistical learning
- Linear regression
- Classification
- Resampling methods
- Linear model selection and regularization
- Nonlinear models
Referral texts
version. Springer. Webpage http://www-bcf.usc.edu/~gareth/ISL/ Chapters 1-7
Chester Ismay, Albert Y. Kim (2019) Statistical Inference via Data Science: A ModernDive into R and the tidyverse!, CRC Press (https://moderndive.com/)
Yihui Xie (2019) bookdown: Authoring Books and Technical Documents with R Markdown, CRC Press (https://bookdown.org/yihui/bookdown/)
Additional reading and materials will be distributed during the course through Moodle.
Assessment methods
Partial assessments take the form of weekly quizzes, starting in Week 2 of each module and held during the last lecture of the week.
Completing all 8 quizzes (4 in Module 1 and 4 in Module 2) earns a maximum of 4 points in total.
The final written exam consists of four exercises designed to measure
1. the theoretical knowledge of the course topics,
2. the ability to apply them to solve real data problems.
The maximum score for each exercise is 7 points, and the exam score is the sum of the scores of the four exercises (at most 28 points). The use of books, notes, or electronic media is *not* allowed during the written test.
The final score is the sum of the quiz points and the written exam score, so at most 4 + 28 = 32 points. A total score exceeding 30 corresponds to 30 with honors.
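The scoring rule is simple arithmetic, and the following Python sketch makes it explicit. It is a hypothetical helper for illustration, not an official grading tool. It assumes that quiz points are passed directly as a number between 0 and 4 and that no rounding is applied; the syllabus specifies neither how quiz points are split across quizzes nor any rounding rule.

```python
# Hypothetical helper illustrating the scoring rule described above.
# Assumptions (not stated in the syllabus): quiz points are given directly
# as a number in [0, 4], each exercise score is a number in [0, 7],
# and no rounding is applied to the total.

MAX_QUIZ_POINTS = 4.0
N_EXERCISES = 4
MAX_PER_EXERCISE = 7.0


def final_grade(quiz_points: float, exercise_scores: list) -> tuple:
    """Return (total, label) for given quiz points and exercise scores."""
    if not 0 <= quiz_points <= MAX_QUIZ_POINTS:
        raise ValueError("quiz points must be between 0 and 4")
    if len(exercise_scores) != N_EXERCISES:
        raise ValueError("the written exam has exactly four exercises")
    if any(not 0 <= s <= MAX_PER_EXERCISE for s in exercise_scores):
        raise ValueError("each exercise is scored from 0 to 7")

    exam_score = float(sum(exercise_scores))  # at most 4 * 7 = 28
    total = float(quiz_points) + exam_score   # at most 4 + 28 = 32

    # Any total above 30 is recorded as 30 with honors.
    if total > 30:
        return total, "30 with honors"
    return total, f"{total:g}"


if __name__ == "__main__":
    print(final_grade(4, [7, 7, 6, 7]))    # full quiz credit plus 27 on the exam
    print(final_grade(2.5, [5, 6, 4, 5]))  # partial quiz credit plus 20 on the exam
```

With full quiz credit, an exam score of 27 already yields a total of 31, which is why a total above 30 is possible at all.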
Teaching methods
description of methods and practice sessions describing the implementation and application
of the methods to real problems. Methods will be implemented with the statistical language R
( www.r-project.org ). Students are encouraged to bring their own laptops (not tablets!) and to experiment
with the code during the course.
Teaching language
Further information
The information refers to the whole course.
Students should register on the course page of the university's e-learning platform, moodle.unive.it.