STATISTICAL LEARNING FOR DATA SCIENCE - 2

Academic year
2021/2022
Official course title
STATISTICAL LEARNING FOR DATA SCIENCE - 2
Course code
EM1401 (AF:358731 AR:188046)
Modality
On campus classes
ECTS credits
6 out of 12 of STATISTICAL LEARNING FOR DATA SCIENCE
Degree level
Master's Degree Programme (DM270)
Educational sector code
SECS-S/01
Period
2nd Term
Course year
1
Where
VENEZIA
The objective of the course is to develop statistical skills for analysing high-dimensional data and for solving forecasting and classification problems that arise in a wide variety of fields, including business, economics and technology.
Regular and active participation in the teaching activities offered by the course and in
independent research activities will enable students to:
1. (knowledge and understanding)
- acquire knowledge and understanding of advanced statistical learning methods for synthesis, prediction and classification, including for data with complex structure and high dimensionality
2. (applying knowledge and understanding)
- pre-process a dataset and prepare it for further analysis
- independently apply advanced statistical methods to synthesize information and to make predictions and classifications using high-dimensional data
- independently use statistical software for the analysis of high-dimensional data
3. (making judgements)
- make autonomous judgements about the validity and feasibility of different statistical techniques and understand the effects of these on the outcomes of the analyses
- present the results in a clear and concise manner, using tools for reproducible reports and
The course will make use of basic mathematical and statistical concepts such as functions, integrals, derivatives, matrices, distributions, estimation and hypothesis testing. Students are expected to possess knowledge of statistics at STAT-100 level.
The course is divided into two parts. The first part introduces tools for reproducible research. These tools are then applied in the second part of the course, which covers statistical learning.



Tools for data science and reproducible research
- Introduction to R and RStudio
- Writing reports using R Markdown

Data wrangling, data tidying, data visualization

Statistical Inference
- Sampling
- Estimation
- Hypothesis testing

Statistical learning
- Linear regression
- Classification
- Resampling methods
- Linear model selection and regularization
- Nonlinear models
James G, Witten D, Hastie T, Tibshirani R (2015). An Introduction to Statistical Learning. 6th version. Springer. Chapters 1-7 ( http://www-bcf.usc.edu/~gareth/ISL/ )
Chester Ismay, Albert Y. Kim (2019). Statistical Inference via Data Science: A ModernDive into R and the tidyverse! CRC Press ( https://moderndive.com/ )
Yihui Xie (2019). bookdown: Authoring Books and Technical Documents with R Markdown. CRC Press ( https://bookdown.org/yihui/bookdown/ )
The achievement of the course objectives is assessed through partial assignments during the course and a final written exam.
Partial assignments take the form of weekly quizzes, starting from Week 2 of both modules. They take place during the last lecture of the week.
In total, by completing all 8 quizzes (4 in Module 1 and 4 in Module 2) it is possible to obtain 4 points.

The final written exam consists of four exercises designed to measure
1. the theoretical knowledge of the course topics,
2. the ability to apply them for solving real data problems.
The maximum score for each exercise is 7 points. The exam score is the sum of the scores of the four exercises. During the written test the use of books, notes, or electronic media is *not* allowed.

The final score is the sum of the scores obtained in the partial assignments and in the final exam. A total score exceeding 30 corresponds to 30 with honors.
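As a rough illustration of the scoring rules above, the following Python sketch adds the quiz points (at most 4) to the four exercise scores (at most 7 each) and flags honors when the total exceeds 30. The function name `final_grade` and the input checks are illustrative assumptions, not part of the official rules. The syllabus does not say how fractional scores are rounded, so no rounding is applied here.

```python
from typing import Sequence, Tuple


def final_grade(quiz_points: float, exercise_scores: Sequence[float]) -> Tuple[float, bool]:
    """Return (grade, honors) under the scoring rules stated in the syllabus.

    Hypothetical helper for illustration only; rounding is not specified
    in the syllabus and is therefore not applied.
    """
    # The written exam has four exercises, each worth at most 7 points.
    if len(exercise_scores) != 4:
        raise ValueError("the written exam has exactly four exercises")
    if any(not 0 <= s <= 7 for s in exercise_scores):
        raise ValueError("each exercise score must lie between 0 and 7")
    # The eight quizzes are worth at most 4 points in total.
    if not 0 <= quiz_points <= 4:
        raise ValueError("quiz points must lie between 0 and 4")

    total = quiz_points + sum(exercise_scores)
    # A total strictly above 30 corresponds to 30 with honors.
    honors = total > 30
    return min(total, 30), honors
```

For example, a student with 3 quiz points and exercise scores of 6, 7, 5 and 6 would have a total of 27 without honors. Honors require more than 30 points, which the exam alone cannot provide because it is worth at most 28.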
The course combines conventional theoretical classes, focused on the description of methods, with practice sessions on the implementation and application of the methods to real problems. Methods will be implemented with the statistical language R ( www.r-project.org ). Students are encouraged to bring their own laptops (no tablets!) and to experiment with the code during the course.
English
This is the second module of a 12-credit course. The information refers to the whole course.

Students should register on the course page of the university e-learning platform, moodle.unive.it.
written
Definitive programme.
Last update of the programme: 03/08/2021