STATISTICAL LEARNING FOR DATA SCIENCE - 1

Academic year
2025/2026
Official course title
STATISTICAL LEARNING FOR DATA SCIENCE - 1
Course code
EM1401 (AF:561295 AR:326598)
Teaching language
English
Modality
On campus classes
ECTS credits
6 out of 12 of STATISTICAL LEARNING FOR DATA SCIENCE
Degree level
Master's Degree Programme (DM270)
Academic Discipline
SECS-S/01
Period
1st Term
Course year
1
Where
VENEZIA
The objective of the course is to develop statistical skills for analysing high-dimensional
data and solving forecasting and classification problems that occur in a wide variety of fields, including business, economics and technology.
Regular and active participation in the teaching activities offered by the course and in
independent research activities will enable students to:
1. (knowledge and understanding)
- acquire knowledge and understanding of advanced statistical learning methods for synthesis, prediction and classification, including for data with complex structures and high dimensionality
2. (applying knowledge and understanding)
- pre-process a dataset and prepare it for further analysis
- autonomously apply advanced statistical methods to synthesize information and to make predictions and classifications using high-dimensional data
- autonomously use statistical software for the analysis of high-dimensional data
3. (making judgements)
- make autonomous judgements about the validity and feasibility of different statistical techniques and understand how they affect the outcomes of the analyses
- present the results clearly and concisely, using tools for reproducible reports and research
The course makes use of basic mathematical and statistical concepts such as functions, integrals, derivatives, matrices, distributions, estimation and hypothesis testing. Students are expected to possess knowledge of statistics at STAT-100 level.
The course is divided into two parts. The first part introduces tools for reproducible research. These tools are then applied in the second part of the course, which covers statistical learning.



Tools for data science and reproducible research
- Introduction to R and RStudio
- Writing reports using R Markdown

Data tidying, wrangling, visualization and exploratory analysis

Statistical Inference
- Sampling
- Estimation
- Hypothesis testing

Statistical learning
- Linear regression
- Classification
- Resampling methods
- Linear model selection and regularization
- Nonlinear models
James G, Witten D, Hastie T, Tibshirani R (2015). An Introduction to Statistical Learning. 6th version. Springer. Webpage: http://www-bcf.usc.edu/~gareth/ISL/ (Chapters 1-7)
Ismay C, Kim AY (2019). Statistical Inference via Data Science: A ModernDive into R and the tidyverse! CRC Press. https://moderndive.com/
Xie Y (2019). bookdown: Authoring Books and Technical Documents with R Markdown. CRC Press. https://bookdown.org/yihui/bookdown/
Additional reading and materials will be distributed during the course through Moodle.
The written exam takes place in the laboratory and consists of three exercises: one related to the first module and two related to the second module. Students must complete the exercises using RStudio and, at the end of the exam, submit a report in HTML format containing:

1. The R code used to solve the exercises.
2. The obtained results, both numerical and graphical.
3. Interpretation of the results.
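
As an illustration only, the three report components listed above could be produced from a single R Markdown file rendered to HTML. The sketch below is not the official exam template: the file name, the title, the use of the built-in cars dataset and the wording of the interpretation are all assumptions made for the example. It requires the rmarkdown package and pandoc (which is bundled with RStudio).

```r
# Minimal sketch: build a tiny R Markdown report and render it to HTML.
# Everything exam-specific here (title, file name, dataset) is hypothetical.
library(rmarkdown)

# Three backticks, built programmatically so that this listing stays readable.
fence <- strrep("`", 3)

rmd_lines <- c(
  "---",
  "title: \"Written exam\"",
  "output: html_document",
  "---",
  "",
  "## Exercise 1",
  "",
  paste0(fence, "{r}"),
  "fit <- lm(dist ~ speed, data = cars)  # 1. R code (echoed by default)",
  "summary(fit)$coefficients             # 2. numerical results",
  "plot(cars); abline(fit)               # 2. graphical results",
  fence,
  "",
  "Interpretation: stopping distance increases with speed."  # 3. interpretation
)

rmd_file <- file.path(tempdir(), "exam_report.Rmd")
writeLines(rmd_lines, rmd_file)

# render() returns the path of the generated HTML file.
html_file <- render(rmd_file, quiet = TRUE)
```

In RStudio the same result is obtained by opening the .Rmd file and pressing the Knit button; the explicit render() call is used here only to keep the example self-contained.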

After the publication of the written exam results, students who have achieved at least 18/30 will be invited to take a short oral exam. This phase will assess the same skills and knowledge required in the written exam.
written and oral
The grade of the written exam is the arithmetic mean of the grades for the first part (exercise 1) and the second part (exercises 2 and 3). Each exercise consists of 4-7 questions, and each question is worth 1 point. To pass the exam, students need to obtain at least 60% of the total points.
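The arithmetic can be illustrated with a short R sketch for a hypothetical student. The syllabus does not state how points are converted into a grade, so the proportional conversion to a 0-30 scale used below is an assumption, as are the numbers of questions and the points earned.

```r
# Assumption (not stated in the syllabus): each part is graded on a 0-30 scale
# in proportion to the points earned in that part.
part_grade <- function(points, max_points) 30 * points / max_points

# Hypothetical exam: exercise 1 has 5 questions; exercises 2 and 3 have 6 and 4.
# Hypothetical student: 4 points in exercise 1, 3 points in each of exercises 2 and 3.
part1 <- part_grade(points = 4, max_points = 5)          # first part
part2 <- part_grade(points = 3 + 3, max_points = 6 + 4)  # second part

# Written grade: arithmetic mean of the two part grades.
written_grade <- mean(c(part1, part2))

# Pass criterion as stated: at least 60% of the total points.
total_share <- (4 + 3 + 3) / (5 + 6 + 4)
passed <- total_share >= 0.60
```

Under these assumptions the two parts are graded 24 and 18, the written grade is 21, and the student has about 67% of the total points, so the pass threshold is met.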
The course combines conventional theoretical classes, focused on the description of methods, with practice sessions on the implementation and application of the methods to real problems. Methods will be implemented in the statistical language R (www.r-project.org). Students are encouraged to bring their own laptops (not tablets!) and to experiment with the code during the course.
This is the first module of a 12-credit course.
The information refers to the whole course.

Students should register on the course web page of the university e-learning platform, moodle.unive.it.
Definitive programme.
Last update of the programme: 21/03/2025