HEALTH DATA SCIENCE

Academic year
2024/2025
Official course title
HEALTH DATA SCIENCE
Course code
EM1413 (AF:506439 AR:293001)
Modality
On campus classes
ECTS credits
6
Degree level
Master's Degree Programme (DM270)
Educational sector code
SECS-S/05
Period
4th Term
Course year
1
Where
VENEZIA
This course provides background and context for one of the main application areas of data science: health data.

Thus, the most common statistical tools for dealing with health data will be examined:

- statistical methods for analyzing categorical data
- statistical methods for analyzing healthcare costs
- survival models.

Statistical techniques will be explored theoretically and practically through laboratory activities (in R, analyzing real data from different health surveillance systems).
- Knowledge of information systems for health systems.

- Analysis skills in applying, comparing, and interpreting statistical methods for analyzing health data (theoretically and practically in R).
Students should have attended at least the Statistical Learning for Data Science course. Some concepts from that course will be reviewed in the first week, but a good knowledge of R and R Markdown is necessary to follow the course properly.
1. Introduction to health data:
- Types of health data (ref. Etzioni)
- Surveys: the American case study, the Behavioral Risk Factor Surveillance System (BRFSS), and the Italian one, Progressi delle Aziende Sanitarie per la Salute in Italia (PASSI).
- Health data and health policies

2. Statistical models for health data:

2.a: Categorical data analysis (ref. Agresti):
- Analyzing contingency tables and comparing proportions. Relative risk, odds ratio, and chi-squared test of independence.
- Logistic regression. Interpretation, evaluation, and selection. Categorical predictors and aggregated data.
- Poisson and negative binomial regression. Interpretation, evaluation, and selection.
- Multi-category logit models (for nominal and ordinal data). Interpretation, evaluation, and selection.
- Generalized linear models. How to define logistic regression as a generalized linear model.
- Generalized linear mixed models (logistic-normal model). Interpretation, evaluation, and selection.
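
As an illustration of the contingency-table measures listed under 2.a, the following minimal sketch computes the relative risk, the odds ratio, and Pearson's chi-squared statistic for a 2x2 table. It is written in Python to keep it self-contained, although the course labs use R. The function name `two_by_two_summary` and the counts are hypothetical and are not taken from the course material.

```python
import math


def two_by_two_summary(a, b, c, d):
    """Summarize a 2x2 table laid out as
                 event   no event
    exposed        a        b
    unexposed      c        d
    """
    n = a + b + c + d
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    relative_risk = risk_exposed / risk_unexposed
    odds_ratio = (a * d) / (b * c)
    # Pearson chi-squared statistic for independence (no continuity correction).
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return relative_risk, odds_ratio, chi2, p_value


# Hypothetical counts, for illustration only.
rr, odds, chi2, p = two_by_two_summary(30, 70, 15, 85)
print(f"RR = {rr:.2f}, OR = {odds:.2f}, chi2 = {chi2:.2f}, p = {p:.3f}")
```

With these counts the risk is twice as high in the exposed group, while the odds ratio (about 2.43) exceeds the relative risk because the event is not rare.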

2.b: Health care costs (ref. Etzioni):
- Log cost models and the lognormal distribution
- Gamma models for right-skewed cost outcomes
- Mixture models
- Other models for skewed data
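
The log cost model in 2.b can be sketched as follows: if log-costs are assumed normal with mean mu and variance sigma^2, the mean cost on the original scale is exp(mu + sigma^2 / 2), not exp(mu). The Python sketch below (the labs themselves use R) applies this retransformation. The function name `lognormal_mean_cost` and the cost values are hypothetical.

```python
import math
import statistics


def lognormal_mean_cost(costs):
    """Estimate the mean cost assuming log(cost) is normally distributed."""
    log_costs = [math.log(c) for c in costs]  # requires strictly positive costs
    mu = statistics.mean(log_costs)
    sigma2 = statistics.variance(log_costs)  # sample variance of the log-costs
    # Lognormal mean: exp(mu + sigma^2 / 2); exp(mu) alone estimates the median.
    return mu, sigma2, math.exp(mu + sigma2 / 2)


# Hypothetical right-skewed costs, for illustration only.
costs = [120.0, 250.0, 310.0, 480.0, 950.0, 4200.0]
mu, sigma2, mean_cost = lognormal_mean_cost(costs)
print(f"exp(mu) = {math.exp(mu):.1f}, lognormal mean = {mean_cost:.1f}")
```

This is a simple plug-in estimate and is sensitive to departures from lognormality, which is one motivation for the gamma and mixture models listed above.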

3. Health data analysis lab

- Case studies and practical applications with R.
Agresti, A. (2018). An Introduction to Categorical Data Analysis. John Wiley & Sons. (ch. 2, 3, 4, 6, 10)
Collett, D. (2015). Modelling Survival Data in Medical Research. CRC Press. (ch. 2, 3)
Etzioni, R. (2020). Statistics for Health Data Science. (ch. 1, 6)
Lecturer's slides, available on Moodle.
Individual or group project consisting of analyzing a dataset and writing a report, plus a classroom presentation of the results obtained. You may use R and R Markdown or any other software.
To pass, students must be able to carry out the main analyses in R and comment on them, taking into account their practical implications.
Lectures (theory and practice in R)
English
Written and oral

This subject deals with topics related to the macro-area "Human capital, health, education" and contributes to the achievement of one or more goals of the UN Agenda for Sustainable Development.

Definitive programme.
Last update of the programme: 30/06/2024