LAB OF INFORMATION SYSTEMS AND ANALYTICS

Academic year: 2019/2020

Course title in English: LAB OF INFORMATION SYSTEMS AND ANALYTICS

Course code: ET7008 (AF:275098 AR:160593)

Language of instruction: English

Teaching mode: In person

University credits (CFU): 6

Degree level: Bachelor's degree

Scientific disciplinary sector: INF/01

Period: 3rd term

Course year: 2

Location: RONCADE

Role of the course within the degree programme

The goal of this course is to teach students methods and technologies for effective data analysis.

Expected learning outcomes

The course discusses fundamental techniques for predictive and descriptive analysis of data.

Students will achieve the following learning outcomes:

Knowledge and understanding: i) understanding the principles of unsupervised learning; ii) understanding the principles of supervised learning.

Applying knowledge and understanding: i) being able to apply supervised and unsupervised analysis techniques; ii) being able to use data analysis software tools (e.g., scikit-learn).

Communication: i) reporting a comprehensive comparative analysis of different data analysis methods.

Prerequisites

Students should have achieved the learning outcomes of the courses "Introduction to Coding and Data Management" and "Probability and Statistics".

Contents

1. KDD Intro
- KDD Process, data types, mining tasks
2. Similarity Search
- Text representation
- Euclidean Distance, Jaccard Distance
3. Text processing
- Tokenization, Stemming, Lemmatization
- vector space
4. K-means Clustering
- taxonomy of clustering algorithms
- centroid-based clustering
- quality evaluation
5. Hierarchical clustering & DBSCAN
- agglomerative clustering, linkage measures
- density-based clustering
- silhouette coefficient
6. Advanced Clustering
- Using custom similarity measures
- Pearson correlation coefficient
7. Introduction to Supervised Learning
- Model training, validation and tuning
- k-NN classifier
- Naive Bayes
8. Regression
- Linear and polynomial regression
9. Linear regression
- Regularization methods: Lasso and Ridge
10. Classification
- Logistic Classifier
- Support vector machines
11. Decision Trees
- Decision trees for classification and regression
- Feature Engineering
12. Model Evaluation
- Evaluation Measures
- Imbalanced data
13. Bias vs. Variance trade-off
- Over-fitting and Under-fitting
14. Ensemble methods
- Bagging and Boosting
15. Random Forest
- Random Forest and similarity measures
- Feature importance and selection
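
As a rough illustration of the similarity-search and text-processing topics (items 2 and 3), the sketch below tokenizes two made-up sentences and compares them with the Jaccard and Euclidean distances. The helper names and the sample sentences are illustrative assumptions, not course material, and the tokenizer is deliberately simplistic (no stemming or lemmatization).

```python
import math
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| over the token sets of two documents."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0  # convention: two empty documents are identical
    return 1.0 - len(set_a & set_b) / len(union)


def term_count_vector(tokens, vocabulary):
    """Represent a document as term counts over a fixed vocabulary."""
    return [tokens.count(term) for term in vocabulary]


def euclidean_distance(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


# Two invented example sentences (assumption, for illustration only).
doc_a = tokenize("Data mining finds patterns in data")
doc_b = tokenize("Text mining finds patterns in text")

# Shared vocabulary defines the axes of the vector space.
vocabulary = sorted(set(doc_a) | set(doc_b))
vec_a = term_count_vector(doc_a, vocabulary)
vec_b = term_count_vector(doc_b, vocabulary)

jaccard = jaccard_distance(doc_a, doc_b)
euclid = euclidean_distance(vec_a, vec_b)
print(f"Jaccard distance:   {jaccard:.3f}")
print(f"Euclidean distance: {euclid:.3f}")
```

The Jaccard distance ignores how often a word occurs, while the Euclidean distance on term counts does not, which is one reason the two measures can rank document pairs differently.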

Reference texts

- Jake VanderPlas. Python Data Science Handbook. O’Reilly. 2016.
- Lecture notes. Selected readings provided during the course.

Assessment methods

Learning outcomes are verified by a set of exercises and a project.

The exercises require students to apply data analysis methods to a given dataset of limited complexity.

The project requires students to conduct a comparative analysis of different tools applied to a specific dataset or problem.
The student must choose and justify the most appropriate solution and deliver a report discussing a comparative analysis of the chosen methods.
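
To give a sense of what a comparative analysis can look like in code, here is a minimal sketch that scores two methods with the same k-fold cross-validation procedure and picks the better one. Everything in it is an assumption made for illustration: the toy dataset is invented, the function names are hypothetical, and the methods (a 3-nearest-neighbour classifier and a majority-class baseline) are stand-ins for whatever a project actually compares. In practice scikit-learn's `cross_val_score` and `KNeighborsClassifier` cover the same ground.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Predict the majority label among the k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def majority_predict(train, query):
    """Baseline: always predict the most frequent training label."""
    return Counter(label for _, label in train).most_common(1)[0][0]


def cross_val_accuracy(data, predict, n_folds=5):
    """Mean accuracy over n_folds folds (fold i tests items i, i+n_folds, ...)."""
    scores = []
    for fold in range(n_folds):
        test = data[fold::n_folds]
        train = [item for i, item in enumerate(data) if i % n_folds != fold]
        hits = sum(predict(train, x) == y for x, y in test)
        scores.append(hits / len(test))
    return sum(scores) / len(scores)


# Invented toy dataset (assumption): class "a" near the origin, class "b" far away.
data = [
    ((1.0, 1.0), "a"), ((6.0, 6.0), "b"), ((1.0, 2.0), "a"), ((2.0, 1.0), "a"),
    ((6.0, 7.0), "b"), ((2.0, 2.0), "a"), ((1.5, 1.5), "a"), ((7.0, 6.0), "b"),
    ((2.0, 3.0), "a"), ((3.0, 2.0), "a"),
]

# Every method is evaluated on identical folds so the scores are comparable.
methods = {
    "3-NN": lambda train, x: knn_predict(train, x, k=3),
    "majority baseline": majority_predict,
}
results = {name: cross_val_accuracy(data, fn) for name, fn in methods.items()}
best = max(results, key=results.get)

for name, accuracy in results.items():
    print(f"{name}: mean accuracy {accuracy:.2f}")
print("best method:", best)
```

Including a trivial baseline is a deliberate choice: on imbalanced data a method that looks accurate may be doing little more than predicting the majority class.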

Type of exam

Written and oral

Teaching methods

Lectures and hands-on sessions. The following software tools will be used during the course: Jupyter, scikit-learn.

Final syllabus.

Syllabus last modified: 08/04/2019
