LAB OF INFORMATION SYSTEMS AND ANALYTICS

Academic year
2019/2020
Course title in English
LAB OF INFORMATION SYSTEMS AND ANALYTICS
Course code
ET7008 (AF:275098 AR:160593)
Teaching mode
On campus
University credits (CFU)
6
Degree level
Bachelor's degree
Academic discipline (SSD)
INF/01
Period
3rd Term
Course year
2
Location
RONCADE
The goal of this course is to teach students methods and technologies for effective data analysis.
The course covers fundamental techniques for predictive and descriptive data analysis.

Students will achieve the following learning outcomes:

Knowledge and understanding: i) understanding the principles of unsupervised learning; ii) understanding the principles of supervised learning.

Applying knowledge and understanding: i) being able to apply supervised and unsupervised analysis techniques; ii) being able to use data analysis software tools (e.g., scikit-learn).

Communication: i) reporting a comprehensive comparative analysis of different data analysis methods.
Students should have achieved the learning outcomes of the courses "Introduction to Coding and Data Management" and "Probability and Statistics".
1. KDD Intro
- KDD Process, data types, mining tasks
2. Similarity Search
- Text representation
- Euclidean Distance, Jaccard Distance
3. Text processing
- Tokenization, Stemming, Lemmatization
- vector space
4. K-means Clustering
- taxonomy of clustering algorithms
- centroid-based clustering
- quality evaluation
5. Hierarchical clustering & DBSCAN
- agglomerative clustering, linkage measures
- density-based clustering
- silhouette coefficient
6. Advanced Clustering
- Using custom similarity measures
- Pearson correlation coefficient
7. Introduction to Supervised Learning
- Model training, validation and tuning
- k-NN classifier
- Naive Bayes
8. Regression
- Linear and polynomial regression
9. Linear regression
- Regularization methods: Lasso and Ridge
10. Classification
- Logistic regression classifier
- Support vector machines
11. Decision Trees
- Decision trees for classification and regression
- Feature Engineering
12. Model Evaluation
- Evaluation Measures
- Imbalanced data
13. Bias vs. Variance trade-off
- Over-fitting and Under-fitting
14. Ensemble methods
- Bagging and Boosting
15. Random Forest
- Random Forest and similarity measures
- Feature importance and selection
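
As an illustration of topics 4 and 5 in the list above (centroid-based clustering and the silhouette coefficient), the following is a minimal sketch, not course material. It assumes scikit-learn is installed; the synthetic dataset, the helper name `silhouette_by_k`, and the range of candidate cluster counts are all hypothetical choices made for this example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic 2-D data with three well-separated groups (illustrative only).
X, _ = make_blobs(
    n_samples=300,
    centers=[[-6, -6], [0, 0], [6, 6]],
    cluster_std=0.8,
    random_state=0,
)


def silhouette_by_k(X, k_values):
    """Fit k-means for each k and return {k: mean silhouette coefficient}."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return scores


scores = silhouette_by_k(X, range(2, 7))
# Pick the number of clusters with the highest mean silhouette coefficient.
best_k = max(scores, key=scores.get)

for k, s in scores.items():
    print(f"k={k}: silhouette={s:.3f}")
print("best k:", best_k)
```

The silhouette coefficient is used here only as one possible quality measure; on real data it is usually worth inspecting several measures and the clusters themselves before settling on a value of k.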
- VanderPlas, J. Python Data Science Handbook. O’Reilly. 2016.
- Lecture notes. Selected readings provided during the course.
Learning outcomes are verified by a set of exercises and a project.

The exercises require students to apply data analysis methods to a given dataset of limited complexity.

The project requires students to conduct a comparative analysis of different tools applied to a specific dataset or problem.
The student must choose and justify the most appropriate solution and deliver a report discussing a comparative analysis of the chosen methods.
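
A comparative analysis of the kind described above could be sketched as follows. This is only an illustrative outline under stated assumptions: the dataset (scikit-learn's bundled breast cancer data), the four candidate classifiers, the F1 metric, and the 5-fold setup are choices made for this example and are not prescribed by the course.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Example binary classification dataset shipped with scikit-learn (assumption).
X, y = load_breast_cancer(return_X_y=True)

# Candidate methods; scaling is placed inside the pipeline so that it is
# fitted on the training folds only.
candidates = {
    "k-NN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Naive Bayes": GaussianNB(),
    "Logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# The same stratified folds are used for every model to keep the comparison fair.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for name, model in candidates.items():
    fold_scores = cross_val_score(model, X, y, cv=cv, scoring="f1")
    results[name] = (fold_scores.mean(), fold_scores.std())

# Report the models from best to worst mean F1.
for name, (mean, std) in sorted(
    results.items(), key=lambda item: item[1][0], reverse=True
):
    print(f"{name:20s} F1 = {mean:.3f} +/- {std:.3f}")
```

Reporting the spread across folds alongside the mean helps show whether the difference between two methods is large enough to support choosing one over the other.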
Lectures and hands-on sessions. The following software tools will be used during the course: Jupyter, scikit-learn.
English
Written and oral
Final syllabus.
Syllabus last modified: 08/04/2019