LAB OF INFORMATION SYSTEMS AND ANALYTICS

Academic year
2019/2020
Official course title
LAB OF INFORMATION SYSTEMS AND ANALYTICS
Course code
ET7008 (AF:320312 AR:160593)
Modality
On campus classes
ECTS credits
6
Degree level
Bachelor's Degree Programme
Educational sector code
INF/01
Period
3rd Term
Course year
2
Where
RONCADE
The goal of this course is to teach students methods and technologies for effective data analysis.
The course covers fundamental techniques for predictive and descriptive data analysis.

Students will achieve the following learning outcomes:

Knowledge and understanding: i) understanding the principles of unsupervised learning; ii) understanding the principles of supervised learning.

Applying knowledge and understanding: i) being able to apply supervised and unsupervised analysis techniques; ii) being able to use data analysis software tools (e.g., scikit-learn).

Communication: i) reporting a comprehensive comparative analysis of different data analysis methods.
Students should have achieved the learning outcomes of the courses "Introduction to Coding and Data Management" and "Probability and Statistics".
1. KDD Intro
- KDD Process, data types, mining tasks
2. Similarity Search
- Text representation
- Euclidean Distance, Jaccard Distance
3. Text processing
- Tokenization, Stemming, Lemmatization
- vector space
4. K-means Clustering
- taxonomy of clustering algorithms
- centroid-based clustering
- quality evaluation
5. Hierarchical clustering & DBSCAN
- agglomerative clustering, linkage measures
- density-based clustering
- silhouette coefficient
6. Advanced Clustering
- Using custom similarity measures
- Pearson correlation coefficient
7. Introduction to Supervised Learning
- Model training, validation and tuning
- k-NN classifier
- Naive Bayes
8. Regression
- Linear and polynomial regression
9. Linear regression
- Regularization methods: Lasso and Ridge
10. Classification
- Logistic Classifier
- Support vector machines
11. Decision Trees
- Decision trees for classification and regression
- Feature Engineering
12. Model Evaluation
- Evaluation Measures
- Imbalanced data
13. Bias vs. Variance trade-off
- Over-fitting and Under-fitting
14. Ensemble methods
- Bagging and Boosting
15. Random Forest
- Random Forest and similarity measures
- Feature importance and selection
- Jake VanderPlas. Python Data Science Handbook. O’Reilly. 2016.
- Lecture notes. Selected readings provided during the course.
Learning outcomes are verified by a set of exercises and a project.

The exercises require students to apply data analysis methods to a given dataset of limited complexity.

The project requires students to conduct a comparative analysis of different tools applied to a specific dataset or problem.
The student must choose and justify the most appropriate solution and deliver a report discussing a comparative analysis of the chosen methods.
Lectures and hands-on sessions. The following software tools will be used during the course: Jupyter, scikit-learn.
Teaching language
English
Type of exam
written and oral
Definitive programme.
Last update of the programme: 08/04/2019