DATA AND WEB MINING

Academic year
2024/2025
Official course title
DATA AND WEB MINING
Course code
CT0509 (AF:379696 AR:216894)
Modality
On campus classes
ECTS credits
6
Degree level
Bachelor's Degree Programme
Educational sector code
ING-INF/05
Period
1st Semester
Course year
3
Where
VENEZIA
This course is part of the educational activities of the Bachelor in Informatics.
The goal of this course is to enable students to understand and exploit predictive data analysis techniques, including both supervised methods (classification and regression) and unsupervised methods (clustering and recommendation), with a focus on web data (e.g., text documents). The course includes the use of data mining software tools through the Python programming language.
The course discusses fundamental techniques for predictive and descriptive data analysis, with a focus on web data.

Students will achieve the following learning outcomes:

i) Knowledge and understanding: understanding the principles of supervised and unsupervised learning; understanding the principles of web content mining.

ii) Applying knowledge and understanding: being able to apply supervised and unsupervised analysis techniques; being able to use data analysis software tools (e.g., scikit-learn).

iii) Making judgements: being able to choose the most appropriate method for a given problem and to evaluate its performance.

iv) Communication skills: reporting a sound and comprehensive comparative analysis of different data analysis methods.

v) Learning skills: being able to autonomously adopt new techniques and methods.
Students should have achieved the learning outcomes of the courses "Programming", "Probability and Statistics", and "Linear Algebra" (even without having passed the corresponding exams).
- Knowledge Discovery in Databases
- Data pre-processing:
  - Ordinal and Categorical Variables
- Classification and Regression:
  - k-NN, Decision Trees
  - Bias and Variance, overfitting and underfitting
  - Ensemble methods: Bagging, Boosting, Random Forests
  - Random Forests for feature selection, outlier detection
  - Imbalanced data
  - Evaluation: accuracy measures, cross-validation
- Clustering:
  - k-means, k-medoids, Hierarchical, DBSCAN
  - Distance measures, curse of dimensionality
  - Intrinsic and extrinsic evaluation
- Pattern Mining:
  - Association rules
  - Frequent itemset mining algorithms
- Introduction to Artificial Neural Networks
Lecture notes. Selected readings provided during the course.
- Introduction to Data Mining (Global Edition), Tan, Steinbach, Karpatne, Kumar. Pearson. 2020.

Learning outcomes are verified by a written exam and the discussion of a lab project.

The written exam consists of questions and short exercises on the theory of the subjects discussed during the course. The written exam evaluates the achievement of learning outcomes i), ii) and iii).

The lab project requires students either to conduct a comparative analysis of different tools applied to a specific dataset, or to implement a data mining algorithm.
The student must choose and justify the most appropriate solution and deliver a report, to be discussed with the teacher. The project work evaluates the achievement of learning outcomes iii), iv) and v).
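
As an illustration of what a comparative analysis might look like, the following is a minimal sketch using scikit-learn. The dataset (Iris), the three classifiers, their hyperparameters, and the helper name `compare_models` are assumptions chosen for the example, not project requirements; an actual project would use the assigned dataset and justify its own choices.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def compare_models(X, y, models, n_splits=5, seed=0):
    """Return {model name: (mean accuracy, std)} from stratified k-fold CV.

    Hypothetical helper: every model is scored on the same folds,
    so the comparison is like-for-like.
    """
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    results = {}
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
        results[name] = (scores.mean(), scores.std())
    return results


# Illustrative dataset and candidate models (assumptions, see above).
X, y = load_iris(return_X_y=True)
models = {
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = compare_models(X, y, models)

# Print the models from best to worst mean accuracy.
for name, (mean, std) in sorted(results.items(), key=lambda kv: -kv[1][0]):
    print(f"{name:14s} accuracy = {mean:.3f} +/- {std:.3f}")
```

Reporting the standard deviation alongside the mean helps judge whether the difference between two methods is larger than the fold-to-fold variation.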

The final grade is a weighted sum: 70% from the written exam and 30% from the lab project.
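
The weighting can be expressed as a short calculation. This is only a sketch: it assumes both parts are marked on the same numeric scale, which the syllabus does not specify, and the function name `final_grade` and the sample marks are hypothetical.

```python
def final_grade(written, project, w_written=0.7, w_project=0.3):
    """Weighted sum of the two marks (assumed to share one scale)."""
    return w_written * written + w_project * project


# Hypothetical marks for illustration only.
grade = final_grade(written=28, project=30)
print(f"{grade:.1f}")
```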
Lessons include both theoretical and practical sessions.
Teaching material is delivered through the Moodle platform.
During the course, the Python programming language is used together with the scikit-learn library. Students are encouraged to bring their own laptops.
Teaching language
Italian
Type of exam
written and oral
Definitive programme.
Last update of the programme: 21/02/2024