LEARNING WITH MASSIVE DATA

Academic year
2024/2025
Official course title
LEARNING WITH MASSIVE DATA
Course code
CM0638 (AF:513745 AR:286757)
Modality
On campus classes
ECTS credits
6
Degree level
Master's Degree Programme (DM270)
Educational sector code
ING-INF/05
Period
2nd Semester
Course year
1
Where
VENEZIA
The goal of this course is to teach students to design and develop algorithms for the analysis of large-scale data sources in highly parallel (multi-core) and distributed (cluster) environments. Use cases are drawn from data mining, web search, and social network analysis.
The course presents the fundamental techniques usually employed to solve large-scale data analysis problems with parallel algorithms.
Students acquire knowledge of models of parallel computing architectures, of parallel programming paradigms and environments, and of the design of algorithms for massive datasets.

Students will achieve the following learning outcomes:

i) Knowledge and understanding: understanding the principles of multi-threading and distributed computing; understanding the sources and models of cost in massive dataset analysis solutions (cache, memory); understanding design patterns for massive data analysis.

ii) Applying knowledge and understanding: being able to design and develop algorithms for massive dataset analysis; being able to estimate and measure the performance of a parallel program; being able to develop algorithms for massive dataset analysis by exploiting parallel programming patterns.

iii) Making judgements: being able to analyze different methods and algorithms and to choose the most appropriate one for a given problem on the basis of a sound cost model.

iv) Communication skills: reporting a sound and comprehensive comparative analysis of different solutions, supported by experiments.

v) Learning skills: being able to autonomously adopt new techniques and methods.
Students are expected to have a good background in computer architectures, algorithms, operating systems and computer networks, and C/C++/Python programming. A short C++ tutorial for Python programmers is available here: https://runestone.academy/ns/books/published/cpp4python/index.html
- Cache-aware and cache-oblivious algorithms
- Thread parallelism
- Large-scale parallelism
- Recommender systems
- Learning to Rank
- Link Analysis
- Advertising on the Web
Lecture notes.

Jure Leskovec, Anand Rajaraman, Jeffrey David Ullman. Mining of Massive Datasets, 3rd Edition. Cambridge University Press, 2020.
Learning outcomes are verified by a written exam and the discussion of three project assignments.

The written exam consists of questions and short exercises on the theory of the subjects discussed during the course. The written exam evaluates the achievement of learning outcomes i), ii) and iii).

Each assignment requires the student to design and develop an algorithm for a given massive data analysis task. The student is asked to choose the most appropriate solution, to justify that choice, and to provide a report to be discussed with the teacher. The assignments evaluate the achievement of learning outcomes iii), iv) and v).

The final grade is weighted 70% written exam and 30% final project.
Lectures and case studies.
English
written and oral
Definitive programme.
Last update of the programme: 21/02/2024