LEARNING WITH MASSIVE DATA

Academic year
2022/2023
Official course title
LEARNING WITH MASSIVE DATA
Course code
CM0638 (AF:398307 AR:214340)
Modality
On campus classes
ECTS credits
6
Degree level
Master's Degree Programme (DM270)
Educational sector code
ING-INF/05
Period
2nd Semester
Course year
1
Where
VENEZIA
The goal of this course is to teach students to design and develop algorithms for the analysis of large-scale data sources in highly parallel (multi-core) and distributed (cloud-based) environments. Some use cases are chosen from the topics of data mining, web search, and social network analysis.
The course presents the fundamental techniques usually employed to solve large-scale data analysis problems with parallel algorithms.
Students acquire knowledge of models of parallel computing architectures, parallel programming paradigms and environments, and the design of algorithms for massive datasets.

Students will achieve the following learning outcomes:

Knowledge and understanding: i) understanding the principles of multi-threading and distributed computing; ii) understanding the sources and models of cost in massive dataset analysis solutions (cache, memory, network); iii) understanding design patterns for massive data analysis.

Applying knowledge and understanding: i) being able to design and develop algorithms for massive dataset analysis; ii) being able to estimate and measure the performance of a parallel program; iii) being able to develop algorithms for massive dataset analysis by exploiting parallel programming patterns.

Making judgements: i) being able to analyze different methods and algorithms and to choose the most appropriate one for a given problem on the basis of a sound cost model.

Communication: i) reporting a comprehensive comparative analysis of different solutions, supported by experiments.
The student is expected to have a good background in computer architectures, algorithms, operating systems, computer networks, and C/Python programming.
- Cache-aware and cache-oblivious algorithms
- Thread Parallelism
- Large-scale parallelism
- Recommender systems
- Learning to Rank
- Link Analysis
- Advertising on the Web
Lecture notes.

Jure Leskovec, Anand Rajaraman, Jeffrey David Ullman. Mining of Massive Datasets, 3rd Edition. Cambridge University Press, 2020.

Learning outcomes are verified by a written exam and a project.

The written exam consists of questions on the theory of the subjects discussed during the course.

The project requires the student to design and develop a novel algorithm for a given data analysis task. The student is asked to choose the most appropriate solution, to justify that choice, and to provide a report to be discussed with the teacher.
Lectures and hands-on sessions.
English
written and oral
Definitive programme.
Last update of the programme: 11/07/2022