HIGH PERFORMANCE COMPUTING

Academic year
2019/2020 Syllabus of previous years
Official course title
HIGH PERFORMANCE COMPUTING
Course code
CM0227 (AF:274867 AR:166148)
Modality
On campus classes
ECTS credits
6
Degree level
Master's Degree Programme (DM270)
Educational sector code
INF/01
Period
1st Semester
Course year
2
Where
VENEZIA
The goal of this course is to teach students to design and develop algorithms for the analysis of large-scale data sources in highly parallel (multi-core, GPU) and distributed (cloud-based) environments. Some use cases are chosen from the topics of data mining, web search, and social network analysis.
The course presents the fundamental techniques usually employed to solve large-scale data analysis problems with parallel algorithms, ranging from methods for multi-core CPU architectures to Graphics Processing Unit (GPU)-based clusters.
Students acquire knowledge on models of High Performance Computing architectures, paradigms and environments of parallel programming, and performance evaluation of parallel systems.

Students will achieve the following learning outcomes:

Knowledge and understanding: i) understanding principles of multi-threading and distributed computing; ii) understanding sources and models of costs in parallel environments (cache, memory, network); iii) understanding parallel programming patterns.

Applying knowledge and understanding: i) being able to design and develop parallel programs; ii) being able to estimate and measure the performance of a parallel program; iii) being able to develop parallel programs by exploiting parallel programming patterns.

Making judgements: i) being able to analyze different parallel programming patterns or different parallel solutions, and to choose the one most appropriate to a given problem on the basis of a sound cost model.

Communication: i) reporting a comprehensive comparative analysis of different parallel solutions, supported by experiments.
Students are expected to have a good background in computer architectures, operating systems, and computer networks, and in programming in C++ and Python.
1. Introduction to High Performance Computing
- Motivations for Parallel Computing
- Different granularities: Instruction Level Parallelism, Multi-Core, GPU, Distributed Computing
- Examples of large scale applications that require High Performance Computing
2. Instruction level Parallelism
- Introduction to instruction level parallelism
- SIMD paradigm
- SSE and AVX instruction sets
- Intel Intrinsics
3. Auto-Vectorization
- Data/control dependencies
- Loop Optimizations
- Compilers Auto-vectorization
- Pointer aliasing
- Guidelines for auto-vectorization
4. Cache Aware Algorithms
- Impact of cache in modern architectures
- Cache coherence protocols
- Cache-aware algorithms
- Cache-aware matrix multiplication
5. Cache Oblivious Algorithms
- Cache-oblivious Models and Algorithms
- Cache-oblivious matrix multiplication
- Cache-oblivious sorting
- Using micro-kernels to optimize cache usage and compiler auto-vectorization
- Software tools for evaluating cache performance
6. Thread Parallelism
- Modeling of parallel programs: speed-up, cost-optimality, scalability
- Threads vs. processes, shared memory programming
- C++ std threads, mutexes and condition variables (already covered)
- The OpenMP paradigm: Managing threads and memory management, Parallelizing loops and scheduling policies, Managing parallel sections, Synchronization and introspection
- Multi-threaded algorithms, Impact of parallelism to cache efficiency
- Matrix Multiplication algorithms
7. Patterns of Parallelism
- Modeling parallelism: Task dependency graphs, Task interaction graphs, Parallelism degree and task granularities, Critical path, Mapping guidelines
- Patterns of Parallelism: Embarrassingly parallel problems, Thread pool (farm), Exploratory search, Pipelining, Vertex-centric
- Static vs. dynamic mapping
- Algorithms: quicksort, shellsort, bitonic-sort, prefix sum, connected components
8. HPC on Large Clusters
- Distributed file systems
- Fault Tolerance
- The MapReduce paradigm
- The Spark framework
- Algorithms: All Pairs Similarity Search
9. Large-Scale Data Parallelism on GPU
- GPU architectures
- CUDA for GP-GPU computing
- GPU threads and memory hierarchies
- Parallel patterns and algorithms for GPUs
Lecture notes.

T. Rauber, G. Rünger, Parallel Programming for Multicore and Cluster Systems, 2nd Ed., Springer
Learning outcomes are verified by a written exam and a project.

The written exam consists of questions on the theory of the subjects discussed during the course.

The project requires students to design and develop a novel parallel algorithm for a given data analysis task. Each student is asked to choose the most appropriate parallel solution, to motivate this choice, and to provide a report to be discussed with the teacher.
Lectures and hands-on sessions.
English
written and oral
Definitive programme.
Last update of the programme: 26/08/2019