HIGH PERFORMANCE COMPUTING
- Academic year
- 2021/2022
- Official course title
- HIGH PERFORMANCE COMPUTING
- Course code
- CM0227 (AF:339791 AR:180699)
- Modality
- On campus classes
- ECTS credits
- 6
- Degree level
- Master's Degree Programme (DM270)
- Educational sector code
- INF/01
- Period
- 2nd Semester
- Course year
- 2
- Where
- VENEZIA
Contribution of the course to the overall degree programme goals
Students acquire knowledge of models of High Performance Computing architectures, paradigms and environments of parallel programming, and performance evaluation of parallel systems.
Expected learning outcomes
Students will achieve the following learning outcomes:
Knowledge and understanding: i) understanding principles of multi-threading and distributed computing; ii) understanding sources and models of costs in parallel environments (cache, memory, network); iii) understanding parallel programming patterns.
Applying knowledge and understanding: i) being able to design and develop parallel programs; ii) being able to estimate and measure the performance of a parallel program; iii) being able to develop parallel programs by exploiting parallel programming patterns.
Making judgements: i) being able to analyze different parallel programming patterns or different parallel solutions and to choose the most appropriate one for a given problem on the basis of a sound cost model.
Communication: i) reporting a comprehensive comparative analysis of different parallel solutions, supported by experiments.
Pre-requirements
Contents
1. Introduction
- Motivations for Parallel Computing
- Different granularities: Instruction Level Parallelism, Multi-Core, GPU, Distributed Computing
- Examples of large scale applications that require High Performance Computing
2. Instruction level Parallelism
- Introduction to instruction level parallelism
- SIMD paradigm
- SSE and AVX instruction sets
- Intel Intrinsics
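To make the unit above concrete, here is a minimal sketch of the SIMD idea using Intel intrinsics: assuming an x86-64 CPU with AVX and compilation with -mavx (gcc or clang), two float arrays are added eight elements per instruction. Array sizes and names are illustrative, not part of the course material.

```cpp
// Minimal AVX sketch: c[i] = a[i] + b[i], 8 floats per SIMD instruction.
// Assumes AVX support and compilation with -mavx; sizes are illustrative.
#include <immintrin.h>
#include <cstdio>

int main() {
    alignas(32) float a[16], b[16], c[16];   // 32-byte alignment for aligned loads
    for (int i = 0; i < 16; ++i) { a[i] = (float)i; b[i] = 2.0f * i; }

    for (int i = 0; i < 16; i += 8) {
        __m256 va = _mm256_load_ps(a + i);   // load 8 aligned floats
        __m256 vb = _mm256_load_ps(b + i);
        __m256 vc = _mm256_add_ps(va, vb);   // 8 additions in one instruction
        _mm256_store_ps(c + i, vc);
    }

    for (int i = 0; i < 16; ++i) printf("%.1f ", c[i]);
    printf("\n");
    return 0;
}
```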
3. Auto-Vectorization
- Data/control dependencies
- Loop Optimizations
- Compilers Auto-vectorization
- Pointer aliasing
- Guidelines for auto-vectorization
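As an illustration of the pointer-aliasing issue listed above, the hypothetical kernels below contrast a loop the compiler must treat conservatively with one where __restrict__ promises non-overlapping arrays; with -O3 (and, e.g., -fopt-info-vec to inspect the vectorization report) the second version is typically auto-vectorized.

```cpp
// Auto-vectorization sketch: a saxpy-like loop.
// Without restrict, y and x might overlap, so the compiler may emit scalar code
// or runtime overlap checks; with __restrict__ it can vectorize unconditionally.
// Behaviour depends on compiler and flags (e.g. -O3 -march=native); illustrative only.

void saxpy_may_alias(float* y, const float* x, float a, int n) {
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

void saxpy_no_alias(float* __restrict__ y, const float* __restrict__ x, float a, int n) {
    for (int i = 0; i < n; ++i)        // no loop-carried dependency, unit stride:
        y[i] = a * x[i] + y[i];        // a good candidate for auto-vectorization
}
```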
4. Cache Aware Algorithms
- Impact of cache in modern architectures
- Cache coherence protocols
- Cache-aware algorithms
- Cache-aware matrix multiplication
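A sketch of the cache-aware technique mentioned above: loop tiling for matrix multiplication. The tile size BS is an illustrative tuning parameter that should be matched to the cache of the target machine.

```cpp
// Blocked (cache-aware) matrix multiplication C += A * B, n x n row-major matrices.
// The tile size BS is a machine-dependent tuning knob: tiles of A, B and C
// should fit in cache together. Illustrative sketch, no error handling.
#include <algorithm>

constexpr int BS = 64;  // hypothetical tile size

void matmul_blocked(const float* A, const float* B, float* C, int n) {
    for (int ii = 0; ii < n; ii += BS)
        for (int kk = 0; kk < n; kk += BS)
            for (int jj = 0; jj < n; jj += BS)
                // one BS x BS tile; i-k-j order keeps unit-stride access on B and C
                for (int i = ii; i < std::min(ii + BS, n); ++i)
                    for (int k = kk; k < std::min(kk + BS, n); ++k) {
                        float aik = A[i * n + k];
                        for (int j = jj; j < std::min(jj + BS, n); ++j)
                            C[i * n + j] += aik * B[k * n + j];
                    }
}
```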
5. Cache Oblivious Algorithms
- Cache-oblivious Models and Algorithms
- Cache-oblivious matrix multiplication
- Cache-oblivious sorting
- Using micro-kernels to optimize cache usage and compiler auto-vectorization
- Software tools for evaluating cache performance
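A possible sketch of the cache-oblivious approach: recursive matrix multiplication that subdivides the problem until a small micro-kernel is reached, with no machine-dependent block size anywhere. The base-case threshold and the power-of-two size assumption are illustrative simplifications.

```cpp
// Cache-oblivious matrix multiplication C += A * B (square, row-major, side n,
// leading dimension ld). Call as matmul_rec(A, B, C, n, n) with C zero-initialized.
// Assumes n is a power of two for simplicity. BASE is only a recursion cutoff,
// not a cache-size parameter: once a subproblem fits in some cache level,
// all further work on it stays resident there.
constexpr int BASE = 32;

void matmul_rec(const float* A, const float* B, float* C, int n, int ld) {
    if (n <= BASE) {
        // micro-kernel: small, regular loops that the compiler can auto-vectorize
        for (int i = 0; i < n; ++i)
            for (int k = 0; k < n; ++k)
                for (int j = 0; j < n; ++j)
                    C[i * ld + j] += A[i * ld + k] * B[k * ld + j];
        return;
    }
    int h = n / 2;
    const float *A11 = A, *A12 = A + h, *A21 = A + h * ld, *A22 = A + h * ld + h;
    const float *B11 = B, *B12 = B + h, *B21 = B + h * ld, *B22 = B + h * ld + h;
    float *C11 = C, *C12 = C + h, *C21 = C + h * ld, *C22 = C + h * ld + h;
    matmul_rec(A11, B11, C11, h, ld);  matmul_rec(A12, B21, C11, h, ld);
    matmul_rec(A11, B12, C12, h, ld);  matmul_rec(A12, B22, C12, h, ld);
    matmul_rec(A21, B11, C21, h, ld);  matmul_rec(A22, B21, C21, h, ld);
    matmul_rec(A21, B12, C22, h, ld);  matmul_rec(A22, B22, C22, h, ld);
}
```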
6. Thread Parallelism
- Modeling of parallel programs: speed-up, cost-optimality, scalability
- Threads vs. processes, shared memory programming
- C++ std threads, mutexes and condition variables (already covered ...)
- The OpenMP paradigm: Thread and memory management, Parallelizing loops and scheduling policies, Managing parallel sections, Synchronization and introspection
- Multi-threaded algorithms, Impact of parallelism on cache efficiency
- Matrix Multiplication algorithms
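A minimal OpenMP sketch of the material above (illustrative sizes; compile with -fopenmp): a reduction loop with timing, from which speed-up S(p) = T1 / Tp and efficiency E(p) = S(p) / p can be estimated by running with different numbers of threads.

```cpp
// OpenMP sketch: parallel dot product with timing, to estimate speed-up.
// Compile with: g++ -O2 -fopenmp dot.cpp   (sizes and schedule are illustrative)
#include <omp.h>
#include <vector>
#include <cstdio>

int main() {
    const std::size_t n = 1 << 23;               // illustrative problem size
    std::vector<double> x(n, 1.0), y(n, 2.0);

    double t0 = omp_get_wtime();
    double sum = 0.0;
    // each thread accumulates a private partial sum; OpenMP combines them at the end
    #pragma omp parallel for reduction(+:sum) schedule(static)
    for (long long i = 0; i < (long long)n; ++i)
        sum += x[i] * y[i];
    double t1 = omp_get_wtime();

    printf("threads=%d  result=%.1f  time=%.4fs\n",
           omp_get_max_threads(), sum, t1 - t0);
    // Run once with OMP_NUM_THREADS=1 and once with p threads, then
    // speed-up S(p) = T1 / Tp and efficiency E(p) = S(p) / p.
    return 0;
}
```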
7. Patterns of Parallelism
- Modeling parallelism: Task dependency graphs, Task interaction graphs, Parallelism degree and task granularities, Critical path, Mapping guidelines
- Patterns of Parallelism: Embarrassingly parallel problems, Thread pool (farm), Exploratory search, Pipelining, Vertex-centric
- Static vs. dynamic mapping
- Algorithms: quicksort, shellsort, bitonic-sort, prefix sum, connected components
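As one example of these patterns, a common two-phase parallel prefix sum (sketch only; the chunking scheme is illustrative): each thread scans its own contiguous chunk, the few per-chunk totals are scanned by a single thread, and each chunk is then shifted by its offset.

```cpp
// Two-phase parallel (inclusive) prefix sum with OpenMP. Illustrative sketch.
// Phase 1: each thread scans its own chunk.
// Phase 2: one thread scans the p chunk totals (cheap, since p is small).
// Phase 3: each thread adds the total of all chunks before it.
#include <omp.h>
#include <vector>

void prefix_sum(std::vector<long long>& a) {
    const long long n = (long long)a.size();
    std::vector<long long> chunk_sum;

    #pragma omp parallel
    {
        int p = omp_get_num_threads();
        int t = omp_get_thread_num();
        #pragma omp single
        chunk_sum.assign(p + 1, 0);            // implicit barrier after single

        long long lo = n * t / p, hi = n * (t + 1) / p;
        long long s = 0;                        // phase 1: local inclusive scan
        for (long long i = lo; i < hi; ++i) { s += a[i]; a[i] = s; }
        chunk_sum[t + 1] = s;

        #pragma omp barrier
        #pragma omp single                      // phase 2: scan of chunk totals
        for (int i = 1; i <= p; ++i) chunk_sum[i] += chunk_sum[i - 1];
                                                // implicit barrier after single
        long long offset = chunk_sum[t];        // phase 3: shift local chunk
        for (long long i = lo; i < hi; ++i) a[i] += offset;
    }
}
```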
9. HPC on large clusters
- Distributed file systems
- Fault Tolerance
- The MapReduce paradigm
- The Spark framework
- Algorithms: All Pairs Similarity Search
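To convey the MapReduce data flow without introducing another language, the single-process sketch below runs word count through explicit map, shuffle (group-by-key) and reduce steps; in a real cluster each phase is distributed and made fault-tolerant by a framework such as Hadoop or Spark over a distributed file system, so this is a conceptual illustration only.

```cpp
// Conceptual, single-process word count written in MapReduce style.
// map:     document -> list of (word, 1) pairs
// shuffle: group pairs by key (a std::map plays the role of the shuffle phase)
// reduce:  (word, [1, 1, ...]) -> (word, count)
#include <map>
#include <sstream>
#include <string>
#include <vector>
#include <cstdio>

int main() {
    std::vector<std::string> documents = {"the quick brown fox", "the lazy dog", "the fox"};

    // map phase: emit (word, 1) for every word of every document
    std::vector<std::pair<std::string, int>> emitted;
    for (const auto& doc : documents) {
        std::istringstream in(doc);
        std::string word;
        while (in >> word) emitted.emplace_back(word, 1);
    }

    // shuffle phase: group values by key
    std::map<std::string, std::vector<int>> grouped;
    for (const auto& kv : emitted) grouped[kv.first].push_back(kv.second);

    // reduce phase: sum the values of each key
    for (const auto& kv : grouped) {
        int count = 0;
        for (int v : kv.second) count += v;
        printf("%s\t%d\n", kv.first.c_str(), count);
    }
    return 0;
}
```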
10. Large-scale data parallelism on GPU
- GPU architectures
- CUDA for GP-GPU computing
- GPU threads and memory hierarchies
- Parallel patterns and algorithms for GPUs
Referral texts
T. Rauber, G. Rünger, Parallel Programming for Multicore and Cluster Systems, 2nd Edition, Springer.
Assessment methods
The written exam consists of questions on the theory of the topics discussed during the course.
The project requires students to design and develop a novel parallel algorithm for a given data analysis task. The student is asked to choose the most appropriate parallel solution, to motivate this choice, and to provide a report to be discussed with the teacher.