Applied High Performance Computing

PhD School at the Faculty of SCIENCE at University of Copenhagen

Aim and content

This is a toolbox course where 80% of the seats are reserved for PhD students enrolled at the Faculty of SCIENCE at UCPH and 20% of the seats are reserved for PhD students from other Danish universities/faculties (except CBS).
Anyone can apply for the course, but if you are not a PhD student at a Danish university (except CBS), you will be placed on the waiting list until the enrollment deadline. After the enrollment deadline, available seats will be allocated to applicants on the waiting list.
The course is free of charge for PhD students at Danish universities (except CBS).
All other participants must pay the course fee, unless they are master’s students at a Danish university.

Aim
Computational methods are becoming essential in many areas of science, and the solution to many problems depends on computers that are vastly faster and hold more memory than what a single high-end server can offer. Top supercomputers consist of up to a billion processor cores working in parallel, and new supercomputers are mostly based on GPUs for high performance modelling and AI workloads. Programming such highly parallel computers is difficult, and ensuring both program correctness and high performance is non-trivial.
In this course students will learn how to get high performance from applications, how to use accelerators (GPUs), and how to parallelise programs inside a single server (shared memory parallelisation) and across many computers (distributed memory parallelisation).
Lectures will introduce the theoretical concepts, which are then put into practice through hands-on exercises. Students will learn to map algorithms to parallel architectures and how to decompose problems for parallel execution.
We will use ERDA to execute the programs on a real high performance computing infrastructure and evaluate the performance, scalability and correctness of the programs. The hands-on exercises use real-world examples to illustrate different techniques that are well-suited to each parallel architecture.
Each week during the exercises, we will introduce a new tool to aid in the development of high performing programs.
We will use Python, C++, and Fortran as the course languages. Most exercises will be available in all three, while some will only be available in C++ and Fortran. Students can use the language of their choice to complete the exercises.

Detailed Content
Week 1:
Single core performance: Memory access, vectorization, Structure-of-Arrays vs Arrays-of-Structures
Tools: Performance profiler, Makefile, Debugger
Platform: ERDA DAG
Exercise: Molecular dynamics [Python, C++, Fortran]
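
To illustrate the Week 1 topic of Structure-of-Arrays versus Array-of-Structures, here is a minimal sketch in Python. It is not taken from the course material: the Particle class, the field names and the kinetic energy calculation are assumptions chosen only to show the two layouts side by side.

```python
from array import array
from dataclasses import dataclass


@dataclass
class Particle:
    # Array-of-Structures element: all fields of one particle live together.
    # (Hypothetical record, used for illustration only.)
    mass: float
    vx: float
    vy: float
    vz: float


def kinetic_energy_aos(particles):
    # Loops over records; each step reads every field of one particle.
    return sum(0.5 * p.mass * (p.vx**2 + p.vy**2 + p.vz**2) for p in particles)


def kinetic_energy_soa(mass, vx, vy, vz):
    # Loops over separate per-field arrays, each stored contiguously.
    return sum(0.5 * m * (x * x + y * y + z * z)
               for m, x, y, z in zip(mass, vx, vy, vz))


# Array-of-Structures: a list of particle records.
particles = [Particle(1.0, 1.0, 0.0, 0.0), Particle(2.0, 0.0, 3.0, 4.0)]

# Structure-of-Arrays: one contiguous array of doubles per field.
mass = array("d", (p.mass for p in particles))
vx = array("d", (p.vx for p in particles))
vy = array("d", (p.vy for p in particles))
vz = array("d", (p.vz for p in particles))

e_aos = kinetic_energy_aos(particles)
e_soa = kinetic_energy_soa(mass, vx, vy, vz)
```

Both layouts give the same result. In compiled languages, the per-field layout is typically the one that lends itself to vectorized loops, because consecutive values of the same field sit next to each other in memory.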

Week 2:
Data processing: Task farming, message passing
Tools: SLURM batch system
Platform: ERDA MODI SLURM cluster
Exercise: data processing workflows [Python, C++, Fortran]
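
As a rough illustration of task farming, the sketch below hands independent tasks to a pool of workers and collects the results as they finish. It is an assumption-laden toy: process_chunk is a hypothetical stand-in for analysing one data file, and a thread pool on a single machine replaces the separate processes or SLURM jobs one would use on a cluster.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_chunk(chunk_id):
    # Hypothetical stand-in for processing one independent piece of data.
    return chunk_id, sum(i * i for i in range(chunk_id * 1000))


def farm(task_ids, n_workers=4):
    # The pool plays the role of the farmer: each idle worker picks up the
    # next pending task, so uneven task sizes are balanced automatically.
    results = {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(process_chunk, t) for t in task_ids]
        for future in as_completed(futures):
            chunk_id, value = future.result()
            results[chunk_id] = value
    return results


results = farm(range(8))
```

The key property is that the tasks do not depend on each other, so the order in which they complete does not matter; results are keyed by task id rather than by arrival order.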

Week 3:
Shared memory architecture: Threads, OpenMP
Tools: Thread checker
Platform: ERDA DAG
Exercise: Propagation of a seismic wave [C++, Fortran]

Week 4:
GPUs: SIMT architecture, programming for an infinite number of cores
Tools: GPU profiler
Platform: ERDA DAG with datacenter GPUs

Week 5:
Distributed memory architecture: MPI, domain decomposition
Tools: debugging and benchmarking across many computers
Platform: ERDA MODI SLURM cluster
Exercise: Flat world climate model [Python, C++, Fortran]
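
The idea behind domain decomposition can be sketched without MPI: each rank is assigned a contiguous block of the grid. The helper below is a minimal example under simple assumptions (a 1D grid and the hypothetical function name local_range); it is not the scheme used in the course exercise.

```python
def local_range(n_cells, n_ranks, rank):
    # Split n_cells as evenly as possible over n_ranks.
    # The first `extra` ranks each receive one additional cell.
    base, extra = divmod(n_cells, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop  # half-open interval [start, stop)


# Example: 10 grid cells distributed over 4 ranks.
blocks = [local_range(10, 4, rank) for rank in range(4)]
```

In an MPI program each process would call such a function with its own rank, work only on its block, and exchange boundary values with the ranks that own the neighbouring blocks.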

Learning outcomes
Intended learning outcomes for students who complete the course:

Knowledge:
Students will understand the challenges of parallelizing applications and adapting them to GPUs, as well as the limitations of the available hardware.

Skills:
• Design and implement parallel applications.
• Use a SLURM batch system to execute large parallel applications on a supercomputer.
• Implement simple task farming to scale data analytics applications across many computers for orders of magnitude reduction in time-to-solution.
• Adapt a program to execute in parallel on a shared memory computer using OpenMP.
• Parallelize across computers with Message Passing Interface (MPI).
• Transform a program to execute efficiently on GPUs and understand when it is beneficial.

Competences:
The overall purpose of this course is to enable the student to write high performance parallel applications on a range of parallel computer architectures and be able to deploy data analytics workloads effortlessly on a supercomputer using a batch system.

Target Group
The course is aimed at PhD students who need to understand how to use GPUs and parallel computing tools to scale their applications and data analytics workflows.

Recommended Academic Qualifications
Academic qualifications equivalent to an MSc degree.
It is necessary to have basic programming experience. Having some experience with applications in scientific modelling, simulation or data-processing is useful. Course languages are Python, C++ and Fortran.

Research Area
This course is broadly relevant for students working with large-scale data analysis and modelling both in SCIENCE and in other fields, such as life sciences, social sciences and economics.

Teaching and Learning Methods
The course is composed of sessions combining lectures and exercises. Each week a new topic will be introduced, and the students will get hands-on experience in applying, modifying, and programming.
The student can choose the programming language that is most relevant for them.

The generic structure for each week is:
• Preparation:
o background literature to give an overview and provide reference material
o motivational video explaining the science behind the exercise
o video introduction to the exercise

• Lecture:
o Covering the theory behind the topic, the relation and limitations in terms of hardware, and an introduction to the relevant methods

• Class instruction:
o Introduction to exercise
o Introduction to relevant tools
o Hands-on help with exercise

• Exercise:
o Practical exercise allowing the student to put the theory into practice using the programming language of choice

Type of Assessment
To pass the course, the student must hand in at least four of the five exercises. Solutions to all exercises and general feedback on the exercises will be provided afterwards.

Literature
Literature will be provided on the Absalon page.

Course coordinator
Troels Haugbølle, Associate Professor, haugboel@nbi.ku.dk

Dates
Every Wednesday 9-17 over a five-week period starting in the last week of September 2025.
Weeks 39, 40, 41, 43, 44.

Course location
TBD

Registration
Deadline for registration: 12 August 2025

Seats for PhD students from other Danish universities will be allocated on a first-come, first-served basis and according to the applicable rules.
Applications from other participants will be considered after the deadline for registration.

Cancellation policy
• Cancellations made up to two weeks before the course starts are free of charge.
• Cancellations made less than two weeks before the course starts will be charged a fee of DKK 3,000.
• Participants with less than 80% attendance cannot pass the course and will be charged a fee of DKK 5,000.
• No-show will result in a fee of DKK 5,000.
• Participants who fail to hand in any mandatory exams or assignments cannot pass the course and will be charged a fee of DKK 5,000.

Course fee and participant fee
PhD courses offered at the Faculty of SCIENCE have course fees corresponding to different participant types.
In addition to the course fee, there might also be a participant fee.
If the course has a participant fee, this will apply to all participants regardless of participant type - and in addition to the course fee.
