When

Mar 26, 2025 from 09:30 AM to 05:00 PM
(Europe/Berlin / UTC+1)

Where

Online


Date and Time

The course will be held online on March 26 from 9:00 a.m. to 5:00 p.m. (CET).

Registered participants will receive the Zoom participation link via email the day before the course begins.

This course is part three of the three-event series "From Zero to Multi-Node GPU Programming". Please register individually for each day you wish to attend.

Prerequisites

A free NVIDIA developer account is required to access the course material. Please register before the training at https://learn.nvidia.com/join.

Participants should additionally meet the following requirements:

  • Successful completion of Part 1: Fundamentals of Accelerated Computing with CUDA C/C++, or equivalent experience in implementing CUDA C/C++ applications, including:
    • Memory allocation
    • Host-to-device and device-to-host memory transfers
    • Kernel launches
    • Grid-stride loops
    • CUDA error handling
  • [Optional but recommended]: Prior attendance of Part 2: Accelerating CUDA C++ Applications with Multiple GPUs
  • Familiarity with the Linux command line
  • Experience using Makefiles
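
As a quick self-check of the prerequisite skills listed above (memory allocation, host/device transfers, kernel launches, grid-stride loops, and error handling), the following is a minimal, illustrative CUDA C++ program; the helper macro and names are our own, not course material:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Illustrative error-handling macro: abort on any CUDA runtime error.
#define CUDA_CHECK(call)                                               \
  do {                                                                 \
    cudaError_t err = (call);                                          \
    if (err != cudaSuccess) {                                          \
      fprintf(stderr, "CUDA error %s at %s:%d\n",                      \
              cudaGetErrorString(err), __FILE__, __LINE__);            \
      exit(EXIT_FAILURE);                                              \
    }                                                                  \
  } while (0)

// Grid-stride loop: each thread handles multiple elements, so the
// kernel works for any array size and any launch configuration.
__global__ void scale(float *data, float factor, int n) {
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
       i += gridDim.x * blockDim.x)
    data[i] *= factor;
}

int main() {
  const int n = 1 << 20;
  float *h = new float[n];
  for (int i = 0; i < n; ++i) h[i] = 1.0f;

  float *d;
  CUDA_CHECK(cudaMalloc(&d, n * sizeof(float)));          // device allocation
  CUDA_CHECK(cudaMemcpy(d, h, n * sizeof(float),
                        cudaMemcpyHostToDevice));         // host-to-device copy
  scale<<<256, 256>>>(d, 2.0f, n);                        // kernel launch
  CUDA_CHECK(cudaGetLastError());                         // check for launch errors
  CUDA_CHECK(cudaMemcpy(h, d, n * sizeof(float),
                        cudaMemcpyDeviceToHost));         // device-to-host copy
  CUDA_CHECK(cudaFree(d));
  delete[] h;
  return 0;
}
```

If each of these calls is familiar, you meet the CUDA C/C++ prerequisite for this course.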

Learning Objectives

At the conclusion of the workshop, you will be able to:

  • Use several methods for writing multi-GPU CUDA C++ applications,
  • Use a variety of multi-GPU communication patterns and understand their tradeoffs,
  • Write portable, scalable CUDA code with the single-program multiple-data (SPMD) paradigm using CUDA-aware MPI and NVSHMEM,
  • Improve multi-GPU SPMD code with NVSHMEM’s symmetric memory model and its ability to perform GPU-initiated data transfers, and
  • Get practice with common multi-GPU coding paradigms like domain decomposition and halo exchanges.

Course Structure

Multi-GPU Programming Paradigms

  • Survey multiple techniques for programming multi-GPU CUDA C++ applications, using a Monte Carlo approximation of Pi as the example program.
  • Use CUDA device management to run kernels on multiple GPUs.
  • Learn how to enable and use direct peer-to-peer memory communication.
  • Write an SPMD version with CUDA-aware MPI.
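
To give a flavour of the device-management and peer-to-peer topics in this module, here is a rough sketch (not the course's actual exercise code) of enabling direct peer access between all available GPUs with the CUDA runtime API:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
  int ngpus = 0;
  cudaGetDeviceCount(&ngpus);
  printf("Found %d GPU(s)\n", ngpus);

  // Enable direct peer-to-peer access for every pair of GPUs that supports it.
  for (int i = 0; i < ngpus; ++i) {
    cudaSetDevice(i);                        // subsequent calls target GPU i
    for (int j = 0; j < ngpus; ++j) {
      if (i == j) continue;
      int canAccess = 0;
      cudaDeviceCanAccessPeer(&canAccess, i, j);
      if (canAccess)
        cudaDeviceEnablePeerAccess(j, 0);    // GPU i may now access GPU j's memory
    }
  }

  // With peer access enabled, device-to-device copies bypass host memory:
  //   cudaMemcpyPeer(dst_ptr, dst_gpu, src_ptr, src_gpu, num_bytes);
  return 0;
}
```

The SPMD variant covered later in the module replaces this single-process loop with one MPI rank per GPU, where a CUDA-aware MPI implementation can pass device pointers directly to MPI calls.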

Introduction to NVSHMEM

  • Learn how to write code with NVSHMEM and understand its symmetric memory model.
  • Use NVSHMEM to write SPMD code for multiple GPUs.
  • Utilize symmetric memory to let all GPUs access data on other GPUs.
  • Make GPU-initiated memory transfers.
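
The core NVSHMEM ideas in this module, symmetric allocation and GPU-initiated transfers, look roughly like the following sketch (one processing element, or PE, per GPU; launch details and device selection are simplified):

```cuda
#include <cuda_runtime.h>
#include <nvshmem.h>

// GPU-initiated put: each PE writes its id into its right neighbor's
// symmetric buffer, directly from device code.
__global__ void put_to_neighbor(int *sym, int mype, int npes) {
  int peer = (mype + 1) % npes;
  nvshmem_int_p(sym, mype, peer);  // single-element put to the remote PE
}

int main() {
  nvshmem_init();
  int mype = nvshmem_my_pe();
  int npes = nvshmem_n_pes();
  cudaSetDevice(mype);  // simplifying assumption: one PE per GPU, PE id == device id

  // Symmetric allocation: every PE allocates the same-sized buffer, and any
  // PE can address the corresponding buffer on any other PE.
  int *sym = (int *)nvshmem_malloc(sizeof(int));

  put_to_neighbor<<<1, 1>>>(sym, mype, npes);
  nvshmem_barrier_all();  // ensure all puts have completed and are visible

  nvshmem_free(sym);
  nvshmem_finalize();
  return 0;
}
```

Note that no host-side copies or MPI calls are involved in the data transfer itself; the put is issued from within the kernel.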

Halo Exchanges with NVSHMEM

  • Practice common coding motifs like halo exchanges and domain decomposition using NVSHMEM, and work on the assessment.
  • Write an NVSHMEM implementation of a Laplace equation Jacobi solver.
  • Refactor a single GPU 1D wave equation solver with NVSHMEM.
  • Complete the assessment and earn a certificate.
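
For orientation, a halo exchange over a 1D domain decomposition can be sketched as below: each PE owns N interior points plus one halo cell on each side, and uses GPU-initiated NVSHMEM puts to fill its neighbors' halos (periodic boundaries assumed; this is an illustration, not the course's solver code):

```cuda
#include <nvshmem.h>

#define N 1024  // interior points per PE

// Buffer layout on every PE: [left halo | N interior points | right halo],
// i.e. indices 0 and N+1 are halo cells, 1..N are interior.
__global__ void halo_exchange(float *u, int mype, int npes) {
  if (threadIdx.x == 0 && blockIdx.x == 0) {
    int left  = (mype - 1 + npes) % npes;
    int right = (mype + 1) % npes;
    // Send our first interior point into the left neighbor's right halo,
    // and our last interior point into the right neighbor's left halo.
    nvshmem_float_p(&u[N + 1], u[1], left);
    nvshmem_float_p(&u[0],     u[N], right);
  }
}
```

After the puts, a synchronization such as nvshmem_barrier_all() is needed before the next stencil update reads the halo cells; bulk transfers would use nvshmem_float_put rather than single-element puts.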

Certification

Upon successfully completing the course assessments, participants will receive an NVIDIA DLI Certificate, recognizing their subject matter expertise and supporting their professional career growth.

Instructors

Dr. Sebastian Kuckuk and Markus Velten, both certified NVIDIA DLI Ambassadors.

The course is co-organised by NHR@FAU, NHR@TUD, and the NVIDIA Deep Learning Institute (DLI).

Prices and Eligibility

This course is open and free of charge for participants affiliated with academic institutions in European Union (EU) member states and Horizon 2020-associated countries.