- https://www.nat-esm.de/services/trainings/events/fundamentals-of-accelerated-computing-with-cuda-python
- Fundamentals of Accelerated Computing with CUDA Python
- 2025-04-02T09:00:00+02:00
- 2025-04-02T17:00:00+02:00
- At the conclusion of the workshop, you will have an understanding of the fundamental tools and techniques for GPU-accelerating Python applications with CUDA and Numba.
Apr 02, 2025 from 09:00 AM to 05:00 PM
(Europe/Berlin / UTC+02:00)
Date and Time
The course will be held online on April 2, 2025, from 9:00 a.m. to 5:00 p.m. (CEST).
Registered participants will receive the Zoom participation link via email the day before the course begins.
Prerequisites
A free NVIDIA developer account is required to access the course material. Please register before the training at https://learn.nvidia.com/join.
Participants should additionally meet the following requirements:
- Basic Python competency, including familiarity with variable types, loops, conditional statements, functions, and array manipulations
- NumPy competency, including the use of ndarrays and ufuncs
- No previous knowledge of CUDA programming is required
Learning Objectives
At the conclusion of the workshop, you will have an understanding of the fundamental tools and techniques for GPU-accelerating Python applications with CUDA and Numba, including:
- GPU-accelerating NumPy ufuncs with just a few lines of code
- Configuring code parallelization using the CUDA thread hierarchy
- Writing custom CUDA device kernels for maximum performance and flexibility
- Using memory coalescing and on-device shared memory to increase CUDA kernel bandwidth
Course Structure
Introduction to CUDA Python with Numba
- Begin working with the Numba compiler and CUDA programming in Python.
- Use Numba decorators to GPU-accelerate numerical Python functions.
- Optimize host-to-device and device-to-host memory transfers.
Custom CUDA Kernels in Python with Numba
- Learn CUDA’s parallel thread hierarchy and how it extends the range of programs that can be parallelized.
- Launch massively parallel custom CUDA kernels on the GPU.
- Utilize CUDA atomic operations to avoid race conditions during parallel execution.
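As a rough illustration of the thread hierarchy named in this module, the following CPU-only sketch emulates how a 1D launch configuration maps threads to array elements. It is not taken from the course material and runs no GPU code. The helper names `blocks_needed` and `global_index`, the problem size, and the block size are all assumptions chosen for the example. Inside a real Numba kernel, the same flat index is usually obtained with `cuda.grid(1)`.

```python
def blocks_needed(n_elements, threads_per_block):
    """Ceiling division: enough blocks so that every element gets a thread."""
    return (n_elements + threads_per_block - 1) // threads_per_block


def global_index(block_idx, block_dim, thread_idx):
    """Flat thread index, mirroring blockIdx.x * blockDim.x + threadIdx.x."""
    return block_idx * block_dim + thread_idx


n = 1000                 # assumed problem size
threads_per_block = 128  # assumed block size (a multiple of the warp size, 32)
blocks = blocks_needed(n, threads_per_block)

# Emulate the launch on the CPU: each (block, thread) pair computes its
# index and is skipped if it falls past the end of the array, which is
# the usual bounds check at the top of a kernel.
covered = [
    global_index(b, threads_per_block, t)
    for b in range(blocks)
    for t in range(threads_per_block)
    if global_index(b, threads_per_block, t) < n
]

print(blocks, len(covered))  # prints "8 1000"
```

Eight blocks of 128 threads launch 1024 threads in total, so the last 24 threads have no element to process. This is why kernels typically guard their body with a bounds check.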
Multidimensional Grids and Shared Memory for CUDA Python with Numba
- Learn multidimensional grid creation and how to work in parallel on 2D matrices.
- Leverage on-device shared memory to promote memory coalescing while reshaping 2D matrices.
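To give a feel for what multidimensional grid creation involves, here is a minimal CPU-only sketch that computes how many thread blocks are needed along each axis to cover a 2D matrix. It is an assumption-laden illustration rather than course content: the helper `grid_shape_2d`, the matrix shape, and the 16 x 16 block shape are hypothetical choices, and no GPU code is executed.

```python
import math


def grid_shape_2d(rows, cols, block_shape):
    """Number of blocks per axis so the whole matrix is covered."""
    block_rows, block_cols = block_shape
    return (math.ceil(rows / block_rows), math.ceil(cols / block_cols))


rows, cols = 100, 70    # assumed matrix shape
block_shape = (16, 16)  # assumed threads per block along each axis
grid = grid_shape_2d(rows, cols, block_shape)

# Threads are launched in whole blocks, so some threads fall outside the
# matrix and must be masked out by a bounds check in the kernel.
launched = (grid[0] * block_shape[0]) * (grid[1] * block_shape[1])
idle = launched - rows * cols

print(grid, launched, idle)  # prints "(7, 5) 8960 1960"
```

The matrix dimensions here are deliberately not multiples of the block shape, which shows that a noticeable share of launched threads can end up idle when the block shape fits the data poorly.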
Certification
Upon successfully completing the course assessments, participants will receive an NVIDIA DLI Certificate, recognizing their subject matter expertise and supporting their professional career growth.
Instructors
Dr. Sebastian Kuckuk, certified NVIDIA DLI Ambassador.
The course is co-organised by NHR@FAU and the NVIDIA Deep Learning Institute (DLI).
Prices and Eligibility
This course is open and free of charge for participants affiliated with academic institutions in European Union (EU) member states and Horizon 2020-associated countries.