Introduction to CUDA Programming

12.11

Instructor: Yixin Zhu

Guest Lecturer

Siteng Ma is an NVIDIA Solution Architect specializing in GPU computing solutions. He designs and researches applications of GPUs in accelerated computing, deep learning, and data science.

Topics Covered

CUDA Basics

  • CUDA Programming Model
    • CUDA Thread Hierarchy
    • CUDA Memory Hierarchy
    • Asynchronous Operations
    • Compute Capability
  • CUDA Code Example
    • “Hello world” code basics: C++ Language Extensions
    • Single-Precision A·X Plus Y (SAXPY)
    • Matrix Multiplication (matmul)

CUDA Optimization

  • Fundamental Optimization
    • Hiding Latency
    • Maximizing Throughput
  • Advanced Optimization
    • Shared Memory
    • Streams and Concurrency
    • Multi-Process Service (MPS)
  • Using Accelerated Libraries
    • cuBLAS, CUTLASS, cuDNN, …

GPU Performance Analysis

  • Nsight Systems
    • What is Nsight Systems?
    • How to use Nsight Systems: an example LLM training timeline
  • Nsight Compute
    • What is Nsight Compute?
    • How to use Nsight Compute: a SAXPY example