12.04
Instructor: Yixin Zhu
Topics Covered
NVIDIA GPU: The Ideal Resource for AI
- Introduction to NVIDIA GPUs
- Overview of NVIDIA’s GPU architecture and its evolution
- Key features that make NVIDIA GPUs suitable for AI and machine learning tasks
- NVIDIA Accelerated Artificial Intelligence
- Accelerate library
- RAPIDS
- CV-CUDA
- NCCL
- TensorRT
- cuDNN
- Triton
- How to scale GPU computing power
- Intra-node connection: PCIe, NVLink
- Multi-node connection: InfiniBand, RoCE
- Scaling in AI: Strong Scaling, Weak Scaling
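The strong vs. weak scaling distinction can be made concrete with the classic scaling laws. Below is a minimal Python sketch using Amdahl's law (strong scaling: fixed problem size) and Gustafson's law (weak scaling: problem size grows with GPU count); the parallel fraction p = 0.95 is an illustrative assumption, not a measured value.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Strong scaling: fixed problem size; speedup is capped by the serial part."""
    return 1.0 / ((1.0 - p) + p / n)

def gustafson_speedup(p: float, n: int) -> float:
    """Weak scaling: per-GPU work is fixed, so the serial share shrinks as n grows."""
    return (1.0 - p) + p * n

p = 0.95  # assumed parallel fraction, for illustration only
for n in (8, 64, 512):
    print(f"{n:4d} GPUs  strong: {amdahl_speedup(p, n):7.2f}x"
          f"  weak: {gustafson_speedup(p, n):8.2f}x")
```

Note how strong scaling saturates near 1/(1-p) = 20x no matter how many GPUs are added, while weak scaling keeps growing, which is why large LLM training relies on scaling the workload with the cluster.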
NVIDIA SuperPOD Reference Architecture: Optimal solution for LLM training
- GPU Cluster Components
- Whole picture of NVIDIA SuperPOD
- SuperPOD system design
- DGX/HGX
- Network design for thousands of GPUs
- Compute Fabric
- Storage Fabric
- In-Band Management Network
- Out-Of-Band (OOB) Management Network
- Storage & management
- Storage Performance requirements
- Storage performance guidelines
- Management: NVIDIA Base Command Manager Essentials
NVIDIA NeMo Framework: Solutions to Accelerate Neural Network Training
- Parallelisms
- Data Parallelism
- Distributed Data Parallel (DDP)
- Distributed Optimizer (DO)
- Fully Sharded Data Parallel (FSDP)
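The core idea behind DDP can be sketched in a few lines: each replica computes gradients on its own shard of the batch, the gradients are averaged across replicas (an all-reduce), and every replica applies the identical update. This is a pure-Python toy with a 1-D least-squares model, simulating two "GPUs" as list shards; real DDP does the all-reduce over NCCL.

```python
def local_gradient(weights, shard):
    # toy gradient for the model y = w * x with squared-error loss
    w = weights[0]
    g = sum(2 * (w * x - y) * x for x, y in shard) / len(shard)
    return [g]

def all_reduce_mean(grads_per_replica):
    # averages gradients element-wise across replicas
    n = len(grads_per_replica)
    return [sum(g[i] for g in grads_per_replica) / n
            for i in range(len(grads_per_replica[0]))]

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # y = 2x
shards = [data[:2], data[2:]]          # one shard per simulated "GPU"
weights = [0.0]
grads = [local_gradient(weights, s) for s in shards]
avg = all_reduce_mean(grads)           # identical on every replica
weights = [w - 0.05 * g for w, g in zip(weights, avg)]
```

Because each shard here is the same size, the averaged gradient equals the full-batch gradient, which is exactly the invariant DDP maintains.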
- Model Parallelism
- Tensor Parallelism (TP)
- Pipeline Parallelism (PP)
- Expert Parallelism (EP)
- Activation Partitioning
- Sequence Parallelism (SP)
- Context Parallelism (CP)
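Tensor parallelism is easiest to see on a single linear layer: the weight matrix is split column-wise across devices, each device computes its slice of the output, and the slices are concatenated (an all-gather). The sketch below is a pure-Python toy with hand-picked shapes, simulating two "devices" with list slices; it is not Megatron/NeMo's implementation.

```python
def matmul(x, w):
    # x: [m][k], w: [k][n] -> [m][n]
    return [[sum(x[i][t] * w[t][j] for t in range(len(w)))
             for j in range(len(w[0]))] for i in range(len(x))]

def split_columns(w, parts):
    # shard the weight column-wise, one shard per "device"
    step = len(w[0]) // parts
    return [[row[p * step:(p + 1) * step] for row in w] for p in range(parts)]

x = [[1.0, 2.0]]                      # activations, shape [1, 2]
w = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]            # weight, shape [2, 4]

shards = split_columns(w, 2)          # each "device" holds 2 of the 4 columns
partials = [matmul(x, ws) for ws in shards]
gathered = [sum((p[0] for p in partials), [])]   # all-gather along columns
assert gathered == matmul(x, w)       # matches the unsharded computation
```

A row-wise split works analogously but requires an all-reduce of partial sums instead of an all-gather; pairing the two is what lets consecutive layers avoid one communication step.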
- Differences and Comparisons Between Parallel Methods
- NVIDIA Acceleration Solution
- Low-precision Training (FP16, BF16, FP8)
- Flash Attention
- Activation Checkpointing
- CPU Offloading
- Computation and Communication Overlap
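Activation checkpointing trades compute for memory: instead of caching every layer's activation for the backward pass, only every k-th activation is stored, and the ones in between are recomputed on demand from the nearest checkpoint. The toy below uses a chain of simple arithmetic "layers" and hypothetical values; it sketches the bookkeeping only, not NeMo's actual implementation.

```python
def forward_with_checkpoints(x, layers, every=2):
    # run the chain, storing only every `every`-th activation
    ckpts = {0: x}
    for i, f in enumerate(layers):
        x = f(x)
        if (i + 1) % every == 0:
            ckpts[i + 1] = x          # keep roughly 1/every of the activations
    return x, ckpts

def recompute_activation(i, layers, ckpts, every=2):
    # rebuild the activation after the first i layers from the nearest checkpoint
    start = (i // every) * every
    x = ckpts[start]
    for f in layers[start:i]:
        x = f(x)                      # recompute the activations that were dropped
    return x

layers = [lambda v: v + 1, lambda v: v * 2, lambda v: v + 3, lambda v: v * 4]
out, ckpts = forward_with_checkpoints(1, layers)
```

With `every=2`, memory for activations is roughly halved at the cost of one extra forward pass over each dropped segment during backward, which is the usual trade when activations, not weights, dominate GPU memory.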