NVIDIA Reference Architecture for Modern AI

12.04

Instructor: Yixin Zhu

Topics Covered

NVIDIA GPU: The Ideal Resource for AI

  • Introduction to NVIDIA GPUs
    • Overview of NVIDIA’s GPU architecture and its evolution
    • Key features that make NVIDIA GPUs suitable for AI and machine learning tasks
  • NVIDIA-Accelerated Artificial Intelligence
    • Acceleration libraries
      • RAPIDS
      • CV-CUDA
      • NCCL
      • TensorRT
      • cuDNN
      • Triton
  • How to scale GPU computing power
    • Intra-node connection: PCIe, NVLink
    • Multi-node connection: InfiniBand, RoCE
    • Scaling in AI: Strong Scaling, Weak Scaling

NVIDIA SuperPOD Reference Architecture: The Optimal Solution for LLM Training

  • GPU Cluster Components
    • Overview of NVIDIA SuperPOD
    • SuperPOD system design
  • DGX/HGX
  • Network Design for Thousands of GPUs
    • Compute Fabric
    • Storage Fabric
    • In-Band Management Network
    • Out-Of-Band (OOB) Management Network
  • Storage & Management
    • Storage performance requirements
    • Storage performance guidelines
    • Management: NVIDIA Base Command Manager Essentials

NVIDIA NeMo Framework: Solutions to Accelerate Neural Network Training

  • Parallelisms
    • Data Parallelism
      • Distributed Data Parallel (DDP)
      • Distributed Optimizer (DO)
      • Fully Sharded Data Parallel (FSDP)
    • Model Parallelism
      • Tensor Parallelism (TP)
      • Pipeline Parallelism (PP)
      • Expert Parallelism (EP)
    • Activation Partitioning
      • Sequence Parallelism (SP)
      • Context Parallelism (CP)
    • Differences and Comparisons Between Parallel Methods
  • NVIDIA Acceleration Solution
    • Low-precision Training (FP16, BF16, FP8)
    • Flash Attention
    • Activation Checkpointing
    • CPU Offloading
    • Computation and Communication Overlap