DPU Programming

12.18, 12.25

Instructor: Yixin Zhu

Guest Lecturer

Yan Cui, NVIDIA’s DPU & DOCA Evangelist, leads the advancement of DPU and DOCA solutions in China. He drives the growth of the DOCA Developer Community while fostering customer and partner success in next-generation data center infrastructure.

Introduction to NVIDIA DPU Programming: Unlocking the Power of AI Networking

This comprehensive course explores NVIDIA DPU Programming, a cutting-edge technology at the intersection of AI and networking. You will:

  1. Learn about NVIDIA Accelerated Computing and AI Networking
  2. Develop and deploy Data Center Infrastructure Applications using the NVIDIA DOCA Software Framework on the BlueField Networking Platform
  3. Discover how to accelerate AI workloads with NVIDIA AI Networking Technologies

The course consists of a two-part lecture series, comprising 6 one-hour units. Supplementary materials include a textbook and additional resources for extended learning.

By the end of this course, you will:

  • Understand NVIDIA’s leadership in AI through its end-to-end accelerated computing and AI networking technologies
  • Gain proficiency in developing for the NVIDIA BlueField-3 Networking Platform and NVIDIA DOCA Software Framework
  • Apply your skills to build Infrastructure Applications and Services for real-world scenarios

Seize this opportunity to harness the transformative potential of AI Networking with NVIDIA.

Topics Covered

Unit 1: NVIDIA Accelerated Computing and Networking for AI

  • NVIDIA Accelerated Computing
  • NVIDIA Networking for AI

Discover the power of NVIDIA accelerated computing and AI networking technologies. Learn about NVIDIA’s advantages as an end-to-end full-stack accelerated computing and networking provider.

Unit 2: NVIDIA Networking Technologies for AI

  • RDMA & RoCE
  • Magnum IO
  • Adaptive Routing and Congestion Control

Explore key NVIDIA Networking Technologies revolutionizing AI communications and workloads.

Unit 3: NVIDIA BlueField-3 Networking Platform and DOCA Software Framework Overview

  • DPU: Purpose and Benefits
  • NVIDIA BlueField-3 Networking Platform
  • NVIDIA DOCA Software Framework

Learn about the key features and benefits of the NVIDIA BlueField-3 Networking Platform and how to build innovative infrastructure applications using the NVIDIA DOCA Development Environment.

Unit 4: NVIDIA BlueField and DOCA: Installation and Usage

  • NVIDIA BlueField and DOCA Installation
  • NVIDIA BlueField and DOCA Usage

Discover the operation modes, management methods, and network interfaces on BlueField. Learn how NVIDIA BlueField Network Offload can improve performance and efficiency.

Unit 5: NVIDIA BlueField Network Offload: Enhancing Hardware Offload Capabilities

  • NVIDIA BlueField Operation Modes and Basic Configuration
  • NVIDIA BlueField Network Interface
  • Open vSwitch Offload

Learn to harness the power of NVIDIA BlueField network offload to enhance hardware capabilities.

Unit 6: NVIDIA DOCA Development Hands-On: DOCA Application Experience and Execution

  • NVIDIA DOCA Application Reference
  • DOCA Secure Channel Application Introduction
  • DOCA DPA All-to-all Application Introduction

Gain hands-on experience with DOCA Reference Applications. Master the execution and compilation of DOCA Applications, preparing you to design and build your own DOCA applications or services.

Logistics

Objectives

  • Understand the significance of NVIDIA Accelerated Computing and Networking Technologies
  • Master the fundamentals of NVIDIA BlueField-3 Networking Platform and NVIDIA DOCA Software Framework
  • Develop applications and services for creating secure and accelerated infrastructure for various workloads
  • Build applications or services on NVIDIA BlueField-3 using NVIDIA DOCA SDK and APIs

Prerequisites

  • Basic knowledge of networking and the OSI Model
  • Proficiency in Linux programming and command-line interface
  • Familiarity with C programming language

Grading

  • Basic Knowledge Online Test (in English): 5%
    • Individual work
    • Released: 12.25, Deadline: 12.31 23:59
    • Format: 40 single or multiple-choice questions in 70 minutes
    • Evaluation: Maximum 2 attempts, highest score counts
  • Programming Hands-on: 15% (DOCA Project 10%, Short Essay 5%)
    • Teams of 2-5 students
    • Released: 12.25, Deadline: 1.10 23:59
    • DOCA Development Environment available: 12.25 00:00 - 1.7 23:59
    • Project: Choose one from the Project List below
    • Evaluation: 2-page essay in Word or LaTeX, written in Chinese or English

Project List

1. NVIDIA DOCA Secure Channel

Difficulty: ★★★★☆

Objectives

  1. Replicate the functionality of the NVIDIA DOCA Secure Channel Application
  2. Understand how to use DOCA Comm Channel APIs for:
    • Creating a secure communication channel
    • Exchanging messages between Host and BlueField-3 DPU
  3. Extend the Secure Channel functions to provide control services on BlueField-3 DPU

Introduction

The DOCA Secure Channel reference application leverages the DOCA Comm Channel API to create a secure, network-independent communication channel between the host and the NVIDIA BlueField DPU. Key features include:

  • Enabling host control of DPU services and offloads
  • Facilitating message exchange using a client-server framework
  • Supporting one-to-many communication (server to multiple clients)
  • Allowing communication between any PF/VF/SF on the host and the DPU server
  • Configurable message size and quantity for simulating heavy load

Note: DOCA SDK 2.5.0 introduced a new API for DOCA Comm Channel, offering high-performance data path and compatibility with DOCA progress engine. The old API will be deprecated in future releases.

References

  • Application source: /opt/mellanox/doca/applications/secure_channel/
  • Configuration file: /opt/mellanox/doca/applications/secure_channel/sc_params.json

System Design

The secure channel application operates in client mode (host) and server mode (DPU), allowing bidirectional message flow once a channel is established.

Application Architecture

The application is built on the DOCA Comm Channel API. The connection flow between client and server is as follows:

  1. Both sides initiate create()
  2. Server listens for new connections
  3. Server calls recvfrom() to prepare for message exchange
  4. Client executes connect() to initiate connection
  5. Client sends the first message
  6. Server responds

This architecture enables secure, efficient communication between the host and DPU, facilitating advanced network operations and offloads.
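
For orientation, here is a minimal C sketch of this six-step flow, using the Comm Channel calls named in this guide (the pre-2.5.0 doca_comm_channel API that the reference application is built on). Attribute setup, error handling, and the non-blocking retry logic are omitted, and SERVICE_NAME is an assumed channel name, so treat this as a sketch rather than a drop-in implementation:

/* Minimal sketch of the secure-channel connection flow (pre-2.5.0
 * doca_comm_channel API). Attribute setup, error handling, and
 * non-blocking retries are omitted; SERVICE_NAME is an assumed name
 * that both sides must agree on. */
#include <stddef.h>
#include <doca_comm_channel.h>

#define SERVICE_NAME "secure_channel"

/* Server (DPU) side: create, listen, receive the first message, respond. */
static void server_flow(struct doca_comm_channel_init_attr *attr)
{
    struct doca_comm_channel_ep_t *ep;
    struct doca_comm_channel_addr_t *peer = NULL;
    char buf[256];
    size_t len = sizeof(buf);

    doca_comm_channel_ep_create(attr, &ep);                       /* step 1 */
    doca_comm_channel_ep_listen(ep, SERVICE_NAME);                /* step 2 */
    doca_comm_channel_ep_recvfrom(ep, buf, &len,                  /* steps 3 and 5 */
                                  DOCA_CC_MSG_FLAG_NONE, &peer);
    doca_comm_channel_ep_sendto(ep, buf, len,                     /* step 6 */
                                DOCA_CC_MSG_FLAG_NONE, peer);
}

/* Client (host) side: create, connect, send the first message. */
static void client_flow(struct doca_comm_channel_init_attr *attr)
{
    struct doca_comm_channel_ep_t *ep;
    struct doca_comm_channel_addr_t *peer = NULL;
    const char msg[] = "hello";

    doca_comm_channel_ep_create(attr, &ep);                       /* step 1 */
    doca_comm_channel_ep_connect(ep, SERVICE_NAME, &peer);        /* step 4 */
    doca_comm_channel_ep_sendto(ep, msg, sizeof(msg),             /* step 5 */
                                DOCA_CC_MSG_FLAG_NONE, peer);
}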

Compilation

To build the secure channel application:

  1. Direct build method:

    cd /opt/mellanox/doca/applications/
    meson /tmp/build -Denable_all_applications=false -Denable_secure_channel=true
    ninja -C /tmp/build

  2. Using meson_options.txt:

    a. Edit /opt/mellanox/doca/applications/meson_options.txt:

    • Set enable_all_applications to false
    • Set enable_secure_channel to true

    b. Run compilation commands:

    cd /opt/mellanox/doca/applications/
    meson /tmp/build
    ninja -C /tmp/build

The compiled doca_secure_channel will be created in /tmp/build/secure_channel/.

Running Application

The secure channel application requires compilation before execution. Use the following command to view usage instructions:

./doca_secure_channel -h

or

./doca_secure_channel --help

Application usage:

Usage: doca_secure_channel [DOCA Flags] [Program Flags]

DOCA Flags:
 -h, --help            Print a help synopsis
 -v, --version         Print program version information
 -l, --log-level       Set the (numeric) log level for the program
                       <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING,
                       50=INFO, 60=DEBUG, 70=TRACE>
 --sdk-log-level       Set the SDK (numeric) log level for the program
                       <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING,
                       50=INFO, 60=DEBUG, 70=TRACE>
 -j, --json <path>     Parse all command flags from an input json file

Program Flags:
 -s, --msg-size        Message size to be sent
 -n, --num-msgs        Number of messages to be sent
 -p, --pci-addr        DOCA Comm Channel device PCI address
 -r, --rep-pci         DOCA Comm Channel device representor PCI address
                       (needed only on DPU)

These flags allow you to configure the application’s behavior, including log levels, message size, number of messages, and PCI addresses for communication.

Running on BlueField
  1. Login to BlueField

  2. Enter the code folder

    dpu# cd /opt/mellanox/doca/applications
    dpu/opt/mellanox/doca/applications#
  3. Build DOCA Secure Channel Application on BlueField

    dpu/opt/mellanox/doca/applications# meson /tmp/build -Denable_all_applications=false -Denable_secure_channel=true
    dpu/opt/mellanox/doca/applications# ninja -C /tmp/build
  4. Check device PCIe address

    dpu# mst start
    dpu# mst status -v
    ……
    PCI devices:
    ------------
    DEVICE_TYPE             MST                           PCI       RDMA         NET                                     NUMA
    BlueField3(rev:1)       /dev/mst/mt41692_pciconf0.1   03:00.1   mlx5_1       net-en3f1pf1sf0,net-pf1hpf,net-p1       -1
    BlueField3(rev:1)       /dev/mst/mt41692_pciconf0     03:00.0   mlx5_0       net-en3f0pf0sf0,net-p0,net-pf0hpf       -1
  5. CLI example for running the application on BlueField:

    dpu# ./doca_secure_channel -s 256 -n 10 -p 03:00.0 -r 0b:00.0 

    Note: Both DOCA Secure Channel device PCIe address (03:00.0) and DOCA Comm Channel device representor PCIe address (0b:00.0) should match the addresses of the desired PCIe devices.

Running on Host
  1. Login to Host

  2. Enter the code folder

    host# cd /opt/mellanox/doca/applications
    host/opt/mellanox/doca/applications#
  3. Build DOCA Secure Channel Application on Host

    host/opt/mellanox/doca/applications# meson /tmp/build -Denable_all_applications=false -Denable_secure_channel=true
    host/opt/mellanox/doca/applications# ninja -C /tmp/build
  4. Check device representor PCIe address

    host# mst start
    host# mst status -v
    ……
    PCI devices:
    ------------
    DEVICE_TYPE             MST                                PCI       RDMA            NET                                     NUMA
    BlueField3(rev:1)       /dev/mst/mt41692_pciconf0         0b:00.0   mlx5_0          net-ens192f0np0                         -1
    BlueField3(rev:1)       /dev/mst/mt41692_pciconf0.1       0b:00.1   mlx5_1          net-ens192f1np1                         -1
  5. CLI example for running the application on Host:

    host# ./doca_secure_channel -s 256 -n 10 -p 0b:00.0

    Note: DOCA Comm Channel device representor PCIe address (0b:00.0) should match the address of the desired PCIe device.

Code Description

BlueField Side

  1. Set the Secure Channel configuration operation mode to run the endpoint on the DPU:

    app_cfg.mode = SC_MODE_DPU;

  2. Parse cmdline/json arguments:

    register_secure_channel_params()

  3. Initialize Communication Channel context: init_cc()

    • Create Comm Channel endpoint:

      doca_comm_channel_ep_create()

    • Open Comm Channel DOCA device based on PCI address: open_doca_device_with_pci()

    • Open Comm Channel DOCA device representor based on PCI address: open_doca_device_rep_with_pci()

    • Set Comm Channel context properties (DOCA device, max_msg_size, snd_queue_size, rcv_queue_size) and set the DOCA device representor: set_cc_properties()

    • Start the Secure Channel server listening for client connections: doca_comm_channel_ep_listen()

  4. Initiate all relevant signal and epoll file descriptors: init_signaling_polling() (a standalone sketch of this epoll/signalfd pattern follows this list)

    • Create Comm Channel send/receive epoll instance: fd = epoll_create1(0)

    • Create send/receive termination file descriptor, and add termination file descriptor to epoll instance:

      fd = signalfd(-1, &signal_mask, 0);

      epoll_ctl(*cc_send_epoll_fd, EPOLL_CTL_ADD, *send_interrupt_fd, &intr_fd)

  5. Extract the event_channel handles for the user's use. When sending or receiving packets in non-blocking mode, these handles can be monitored with epoll() to receive an interrupt when a new event occurs:

    doca_comm_channel_ep_get_event_channel(ctx->ep, &ctx->cc_send_fd, &ctx->cc_recv_fd)

  6. Start threads and wait for them to finish: start_threads()

    • start sendto thread

      pthread_create(ctx->sendto_t, NULL, sendto_channel, (void *)ctx)

    • start recvfrom thread

      pthread_create(ctx->recvfrom_t, NULL, recvfrom_channel, (void *)ctx)

      • Add Comm Channel receive file descriptor to receive epoll instance:

        epoll_ctl(ctx->cc_recv_epoll_fd, EPOLL_CTL_ADD, ctx->cc_recv_fd, &recv_event)

      • Receive loop:

        while (1) {

        doca_comm_channel_ep_recvfrom(ctx->ep, recv_buffer, &msg_len, DOCA_CC_MSG_FLAG_NONE, &curr_peer);

        /* If an interrupt was received (events[ev_idx].data.fd == ctx->recv_intr_fd),
           the receive thread exits and reports the total number of messages received successfully. */

        /* Signal the send thread to start sending messages. */

        }
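
The signalfd/epoll termination pattern behind init_signaling_polling() is standard Linux plumbing, independent of DOCA. The standalone, runnable sketch below shows the idea: block the signals, expose them as a file descriptor, and watch that descriptor in the same epoll instance as the work descriptors, so a waiting thread can be woken up and told to exit.

/* Standalone sketch of the signalfd + epoll termination pattern
 * (standard Linux APIs only; compile with: cc demo.c -o demo). */
#include <signal.h>
#include <stdio.h>
#include <sys/epoll.h>
#include <sys/signalfd.h>
#include <unistd.h>

int main(void)
{
    sigset_t mask;

    sigemptyset(&mask);
    sigaddset(&mask, SIGINT);
    sigaddset(&mask, SIGTERM);
    sigprocmask(SIG_BLOCK, &mask, NULL);  /* deliver signals via fd, not handlers */

    int sig_fd = signalfd(-1, &mask, 0);  /* termination file descriptor */
    int ep_fd = epoll_create1(0);         /* epoll instance */

    struct epoll_event ev = { .events = EPOLLIN, .data.fd = sig_fd };
    epoll_ctl(ep_fd, EPOLL_CTL_ADD, sig_fd, &ev);
    /* The real application also adds the Comm Channel event fd here. */

    for (;;) {
        struct epoll_event events[2];
        int n = epoll_wait(ep_fd, events, 2, -1);

        for (int i = 0; i < n; i++) {
            if (events[i].data.fd == sig_fd) {  /* interrupt received */
                printf("termination requested, exiting\n");
                close(sig_fd);
                close(ep_fd);
                return 0;
            }
            /* otherwise: a channel event is ready to be handled */
        }
    }
}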

Host Side

  1. Parse cmdline/json arguments: register_secure_channel_params()

  2. Initialize Communication Channel context: init_cc()

    • Create Comm Channel endpoint:

      doca_comm_channel_ep_create()

    • Open Comm Channel DOCA device based on PCI address:

      open_doca_device_with_pci()

    • Set Comm Channel context properties, including DOCA device, max_msg_size, snd_queue_size, and rcv_queue_size (no device representor is needed on the host side):

      set_cc_properties()

    • Establish a connection with DPU node:

      doca_comm_channel_ep_connect()

  3. Initiate all relevant signal and epoll file descriptors: init_signaling_polling()

    • Create Comm Channel send/receive epoll instance:

      fd = epoll_create1(0)

    • Create send/receive termination file descriptor, and add termination file descriptor to epoll instance:

      fd = signalfd(-1, &signal_mask, 0);

      epoll_ctl(*cc_send_epoll_fd, EPOLL_CTL_ADD, *send_interrupt_fd, &intr_fd)

  4. Extract the event_channel handles for the user's use. When sending or receiving packets in non-blocking mode, these handles can be monitored with epoll() to receive an interrupt when a new event occurs:

    doca_comm_channel_ep_get_event_channel(ctx->ep, &ctx->cc_send_fd, &ctx->cc_recv_fd)

  5. Start threads and wait for them to finish: start_threads() (a standalone sketch of this thread pattern follows this list)

    • start recvfrom thread:

      pthread_create(ctx->recvfrom_t, NULL, recvfrom_channel, (void *)ctx)

    • start sendto thread:

      pthread_create(ctx->sendto_t, NULL, sendto_channel, (void *)ctx)

      • Add Comm Channel send file descriptor to send epoll instance

        epoll_ctl(ctx->cc_send_epoll_fd, EPOLL_CTL_ADD, ctx->cc_send_fd, &send_event)

      • Send loop:

        while (msg_nb) {

        result = doca_comm_channel_ep_sendto(ctx->ep, send_buffer, ctx->cfg->send_msg_size, DOCA_CC_MSG_FLAG_NONE, ctx->peer);

        /* If an interrupt was received (events[ev_idx].data.fd == ctx->send_intr_fd),
           the send thread exits and reports the total number of messages sent successfully. */

        }
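
As noted in step 5, start_threads() is plain POSIX threading underneath. The minimal, DOCA-free sketch below shows that structure; the worker bodies are stubs standing in for the real send and receive loops.

/* Standalone sketch of the start_threads() pattern: spawn the send and
 * receive workers, then block until both finish (pure POSIX threads;
 * compile with: cc demo.c -pthread). */
#include <pthread.h>
#include <stdio.h>

static void *sendto_channel(void *arg)
{
    (void)arg;  /* the real send loop runs here */
    return NULL;
}

static void *recvfrom_channel(void *arg)
{
    (void)arg;  /* the real receive loop runs here */
    return NULL;
}

int main(void)
{
    pthread_t sendto_t, recvfrom_t;
    void *ctx = NULL;  /* the real code passes its channel context here */

    pthread_create(&recvfrom_t, NULL, recvfrom_channel, ctx);
    pthread_create(&sendto_t, NULL, sendto_channel, ctx);

    pthread_join(sendto_t, NULL);  /* wait for the workers to finish */
    pthread_join(recvfrom_t, NULL);
    printf("both threads finished\n");
    return 0;
}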

Project Direction

  1. Modify message parameters:

    • Experiment with different message sizes using the -s or --msg-size flag.
    • Vary the number of messages sent using the -n or --num-msgs flag.
    • Example: ./doca_secure_channel -s 512 -n 100 -p <PCI_ADDRESS> [-r <REP_PCI_ADDRESS>]
  2. Enhance logging and debugging:

    • Increase the log level using the -l or --log-level flag for more detailed output.
    • Add print statements in the source code to show detailed information about:
      • Channel connection establishment
      • Message transmission progress
      • Timing information for performance analysis
  3. Implement JSON-based configuration:

    • Create a JSON file with various configurations (e.g., sc_params.json)
    • Run the application using the JSON file: ./doca_secure_channel --json ./sc_params.json
  4. Explore different deployment scenarios:

    • Test communication between different PF/VF/SF combinations
    • Verify behavior with multiple clients connecting to the server (DPU) side
  5. Error handling and resilience:

    • Implement more robust error checking and handling in the application code
    • Test application behavior under various error conditions (e.g., connection loss, invalid parameters)
  6. Performance optimization:

    • Profile the application to identify potential bottlenecks
    • Experiment with different buffer sizes and threading models for improved performance
  7. Extended functionality:

    • Implement a simple control protocol over the secure channel (a hypothetical message layout is sketched after this list)
    • Add support for bidirectional simultaneous communication
  8. Integration with other DOCA applications:

    • Explore how the Secure Channel can be used in conjunction with other DOCA applications or services
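
For project direction 7 above, one possible starting point is a small fixed header carried in every secure-channel message. Everything in this sketch (type names, fields, commands, the magic value) is a hypothetical design for illustration, not part of the reference application:

/* Hypothetical control protocol for the secure channel: a fixed header
 * followed by an optional payload. All names and fields are illustrative. */
#include <stdint.h>
#include <string.h>

enum ctrl_cmd {
    CTRL_PING      = 1,  /* liveness check                  */
    CTRL_START_SVC = 2,  /* ask the DPU to start a service  */
    CTRL_STOP_SVC  = 3,  /* ask the DPU to stop a service   */
    CTRL_ACK       = 4,  /* response carrying a status code */
};

struct ctrl_hdr {
    uint32_t magic;        /* constant marker to reject stray messages */
    uint16_t cmd;          /* one of enum ctrl_cmd                     */
    uint16_t status;       /* 0 = OK in CTRL_ACK responses             */
    uint32_t payload_len;  /* bytes following this header              */
} __attribute__((packed));

#define CTRL_MAGIC 0x53434831u  /* "SCH1" */

/* Serialize a command into a buffer that can then be passed to the
 * channel's send call. */
static size_t ctrl_encode(uint8_t *buf, enum ctrl_cmd cmd,
                          const void *payload, uint32_t payload_len)
{
    struct ctrl_hdr hdr = {
        .magic = CTRL_MAGIC,
        .cmd = (uint16_t)cmd,
        .status = 0,
        .payload_len = payload_len,
    };

    memcpy(buf, &hdr, sizeof(hdr));
    if (payload_len > 0)
        memcpy(buf + sizeof(hdr), payload, payload_len);
    return sizeof(hdr) + payload_len;
}
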

Documentation

For detailed information about the NVIDIA DOCA Secure Channel Application, refer to the official guide: NVIDIA DOCA Secure Channel Application Guide

Key sections to review in the documentation:

  • System Design and Application Architecture
  • DOCA Libraries used (DOCA Comch)
  • Compilation instructions
  • Running the Application (including command-line flags and JSON-based deployment)
  • Application Code Flow

2. NVIDIA DOCA DPA All-to-All

Difficulty: ★★★★☆

Objectives

  • Replicate the functionality of NVIDIA DOCA DPA All-to-all Application
  • Understand how to use DOCA DPA APIs for accelerating MPI all-to-all collective on BlueField-3 DPU
  • Extend the DPA All-to-all functions to improve Collective Operation performance on BlueField-3 DPU

Introduction

The NVIDIA DPA All-to-All application demonstrates how the Message Passing Interface (MPI) all-to-all collective can be accelerated using the Data Path Accelerator (DPA). In an MPI collective, all processes within the same job call the collective routine.

Given a communicator of n ranks, the application performs a collective operation where all processes send and receive the same amount of data from all other processes (hence “all-to-all”).

System Design

All-to-all is an MPI method. MPI is a standardized and portable message passing standard designed for parallel computing architectures. An MPI program consists of several processes running in parallel.

[Figure: All-to-All Operation]
  • Each process in the diagram divides its local sendbuf into n blocks (4 in this example), each containing sendcount elements (4 in this example). Process i sends the k-th block of its local sendbuf to process k, which places the data in the i-th block of its local recvbuf.

  • Implementing the all-to-all method using DOCA DPA offloads the copying of elements from the sendbufs to the recvbufs to the DPA, freeing the CPU to perform other computations (a host-based MPI reference is sketched below for comparison).
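
For comparison, the host-based version of this operation is a single MPI call. The sketch below is plain MPI (no DOCA), with four-element blocks per rank as in the diagram; block k of rank i's sendbuf ends up as block i of rank k's recvbuf:

/* Host-based all-to-all for comparison (plain MPI; compile with mpicc
 * and run with, e.g., mpirun -np 4 ./a.out). */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define SENDCOUNT 4  /* elements per block, as in the diagram */

int main(int argc, char **argv)
{
    int rank, nprocs;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int *sendbuf = malloc(nprocs * SENDCOUNT * sizeof(int));
    int *recvbuf = malloc(nprocs * SENDCOUNT * sizeof(int));

    /* Block k of rank i's sendbuf is destined for rank k. */
    for (int k = 0; k < nprocs * SENDCOUNT; k++)
        sendbuf[k] = rank * 100 + k;

    /* The CPU performs the exchange here; DPA all-to-all offloads it. */
    MPI_Alltoall(sendbuf, SENDCOUNT, MPI_INT,
                 recvbuf, SENDCOUNT, MPI_INT, MPI_COMM_WORLD);

    printf("rank %d: recvbuf[0] = %d\n", rank, recvbuf[0]);
    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}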

Application Architecture

The following diagram illustrates the differences between host-based all-to-all and DPA all-to-all operations:

[Figure: Host-based vs DPA All-to-All]
  • In DPA all-to-all, DPA threads perform the all-to-all operation, freeing the CPU for other computations.
  • In host-based all-to-all, the CPU must still perform the all-to-all operation at some point and is not completely available for other computations.

Compilation

To build only the DPA all-to-all application:

cd /opt/mellanox/doca/applications/
meson /tmp/build -Denable_all_applications=false -Denable_dpa_all_to_all=true
ninja -C /tmp/build

Alternatively, users can set the desired flags in the meson_options.txt file:

  1. Edit the following flags in /opt/mellanox/doca/applications/meson_options.txt:

    • Set enable_all_applications to false
    • Set enable_dpa_all_to_all to true
  2. Run the following compilation commands:

cd /opt/mellanox/doca/applications/
meson /tmp/build
ninja -C /tmp/build

The doca_dpa_all_to_all executable is created under /tmp/build/dpa_all_to_all/.

Running Application

The DPA all-to-all application is provided in source form. Therefore, compilation is required before execution.

Application usage instructions (run ./doca_dpa_all_to_all -h or ./doca_dpa_all_to_all --help):

Usage: doca_dpa_all_to_all [DOCA Flags] [Program Flags]

DOCA Flags:
-h, --help                Print a help synopsis
-v, --version             Print program version information
-l, --log-level           Set the (numeric) log level for the program 
                          <10=DISABLE, 20=CRITICAL, 30=ERROR, 
                          40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
--sdk-log-level           Set the SDK (numeric) log level for the program 
                          <10=DISABLE, 20=CRITICAL, 30=ERROR, 
                          40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
-j, --json <path>         Parse all command flags from an input json file

Program Flags:
-m, --msgsize <Message size>   The message size - the size of the 
                               sendbuf and recvbuf (in bytes). 
                               Must be in multiples of integer size.
                               Default is size of one integer times
                               the number of processes.
-d, --devices <IB device names> IB devices names that support DPA, separated
                                by comma without spaces (max of two 
                                devices). If not provided, a random
                                IB device will be chosen.

Running on BlueField
  1. Login to BlueField

  2. Enter the code folder:

    dpu# cd /opt/mellanox/doca/applications
    dpu/opt/mellanox/doca/applications#
  3. MPI is used to compile and run this application, so ensure that MPI is installed on your setup. By default, the DOCA-All installation profile provides Open MPI but not mpicc. Run the following commands:

    • Check whether MPICH (which provides mpicc) is installed:

      dpu# dpkg -l | grep mpich

    • If not, install MPICH:

      dpu# apt-get install mpich

  4. Build DOCA DPA All-to-all Application on BlueField:

    # meson /tmp/build -Denable_all_applications=false -Denable_dpa_all_to_all=true
    # ninja -C /tmp/build
  5. Check the mlx device name on BlueField:

    # mst status -v
    ……
    PCI devices:
    ------------
    DEVICE_TYPE             MST                           PCI       RDMA         NET                                     NUMA
    BlueField3(rev:1)       /dev/mst/mt41692_pciconf0.1   03:00.1   mlx5_1       net-en3f1pf1sf0,net-pf1hpf,net-p1       -1
    BlueField3(rev:1)       /dev/mst/mt41692_pciconf0     03:00.0   mlx5_0       net-en3f0pf0sf0,net-p0,net-pf0hpf       -1
  6. Run the DPA all-to-all application with 4 processes, a 32-byte message size, and mlx5_0 as the InfiniBand device:

    # mpirun -np 4 /tmp/build/dpa_all_to_all/doca_dpa_all_to_all -m 32 -d "mlx5_0"
[Figures: DPA All-to-All Execution Output 1 and 2]

Notes:

  • -d specifies the RDMA device shown in the previous step
  • -m is the message size: the size of the sendbuf and recvbuf (in bytes). Each buffer is divided into nProcs blocks, so each pair of processes exchanges msgsize / nProcs bytes

Code Description

  1. Initialize MPI:

    MPI_Init(&argc, &argv);

  2. Parse application arguments:

    • Initialize arg parser resources and register DOCA general parameters:
      doca_argp_init();
    • Register the application’s parameters:
      register_all_to_all_params();
    • Parse the arguments:
      doca_argp_start();
    • Only the first process (rank 0) parses the parameters; it then broadcasts them to the rest of the processes (a broadcast sketch follows this list).
  3. Check and prepare the needed resources for the all_to_all call:

    • Check the number of processes (maximum is 16).
    • Check the msgsize. It must be in multiples of integer size and at least the number of processes times integer size.
    • Allocate the sendbuf and recvbuf according to msgsize.
  4. Prepare the resources required to perform the all-to-all method using DOCA DPA:

    • Initialize DOCA DPA context:
      • Open DOCA DPA device (DOCA device that supports DPA):
        open_dpa_device();
      • Create DOCA DPA context using the opened device:
        doca_dpa_create();
    • Create the required events for the all-to-all:
      create_dpa_a2a_events() {
          doca_dpa_event_create(doca_dpa, DOCA_DPA_EVENT_ACCESS_DPA, DOCA_DPA_EVENT_ACCESS_CPU, DOCA_DPA_EVENT_WAIT_DEFAULT, &comp_event, 0); 
          for (i = 0; i < resources->num_ranks; i++)
              doca_dpa_event_create(doca_dpa, DOCA_DPA_EVENT_ACCESS_REMOTE, DOCA_DPA_EVENT_ACCESS_DPA, DOCA_DPA_EVENT_WAIT_DEFAULT, &(kernel_events[i]), 0);
      }
    • Create DOCA DPA worker (for the endpoints):
      doca_dpa_worker_create();
    • Prepare DOCA DPA endpoints:
      • Create DOCA DPA endpoints as the number of processes/ranks:
        for (i = 0; i < resources->num_ranks; i++)
            doca_dpa_ep_create();
      • Connect the local process’ endpoints to the other processes’ endpoints:
        connect_dpa_a2a_endpoints();
      • Export the endpoints to DOCA DPA device endpoints and copy them to DPA heap memory:
        for (int i = 0; i < resources->num_ranks; i++) {
            result = doca_dpa_ep_dev_export();
            doca_dpa_mem_alloc();
            doca_dpa_h2d_memcpy();
        }
    • Prepare the memory required for the all-to-all method:
      prepare_dpa_a2a_memory();
  5. Launch the alltoall_kernel using DOCA DPA kernel launch:

    • Every MPI rank launches a kernel with up to MAX_NUM_THREADS threads (16 in this example).
    • Launch alltoall_kernel using kernel_launch:
      doca_dpa_kernel_launch();
    • Copy the relevant sendbuf to the correct recvbuf for every process:
      for (i = thread_rank; i < num_ranks; i += num_threads)
          doca_dpa_dev_put_signal_nb();
    • Wait until the alltoall_kernel has finished:
      doca_dpa_event_wait_until();
  6. Destroy the a2a_resources:

    • Free all the DOCA DPA memories:
      doca_dpa_mem_free();
    • Unregister all the DOCA DPA host memories:
      doca_dpa_mem_unregister();
    • Destroy all the DOCA DPA endpoints:
      doca_dpa_ep_destroy();
    • Destroy the DOCA DPA worker:
      doca_dpa_worker_destroy();
    • Destroy all the DOCA DPA events:
      doca_dpa_event_destroy();
    • Destroy the DOCA DPA context:
      doca_dpa_destroy();
    • Close the DOCA device:
      doca_dev_close();
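
The "rank 0 parses, then broadcasts" step above has the following general shape. This is a plain-MPI sketch; the params struct and its field names are illustrative assumptions, not the application's actual types:

/* Sketch of broadcasting parsed parameters from rank 0 (plain MPI;
 * the struct and its fields are illustrative, not the application's). */
#include <mpi.h>

struct a2a_params {
    int msgsize;      /* from the -m flag */
    char device[64];  /* from the -d flag */
};

static void share_params(struct a2a_params *p, int rank)
{
    if (rank == 0) {
        /* rank 0 runs the doca_argp parsing and fills *p here */
    }
    /* Every other rank receives rank 0's copy (byte-wise broadcast). */
    MPI_Bcast(p, (int)sizeof(*p), MPI_BYTE, 0, MPI_COMM_WORLD);
}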

Project Direction

  1. Enhance the code with additional parameters:

    • Add input for running multiple iterations
    • Calculate and report execution time (a timing sketch follows this list)
  2. Increase the number of DPA Execution Units (EUs) to test alltoall performance

  3. Implement additional customizations and extensions:

    • Add multi-server support
    • Integrate secure_channel logic
    • Explore other MPI collective operations that could benefit from DPA acceleration
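
For the first project direction above, wall-clock timing around the collective is enough to compare runs. Below is a minimal pattern using MPI_Wtime; it is plain MPI, the iteration count is the assumed new parameter, and the MPI_Alltoall call stands in for the DPA all-to-all invocation when measuring DPA runs:

/* Timing sketch for project direction 1: run the collective `iters`
 * times and report the average. Plain MPI; substitute the DPA
 * all-to-all invocation for MPI_Alltoall when measuring DPA runs. */
#include <mpi.h>
#include <stdio.h>

static void time_alltoall(int *sendbuf, int *recvbuf, int count,
                          int iters, int rank)
{
    MPI_Barrier(MPI_COMM_WORLD);  /* align all ranks before timing */
    double t0 = MPI_Wtime();

    for (int i = 0; i < iters; i++)
        MPI_Alltoall(sendbuf, count, MPI_INT,
                     recvbuf, count, MPI_INT, MPI_COMM_WORLD);

    double elapsed = MPI_Wtime() - t0;
    if (rank == 0)
        printf("avg all-to-all time: %.6f s over %d iterations\n",
               elapsed / iters, iters);
}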

Documents

OpenMPI
DOCA DPA
DOCA MMAP
DOCA RDMA
DOCA AlltoAll

Resources

Book

Homepages & Documents

Self-paced Free Online Courses

Free DOCA Development Environment

[Figure: QR Code for Free DOCA Development Environment]

Scan the QR code to apply for a free DOCA Development Environment provided by the NVIDIA Authorized Partner DPU & DOCA Excellence Center. An-Link, a new DPU & DOCA Excellence Center, will also provide access to a free DOCA Development Environment at a later date.
