Azure Batch Ubuntu-HPC 22.04 pool with containers cannot detect CUDA GPUs

Sonia Das
2025-04-23T20:38:24.46+00:00

We have an Azure Batch pool configured as follows:

  • OS: Linux
  • NodeAgentSKUId: batch.node.ubuntu 22.04
  • ImageReference:
      {
        "publisher": "microsoft-dsvm",
        "offer":     "ubuntu-hpc",
        "sku":       "2204",
        "version":   "latest"
      }
    
  • ContainerConfiguration: Docker-compatible container support enabled.
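
For reference, the pool configuration above corresponds roughly to the following Batch REST API pool-add body, sketched here as a Python dict. The pool ID and VM size (`gpu-pool-2204`, `Standard_NC6s_v3`) and the `build_pool_body` helper are hypothetical placeholders, and `dockerCompatible` is assumed to be the container type behind the "Docker-compatible" setting.

```python
import json


def build_pool_body(pool_id: str, vm_size: str) -> dict:
    """Hypothetical helper: a pool-add body mirroring the configuration above."""
    return {
        "id": pool_id,
        "vmSize": vm_size,  # an N-series (GPU) size is assumed
        "virtualMachineConfiguration": {
            "imageReference": {
                "publisher": "microsoft-dsvm",
                "offer": "ubuntu-hpc",
                "sku": "2204",
                "version": "latest",
            },
            "nodeAgentSKUId": "batch.node.ubuntu 22.04",
            # Docker-compatible container runtime on the nodes (assumed value)
            "containerConfiguration": {"type": "dockerCompatible"},
        },
    }


if __name__ == "__main__":
    body = build_pool_body("gpu-pool-2204", "Standard_NC6s_v3")
    print(json.dumps(body, indent=2))
```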

This pool was created to replace our Ubuntu 20.04 nodes, which reach the end of standard support on 31 May 2025. Since the upgrade, any container task that tries to invoke CUDA code (e.g., via nvidia-smi or an FFmpeg/CUDA pipeline) logs:

libevent use phreads:0
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/app/VModeApp/vmode/stitched.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    creation_time   : 2024-01-29T10:48:33.000000Z
    encoder         : Lavf60.3.100
  Duration: 00:00:10.44, start: 0.000000, bitrate: 139581 kb/s
  Stream #0:0[0x1](und): Video: h264 (Constrained Baseline) (avc1 / 0x31637661), yuvj420p(pc, bt709/unknown/unknown, progressive), 7680x3840, 137474 kb/s, 29.97 fps, 29.97 tbr, 90k tbn (default)
      Metadata:
        creation_time   : 2024-01-29T10:48:33.000000Z
        handler_name    : VideoHandler
        vendor_id       : [0][0][0][0]
      Side data:
        stereo3d: 2D, view: packed, primary eye: none
        spherical: equirectangular 
[swscaler @ 0x4d24c00] deprecated pixel format used, make sure you did set range correctly
/usr/local/vcpkg/buildtrees/popsift/src/v0.9-f30485bff3.clean/src/popsift/common/device_prop.cu:23
    Cannot get the current CUDA deviceno CUDA-capable device is detected
Sentry is attempting to send 2 pending events
Waiting up to 2 seconds
Press Ctrl-C to quit

Despite running on N-series VMs and using the verified DSVM HPC image, the containerized workload cannot see the GPU.
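
A quick check we could run inside a task container, sketched in Python on the assumption that the container image has Python available: it lists `/dev/nvidiaN` device nodes, checks whether `nvidia-smi` is on `PATH`, and reads `NVIDIA_VISIBLE_DEVICES`. The `gpu_visibility_report` name is hypothetical. If no `/dev/nvidiaN` nodes appear, the "no CUDA-capable device is detected" error above would be consistent with the container simply not being given GPU access.

```python
import os
import re
import shutil


def gpu_visibility_report(dev_dir: str = "/dev") -> dict:
    """Collect basic signals about whether NVIDIA GPUs are exposed to this process.

    Intended to run inside the task's container; it does not call CUDA itself.
    """
    try:
        entries = os.listdir(dev_dir)
    except OSError:
        entries = []
    # Per-GPU device nodes are named nvidia0, nvidia1, ...; nvidiactl and
    # nvidia-uvm are driver control nodes and are reported separately.
    gpu_nodes = sorted(e for e in entries if re.fullmatch(r"nvidia\d+", e))
    control_nodes = sorted(e for e in entries if e in ("nvidiactl", "nvidia-uvm"))
    return {
        "gpu_device_nodes": gpu_nodes,
        "control_nodes": control_nodes,
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        "NVIDIA_VISIBLE_DEVICES": os.environ.get("NVIDIA_VISIBLE_DEVICES"),
    }


if __name__ == "__main__":
    report = gpu_visibility_report()
    for key, value in report.items():
        print(f"{key}: {value}")
    if not report["gpu_device_nodes"]:
        print("No /dev/nvidiaN nodes found: the container may lack GPU access.")
```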

We need guidance on why the Ubuntu 22.04 DSVM HPC image isn’t exposing CUDA devices within containers and how to resolve it.
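
One setting we have not ruled out is how the task's container is started. The sketch below, a Python dict mirroring the Batch REST API task-add body, passes Docker's `--gpus all` through `containerSettings.containerRunOptions`. It assumes the NVIDIA Container Toolkit is installed on the node; whether Batch already grants GPU access automatically on this image is something we have not confirmed. The task ID, image name, and the `build_task_body` helper are hypothetical.

```python
import json


def build_task_body(task_id: str, image_name: str, command_line: str) -> dict:
    """Hypothetical helper: a task-add body that runs command_line in image_name."""
    return {
        "id": task_id,
        # With containerSettings set, the command line runs inside the container.
        "commandLine": command_line,
        "containerSettings": {
            "imageName": image_name,
            # Extra options for the container create command; --gpus all requests
            # every host GPU (assumes the NVIDIA Container Toolkit on the node).
            "containerRunOptions": "--gpus all",
        },
    }


if __name__ == "__main__":
    body = build_task_body("cuda-check", "myregistry.azurecr.io/vmode:latest", "nvidia-smi")
    print(json.dumps(body, indent=2))
```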

Thanks in advance!

Azure Batch