Hi Henrik Aldermo,
Glad the issue is resolved for you. I am reposting your solution as an answer so that others who run into a similar issue can find it more easily.
The extension (Microsoft.HpcCompute.NvidiaGpuDriverWindows) always installs the latest GRID driver. The extension might be trying to install a driver that is incompatible with Windows Server 2019.
Windows Server 2019 does support the GPU but may require a specific driver version that the extension doesn’t install correctly. For links to all previous Nvidia GRID driver versions, visit GitHub.
Since Windows Server 2022 is officially supported, the extension installs the correct GRID 17.5 driver without issues.
However, in your case the affected system is a virtual machine scale set, so installing drivers manually on every new instance is not viable.
A working solution is to specify the driver version in the NVIDIA GPU Driver Extension settings, as documented in the "Known Issues" section here: https://learn.microsoft.com/en-us/azure/virtual-machines/extensions/hpccompute-gpu-windows#known-issues
For a Virtual Machine Scale Set, use the following:
az vmss extension set --resource-group MyResourceGroup --vmss-name MyVmss --name NvidiaGpuDriverWindows --publisher Microsoft.HpcCompute --settings "{'driverVersion':'538.46'}"
To view the settings:
az vmss extension list --resource-group MyResourceGroup --vmss-name MyVmss
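The two commands above can be combined into one small script. This is only a sketch, assuming a bash shell and the same placeholder names (MyResourceGroup, MyVmss) and driver version (538.46) used above. The `run` helper and the `DRY_RUN` variable are hypothetical conveniences, not part of the Azure CLI: by default the script only prints the commands, and setting `DRY_RUN=0` executes them. It also adds an `az vmss update-instances` step, on the assumption that the scale set uses a manual upgrade policy, in which case existing instances do not pick up the changed extension settings until they are updated.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder names taken from the commands above; replace with your own.
RESOURCE_GROUP="MyResourceGroup"
VMSS_NAME="MyVmss"
DRIVER_VERSION="538.46"

# Build the extension settings as strict JSON.
SETTINGS="{\"driverVersion\":\"${DRIVER_VERSION}\"}"

# Hypothetical helper: prints each command unless DRY_RUN=0 is set.
# The printed form is for inspection only; it does not preserve quoting.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# 1. Pin the driver version in the scale set model.
run az vmss extension set \
  --resource-group "$RESOURCE_GROUP" --vmss-name "$VMSS_NAME" \
  --name NvidiaGpuDriverWindows --publisher Microsoft.HpcCompute \
  --settings "$SETTINGS"

# 2. Apply the updated model to existing instances
#    (assumed to be needed only with a manual upgrade policy).
run az vmss update-instances \
  --resource-group "$RESOURCE_GROUP" --name "$VMSS_NAME" --instance-ids "*"

# 3. Show the extension as recorded in the scale set model.
run az vmss extension show \
  --resource-group "$RESOURCE_GROUP" --vmss-name "$VMSS_NAME" \
  --name NvidiaGpuDriverWindows
```

Run it once as-is to review the printed commands, then run it again with `DRY_RUN=0` to apply them. New instances created after step 1 should receive the pinned driver version through the extension without any manual installation.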
Please remember to "Accept Answer" if any answer/reply helped, so that others in the community facing similar issues can easily find the solution.