nvidia-smi command couldn't communicate with the NVIDIA driver (Microsoft Azure N-series)

Nilesh Maharjan 0 Reputation points
2024-11-14T23:54:13.27+00:00

Issue Summary: I'm following this guide, https://learn.microsoft.com/en-us/azure/virtual-machines/linux/n-series-driver-setup, and I'm having trouble installing NVIDIA GPU drivers on an Azure N-series VM running Ubuntu. I've tried multiple approaches but keep encountering different issues.

What I've Tried So Far:

Installation with Secure Boot Enabled:

During the installation, I was prompted to set a password. After the installation was completed, I rebooted the VM but never saw an option to enter this password, which seems to prevent the driver from initializing.

Installation with Secure Boot Disabled:

I attempted the installation with Secure Boot disabled. The installation completed without prompting for a password, but after running sudo reboot, the VM became stuck in a reboot loop and never fully started.

Questions:

Why am I not seeing the option to enter the Secure Boot password after rebooting? Is there a way to ensure that the Secure Boot password prompt appears? How can I resolve the reboot loop issue when Secure Boot is disabled?

Any insights or alternative methods to install NVIDIA drivers on an N-series VM would be greatly appreciated!

Azure Virtual Machines
An Azure service that is used to provision Windows and Linux virtual machines.

2 answers

  1. Akshay kumar Mandha 2,915 Reputation points Microsoft External Staff
    2024-11-15T19:51:44.7333333+00:00

    Hi Nilesh Maharjan,
    Thanks for your patience while we review your question.

    Based on your description, the missing password prompt may be caused by Secure Boot not being enabled correctly. I suggest checking that Secure Boot is enabled correctly on the VM. If it is, try restarting the VM from the Azure portal.

    In a bit more detail: the NVIDIA GPU driver is a kernel module, so its signing key has to be enrolled during installation. When Secure Boot is enabled, you have to import that key as a Machine Owner Key (MOK). After importing the MOK, reboot the VM.
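
A minimal sketch of how one might check the Secure Boot state and whether a MOK enrollment is still pending, assuming the `mokutil` tool is available on the VM. The helper name `sb_state` is hypothetical, and the key path in the final comment is an assumed DKMS location, not something confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical helper: classify the text printed by `mokutil --sb-state`.
sb_state() {
  case "$1" in
    *"SecureBoot enabled"*)  echo enabled ;;
    *"SecureBoot disabled"*) echo disabled ;;
    *)                       echo unknown ;;
  esac
}

if command -v mokutil >/dev/null 2>&1; then
  state=$(sb_state "$(mokutil --sb-state 2>&1)")
  echo "Secure Boot: $state"
  # Keys queued for enrollment at the next boot (may need root; empty if none).
  mokutil --list-new 2>/dev/null || true
else
  echo "mokutil not found; it can be installed with: sudo apt install mokutil"
fi

# To queue a key for enrollment (this asks for a one-time password):
#   sudo mokutil --import /var/lib/shim-signed/mok/MOK.der   # assumed path
```

The one-time password is requested by the boot-time MOK manager on the VM's console rather than over SSH, which could explain why the prompt never appeared after the reboot.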

    For more information, please refer to the documents and forum thread below. If you're already familiar with them, that's great.
    Ubuntu
    Installing the NVIDIA Driver
    How to install nvidia driver with secure boot enabled?

    For the reboot loop with Secure Boot disabled: restarting the VM from the Azure portal can sometimes resolve this. If the reboot loop continues, it could indicate a configuration issue.

    Disclaimer: For third party links Microsoft is providing this information as a convenience to you. Microsoft does not control these sites and has not tested any software or information found on these sites; therefore, Microsoft cannot make any representations regarding the quality, safety, or suitability of any software or information found there

    If you need any further assistance or additional information, please don't hesitate to let us know. We're always ready to help.


  2. Esraa Abdelmaksoud 0 Reputation points
    2025-05-04T06:09:46.1033333+00:00

    Problem Statement:
    Whenever I ran nvidia-smi, I got the message "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running." The error appeared because Secure Boot had left the driver installation corrupted.

    If you try to turn off Secure Boot through the terminal, it isn't actually turned off, and you aren't prompted to re-enter the password.

    Solution:

    1. Turn off Secure Boot from the Azure platform, not the terminal. (Your VM page => Settings => Configuration => Uncheck "Enable secure boot")
    2. Remove the corrupted driver installation.
         sudo apt remove --purge '^nvidia-.*'
         sudo apt autoremove
         sudo apt clean
      
    3. Install the driver after running sudo apt update.
         sudo apt update
         sudo apt install -y nvidia-driver-570
         sudo reboot
      
    4. Run nvidia-smi to see your properly installed driver details.
    +-----------------------------------------------------------------------------------------+
    | NVIDIA-SMI 570.133.07             Driver Version: 570.133.07     CUDA Version: 12.8     |
    |-----------------------------------------+------------------------+----------------------+
    | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
    |                                         |                        |               MIG M. |
    |=========================================+========================+======================|
    |   0  NVIDIA H100 NVL                Off |   00000001:00:00.0 Off |                    0 |
    | N/A   40C    P0             64W /  400W |       0MiB /  95830MiB |      0%      Default |
    |                                         |                        |             Disabled |
    +-----------------------------------------+------------------------+----------------------+
                                                                                             
    +-----------------------------------------------------------------------------------------+
    | Processes:                                                                              |
    |  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
    |        ID   ID                                                               Usage      |
    |=========================================================================================|
    |  No running processes found                                                             |
    +-----------------------------------------------------------------------------------------+
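
If nvidia-smi still reports that it cannot communicate with the driver after these steps, a quick check is whether the kernel module was loaded at all. The following is a minimal sketch, assuming a standard Ubuntu image where `lsmod` is available; the helper name `has_nvidia_module` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if an lsmod listing contains the nvidia module.
# Matching "nvidia " with a trailing space skips nvidia_uvm, nvidia_drm, etc.
has_nvidia_module() {
  printf '%s\n' "$1" | grep -q '^nvidia '
}

if has_nvidia_module "$(lsmod 2>/dev/null)"; then
  echo "nvidia kernel module is loaded"
else
  echo "nvidia kernel module is not loaded"
  # Kernel messages may say why, for example a rejected module signature.
  # Reading them can need root:  sudo dmesg | grep -i nvidia
fi
```

If the module is missing while Secure Boot is still enabled, an unsigned or unenrolled module is one possible cause, consistent with the problem described above.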
    
    
