A node in a pool has been stuck in the rebooting state for the last 4 days

Tanmay Sharma 0 Reputation points
2025-03-24T08:38:46.6766667+00:00

A node (tvmps_8f4c7b22047eb71169466eb201a4437548f517ea99832a76a7dc3a564307646c_d) in a pool called OSAthresholdS_v2 has been stuck since 19 March. We connected to the machine over RDP and saw no data for the task it is trying to run. We also disabled the job that the node had picked up, to rule out a dependency on the node, but the node is still stuck and does not reboot. Deallocating the node from the pool does not work either, which is really frustrating. To save the support engineer's time, here is what we have already tried:

  • Remove the Node from the Pool
  • Check for Configuration Issues
  • Reboot the Node
Azure Batch
An Azure service that provides cloud-scale job scheduling and compute management.

1 answer

  1. Geethasri.V 65 Reputation points Microsoft External Staff
    2025-03-27T13:05:19.7866667+00:00

    Hi Tanmay Sharma,

    Since the node is stuck in the "rebooting" state and cannot be reimaged or deleted due to auto-scaling, try these steps:

    Disable Auto-Scaling Temporarily:

    az batch pool autoscale disable --pool-id OSAthresholdS_v2
    

    After disabling auto-scale, try deleting the node:

    az batch node delete --pool-id OSAthresholdS_v2 --node-list tvmps_8f4c7b22047eb71169466eb201a4437548f517ea99832a76a7dc3a564307646c_d
    

    Once removed, re-enable auto-scaling if needed.
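As a rough sketch, the disable-then-delete sequence above could be wrapped in a small shell function. `remove_stuck_node` is a hypothetical helper name, not an Azure CLI command, and the sketch assumes you have already authenticated against the Batch account (for example with `az batch account login`). It passes the node ID through `--node-list`, which is the parameter `az batch node delete` documents for naming the nodes to remove.

```shell
#!/usr/bin/env bash
# Sketch only: disable autoscale, then ask Batch to remove one stuck node.
# Pool and node IDs are the ones quoted in the question.
POOL_ID="OSAthresholdS_v2"
NODE_ID="tvmps_8f4c7b22047eb71169466eb201a4437548f517ea99832a76a7dc3a564307646c_d"

# Hypothetical helper (not part of the Azure CLI).
remove_stuck_node() {
  local pool_id="$1" node_id="$2" state

  # Print the node's current state (for example "rebooting") before acting.
  state="$(az batch node show --pool-id "$pool_id" --node-id "$node_id" \
    --query state --output tsv)" || return 1
  echo "Node state before removal: $state"

  # The answer above attributes the failed deletion to autoscale, so turn it off first.
  az batch pool autoscale disable --pool-id "$pool_id" || return 1

  # Request removal of the node from the pool.
  az batch node delete --pool-id "$pool_id" --node-list "$node_id" || return 1
}

# Usage (uncomment to run against a real account):
# remove_stuck_node "$POOL_ID" "$NODE_ID"
```

To turn autoscale back on afterwards, `az batch pool autoscale enable --pool-id OSAthresholdS_v2 --auto-scale-formula "<your formula>"` should work, with the formula being whatever the pool used before.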

    Manually Resize the Pool (Alternative to Deletion):

    az batch pool resize --pool-id OSAthresholdS_v2 --target-dedicated-nodes <new_count> 
    

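    This asks Batch to resize the pool, which may remove or replace the problematic node. When shrinking, Batch chooses which nodes to remove, so the stuck node is not guaranteed to be the one that goes.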
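A resize is asynchronous, so it can help to wait until the pool reports a steady allocation state before checking whether the node is gone. The following is a minimal sketch under that assumption; `resize_and_wait` is a hypothetical helper, and the attempt count and delay are arbitrary defaults rather than recommended values.

```shell
#!/usr/bin/env bash
# Sketch only: resize a pool, then poll until allocationState is "steady".

# Hypothetical helper (not part of the Azure CLI).
# Arguments: pool ID, new dedicated node count, max attempts, delay in seconds.
resize_and_wait() {
  local pool_id="$1" new_count="$2" attempts="${3:-30}" delay="${4:-20}" state

  az batch pool resize --pool-id "$pool_id" \
    --target-dedicated-nodes "$new_count" || return 1

  while [ "$attempts" -gt 0 ]; do
    # allocationState is "resizing" while the operation is in progress.
    state="$(az batch pool show --pool-id "$pool_id" \
      --query allocationState --output tsv)" || return 1
    if [ "$state" = "steady" ]; then
      echo "Pool $pool_id is steady"
      return 0
    fi
    attempts=$((attempts - 1))
    sleep "$delay"
  done

  echo "Pool $pool_id did not reach a steady state in time" >&2
  return 1
}

# Usage (uncomment and substitute your own count):
# resize_and_wait "OSAthresholdS_v2" "<new_count>"
```

Once the function returns, `az batch node list --pool-id OSAthresholdS_v2` shows which nodes remain in the pool.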

    Use Azure REST API for Forceful Removal: If CLI deletion still fails, try using the Batch Node Removal API: https://learn.microsoft.com/en-us/rest/api/batchservice/pool/remove-nodes?view=rest-batchservice-2024-07-01&tabs=HTTP
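For illustration, a call to that endpoint might look roughly like the sketch below. Several things here are assumptions to check against the linked page: the account URL is a placeholder, the `api-version` value is my reading of the 2024-07-01 reference, and the token step assumes the Batch account accepts Microsoft Entra ID authentication. `remove_nodes_body` and `remove_nodes_via_rest` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch only: POST to the Batch "Remove Nodes" REST endpoint with curl.

# ASSUMPTION: placeholder endpoint; substitute your Batch account URL.
BATCH_URL="https://<account>.<region>.batch.azure.com"
# ASSUMPTION: API version string; confirm it on the linked reference page.
API_VERSION="2024-07-01.20.0"

# Hypothetical helper: build {"nodeList":["id1","id2"]} from the arguments.
# Assumes node IDs contain no characters that need JSON escaping.
remove_nodes_body() {
  local ids="" id
  for id in "$@"; do
    ids="${ids:+$ids,}\"$id\""
  done
  printf '{"nodeList":[%s]}\n' "$ids"
}

# Hypothetical helper: first argument is the pool ID, the rest are node IDs.
remove_nodes_via_rest() {
  local pool_id="$1" token
  shift
  token="$(az account get-access-token \
    --resource https://batch.core.windows.net/ \
    --query accessToken --output tsv)" || return 1
  curl --fail --silent --show-error --request POST \
    --header "Authorization: Bearer $token" \
    --header "Content-Type: application/json; odata=minimalmetadata" \
    --data "$(remove_nodes_body "$@")" \
    "$BATCH_URL/pools/$pool_id/removenodes?api-version=$API_VERSION"
}

# Usage (uncomment after setting BATCH_URL):
# remove_nodes_via_rest "OSAthresholdS_v2" "<node_id>"
```

This is the same remove-nodes operation the CLI performs, so it is mainly useful for seeing the raw error response if the CLI call keeps failing.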

    Refer to the following document:

    Disable Auto-Scaling: https://learn.microsoft.com/en-us/azure/batch/batch-automatic-scaling#disable-autoscale

    If you have any further queries, please let us know. We are glad to help.

    If it was helpful, please click "Upvote" on this post to let us know.

    Thank You.

