Databricks Cluster Start-up Fails on TEST Workspace – NPIP_TUNNEL_SETUP_FAILURE
Hello,
We are experiencing an issue with our Databricks TEST workspace where clusters fail to start. The error message indicates an NPIP tunnel setup failure.
All our workspaces (TEST, QA, and PROD) have No Public IP (NPIP) enabled and share identical network policies and topology. Each workspace uses private endpoint connections to a private DNS zone located within our corporate network.
The QA and PROD workspaces function without issues. However, the TEST workspace fails during cluster start-up with the following error:
{
  "reason": {
    "code": "NPIP_TUNNEL_SETUP_FAILURE",
    "type": "SERVICE_FAULT",
    "parameters": {
      "databricks_error_message": "NPIP tunnel setup failure during launch. Please try again later and contact Databricks if the problem persists. \nInstance bootstrap failed command: WaitForNgrokTunnel\nFailure message (may be truncated): Timed out waiting for ngrok tunnel to be up"
    }
  },
  "add_node_failure_details": {
    "failure_count": 1,
    "resource_type": "container",
    "will_retry": false
  }
}
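To compare this failure with other launch attempts, the key fields can be extracted from the payload with a short script. This is only a minimal sketch: the function name summarize_termination is our own and not part of any Databricks tooling, and it assumes the payload has the same shape as the one shown above.

```python
import json
import re


def summarize_termination(payload: str) -> dict:
    """Extract the key fields from a termination-reason JSON string.

    Hypothetical helper: assumes the payload has the same layout as the
    error shown in this post.
    """
    data = json.loads(payload)
    reason = data.get("reason", {})
    message = reason.get("parameters", {}).get("databricks_error_message", "")
    # The failing bootstrap step follows this fixed phrase in the message.
    match = re.search(r"Instance bootstrap failed command:\s*(\S+)", message)
    return {
        "code": reason.get("code"),
        "type": reason.get("type"),
        "failed_command": match.group(1) if match else None,
        "will_retry": data.get("add_node_failure_details", {}).get("will_retry"),
    }
```

Run against the payload above, this reports WaitForNgrokTunnel as the failing bootstrap command, which is the step that times out.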
Environment Details:
Region: switzerlandnorth
NSGs: Associated with both public and private Databricks subnets, allowing all required inbound and outbound traffic.
Outbound network access: FULL_ACCESS enabled.
No additional firewall is configured.
Troubleshooting Performed:
Recreated the compute cluster.
Deleted all cluster instances, waited 30 minutes, and then created a new cluster.
Verified NSG rules and configuration profiles.
Compared network and workspace configurations with working QA and PROD environments.
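A further check we can run from a VM inside the TEST workspace VNet is to confirm that the endpoints the cluster needs actually resolve and accept TCP connections on port 443. The sketch below uses only the Python standard library. The function name check_endpoint is our own, and the hostnames to test (for example the workspace URL host and the secure cluster connectivity relay for the region) must be supplied by whoever runs it; we have not included them here. Port 443 is an assumption and can be overridden.

```python
import socket


def check_endpoint(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Resolve a hostname and attempt a TCP connection to it.

    Hypothetical diagnostic helper: it only tests DNS and TCP reachability,
    not TLS or whether the tunnel itself can be established.
    """
    result = {"host": host, "port": port, "resolved": [], "tcp_ok": False, "error": None}
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        # Collect the distinct IP addresses the name resolves to.
        result["resolved"] = sorted({info[4][0] for info in infos})
    except socket.gaierror as exc:
        result["error"] = f"DNS resolution failed: {exc}"
        return result
    try:
        with socket.create_connection((host, port), timeout=timeout):
            result["tcp_ok"] = True
    except OSError as exc:
        result["error"] = f"TCP connect failed: {exc}"
    return result


# Example (placeholder hostname, replace with the real endpoint):
# print(check_endpoint("<workspace-hostname>"))
```

Running the same check from VMs in the QA or PROD VNets and comparing the resolved addresses could show whether the TEST private DNS records differ from the working environments.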
Could you please investigate this issue and advise further?
Best regards, Maros