Query route optimized serving endpoints

This article describes how to fetch the appropriate authentication credentials and URL so you can query your route optimized model serving or feature serving endpoint.

Requirements

  • A model serving endpoint or feature serving endpoint that has route optimization enabled. See Route optimization on serving endpoints.
  • Authentication token. Route optimized endpoints support only OAuth tokens. Personal access tokens (PATs) are not supported.

Fetch the route optimized URL

When you create a route optimized endpoint, a unique route optimized URL is created for your endpoint. Route optimized endpoints can only be queried using their dedicated URL. The format of the URL is as follows:

https://<unique-id>.<shard>.serving.azuredatabricks.net/<workspace-id>/serving-endpoints/<endpoint-name>/invocations

You can get this URL from either of the following:

  • Using the GET /api/2.0/serving-endpoints/{name} API call. The URL is present in the response object of the endpoint as endpoint_url. This field is only populated if the endpoint is route optimized.

  • The serving endpoints details page in the Serving UI.
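
The API option above can be sketched in Python as follows. This is a minimal sketch rather than an official client: the helper names `extract_endpoint_url` and `get_route_optimized_url` are hypothetical, `workspace_host` is assumed to be your workspace hostname without the scheme, and `token` is assumed to be a token that the workspace REST API accepts. Only the `endpoint_url` field name comes from this article.

```python
import json
import urllib.request
from typing import Optional


def extract_endpoint_url(endpoint: dict) -> Optional[str]:
    """Return endpoint_url from a serving endpoint response, or None.

    The field is only populated when the endpoint is route optimized.
    """
    return endpoint.get("endpoint_url") or None


def get_route_optimized_url(
    workspace_host: str, token: str, endpoint_name: str
) -> Optional[str]:
    """Call GET /api/2.0/serving-endpoints/{name} and read endpoint_url."""
    request = urllib.request.Request(
        f"https://{workspace_host}/api/2.0/serving-endpoints/{endpoint_name}",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return extract_endpoint_url(json.load(response))
```

A `None` result suggests the endpoint is not route optimized, in which case the dedicated URL does not apply.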

Route optimized endpoint URL

Fetch an OAuth token and query the endpoint

To query your route optimized endpoint, you must use an OAuth token. Databricks recommends using service principals in your production applications to fetch OAuth tokens programmatically. The following sections describe how to fetch an OAuth token for test and production scenarios.

Fetch an OAuth token using the Serving UI

The following steps show how to fetch a token in the Serving UI. These steps are recommended for developing and testing your endpoint.

For production use, such as calling your route optimized endpoint from an application, fetch the token using a service principal. See Fetch an OAuth token programmatically for the recommended approach.

From the Serving UI of your workspace:

  1. On the Serving endpoints page, select your route optimized endpoint to see endpoint details.
  2. On the endpoint details page, select the Use button.
  3. Select the Fetch Token tab.
  4. Select the Fetch OAuth Token button. This token is valid for 1 hour. Fetch a new token if your current token expires.

After you fetch the OAuth token, query your endpoint using your endpoint URL and OAuth token.

REST API

The following is a REST API example:


URL="<endpoint-url>"
OAUTH_TOKEN="<token>"

curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OAUTH_TOKEN" \
  --data "@data.json" \
  "$URL"
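
The curl command reads its request body from a file named data.json, which this article does not show. As a minimal sketch, assuming your model accepts the same `dataframe_split` payload used in the Python example in this article (the column names and values are placeholders), the file could be produced like this:

```python
import json

# Placeholder payload in dataframe_split format; replace the columns and
# rows with the inputs your own model expects.
payload = {
    "dataframe_split": {
        "columns": ["feature_1", "feature_2"],
        "data": [[0.12, 0.34], [0.56, 0.78], [0.90, 0.11]],
    }
}

# Write the request body to the file that `--data "@data.json"` reads.
with open("data.json", "w", encoding="utf-8") as f:
    json.dump(payload, f, indent=2)
```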

Python

The following is a Python example:


import requests

url = "<endpoint-url>"
oauth_token = "<token>"

data = {
    "dataframe_split": {
        "columns": ["feature_1", "feature_2"],
        "data": [
            [0.12, 0.34],
            [0.56, 0.78],
            [0.90, 0.11]
        ]
    }
}

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {oauth_token}"
}

response = requests.post(url, headers=headers, json=data)

# Print the response
print("Status Code:", response.status_code)
print("Response Body:", response.text)

Fetch an OAuth token programmatically

For production scenarios, Databricks recommends setting up a service principal that your application uses to fetch OAuth tokens programmatically. These tokens are then used to query route optimized endpoints.

Follow the steps in Authorize unattended access to Azure Databricks resources with a service principal using OAuth through step 2 to create your service principal, assign permissions and create an OAuth secret for your service principal. After your service principal is created, you must give the service principal at least Query permission on the endpoint. See Manage permissions on your model serving endpoint.

The Databricks Python SDK provides an API to directly query a route optimized endpoint.

Note

The Databricks SDK is also available in Go, see Databricks SDK for Go.

The example requires the following to query a route optimized endpoint using the Databricks SDK:

  • Serving endpoint name (the SDK fetches the correct endpoint URL based on this name)
  • Service principal client ID
  • Service principal secret
  • Workspace hostname

The following is a query example:

from databricks.sdk import WorkspaceClient
from databricks.sdk.core import Config

endpoint_name = "<Serving-Endpoint-Name>"  # Insert the endpoint name here

# Initialize the Databricks SDK with the service principal's credentials
c = Config(
    host="<Workspace-Host>",      # For example, my-workspace.cloud.databricks.com
    client_id="<Client-Id>",      # Service principal ID
    client_secret="<Secret>",     # Service principal secret
)
w = WorkspaceClient(config=c)

# Replace ... with your input records
response = w.serving_endpoints_data_plane.query(endpoint_name, dataframe_records=...)

Fetch an OAuth token manually

For scenarios where the Databricks SDK or the Serving UI cannot be used to fetch your OAuth token, you can fetch an OAuth token manually. The guidance in this section mainly applies when you have a customized client that you want to use to query the endpoint in production.

When you fetch an OAuth token manually, you must specify authorization_details in the request.

  • Construct the <token-endpoint-URL> as https://<databricks-instance>/oidc/v1/token, replacing https://<databricks-instance> with the workspace URL of your Databricks deployment.
  • Replace <client-id> with the service principal’s client ID, which is also known as an application ID.
  • Replace <client-secret> with the service principal’s OAuth secret that you created.
  • Replace <endpoint-id> with the endpoint ID of the route optimized endpoint. This is the alpha-numeric ID of the endpoint that you can find in the hostName of the endpoint URL.
  • Replace <action> with the action permission given to the service principal. The action can be query_inference_endpoint or manage_inference_endpoint.
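
The substitutions above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the helper names are hypothetical, and `endpoint_id_from_url` assumes that the endpoint ID is the first label of the hostname in the route optimized URL (the `<unique-id>` part of the URL format shown earlier). Building the value with `json.dumps` avoids hand-escaping quotes in the `authorization_details` string.

```python
import json
from urllib.parse import urlparse

# The two actions named in this article.
ALLOWED_ACTIONS = {"query_inference_endpoint", "manage_inference_endpoint"}


def endpoint_id_from_url(endpoint_url: str) -> str:
    """Return the first hostname label, assumed here to be the endpoint ID."""
    hostname = urlparse(endpoint_url).hostname
    if not hostname:
        raise ValueError(f"No hostname found in {endpoint_url!r}")
    return hostname.split(".")[0]


def build_authorization_details(endpoint_id: str, action: str) -> str:
    """Serialize the authorization_details value for the token request."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"Unsupported action: {action!r}")
    details = [
        {
            "type": "workspace_permission",
            "object_type": "serving-endpoints",
            "object_path": f"/serving-endpoints/{endpoint_id}",
            "actions": [action],
        }
    ]
    return json.dumps(details)
```

The returned string can be passed as the `authorization_details` form field of the token request.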

REST API

The following is a REST API example:



export CLIENT_ID=<client-id>
export CLIENT_SECRET=<client-secret>
export ENDPOINT_ID=<endpoint-id>
export ACTION=<action>

curl --request POST \
  --url "<token-endpoint-URL>" \
  --user "$CLIENT_ID:$CLIENT_SECRET" \
  --data 'grant_type=client_credentials&scope=all-apis' \
  --data-urlencode 'authorization_details=[{"type":"workspace_permission","object_type":"serving-endpoints","object_path":"'"/serving-endpoints/$ENDPOINT_ID"'","actions": ["'"$ACTION"'"]}]'

Python

import os
import requests

# Set your environment variables or replace them directly here
CLIENT_ID = os.getenv("CLIENT_ID")
CLIENT_SECRET = os.getenv("CLIENT_SECRET")
ENDPOINT_ID = os.getenv("ENDPOINT_ID")
ACTION = "query_inference_endpoint"  # Can also be `manage_inference_endpoint`

# Token endpoint URL
TOKEN_URL = "<token-endpoint-URL>"

# Build the payload, note the creation of authorization_details
payload = {
    'grant_type': 'client_credentials',
    'scope': 'all-apis',
    'authorization_details': f'''[{{"type":"workspace_permission","object_type":"serving-endpoints","object_path":"/serving-endpoints/{ENDPOINT_ID}","actions":["{ACTION}"]}}]'''
}

# Make the POST request with basic auth
response = requests.post(
    TOKEN_URL,
    auth=(CLIENT_ID, CLIENT_SECRET),
    data=payload
)

# Check the response
if response.ok:
    token_response = response.json()
    access_token = token_response.get("access_token")
    if access_token:
        print(f"Access Token: {access_token}")
    else:
        print("access_token not found in response.")
else:
    print(f"Failed to fetch token: {response.status_code} {response.text}")
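
Because OAuth tokens expire, a long-running client usually needs to fetch a new token from time to time. The following is a minimal sketch of one way to do that, not a Databricks API: `TokenCache` is a hypothetical helper, `fetch` is any function you supply that returns the parsed JSON body of the token response, and the sketch assumes the body carries the standard OAuth 2.0 `expires_in` field (falling back to 3600 seconds, matching the 1 hour lifetime mentioned for tokens fetched in the Serving UI).

```python
import time
from typing import Callable, Optional


class TokenCache:
    """Reuse an access token until shortly before it expires."""

    def __init__(
        self,
        fetch: Callable[[], dict],
        skew_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch          # Returns the token response as a dict
        self._skew = skew_seconds    # Refresh this long before expiry
        self._clock = clock          # Injectable for testing
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._skew:
            body = self._fetch()
            self._token = body["access_token"]
            # Assumption: default to 1 hour when expires_in is absent.
            self._expires_at = now + float(body.get("expires_in", 3600))
        return self._token
```

In use, `fetch` would wrap the token request shown above and return `response.json()`, and each query would call `cache.get()` to build its `Authorization` header.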

After you fetch the OAuth token, query your endpoint using your endpoint URL and OAuth token.

REST API

The following is a REST API example:


URL="<endpoint-url>"
OAUTH_TOKEN="<token>"

curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OAUTH_TOKEN" \
  --data "@data.json" \
  "$URL"

Python

The following is a Python example:


import requests

url = "<endpoint-url>"
oauth_token = "<token>"

data = {
    "dataframe_split": {
        "columns": ["feature_1", "feature_2"],
        "data": [
            [0.12, 0.34],
            [0.56, 0.78],
            [0.90, 0.11]
        ]
    }
}

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {oauth_token}"
}

response = requests.post(url, headers=headers, json=data)

# Print the response
print("Status Code:", response.status_code)
print("Response Body:", response.text)