This article describes the state-of-the-art open models that are supported by the Databricks Foundation Model APIs in pay-per-token mode.
Note
Pay-per-token models are supported only in US regions. See Foundation Model APIs limits.
You can send query requests to these models using the pay-per-token endpoints available in your Databricks workspace. See Use foundation models and pay-per-token supported models table for the names of the model endpoints to use.
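Because pay-per-token endpoints expose an OpenAI-compatible interface, you can query them with the OpenAI Python client. The following is a minimal sketch; the workspace URL placeholder, the token environment variable, and the endpoint name shown here are values you should replace with the ones from your own workspace.

```python
import os
from openai import OpenAI

# Minimal sketch: query a pay-per-token chat endpoint through the
# OpenAI-compatible interface. Replace the placeholder workspace URL,
# the token, and the endpoint name with values from your workspace.
client = OpenAI(
    api_key=os.environ["DATABRICKS_TOKEN"],  # a Databricks personal access token
    base_url="https://<your-workspace>.cloud.databricks.com/serving-endpoints",
)

response = client.chat.completions.create(
    model="databricks-meta-llama-3-3-70b-instruct",  # pay-per-token endpoint name
    messages=[{"role": "user", "content": "What is a mixture-of-experts model?"}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```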
In addition to supporting models in pay-per-token mode, Foundation Model APIs also offers provisioned throughput mode. Databricks recommends provisioned throughput for production workloads. This mode supports all models of a model architecture family (for example, DBRX models), including fine-tuned and custom pre-trained models that are not available in pay-per-token mode. See Provisioned throughput Foundation Model APIs for the list of supported architectures.
You can interact with these supported models using the AI Playground.
Meta Llama 4 Maverick
Important
See Applicable model developer licenses and terms for the Llama 4 Community License and Acceptable Use Policy.
Llama 4 Maverick is a state-of-the-art large language model built and trained by Meta. It is the first model in the Llama family to use a mixture-of-experts architecture for compute efficiency. Llama 4 Maverick supports multiple languages and is optimized for precise image and text understanding use cases. Currently, Databricks support for Llama 4 Maverick is limited to text understanding use cases. Learn more about Llama 4 Maverick.
As with other large language models, Llama 4 output may omit some facts and occasionally produce false information. Databricks recommends using retrieval augmented generation (RAG) in scenarios where accuracy is especially important.
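As an illustration of the RAG pattern, the sketch below shows one common shape: passages retrieved by your own retriever (for example, a vector index over your documents) are placed into the prompt so the model answers from that context rather than from memory alone. The retrieved chunks here are hypothetical.

```python
# Hypothetical retrieved passages; in practice these come from your retriever
# (for example, a vector index over your documents).
retrieved_chunks = [
    "Foundation Model APIs serve curated open models on pay-per-token endpoints.",
    "Provisioned throughput is recommended for production workloads.",
]

# Ground the model in the retrieved context via the system message.
context = "\n\n".join(retrieved_chunks)
messages = [
    {
        "role": "system",
        "content": f"Answer using only the following context:\n\n{context}",
    },
    {"role": "user", "content": "Which mode is recommended for production?"},
]
# Send `messages` to a chat endpoint as shown in the querying example above.
```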
Meta Llama 3.3 70B Instruct
Important
Starting December 11, 2024, Meta-Llama-3.3-70B-Instruct replaces support for Meta-Llama-3.1-70B-Instruct in Foundation Model APIs pay-per-token endpoints.
See Applicable model developer licenses and terms for the Llama 3.3 Community License and Acceptable Use Policy.
Meta-Llama-3.3-70B-Instruct is a state-of-the-art large language model with a context of 128,000 tokens that was built and trained by Meta. The model supports multiple languages and is optimized for dialogue use cases. Learn more about Meta Llama 3.3.
Similar to other large language models, Llama 3.3's output may omit some facts and occasionally produce false information. Databricks recommends using retrieval augmented generation (RAG) in scenarios where accuracy is especially important.
Meta Llama 3.1 405B Instruct
Important
The use of this model with Foundation Model APIs is in Public Preview. Reach out to your Databricks account team if you encounter endpoint failures or stabilization errors when using this model.
See Applicable model developer licenses and terms for the Llama 3.1 Community License and Acceptable Use Policy.
Meta-Llama-3.1-405B-Instruct is the largest openly available state-of-the-art large language model, built and trained by Meta, and is distributed by Azure Machine Learning using the AzureML Model Catalog. The use of this model enables customers to unlock new capabilities, such as advanced, multi-step reasoning and high-quality synthetic data generation. This model is competitive with GPT-4-Turbo in terms of quality.
Like Meta-Llama-3.1-70B-Instruct, this model has a context of 128,000 tokens and supports ten languages. It aligns with human preferences for helpfulness and safety, and is optimized for dialogue use cases. Learn more about the Meta Llama 3.1 models.
Similar to other large language models, Llama-3.1’s output may omit some facts and occasionally produce false information. Databricks recommends using retrieval augmented generation (RAG) in scenarios where accuracy is especially important.
Meta Llama 3.1 8B Instruct
Important
See Applicable model developer licenses and terms for the Llama 3.1 Community License and Acceptable Use Policy.
Meta-Llama-3.1-8B-Instruct is a state-of-the-art large language model with a context of 128,000 tokens that was built and trained by Meta. The model supports multiple languages and is optimized for dialogue use cases. Learn more about Meta Llama 3.1.
Similar to other large language models, Llama 3.1's output may omit some facts and occasionally produce false information. Databricks recommends using retrieval augmented generation (RAG) in scenarios where accuracy is especially important.
Anthropic Claude 3.7 Sonnet
Important
Customers are responsible for ensuring their compliance with the terms of Anthropic's Acceptable Use Policy. See also the Databricks Master Cloud Services Agreement.
Claude 3.7 Sonnet is a state-of-the-art hybrid reasoning model built and trained by Anthropic. It can respond rapidly or extend its reasoning depending on the complexity of the task. When in extended thinking mode, Claude 3.7 Sonnet's reasoning steps are visible to the user. Claude 3.7 Sonnet is optimized for a variety of tasks, such as code generation, mathematical reasoning, and instruction following.
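As a sketch of extended thinking mode, a request might look like the following. This assumes the pay-per-token endpoint name `databricks-claude-3-7-sonnet` and that the endpoint passes Anthropic's `thinking` parameter through; verify both against your workspace.

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DATABRICKS_TOKEN"],
    base_url="https://<your-workspace>.cloud.databricks.com/serving-endpoints",
)

# Sketch: enable extended thinking with a budget for reasoning tokens.
# The `thinking` block is Anthropic's parameter, forwarded via `extra_body`;
# the endpoint name and parameter passthrough are assumptions to verify.
response = client.chat.completions.create(
    model="databricks-claude-3-7-sonnet",
    messages=[{"role": "user", "content": "How many primes are there below 100?"}],
    max_tokens=20480,
    extra_body={"thinking": {"type": "enabled", "budget_tokens": 10240}},
)
print(response.choices[0].message.content)
```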
As with other large language models, Claude 3.7 output may omit some facts and occasionally produce false information. Databricks recommends using retrieval augmented generation (RAG) in scenarios where accuracy is especially important.
This endpoint is hosted by Databricks Inc. in AWS within the Databricks security perimeter.
GTE Large (En)
Important
GTE Large (En) is provided under and subject to the Apache 2.0 License, Copyright (c) The Apache Software Foundation, All rights reserved. Customers are responsible for ensuring compliance with applicable model licenses.
General Text Embedding (GTE) is a text embedding model that maps any text to a 1024-dimension embedding vector and has an embedding window of 8192 tokens. These vectors can be used in vector indexes for LLMs and for tasks like retrieval, classification, question answering, clustering, and semantic search. This endpoint serves the English version of the model and does not generate normalized embeddings.
Embedding models are especially effective when used in tandem with LLMs for retrieval augmented generation (RAG) use cases. GTE can be used to find relevant text snippets in large chunks of documents that can be used in the context of an LLM.
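As a minimal sketch of a retrieval step with GTE (assuming the pay-per-token endpoint name `databricks-gte-large-en`), the example below embeds a query and a few candidate snippets, then ranks the snippets by cosine similarity. Because this endpoint does not normalize its embeddings, the cosine computation divides by the vector norms explicitly.

```python
import math
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DATABRICKS_TOKEN"],
    base_url="https://<your-workspace>.cloud.databricks.com/serving-endpoints",
)

snippets = [
    "Vector indexes store embeddings for fast similarity search.",
    "Provisioned throughput is recommended for production workloads.",
]
query = "How are embeddings searched efficiently?"

# Embed the query and the snippets in one call; endpoint name is assumed.
response = client.embeddings.create(
    model="databricks-gte-large-en",
    input=[query] + snippets,
)
vectors = [item.embedding for item in response.data]  # 1024-dim, not normalized

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Rank snippets against the query; the best match would feed an LLM's context.
scores = [cosine(vectors[0], v) for v in vectors[1:]]
best = max(range(len(snippets)), key=lambda i: scores[i])
print(snippets[best])
```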
BGE Large (En)
BAAI General Embedding (BGE) is a text embedding model that maps any text to a 1024-dimension embedding vector and has an embedding window of 512 tokens. These vectors can be used in vector indexes for LLMs and for tasks like retrieval, classification, question answering, clustering, and semantic search. This endpoint serves the English version of the model and generates normalized embeddings.
Embedding models are especially effective when used in tandem with LLMs for retrieval augmented generation (RAG) use cases. BGE can be used to find relevant text snippets in large chunks of documents that can be used in the context of an LLM.
In RAG applications, you may be able to improve the performance of your retrieval system by including an instruction parameter. The BGE authors recommend trying the instruction "Represent this sentence for searching relevant passages:" for query embeddings, though its performance impact is domain dependent.
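A minimal sketch of this instruction pattern follows, assuming the pay-per-token endpoint name `databricks-bge-large-en` (verify it in your workspace). Note that the instruction is prepended to the query text only, not to the documents you index.

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DATABRICKS_TOKEN"],
    base_url="https://<your-workspace>.cloud.databricks.com/serving-endpoints",
)

# The instruction is prepended to queries only; documents are embedded as-is.
INSTRUCTION = "Represent this sentence for searching relevant passages: "

query_vector = client.embeddings.create(
    model="databricks-bge-large-en",  # assumed endpoint name
    input=INSTRUCTION + "How do I create a vector index?",
).data[0].embedding

doc_vector = client.embeddings.create(
    model="databricks-bge-large-en",
    input="Vector indexes store embeddings for fast similarity search.",
).data[0].embedding

# BGE embeddings from this endpoint are normalized, so a plain dot product
# gives cosine similarity directly.
similarity = sum(x * y for x, y in zip(query_vector, doc_vector))
print(similarity)
```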