Microsoft.Extensions.AI libraries (Preview)

.NET developers need a way to integrate and interact with a growing variety of artificial intelligence (AI) services in their apps. The Microsoft.Extensions.AI libraries provide a unified approach for representing generative AI components and enable seamless integration and interoperability with various AI services. This article introduces the libraries and provides in-depth usage examples to help you get started.

The packages

The 📦 Microsoft.Extensions.AI.Abstractions package provides the core exchange types: IChatClient and IEmbeddingGenerator<TInput,TEmbedding>. Any .NET library that provides an AI client can implement the IChatClient interface to enable seamless integration with consuming code.

The 📦 Microsoft.Extensions.AI package has an implicit dependency on the Microsoft.Extensions.AI.Abstractions package. This package enables you to easily integrate components such as telemetry and caching into your applications using familiar dependency injection and middleware patterns. For example, it provides the UseOpenTelemetry(ChatClientBuilder, ILoggerFactory, String, Action<OpenTelemetryChatClient>) extension method, which adds OpenTelemetry support to the chat client pipeline.

Which package to reference

Libraries that provide implementations of the abstractions typically reference only Microsoft.Extensions.AI.Abstractions.

To also have access to higher-level utilities for working with generative AI components, reference the Microsoft.Extensions.AI package instead (which itself references Microsoft.Extensions.AI.Abstractions). Most consuming applications and services should reference the Microsoft.Extensions.AI package along with one or more libraries that provide concrete implementations of the abstractions.

Install the packages

For information about how to install NuGet packages, see dotnet package add or Manage package dependencies in .NET applications.
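
For example, to add the top-level package from the command line, you can use a command like the following (add the 📦 Microsoft.Extensions.AI.Abstractions package instead if you only need the exchange types; the --prerelease flag is required while the libraries are in preview):

dotnet add package Microsoft.Extensions.AI --prerelease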

API usage examples

The following sections show specific IChatClient and IEmbeddingGenerator usage examples.

The IChatClient interface

The IChatClient interface defines a client abstraction responsible for interacting with AI services that provide chat capabilities. It includes methods for sending and receiving messages with multi-modal content (such as text, images, and audio), either as a complete set or streamed incrementally. Additionally, it allows for retrieving strongly typed services provided by the client or its underlying services.

.NET libraries that provide clients for language models and services can provide an implementation of the IChatClient interface. Any consumers of the interface are then able to interoperate seamlessly with these models and services via the abstractions.
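
Later snippets in this article refer to a SampleChatClient. That type isn't part of the libraries; the following is a minimal sketch of what such an implementation might look like, with canned responses standing in for calls to a real service (the endpoint and model ID are accepted but unused here):

using Microsoft.Extensions.AI;
using System.Runtime.CompilerServices;

public sealed class SampleChatClient(Uri endpoint, string modelId) : IChatClient
{
    public async Task<ChatResponse> GetResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        // Simulate some async operation, then return a canned reply.
        await Task.Delay(100, cancellationToken);
        return new ChatResponse(
            new ChatMessage(ChatRole.Assistant, "This is a sample response."));
    }

    public async IAsyncEnumerable<ChatResponseUpdate> GetStreamingResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        // Stream the canned reply back one word at a time.
        foreach (string word in "This is a sample response.".Split(' '))
        {
            await Task.Delay(50, cancellationToken);
            yield return new ChatResponseUpdate(ChatRole.Assistant, word + " ");
        }
    }

    public object? GetService(Type serviceType, object? serviceKey) =>
        serviceKey is null && serviceType.IsInstanceOfType(this) ? this : null;

    void IDisposable.Dispose() { }
}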

Request a chat response

With an instance of IChatClient, you can call the IChatClient.GetResponseAsync method to send a request and get a response. The request is composed of one or more messages, each of which is composed of one or more pieces of content. Accelerator methods exist to simplify common cases, such as constructing a request for a single piece of text content.

using Microsoft.Extensions.AI;

IChatClient client = new SampleChatClient(
    new Uri("http://coolsite.ai"), "target-ai-model");

Console.WriteLine(await client.GetResponseAsync("What is AI?"));

The core IChatClient.GetResponseAsync method accepts a list of messages. This list represents the history of all messages that are part of the conversation.

Console.WriteLine(await client.GetResponseAsync(
[
    new(ChatRole.System, "You are a helpful AI assistant"),
    new(ChatRole.User, "What is AI?"),
]));

The ChatResponse that's returned from GetResponseAsync exposes a list of ChatMessage instances that represent one or more messages generated as part of the operation. In common cases, there is only one response message, but in some situations, there can be multiple messages. The message list is ordered, such that the last message in the list represents the final response to the request. To provide all of those response messages back to the service in a subsequent request, you can add the messages from the response back into the messages list.

List<ChatMessage> history = [];
while (true)
{
    Console.Write("Q: ");
    history.Add(new(ChatRole.User, Console.ReadLine()));

    ChatResponse response = await client.GetResponseAsync(history);
    Console.WriteLine(response);

    history.AddMessages(response);
}

Request a streaming chat response

The inputs to IChatClient.GetStreamingResponseAsync are identical to those of GetResponseAsync. However, rather than returning the complete response as part of a ChatResponse object, the method returns an IAsyncEnumerable<T> where T is ChatResponseUpdate, providing a stream of updates that collectively form the single response.

await foreach (ChatResponseUpdate update in client.GetStreamingResponseAsync("What is AI?"))
{
    Console.Write(update);
}

Tip

Streaming APIs are nearly synonymous with AI user experiences. C# enables compelling scenarios with its IAsyncEnumerable<T> support, allowing for a natural and efficient way to stream data.

As with GetResponseAsync, you can add the updates from IChatClient.GetStreamingResponseAsync back into the messages list. Because the updates are individual pieces of a response, you can use helpers like ToChatResponse(IEnumerable<ChatResponseUpdate>) to compose one or more updates back into a single ChatResponse instance.

Helpers like AddMessages compose a ChatResponse and then extract the composed messages from the response and add them to a list.

List<ChatMessage> chatHistory = [];
while (true)
{
    Console.Write("Q: ");
    chatHistory.Add(new(ChatRole.User, Console.ReadLine()));

    List<ChatResponseUpdate> updates = [];
    await foreach (ChatResponseUpdate update in
        client.GetStreamingResponseAsync(chatHistory))
    {
        Console.Write(update);
        updates.Add(update);
    }
    Console.WriteLine();

    chatHistory.AddMessages(updates);
}

Tool calling

Some models and services support tool calling. To allow the model to gather additional information, you can configure the ChatOptions with information about tools (usually .NET methods) that the model can request the client to invoke. Instead of sending a final response, the model requests a function invocation with specific arguments. The client then invokes the function and sends the results back to the model along with the conversation history. The Microsoft.Extensions.AI library includes abstractions for various message content types, including function call requests and results. While IChatClient consumers can interact with this content directly, Microsoft.Extensions.AI can automate these interactions. It provides the following types:

  • AIFunction: Represents a function that can be described to an AI model and invoked.
  • AIFunctionFactory: Provides factory methods for creating AIFunction instances that represent .NET methods.
  • FunctionInvokingChatClient: Wraps an IChatClient to add automatic function-invocation capabilities.

The following example demonstrates function invocation with a tool that returns a random weather forecast (this example depends on the 📦 Microsoft.Extensions.AI.Ollama NuGet package):

using Microsoft.Extensions.AI;

string GetCurrentWeather() => Random.Shared.NextDouble() > 0.5 ? "It's sunny" : "It's raining";

IChatClient client = new OllamaChatClient(new Uri("http://localhost:11434"), "llama3.1")
    .AsBuilder()
    .UseFunctionInvocation()
    .Build();

ChatOptions options = new() { Tools = [AIFunctionFactory.Create(GetCurrentWeather)] };

var response = client.GetStreamingResponseAsync("Should I wear a rain coat?", options);
await foreach (var update in response)
{
    Console.Write(update);
}

The preceding code:

  • Defines a function named GetCurrentWeather that returns a random weather forecast.
  • Instantiates a ChatClientBuilder with an OllamaChatClient and configures it to use function invocation.
  • Calls GetStreamingResponseAsync on the client, passing a prompt and a list of tools that includes a function created with Create.
  • Iterates over the response, printing each update to the console.

Cache responses

If you're familiar with Caching in .NET, it's good to know that Microsoft.Extensions.AI provides other such delegating IChatClient implementations. The DistributedCachingChatClient is an IChatClient that layers caching around another arbitrary IChatClient instance. When a chat history that hasn't been seen before is submitted to the DistributedCachingChatClient, it forwards the history to the underlying client and then caches the response before sending it back to the consumer. The next time the same history is submitted and a cached response is found, the DistributedCachingChatClient returns the cached response rather than forwarding the request along the pipeline.

using Microsoft.Extensions.AI;
using Microsoft.Extensions.Caching.Distributed;
using Microsoft.Extensions.Caching.Memory;
using Microsoft.Extensions.Options;

var sampleChatClient = new OllamaChatClient(new Uri("http://localhost:11434"), "llama3.1");

IChatClient client = new ChatClientBuilder(sampleChatClient)
    .UseDistributedCache(new MemoryDistributedCache(
        Options.Create(new MemoryDistributedCacheOptions())))
    .Build();

string[] prompts = ["What is AI?", "What is .NET?", "What is AI?"];

foreach (var prompt in prompts)
{
    await foreach (var update in client.GetStreamingResponseAsync(prompt))
    {
        Console.Write(update);
    }
    Console.WriteLine();
}

This example depends on the 📦 Microsoft.Extensions.Caching.Memory NuGet package. For more information, see Caching in .NET.

Use telemetry

Another example of a delegating chat client is the OpenTelemetryChatClient. This implementation adheres to the OpenTelemetry Semantic Conventions for Generative AI systems. Similar to other IChatClient delegators, it layers metrics and spans around other arbitrary IChatClient implementations.

using Microsoft.Extensions.AI;
using OpenTelemetry.Trace;

// Configure OpenTelemetry exporter.
string sourceName = Guid.NewGuid().ToString();
TracerProvider tracerProvider = OpenTelemetry.Sdk.CreateTracerProviderBuilder()
    .AddSource(sourceName)
    .AddConsoleExporter()
    .Build();

var sampleChatClient = new SampleChatClient(
    new Uri("http://coolsite.ai"), "target-ai-model");

IChatClient client = new ChatClientBuilder(sampleChatClient)
    .UseOpenTelemetry(
        sourceName: sourceName,
        configure: c => c.EnableSensitiveData = true)
    .Build();

Console.WriteLine((await client.GetResponseAsync("What is AI?")).Text);

(The preceding example depends on the 📦 OpenTelemetry.Exporter.Console NuGet package.)

Alternatively, the LoggingChatClient and corresponding UseLogging(ChatClientBuilder, ILoggerFactory, Action<LoggingChatClient>) method provide a simple way to write log entries to an ILogger for every request and response.
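
For example, the following sketch wires a console logger into the pipeline (it assumes the Microsoft.Extensions.Logging and Microsoft.Extensions.Logging.Console packages, in addition to the Ollama client used elsewhere in this article):

using Microsoft.Extensions.AI;
using Microsoft.Extensions.Logging;

// Create a logger factory that writes to the console.
using ILoggerFactory loggerFactory = LoggerFactory.Create(b => b.AddConsole());

IChatClient client = new OllamaChatClient(new Uri("http://localhost:11434"), "llama3.1")
    .AsBuilder()
    .UseLogging(loggerFactory)
    .Build();

// Every request and response is written to the ILogger.
Console.WriteLine(await client.GetResponseAsync("What is AI?"));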

Provide options

Every call to GetResponseAsync or GetStreamingResponseAsync can optionally supply a ChatOptions instance containing additional parameters for the operation. The most common parameters among AI models and services show up as strongly typed properties on the type, such as ChatOptions.Temperature. Other parameters can be supplied by name in a weakly typed manner, via the ChatOptions.AdditionalProperties dictionary.
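
For example, the following sketch sets the strongly typed Temperature property alongside a weakly typed entry ("seed" is an illustrative, provider-specific key, not a property of the abstractions):

ChatOptions options = new()
{
    Temperature = 0.5f,
    // Hypothetical provider-specific parameter, passed through by name.
    AdditionalProperties = new() { ["seed"] = 42 },
};

Console.WriteLine(await client.GetResponseAsync("What is AI?", options));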

You can also specify options when building an IChatClient with the fluent ChatClientBuilder API by chaining a call to the ConfigureOptions(ChatClientBuilder, Action<ChatOptions>) extension method. This delegating client wraps another client and invokes the supplied delegate to populate a ChatOptions instance for every call. For example, to ensure that the ChatOptions.ModelId property defaults to a particular model name, you can use code like the following:

using Microsoft.Extensions.AI;

IChatClient client = new OllamaChatClient(new Uri("http://localhost:11434"))
    .AsBuilder()
    .ConfigureOptions(options => options.ModelId ??= "phi3")
    .Build();

// Will request "phi3".
Console.WriteLine(await client.GetResponseAsync("What is AI?"));
// Will request "llama3.1".
Console.WriteLine(await client.GetResponseAsync("What is AI?", new() { ModelId = "llama3.1" }));

Functionality pipelines

IChatClient instances can be layered to create a pipeline of components that each add additional functionality. These components can come from Microsoft.Extensions.AI, other NuGet packages, or custom implementations. This approach allows you to augment the behavior of the IChatClient in various ways to meet your specific needs. Consider the following code snippet that layers a distributed cache, function invocation, and OpenTelemetry tracing around a sample chat client:

// Explore changing the order of the intermediate "Use" calls.
IChatClient client = new ChatClientBuilder(new OllamaChatClient(new Uri("http://localhost:11434"), "llama3.1"))
    .UseDistributedCache(new MemoryDistributedCache(Options.Create(new MemoryDistributedCacheOptions())))
    .UseFunctionInvocation()
    .UseOpenTelemetry(sourceName: sourceName, configure: c => c.EnableSensitiveData = true)
    .Build();

Custom IChatClient middleware

To add additional functionality, you can implement IChatClient directly or use the DelegatingChatClient class. This class serves as a base for creating chat clients that delegate operations to another IChatClient instance. It simplifies chaining multiple clients, allowing calls to pass through to an underlying client.

The DelegatingChatClient class provides default implementations for methods like GetResponseAsync, GetStreamingResponseAsync, and Dispose, which forward calls to the inner client. A derived class can then override only the methods it needs to augment the behavior, while delegating other calls to the base implementation. This approach is useful for creating flexible and modular chat clients that are easy to extend and compose.

The following is an example class derived from DelegatingChatClient that uses the System.Threading.RateLimiting library to provide rate-limiting functionality.

using Microsoft.Extensions.AI;
using System.Runtime.CompilerServices;
using System.Threading.RateLimiting;

public sealed class RateLimitingChatClient(
    IChatClient innerClient, RateLimiter rateLimiter)
        : DelegatingChatClient(innerClient)
{
    public override async Task<ChatResponse> GetResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        using var lease = await rateLimiter.AcquireAsync(permitCount: 1, cancellationToken)
            .ConfigureAwait(false);
        if (!lease.IsAcquired)
            throw new InvalidOperationException("Unable to acquire lease.");

        return await base.GetResponseAsync(messages, options, cancellationToken)
            .ConfigureAwait(false);
    }

    public override async IAsyncEnumerable<ChatResponseUpdate> GetStreamingResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        using var lease = await rateLimiter.AcquireAsync(permitCount: 1, cancellationToken)
            .ConfigureAwait(false);
        if (!lease.IsAcquired)
            throw new InvalidOperationException("Unable to acquire lease.");

        await foreach (var update in base.GetStreamingResponseAsync(messages, options, cancellationToken)
            .ConfigureAwait(false))
        {
            yield return update;
        }
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing)
            rateLimiter.Dispose();

        base.Dispose(disposing);
    }
}

As with other IChatClient implementations, the RateLimitingChatClient can be composed:

using Microsoft.Extensions.AI;
using System.Threading.RateLimiting;

var client = new RateLimitingChatClient(
    new OllamaChatClient(new Uri("http://localhost:11434"), "llama3.1"),
    new ConcurrencyLimiter(new() { PermitLimit = 1, QueueLimit = int.MaxValue }));

Console.WriteLine(await client.GetResponseAsync("What color is the sky?"));

To simplify the composition of such components with others, component authors should create a Use* extension method for registering the component into a pipeline. For example, consider the following UseRateLimiting extension method:

using Microsoft.Extensions.AI;
using System.Threading.RateLimiting;

public static class RateLimitingChatClientExtensions
{
    public static ChatClientBuilder UseRateLimiting(
        this ChatClientBuilder builder,
        RateLimiter rateLimiter) =>
        builder.Use(innerClient =>
            new RateLimitingChatClient(innerClient, rateLimiter)
        );
}

Such extensions can also query for relevant services from the DI container; the IServiceProvider used by the pipeline is passed in as an optional parameter:

using Microsoft.Extensions.AI;
using Microsoft.Extensions.DependencyInjection;
using System.Threading.RateLimiting;

public static class RateLimitingChatClientExtensions
{
    public static ChatClientBuilder UseRateLimiting(
        this ChatClientBuilder builder,
        RateLimiter? rateLimiter = null) =>
        builder.Use((innerClient, services) =>
            new RateLimitingChatClient(
                innerClient,
                rateLimiter ?? services.GetRequiredService<RateLimiter>())
        );
}

Now it's easy for the consumer to use this in their pipeline, for example:

HostApplicationBuilder builder = Host.CreateApplicationBuilder(args);

builder.Services.AddChatClient(services =>
    new SampleChatClient(new Uri("http://localhost"), "test")
        .AsBuilder()
        .UseDistributedCache()
        .UseRateLimiting()
        .UseOpenTelemetry()
        .Build(services));

The previous extension methods demonstrate using a Use method on ChatClientBuilder. ChatClientBuilder also provides Use overloads that make it easier to write such delegating handlers. For example, in the earlier RateLimitingChatClient example, the overrides of GetResponseAsync and GetStreamingResponseAsync only need to do work before and after delegating to the next client in the pipeline. To achieve the same thing without writing a custom class, you can use an overload of Use that accepts a delegate that's used for both GetResponseAsync and GetStreamingResponseAsync, reducing the boilerplate required:

using Microsoft.Extensions.AI;
using System.Threading.RateLimiting;

RateLimiter rateLimiter = new ConcurrencyLimiter(new()
{
    PermitLimit = 1,
    QueueLimit = int.MaxValue
});

IChatClient client = new OllamaChatClient(new Uri("http://localhost:11434"), "llama3.1")
    .AsBuilder()
    .UseDistributedCache()
    .Use(async (messages, options, nextAsync, cancellationToken) =>
    {
        using var lease = await rateLimiter.AcquireAsync(permitCount: 1, cancellationToken).ConfigureAwait(false);
        if (!lease.IsAcquired)
            throw new InvalidOperationException("Unable to acquire lease.");

        await nextAsync(messages, options, cancellationToken);
    })
    .UseOpenTelemetry()
    .Build();

For scenarios where you need a different implementation for GetResponseAsync and GetStreamingResponseAsync in order to handle their unique return types, you can use the Use(Func<IEnumerable<ChatMessage>, ChatOptions, IChatClient, CancellationToken, Task<ChatResponse>>, Func<IEnumerable<ChatMessage>, ChatOptions, IChatClient, CancellationToken, IAsyncEnumerable<ChatResponseUpdate>>) overload that accepts a delegate for each.
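
For example, the following sketch supplies a separate handler for each path; both simply write a marker and then delegate to the inner client:

using Microsoft.Extensions.AI;

IChatClient client = new OllamaChatClient(new Uri("http://localhost:11434"), "llama3.1")
    .AsBuilder()
    .Use(
        (messages, options, innerClient, cancellationToken) =>
        {
            // Invoked only for GetResponseAsync calls.
            Console.WriteLine("Non-streaming request");
            return innerClient.GetResponseAsync(messages, options, cancellationToken);
        },
        (messages, options, innerClient, cancellationToken) =>
        {
            // Invoked only for GetStreamingResponseAsync calls.
            Console.WriteLine("Streaming request");
            return innerClient.GetStreamingResponseAsync(messages, options, cancellationToken);
        })
    .Build();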

Dependency injection

IChatClient implementations are often provided to an application via dependency injection (DI). In this example, an IDistributedCache is added into the DI container, as is an IChatClient. The registration for the IChatClient uses a builder that creates a pipeline containing a caching client (which then uses an IDistributedCache retrieved from DI) and the sample client. The injected IChatClient can be retrieved and used elsewhere in the app.

using Microsoft.Extensions.AI;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

// App setup.
var builder = Host.CreateApplicationBuilder();
builder.Services.AddDistributedMemoryCache();
builder.Services.AddChatClient(new OllamaChatClient(new Uri("http://localhost:11434"), "llama3.1"))
    .UseDistributedCache();
var host = builder.Build();

// Elsewhere in the app.
var chatClient = host.Services.GetRequiredService<IChatClient>();
Console.WriteLine(await chatClient.GetResponseAsync("What is AI?"));

What instance and configuration is injected can differ based on the current needs of the application, and multiple pipelines can be injected with different keys.
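
For example, continuing the previous example's builder and host, a sketch of keyed registration might look like the following (the "fast" and "accurate" keys and model names are illustrative):

builder.Services.AddKeyedChatClient("fast",
    new OllamaChatClient(new Uri("http://localhost:11434"), "llama3.1"));
builder.Services.AddKeyedChatClient("accurate",
    new OllamaChatClient(new Uri("http://localhost:11434"), "llama3.3"));

// Elsewhere in the app.
var fastClient = host.Services.GetRequiredKeyedService<IChatClient>("fast");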

Stateless vs. stateful clients

Stateless services require all relevant conversation history to be sent back on every request. In contrast, stateful services keep track of the history and require only additional messages to be sent with a request. The IChatClient interface is designed to handle both stateless and stateful AI services.

When working with a stateless service, callers maintain a list of all messages. They add in all received response messages and provide the list back on subsequent interactions.

List<ChatMessage> history = [];
while (true)
{
    Console.Write("Q: ");
    history.Add(new(ChatRole.User, Console.ReadLine()));

    var response = await client.GetResponseAsync(history);
    Console.WriteLine(response);

    history.AddMessages(response);
}

For stateful services, you might already know the identifier used for the relevant conversation. You can put that identifier into ChatOptions.ChatThreadId. Usage then follows the same pattern, except there's no need to maintain a history manually.

ChatOptions statefulOptions = new() { ChatThreadId = "my-conversation-id" };
while (true)
{
    Console.Write("Q: ");
    ChatMessage message = new(ChatRole.User, Console.ReadLine());

    Console.WriteLine(await client.GetResponseAsync(message, statefulOptions));
}

Some services might support automatically creating a thread ID for a request that doesn't have one. In such cases, you can transfer the ChatResponse.ChatThreadId over to the ChatOptions.ChatThreadId for subsequent requests. For example:

ChatOptions options = new();
while (true)
{
    Console.Write("Q: ");
    ChatMessage message = new(ChatRole.User, Console.ReadLine());

    ChatResponse response = await client.GetResponseAsync(message, options);
    Console.WriteLine(response);

    options.ChatThreadId = response.ChatThreadId;
}

If you don't know ahead of time whether the service is stateless or stateful, you can check the response ChatThreadId and act based on its value. If it's set, then that value is propagated to the options and the history is cleared so as to not resend the same history again. If the response ChatThreadId isn't set, then the response message is added to the history so that it's sent back to the service on the next turn.

List<ChatMessage> chatHistory = [];
ChatOptions chatOptions = new();
while (true)
{
    Console.Write("Q: ");
    chatHistory.Add(new(ChatRole.User, Console.ReadLine()));

    ChatResponse response = await client.GetResponseAsync(chatHistory);
    Console.WriteLine(response);

    chatOptions.ChatThreadId = response.ChatThreadId;
    if (response.ChatThreadId is not null)
    {
        chatHistory.Clear();
    }
    else
    {
        chatHistory.AddMessages(response);
    }
}

The IEmbeddingGenerator interface

The IEmbeddingGenerator<TInput,TEmbedding> interface represents a generic generator of embeddings. Here, TInput is the type of input values being embedded, and TEmbedding is the type of generated embedding, which inherits from the Embedding class.

The Embedding class serves as a base class for embeddings generated by an IEmbeddingGenerator. It's designed to store and manage the metadata and data associated with embeddings. Derived types, like Embedding<T>, provide the concrete embedding vector data. For example, an Embedding<float> exposes a ReadOnlyMemory<float> Vector { get; } property for access to its embedding data.

The IEmbeddingGenerator interface defines a method to asynchronously generate embeddings for a collection of input values, with optional configuration and cancellation support. It also provides metadata describing the generator and allows for the retrieval of strongly typed services that can be provided by the generator or its underlying services.

Sample implementation

The following sample implementation of IEmbeddingGenerator shows the general structure.

using Microsoft.Extensions.AI;

public sealed class SampleEmbeddingGenerator(
    Uri endpoint, string modelId)
        : IEmbeddingGenerator<string, Embedding<float>>
{
    private readonly EmbeddingGeneratorMetadata _metadata =
        new("SampleEmbeddingGenerator", endpoint, modelId);

    public async Task<GeneratedEmbeddings<Embedding<float>>> GenerateAsync(
        IEnumerable<string> values,
        EmbeddingGenerationOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        // Simulate some async operation.
        await Task.Delay(100, cancellationToken);

        // Create random embeddings.
        return new GeneratedEmbeddings<Embedding<float>>(
            from value in values
            select new Embedding<float>(
                Enumerable.Range(0, 384).Select(_ => Random.Shared.NextSingle()).ToArray()));
    }

    public object? GetService(Type serviceType, object? serviceKey) =>
        serviceKey is not null
        ? null
        : serviceType == typeof(EmbeddingGeneratorMetadata)
            ? _metadata
            : serviceType?.IsInstanceOfType(this) is true
                ? this
                : null;

    void IDisposable.Dispose() { }
}

The preceding code:

  • Defines a class named SampleEmbeddingGenerator that implements the IEmbeddingGenerator<string, Embedding<float>> interface.
  • Has a primary constructor that accepts an endpoint and model ID, which are used to identify the generator.
  • Implements the GenerateAsync method to generate embeddings for a collection of input values.

The sample implementation just generates random embedding vectors. You can find concrete implementations in packages such as 📦 Microsoft.Extensions.AI.OpenAI and 📦 Microsoft.Extensions.AI.Ollama.

Create embeddings

The primary operation performed with an IEmbeddingGenerator<TInput,TEmbedding> is embedding generation, which is accomplished with its GenerateAsync method.

using Microsoft.Extensions.AI;

IEmbeddingGenerator<string, Embedding<float>> generator =
    new SampleEmbeddingGenerator(
        new Uri("http://coolsite.ai"), "target-ai-model");

foreach (Embedding<float> embedding in
    await generator.GenerateAsync(["What is AI?", "What is .NET?"]))
{
    Console.WriteLine(string.Join(", ", embedding.Vector.ToArray()));
}

Accelerator extension methods also exist to simplify common cases, such as generating an embedding vector from a single input.

ReadOnlyMemory<float> vector = await generator.GenerateEmbeddingVectorAsync("What is AI?");

Pipelines of functionality

As with IChatClient, IEmbeddingGenerator implementations can be layered. Microsoft.Extensions.AI provides delegating implementations of IEmbeddingGenerator for caching and telemetry.

using Microsoft.Extensions.AI;
using Microsoft.Extensions.Caching.Distributed;
using Microsoft.Extensions.Caching.Memory;
using Microsoft.Extensions.Options;
using OpenTelemetry.Trace;

// Configure OpenTelemetry exporter
string sourceName = Guid.NewGuid().ToString();
TracerProvider tracerProvider = OpenTelemetry.Sdk.CreateTracerProviderBuilder()
    .AddSource(sourceName)
    .AddConsoleExporter()
    .Build();

// Explore changing the order of the intermediate "Use" calls to see
// what impact that has on what gets cached and traced.
IEmbeddingGenerator<string, Embedding<float>> generator = new EmbeddingGeneratorBuilder<string, Embedding<float>>(
        new SampleEmbeddingGenerator(new Uri("http://coolsite.ai"), "target-ai-model"))
    .UseDistributedCache(
        new MemoryDistributedCache(
            Options.Create(new MemoryDistributedCacheOptions())))
    .UseOpenTelemetry(sourceName: sourceName)
    .Build();

GeneratedEmbeddings<Embedding<float>> embeddings = await generator.GenerateAsync(
[
    "What is AI?",
    "What is .NET?",
    "What is AI?"
]);

foreach (Embedding<float> embedding in embeddings)
{
    Console.WriteLine(string.Join(", ", embedding.Vector.ToArray()));
}

The IEmbeddingGenerator enables building custom middleware that extends the functionality of an IEmbeddingGenerator. The DelegatingEmbeddingGenerator<TInput,TEmbedding> class is an implementation of the IEmbeddingGenerator<TInput, TEmbedding> interface that serves as a base class for creating embedding generators that delegate their operations to another IEmbeddingGenerator<TInput, TEmbedding> instance. It allows for chaining multiple generators in any order, passing calls through to an underlying generator. The class provides default implementations for methods such as GenerateAsync and Dispose, which forward the calls to the inner generator instance, enabling flexible and modular embedding generation.

The following is an example implementation of such a delegating embedding generator that rate-limits embedding generation requests:

using Microsoft.Extensions.AI;
using System.Threading.RateLimiting;

public class RateLimitingEmbeddingGenerator(
    IEmbeddingGenerator<string, Embedding<float>> innerGenerator, RateLimiter rateLimiter)
        : DelegatingEmbeddingGenerator<string, Embedding<float>>(innerGenerator)
{
    public override async Task<GeneratedEmbeddings<Embedding<float>>> GenerateAsync(
        IEnumerable<string> values,
        EmbeddingGenerationOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        using var lease = await rateLimiter.AcquireAsync(permitCount: 1, cancellationToken)
            .ConfigureAwait(false);

        if (!lease.IsAcquired)
        {
            throw new InvalidOperationException("Unable to acquire lease.");
        }

        return await base.GenerateAsync(values, options, cancellationToken);
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            rateLimiter.Dispose();
        }

        base.Dispose(disposing);
    }
}

This can then be layered around an arbitrary IEmbeddingGenerator<string, Embedding<float>> to rate limit all embedding generation operations.

using Microsoft.Extensions.AI;
using System.Threading.RateLimiting;

IEmbeddingGenerator<string, Embedding<float>> generator =
    new RateLimitingEmbeddingGenerator(
        new SampleEmbeddingGenerator(new Uri("http://coolsite.ai"), "target-ai-model"),
        new ConcurrencyLimiter(new()
        {
            PermitLimit = 1,
            QueueLimit = int.MaxValue
        }));

foreach (Embedding<float> embedding in
    await generator.GenerateAsync(["What is AI?", "What is .NET?"]))
{
    Console.WriteLine(string.Join(", ", embedding.Vector.ToArray()));
}

In this way, the RateLimitingEmbeddingGenerator can be composed with other IEmbeddingGenerator<string, Embedding<float>> instances to provide rate-limiting functionality.

Build with Microsoft.Extensions.AI

You can start building with Microsoft.Extensions.AI in the following ways:

  • Library developers: If you own libraries that provide clients for AI services, consider implementing the interfaces in your libraries. This allows users to easily integrate your NuGet package via the abstractions.
  • Service consumers: If you're developing libraries that consume AI services, use the abstractions instead of hardcoding to a specific AI service. This approach gives your consumers the flexibility to choose their preferred service.
  • Application developers: Use the abstractions to simplify integration into your apps. This enables portability across models and services, facilitates testing and mocking, leverages middleware provided by the ecosystem, and maintains a consistent API throughout your app, even if you use different services in different parts of your application.
  • Ecosystem contributors: If you're interested in contributing to the ecosystem, consider writing custom middleware components.

For more samples, see the dotnet/ai-samples GitHub repository. For an end-to-end sample, see eShopSupport.
