Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
Warning
The Semantic Kernel Vector Store functionality is in preview, and improvements that require breaking changes may still occur in limited circumstances before release.
Semantic Kernel Vector Store connectors support multiple ways of generating embeddings.
Embeddings can be generated by the developer and passed as part of a record when using a VectorStoreRecordCollection
or can be generated internally to the VectorStoreRecordCollection
.
Letting the Vector Store generate embeddings
You can configure an embedding generator on your vector store, allowing embeddings to be automatically generated during both upsert and search operations, eliminating the need for manual preprocessing.
To enable generating vectors automatically on upsert, the vector property on your data model is defined as the source type, e.g. string but still decorated with a VectorStoreVectorPropertyAttribute
.
[VectorStoreRecordVector(1536)]
public string Embedding { get; set; }
Before upsert, the Embedding
property should contain the string from which a vector should be generated. The type of the vector stored in the database (e.g. float32, float16, etc.) will be derived from the configured embedding generator.
Important
These vector properties do not support retrieving either the generated vector or the original text that the vector was generated from. They also do not store the original text. If the original text needs to be stored, a separate Data property should be added to store it.
Embedding generators implementing the Microsoft.Extensions.AI
abstractions are supported and can be configured at various levels:
On the Vector Store: You can set a default embedding generator for the entire vector store. This generator will be used for all collections and properties unless overridden.
using Microsoft.Extensions.AI; using Microsoft.SemanticKernel.Connectors.Qdrant; using OpenAI; using Qdrant.Client; var embeddingGenerator = new OpenAIClient("your key") .GetEmbeddingClient("your chosen model") .AsIEmbeddingGenerator(); var vectorStore = new QdrantVectorStore(new QdrantClient("localhost"), new QdrantVectorStoreOptions { EmbeddingGenerator = embeddingGenerator });
On a Collection: You can configure an embedding generator for a specific collection, overriding the store-level generator.
using Microsoft.Extensions.AI; using Microsoft.SemanticKernel.Connectors.Qdrant; using OpenAI; using Qdrant.Client; var embeddingGenerator = new OpenAIClient("your key") .GetEmbeddingClient("your chosen model") .AsIEmbeddingGenerator(); var collectionOptions = new QdrantVectorStoreRecordCollectionOptions<MyRecord> { EmbeddingGenerator = embeddingGenerator }; var collection = new QdrantVectorStoreRecordCollection<ulong, MyRecord>(new QdrantClient("localhost"), "myCollection", collectionOptions);
On a Record Definition: When defining properties programmatically using
VectorStoreRecordDefinition
, you can specify an embedding generator for all properties.using Microsoft.Extensions.AI; using Microsoft.Extensions.VectorData; using Microsoft.SemanticKernel.Connectors.Qdrant; using OpenAI; using Qdrant.Client; var embeddingGenerator = new OpenAIClient("your key") .GetEmbeddingClient("your chosen model") .AsIEmbeddingGenerator(); var recordDefinition = new VectorStoreRecordDefinition { EmbeddingGenerator = embeddingGenerator, Properties = new List<VectorStoreRecordProperty> { new VectorStoreRecordKeyProperty("Key", typeof(ulong)), new VectorStoreRecordVectorProperty("DescriptionEmbedding", typeof(string), dimensions: 1536) } }; var collectionOptions = new QdrantVectorStoreRecordCollectionOptions<MyRecord> { VectorStoreRecordDefinition = recordDefinition }; var collection = new QdrantVectorStoreRecordCollection<ulong, MyRecord>(new QdrantClient("localhost"), "myCollection", collectionOptions);
On a Vector Property Definition: When defining properties programmatically, you can set an embedding generator directly on the property.
using Microsoft.Extensions.AI; using Microsoft.Extensions.VectorData; using OpenAI; var embeddingGenerator = new OpenAIClient("your key") .GetEmbeddingClient("your chosen model") .AsIEmbeddingGenerator(); var vectorProperty = new VectorStoreRecordVectorProperty("DescriptionEmbedding", typeof(string), dimensions: 1536) { EmbeddingGenerator = embeddingGenerator };
Example Usage
The following example demonstrates how to use the embedding generator to automatically generate vectors during both upsert and search operations. This approach simplifies workflows by eliminating the need to precompute embeddings manually.
// The data model
internal class FinanceInfo
{
[VectorStoreRecordKey]
public string Key { get; set; } = string.Empty;
[VectorStoreRecordData]
public string Text { get; set; } = string.Empty;
// Note that the vector property is typed as a string, and
// its value is derived from the Text property. The string
// value will however be converted to a vector on upsert and
// stored in the database as a vector.
[VectorStoreRecordVector(1536)]
public string Embedding => this.Text;
}
// Create an OpenAI embedding generator.
var embeddingGenerator = new OpenAIClient("your key")
.GetEmbeddingClient("your chosen model")
.AsIEmbeddingGenerator();
// Use the embedding generator with the vector store.
var vectorStore = new InMemoryVectorStore(new() { EmbeddingGenerator = embeddingGenerator });
var collection = vectorStore.GetCollection<string, FinanceInfo>("finances");
await collection.CreateCollectionAsync();
// Create some test data.
string[] budgetInfo =
{
"The budget for 2020 is EUR 100 000",
"The budget for 2021 is EUR 120 000",
"The budget for 2022 is EUR 150 000",
"The budget for 2023 is EUR 200 000",
"The budget for 2024 is EUR 364 000"
};
// Embeddings are generated automatically on upsert.
var records = budgetInfo.Select((input, index) => new FinanceInfo { Key = index.ToString(), Text = input });
await collection.UpsertAsync(records);
// Embeddings for the search is automatically generated on search.
var searchResult = collection.SearchAsync(
"What is my budget for 2024?",
top: 1);
// Output the matching result.
await foreach (var result in searchResult)
{
Console.WriteLine($"Key: {result.Record.Key}, Text: {result.Record.Text}");
}
Generating embeddings yourself
Constructing an embedding generator
See Embedding Generation for examples on how to construct Semantic Kernel ITextEmbeddingGenerationService
instances.
See Microsoft.Extensions.AI.Abstractions for information on how to construct Microsoft.Extensions.AI
embedding generation services.
Generating embeddings on upsert with Semantic Kernel ITextEmbeddingGenerationService
public async Task GenerateEmbeddingsAndUpsertAsync(
ITextEmbeddingGenerationService textEmbeddingGenerationService,
IVectorStoreRecordCollection<ulong, Hotel> collection)
{
// Upsert a record.
string descriptionText = "A place where everyone can be happy.";
ulong hotelId = 1;
// Generate the embedding.
ReadOnlyMemory<float> embedding =
await textEmbeddingGenerationService.GenerateEmbeddingAsync(descriptionText);
// Create a record and upsert with the already generated embedding.
await collection.UpsertAsync(new Hotel
{
HotelId = hotelId,
HotelName = "Hotel Happy",
Description = descriptionText,
DescriptionEmbedding = embedding,
Tags = new[] { "luxury", "pool" }
});
}
Generating embeddings on search with Semantic Kernel ITextEmbeddingGenerationService
public async Task GenerateEmbeddingsAndSearchAsync(
ITextEmbeddingGenerationService textEmbeddingGenerationService,
IVectorStoreRecordCollection<ulong, Hotel> collection)
{
// Upsert a record.
string descriptionText = "Find me a hotel with happiness in mind.";
// Generate the embedding.
ReadOnlyMemory<float> searchEmbedding =
await textEmbeddingGenerationService.GenerateEmbeddingAsync(descriptionText);
// Search using the already generated embedding.
IAsyncEnumerable<VectorSearchResult<Hotel>> searchResult = collection.SearchEmbeddingAsync(searchEmbedding, top: 1);
List<VectorSearchResult<Hotel>> resultItems = await searchResult.ToListAsync();
// Print the first search result.
Console.WriteLine("Score for first result: " + resultItems.FirstOrDefault()?.Score);
Console.WriteLine("Hotel description for first result: " + resultItems.FirstOrDefault()?.Record.Description);
}
Tip
For more information on generating embeddings, refer to Embedding generation in Semantic Kernel.
Embedding dimensions
Vector databases typically require you to specify the number of dimensions that each vector has when creating the collection.
Different embedding models typically support generating vectors with various dimension sizes. E.g., OpenAI text-embedding-ada-002
generates vectors with 1536 dimensions. Some models also allow a developer to choose the number of dimensions they want in the
output vector. For example, Google text-embedding-004
produces vectors with 768 dimensions by default, but allows a developer to
choose any number of dimensions between 1 and 768.
It is important to ensure that the vectors generated by the embedding model have the same number of dimensions as the matching vector in the database.
If creating a collection using the Semantic Kernel Vector Store abstractions, you need to specify the number of dimensions required for each vector property either via annotations or via the record definition. Here are examples of both setting the number of dimensions to 1536.
[VectorStoreRecordVector(Dimensions: 1536)]
public ReadOnlyMemory<float>? DescriptionEmbedding { get; set; }
new VectorStoreRecordVectorProperty("DescriptionEmbedding", typeof(float), dimensions: 1536);
Tip
For more information on how to annotate your data model, refer to defining your data model.
Tip
For more information on creating a record definition, refer to defining your schema with a record definition.
Coming soon
More info coming soon.
Coming soon
More info coming soon.