Use this article to learn how to set up the requirements for starting with custom text classification and create a project.
Prerequisites
Before you start using custom text classification, you will need:
Create a Language resource
Before you start using custom text classification, you will need an Azure AI Language resource. It is recommended to create your Language resource and connect a storage account to it in the Azure portal. Creating a resource in the Azure portal lets you create an Azure storage account at the same time, with all of the required permissions pre-configured. You can also read further in the article to learn how to use a pre-existing resource, and configure it to work with custom text classification.
You will also need an Azure storage account where you will upload the .txt documents that will be used to train a model to classify text.
Note
- You need to have an owner role assigned on the resource group to create a Language resource.
- If you will connect a pre-existing storage account, you should have an owner role assigned to it.
Create Language resource and connect storage account
Note
You shouldn't move the storage account to a different resource group or subscription once it's linked with the Language resource.
Create a new resource from the Azure portal
Go to the Azure portal to create a new Azure AI Language resource.
In the window that appears, select Custom text classification & custom named entity recognition from the custom features. Select Continue to create your resource at the bottom of the screen.
Create a Language resource with the following details.
| Name | Required value |
|---|---|
| Subscription | Your Azure subscription. |
| Resource group | A resource group that will contain your resource. You can use an existing one, or create a new one. |
| Region | One of the supported regions. For example "West US 2". |
| Name | A name for your resource. |
| Pricing tier | One of the supported pricing tiers. You can use the Free (F0) tier to try the service. |
If you get a message saying "your login account is not an owner of the selected storage account's resource group", your account needs to have an owner role assigned on the resource group before you can create a Language resource. Contact your Azure subscription owner for assistance.
You can determine your Azure subscription owner by searching your resource group and following the link to its associated subscription. Then:
- Select the Access Control (IAM) tab
- Select Role assignments
- Filter by Role:Owner.
In the Custom text classification & custom named entity recognition section, select an existing storage account or select New storage account. These values are intended to help you get started, and aren't necessarily the storage account values you'll want to use in production environments. To avoid latency while building your project, connect to a storage account in the same region as your Language resource.
| Storage account value | Recommended value |
|---|---|
| Storage account name | Any name |
| Storage account type | Standard LRS |
Make sure the Responsible AI Notice is checked. Select Review + create at the bottom of the page.
Create a new Language resource from Language Studio
If it's your first time logging in, you'll see a window in Language Studio that will let you choose an existing Language resource or create a new one. You can also create a resource by clicking the settings icon in the top-right corner, selecting Resources, then clicking Create a new resource.
Create a Language resource with the following details.
| Instance detail | Required value |
|---|---|
| Azure subscription | Your Azure subscription |
| Azure resource group | Your Azure resource group |
| Azure resource name | Your Azure resource name |
| Location | The region of your Language resource. |
| Pricing tier | The pricing tier for your Language resource. |
Important
- Make sure to enable Managed Identity when you create a Language resource.
- Read and confirm Responsible AI notice
To use custom text classification, you'll need to connect your resource to a storage account. If you don't have one, you can create an Azure storage account. Use the following steps to create your first project and connect your storage account.
Sign into the Language Studio. A window will appear to let you select your subscription and Language resource. Select your Language resource.
Under the Classify text section of Language Studio, select Custom text classification.
Select Create new project from the top menu in your projects page. Creating a project will let you label data, train, evaluate, improve, and deploy your models.
After you select Create new project, a window will appear to let you connect your storage account. If you've already connected a storage account, you will see the connected storage account. If not, choose your storage account from the dropdown that appears and select Connect storage account; this will set the required roles for your storage account. This step might return an error if you aren't assigned the owner role on the storage account.
Note
- You only need to do this step once for each new Language resource you use.
- This process is irreversible. If you connect a storage account to your Language resource, you cannot disconnect it later.
- You can only connect your Language resource to one storage account.
Select project type. You can either create a Multi label classification project where each document can belong to one or more classes or Single label classification project where each document can belong to only one class. The selected type can't be changed later. Learn more about project types
Enter the project information, including a name, description, and the language of the documents in your project. If you're using the example dataset, select English. You won’t be able to change the name of your project later. Select Next.
Tip
Your dataset doesn't have to be entirely in the same language. You can have multiple documents, each in a different supported language. If your dataset contains documents in different languages, or if you expect text in different languages during runtime, select the Enable multi-lingual dataset option when you enter the basic information for your project. This option can also be enabled later from the Project settings page.
Select the container where you have uploaded your dataset.
Note
If you have already labeled your data, make sure it follows the supported format. Then select Yes, my documents are already labeled and I have formatted JSON labels file, and select the labels file from the drop-down menu below.
If you're using one of the example datasets, use the included webOfScience_labelsFile or movieLabels json file. Then select Next.
Review the data you entered and select Create Project.
You can create a new resource and a storage account using the following CLI template and parameters files, which are hosted on GitHub.
Edit the following values in the parameters file:
| Parameter name | Value description |
|---|---|
| name | Name of your Language resource |
| location | The region in which your resource is hosted. See region support for more information. |
| sku | The pricing tier of your resource. See service limits for more information. |
| storageResourceName | Name of your storage account |
| storageLocation | Region in which your storage account is hosted. |
| storageSkuType | SKU of your storage account. |
| storageResourceGroupName | Resource group of your storage account |
Use the following PowerShell command to deploy the Azure Resource Manager (ARM) template with the files you edited.
New-AzResourceGroupDeployment -Name ExampleDeployment -ResourceGroupName ExampleResourceGroup `
-TemplateFile <path-to-arm-template> `
-TemplateParameterFile <path-to-parameters-file>
See the ARM template documentation for information on deploying templates and parameter files.
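As a rough sketch of what an edited parameters file might contain, the following Python snippet wraps the parameters listed above in the standard ARM deployment-parameters layout and serializes them to JSON. The parameter names come from the table; every value shown (resource names, regions, SKUs) is a placeholder assumption, and the template hosted on GitHub may expect additional or differently typed parameters.

```python
import json


def build_parameters_file(values: dict) -> str:
    """Wrap plain name/value pairs in the ARM deployment-parameters layout."""
    document = {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
        "contentVersion": "1.0.0.0",
        # ARM expects each parameter as {"value": ...}.
        "parameters": {name: {"value": value} for name, value in values.items()},
    }
    return json.dumps(document, indent=2)


# Placeholder values for illustration only; replace them with your own.
example_values = {
    "name": "my-language-resource",
    "location": "westus2",
    "sku": "S",
    "storageResourceName": "mystorageaccount",
    "storageLocation": "westus2",
    "storageSkuType": "Standard_LRS",
    "storageResourceGroupName": "ExampleResourceGroup",
}

parameters_json = build_parameters_file(example_values)
print(parameters_json)
```

Writing the returned string to a file gives you something you could pass as the parameters file in the deployment command, provided the names match what the template declares.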
Note
- The process of connecting a storage account to your Language resource is irreversible. It cannot be disconnected later.
- You can only connect your Language resource to one storage account.
Using a pre-existing Language resource
| Requirement | Description |
|---|---|
| Regions | Make sure your existing resource is provisioned in one of the supported regions. If you don't have a resource, you will need to create a new one in a supported region. |
| Pricing tier | The pricing tier for your resource. |
| Managed identity | Make sure that the resource's managed identity setting is enabled. Otherwise, read the next section. |
To use custom text classification, you'll need to create an Azure storage account if you don't have one already.
Enable identity management for your resource
Your Language resource must have identity management enabled. To enable it using the Azure portal:
- Go to your Language resource
- From the left-hand menu, under the Resource Management section, select Identity
- From the System assigned tab, make sure to set Status to On
To enable identity management using Language Studio instead:
- Select the settings icon in the top right corner of the screen
- Select Resources
- Select the check box Managed Identity for your Azure AI Language resource.
Enable custom text classification feature
Make sure to enable the Custom text classification / Custom Named Entity Recognition feature from the Azure portal.
- Go to your Language resource in Azure portal
- From the left side menu, under Resource Management section, select Features
- Enable Custom text classification / Custom Named Entity Recognition feature
- Connect your storage account
- Select Apply
Important
- Make sure that your Language resource has storage blob data contributor role assigned on the storage account you are connecting.
Set roles for your Azure AI Language resource and storage account
Use the following steps to set the required roles for your Language resource and storage account.
Roles for your Azure AI Language resource
Go to your storage account or Language resource in the Azure portal.
Select Access Control (IAM) in the left pane.
Select Add to Add Role Assignments, and choose the appropriate role for your account.
You should have the owner or contributor role assigned on your Language resource.
Within Assign access to, select User, group, or service principal
Select Select members
Select your user name. You can search for user names in the Select field. Repeat this for all roles.
Repeat these steps for all the user accounts that need access to this resource.
Roles for your storage account
- Go to your storage account page in the Azure portal.
- Select Access Control (IAM) in the left pane.
- Select Add to Add Role Assignments, and choose the Storage blob data contributor role on the storage account.
- Within Assign access to, select Managed identity.
- Select Select members
- Select your subscription, and Language as the managed identity. You can search for user names in the Select field.
Important
If you have a virtual network or private endpoint, be sure to select Allow Azure services on the trusted services list to access this storage account in the Azure portal.
Enable CORS for your storage account
Make sure to allow (GET, PUT, DELETE) methods when enabling Cross-Origin Resource Sharing (CORS).
Set the allowed origins field to https://language.cognitive.azure.com. Allow all headers by adding * to the allowed header values, and set the maximum age to 500.
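The CORS settings above can be summarized as a single rule. The sketch below is only an in-memory representation that you could use to double-check a configuration before entering it in the portal; the CorsRule class and its field names are hypothetical and do not correspond to any Azure SDK type.

```python
from dataclasses import dataclass

REQUIRED_METHODS = {"GET", "PUT", "DELETE"}


@dataclass(frozen=True)
class CorsRule:
    """Hypothetical container mirroring the CORS fields described above."""
    allowed_origins: tuple = ("https://language.cognitive.azure.com",)
    allowed_methods: tuple = ("GET", "PUT", "DELETE")
    allowed_headers: tuple = ("*",)
    max_age: int = 500


def missing_methods(rule: CorsRule) -> set:
    """Return the required methods that the rule does not allow."""
    return REQUIRED_METHODS - {method.upper() for method in rule.allowed_methods}


rule = CorsRule()
print(missing_methods(rule))                               # nothing missing
print(missing_methods(CorsRule(allowed_methods=("GET",))))  # PUT and DELETE missing
```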
Create a custom text classification project
Once your resource and storage container are configured, create a new custom text classification project. A project is a work area for building your custom AI models based on your data. Your project can only be accessed by you and others who have access to the Azure resource being used. If you have labeled data, you can import it to get started.
Sign into the Language Studio. A window will appear to let you select your subscription and Language resource. Select your Language resource.
Under the Classify text section of Language Studio, select Custom text classification.
Select Create new project from the top menu in your projects page. Creating a project will let you label data, train, evaluate, improve, and deploy your models.
After you select Create new project, a window will appear to let you connect your storage account. If you've already connected a storage account, you will see the connected storage account. If not, choose your storage account from the dropdown that appears and select Connect storage account; this will set the required roles for your storage account. This step might return an error if you aren't assigned the owner role on the storage account.
Note
- You only need to do this step once for each new Language resource you use.
- This process is irreversible. If you connect a storage account to your Language resource, you cannot disconnect it later.
- You can only connect your Language resource to one storage account.
Select project type. You can either create a Multi label classification project where each document can belong to one or more classes or Single label classification project where each document can belong to only one class. The selected type can't be changed later. Learn more about project types
Enter the project information, including a name, description, and the language of the documents in your project. If you're using the example dataset, select English. You won’t be able to change the name of your project later. Select Next.
Tip
Your dataset doesn't have to be entirely in the same language. You can have multiple documents, each in a different supported language. If your dataset contains documents in different languages, or if you expect text in different languages during runtime, select the Enable multi-lingual dataset option when you enter the basic information for your project. This option can also be enabled later from the Project settings page.
Select the container where you have uploaded your dataset.
Note
If you have already labeled your data, make sure it follows the supported format. Then select Yes, my documents are already labeled and I have formatted JSON labels file, and select the labels file from the drop-down menu below.
If you're using one of the example datasets, use the included webOfScience_labelsFile or movieLabels json file. Then select Next.
Review the data you entered and select Create Project.
To start creating a custom text classification model, you need to create a project. Creating a project will let you label data, train, evaluate, improve, and deploy your models.
Note
The project name is case-sensitive for all operations.
Create a PATCH request using the following URL, headers, and JSON body to create your project.
Request URL
Use the following URL to create a project. Replace the placeholder values below with your own values.
{ENDPOINT}/language/authoring/analyze-text/projects/{PROJECT-NAME}?api-version={API-VERSION}
| Placeholder | Value | Example |
|---|---|---|
| {ENDPOINT} | The endpoint for authenticating your API request. | https://<your-custom-subdomain>.cognitiveservices.azure.com |
| {PROJECT-NAME} | The name for your project. This value is case-sensitive. | myProject |
| {API-VERSION} | The version of the API you are calling. The value referenced here is for the latest version released. See Model lifecycle to learn more about other available API versions. | 2022-05-01 |
Use the following header to authenticate your request.
| Key | Value |
|---|---|
| Ocp-Apim-Subscription-Key | The key to your resource. Used for authenticating your API requests. |
Body
Use the following JSON in your request. Replace the placeholder values below with your own values.
{
  "projectName": "{PROJECT-NAME}",
  "language": "{LANGUAGE-CODE}",
  "projectKind": "customMultiLabelClassification",
  "description": "Project description",
  "multilingual": true,
  "storageInputContainerName": "{CONTAINER-NAME}"
}
| Key | Placeholder | Value | Example |
|---|---|---|---|
| projectName | {PROJECT-NAME} | The name of your project. This value is case-sensitive. | myProject |
| language | {LANGUAGE-CODE} | A string specifying the language code for the documents used in your project. If your project is a multilingual project, choose the language code of the majority of the documents. See language support to learn more about supported language codes. | en-us |
| projectKind | customMultiLabelClassification | Your project kind. | customMultiLabelClassification |
| multilingual | true | A boolean value that enables you to have documents in multiple languages in your dataset. When your model is deployed, you can query the model in any supported language (not necessarily included in your training documents). See language support to learn more about multilingual support. | true |
| storageInputContainerName | {CONTAINER-NAME} | The name of your Azure storage container where you have uploaded your documents. | myContainer |
{
  "projectName": "{PROJECT-NAME}",
  "language": "{LANGUAGE-CODE}",
  "projectKind": "customSingleLabelClassification",
  "description": "Project description",
  "multilingual": true,
  "storageInputContainerName": "{CONTAINER-NAME}"
}
| Key | Placeholder | Value | Example |
|---|---|---|---|
| projectName | {PROJECT-NAME} | The name of your project. This value is case-sensitive. | myProject |
| language | {LANGUAGE-CODE} | A string specifying the language code for the documents used in your project. If your project is a multilingual project, choose the language code of the majority of the documents. See language support to learn more about supported language codes. | en-us |
| projectKind | customSingleLabelClassification | Your project kind. | customSingleLabelClassification |
| multilingual | true | A boolean value that enables you to have documents in multiple languages in your dataset. When your model is deployed, you can query the model in any supported language (not necessarily included in your training documents). See language support to learn more about multilingual support. | true |
| storageInputContainerName | {CONTAINER-NAME} | The name of your Azure storage container where you have uploaded your documents. | myContainer |
This request will return a 201 response, which means that the project is created.
This request will return an error if:
- The selected resource doesn't have proper permission for the storage account.
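To make the pieces above concrete, here is a minimal sketch that assembles the URL, header, and JSON body into a PATCH request using only the Python standard library. It builds the request without sending it. The function name, the example endpoint, and the key are placeholders, and the Content-Type header is an assumption that is not listed in the header table above.

```python
import json
import urllib.request
from urllib.parse import quote


def build_create_project_request(endpoint, project_name, api_version, key,
                                 language, container, project_kind,
                                 multilingual=True, description=""):
    """Build (but do not send) the PATCH request that creates a project."""
    url = (
        f"{endpoint.rstrip('/')}/language/authoring/analyze-text/projects/"
        f"{quote(project_name, safe='')}?api-version={api_version}"
    )
    body = {
        "projectName": project_name,
        "language": language,
        "projectKind": project_kind,
        "description": description,
        "multilingual": multilingual,  # serialized as a JSON boolean
        "storageInputContainerName": container,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PATCH",
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",  # assumption, see lead-in
        },
    )


request = build_create_project_request(
    endpoint="https://example-resource.cognitiveservices.azure.com",  # placeholder
    project_name="myProject",
    api_version="2022-05-01",
    key="<your-resource-key>",  # placeholder
    language="en-us",
    container="myContainer",
    project_kind="customMultiLabelClassification",
    description="Project description",
)
print(request.get_method(), request.full_url)
# To send it, pass the request to urllib.request.urlopen(request).
```

Passing the project kind as a parameter lets the same helper cover both the multi-label and single-label bodies shown above.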
Import a custom text classification project
If you have already labeled data, you can use it to get started with the service. Make sure that your labeled data follows the accepted data formats.
Sign into the Language Studio. A window will appear to let you select your subscription and Language resource. Select your Language resource.
Under the Classify text section of Language Studio, select Custom text classification.
Select Create new project from the top menu in your projects page. Creating a project will let you label data, train, evaluate, improve, and deploy your models.
After you select Create new project, a screen will appear to let you connect your storage account. If you can’t find your storage account, make sure you created a resource using the recommended steps. If you've already connected a storage account to your Language resource, you will see your storage account connected.
Note
- You only need to do this step once for each new Language resource you use.
- This process is irreversible. If you connect a storage account to your Language resource, you cannot disconnect it later.
- You can only connect your Language resource to one storage account.
Select project type. You can either create a Multi label classification project where each document can belong to one or more classes or Single label classification project where each document can belong to only one class. The selected type can't be changed later.
Enter the project information, including a name, description, and the language of the documents in your project. You won’t be able to change the name of your project later. Select Next.
Tip
Your dataset doesn't have to be entirely in the same language. You can have multiple documents, each in a different supported language. If your dataset contains documents in different languages, or if you expect text in different languages during runtime, select the Enable multi-lingual dataset option when you enter the basic information for your project. This option can also be enabled later from the Project settings page.
Select the container where you have uploaded your dataset.
Select Yes, my documents are already labeled and I have formatted JSON labels file and select the labels file from the drop-down menu below to import your JSON labels file. Make sure it follows the supported format.
Select Next.
Review the data you entered and select Create Project.
Submit a POST request using the following URL, headers, and JSON body to import your labels file. Make sure that your labels file follows the accepted format.
If a project with the same name already exists, the data of that project is replaced.
{ENDPOINT}/language/authoring/analyze-text/projects/{PROJECT-NAME}/:import?api-version={API-VERSION}
| Placeholder | Value | Example |
|---|---|---|
| {ENDPOINT} | The endpoint for authenticating your API request. | https://<your-custom-subdomain>.cognitiveservices.azure.com |
| {PROJECT-NAME} | The name for your project. This value is case-sensitive. | myProject |
| {API-VERSION} | The version of the API you are calling. The value referenced here is for the latest version released. Learn more about other available API versions | 2022-05-01 |
Use the following header to authenticate your request.
| Key | Value |
|---|---|
| Ocp-Apim-Subscription-Key | The key to your resource. Used for authenticating your API requests. |
Body
Use the following JSON in your request. Replace the placeholder values below with your own values.
{
  "projectFileVersion": "{API-VERSION}",
  "stringIndexType": "Utf16CodeUnit",
  "metadata": {
    "projectName": "{PROJECT-NAME}",
    "storageInputContainerName": "{CONTAINER-NAME}",
    "projectKind": "customMultiLabelClassification",
    "description": "Trying out custom multi label text classification",
    "language": "{LANGUAGE-CODE}",
    "multilingual": true,
    "settings": {}
  },
  "assets": {
    "projectKind": "customMultiLabelClassification",
    "classes": [
      {
        "category": "Class1"
      },
      {
        "category": "Class2"
      }
    ],
    "documents": [
      {
        "location": "{DOCUMENT-NAME}",
        "language": "{LANGUAGE-CODE}",
        "dataset": "{DATASET}",
        "classes": [
          {
            "category": "Class1"
          },
          {
            "category": "Class2"
          }
        ]
      },
      {
        "location": "{DOCUMENT-NAME}",
        "language": "{LANGUAGE-CODE}",
        "dataset": "{DATASET}",
        "classes": [
          {
            "category": "Class2"
          }
        ]
      }
    ]
  }
}
| Key | Placeholder | Value | Example |
|---|---|---|---|
| api-version | {API-VERSION} | The version of the API you are calling. The version used here must be the same API version in the URL. Learn more about other available API versions | 2022-05-01 |
| projectName | {PROJECT-NAME} | The name of your project. This value is case-sensitive. | myProject |
| projectKind | customMultiLabelClassification | Your project kind. | customMultiLabelClassification |
| language | {LANGUAGE-CODE} | A string specifying the language code for the documents used in your project. If your project is a multilingual project, choose the language code of the majority of the documents. See language support to learn more about multilingual support. | en-us |
| multilingual | true | A boolean value that enables you to have documents in multiple languages in your dataset. When your model is deployed, you can query the model in any supported language (not necessarily included in your training documents). See language support to learn more about multilingual support. | true |
| storageInputContainerName | {CONTAINER-NAME} | The name of your Azure storage container where you have uploaded your documents. | myContainer |
| classes | [] | Array containing all the classes you have in the project. These are the classes you want to classify your documents into. | [] |
| documents | [] | Array containing all the documents in your project and the classes labeled for each document. | [] |
| location | {DOCUMENT-NAME} | The location of the documents in the storage container. Since all the documents are in the root of the container, this should be the document name. | doc1.txt |
| dataset | {DATASET} | The set to which this document is assigned when the data is split before training. See How to train a model for more information on data splitting. Possible values for this field are Train and Test. | Train |
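If your labels already live in code, a body like the multi label example above can be generated rather than written by hand. The following sketch is one possible approach, not part of the service: the function name and the shape of its labeled_docs argument are assumptions made for illustration, while the JSON keys mirror the example body.

```python
import json


def build_multi_label_import_body(api_version, project_name, container,
                                  language, labeled_docs, multilingual=True,
                                  description=""):
    """labeled_docs maps a document name to (dataset, [class names])."""
    kind = "customMultiLabelClassification"
    # Collect every class used by any document, in first-seen order.
    all_classes = []
    for _dataset, classes in labeled_docs.values():
        for name in classes:
            if name not in all_classes:
                all_classes.append(name)
    return {
        "projectFileVersion": api_version,
        "stringIndexType": "Utf16CodeUnit",
        "metadata": {
            "projectName": project_name,
            "storageInputContainerName": container,
            "projectKind": kind,
            "description": description,
            "language": language,
            "multilingual": multilingual,
            "settings": {},
        },
        "assets": {
            "projectKind": kind,
            "classes": [{"category": name} for name in all_classes],
            "documents": [
                {
                    "location": doc,
                    "language": language,
                    "dataset": dataset,
                    "classes": [{"category": name} for name in classes],
                }
                for doc, (dataset, classes) in labeled_docs.items()
            ],
        },
    }


body = build_multi_label_import_body(
    api_version="2022-05-01",
    project_name="myProject",
    container="myContainer",
    language="en-us",
    labeled_docs={
        "doc1.txt": ("Train", ["Class1", "Class2"]),
        "doc2.txt": ("Test", ["Class2"]),
    },
)
print(json.dumps(body, indent=2))
```

Deriving the top-level classes array from the documents keeps the two lists from drifting apart when labels change.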
{
  "projectFileVersion": "{API-VERSION}",
  "stringIndexType": "Utf16CodeUnit",
  "metadata": {
    "projectName": "{PROJECT-NAME}",
    "storageInputContainerName": "{CONTAINER-NAME}",
    "projectKind": "customSingleLabelClassification",
    "description": "Trying out custom single label text classification",
    "language": "{LANGUAGE-CODE}",
    "multilingual": true,
    "settings": {}
  },
  "assets": {
    "projectKind": "customSingleLabelClassification",
    "classes": [
      {
        "category": "Class1"
      },
      {
        "category": "Class2"
      }
    ],
    "documents": [
      {
        "location": "{DOCUMENT-NAME}",
        "language": "{LANGUAGE-CODE}",
        "dataset": "{DATASET}",
        "class": {
          "category": "Class2"
        }
      },
      {
        "location": "{DOCUMENT-NAME}",
        "language": "{LANGUAGE-CODE}",
        "dataset": "{DATASET}",
        "class": {
          "category": "Class1"
        }
      }
    ]
  }
}
| Key | Placeholder | Value | Example |
|---|---|---|---|
| api-version | {API-VERSION} | The version of the API you are calling. The version used here must be the same API version in the URL. | 2022-05-01 |
| projectName | {PROJECT-NAME} | The name of your project. This value is case-sensitive. | myProject |
| projectKind | customSingleLabelClassification | Your project kind. | customSingleLabelClassification |
| language | {LANGUAGE-CODE} | A string specifying the language code for the documents used in your project. If your project is a multilingual project, choose the language code of the majority of the documents. See language support to learn more about supported language codes. | en-us |
| multilingual | true | A boolean value that enables you to have documents in multiple languages in your dataset. When your model is deployed, you can query the model in any supported language (not necessarily included in your training documents). See language support to learn more about multilingual support. | true |
| storageInputContainerName | {CONTAINER-NAME} | The name of your Azure storage container where you have uploaded your documents. | myContainer |
| classes | [] | Array containing all the classes you have in the project. These are the classes you want to classify your documents into. | [] |
| documents | [] | Array containing all the documents in your project and the class each document belongs to. | [] |
| location | {DOCUMENT-NAME} | The location of the documents in the storage container. Since all the documents are in the root of the container, this should be the document name. | doc1.txt |
| dataset | {DATASET} | The set to which this document is assigned when the data is split before training. See How to train a model to learn more about data splitting. Possible values for this field are Train and Test. | Train |
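The main structural difference in the single label body is that each document carries one class object instead of a classes array. The sketch below illustrates that shape and checks the dataset value against the two values named in the table. The helper names are hypothetical, and the check is a local convenience rather than something the service requires you to run.

```python
VALID_DATASETS = {"Train", "Test"}


def single_label_document(location, language, dataset, category):
    """Build one entry of the "documents" array for a single label project."""
    if dataset not in VALID_DATASETS:
        raise ValueError(f"dataset must be one of {sorted(VALID_DATASETS)}")
    return {
        "location": location,
        "language": language,
        "dataset": dataset,
        "class": {"category": category},  # exactly one class per document
    }


def single_label_assets(documents):
    """Build the "assets" object, deriving the class list from the documents."""
    categories = sorted({doc["class"]["category"] for doc in documents})
    return {
        "projectKind": "customSingleLabelClassification",
        "classes": [{"category": name} for name in categories],
        "documents": documents,
    }


docs = [
    single_label_document("doc1.txt", "en-us", "Train", "Class2"),
    single_label_document("doc2.txt", "en-us", "Test", "Class1"),
]
assets = single_label_assets(docs)
print(assets["classes"])
```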
Once you send your API request, you'll receive a 202 response indicating that the job was submitted correctly. In the response headers, extract the operation-location value. It will be formatted like this:
{ENDPOINT}/language/authoring/analyze-text/projects/{PROJECT-NAME}/import/jobs/{JOB-ID}?api-version={API-VERSION}
{JOB-ID} is used to identify your request, since this operation is asynchronous. You'll use this URL to get the import job status.
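As an illustration of pulling the job ID out of that header value, the following sketch splits an operation-location URL with the standard library. It assumes the URL follows the format shown above; the function name, the example host, and the job ID are made up for the example.

```python
from urllib.parse import parse_qs, urlsplit


def parse_operation_location(operation_location):
    """Return (project name, job ID, API version) from an operation-location URL."""
    parts = urlsplit(operation_location)
    segments = [segment for segment in parts.path.split("/") if segment]
    # Expected tail: .../projects/{PROJECT-NAME}/import/jobs/{JOB-ID}
    if len(segments) < 2 or segments[-2] != "jobs" or "projects" not in segments:
        raise ValueError("unexpected operation-location format")
    project_name = segments[segments.index("projects") + 1]
    job_id = segments[-1]
    api_version = parse_qs(parts.query).get("api-version", [None])[0]
    return project_name, job_id, api_version


example = (
    "https://example-resource.cognitiveservices.azure.com"
    "/language/authoring/analyze-text/projects/myProject"
    "/import/jobs/1234-abcd?api-version=2022-05-01"
)
print(parse_operation_location(example))
```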
Possible error scenarios for this request:
- The selected resource doesn't have proper permissions for the storage account.
- The storageInputContainerName specified doesn't exist.
- An invalid language code is used, or the language code type isn't a string.
- The multilingual value is a string and not a boolean.
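Some of these scenarios can be caught locally before the request is sent. The sketch below checks only the type-related ones and the container name; it cannot verify storage permissions or whether a language code is actually supported. The function name and the existing_containers argument (a collection of container names you supply yourself) are assumptions for illustration.

```python
def find_import_body_problems(body, existing_containers):
    """Return a list of problems found in an import request body."""
    problems = []
    metadata = body.get("metadata", {})
    if metadata.get("storageInputContainerName") not in existing_containers:
        problems.append("storageInputContainerName does not exist")
    if not isinstance(metadata.get("language"), str):
        problems.append("language code is not a string")
    # bool is checked explicitly so that the string "True" is rejected.
    if not isinstance(metadata.get("multilingual"), bool):
        problems.append("multilingual is not a boolean")
    return problems


suspect_body = {
    "metadata": {
        "storageInputContainerName": "myContainer",
        "language": "en-us",
        "multilingual": "True",  # a string, not a JSON boolean
    }
}
print(find_import_body_problems(suspect_body, {"myContainer"}))
```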
Get project details
Go to your project settings page in Language Studio.
You can see project details.
In this page, you can update project description and enable/disable Multi-lingual dataset in project settings.
You can also view the connected storage account and container to your Language resource.
You can also retrieve your resource primary key from this page.
To get custom text classification project details, submit a GET request using the following URL and headers. Replace the placeholder values with your own values.
{ENDPOINT}/language/authoring/analyze-text/projects/{PROJECT-NAME}?api-version={API-VERSION}
| Placeholder | Value | Example |
|---|---|---|
| {ENDPOINT} | The endpoint for authenticating your API request. | https://<your-custom-subdomain>.cognitiveservices.azure.com |
| {PROJECT-NAME} | The name for your project. This value is case-sensitive. | myProject |
| {API-VERSION} | The version of the API you're calling. The value referenced here is for the latest released model version. | 2022-05-01 |
Use the following header to authenticate your request.
| Key | Value |
|---|---|
| Ocp-Apim-Subscription-Key | The key to your resource. Used for authenticating your API requests. |
Response Body
Once you send the request, you will get the following response.
{
  "createdDateTime": "2022-04-23T13:39:09.384Z",
  "lastModifiedDateTime": "2022-04-23T13:39:09.384Z",
  "lastTrainedDateTime": "2022-04-23T13:39:09.384Z",
  "lastDeployedDateTime": "2022-04-23T13:39:09.384Z",
  "projectKind": "customSingleLabelClassification",
  "storageInputContainerName": "{CONTAINER-NAME}",
  "projectName": "{PROJECT-NAME}",
  "multilingual": true,
  "description": "Project description",
  "language": "{LANGUAGE-CODE}"
}
| Key | Placeholder | Description | Example |
|---|---|---|---|
| projectKind | customSingleLabelClassification | Your project kind. | This value can be customSingleLabelClassification or customMultiLabelClassification. |
| storageInputContainerName | {CONTAINER-NAME} | The name of your Azure storage container where you have uploaded your documents. | myContainer |
| projectName | {PROJECT-NAME} | The name of your project. This value is case-sensitive. | myProject |
| multilingual | | A boolean value that enables you to have documents in multiple languages in your dataset. When your model is deployed, you can query the model in any supported language (not necessarily included in your training documents). For more information on multilingual support, see language support. | true |
| language | {LANGUAGE-CODE} | A string specifying the language code for the documents used in your project. If your project is a multilingual project, choose the language code of the majority of the documents. See language support to learn more about supported language codes. | en-us |
Once you send your API request, you'll receive a 200 response indicating success, along with a JSON response body containing your project details.
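For reference, a response body like the one above could be read as follows. This is a minimal sketch that assumes the timestamps always use the format shown in the example (UTC with fractional seconds and a trailing Z); the function name is hypothetical, and the placeholder values in the sample are filled in with the example values from the table.

```python
import json
from datetime import datetime, timezone

SAMPLE_RESPONSE = """
{
  "createdDateTime": "2022-04-23T13:39:09.384Z",
  "lastModifiedDateTime": "2022-04-23T13:39:09.384Z",
  "lastTrainedDateTime": "2022-04-23T13:39:09.384Z",
  "lastDeployedDateTime": "2022-04-23T13:39:09.384Z",
  "projectKind": "customSingleLabelClassification",
  "storageInputContainerName": "myContainer",
  "projectName": "myProject",
  "multilingual": true,
  "description": "Project description",
  "language": "en-us"
}
"""

TIMESTAMP_KEYS = (
    "createdDateTime",
    "lastModifiedDateTime",
    "lastTrainedDateTime",
    "lastDeployedDateTime",
)


def parse_project_details(raw_json):
    """Decode the response body and convert its timestamps to aware datetimes."""
    details = json.loads(raw_json)
    for key in TIMESTAMP_KEYS:
        if key in details:
            parsed = datetime.strptime(details[key], "%Y-%m-%dT%H:%M:%S.%fZ")
            details[key] = parsed.replace(tzinfo=timezone.utc)
    return details


details = parse_project_details(SAMPLE_RESPONSE)
print(details["projectName"], details["projectKind"], details["createdDateTime"])
```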
Delete project
When you no longer need your project, you can delete it using Language Studio. Select Custom text classification at the top, and then select the project you want to delete. Select Delete from the top menu to delete the project.
When you no longer need your project, you can delete it with the following DELETE request. Replace the placeholder values with your own values.
{ENDPOINT}/language/authoring/analyze-text/projects/{PROJECT-NAME}?api-version={API-VERSION}
| Placeholder | Value | Example |
|---|---|---|
| {ENDPOINT} | The endpoint for authenticating your API request. | https://<your-custom-subdomain>.cognitiveservices.azure.com |
| {PROJECT-NAME} | The name for your project. This value is case-sensitive. | myProject |
| {API-VERSION} | The version of the API you are calling. The value referenced here is for the latest version released. Learn more about other available API versions | 2022-05-01 |
Use the following header to authenticate your request.
| Key | Value |
|---|---|
| Ocp-Apim-Subscription-Key | The key to your resource. Used for authenticating your API requests. |
Once you send your API request, you will receive a 202 response indicating success, which means your project has been deleted. A successful call returns an Operation-Location header that you can use to check the status of the job.
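A client might handle that response along the following lines. The sketch takes a status code and a header mapping, however your HTTP library exposes them, and returns the job-status URL. The function name is hypothetical, and the header value in the usage example is a made-up URL rather than the real format returned by the service.

```python
def job_status_url(status, headers):
    """Return the job-status URL from a DELETE response, or raise if it is missing."""
    if status != 202:
        raise RuntimeError(f"unexpected status code: {status}")
    # Header names are case-insensitive, so normalize before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    try:
        return lowered["operation-location"]
    except KeyError:
        raise RuntimeError("response has no Operation-Location header") from None


# Hypothetical response values for illustration only.
url = job_status_url(202, {"Operation-Location": "https://example.invalid/jobs/1234"})
print(url)
```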
Next steps
You should have an idea of the project schema you will use to label your data.
After your project is created, you can start labeling your data, which will inform your text classification model how to interpret text, and is used for training and evaluation.