AI Search: Improving Exact Match Scoring in Document Search

Shruti 40 Reputation points
2025-04-24T09:58:23.7933333+00:00

I am building search functionality over a dataset of approximately 65,000,000 documents (about 30 GB), with fields such as name, address, city, telephone number, and postcode, plus filter fields such as country and type.

The challenge lies in the scoring: exact matches receive lower scores than documents that contain all the search terms plus additional terms.

For example, when searching for Microsoft Inc using the following query:

{
  "query": "Microsoft Inc",
  "options": {
    "includeTotalCount": true,
    "queryType": "FULL",
    "searchMode": "ALL",
    "filter": "(countryCode eq USA)",
    "searchFields": ["organisationName"],
    "scoringProfile": "default"
  }
}

The expected results are:

  • "Microsoft Inc."
  • "Microsoft Inc. US"
  • "Microsoft Corporation"

However, "Microsoft Inc. US" gets a higher score than "Microsoft Inc.", even though the latter is the exact match.

A scoring profile has been added, but it hasn't resolved the issue:

{
  "scoringProfiles": [
    {
      "name": "default",
      "functionAggregation": "firstMatching",
      "text": {
        "weights": {
          "organisationName": 5
        }
      },
      "functions": []
    }
  ]
}

The field definition is as follows:

{
  "name": "organisationName",
  "type": "Edm.String",
  "searchable": true,
  "filterable": false,
  "retrievable": true,
  "stored": true,
  "sortable": false,
  "facetable": false,
  "key": false,
  "analyzer": "diacritics_analyzer",
  "synonymMaps": []
}

The scoring configuration in use is:

"@odata.type": "#Microsoft.Azure.Search.BM25Similarity"

How can the scoring for exact matches be improved in this scenario?

Azure AI Search

  1. Suresh Chikkam 1,560 Reputation points Microsoft External Staff Moderator
    2025-04-29T07:17:19.8533333+00:00

    Hi Shruti,

    To address this, introduce a new field in the index dedicated to exact matches, and assign it a custom analyzer built on a keyword tokenizer. That way, the entire name “Microsoft Inc” is treated as a single token rather than being split into "microsoft" and "inc".

    "analyzers": [
      {
        "name": "keyword_analyzer",
        "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
        "tokenizer": "keyword_v2",
        "tokenFilters": [
          "lowercase"
        ]
      }
    ]
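
To see why a keyword-based analyzer separates exact matches from partial ones, here is a rough Python simulation. It is not the service's analyzer code: `standard_like_tokens` and `keyword_lowercase_tokens` are hypothetical stand-ins that only approximate a word-splitting analyzer and a keyword tokenizer with a lowercase filter.

```python
import re


def standard_like_tokens(text: str) -> list[str]:
    """Approximate a word-splitting analyzer: lowercase, then split on non-word characters."""
    return [token for token in re.split(r"\W+", text.lower()) if token]


def keyword_lowercase_tokens(text: str) -> list[str]:
    """Approximate keyword tokenizer + lowercase filter: the whole input becomes one token."""
    return [text.lower()]


query = "Microsoft Inc"
for name in ["Microsoft Inc", "Microsoft Inc. US"]:
    # With word splitting, every query token is found in both names.
    word_match = set(standard_like_tokens(query)) <= set(standard_like_tokens(name))
    # With a single keyword token, only the identical name matches.
    exact_match = keyword_lowercase_tokens(query) == keyword_lowercase_tokens(name)
    print(f"{name!r}: word match={word_match}, exact match={exact_match}")
```

Both names match on the word-split field, but only the identical name matches on the keyword field, which is what makes a separate boost possible.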
    

    And the field definition:

    {
      "name": "organisationName_exact",
      "type": "Edm.String",
      "searchable": true,
      "analyzer": "keyword_analyzer",
      "filterable": false,
      "retrievable": true,
      "sortable": false,
      "facetable": false,
      "key": false
    }
    

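    When indexing documents, populate organisationName_exact with the same value as organisationName. The lowercase token filter normalizes case at both index and query time, so “Microsoft Inc” is indexed as the single token “microsoft inc” without any manual lowercasing. Note that a keyword tokenizer keeps punctuation, so “Microsoft Inc.” and “Microsoft Inc” produce different tokens; normalize such characters before indexing if the two should be treated as the same name.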
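
A minimal sketch of preparing documents before upload, assuming they are plain dictionaries. The helper names `normalize_name` and `with_exact_field` and the normalization rule (collapse whitespace, drop trailing periods and commas) are illustrative assumptions, not part of the service; whatever rule is chosen would also have to be applied to the query text.

```python
import re


def normalize_name(name: str) -> str:
    """Collapse runs of whitespace and drop trailing periods/commas (an assumed rule)."""
    collapsed = re.sub(r"\s+", " ", name).strip()
    return collapsed.rstrip(".,").strip()


def with_exact_field(doc: dict) -> dict:
    """Return a copy of the document with organisationName_exact filled in."""
    enriched = dict(doc)  # shallow copy so the caller's document is not modified
    enriched["organisationName_exact"] = normalize_name(doc["organisationName"])
    return enriched


docs = [
    {"id": "1", "organisationName": "Microsoft Inc."},
    {"id": "2", "organisationName": "Microsoft Inc. US"},
]
prepared = [with_exact_field(d) for d in docs]
```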

    Now, in the search query, keep using the regular organisationName field for general term matching, and add a second clause that matches the whole name as a quoted phrase against organisationName_exact. With queryType set to full, fielded search lets both clauses sit in one search string. Because the keyword analyzer emits the whole value as one token, the phrase clause only matches documents whose entire name equals the query.

    {
      "search": "organisationName:(Microsoft Inc) OR organisationName_exact:\"Microsoft Inc\"",
      "queryType": "full",
      "searchMode": "all",
      "filter": "countryCode eq 'USA'",
      "scoringProfile": "exactMatchBoost"
    }
    

    And the scoring profile, which uses a text weight to boost matches on the exact-match field:

    "scoringProfiles": [
      {
        "name": "exactMatchBoost",
        "text": {
          "weights": {
            "organisationName": 1,
            "organisationName_exact": 10
          }
        }
      }
    ]
    

    This way, BM25 continues to drive overall relevance, while a document that also matches on organisationName_exact receives a strong boost, helping “Microsoft Inc” appear above similar but less precise matches like “Microsoft Inc. US”.
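
One way to assemble such a request programmatically is sketched below. It builds a full Lucene search string with a fielded term clause on organisationName and a quoted phrase clause on organisationName_exact, escaping characters that the Lucene syntax treats as special. The function names are hypothetical, and the sketch only builds the JSON body; sending it to the service is left out.

```python
import json
import re

# Characters with special meaning in full Lucene query syntax.
_LUCENE_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\/])')


def escape_terms(text: str) -> str:
    """Backslash-escape Lucene special characters in free-text terms."""
    return _LUCENE_SPECIAL.sub(r"\\\1", text)


def escape_phrase(text: str) -> str:
    """Escape backslashes and double quotes for use inside a quoted phrase."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_search_body(name: str, country_code: str) -> dict:
    """Build a search request body that matches terms and the exact full name."""
    search = (
        f"organisationName:({escape_terms(name)}) "
        f'OR organisationName_exact:"{escape_phrase(name)}"'
    )
    safe_country = country_code.replace("'", "''")  # OData string-literal escaping
    return {
        "search": search,
        "queryType": "full",
        "searchMode": "all",
        "filter": f"countryCode eq '{safe_country}'",
        "scoringProfile": "exactMatchBoost",
    }


print(json.dumps(build_search_body("Microsoft Inc", "USA"), indent=2))
```

Escaping matters for names such as "AT&T (US)", where unescaped characters would otherwise be parsed as query operators.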

    Hope it helps!

