Azure Synapse CI/CD Pipeline Spark Pool Parameterization Issue

Debbie Edwards 521 Reputation points
2025-04-24T11:09:51.4433333+00:00

A Synapse project with three notebooks is set up, each running on a Spark pool, e.g., sparkuksdev. The notebooks are part of a pipeline in which the Spark pool is parameterized through a Parameters table in a SQL data warehouse, using @activity('LookupGetParameters').output.firstRow.sparkpool. A stored procedure retrieves the parameters for each pipeline run, and this resolves correctly across the dev, tst, and prod environments.
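For context, inside the notebook activity this expression is attached to the Spark pool reference, roughly as sketched below (the surrounding structure is the standard BigDataPoolReference shape; only the expression comes from the actual setup):

    "sparkPool": {
        "referenceName": {
            "value": "@activity('LookupGetParameters').output.firstRow.sparkpool",
            "type": "Expression"
        },
        "type": "BigDataPoolReference"
    }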

However, an issue arises during a YAML pipeline execution that transfers Synapse to different environments, resulting in the following error:

##[error]Encountered with exception:Error: Failed to fetch the deployment status {"code":"BadRequest","message":"The document creation or update failed because of invalid reference 'sparkuskdev'."}

Several JSON and YAML files are in use to support the process, and any insights into the problem or overlooked elements would be appreciated.

Main Branch template-parameters-definition.json

The template-parameters-definition.json file is used to define which properties are parameterized when templates are generated for each environment. However, no parameters section for the Spark pool (bigDataPool) has been added, so it is unclear whether the file is having any effect on the pool reference.

[Screenshot: current contents of template-parameters-definition.json]

I believe this is the only section that has been added that could relate to the Spark pool.

Synapse Artifacts Deploy Pipeline YAML

There is an expectation that the above file would automatically update the YAML for the Synapse Artifacts Deploy Pipeline with a variable like:

- name: sparkpool
  value: 'projsp${{ parameters.environment }}'

But no changes have occurred, suggesting that there may be a misconfiguration.
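For clarity, the expectation is that the deployment step would then consume such a variable, along the lines of the sketch below (an assumption based on the ARM template deployment task; connection and scope inputs are omitted, and the override parameter name is illustrative and would need to match the generated template):

    - task: AzureResourceManagerTemplateDeployment@3
      inputs:
        csmFile: 'TemplateForWorkspace.json'
        csmParametersFile: 'TemplateParametersForWorkspace_Dev.json'
        # Hypothetical override: the parameter name must match the one in the generated template
        overrideParameters: '-read_json_properties_bigDataPool_referenceName "$(sparkpool)"'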

TemplateForWorkspace.JSON

It was assumed this template would automatically include parameters, replacing the hardcoded Spark pool values. However, the Spark pool reference remains hardcoded as sparkdev, so no automatic substitution is taking place.
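For illustration, the hardcoded reference in the generated template looks something like this (a sketch; only the name sparkdev comes from the actual file, the rest is the usual reference structure):

    "bigDataPool": {
        "referenceName": "sparkdev",
        "type": "BigDataPoolReference"
    }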

TemplateParametersForWorkspace_Dev.JSON

These parameters appear to have been manually updated per environment, with numerous references to the Spark pool (e.g., "read_json_properties_bigDataPool_referenceName": { "value": "sparkdev" }). It is unclear whether these were generated from TemplateParametersForWorkspace.JSON or entered manually.
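For reference, such an entry sits inside a standard ARM deployment-parameters file, roughly as sketched below (the schema and contentVersion lines are the usual boilerplate; the parameter name and value come from the file described above):

    {
        "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
        "contentVersion": "1.0.0.0",
        "parameters": {
            "read_json_properties_bigDataPool_referenceName": {
                "value": "sparkdev"
            }
        }
    }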

Comprehensive documentation has been reviewed, but the understanding of how to achieve a fully functional setup remains elusive. Any guidance on resolving these issues would be greatly appreciated.

Azure Synapse Analytics

2 answers

  1. Venkat Reddy Navari 1,585 Reputation points Microsoft External Staff
    2025-04-24T12:14:08.02+00:00

    Hi @Debbie Edwards, you're running into an issue with Synapse CI/CD deployment where Spark pool references aren't properly parameterized in your ARM templates. The key error, "The document creation or update failed because of invalid reference 'sparkuskdev'", suggests that during deployment the Spark pool name is still hardcoded rather than dynamically resolved via your environment-specific parameters.

    Here are a few things to double-check:

    1. Parameter definition in template-parameters-definition.json: It's crucial that you explicitly define a parameter for the Spark pool (e.g., bigDataPool) in this file; without it, your deployment won't know to expect a value for substitution. A sketch is shown after this list.
    2. templateForWorkspace.json: Within this template, all references to the Spark pool should use a parameter reference like:
         "bigDataPool": {
             "referenceName": "[parameters('bigDataPool')]",
             "type": "BigDataPoolReference"
         }

       If referenceName is hardcoded to sparkdev, it won't dynamically update during deployment. This is likely the root cause of the BadRequest error you're seeing.
    3. TemplateParametersForWorkspace_{env}.json: These environment-specific parameter files should include entries like:
         "bigDataPool": {
             "value": "sparkuksdev"
         }

       Make sure these are consistent and match the pool names for each environment.
    4. YAML pipeline: It won’t auto-update values unless the pipeline explicitly passes parameters into the deployment template. Ensure you’re passing the right environment-specific parameters file using something like
         parameters:
         - name: environment
         default: 'dev'
         ...
         - task: AzureResourceManagerTemplateDeployment@3
         inputs:
         csmFile: 'templateForWorkspace.json'
         csmParametersFile: 'TemplateParametersForWorkspace_$(environment).json'
      
    5. Regenerate Synapse templates after changes: Any manual edits should be followed by re-exporting or regenerating your Synapse workspace templates to maintain integrity.
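
    As mentioned in point 1, a minimal sketch of a Spark pool entry in template-parameters-definition.json, assuming the standard Synapse custom-parameter syntax (where "=" means "parameterize this property, keeping its current value as the default"), could look like:
         {
             "Microsoft.Synapse/workspaces/notebooks": {
                 "properties": {
                     "bigDataPool": {
                         "referenceName": "="
                     }
                 }
             }
         }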

    If you fix the parameterization in both the template and the parameters files and ensure your pipeline is wired up to pass those correctly, your deployment should succeed without hardcoded values.

    I hope this information helps. Please do let us know if you have any further queries. Kindly consider upvoting the comment if the information provided is helpful. This can assist other community members in resolving similar issues.


  2. Debbie Edwards 521 Reputation points
    2025-04-30T08:46:01.87+00:00

    We got around this by creating a generically named Spark pool; the underlying properties still change per environment via the approach above. We still don't think that's really what we wanted to do, but we couldn't see another way around it.

