Azure Data Factory - fetch and freeze mechanism during pipeline run?

Shad300
2025-04-29T12:07:38.28+00:00

Dear community,

Context:

I have an Azure Data Factory pipeline linked to a Databricks workspace.

There are build and release CI/CD Azure DevOps pipelines that automatically update the notebooks in the Databricks workspace.

The CI/CD build and release pipelines can run at the same time as an ADF (Azure Data Factory) pipeline, i.e. while an ADF pipeline is already running.

Both the Azure DevOps CI/CD pipelines AND the Azure Data Factory pipelines access the Databricks workspace. This means that while an ADF pipeline is running, there is a risk that the Databricks workspace notebooks are changed by the Azure DevOps CI/CD pipeline.

The desired / expected behaviour would be:

1/ all files/code/scripts/notebooks are fetched when the ADF pipeline starts

2/ all files remain exactly the same until the pipeline ends.

To reformulate point 2 above: running the CI/CD release in Azure DevOps should not affect the ADF pipeline run in any way, even though the release pipeline adds/removes/updates notebooks in the Databricks workspace.

First Question:

Is the desired/expected behaviour described above what currently happens when using Azure DevOps, ADF and Databricks workspaces in this way?

To reformulate: is there a fetch-and-freeze or "lock" mechanism between Azure Data Factory and Azure DevOps, including linked services such as Databricks? Can we be sure that the files remain exactly the same between the start and end of an ADF pipeline run?

Second part of the question:

If the answer to the first question is "no", then we have a problem.

What solutions would you recommend? We thought of two main ideas:

1/ Implement pipeline-run monitoring (see linked info here) in the release pipeline to prevent changes during a pipeline run, i.e. do we need to queue the CI/CD release pipeline until the ADF pipeline has finished?

2/ Add an activity at the beginning of the ADF pipeline that runs the CI/CD pipeline steps for Databricks.

Thanks in advance for your contribution!

Kind regards,

Shad300


Accepted answer
  Venkat Reddy Navari (Microsoft External Staff)
    2025-04-29T12:40:33.9566667+00:00

    Hi @Shad300, this is a common challenge when orchestrating ADF, Databricks, and Azure DevOps together.

    Is the desired/expected behaviour currently what is happening?

    No - Azure Data Factory does not “freeze” or snapshot Databricks notebooks at the time a pipeline starts. ADF executes notebooks live at runtime, based on their current state in the Databricks workspace. So, if your CI/CD release pipeline modifies or deletes a notebook while an ADF pipeline is still running, it can affect the behavior of that in-progress run — especially if the changed notebook is called later in the pipeline.

    There is no built-in lock or fetch-and-freeze mechanism between ADF and Databricks by default.

    Regarding your second point and proposed solutions:

    Monitoring ADF pipelines before starting a release is a practical and effective option. You can query ADF pipeline run status using the REST API or SDK, and delay or queue your release pipeline until the ADF run has finished. This helps avoid runtime conflicts caused by concurrent updates.
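
    For illustration, here is a minimal sketch of such a gate using the azure-mgmt-datafactory SDK (the REST API works just as well). The subscription, resource group and factory names are placeholders for your own environment, and the 24-hour lookback window is an assumption you may want to tune:

    import time
    from datetime import datetime, timedelta, timezone

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import (
        RunFilterParameters,
        RunQueryFilter,
        RunQueryFilterOperand,
        RunQueryFilterOperator,
    )

    SUBSCRIPTION_ID = "<subscription-id>"  # placeholder
    RESOURCE_GROUP = "<resource-group>"    # placeholder
    FACTORY_NAME = "<factory-name>"        # placeholder

    client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

    def active_runs():
        """Return ADF pipeline runs that are still queued or in progress."""
        now = datetime.now(timezone.utc)
        params = RunFilterParameters(
            last_updated_after=now - timedelta(hours=24),  # assumed lookback
            last_updated_before=now,
            filters=[RunQueryFilter(
                operand=RunQueryFilterOperand.STATUS,
                operator=RunQueryFilterOperator.IN,
                values=["Queued", "InProgress"],
            )],
        )
        return client.pipeline_runs.query_by_factory(
            RESOURCE_GROUP, FACTORY_NAME, params).value

    # Hold the release until the factory is idle (simple polling gate).
    while active_runs():
        print("ADF pipeline still running; delaying release...")
        time.sleep(60)
    print("No active ADF runs; safe to deploy notebooks.")

    A script step like this at the start of the release stage, before the Databricks deployment tasks, effectively queues the release behind in-flight ADF runs.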

    Triggering CI/CD deployment logic from within the ADF pipeline is technically possible — for example, by pulling notebooks from a Git repo at the beginning — but this tightly couples deployment and data processing. It may work in controlled environments but could reduce flexibility and increase maintenance overhead.
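
    As a rough sketch of that second idea: assuming your notebooks live in a Databricks Repo and the ADF notebook activities reference the /Repos/... path, a Web activity or script at the start of the ADF pipeline could pin the Repo to a fixed tag via the Repos API. The host, token, repo ID and tag below are placeholders:

    import requests

    DATABRICKS_HOST = "https://<workspace-url>"  # placeholder
    TOKEN = "<databricks-pat>"                   # placeholder
    REPO_ID = "<repo-id>"                        # placeholder
    RELEASE_TAG = "v1.4.2"                       # hypothetical release tag

    # Check the Repo out at the tagged commit so the run sees a fixed version.
    resp = requests.patch(
        f"{DATABRICKS_HOST}/api/2.0/repos/{REPO_ID}",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"tag": RELEASE_TAG},
        timeout=30,
    )
    resp.raise_for_status()
    print(f"Repo {REPO_ID} pinned to {RELEASE_TAG}")

    Note this only isolates the run if the release pipeline deploys to a different branch, tag or folder rather than moving that same Repo mid-run.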

    Alternative approach:

    Package your Databricks logic (e.g. shared functions, transformations) into versioned Python wheel files and install them into your Databricks clusters. This way, your ADF pipeline runs against a fixed version of your codebase, isolated from any ongoing CI/CD changes. This adds stability, versioning, and reduces risk during long-running jobs.
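
    For example (a sketch, assuming your CI/CD pipeline has already built and uploaded a versioned wheel to DBFS; all names below are placeholders), the release can pin a cluster to that wheel via the Libraries API:

    import requests

    DATABRICKS_HOST = "https://<workspace-url>"  # placeholder
    TOKEN = "<databricks-pat>"                   # placeholder
    CLUSTER_ID = "<cluster-id>"                  # placeholder
    # Hypothetical versioned wheel produced by the CI/CD build:
    WHEEL_PATH = "dbfs:/FileStore/wheels/etl_lib-1.4.2-py3-none-any.whl"

    # Install the pinned wheel on the cluster used by the ADF activities.
    resp = requests.post(
        f"{DATABRICKS_HOST}/api/2.0/libraries/install",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"cluster_id": CLUSTER_ID,
              "libraries": [{"whl": WHEEL_PATH}]},
        timeout=30,
    )
    resp.raise_for_status()

    The notebooks then become thin wrappers that import the library; a release ships a new wheel file under a new version instead of mutating notebooks in place, so an in-flight run keeps the version it started with.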

    In summary: The default setup does not prevent updates from affecting running pipelines.

    To protect against this, you’ll need to add some form of isolation — via runtime checks, gated releases, or versioned code execution.

    Hope this helps. Do let us know if you have any further queries.


