Azure Data Factory - fetch and freeze mechanism during pipeline run?

Shad300
2025-04-29T12:07:38.28+00:00

Dear community,

Context:

I have an Azure Data Factory pipeline linked to a Databricks workspace.

There are build and release CI/CD Azure DevOps pipelines that automatically update the notebooks in the Databricks workspace.

The CI/CD build and release pipelines can run at the same time as an ADF (Azure Data Factory) pipeline, i.e. while an ADF pipeline is already running.

Both the Azure DevOps CI/CD pipelines AND the Azure Data Factory pipelines access the Databricks workspace. This means that while an ADF pipeline is running, there is a risk that the Databricks workspace notebooks are changed by the Azure DevOps CI/CD pipeline.

The desired / expected behaviour would be:

1/ all files/code/scripts/notebooks are fetched when the ADF pipeline starts

2/ all files remain exactly the same until the pipeline ends.

To reformulate point 2 above: running the CI/CD release in Azure DevOps should not affect the ADF pipeline run in any way, even though the release pipeline adds/removes/updates notebooks in the Databricks workspace.

First Question:

Is the desired/expected behaviour described above what currently happens when using Azure DevOps, ADF and Databricks workspaces in this way?

To reformulate: is there a fetch-and-freeze or "lock" mechanism between Azure Data Factory and Azure DevOps, including linked services such as Databricks? Can we be sure that the files remain exactly the same between the start and end of an ADF pipeline run?

Second part of the question:

If the answer to the first question is "no", then we have a problem.

What solutions would you recommend? We thought of two main ideas:

1/ Implement pipeline-run monitoring (see linked info here) in the release pipeline to prevent changes during a pipeline run, i.e. do we need to queue the CI/CD release pipeline until the ADF pipeline has finished?

2/ Add an activity at the beginning of the ADF pipeline that runs the CI/CD pipeline steps for Databricks.

Thanks in advance for your contribution!

Kind regards,

Shad300


Accepted answer
  Venkat Reddy Navari (Microsoft External Staff)
    2025-04-29T12:40:33.9566667+00:00

    Hi @Shad300, this is a common challenge when orchestrating ADF, Databricks, and Azure DevOps together.

    Is the desired/expected behaviour currently what is happening?

    No - Azure Data Factory does not “freeze” or snapshot Databricks notebooks at the time a pipeline starts. ADF executes notebooks live at runtime, based on their current state in the Databricks workspace. So, if your CI/CD release pipeline modifies or deletes a notebook while an ADF pipeline is still running, it can affect the behavior of that in-progress run — especially if the changed notebook is called later in the pipeline.

    There is no built-in lock or fetch-and-freeze mechanism between ADF and Databricks by default.

    Regarding your second point and proposed solutions:

    Monitoring ADF pipelines before starting a release is a practical and effective option. You can query ADF pipeline run status using the REST API or SDK, and delay or queue your release pipeline until the ADF run has finished. This helps avoid runtime conflicts caused by concurrent updates.
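
    For illustration, here is a minimal sketch of such a gate using the azure-mgmt-datafactory SDK (the REST API works just as well). The subscription, resource group and factory names are placeholders for your own environment, and the 24-hour lookback window is an assumption you may want to tune:

    import time
    from datetime import datetime, timedelta, timezone

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import (
        RunFilterParameters,
        RunQueryFilter,
        RunQueryFilterOperand,
        RunQueryFilterOperator,
    )

    SUBSCRIPTION_ID = "<subscription-id>"  # placeholder
    RESOURCE_GROUP = "<resource-group>"    # placeholder
    FACTORY_NAME = "<factory-name>"        # placeholder

    client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

    def active_runs():
        """Return ADF pipeline runs that are still queued or in progress."""
        now = datetime.now(timezone.utc)
        params = RunFilterParameters(
            last_updated_after=now - timedelta(hours=24),  # assumed lookback
            last_updated_before=now,
            filters=[RunQueryFilter(
                operand=RunQueryFilterOperand.STATUS,
                operator=RunQueryFilterOperator.IN,
                values=["Queued", "InProgress"],
            )],
        )
        return client.pipeline_runs.query_by_factory(
            RESOURCE_GROUP, FACTORY_NAME, params).value

    # Hold the release until the factory is idle (simple polling gate).
    while active_runs():
        print("ADF pipeline still running; delaying release...")
        time.sleep(60)
    print("No active ADF runs; safe to deploy notebooks.")

    A script step like this at the start of the release stage, before the Databricks deployment tasks, effectively queues the release behind in-flight ADF runs.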

    Triggering CI/CD deployment logic from within the ADF pipeline is technically possible — for example, by pulling notebooks from a Git repo at the beginning — but this tightly couples deployment and data processing. It may work in controlled environments but could reduce flexibility and increase maintenance overhead.
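
    As a rough sketch of that second idea: assuming your notebooks live in a Databricks Repo and the ADF notebook activities reference the /Repos/... path, a Web activity or script at the start of the ADF pipeline could pin the Repo to a fixed tag via the Repos API. The host, token, repo ID and tag below are placeholders:

    import requests

    DATABRICKS_HOST = "https://<workspace-url>"  # placeholder
    TOKEN = "<databricks-pat>"                   # placeholder
    REPO_ID = "<repo-id>"                        # placeholder
    RELEASE_TAG = "v1.4.2"                       # hypothetical release tag

    # Check the Repo out at the tagged commit so the run sees a fixed version.
    resp = requests.patch(
        f"{DATABRICKS_HOST}/api/2.0/repos/{REPO_ID}",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"tag": RELEASE_TAG},
        timeout=30,
    )
    resp.raise_for_status()
    print(f"Repo {REPO_ID} pinned to {RELEASE_TAG}")

    Note this only isolates the run if the release pipeline deploys to a different branch, tag or folder rather than moving that same Repo mid-run.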

    Alternative approach:

    Package your Databricks logic (e.g. shared functions, transformations) into versioned Python wheel files and install them into your Databricks clusters. This way, your ADF pipeline runs against a fixed version of your codebase, isolated from any ongoing CI/CD changes. This adds stability, versioning, and reduces risk during long-running jobs.
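
    For example (a sketch, assuming your CI/CD pipeline has already built and uploaded a versioned wheel to DBFS; all names below are placeholders), the release can pin a cluster to that wheel via the Libraries API:

    import requests

    DATABRICKS_HOST = "https://<workspace-url>"  # placeholder
    TOKEN = "<databricks-pat>"                   # placeholder
    CLUSTER_ID = "<cluster-id>"                  # placeholder
    # Hypothetical versioned wheel produced by the CI/CD build:
    WHEEL_PATH = "dbfs:/FileStore/wheels/etl_lib-1.4.2-py3-none-any.whl"

    # Install the pinned wheel on the cluster used by the ADF activities.
    resp = requests.post(
        f"{DATABRICKS_HOST}/api/2.0/libraries/install",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"cluster_id": CLUSTER_ID,
              "libraries": [{"whl": WHEEL_PATH}]},
        timeout=30,
    )
    resp.raise_for_status()

    The notebooks then become thin wrappers that import the library; a release ships a new wheel file under a new version instead of mutating notebooks in place, so an in-flight run keeps the version it started with.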

    In summary: The default setup does not prevent updates from affecting running pipelines.

    To protect against this, you’ll need to add some form of isolation — via runtime checks, gated releases, or versioned code execution.

    Hope this helps. Do let us know if you have any further queries.


