How to fix Integration Runtime throttling for small Mapping Dataflow?

Rhys Doyle 0 Reputation points
2025-05-01T07:30:07.1533333+00:00

Hi,

I have a Data Pipeline with two activities: a Lookup that reads a JSON file and passes its contents to a ForEach, which runs a data flow. The data flow has four transformations: a source (CDM), an Identify Columns step, an Alter Row, and a sink that writes a Delta inline dataset to Azure Data Lake Storage Gen2.

This used to work fine, but now it fails before even running the data flow with this error:
"There are substantial concurrent MappingDataflow executions which is causing failures due to throttling under Integration Runtime."

I have tried creating different Integration Runtimes with various numbers of cores, but I get the same error. I have checked that there are no pipelines still running, and the ForEach is set to run sequentially, so it shouldn't spin up multiple clusters anyway. For reference, I have also tried running the data flow directly (i.e. not inside the ForEach) and I get the same error.

Any help debugging this would be much appreciated. If I have somehow hit a limit, is there a way to reset it?

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. phemanth 15,320 Reputation points Microsoft External Staff
    2025-05-02T05:16:52.3866667+00:00

    @Rhys Doyle

    It looks like you're running into throttling caused by a high number of concurrent Mapping Data Flow executions on your Integration Runtime. This can definitely be frustrating, especially when everything was running smoothly before. Here are a few suggestions to help you debug and potentially resolve this issue:

    Cluster Size and Type: Make sure the integration runtime (IR) you are using is appropriately sized for the workload. Since you've already tried different IRs with varying core counts without success, check that your current settings match the scale of the data you're processing. Increasing the number of worker and driver cores might help, but be mindful of costs.
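    As a rough illustration (the runtime name and values here are examples, not recommendations), an Azure IR with dedicated data flow compute can be defined like this in ADF JSON:

    ```json
    {
      "name": "DataFlowIR",
      "properties": {
        "type": "Managed",
        "typeProperties": {
          "computeProperties": {
            "location": "AutoResolve",
            "dataFlowProperties": {
              "computeType": "General",
              "coreCount": 16,
              "timeToLive": 10
            }
          }
        }
      }
    }
    ```

    A `timeToLive` greater than 0 lets sequential data flow runs reuse a warm cluster instead of spinning up a new one per iteration, which also reduces the pressure that leads to this throttling error.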

    Sequential Execution: You mentioned that the For Each activity is set to run sequentially, which is good. However, ensure that there are no other activities in your pipeline that might be triggering multiple parallel runs unintentionally. Throttling often happens when multiple jobs contend for the same resources.
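    For reference, a ForEach configured for sequential execution looks roughly like this in the pipeline JSON (activity, data flow, and lookup names are placeholders):

    ```json
    {
      "name": "ForEachFile",
      "type": "ForEach",
      "typeProperties": {
        "isSequential": true,
        "items": {
          "value": "@activity('LookupConfig').output.value",
          "type": "Expression"
        },
        "activities": [
          {
            "name": "RunDataFlow",
            "type": "ExecuteDataFlow",
            "typeProperties": {
              "dataFlow": {
                "referenceName": "MyDataFlow",
                "type": "DataFlowReference"
              }
            }
          }
        ]
      }
    }
    ```

    Note that when `isSequential` is false, `batchCount` controls the degree of parallelism, so a parallel ForEach can multiply the number of concurrent data flow clusters being requested.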

    Custom Shuffle Partitions: If your data flow is processing large datasets, adjusting the custom shuffle partitions might be beneficial. You can set this in the ADF portal under your integration runtime settings. For instance, setting shuffle partitions to a value like 250 could help manage memory consumption better.

    Debug Sessions: Running data flows in debug mode adds to concurrent executions if multiple debug sessions are active at once. Close any stale debug sessions and make sure you're running your data flow in a way that limits the number of simultaneous executions.

    Transient Issues: Sometimes these errors are caused by transient issues with Azure services. Implementing retries on the activity can help mitigate these temporary problems.
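    A retry policy goes on the activity's `policy` block; for example (names and values are illustrative):

    ```json
    {
      "name": "RunDataFlow",
      "type": "ExecuteDataFlow",
      "policy": {
        "timeout": "0.12:00:00",
        "retry": 3,
        "retryIntervalInSeconds": 120
      },
      "typeProperties": {
        "dataFlow": {
          "referenceName": "MyDataFlow",
          "type": "DataFlowReference"
        }
      }
    }
    ```

    A retry interval of a minute or two gives a transient throttling condition time to clear before the next attempt.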

    Limits Check: Lastly, if you suspect that you've hit a service limit, reviewing Azure's documentation on quotas and limits for your service may provide insight. Azure support could also help you determine if you’ve reached any limits and how to reset them.

    If these suggestions don't resolve the issue, could you provide a bit more detail about:

    • The size of the datasets you're working with?
    • The current configuration of your integration runtime (cores, type)?
    • Any specific changes in your environment or data recently?
    • Whether you are using debug sessions often that might lead to these throttling scenarios?

    Hope this helps. Do let us know if you have any further queries.


