Hello Broomfield, Darrien,
Welcome to the Microsoft Q&A and thank you for posting your questions here.
Regarding the Azure Synapse Link for Dataverse Spark issues and the earlier responses, after reviewing them carefully, here is what you need to know:
- Delta requires explicit activation and permissions (the Synapse workspace managed identity must have the Storage Blob Data Owner role). Check these links:
- https://learn.microsoft.com/en-us/fabric/data-engineering/delta-optimization-and-v-order
- https://docs.databricks.com/aws/en/delta/table-properties
- https://learn.microsoft.com/en-us/azure/databricks/delta-sharing/grant-access
- model.json must be parsed to map the column headers, and partitioned data requires path patterns. Check these links:
- https://learn.microsoft.com/en-us/azure/architecture/best-practices/data-partitioning
- https://learn.microsoft.com/en-us/azure/stream-analytics/stream-analytics-custom-path-patterns-blob-storage-output
Therefore, follow the steps below to resolve the issue:
- Assign the Synapse workspace managed identity the Storage Blob Data Owner role on the storage account (https://learn.microsoft.com/en-us/azure/storage/common/storage-auth-aad-rbac-portal). Also make sure the Synapse identity has the System Administrator role in Dataverse (https://learn.microsoft.com/en-us/power-platform/admin/manage-service-principals). Then reconfigure Synapse Link with Delta enabled.
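Before reconfiguring, a quick sanity check from a Synapse notebook can confirm the Dataverse container is reachable at all. This is a minimal sketch: the container and storage account names are placeholders, and mssparkutils is only available inside the Synapse Spark runtime.

```python
# List the root of the Synapse Link container to confirm the path is reachable
# before reading any data. <container> and <storage> are placeholders.
from notebookutils import mssparkutils  # shipped with the Synapse Spark runtime

root = "abfss://<container>@<storage>.dfs.core.windows.net/"
for item in mssparkutils.fs.ls(root):
    print(item.name, item.isDir)
```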
- Read Parquet/Delta data (if Delta is enabled):

```python
# For Delta tables:
df = spark.read.format("delta").load("abfss://<container>@<storage>.dfs.core.windows.net/Tables/<table>/delta")
df.show()

# For Parquet:
df = spark.read.parquet("abfss://<container>@<storage>.dfs.core.windows.net/Tables/<table>/parquet")
```
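If you are not sure whether Delta was actually enabled on the link profile, a small check like the one below can tell you before the read fails. This is a sketch only; it assumes the Delta Lake package bundled with the Synapse Spark runtime and uses the same placeholder path as above.

```python
# Check whether the table folder actually contains a Delta table, i.e. whether
# Delta export was enabled on the Synapse Link profile. The path is a placeholder.
from delta.tables import DeltaTable

path = "abfss://<container>@<storage>.dfs.core.windows.net/Tables/<table>/delta"
if DeltaTable.isDeltaTable(spark, path):
    spark.read.format("delta").load(path).printSchema()
else:
    print("No Delta log found here - fall back to the CSV route below.")
```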
- Handle CSV files and schema extraction in these two ways:
- Parse model.json for the schema:

```python
from pyspark.sql import SparkSession
import json

# Read model.json
model_json_path = "abfss://<container>@<storage>.dfs.core.windows.net/model.json"
model_content = spark.sparkContext.wholeTextFiles(model_json_path).collect()[0][1]
model = json.loads(model_content)

# Extract the schema for a specific table (e.g., "account")
table_schema = [col["name"]
                for table in model["entities"] if table["name"] == "account"
                for col in table["attributes"]]
```
- Read the CSV files and apply that schema (the files have no header row):

```python
# Read all CSV files (including partitions and snapshots)
path = "abfss://<container>@<storage>.dfs.core.windows.net/Tables/account/**/*.csv"
df = spark.read.option("header", "false").csv(path)
df = df.toDF(*table_schema)
df.show()
```
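If you also need typed columns rather than all strings, the same model.json can drive a Spark schema via each attribute's dataType. The mapping below is a sketch that reuses the model object and path from the snippets above and assumes the common CDM type names; check your own model.json and adjust the mapping (and timestamp parsing) if it differs.

```python
# Build a typed Spark schema from model.json instead of applying column names only.
# The dataType strings below are the usual CDM ones - verify them against your file.
from pyspark.sql.types import (StructType, StructField, StringType, LongType,
                               DoubleType, BooleanType, TimestampType, DecimalType)

cdm_to_spark = {
    "string": StringType(),
    "guid": StringType(),
    "int64": LongType(),
    "double": DoubleType(),
    "decimal": DecimalType(38, 18),
    "boolean": BooleanType(),
    "dateTime": TimestampType(),
}

account_entity = next(e for e in model["entities"] if e["name"] == "account")
schema = StructType([
    StructField(attr["name"],
                cdm_to_spark.get(attr.get("dataType", "string"), StringType()),
                True)
    for attr in account_entity["attributes"]
])

# If timestamp columns come back null, read them as strings first and cast afterwards.
df_typed = spark.read.option("header", "false").schema(schema).csv(path)
df_typed.printSchema()
```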
- Lastly, you will need to handle partitioned data and snapshots; make sure you include all the data you need:
- Use wildcards to read partitioned folders (e.g., `Tables/account/**/*.csv`).
- Snapshots are stored in `Tables/account/snapshot`; exclude them if they are not needed:

```python
path = "abfss://<container>@<storage>.dfs.core.windows.net/Tables/account/*/*.csv"  # Excludes "snapshot"
```
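Putting those pieces together, an end-to-end sketch could look like the following. It reuses table_schema from the model.json step; the glob pattern is taken from the step above and is an assumption, so verify it against the actual folder layout of your container (and compare row counts with Dataverse) before relying on it.

```python
# End-to-end sketch: apply the model.json column names to the partitioned CSVs
# while leaving the snapshot folder out. Placeholders as above; verify the glob
# pattern against your own folder layout.
base = "abfss://<container>@<storage>.dfs.core.windows.net/Tables/account"

df = (spark.read
      .option("header", "false")
      .csv(f"{base}/*/*.csv")   # pattern from the step above
      .toDF(*table_schema))

print(df.count())               # sanity check against the row count in Dataverse
df.show(5, truncate=False)
```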
I hope this is helpful! Do not hesitate to let me know if you have any other questions or clarifications.
Please don't forget to close the thread by upvoting and accepting this as the answer if it was helpful.