Hello Asif Attar,
Welcome to the Microsoft Q&A and thank you for posting your questions here.
I understand that you are having an issue while trying to use Azure Logic Apps' "Parse Host File Contents" action to convert a 200+ MB EBCDIC file (mainframe format) to CSV, but are encountering a "large aggregated partial content" error. The file includes COBOL copybook definitions with REDEFINES and OCCURS clauses.
In addition to @Ranashekar Guda's answer, please consider the following points and best practices:
Currently, the "Parse Host File Contents" action in Azure Logic Apps does not support chunking, which means that large files cannot be processed directly and may result in errors during parsing Microsoft Docs - Chunk Text, and Handle Large Messages. This limitation is particularly problematic when dealing with COBOL files containing complex structures like REDEFINE and OCCURS clauses, which require precise parsing to interpret nested or repeated fields accurately COBOL REDEFINES Clause - Mainframes Tech Help, Handling REDEFINES - IBM Mainframes Forum, COBOL REDEFINES - IBM Mainframer.
Since Logic Apps alone cannot handle these complexities efficiently, the recommended approach is a custom solution that reads and parses the file in smaller, manageable chunks. Languages such as Python and Java are well suited for this, offering libraries and techniques to process large EBCDIC files without loading them entirely into memory (Handling COMP-3 and EBCDIC Conversion, Efficiently Read Large Files in Chunks, EBCDIC Handling in Python). A sketch of this streaming approach is shown below.
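Here is a minimal Python sketch of that idea: streaming a fixed-length EBCDIC file record by record so the whole 200+ MB file is never held in memory. The record length and the cp037 code page are assumptions, please take the real values from your copybook / HIDX schema.

```python
# Stream a large fixed-length EBCDIC file one record at a time.
RECORD_LENGTH = 500          # bytes per record (hypothetical; derive from the copybook)
EBCDIC_CODEC = "cp037"       # or "cp500", depending on the mainframe code page

def read_records(path, record_length=RECORD_LENGTH):
    """Yield one raw EBCDIC record at a time without loading the whole file."""
    with open(path, "rb") as f:
        while True:
            record = f.read(record_length)
            if not record:
                break
            if len(record) < record_length:
                # A short trailing record usually means the record length is wrong.
                raise ValueError("Incomplete record at end of file")
            yield record

for raw in read_records("bigfile.ebcdic"):
    text = raw.decode(EBCDIC_CODEC)   # character fields only; COMP-3 fields need special handling
    # ... map fields by offset according to the copybook layout ...
```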
To achieve your goal end to end:
The process begins with extracting the COBOL schema by generating an HIDX file using the Azure Logic Apps Host File Schema Tool - https://learn.microsoft.com/en-us/azure/connectors/integrate-host-files-ibm-mainframe#parse-host-file-contents. Next, preprocess the large EBCDIC file externally, without splitting it arbitrarily, so that record boundaries remain intact. Tools like CB2XML can transform COBOL + EBCDIC data into XML and then into CSV (CB2XML GitHub), while RecordEditor or JRecord provide GUI and Java API support for managing EBCDIC data (RecordEditor Site). Alternatively, a custom Python script can decode EBCDIC (e.g., cp037, cp500) to Unicode and parse the file according to the extracted HIDX schema (Handling EBCDIC in Python), as sketched below.
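The following Python sketch shows the field-level decoding step, including a packed-decimal (COMP-3) helper. The field names, offsets, lengths, and scales in FIELDS are placeholders, substitute the real layout from your copybook / HIDX definition.

```python
# Decode one EBCDIC record into a Python dict, field by field.
EBCDIC_CODEC = "cp037"

def unpack_comp3(raw: bytes, scale: int = 0) -> float:
    """Decode a COMP-3 (packed decimal) field: two digits per byte, sign in the last nibble."""
    digits = ""
    for b in raw[:-1]:
        digits += f"{b >> 4}{b & 0x0F}"
    last = raw[-1]
    digits += str(last >> 4)
    sign = -1 if (last & 0x0F) == 0x0D else 1
    return sign * int(digits) / (10 ** scale)

# Hypothetical field layout: (name, offset, length, type, scale)
FIELDS = [
    ("CUST-ID",   0, 10, "char", 0),
    ("CUST-NAME", 10, 30, "char", 0),
    ("BALANCE",   40, 5,  "comp3", 2),
]

def decode_record(record: bytes) -> dict:
    """Slice the record by copybook offsets and decode each field."""
    row = {}
    for name, offset, length, ftype, scale in FIELDS:
        raw = record[offset:offset + length]
        if ftype == "comp3":
            row[name] = unpack_comp3(raw, scale)
        else:
            row[name] = raw.decode(EBCDIC_CODEC).strip()
    return row
```

Note that REDEFINES and OCCURS clauses would be handled here as well, by choosing which layout to apply to a byte range (REDEFINES) or by looping over repeated field groups (OCCURS).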
Once the file is correctly parsed and encoded into CSV format, it should be split into smaller chunks, ideally around 10-20 MB each, and uploaded to Azure Blob Storage (Azure Blob Storage); a sketch of this step follows below. Afterward, a Logic App can be configured to trigger whenever a new blob is uploaded, so each small CSV chunk can be parsed and the data saved, transformed, or further processed as needed (Handle Large Messages in Logic Apps).
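Here is a minimal sketch of the splitting and upload step using the azure-storage-blob package. The container name and connection-string environment variable are placeholders; the header row is repeated in every chunk so each blob is independently parseable.

```python
# Split a large CSV into ~15 MB chunks and upload them to Blob Storage.
import os
from azure.storage.blob import BlobServiceClient

CHUNK_BYTES = 15 * 1024 * 1024           # target chunk size (within the 10-20 MB range)
CONNECTION_STRING = os.environ["AZURE_STORAGE_CONNECTION_STRING"]
CONTAINER = "csv-chunks"                 # hypothetical container name

def split_and_upload(csv_path: str) -> None:
    service = BlobServiceClient.from_connection_string(CONNECTION_STRING)
    container = service.get_container_client(CONTAINER)

    with open(csv_path, "r", encoding="utf-8") as f:
        header = f.readline()
        part, size, buffer = 1, 0, [header]
        for line in f:
            buffer.append(line)
            size += len(line)
            if size >= CHUNK_BYTES:
                name = f"{os.path.basename(csv_path)}.part{part:03d}.csv"
                container.upload_blob(name, "".join(buffer), overwrite=True)
                part, size, buffer = part + 1, 0, [header]
        if len(buffer) > 1:              # flush the final partial chunk
            name = f"{os.path.basename(csv_path)}.part{part:03d}.csv"
            container.upload_blob(name, "".join(buffer), overwrite=True)
```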
Optionally, instead of performing external preprocessing, an Azure Function App can be created to automate the parsing step in a serverless and scalable manner. The Azure Function would be triggered by the blob upload event, decode and parse the EBCDIC content into CSV format, and save the processed results back to Blob Storage - Azure Function Blob Trigger (see the sketch below). This architecture provides better scalability, automation, and cost-efficiency.
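A minimal sketch of such a function, assuming the Python v2 programming model for Azure Functions. The container names, record length, and the ebcdic_decoder module (holding the FIELDS layout and decode_record helper from the earlier sketch) are assumptions; for very large blobs you may also need a streaming read or a plan with more memory.

```python
# Azure Function: convert an uploaded EBCDIC blob to CSV and write it back to Blob Storage.
import csv
import io
import azure.functions as func

from ebcdic_decoder import FIELDS, decode_record  # hypothetical module with the earlier helpers

app = func.FunctionApp()

RECORD_LENGTH = 500   # hypothetical; derive from the copybook / HIDX schema

@app.blob_trigger(arg_name="inblob", path="ebcdic-incoming/{name}",
                  connection="AzureWebJobsStorage")
@app.blob_output(arg_name="outblob", path="csv-output/{name}.csv",
                 connection="AzureWebJobsStorage")
def convert_ebcdic(inblob: func.InputStream, outblob: func.Out[str]) -> None:
    data = inblob.read()
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=[f[0] for f in FIELDS])
    writer.writeheader()
    for offset in range(0, len(data), RECORD_LENGTH):
        record = data[offset:offset + RECORD_LENGTH]
        writer.writerow(decode_record(record))
    outblob.set(out.getvalue())
```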
I hope this is helpful! Do not hesitate to let me know if you have any other questions or need further clarification.
Please don't forget to close the thread by upvoting and accepting this as the answer if it is helpful.