Modernize mainframe and midrange data
Apache®, Apache Spark®, and the flame logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by the Apache Software Foundation is implied by the use of these marks.
This article describes an end-to-end modernization plan for mainframe and midrange data sources. Modernization helps improve scalability and performance for your mission-critical workloads.
Architecture
Download a Visio file of this architecture.
Dataflow
The following dataflow corresponds to the previous diagram:
Mainframe and midrange systems store data in the following data sources:
File systems:
- Virtual Storage Access Method (VSAM)
- Flat files
- Linear Tape File System
Relational databases:
- Db2 for z/OS
- Db2 for IBM i
- Db2 for Linux, UNIX, and Windows
Nonrelational databases:
- Information Management System (IMS)
- Adabas
- Integrated Database Management System (IDMS)
The object conversion process extracts object definitions from source objects. The definitions are then converted into corresponding objects in the target data store.
SQL Server Migration Assistant for Db2 migrates schemas and data from IBM Db2 databases to Azure databases.
Managed Data Provider for Host Files converts objects by:
- Parsing common business-oriented language (COBOL) and Report Program Generator (RPG) record layouts, or copybooks.
- Mapping the copybooks to C# objects that .NET applications use.
The Db2toAzurePostgreSQL tool migrates database objects from Db2 to Azure Database for PostgreSQL.
Partner tools perform automated object conversion on nonrelational databases, file systems, and other data stores.
Data is ingested and transformed. Mainframe and midrange systems store their file system data in EBCDIC-encoded format in file formats like:

- VSAM files
- Flat files
- Linear Tape File System files

COBOL, Programming Language One (PL/I), and assembly language copybooks define the data structure of these files.
a. File Transfer Protocol (FTP) transfers mainframe and midrange file system datasets and their corresponding copybooks to Azure. These datasets have single layouts and unpacked fields in binary format.
b. Data is converted, either by custom programs that you develop by using the host file component of Host Integration Server or by using the built-in connector for IBM host files in Azure Logic Apps.
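To make the conversion step concrete, the following Python sketch shows the kind of record-level processing that a custom conversion program performs: decoding EBCDIC text fields and unpacking a packed-decimal (COMP-3) field from fixed-length records. The copybook layout, field offsets, and file name are hypothetical assumptions for illustration; real copybooks often contain REDEFINES clauses and multiple record layouts that require fuller tooling, like the Host Integration Server components described earlier.

```python
import codecs

# Hypothetical copybook layout (an assumption for illustration):
#   01  CUSTOMER-REC.
#       05  CUST-ID    PIC X(6).
#       05  CUST-NAME  PIC X(20).
#       05  BALANCE    PIC S9(7)V99 COMP-3.   (5 bytes, packed decimal)
RECORD_LENGTH = 6 + 20 + 5

def unpack_comp3(field: bytes, scale: int = 2) -> float:
    """Decode an IBM packed-decimal (COMP-3) field: two binary-coded
    decimal digits per byte, sign in the low nibble of the last byte."""
    digits = "".join(f"{b >> 4}{b & 0x0F}" for b in field[:-1])
    digits += str(field[-1] >> 4)
    sign = -1 if (field[-1] & 0x0F) == 0x0D else 1
    return sign * int(digits) / (10 ** scale)

def parse_record(raw: bytes) -> dict:
    # cp037 is the Python codec for EBCDIC (US/Canada).
    return {
        "cust_id": codecs.decode(raw[0:6], "cp037").strip(),
        "cust_name": codecs.decode(raw[6:26], "cp037").strip(),
        "balance": unpack_comp3(raw[26:31]),
    }

with open("customer.dat", "rb") as f:        # dataset transferred via FTP
    while record := f.read(RECORD_LENGTH):
        print(parse_record(record))
```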
The Spark Notebook converter is developed by using open-source Spark frameworks. It's compatible with Spark environments such as Microsoft Fabric, Azure Synapse Analytics, and Azure Databricks.
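As one example of such an open-source approach, the following PySpark sketch reads an EBCDIC dataset by using the Cobrix spark-cobol connector and writes the converted data out as Parquet. The paths, copybook name, and package version are placeholder assumptions, and Cobrix is just one open-source framework of this kind, not necessarily the one a given converter uses.

```python
from pyspark.sql import SparkSession

# Assumes the Cobrix package is on the classpath, for example:
#   spark-submit --packages za.co.absa.cobrix:spark-cobol_2.12:2.7.0 ...
spark = SparkSession.builder.appName("mainframe-conversion").getOrCreate()

# Paths and copybook are placeholders for illustration.
df = (
    spark.read.format("cobol")
    .option("copybook", "/mnt/landing/copybooks/customer.cpy")
    .option("encoding", "ebcdic")        # source data is EBCDIC-encoded
    .load("/mnt/landing/datasets/customer.dat")
)

# Write the converted data to the lake in an open columnar format.
df.write.mode("overwrite").parquet("/mnt/curated/customer")
```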
c. Relational database data is migrated.

IBM mainframe and midrange systems store data in relational databases like:

- Db2 for z/OS
- Db2 for IBM i
- Db2 for Linux, UNIX, and Windows

The following services migrate the database data, as shown in the sketch after this list:
- Azure Data Factory uses a Db2 connector to extract and integrate data from the databases.
- SQL Server Integration Services handles various data extract, transform, and load tasks.
- Fabric Data Factory uses the IBM Db2 connector to migrate Db2 data.
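The following Python sketch illustrates the basic extract-and-load pattern that these services automate, using pyodbc with the IBM Db2 and Microsoft SQL Server ODBC drivers. The connection strings, table names, and columns are placeholder assumptions; in practice, the managed connectors handle batching, retries, and type mapping for you.

```python
import pyodbc

# Connection strings are illustrative placeholders, not real endpoints.
db2 = pyodbc.connect(
    "DRIVER={IBM DB2 ODBC DRIVER};HOSTNAME=mainframe.example.com;"
    "PORT=446;PROTOCOL=TCPIP;DATABASE=SAMPLEDB;UID=user;PWD=***;"
)
sql = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myserver.database.windows.net;"
    "DATABASE=targetdb;UID=user;PWD=***;Encrypt=yes;"
)

src = db2.cursor()
dst = sql.cursor()
dst.fast_executemany = True  # speeds up parameterized bulk inserts

src.execute("SELECT CUST_ID, CUST_NAME, BALANCE FROM SCHEMA1.CUSTOMER")
while True:
    rows = src.fetchmany(5000)   # stream in batches to bound memory use
    if not rows:
        break
    dst.executemany(
        "INSERT INTO dbo.Customer (CustId, CustName, Balance) "
        "VALUES (?, ?, ?)",
        rows,
    )
sql.commit()
```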
d. Nonrelational database data is migrated.
IBM mainframe and midrange systems store data in nonrelational databases like:
- IDMS, a network model database management system (DBMS).
- IMS, a hierarchical model DBMS.
- Adabas.
- Datacom.
Partner products integrate data from these databases.
Azure tools like Azure Data Factory and AzCopy load data into Azure databases and Azure data storage. You can also use partner solutions and custom loading solutions to load data.
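As a minimal sketch of a custom loading step, the following Python code uploads a converted file to Blob Storage by using the azure-storage-blob SDK. The account URL, credential, container, and file names are placeholder assumptions; AzCopy provides the same capability from the command line.

```python
from azure.storage.blob import BlobServiceClient

# Account URL, credential, and names are placeholders for illustration.
service = BlobServiceClient(
    account_url="https://mystorageacct.blob.core.windows.net",
    credential="<account-key-or-sas-token>",
)
container = service.get_container_client("converted-datasets")

with open("customer.parquet", "rb") as data:
    container.upload_blob(
        name="landing/customer.parquet",
        data=data,
        overwrite=True,   # replace the blob if a previous load exists
    )
```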
Azure provides various database services, including fully managed relational database services like Azure SQL Database and NoSQL options like Azure Cosmos DB. These services are designed for scalability, flexibility, and global distribution.
Azure also provides a range of storage solutions, including Azure Blob Storage for unstructured data and Azure Files for fully managed file shares.
Azure services use the modernized data tier for computing, analytics, storage, and networking.
Client applications also use the modernized data tier.
Components
This architecture uses the following components.
Data storage
This architecture describes how to migrate data to scalable, more secure cloud storage and managed databases for flexible, intelligent data management in Azure.
SQL Database is part of the Azure SQL family. It's designed for the cloud and provides all the benefits of a fully managed and evergreen platform as a service (PaaS). SQL Database also provides AI-powered automated features that optimize performance and durability. Serverless compute and Hyperscale storage options automatically scale resources on demand.
Azure Database for PostgreSQL is a fully managed relational database service based on the community edition of the open-source PostgreSQL database engine.
Azure Cosmos DB is a globally distributed multiple-model NoSQL database.
Azure Database for MySQL is a fully managed relational database service based on the community edition of the open-source MySQL database engine.
SQL Managed Instance is an intelligent, scalable cloud database service that provides all the benefits of a fully managed and evergreen PaaS. SQL Managed Instance has almost complete compatibility with the latest SQL Server Enterprise edition database engine. It also provides a native virtual network implementation that addresses common security concerns.
Azure Data Lake Storage is a storage repository that holds large amounts of data in its native, raw format. Data lake stores are optimized for scaling to terabytes and petabytes of data. The data typically comes from multiple heterogeneous sources. It can be structured, semi-structured, or unstructured.
SQL database in Microsoft Fabric is a developer-friendly transactional database that's based on SQL Database. Use it to easily create your operational database in Fabric. A SQL database in Fabric uses the same SQL database engine as SQL Database.
Microsoft Fabric Lakehouse is a data architecture platform for storing, managing, and analyzing structured and unstructured data in a single location.
Compute
Azure Data Factory integrates data across different network environments by using an integration runtime (IR), which is a compute infrastructure. Azure Data Factory copies data between cloud data stores and data stores in on-premises networks by using self-hosted IRs.
The on-premises data gateway is a locally installed Windows client application that acts as a bridge between your local on-premises data sources and services in the Microsoft Cloud.
Azure Virtual Machines provides on-demand, scalable computing resources. An Azure virtual machine (VM) provides the flexibility of virtualization but eliminates the maintenance demands of physical hardware. Azure VMs provide a choice of operating systems, including Windows and Linux.
Data integrators
This architecture outlines various Azure-native migration tools that you use depending on the mainframe source data and the target database.
Azure Data Factory is a hybrid data integration service. In this solution, Azure Data Factory migrates data from Db2 sources to Azure database targets by using native connectors.
AzCopy is a command-line utility that moves blobs or files into and out of storage accounts.
SQL Server Integration Services is a platform for creating enterprise-level data integration and transformation solutions. You can use it to solve complex business problems by:
- Copying or downloading files.
- Loading data warehouses.
- Cleansing and mining data.
- Managing SQL Server objects and data.
Host Integration Server technologies and tools can integrate existing IBM host systems, programs, messages, and data with Azure applications. The host file client component provides flexibility for data that was converted from EBCDIC to ASCII. For example, you can generate data in JSON or XML format from the data that was converted.
Azure Synapse Analytics combines data integration, enterprise data warehousing, and big data analytics. This architecture uses the Azure Synapse Analytics conversion solution. It's based on Apache Spark and is a good candidate for large mainframe-dataset workload conversion. It supports a wide range of mainframe data structures and targets and requires minimal coding effort.
Microsoft Fabric is an enterprise-ready, end-to-end analytics platform. It unifies data movement, data processing, ingestion, transformation, real-time event routing, and report building. It supports these capabilities by using the following integrated services:
- Fabric Data Engineering
- Fabric Data Factory
- Fabric Data Science
- Fabric Real-Time Intelligence
- Fabric Data Warehouse
- Fabric Databases
Other tools
SQL Server Migration Assistant for Db2 automates migration from Db2 to Microsoft database services. When this tool runs on a VM, it converts Db2 database objects into SQL Server database objects and creates those objects in SQL Server.
Data Provider for Host Files is a component of Host Integration Server that uses offline, SNA, or TCP/IP connections.
- With offline connections, Data Provider reads and writes records in a local binary file.
- With SNA and TCP/IP connections, Data Provider reads and writes records stored in remote z/OS (IBM Z series mainframe) datasets or remote i5/OS (IBM AS/400 and iSeries systems) physical files. Only i5/OS systems use TCP/IP.
Azure services provide environments, tools, and processes for developing and scaling new applications in the public cloud.
Scenario details
Modern data storage solutions like the Azure data platform provide better scalability and performance than mainframe and midrange systems. By modernizing your systems, you can take advantage of these benefits. However, updating technology, infrastructure, and practices is complex. The process involves an exhaustive investigation of business and engineering activities. Data management is one consideration when you modernize your systems. You also need to look at data visualization and integration.
Successful modernizations use a data-first strategy. When you use this approach, you focus on the data rather than the new system. Data management is no longer just an item on the modernization checklist. Instead, the data is the centerpiece. Coordinated, quality-oriented data solutions replace fragmented, poorly governed ones.
This solution uses Azure data platform components in a data-first approach. Specifically, the solution involves:
Object conversion. Convert object definitions from the source data store to corresponding objects in the target data store.
Data ingestion. Connect to the source data store and extract data.
Data transformation. Transform extracted data into appropriate target data store structures.
Data storage. Load data from the source data store to the target data store, both initially and continually.
Potential use cases
Organizations that use mainframe and midrange systems can benefit from this solution, especially when they want to:
Modernize mission-critical workloads.
Acquire business intelligence to improve operations and gain a competitive advantage.
Remove the high costs and rigidity that are associated with mainframe and midrange data stores.
Considerations
These considerations implement the pillars of the Azure Well-Architected Framework, which is a set of guiding tenets that you can use to improve the quality of a workload. For more information, see Well-Architected Framework.
Security
Security provides assurances against deliberate attacks and the misuse of your valuable data and systems. For more information, see Design review checklist for Security.
Be aware of the differences between on-premises client identities and client identities in Azure. You need to compensate for any differences.
Use managed identities for component-to-component data flows, as shown in the sketch after these recommendations.
When you use Data Provider for Host Files to convert data, follow the recommendations in Data Providers for Host Files security and protection.
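As a minimal sketch of the managed identity recommendation, the following Python code authenticates to Blob Storage with DefaultAzureCredential from the azure-identity package. When the code runs on Azure compute that has a managed identity, the credential resolves to that identity, so no secrets appear in code or configuration. The account URL and container name are placeholder assumptions.

```python
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

# On an Azure VM or other Azure compute, DefaultAzureCredential resolves
# to the resource's managed identity; no connection keys are stored.
credential = DefaultAzureCredential()

service = BlobServiceClient(
    account_url="https://mystorageacct.blob.core.windows.net",  # placeholder
    credential=credential,
)

for blob in service.get_container_client("converted-datasets").list_blobs():
    print(blob.name)
```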
Cost Optimization
Cost Optimization focuses on ways to reduce unnecessary expenses and improve operational efficiencies. For more information, see Design review checklist for Cost Optimization.
SQL Server Migration Assistant is a free, supported tool that simplifies database migration from Db2 to SQL Server, SQL Database, and SQL Managed Instance. SQL Server Migration Assistant automates all aspects of migration, including migration assessment analysis, schema and SQL statement conversion, and data migration.
The Azure Synapse Analytics Spark-based solution is built from open-source libraries. It eliminates the financial burden of licensing conversion tools.
Use the Azure pricing calculator to estimate the cost of implementing this solution.
Performance Efficiency
Performance Efficiency refers to your workload's ability to scale to meet user demands efficiently. For more information, see Design review checklist for Performance Efficiency.
The key pillars of Performance Efficiency are performance management, capacity planning, scalability, and choosing an appropriate performance pattern.
You can scale out the self-hosted IR by associating the logical instance with multiple on-premises machines in active-active mode.
Use SQL Database to dynamically scale your databases. The serverless compute tier automatically scales compute resources on demand. Elastic pools allow databases to share resources in a pool, but pools must be scaled manually.
When you use the Data Provider for Host Files client to convert data, turn on connection pooling to reduce the connection startup time. When you use Azure Data Factory to extract data, tune the performance of the copy activity.
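As an illustration of copy activity tuning, the following sketch uses the azure-mgmt-datafactory Python models to define a copy activity with explicit parallelism settings. The dataset names and query are placeholder assumptions; deploying the activity also requires a factory, linked services, and datasets, which are omitted here.

```python
from azure.mgmt.datafactory.models import (
    AzureSqlSink,
    CopyActivity,
    DatasetReference,
    Db2Source,
)

# Dataset reference names are placeholders for illustration.
copy_db2_to_sql = CopyActivity(
    name="CopyDb2ToAzureSql",
    inputs=[DatasetReference(reference_name="Db2CustomerDataset")],
    outputs=[DatasetReference(reference_name="AzureSqlCustomerDataset")],
    source=Db2Source(query="SELECT * FROM SCHEMA1.CUSTOMER"),
    sink=AzureSqlSink(),
    data_integration_units=8,   # raise DIUs for large copies
    parallel_copies=4,          # read source partitions in parallel
)
```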
Contributors
Microsoft maintains this article. It was originally written by the following contributors.
Principal author:
- Ashish Khandelwal | Principal Engineering Architect Manager
Other contributors:
- Nithish Aruldoss | Engineering Architect
Next steps
Review the Azure Database Migration Guides. Contact Azure Data Engineering - Mainframe & Midrange Modernization for more information.
See the following articles:
- IBM workloads on Azure
- Mainframe rehosting on Azure VMs
- Mainframe workloads supported on Azure
- Move mainframe compute to Azure