IoT solution scalability, high availability, and disaster recovery

This overview introduces the key concepts around the options for scalability, high availability, and disaster recovery in an Azure IoT solution. Each section includes links to content that provides further detail and guidance.

The following diagram shows a high-level view of the components in a typical edge-based IoT solution. This article focuses on the highlighted areas, which are relevant to scalability, high availability, and disaster recovery:

Diagram that shows the high-level IoT edge-based solution architecture highlighting scalability, high availability, and disaster recovery.

Scalability

An IoT solution might need to support millions of connected assets and devices. You need to ensure that the components in your solution can scale to meet that demand.

Deploy Azure IoT Operations on a multi-node cluster to ensure that you can handle increased traffic or workload demands. When Azure IoT Operations runs on a multi-node cluster, it can process more data and take advantage of the scalability and high-availability capabilities of Kubernetes.

You can scale the Azure IoT Operations MQTT broker horizontally by adding more frontend replicas and backend partitions. Frontend replicas accept MQTT connections from clients and distribute the message traffic across the backend partitions. Backend partitions store messages and deliver them to clients. The backend redundancy factor sets how many copies of the data the broker keeps, which provides resiliency against node failures in the cluster. To learn more, see Configure broker settings for high availability, scaling, and memory usage.
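The relationship between backend partitions, the redundancy factor, and node failures can be illustrated with a toy placement model. The Python sketch below is a hypothetical simplification, not the broker's actual placement or routing logic: it assumes that each partition keeps `redundancy_factor` copies on distinct nodes and that a client is routed to a partition by hashing its client ID. The functions `partition_for`, `place_partitions`, and `surviving_partitions` are illustrative names only.

```python
# Hypothetical toy model of partition placement; not the MQTT broker's real algorithm.
import hashlib


def partition_for(client_id: str, partitions: int) -> int:
    """Route a client to a backend partition using a stable hash of its client ID (assumed rule)."""
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


def place_partitions(partitions: int, redundancy_factor: int, nodes: list[str]) -> dict[int, list[str]]:
    """Put redundancy_factor copies of each partition on distinct nodes, round-robin."""
    if redundancy_factor > len(nodes):
        raise ValueError("each copy of a partition needs its own node")
    return {
        # Offset the starting node per partition so copies spread across the cluster.
        p: [nodes[(p + i) % len(nodes)] for i in range(redundancy_factor)]
        for p in range(partitions)
    }


def surviving_partitions(placement: dict[int, list[str]], failed: set[str]) -> set[int]:
    """Partitions that still have at least one copy on a healthy node."""
    return {p for p, hosts in placement.items() if any(h not in failed for h in hosts)}


nodes = ["node-a", "node-b", "node-c"]
placement = place_partitions(partitions=2, redundancy_factor=2, nodes=nodes)
print(placement)  # {0: ['node-a', 'node-b'], 1: ['node-b', 'node-c']}
for node in nodes:
    # With two copies per partition, any single node failure leaves both partitions available.
    print(node, "down ->", sorted(surviving_partitions(placement, {node})))
```

In this toy model, raising the redundancy factor to 3 on the same three nodes would let every partition survive any two node failures, at the cost of storing three copies of each message.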

Azure Device Registry is a backend service that enables management of assets from both the cloud and the edge. Device Registry projects assets defined in your edge environment as Azure resources in the cloud. It provides a single unified registry so that all apps and services that interact with your assets can connect to a single source. Device Registry also synchronizes the assets in the cloud with the corresponding assets stored as custom resources in Kubernetes on the edge, which lets you scale your solution to millions of connected assets.

You can scale a data flow profile to adjust the number of instances that run its data flows. Increasing the instance count can improve data flow throughput because multiple clients process the data. When data flows send data to cloud services that enforce per-client rate limits, increasing the instance count can help you stay within those limits. Scaling can also improve the resiliency of the data flows by providing redundancy in case of failures. To learn more, see Scaling data flow profiles.
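As a rough worked example of the rate-limit reasoning, assume (hypothetically) that a cloud endpoint allows each client a fixed number of messages per second, that each data flow instance acts as one client, and that the load spreads evenly across instances. Under those assumptions, the minimum instance count is the target throughput divided by the per-client limit, rounded up. Real limits depend on the destination service and on how data flows distribute work, so treat this Python sketch as back-of-the-envelope arithmetic only; the function names are illustrative.

```python
import math


def min_instance_count(target_msgs_per_sec: float, per_client_limit: float) -> int:
    """Smallest instance count whose combined per-client limits cover the target rate.

    Hypothetical model: one client per instance and an even spread of messages.
    """
    if target_msgs_per_sec < 0 or per_client_limit <= 0:
        raise ValueError("target must be non-negative and the per-client limit positive")
    return max(1, math.ceil(target_msgs_per_sec / per_client_limit))


def per_instance_rate(target_msgs_per_sec: float, instances: int) -> float:
    """Messages per second each instance handles if the load is spread evenly."""
    return target_msgs_per_sec / instances


# Example: 2,500 msg/s against an assumed limit of 1,000 msg/s per client.
n = min_instance_count(2500, 1000)
print(n, per_instance_rate(2500, n))  # 3 instances, roughly 833 msg/s each
```

If you also want headroom so that throughput stays within limits after an instance fails, one option is to run one instance more than this minimum.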

High availability and disaster recovery

IoT solutions are often business-critical. You need to ensure that your solution can continue to operate if a failure occurs. You also need to ensure that you can recover your solution following a disaster.

Azure IoT Operations features an enterprise-grade, standards-compliant MQTT broker. The broker is scalable, highly available, and Kubernetes-native. It provides the messaging plane for IoT Operations, enables bidirectional communication between the edge and the cloud, and powers event-driven applications at the edge. To ensure zero data loss and high availability during deployment upgrades, the MQTT broker applies rolling updates across its pods.
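The idea behind a rolling update can be sketched as a small simulation: replicas are replaced in small batches, so most of them keep serving traffic while each batch restarts. This Python sketch is a generic, hypothetical model of the rolling-update pattern, not the MQTT broker's actual upgrade procedure. The `max_unavailable` parameter is loosely modeled on the Kubernetes `maxUnavailable` setting, and `rolling_update` is an illustrative name.

```python
# Hypothetical, generic model of a rolling update; not the MQTT broker's actual upgrade logic.


def rolling_update(versions: list[str], new_version: str, max_unavailable: int = 1) -> tuple[list[str], list[int]]:
    """Upgrade replicas in batches of at most max_unavailable.

    Returns the upgraded version list and, for each batch, how many replicas kept
    serving traffic while that batch restarted.
    """
    if max_unavailable < 1:
        raise ValueError("max_unavailable must be at least 1")
    updated = list(versions)  # Work on a copy so the caller's list is untouched.
    serving_per_batch = []
    for start in range(0, len(updated), max_unavailable):
        end = min(start + max_unavailable, len(updated))
        # Replicas in [start, end) are out of service; the rest keep handling traffic.
        serving_per_batch.append(len(updated) - (end - start))
        for i in range(start, end):
            updated[i] = new_version
        # A real rollout would wait here until the new replicas pass readiness checks.
    return updated, serving_per_batch


pods, serving = rolling_update(["v1", "v1", "v1"], "v2")
print(pods, serving)  # ['v2', 'v2', 'v2'] [2, 2, 2]
```

With the default of one replica at a time, a three-replica deployment keeps two replicas serving throughout the rollout. The readiness wait noted in the comment is what turns this bookkeeping into an actual availability guarantee in a real cluster.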

The state store is a distributed storage system deployed as part of Azure IoT Operations. Applications can use the state store to get, set, and delete key-value pairs without installing additional services such as Redis. The state store also versions the data and provides primitives for building distributed locks, which are useful for highly available applications. To learn more, see Persisting data in the state store.
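To make the get, set, delete, versioning, and locking ideas concrete, the following Python sketch models them in memory. It's a hypothetical, single-process toy: `ToyStateStore`, its method names, the `lock:` key prefix, the integer version counter, and the use of the write version as a fencing token are all assumptions for illustration, not the state store's protocol or the Azure IoT Operations SDKs. The real state store is distributed, which this sketch doesn't attempt to model.

```python
# Hypothetical single-process toy; not the Azure IoT Operations state store API or SDKs.
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Entry:
    value: bytes
    version: int
    expires_at: Optional[float]  # Clock deadline, or None if the key never expires.


class ToyStateStore:
    """In-memory stand-in for a versioned key-value store with lease-style locks."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._data: dict[str, Entry] = {}
        self._version = 0  # Store-wide counter: every write gets a strictly higher version.
        self._clock = clock

    def _live(self, key: str) -> Optional[Entry]:
        entry = self._data.get(key)
        if entry is not None and entry.expires_at is not None and entry.expires_at <= self._clock():
            del self._data[key]  # Lazily drop keys whose lease has expired.
            return None
        return entry

    def get(self, key: str) -> Optional[Entry]:
        return self._live(key)

    def set(self, key: str, value: bytes, only_if_absent: bool = False,
            ttl: Optional[float] = None) -> Optional[int]:
        """Write a value and return its new version, or None if only_if_absent blocked it."""
        if only_if_absent and self._live(key) is not None:
            return None
        self._version += 1
        expires_at = self._clock() + ttl if ttl is not None else None
        self._data[key] = Entry(value, self._version, expires_at)
        return self._version

    def delete(self, key: str) -> bool:
        if self._live(key) is None:
            return False
        del self._data[key]
        return True

    def acquire_lock(self, name: str, holder: str, ttl: float) -> Optional[int]:
        """Take the lock if it is free or expired; the returned version acts as a fencing token."""
        return self.set(f"lock:{name}", holder.encode(), only_if_absent=True, ttl=ttl)

    def release_lock(self, name: str, holder: str) -> bool:
        """Release the lock only if this holder still owns it."""
        entry = self._live(f"lock:{name}")
        if entry is None or entry.value != holder.encode():
            return False
        return self.delete(f"lock:{name}")


store = ToyStateStore()
v1 = store.set("line1/temperature", b"21.5")
v2 = store.set("line1/temperature", b"22.0")
print(v1, v2, store.get("line1/temperature").value)  # 1 2 b'22.0'
token = store.acquire_lock("pump", holder="app-a", ttl=10.0)
print(token, store.acquire_lock("pump", holder="app-b", ttl=10.0))  # 3 None
```

The lock in this sketch is a lease: if a holder crashes without releasing it, the key expires after `ttl` seconds and another application can take over, which is the property that makes such locks useful for failover. See Persisting data in the state store for how the actual service exposes versions and locking.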

On multi-node clusters with at least three nodes, you can enable fault tolerance for storage with Azure Container Storage enabled by Azure Arc when you deploy Azure IoT Operations.

Dapr is offered as part of the MQTT broker. It abstracts away the details of MQTT session management, message QoS and acknowledgment, and built-in key-value stores, which makes it a practical choice for developing highly available applications.

The Azure IoT Operations SDKs (preview) are a suite of tools and libraries across multiple languages designed to aid the development of highly available applications for Azure IoT Operations.

For information on high availability across availability zones and regions for Azure Device Registry, see Reliability in Azure Device Registry.