Hello Youvashri,
Welcome to the Microsoft Q&A and thank you for posting your questions here.
I understand that your Azure Database for MySQL Flexible Server (v8.0) frequently fails over to the standby instance and reports a health check failure.
Based on your description, there are a few things you can do to resolve the issue:
- Check InnoDB crash recovery behavior. MySQL 8.0 improves crash recovery but may still need tuning. Verify that innodb_force_recovery is not set incorrectly - https://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html MySQL 8.0 also uses atomic DDL, which can lengthen recovery if a DDL statement is interrupted, so check the logs for DDL transaction errors - https://dev.mysql.com/doc/refman/8.0/en/atomic-ddl.html
- Monitor replication. Use SHOW REPLICA STATUS and check Seconds_Behind_Source. Azure HA uses replication, and lag above 5 seconds may trigger failovers - https://learn.microsoft.com/en-us/azure/mysql/flexible-server/concepts-high-availability#monitoring Then use Azure Metrics to monitor IOPS and storage latency, because high latency can cause health check timeouts - https://learn.microsoft.com/en-us/azure/mysql/flexible-server/concepts-monitoring
- Investigate unclean shutdowns. Setting innodb_fast_shutdown = 0 forces a slow shutdown (a full purge and change buffer merge before the server stops), which can help prevent crash recovery loops - https://dev.mysql.com/doc/refman/8.0/en/innodb-parameters.html#sysvar_innodb_fast_shutdown
- Check for OOM (Out of Memory) events, VM kill events, and forced instance terminations using Azure Flexible Server monitoring - https://learn.microsoft.com/en-us/azure/mysql/flexible-server/concepts-monitoring Note that sys.dm_os_ring_buffers (https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-os-ring-buffers-transact-sql) is a SQL Server dynamic management view, covered under Azure SQL monitoring (https://learn.microsoft.com/en-us/azure/azure-sql/database/monitoring-tuning). It does not exist in MySQL, so it cannot be queried on a Flexible Server. Instead, run the following MySQL statements one after the other to inspect open transactions and the InnoDB state:
    SELECT * FROM information_schema.innodb_trx;
    SHOW ENGINE INNODB STATUS;
- Manually trigger a failover via the Azure Portal and measure the recovery time. If it matches the observed 2–3 minutes, the issue is likely in Azure-side HA orchestration - https://learn.microsoft.com/en-us/azure/mysql/flexible-server/how-to-manage-high-availability-portal Then make sure the MySQL 8.0 server parameters align with Azure's HA requirements (e.g., the server_id and read_only settings) - https://learn.microsoft.com/en-us/azure/mysql/flexible-server/concepts-server-parameters
- You can escalate to Microsoft with diagnostic logs via Priority Customer Support (PCS).
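
If you want to automate two of the checks above, here is a minimal Python sketch. It is only an illustration under some assumptions: assess_replica_status expects one row of SHOW REPLICA STATUS output that you have already fetched as a dictionary with whatever MySQL client you use, and measure_outage expects a probe function that you supply yourself (for example, one that tries to connect and run SELECT 1). Both function names and the probe are hypothetical helpers, not part of any Azure or MySQL tooling. The 5-second threshold is the value mentioned above and may need adjusting for your workload.

```python
import time
from typing import Callable, List, Mapping, Optional


def assess_replica_status(row: Mapping[str, object],
                          max_lag_seconds: int = 5) -> List[str]:
    """Return warnings for one SHOW REPLICA STATUS row (empty list = healthy)."""
    warnings = []
    # Both replication threads report "Yes" when they are running.
    for thread in ("Replica_IO_Running", "Replica_SQL_Running"):
        if row.get(thread) != "Yes":
            warnings.append(f"{thread} is {row.get(thread)!r}, expected 'Yes'")
    # Seconds_Behind_Source is NULL (None) when the lag cannot be determined.
    lag = row.get("Seconds_Behind_Source")
    if lag is None:
        warnings.append("Seconds_Behind_Source is NULL (lag unknown)")
    elif int(lag) > max_lag_seconds:
        warnings.append(f"replication lag {int(lag)}s exceeds {max_lag_seconds}s")
    return warnings


def measure_outage(probe: Callable[[], bool],
                   interval: float = 1.0,
                   timeout: float = 600.0,
                   clock: Callable[[], float] = time.monotonic,
                   sleep: Callable[[float], None] = time.sleep) -> Optional[float]:
    """Poll probe() until it fails and then succeeds again.

    Returns the observed outage in seconds, or None if no complete
    outage (down, then up again) was seen before the timeout.
    """
    start = clock()
    down_since = None
    while clock() - start < timeout:
        ok = probe()
        now = clock()
        if not ok and down_since is None:
            down_since = now          # first failed probe: outage begins
        elif ok and down_since is not None:
            return now - down_since   # first success after a failure
        sleep(interval)
    return None
```

To use it, start measure_outage with your own probe just before you trigger the forced failover in the portal, then compare the returned duration with the 2–3 minutes you observed. The result is only as precise as the polling interval.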
I hope this is helpful! Do not hesitate to let me know if you have any other questions or clarifications.
Please don't forget to close out the thread by upvoting and accepting this as the answer if it was helpful.