Redis Enterprise cluster connection failure: how to fix this issue, or how to restart it?

xia chen 0 Reputation points
2025-03-24T18:47:31.15+00:00

Our web app has been unable to connect to Redis for the past 2 hours. This is a production environment, and this Redis instance has been deployed for over a month. How do I restart the Redis cluster?

Azure Cache for Redis
An Azure service that provides access to a secure, dedicated Redis cache, managed by Microsoft.

1 answer

  1. Prasad Chaganti 415 Reputation points Microsoft External Staff
    2025-03-24T19:22:42.6133333+00:00

    Hi xia chen,

    This was an intermittent issue that resolved on its own after some time.
    Here are some potential causes for this kind of temporary issue:

    Temporary Authentication or Connectivity Issues: Sometimes, short-lived connectivity or cache server issues can cause Redis to reject valid credentials temporarily. This can happen due to server maintenance, network disruptions, or load balancing issues within the cluster.

    Redis Cluster Failover or Maintenance: Redis clusters automatically perform failover in case of issues with a primary node, which may briefly cause connection problems. During failover or a maintenance period, the cluster may appear unreachable or return errors until it stabilizes.
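
Because a failover window is usually short, a client that retries with backoff can often ride it out instead of surfacing an error to users. The following is a minimal sketch, not an official recommendation: `call_with_retry` and its parameters are hypothetical names, and the default exception types are Python built-ins chosen for illustration.

```python
import random
import time


def call_with_retry(operation, attempts=5, base_delay=0.2, max_delay=5.0,
                    retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Run operation(), retrying transient connection errors with backoff.

    All names and defaults here are illustrative assumptions.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Exponential backoff capped at max_delay, with full jitter so
            # many clients do not all reconnect at the same instant.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))
```

With the redis-py client, the built-in exception types would not match, so one would pass `retry_on=(redis.exceptions.ConnectionError, redis.exceptions.TimeoutError)` and wrap a call such as `lambda: client.ping()`.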

    Configuration Update Delay: If the authentication credentials (username/password) were recently changed or reconfigured, it may take a short time for the update to propagate across all nodes in a Redis Enterprise cluster, causing temporary errors.

    High Availability Service Commitment: Redis Enterprise typically advertises very high availability (up to 99.999%). That equates to roughly 5.26 minutes of potential downtime per year. So, while it is highly reliable, brief issues can still occur, especially during automatic failover or load balancing.

    Regarding the 99.999% availability, this metric typically refers to the overall uptime of the service, not necessarily the uninterrupted access to specific features like authentication. While such issues are rare, they can occasionally occur without significantly impacting the overall availability metric.
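
The figure of roughly 5.26 minutes per year follows directly from the availability percentage. As a quick check, assuming a year of 365.25 days (the function name is hypothetical):

```python
def downtime_minutes_per_year(availability_percent, days_per_year=365.25):
    """Return the downtime budget, in minutes per year, for a given availability."""
    minutes_per_year = days_per_year * 24 * 60  # 525,960 minutes
    return minutes_per_year * (1 - availability_percent / 100)


print(round(downtime_minutes_per_year(99.999), 2))  # about 5.26
```

By the same arithmetic, 99.9% availability allows about 526 minutes (nearly 9 hours) per year, so a 2-hour outage would already exceed a 99.999% budget many times over.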

    To ensure this doesn’t affect your production services, consider implementing the following:

    • Monitoring and Alerts: Set up monitoring and alerts to quickly detect and respond to such issues.
    • Redundancy: Use multiple Redis instances or clusters to provide redundancy.
    • Failover Mechanisms: Implement failover mechanisms to switch to a backup instance if the primary one encounters issues.
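
The redundancy and failover suggestions above could be combined on the client side roughly as follows. This is only a sketch under stated assumptions: `get_with_failover` is a hypothetical helper, each entry in `clients` is assumed to expose a `get(key)` method (as redis-py clients do), and it is assumed that reading possibly stale data from a backup instance is acceptable for the application.

```python
def get_with_failover(key, clients, retry_on=(ConnectionError, TimeoutError)):
    """Try each client in order and return the first successful read.

    `clients` is an ordered list, primary first, then backups (assumed).
    """
    last_error = None
    for client in clients:
        try:
            return client.get(key)
        except retry_on as exc:
            last_error = exc  # remember the failure and try the next instance
    # Every instance failed: raise one error that chains the last cause.
    raise ConnectionError("all Redis instances are unreachable") from last_error
```

A failed read on every instance is also a natural place to emit a metric or log line, so that the monitoring and alerting mentioned above can pick it up.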

    Hope this helps. Do let us know if you have any further queries.

    If this answers your query, please click Accept Answer and select Yes for "Was this answer helpful".

