Fault Recovery

YMatrix is a highly available distributed database system that supports fault recovery when nodes fail. High availability relies on redundant deployment: the Master node must have a Standby node as backup; for data nodes (Segments), each Primary node must have a corresponding Mirror node.

The following diagram illustrates a typical high-availability deployment:

[Diagram: typical high-availability (HA) deployment]

When a node fails in the cluster, you can check the node status via the graphical interface (MatrixUI). In this example, the Master is mdw, the Standby is smdw, and the Segments are sdw1 and sdw2, each with its own Mirror.
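
Node status can also be checked from the command line. The following is a minimal sketch that assumes YMatrix exposes the Greenplum-style gp_segment_configuration catalog and that psql can connect as mxadmin; the table and column names are assumptions, so prefer MatrixUI or the official tooling if they differ in your version.

[mxadmin@mdw ~]$ psql -d postgres -c "SELECT content, role, preferred_role, status, hostname FROM gp_segment_configuration ORDER BY content, role;"

Here a status of d (down) marks a failed node and u (up) marks a healthy one, following the usual Greenplum convention.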

Deployment principles:

  1. Master and Standby are deployed on separate independent hosts.
  2. Each Segment's Primary and Mirror are deployed on different hosts.
  3. Segment Primary nodes are distributed evenly across hosts.

This deployment prevents a single-host failure from making the entire system unavailable and keeps the cluster load balanced.

The following sections describe the automatic operation and maintenance mechanism of YMatrix clusters and the solutions for various node failure scenarios.

1 Mechanism

YMatrix supports Cluster Service for automated operations. This service includes two key features: automatic failover and automatic failback (powered by the mxrecover tool). Together, they enable complete node recovery workflows.

1.1 Automatic Failover

Automatic failover refers to the mechanism where, upon detecting a node failure through etcd cluster health checks, the system automatically switches roles between primary and standby nodes. The etcd cluster is the core component of YMatrix Cluster Service, managing the state of all nodes. When any node fails, the database system automatically performs failover without manual intervention.
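
As a reference check, the health of the etcd cluster behind the Cluster Service can be verified with etcd's own CLI. This is a sketch only: it assumes etcdctl is installed on the Master host and that etcd listens on the default client port 2379, so adjust the endpoints to match your actual deployment.

[mxadmin@mdw ~]$ etcdctl --endpoints=http://mdw:2379,http://smdw:2379,http://sdw1:2379 endpoint health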

1.2 Automatic Failback

After failover completes, only the newly promoted Primary/Master remains active; there is no longer a healthy Mirror/Standby, so the cluster cannot survive another failure. Therefore, use the mxrecover tool to create a new healthy Mirror/Standby for the promoted Primary/Master.

The mxrecover tool provides the following functions:

  • Reactivates failed Mirror/Standby nodes
  • Creates new Mirrors/Standbys for nodes promoted to Primary/Master
  • Redistributes node roles to match the original configuration

Note!
For detailed usage of mxrecover, refer to the mxrecover reference documentation.

2 Node Failure Scenarios

2.1 Mirror / Standby Node Failure

When the system detects a Mirror/Standby node failure, the node status in the graphical interface changes to down.

Note!
A Mirror failure does not affect cluster availability, so the system does not automatically reactivate it.
Use the mxrecover tool to reactivate the Mirror—see below.

If the downtime was short and the amount of data on the failed node is small, consider incremental recovery first. Running mxrecover without parameters or with only -c triggers incremental recovery mode. If incremental recovery fails, perform full data replication to reactivate the node using the mxrecover -F command.
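
In shell form, the commands described above are as follows (see the mxrecover reference for the exact semantics of each flag):

[mxadmin@mdw ~]$ mxrecover        # no parameters: incremental recovery
[mxadmin@mdw ~]$ mxrecover -c     # also triggers incremental recovery mode
[mxadmin@mdw ~]$ mxrecover -F     # full data replication if incremental recovery fails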

2.2 Primary / Master Node Failure

When a Primary or Master node fails, the system automatically promotes the corresponding Mirror or Standby.

2.2.1 Reactivating Failed Mirror / Standby

After the system promotes a Mirror/Standby, run mxrecover to generate a new Mirror/Standby for the current Primary/Master and synchronize data incrementally or fully to restore the failed node. Simply running mxrecover reactivates the failed Mirror/Standby with incremental recovery. As mentioned above, use mxrecover -F to force full recovery if needed.

[mxadmin@mdw ~]$ mxrecover
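
If incremental recovery fails, force a full recovery instead, as noted above:

[mxadmin@mdw ~]$ mxrecover -F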

2.2.2 Node Role Redistribution

Although mxrecover creates a new Mirror for the promoted Primary, the resulting role distribution may be uneven: one host (sdw2 in this example) now runs two Primary nodes. This unbalances resource usage and places a heavier load on sdw2.

To redistribute roles, run:

[mxadmin@mdw ~]$ mxrecover -r

After executing mxrecover -r, verify the updated configuration in the cluster management page of the graphical interface.
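
As an alternative to the graphical interface, the role distribution can also be checked from the command line. The query below is a sketch that assumes, as above, the Greenplum-style gp_segment_configuration catalog is available in YMatrix.

[mxadmin@mdw ~]$ psql -d postgres -c "SELECT content, hostname, role, preferred_role FROM gp_segment_configuration WHERE role <> preferred_role;"

An empty result means every node is back in its originally configured role.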

3 Impact of Master Failover on Components and Recommended Actions

After a Master failure, the cluster can be in one of two states:

  • Cluster failure: The graphical interface shows the cluster as failed. The cluster is completely unavailable and cannot be read from or written to.
  • Cluster anomaly: The interface shows an anomaly. Some instances are unhealthy, but the cluster remains functional for read/write operations.

The sections below detail the impact of a Master failure on each component and the recommended actions for each scenario.

3.1 Graphical Interface (MatrixUI)

  1. Cluster Anomaly

This indicates that after Master failure, failover has occurred and Standby now manages the cluster.

In this case, access the graphical interface on the Standby node. The default address is http://<standbyIP>:8240.
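
To quickly confirm that the Standby's interface is reachable before logging in, you can probe the port from any host that can reach it (assuming curl is available; replace <standbyIP> with the Standby's actual address):

[mxadmin@mdw ~]$ curl -I http://<standbyIP>:8240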

You can log in using either the mxadmin database password or the /etc/matrixdb5/auth.conf superuser password on the Standby node.

All functions remain fully usable after login. On the left side of the interface, red indicates a failed node and yellow indicates a node that has completed failover.

  2. Cluster Failure

This means the entire cluster is unavailable. You may still access the MatrixUI via the Master node to view status, but database query functionality will be limited.

Note!
If your cluster did not initially configure a Standby but later added one using the mxinitstandby tool, the behavior matches that of a pre-configured Standby.

Note!
Always configure a Standby in production environments.

3.2 MatrixGate

  1. Cluster Anomaly

Failover has occurred—the Standby now manages the cluster.

Two scenarios apply:

  • If Standby was configured before MatrixGate started: MatrixGate automatically detects the Standby connection information at startup—no manual action required.
  • If Standby was added after MatrixGate started: Since MatrixGate only checks for Standby at startup, newly added Standbys won't trigger automatic switching. We recommend configuring Standby before deploying MatrixGate, or restarting MatrixGate after adding a Standby.

  2. Cluster Failure

MatrixGate resides on the Master host, which has failed and rendered the cluster unusable.

In this case, the MatrixGate process is considered dead or network-isolated along with the host.

3.3 Monitoring (Deployed via SELECT mxmgr_init_local())

  1. Cluster Anomaly

MatrixGate automatically redirects monitoring data insertion to the Standby. Monitoring continues uninterrupted—no manual intervention needed.

To view Grafana dashboards, manually update the data source to point to the Standby’s address.

  2. Cluster Failure

In this case, the MatrixGate instance used for monitoring has failed, so no new monitoring data will be generated.