Capability Overview

In YMatrix 6.0.0, this capability is available only as an experimental feature.

To meet the requirements of diverse customers and business scenarios, YMatrix 6.x introduces Disaster Recovery (DR) capabilities that address the high-availability requirements of business data.


Overview

A DR cluster, or Disaster Recovery cluster, is designed to establish a disaster-tolerant environment that ensures business continuity in the event of a disaster.

  • A DR cluster is typically a secondary environment independent of the primary production environment. It stores backup data, runs standby systems, and provides disaster recovery services.

  • The primary objective of a DR cluster is to maintain real-time replication of all data and configuration from the primary cluster, enabling fast and reliable business recovery after a disaster or system failure. When the primary system fails, the DR cluster takes over operations and restores business functionality quickly, minimizing both data loss and downtime.

  • Key capabilities of a DR cluster include:

| Feature | Description |
| --- | --- |
| Data Backup and Replication | Data from the primary cluster is backed up to the DR cluster either periodically or in real time, ensuring data safety and integrity. Backup methods include data replication, offline backups, snapshots, incremental backups, and redundant arrays for data transfer and storage. |
| Disaster Recovery | The DR cluster must have a detailed disaster recovery plan, including emergency response procedures, data restoration processes, system startup sequences, and network reconnection steps. This ensures recovery operations are fast and well-organized when a disaster occurs. |
| Redundancy and High Availability | The DR cluster typically employs redundant and highly available designs, including multiple backup servers, storage devices, and network connections. This enables seamless failover from the primary cluster to the backup system, ensuring reliable service continuity. |
| Monitoring and Testing | The DR cluster requires regular monitoring and testing to verify backup data integrity, availability of backup systems, and feasibility of recovery procedures. This helps identify and resolve potential issues early, improving the reliability and availability of the DR cluster. |

  • Key limitations of a DR cluster include:

| Limitation | Description |
| --- | --- |
| Comprehensive Project Requirements | Beyond YMatrix software functionality, a DR implementation involves infrastructure, security standards, network equipment, initial and operational costs, and DR objectives such as the Recovery Time Objective (RTO) and Recovery Point Objective (RPO). It requires standardized technical specifications and close coordination among all parties involved. |
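
The RPO and RTO objectives become concrete with a little arithmetic. The Python sketch below is a simplified model, not part of YMatrix: it assumes the worst-case data loss equals the backup interval plus the transfer time (or only the replication lag for real-time replication), and that recovery time is the sum of the recovery steps. `DrObjectives`, `worst_case_data_loss`, `meets_rpo`, and `meets_rto` are hypothetical names, and every number is illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DrObjectives:
    """DR objectives agreed for a project (hypothetical model; values are illustrative)."""

    rpo_seconds: float  # Recovery Point Objective: maximum tolerable data loss, in time
    rto_seconds: float  # Recovery Time Objective: maximum tolerable time to restore service


def worst_case_data_loss(interval_seconds: float, transfer_seconds: float) -> float:
    """Worst-case window of lost writes if the primary fails at the worst moment.

    Periodic backup: a failure just before the next copy arrives loses everything
    written since the last delivered copy, i.e. one interval plus the transfer time.
    Real-time replication: pass interval_seconds=0 and the replication lag as
    transfer_seconds.
    """
    return interval_seconds + transfer_seconds


def meets_rpo(objectives: DrObjectives, interval_seconds: float, transfer_seconds: float) -> bool:
    return worst_case_data_loss(interval_seconds, transfer_seconds) <= objectives.rpo_seconds


def meets_rto(objectives: DrObjectives, step_seconds: dict[str, float]) -> bool:
    """Compare the summed duration of the recovery steps against the RTO."""
    return sum(step_seconds.values()) <= objectives.rto_seconds


if __name__ == "__main__":
    objectives = DrObjectives(rpo_seconds=60, rto_seconds=1800)

    # Hourly offline backup that takes 10 minutes to ship: up to 4200 s of writes lost.
    print(meets_rpo(objectives, interval_seconds=3600, transfer_seconds=600))  # False
    # Real-time replication with about 2 s of lag.
    print(meets_rpo(objectives, interval_seconds=0, transfer_seconds=2))  # True

    # Steps from the disaster recovery plan; the durations are made-up estimates.
    steps = {
        "emergency response": 300,
        "data restoration": 600,
        "system startup": 300,
        "network reconnection": 120,
    }
    print(meets_rto(objectives, steps))  # True (1320 s <= 1800 s)
```

The model shows why periodic offline backups alone rarely satisfy a tight RPO: the backup interval sets a floor on possible data loss, whereas real-time replication reduces it to the replication lag.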

Architecture

In YMatrix, a set of internal processes connects each DR cluster to the primary cluster, together forming a complete local or remote disaster recovery workflow.

In the following example, we build two independent secondary environments for primary production center A: local standby center B and remote standby center C. Each secondary environment maintains a complete, redundant copy of the data.

  • Because local standby center B is close to production center A, it can be connected directly to A over a dedicated enterprise line provided by a carrier, which ensures fast data backup. (This transfer method is only an example; in practice, direct connection, temporary media, or object storage can be chosen as needed.) However, direct connection has a drawback: if center B fails, data that has not yet been replicated to B accumulates on cluster A, which can degrade performance or even block transactions on the source cluster, rendering it inoperable.

  • For remote standby center C, which is geographically distant, transferring data through temporary media may be a better option. The temporary media is typically deployed at center A, at center C, or at a midpoint between them, and uses systems such as FTP file storage or Kafka message streaming to buffer data. If center C fails, pending data accumulates in the buffer rather than on center A, so A is not affected by blocked transmission and avoids performance degradation or broader disruptions.

  • All internal processes associated with each DR cluster are themselves highly available.

  • Both DR clusters (B and C) are read-only and do not support write operations. If cluster A becomes unavailable, manual intervention is required to promote either B or C to become the new primary cluster.
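
The difference between direct connection and temporary media, together with the read-only rule and manual promotion, can be illustrated with a toy model. The Python sketch below is hypothetical and does not reflect YMatrix's internal processes: `StandbyCluster`, `DirectLink`, `BufferedLink`, and `max_retained_on_a` are invented names, the external buffer is assumed to be unbounded, and "blocking transactions" is modeled simply as `send` returning `False`.

```python
from collections import deque


class StandbyCluster:
    """A DR cluster (B or C) in this toy model: read-only until promoted."""

    def __init__(self, name: str):
        self.name = name
        self.up = True          # set to False to simulate a failure of the DR center
        self.read_only = True
        self.rows = []

    def apply_replicated(self, change) -> None:
        # Replication is the only path that changes data on a read-only standby.
        self.rows.append(change)

    def write(self, row) -> None:
        if self.read_only:
            raise PermissionError(f"{self.name} is read-only; promote it first")
        self.rows.append(row)

    def promote(self) -> None:
        # Modeled as an explicit operator action: nothing calls this automatically.
        self.read_only = False


class DirectLink:
    """Direct connection (A to B): A retains each change until B has applied it."""

    def __init__(self, standby: StandbyCluster, max_retained_on_a: int):
        self.standby = standby
        self.retained_on_a = deque()  # backlog held on the primary itself
        self.max_retained_on_a = max_retained_on_a

    def send(self, change) -> bool:
        """Return False when A has no room left and would have to block the transaction."""
        if len(self.retained_on_a) >= self.max_retained_on_a:
            return False
        self.retained_on_a.append(change)
        self.flush()
        return True

    def flush(self) -> None:
        while self.standby.up and self.retained_on_a:
            self.standby.apply_replicated(self.retained_on_a.popleft())


class BufferedLink:
    """Transfer via temporary media (A to C): A only hands changes to an external buffer."""

    def __init__(self, standby: StandbyCluster):
        self.standby = standby
        self.buffer = deque()  # lives outside A, e.g. FTP storage or a Kafka topic

    def send(self, change) -> bool:
        self.buffer.append(change)  # A never waits for C
        self.flush()
        return True

    def flush(self) -> None:
        while self.standby.up and self.buffer:
            self.standby.apply_replicated(self.buffer.popleft())


if __name__ == "__main__":
    b, c = StandbyCluster("B"), StandbyCluster("C")
    direct, buffered = DirectLink(b, max_retained_on_a=3), BufferedLink(c)
    b.up = c.up = False  # both DR centers fail
    print([direct.send(i) for i in range(5)])    # [True, True, True, False, False]
    print([buffered.send(i) for i in range(5)])  # [True, True, True, True, True]
    c.promote()  # manual promotion before C accepts writes
    c.write("new order")
```

The design point is where the backlog lives: with `DirectLink`, an outage at B fills `retained_on_a` on the primary until `send` refuses new changes, whereas with `BufferedLink` the same outage only grows the external buffer, and C catches up by draining it once it recovers.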