DR (Disaster Recovery)
This feature is available only as an experimental capability in YMatrix 6.0.0.
To meet the requirements of diverse customers and business scenarios, YMatrix 6.X introduces Disaster Recovery (DR) capabilities to address high availability needs for business data.
A DR cluster, or Disaster Recovery cluster, is designed to establish a disaster-tolerant environment that ensures business continuity in the event of a disaster.
A DR cluster is typically a secondary environment independent of the primary production environment. It stores backup data, runs standby systems, and provides disaster recovery services.
The primary objective of a DR cluster is to maintain real-time, full-data and configuration replication from the primary cluster, enabling fast and reliable business recovery in the event of a disaster or system failure. When the primary system fails, the DR cluster takes over operations, restoring business functionality in minimal time while minimizing data loss and downtime.
Key capabilities of a DR cluster include:
| Feature | Description |
|---|---|
| Data Backup and Replication | Data from the primary cluster is backed up to the DR cluster either periodically or in real time, ensuring data safety and integrity. Backup methods include data replication, offline backups, snapshots, incremental backups, and redundant arrays for data transfer and storage. |
| Disaster Recovery | The DR cluster must have a detailed disaster recovery plan, including emergency response procedures, data restoration processes, system startup sequences, and network reconnection steps. This ensures recovery operations are fast and well-organized when a disaster occurs. |
| Redundancy and High Availability | The DR cluster typically employs redundant and highly available designs, including multiple backup servers, storage devices, and network connections. This enables seamless failover from the primary cluster to the backup system, ensuring reliable service continuity. |
| Monitoring and Testing | The DR cluster requires regular monitoring and testing to verify backup data integrity, availability of backup systems, and feasibility of recovery procedures. This helps identify and resolve potential issues early, improving the reliability and availability of the DR cluster. |
Known limitations:
| Limitation | Description |
|---|---|
| Comprehensive Project Requirements | Beyond YMatrix software functionality, DR implementation involves infrastructure, security standards, network equipment, initial and operational costs, and DR objectives such as RTO (Recovery Time Objective) and RPO (Recovery Point Objective). Standardized technical specifications are required, along with strong coordination among all parties involved. |
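The RTO and RPO objectives mentioned above can be made concrete with a small back-of-the-envelope calculation. The interval and step durations below are illustrative assumptions, not YMatrix defaults:

```python
# RPO (Recovery Point Objective): the maximum acceptable data loss,
# bounded by how often data reaches the DR cluster.
backup_interval_min = 5                 # assumption: incremental backup every 5 minutes
worst_case_data_loss_min = backup_interval_min  # failure right before the next backup

# RTO (Recovery Time Objective): the maximum acceptable downtime,
# roughly the sum of the manual recovery steps.
detect_min = 10     # assumption: time to detect the failure and decide to fail over
promote_min = 15    # assumption: time to promote the standby to primary
redirect_min = 5    # assumption: time to redirect application traffic
rto_min = detect_min + promote_min + redirect_min

print(worst_case_data_loss_min, rto_min)  # 5 30
```

Tightening either objective (for example, moving from periodic incremental backups to real-time replication) changes the cost and infrastructure requirements noted in the table.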
In YMatrix, DR clusters rely on internal replication processes to form a complete local or remote disaster recovery framework.
As an example topology, we build two independent secondary environments for the primary production center A: a local standby center B and a remote standby center C. Each secondary environment maintains a complete redundant copy of the data.
Due to its proximity, local standby center B can be directly connected to production center A via a dedicated enterprise line provided by a carrier (this data transfer method is illustrative only; in practice, direct connection, temporary media, or object storage can be selected as needed). This ensures fast data backup. However, direct connection has a limitation: if center B fails, redundant data accumulates in cluster A, potentially degrading performance or even blocking transactions on the source cluster, rendering it inoperable.
For remote standby center C, which is geographically distant, using temporary media for data transfer may be a better option. The temporary media is typically deployed at center A, center C, or a midpoint between them, and uses systems such as FTP file storage or Kafka message streaming to buffer data. This approach ensures that if center C fails, center A is not impacted by blocked data transmission, avoiding performance degradation or broader disruptions.
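The trade-off between a direct connection and a buffered relay can be sketched with in-memory queues. The queue sizes and event stream below are illustrative assumptions, not YMatrix internals; the point is only that a small in-flight window stalls the primary when the standby is down, while a large intermediate buffer absorbs the backlog:

```python
import queue

def replicate(events, link, consumer_up):
    """Push replication events from the primary into a transfer link.

    Returns how many events the primary sent without blocking
    (block=False models the primary refusing to stall its own
    transactions while the link is full).
    """
    sent = 0
    for ev in events:
        try:
            link.put(ev, block=False)
            sent += 1
        except queue.Full:
            break  # the primary would start to back up here
    if consumer_up:
        while not link.empty():  # a healthy standby drains the link
            link.get()
    return sent

events = list(range(100))

# Direct link to standby B: small in-flight window, no buffer.
# With B down, the link fills after a few events and replication stalls.
direct_link = queue.Queue(maxsize=5)
stalled = replicate(events, direct_link, consumer_up=False)

# Relay via temporary media (an FTP- or Kafka-style buffer) for remote
# center C: the buffer absorbs the backlog, so the primary keeps going
# even while C is unreachable.
buffered_link = queue.Queue(maxsize=1000)
buffered = replicate(events, buffered_link, consumer_up=False)

print(stalled, buffered)  # 5 100
```

This is why the buffered approach shields center A from a failure of center C, at the cost of deploying and operating the intermediate media.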
All internal processes associated with each DR cluster are themselves highly available.
Both DR clusters (B and C) are read-only and do not support write operations. If cluster A becomes unavailable, manual intervention is required to promote either B or C to become the new primary cluster.
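The read-only standby and manual promotion behavior can be illustrated with a conceptual model. The `Cluster` class and `promote` function below are hypothetical illustrations of the roles involved, not a YMatrix API:

```python
class Cluster:
    """Conceptual primary/standby role model (not a YMatrix API)."""

    def __init__(self, name, primary=False):
        self.name = name
        self.primary = primary
        self.rows = []

    def write(self, row):
        # Standby clusters are read-only: writes are rejected.
        if not self.primary:
            raise PermissionError(f"{self.name} is a read-only standby")
        self.rows.append(row)

    def replicate_from(self, source):
        # The standby copies the primary's data; reads remain allowed.
        self.rows = list(source.rows)

def promote(standby):
    """Manual intervention: an operator promotes a standby to primary."""
    standby.primary = True
    return standby

a = Cluster("A", primary=True)
b = Cluster("B")
a.write("order-1")
b.replicate_from(a)

# Writes against the DR cluster fail while it is a standby...
try:
    b.write("order-2")
except PermissionError:
    pass

# ...until an operator promotes B after A becomes unavailable.
promote(b)
b.write("order-2")
print(b.rows)  # ['order-1', 'order-2']
```

The explicit `promote` step mirrors the manual intervention required in YMatrix: neither B nor C takes over automatically when A fails.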