This document covers frequently asked questions about high availability in YMatrix 5.X.
The storage of etcd can be thought of as a fully replicated table, so a large number of nodes is unnecessary. In practice, the recommended configurations are 1, 3, 5, or 7 nodes. The installation and deployment process automatically chooses how many etcd instances to deploy based on the number of hosts.
Therefore, if the cluster has an even number of hosts, or more than 7 hosts, some machines will not run an etcd node. You can use the `ps` command to check whether an etcd process is running on the current host, for example:
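A minimal sketch of such a check (the etcdctl endpoint address below is an assumption and may need TLS options depending on your deployment):

```bash
# Check whether an etcd process is running on this host
ps -ef | grep -v grep | grep etcd

# Optionally, list the members of the etcd cluster from a host that runs etcd
# (endpoint address is an assumption; adjust to your deployment)
etcdctl --endpoints=http://127.0.0.1:2379 member list
```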
etcd nodes are usually placed on the Master and Standby hosts.
In etcd, the node count is set to an odd number to ensure the stability and consistency of leader election. In the Raft protocol, leader election is based on majority: a candidate becomes leader only after more than half of the nodes vote for it, and an odd number of nodes makes it easier to reach that majority. With 5 nodes, at least 3 must agree to elect the same leader; with 7 nodes, at least 4 must agree.
This configuration not only guarantees that the election result is unique, but also keeps election time as short as possible. Odd node counts also provide better fault tolerance: in the event of a failure or network anomaly, a majority of nodes can still complete the election and keep the system available and consistent. With an even number of nodes, a tie can occur, leaving the election unable to complete or its result uncertain.
First, you need to deploy monitoring for etcd. Currently, monitoring (Prometheus + Grafana) is supported for etcd instances deployed by YMatrix 5.0.
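If you want to confirm that etcd is exposing metrics that Prometheus can scrape, you can query its /metrics endpoint directly. A minimal sketch, assuming etcd listens for client traffic on 127.0.0.1:2379 (adjust the address, port, and any TLS options to your deployment):

```bash
# etcd exposes Prometheus-format metrics on its client/metrics port
curl -s http://127.0.0.1:2379/metrics | \
  grep -E '^(etcd_server_has_leader|etcd_server_leader_changes_seen_total|etcd_mvcc_db_total_size_in_bytes)'
```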
In 5.X, etcd automatically cleans up its data on a regular basis, so the data directory and memory usage stay within a relatively fixed range.
However, a node failure followed by a recovery operation will cause the etcd data to bloat slightly for a short period. As long as the etcd data directory does not keep growing beyond 1.5 GB, this is normal. It is recommended to track this through monitoring and check it regularly.
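To keep an eye on that size, you can check both the on-disk data directory and etcd's own view of its database size. A sketch, where the data directory path and the client endpoint are assumptions; substitute the values from your deployment:

```bash
# Size of the etcd data directory on disk (path is an assumption)
du -sh /path/to/etcd/data

# etcd's reported DB SIZE per endpoint (endpoint is an assumption)
etcdctl --endpoints=http://127.0.0.1:2379 endpoint status --write-out=table
```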
The user experience has not changed; the installation and deployment pages and procedures are exactly the same as before.
There is a conceptual misunderstanding here: a Segment has two dimensions, role and state.
In the current version, the Master auto-failover feature refers to automatic switching of node roles: after the Master goes offline, the Standby can automatically take over as Master. This action is called Promote or Failover.
Once a Segment's state changes from `Up` to `Down`, it must be brought back to `Up` through a manual node recovery operation (`mxrecover`).
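For example, you might first identify the Down segments and then recover them with `mxrecover`. This is only a sketch: the catalog query assumes YMatrix keeps the Greenplum-style gp_segment_configuration view (status 'd' means Down), and running `mxrecover` with no arguments assumes its default recovery behavior; see the mxrecover documentation for the options suited to your situation:

```bash
# List segments currently marked Down ('d'); assumes the Greenplum-style catalog view
psql -d postgres -c "SELECT content, role, status, hostname FROM gp_segment_configuration WHERE status = 'd';"

# Recover the Down segments; run as the cluster administrator on the Master host
mxrecover
```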
If the `postmaster` process crashes but the host and network are both normal, can the Master role quickly switch over to the Standby? Yes.
The survival of the postgres processes depends on keeping their lease alive in the etcd cluster. When the etcd service itself fails (more than half of the etcd nodes are down or unreachable), the postgres processes cannot keep their leases alive, and downtime is inevitable. Therefore, deploy etcd monitoring rigorously and pay close attention to its health. Items to monitor include, but are not limited to: remaining disk space, disk I/O, network connectivity, and the survival of the supervisor process on each host.
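A minimal sketch of ad-hoc checks covering those items, assuming a standard Linux host with the sysstat package installed; the data directory path, peer hostname, and etcd endpoint are assumptions to be replaced with values from your deployment:

```bash
# Remaining disk space on the filesystem holding the etcd data directory (path is an assumption)
df -h /path/to/etcd/data

# Disk I/O load (requires the sysstat package)
iostat -x 1 3

# Network connectivity between etcd hosts (replace with a real peer hostname)
ping -c 3 etcd-peer-host

# Survival of the supervisor process on this host
ps -ef | grep -v grep | grep supervisor

# Overall etcd health (endpoint is an assumption)
etcdctl --endpoints=http://127.0.0.1:2379 endpoint health
```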