Quick Start
Cluster Deployment
Data Model
Data Writing
Data Query
SQL Reference
Maintenance and Monitoring
Tool Guide
Troubleshooting
FAQ
The Overview panel displays the overall operational status of the cluster, including:
| Parameter | Description | Recommended Alert Threshold |
|---|---|---|
| Cluster Status | Node status of the cluster, including: 0: Normal 1: No Standby 2: No Mirror 10: Imbalanced Distribution (after a node recovery, primary-mirror roles have not been rebalanced) 11: Primary-Mirror Out of Sync (some mirror nodes are out of sync with their primaries) 12: Master Only (only the Master node is running, typically used during diagnostics) 20: Segment Down (unavailable Segment nodes exist, cluster is unusable) |
Segment down is a critical event; an alert is required |
| Uptime | MatrixDB uptime since last start and the operating system uptime of the master host | |
| Version | MatrixDB version | |
| Connection Status | Database connection statistics, including: total connections, blocked queries, idle connections, idle in transaction | |
| Slow Queries | Number of queries currently running longer than 1 day | Alert if greater than 0 |
| Transactions | Statistics on transaction commits and rollbacks | Set alert threshold for rollbacks |
| Disk Usage | Disk usage and available space on master and segment nodes | Recommended to set alerts directly in node_exporter |
| Node Status | Status of each node, including: 0: UP (normal) 10: Switched (role switch occurred; rebalancing needed) 11: Resync (synchronizing primary and mirror) 20: Down (node down) |
Alert on values 11 and 20 |
The Database Performance panel shows key database performance metrics:
| Parameter | Description | Recommended Alert Threshold |
|---|---|---|
| Page Hit Ratio | Ratio of buffer hits during data page reads | |
| Temp Size | Temporary file usage | |
| Deadlocks | Number of deadlocks occurred | Alert if greater than 0 |
| Checksum Failures | Number of data page checksum failures | Alert if greater than 0 |
| Sessions Per Database | Number of connections per database | |
| Page Cache Hit | blks_hit: number of buffer hits during data reads blks_read: number of disk reads due to cache miss |
|
| Rows Read | Number of tuples read and returned by queries | |
| Checkpoints | Number of checkpoint events, including: checkpoints_req: manually triggered checkpoints_timed: time-based triggers |
|
| Replication Latency | Primary-mirror replication lag in milliseconds write_lag: delay in writing logs to mirror's file cache flush_lag: delay in flushing logs to mirror's disk replay_lag: delay in replaying logs on mirror Top Segment: sum of write_lag + flush_lag + replay_lag for the node with highest total |
Set alert threshold based on requirements |
| Rows Insert/Update/Delete | Row operation statistics Rows Insert: number of inserted rows Rows Update: number of updated rows Rows Delete: number of deleted rows |
|
| Checkpoint Buffers | Statistics on dirty page writes buffers_checkpoint: number of dirty pages written during checkpoints buffers_clean: number written by bgwriter buffers_backend: number written by backend processes |
|
| Top 10 Replication Lag Size | Replication lag for top 10 nodes, calculated as difference between sent LSN and replayed LSN | Set alert threshold based on requirements |
The Storage panel displays storage-related statistics:
| Parameter | Description | Recommended Alert Threshold |
|---|---|---|
| Top 10 Databases | Largest 10 databases by size | |
| Top 10 Users | Top 10 users by data volume generated | |
| Top 10 Aging Database | Top 10 databases by age (transaction IDs below this value are replaced with FrozenXID) | |
| Top 10 Big Tables | Largest 10 tables by size | |
| Top 10 Big Partitions | Largest 10 partitioned tables by size | |
| Top 10 Growth Today | Top 10 tables with the highest size increase today | |
| Top 10 Growth Last 7 Days | Top 10 tables with the highest size increase over the past 7 days |