Self-Service Inspection

This document describes the self-service inspection feature in the YMatrix graphical user interface.

If monitoring and alerting functions are considered the "emergency care" for a cluster, then the inspection function serves as the cluster's "routine health checkup." Regular inspections help you better understand the overall operation of the cluster, identify potential issues early, determine optimal timing for maintenance tasks such as VACUUM, prevent failures, and reduce operational overhead.

The self-service inspection feature in the YMatrix GUI supports:

  1. Creating custom inspection plans by selecting desired inspection items
  2. Generating detailed inspection reports, including the number of anomalies, critical issues, analysis of results, follow-up recommendations, and inspection logs

1 Prerequisites

First, log in to the graphical interface. In your browser, open the following address, replacing <IP> with the Master node's IP address (8240 is the port number):

http://<IP>:8240
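
If the page does not load, it can help to confirm that the port is reachable from your workstation before troubleshooting anything else. The following is a minimal sketch in Python, assuming the port 8240 shown in the address above. The names gui_url and port_is_open are illustrative and are not part of YMatrix.

```python
import socket

GUI_PORT = 8240  # port shown in the address above


def gui_url(master_ip: str, port: int = GUI_PORT) -> str:
    """Build the GUI address for the given Master node IP."""
    return f"http://{master_ip}:{port}"


def port_is_open(host: str, port: int = GUI_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

For example, gui_url("192.0.2.10") returns "http://192.0.2.10:8240" (192.0.2.10 is a placeholder address), and port_is_open("192.0.2.10") reports whether a TCP connection to that port can be established. A successful connection only shows that something is listening on the port, not that the login page itself is healthy.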

2 Self-Service Inspection

Figure: Self-service inspection page.

Figure: Creating a custom inspection plan.

Full list of inspection items:

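| Category | Inspection Item | Level |
| --- | --- | --- |
| Cluster Basic Information | Check reachability of all instances | High |
| Cluster Basic Information | Check cluster status | High |
| Cluster Basic Information | Check users with password expiration within 30 days | Medium |
| Cluster Basic Information | License validity check | Medium |
| Cluster Basic Information | Check connection count health | Low |
| Cluster Basic Information | Check cluster version | Low |
| Database Runtime Status | Check MARS2/CV health | High |
| Database Runtime Status | Check if data exists in Default partitions | High |
| Database Runtime Status | Top 10 database ages | High |
| Database Runtime Status | Check largest 20 business tables | Medium |
| Database Runtime Status | View top 20 largest system tables | Medium |
| Database Runtime Status | Check top 20 longest-running SQL queries | Medium |
| Database Runtime Status | Identify tables with data skew exceeding 10,000 rows | Medium |
| Database Runtime Status | Check index consistency between Master and Segments | Medium |
| Database Runtime Status | Check for core files on each instance | Medium |
| Database Runtime Status | Check auto-partitioning policy execution status | Medium |
| Database Runtime Status | Check HEAP/MARS2 tables with bloat rate over 20% | Medium |
| Database Runtime Status | View top 10 largest schemas by size | Low |
| Database Runtime Status | Check 20 least-used indexes | Low |
| Database Runtime Status | Check 20 indexes with lowest index cache hit ratio | Low |
| Database Runtime Status | Check system tables with excessively large indexes | Low |
| Database Runtime Status | Check 20 largest indexes | Low |
| Database Runtime Status | Check number of subpartitions per partitioned table | Low |
| Database Runtime Status | Check for duplicate indexes | Low |
| Database Runtime Status | View size of each database | Low |
| Database Runtime Status | Check Plpython parameters | Low |
| Database Runtime Status | Check database log size across all instances | Low |
| Database Runtime Status | Check database parameters | Low |
| Server Runtime Status | Check process status over last 7 days | High |
| Server Runtime Status | Check network bandwidth usage over last 7 days | High |
| Server Runtime Status | Check disk usage | High |
| Server Runtime Status | Check disk I/O usage over last 7 days | High |
| Server Runtime Status | Check CPU usage over last 7 days | High |
| Server Runtime Status | Check Commit memory usage over last 7 days | High |
| Server Runtime Status | Check system load over last 7 days | Low |
| Server Runtime Status | Check I/O bandwidth usage over last 7 days | Low |
| Server Runtime Status | Check operating system parameters | Low |
| mxgate Runtime Status | Check mxgate logs for error messages | Low |
| mxgate Runtime Status | Check number of database connections used by mxgate | Low |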
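
For context, an item such as "Top 10 database ages" corresponds to information you can also look up by hand. The sketch below is an assumption about what a comparable manual check might look like, not the query the GUI actually runs. It relies on the standard PostgreSQL function age() applied to pg_database.datfrozenxid, and it accepts any Python DB-API cursor that uses the %s placeholder style (as psycopg2 does) and is connected to the Master. The names DATABASE_AGE_SQL and top_database_ages are illustrative.

```python
from typing import Any, List, Tuple

# Standard PostgreSQL catalog query: transaction ID age per database, oldest first.
# Assumption: this approximates the "Top 10 database ages" item; the GUI's own
# query is not described in this document.
DATABASE_AGE_SQL = (
    "SELECT datname, age(datfrozenxid) AS xid_age "
    "FROM pg_database "
    "ORDER BY xid_age DESC "
    "LIMIT %s"
)


def top_database_ages(cursor: Any, limit: int = 10) -> List[Tuple[str, int]]:
    """Run the age query on an open DB-API cursor and return (database, age) pairs."""
    cursor.execute(DATABASE_AGE_SQL, (limit,))
    return [(name, int(age)) for name, age in cursor.fetchall()]
```

A steadily growing age for a database is one of the signals that a VACUUM is due. This query only reads the Master's catalog, so it may not reflect the state of individual Segments.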

Inspection item severity levels are defined as follows:

| Level | Description |
| --- | --- |
| High | Anomalies in these items may impact cluster stability |
| Medium | Anomalies may affect certain business operations |
| Low | Anomalies do not directly impact current operations but could worsen over time |
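
The three levels suggest a natural order for working through a report: High first, then Medium, then Low. As a minimal sketch of that triage, assume you have copied the anomalous items out of a report by hand as (item, level) pairs. The report's export format is not described here, and the Finding type and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Lower rank means more urgent, following the level definitions above.
LEVEL_RANK = {"High": 0, "Medium": 1, "Low": 2}


@dataclass(frozen=True)
class Finding:
    item: str   # inspection item name as listed in the item table
    level: str  # "High", "Medium", or "Low"


def triage(findings: List[Finding]) -> List[Finding]:
    """Sort findings by severity, keeping the original order within a level."""
    return sorted(findings, key=lambda f: LEVEL_RANK[f.level])


# Hypothetical anomalies, for illustration only.
sample = [
    Finding("Check for duplicate indexes", "Low"),
    Finding("Check disk usage", "High"),
    Finding("Check for core files on each instance", "Medium"),
]

for f in triage(sample):
    print(f"{f.level:<6} {f.item}")
```

Because Python's sorted is stable, items that share a level stay in the order in which they were entered, so you can list them in report order and only the severity grouping changes.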

Note!
For detailed descriptions of each inspection item, refer to the inspection report.