Health Monitoring

This document introduces the cluster health monitoring function of the graphical interface.

In day-to-day operation, a YMatrix database runs a large number of SQL statements and may run into problems such as hardware or network failures, or lock waits caused by concurrent transactions. If these problems are not handled in time, they can slow responses or even cause errors, reducing business efficiency. The health monitoring feature of the graphical interface helps you detect abnormal cluster performance sooner.

For each detection item, health monitoring periodically queries the relevant database system tables to check whether the cluster and its queries are running as expected. As soon as a check finds an unexpected state, a notification is sent. You can view notifications in the graphical interface. If checking the page regularly is inconvenient, you can also receive alerts promptly by email.

1 Preparation for use

To log in to the graphical interface, enter the IP address of the machine running MatrixGate (by default, the Master's IP address) and the port number in your browser:

http://<IP>:8240
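If the login page does not load, a quick first step is to confirm that port 8240 is reachable from your workstation. The following is a minimal sketch, not a YMatrix tool: `gui_url` and `port_reachable` are hypothetical helper names, and the check only proves that something accepts TCP connections on that port.

```python
import socket

DEFAULT_GUI_PORT = 8240  # port shown in the login URL


def gui_url(host: str, port: int = DEFAULT_GUI_PORT) -> str:
    """Build the graphical-interface login URL for an IPv4 address or host name."""
    return f"http://{host}:{port}"


def port_reachable(host: str, port: int = DEFAULT_GUI_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, DNS failure
        return False
```

For example, `port_reachable("192.0.2.10")` (a placeholder address; use your Master's IP) returns False when a firewall or a stopped service blocks the port. IPv6 addresses would need square brackets in the URL, which this sketch does not add.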

2 Health Monitoring

After logging in, open the Health Monitoring page.

2.1 Email Configuration

![](https://img.ymatrix.cn/ymatrix_home/Health Monitoring 1_1673319382.png) Email configuration is optional. Once you complete it, you will also receive alerts by email.

  1. Graphical interface domain name: To make it easy to view alert details promptly, each email includes a link to the graphical interface. If recipients cannot reach the interface through the default domain name, you must change this field.

  2. SMTP server address: The host name or IP address of the SMTP server, followed by the port number. Example: smtp.example.com:465.

Common third-party email services:

  1. Alibaba Cloud Mail:
    Personal edition: Enable the SMTP service first; see [Document](https://mailhelp.aliyun.com/freemail/detail.htm?knoId=6521875). For the SMTP server address and port number, see [Document](https://mailhelp.aliyun.com/freemail/detail.htm?knoId=5869705).
    Enterprise edition: The email administrator must confirm that the SMTP service is enabled; see [Document](https://help.aliyun.com/document_detail/447503.html). For the SMTP server address and port number, see [Document](https://help.aliyun.com/knowledge_detail/36576.html).
  2. Google Mail: IMAP or POP access must be enabled first; see [Document](https://support.google.com/mail/answer/7104828).
  3. NetEase Mail:
    Personal edition: Enable the SMTP service first; see [Document](https://help.mail.163.com/faqDetail.do?code=d7a5dc8471cd0c0e8b4b8f4f8e49998b374173cfe9171305fa1ce630d7f67ac2cda80145a1742516).
    Enterprise edition: The SMTP service is enabled by default. To check whether it is enabled, see [Document](https://qiye.163.com/help/ac1ca1.html). For the SMTP server address and port number, see [Document](https://qiye.163.com/help/client-profile.html).
  4. QQ Mail:
    Personal edition: Enable the SMTP service first; see [Document](https://service.mail.qq.com/cgi-bin/help?subtype=1&&id=28&&no=166). For the SMTP server address and port number, see [Document](https://service.mail.qq.com/cgi-bin/help?id=28&no=167&subtype=1).
    Enterprise edition: For the steps to enable the SMTP service, see [Document](https://work.weixin.qq.com/help?person_id=0&doc_id=302&helpType=exmail). For the SMTP server address and port number, see [Document](https://work.weixin.qq.com/help?person_id=0&doc_id=431&helpType=exmail).

Note:
If your organization hosts its own email service, consult your email administrator or email service provider.
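Before entering the SMTP server address, it can help to sanity-check the host:port value and the TLS mode the port usually implies. This is a minimal sketch under stated assumptions: `parse_smtp_address` and `tls_mode` are hypothetical helpers, not part of YMatrix, and the port mapping (465 for implicit TLS, 587 for STARTTLS) is a common industry convention rather than something the graphical interface is documented to use. Confirm the actual mode with your email provider.

```python
from typing import Tuple


def parse_smtp_address(address: str) -> Tuple[str, int]:
    """Split a 'host:port' value such as 'smtp.example.com:465' into (host, port)."""
    host, sep, port = address.strip().rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {address!r}")
    port_num = int(port)
    if not 0 < port_num < 65536:
        raise ValueError(f"port out of range: {port_num}")
    return host, port_num


def tls_mode(port: int) -> str:
    """Guess the TLS mode from common SMTP port conventions (heuristic only)."""
    if port == 465:
        return "implicit"  # TLS from the first byte; Python: smtplib.SMTP_SSL
    if port == 587:
        return "starttls"  # plain connect, then upgrade; Python: smtplib.SMTP + starttls()
    return "unknown"       # e.g. port 25: check with your email provider
```

A value without a port, such as `smtp.example.com`, raises `ValueError`, which mirrors the requirement that the field contain both parts.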

  3. Username: The account used to authenticate with the SMTP server. This field is optional; fill it in only if the SMTP server requires a username. Example: user@example.com.

  4. Password: The password for the SMTP username. This field is optional; fill it in only if the SMTP server requires a username and password.

Common third-party email services:

  1. Alibaba Cloud Mail: Use the email login password, that is, the password for the username and email address.
  2. Google Mail: Use the email login password, that is, the password for the username and email address.
  3. NetEase Mail:
    Personal edition: Use an authorization code as the password; see [Document](https://help.mail.163.com/faqDetail.do?code=d7a5dc8471cd0c0e8b4b8f4f8e49998b374173cfe9171305fa1ce630d7f67ac21b8ba4d48ed49ebc).
    Enterprise edition: The email login password is used by default. If the administrator has enabled client authorization codes, ask the administrator how to obtain one.
  4. QQ Mail:
    Personal edition: Use an authorization code as the password; see [Document](https://service.mail.qq.com/cgi-bin/help?subtype=1&&id=28&&no=1001256).
    Enterprise edition: The email login password is used by default. If the administrator has enabled secure login, use an authorization code instead; see [Document](https://work.weixin.qq.com/help?person_id=0&doc_id=301&helpType=exmail).

Note:
If your organization hosts its own email service, consult your email administrator or email service provider.

  5. Sender: If you use a third-party email service, this must match the Username field. If you use your own email service, enter the sending address.

  6. Recipients: The email addresses to notify. You can enter more than one.
    ![](https://img.ymatrix.cn/ymatrix_home/Health Monitoring 2_1673319399.png)
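If alert emails do not arrive, you can test the same settings (SMTP server address, username, password, sender, recipients) outside the graphical interface. The following is a minimal sketch using Python's standard `smtplib`; `build_test_message` and `send_test_message` are hypothetical names, and the sketch assumes an implicit-TLS server on a port such as 465. For a STARTTLS port such as 587 you would use `smtplib.SMTP` followed by `starttls()` instead. For NetEase or QQ personal editions, the password is the authorization code described above.

```python
import smtplib
import ssl
from email.message import EmailMessage
from typing import List, Optional


def build_test_message(sender: str, recipients: List[str]) -> EmailMessage:
    """Compose a plain-text test message addressed to every recipient."""
    msg = EmailMessage()
    msg["Subject"] = "Health monitoring SMTP test"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content("If you can read this, the SMTP settings work.")
    return msg


def send_test_message(host: str, port: int, sender: str, recipients: List[str],
                      username: Optional[str] = None,
                      password: Optional[str] = None) -> None:
    """Send the test message over implicit TLS; log in only if a username is given."""
    msg = build_test_message(sender, recipients)
    context = ssl.create_default_context()
    with smtplib.SMTP_SSL(host, port, context=context, timeout=10) as server:
        if username:  # mirrors the optional Username/Password fields
            server.login(username, password or "")
        server.send_message(msg)  # delivers to every address in the To header
```

Keeping message construction separate from sending lets you inspect the From and To headers without contacting a server.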

2.2 Monitoring Items

The list shows the monitoring items that YMatrix currently provides. All items are enabled by default, and you can enable or disable each one as needed. ![](https://img.ymatrix.cn/ymatrix_home/Health Monitoring 3_1673319412.png)

If an item's default parameters do not suit your business scenario, you can modify them. ![](https://img.ymatrix.cn/ymatrix_home/Health Monitoring 4_1673319422.png)

| No. | Detection item | Description |
|---|---|---|
| 1 | Cluster unavailable | Checks whether the cluster is available by periodically running `SELECT * FROM gp_dist_random('gp_id');`. If the query fails three times in a row, the cluster has most likely crashed. Possible causes include a primary Segment and its mirror Segment failing at the same time, a network failure, a power outage, or a hardware failure. |
| 2 | Segment failure | If a primary Segment fails, its workload shifts to the corresponding mirror Segment and skews resource usage: the host holding the mirror comes under more load and queries slow down. In severe cases, the skewed node can run out of memory and the cluster becomes unavailable. If a mirror Segment fails, the cluster loses high availability for that Segment: should the corresponding primary Segment then fail, the cluster becomes unavailable. |
| 3 | Query/transaction runs for more than 12 hours | A long-running query or transaction can consume large amounts of memory, CPU, and other server resources, slowing database responses and possibly triggering OOM (out-of-memory) errors. It can also delay VACUUM. |
| 4 | Transaction stays in the "idle in transaction" state for more than 1 hour | While a transaction sits idle in transaction for a long time, most queries on the tables it involves may be blocked, and VACUUM cannot reclaim dead rows, causing table bloat. |
| 5 | A single query/transaction blocks more than 5 other queries for more than 15 minutes | A query or transaction that blocks many others for a long time can easily cause further statements to block one another, reducing service responsiveness. |
| 6 | A query requesting an Exclusive or AccessExclusive lock is blocked for more than 15 minutes | A query that waits a long time for a table-level Exclusive or AccessExclusive lock can cause other queries to queue up behind it, reducing service responsiveness. |
| 7 | A query/transaction holds an Exclusive or AccessExclusive lock for more than 2 hours | A query or transaction that holds a table-level Exclusive or AccessExclusive lock for a long time blocks queries on the locked tables, reducing service responsiveness. |
| 8 | A transaction holds an Exclusive or AccessExclusive lock and stays idle in transaction for more than 15 minutes | While such a transaction sits idle, most queries on the tables it involves are blocked, reducing service responsiveness. |
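To make the thresholds in the table concrete, the sketch below models item 1 (alert after three consecutive failed probes) and items 3 and 4 (12-hour and 1-hour session limits). It is an illustration, not YMatrix's implementation: `AvailabilityProbe`, `Session`, and `flag_sessions` are hypothetical names, the probe callable is assumed to run the query from item 1 with whatever driver you use, and the session fields are assumed to come from a PostgreSQL-style `pg_stat_activity` view. The lock-based items 5 to 8 are not modeled.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List, Optional


class AvailabilityProbe:
    """Item 1: flag the cluster as unavailable after N consecutive probe failures.

    `run_probe` is a hypothetical callable that executes
    SELECT * FROM gp_dist_random('gp_id'); and raises on failure.
    """

    def __init__(self, run_probe: Callable[[], object], threshold: int = 3) -> None:
        self.run_probe = run_probe
        self.threshold = threshold
        self.failures = 0  # length of the current failure streak

    def check(self) -> bool:
        """Run one probe (call once per monitoring interval); True means alert."""
        try:
            self.run_probe()
            self.failures = 0  # any success resets the streak
        except Exception:
            self.failures += 1
        return self.failures >= self.threshold


@dataclass
class Session:
    """Assumed subset of pg_stat_activity columns used by items 3 and 4."""
    pid: int
    state: str                        # e.g. 'active', 'idle in transaction'
    xact_start: Optional[datetime]    # start of the current transaction, if any
    state_change: Optional[datetime]  # when `state` last changed


def flag_sessions(
    sessions: List[Session],
    now: datetime,
    long_running: timedelta = timedelta(hours=12),  # item 3 default
    idle_limit: timedelta = timedelta(hours=1),     # item 4 default
) -> List[str]:
    """Return one alert string per threshold a session exceeds."""
    alerts = []
    for s in sessions:
        if s.xact_start is not None and now - s.xact_start > long_running:
            alerts.append(f"pid {s.pid}: transaction running longer than {long_running}")
        if (s.state == "idle in transaction" and s.state_change is not None
                and now - s.state_change > idle_limit):
            alerts.append(f"pid {s.pid}: idle in transaction longer than {idle_limit}")
    return alerts
```

Counting consecutive failures across monitoring intervals, rather than retrying immediately, matches the "fails three times in a row" wording while tolerating a single transient error.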

2.3 Email Notification

If you have configured email, you will receive an email whenever an event meets the failure condition of a detection item. ![](https://img.ymatrix.cn/ymatrix_home/Health Monitoring 5_1673336152.png)

2.4 Event History

Whether or not you configure email, you can view every cluster event that met a detection item's failure condition under "Event History". ![](https://img.ymatrix.cn/ymatrix_home/Health Monitoring 6_1673319439.png)

2.5 Disk

You can quickly enable or disable the disk monitoring items: "Disk full", "Disk space is less than 20%", "Disk space will be exhausted within 7 days", and "Disk has grown abnormally in the past 1 day". You can also click "Edit" to adjust their parameters to suit your business needs.
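The first three disk items can be approximated with simple arithmetic. The sketch below is an assumption-laden illustration, not the product's logic: it reads "less than 20%" as free space below 20% of capacity, projects exhaustion linearly from the last day's growth, and uses hypothetical helper names. The "grown abnormally" item is not modeled because its criterion is not documented here.

```python
import shutil
from typing import List, Optional

LOW_SPACE_RATIO = 0.20        # "Disk space is less than 20%" (read as free space)
EXHAUSTION_WINDOW_DAYS = 7.0  # "Disk space will be exhausted within 7 days"


def free_ratio(path: str) -> float:
    """Fraction of the filesystem holding `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def days_until_full(free_now: int, used_now: int, used_day_ago: int) -> Optional[float]:
    """Linear projection from the last day's growth; None if usage did not grow."""
    growth_per_day = used_now - used_day_ago
    if growth_per_day <= 0:
        return None
    return free_now / growth_per_day


def disk_alerts(free: int, total: int, used_now: int, used_day_ago: int) -> List[str]:
    """Evaluate three of the disk items for one filesystem (sizes in bytes)."""
    if free == 0:
        return ["Disk full"]
    alerts = []
    if free / total < LOW_SPACE_RATIO:
        alerts.append("Disk space is less than 20%")
    days = days_until_full(free, used_now, used_day_ago)
    if days is not None and days <= EXHAUSTION_WINDOW_DAYS:
        alerts.append("Disk space will be exhausted within 7 days")
    return alerts
```

For example, a 1000-byte volume with 150 bytes free that grew by 30 bytes in the last day triggers both the 20% and the 7-day alerts (about 5 days remain). A linear projection is the simplest choice; real growth is often bursty, which is why the thresholds are editable.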