YMatrix
This document records common performance-related issues in MatrixDB and how to resolve them.
Problem Description
A JDBC query stress test runs with a concurrency of 50; the connection pool is Alibaba's druid. Response times are initially stable at under 2.5 minutes; later they grow longer, and the Master begins to report the following errors:
ERROR "failed to acquire resources on one or more segments", "could not connect to server: Connection timed out".
The Segment logs contain no error or panic messages.
Problem Analysis
A distributed database generates a large volume of TCP/UDP traffic, and each transmission uses a different port number. To the OS, each of these port/connection pairs is one connection-tracking entry (conntrack entry). The kernel parameter nf_conntrack_max sets the maximum number of such entries the OS can track at the same time.
In this case several virtual machines share one physical machine, so the virtual network presumably uses NAT. When many concurrent queries arrive at once, the number of tracked connections grows rapidly and can exceed the nf_conntrack_max threshold within a short time, at which point the kernel starts dropping packets it can no longer track.
This also explains why packet loss becomes more likely as query concurrency increases.
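To confirm that connection tracking is the bottleneck, you can compare the live entry count against the configured ceiling and check the kernel log for the table-full warning. A minimal sketch, assuming the nf_conntrack module is loaded so these /proc paths exist:

```shell
# Current number of tracked connections vs. the configured ceiling
cat /proc/sys/net/netfilter/nf_conntrack_count
cat /proc/sys/net/netfilter/nf_conntrack_max

# When the table overflows, the kernel logs a warning such as:
#   nf_conntrack: table full, dropping packet
dmesg | grep -i conntrack
```

If nf_conntrack_count sits close to nf_conntrack_max under load, raising the limit as shown below is the likely fix.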
Solution
Modify kernel parameters:
sudo sysctl net.netfilter.nf_conntrack_buckets=262144
sudo sysctl net.netfilter.nf_conntrack_max=1048576
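Note that sysctl changes made this way do not survive a reboot. To make them persistent, you would typically also write them to a sysctl configuration file; a sketch, assuming a drop-in file under /etc/sysctl.d/ (the file name here is arbitrary):

```shell
# Persist the conntrack limits across reboots (same values as the fix above)
echo "net.netfilter.nf_conntrack_buckets = 262144" | sudo tee -a /etc/sysctl.d/90-conntrack.conf
echo "net.netfilter.nf_conntrack_max = 1048576"   | sudo tee -a /etc/sysctl.d/90-conntrack.conf

# Reload all sysctl configuration files to apply immediately
sudo sysctl --system
```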
Problem Description
The query execution plan shows that the Motion operator is expensive.
Problem Analysis
The business may be running in a cloud environment. Cloud environments often handle UDP inefficiently, which can mean high UDP latency or a high packet-loss probability, indirectly lengthening UDP data-transfer times.
In addition, if the log level is set very high (e.g. debug5), the large volume of log output can also degrade UDP transmission efficiency.
Solution
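The original lists no concrete steps here. Based on the analysis above, two mitigations that follow from it would be to move interconnect traffic off UDP and to lower the log level. This is a sketch, assuming MatrixDB exposes the Greenplum-compatible GUCs gp_interconnect_type and log_min_messages and the gpconfig/gpstop utilities; verify the supported values for your version:

```shell
# Switch the interconnect from UDP to TCP to sidestep the cloud's poor UDP handling
gpconfig -c gp_interconnect_type -v tcp

# Lower the log level back to the default so heavy log output
# no longer competes with interconnect traffic
gpconfig -c log_min_messages -v warning

# Reload the configuration cluster-wide
gpstop -u
```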
Problem Description
Returning a large result set takes a long time when using the psql client or other clients.
Solution
Set the psql variable FETCH_COUNT before running the query to limit the number of rows fetched at a time; psql then retrieves the result via a cursor in batches of that size instead of loading it all at once.
postgres=# \set FETCH_COUNT 200
postgres=# SELECT * FROM test_table;