MatrixDB Performance FAQ

This document records common performance optimization-related issues in MatrixDB.

1 Java stress test report Connection timed out


JDBC query stress test, concurrency 50, the connection pool uses Alibaba's druid.
The response time is stable within 2.5 minutes, and then the response time becomes longer. Master begins to experience the following error: ERROR "failed to acquire resources on one or more segments", "can not connect to server: Connection timed out".
Segment None error/panic log

Problem Analysis

A distributed database will have a large amount of TCP/UDP data transmission, and each transmission will use a different port number. These port numbers or connections are routed at one time (connection conn) in the eyes of the OS.
The system parameter nf_conntrack_max means that the number of routing information that OS can maintain at the maximum at the same time.
Since multiple virtual machines are on a physical machine in this case, the virtual network should use NAT. This way, when a large number of concurrent queries come at the same time, the virtual routing information will increase rapidly, which may cross the nf_conntrack_max threshold in a short time, which will cause the network card to actively discard packages that cannot be processed.

This can also explain the larger the concurrent query, the more likely it is to lose packets.

Solution

Modify kernel parameters:

sudo sysctl net.netfilter.nf_conntrack_buckets=262144
sudo sysctl net.netfilter.nf_conntrack_max=1048576

2 SQL query motion operator takes time


In the query execution plan, it was found that the motion operator was expensive.

Problem Analysis

The business environment may be a cloud environment. The cloud environment is inefficient in UDP processing, which may lead to high UDF latency or high packet loss probability, indirectly leading to a longer UDF data transmission time.
In addition, the user log level may be set to higher levels such as debug5, and the UDP transmission efficiency will also be affected during the process of large amounts of log output.

Solution

  1. Switch TCP interconnect.
  2. Turn on ic proxy.
  3. Adjust log level

3 The client returns a large amount of data for a long time


Problem Description

It takes a long time to return large amounts of data when using PSQL clients or other clients.

Solution

Set the FETCH_COUNT parameter before querying to limit the number of data rows returned at one time.

postgres=# \set FETCH_COUNT 200
postgres=# SELECT *  FROM test_table;