FAQ

1 Installation

Question 1

Phenomenon

yum install matrixdb fails while installing the package with a cpio read error

Reason

The host environment is Windows running a VMware Workstation 15 (vm15) virtual machine. The installation package was downloaded on Windows and then dragged into the virtual machine, which truncated the file.

Solution

Transfer the file with the VMware shared folder mechanism instead; see https://blog.csdn.net/highning/article/details/106000215

Question 2

Phenomenon

When creating a MARS extension, an error was reported:

could not load library "/usr/local/matrixdb-4.0.0.enterprise/lib/postgresql/mars.so": /lib64/libarrow.so.300: undefined symbol: LZ4F_resetDecompressionContext

Reason

YMatrix 4 depends on Apache Arrow 3.0.0 (libarrow.so.300), and Arrow requires LZ4 version >= 1.8.

Solution

Upgrade LZ4 to version 1.8 or later.
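
A minimal sketch for CentOS/RHEL; which repository ships a new enough lz4 depends on the OS release, so the package source here is an assumption:

# Check the currently installed LZ4 version
rpm -q lz4
# Upgrade LZ4 from a repository that provides lz4 >= 1.8
sudo yum update -y lz4
# Verify that the symbol required by libarrow is now present (library path may differ)
nm -D /usr/lib64/liblz4.so.1 | grep LZ4F_resetDecompressionContext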

Question 3

Phenomenon

An error occurred during initialization:

could not connect to server: No route to host
 Is the server running on host "192.168.88.203" and accepting
 TCP/IP connections on port 40000?
 (seg0 192.168.88.203:40000)

Reason

iptables had been stopped on the 192.168.88.203 machine but not disabled. After the machine was rebooted, the firewall started again and the required ports were not open by default, so the hosts could not communicate during initialization. The symptom is that initialization hangs and never completes.

Solution

Clear the firewall rules on the 203 machine, then stop and disable the iptables service so that the network is not blocked again after a reboot.
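
A minimal sketch, assuming the firewall is managed by the iptables systemd service (use the firewalld service names instead if that is what the host runs):

# Flush the existing firewall rules
sudo iptables -F
# Stop the firewall and prevent it from starting at boot
sudo systemctl stop iptables
sudo systemctl disable iptables
# If the host uses firewalld instead:
# sudo systemctl stop firewalld && sudo systemctl disable firewalld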

Question 4

Phenomenon

An error is reported:

error: could not access directory \"/data/mxdata_20221104084534/master/mxseg-1\": Permission denied

mxui initialization log:

"error": "execute: do execute: run: initialize_database: 7 errors occurred: * 
error execute \"/usr/local/matrixdb-4.5.0.community/bin/initdb\"\n\n STDOUT:
The files belonging to this database system will be owned by user \"mxadmin\".
This user must also own the server process.

  The database cluster will be initialized with locale \"en_US.utf8\".\n The default text search configuration will be set to \"english\".

  Data page checksums are enabled.

   STDERR:
      initdb: error: could not access directory \"/data/mxdata_20221104084534/master/mxseg-1\": Permission denied\n * error execute \"/usr/local/matrixdb-4.5.0.community/bin/initdb\"
   STDOUT:
      The files belonging to this database system will be owned by user \"mxadmin\".
      This user must also own the server process.\n\n The database cluster will be initialized with locale \"en_US.utf8\".
      The default text search configuration will be set to \"english\".
      Data page checksums are enabled.

Reason

Only the owner (root) of the /data directory has rwx permissions; the group and other users have no access, so the mxadmin user cannot create the data directories under it.

[root@mdw ~]# ll /
total 36
lrwxrwxrwx.   1 root    root       7 Jun  1 19:38 bin -> usr/bin
dr-xr-xr-x.   5 root    root    4096 Oct 26 18:28 boot
drwxr-xr-x   20 root    root    3200 Oct 26 14:45 dev
drwxr-xr-x.  80 root    root    8192 Oct 28 13:53 etc
drwxr-xr-x.   5 root    root    8192 Oct 26 18:17 export
drwxr-xr-x.   5 root    root     105 Oct 26 18:28 home
drwx------.   5 root    root     105 Oct 26 18:28 data

Solution

Modify the permissions of the /data directory:

sudo chmod 755 /data
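
To confirm the change, the directory listing should now show read and execute permission for group and others (a verification sketch; the size and date fields will differ):

ls -ld /data
# expected: drwxr-xr-x. 5 root root ... /data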

Question 5

Phenomenon

setuptools reports an unsupported option:

unknown distribution option: 'long_description_content_type'

Reason

The installed setuptools version is too old.

Solution

sudo python3 -m pip install --upgrade setuptools

Question 6

Phenomenon

After uninstalling and reinstalling, the cluster cannot be reinitialized

Reason

Leftover state from the previous installation must be cleaned up before the cluster can be reinitialized.

Solution

  1. Delete the mxadmin user's ~/.matrixdb.env
  2. Delete /etc/matrixdb/cluster.conf
  3. Restart the supervisor
    • systemctl restart matrixdb.supervisor.service
  4. Open the installation page again: http://:8240/installer (steps 1 to 3 are sketched as commands after this list)
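
A command sketch of steps 1 to 3, run with root privileges (assuming mxadmin's home directory is /home/mxadmin):

# 1. Remove the leftover environment file of the mxadmin user
sudo rm -f /home/mxadmin/.matrixdb.env
# 2. Remove the old cluster configuration
sudo rm -f /etc/matrixdb/cluster.conf
# 3. Restart the supervisor
sudo systemctl restart matrixdb.supervisor.service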

Question 7

Phenomenon

The SSH service does not listen on the default port 22

Solution

Add the host name, port number and user for each host to the ~/.ssh/config file:

Host mdw
   Hostname mdw
   Port 29022
   User mxadmin
Host sdw1
   Hostname sdw1
   Port 29022
   User mxadmin
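
To verify that the configuration is picked up (mdw and sdw1 are the hosts configured above):

# ssh reads ~/.ssh/config automatically, so no -p flag is needed
ssh mdw hostname
ssh sdw1 hostname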

2 Network

Question 1

Phenomenon

A JDBC query stress test runs with concurrency 50, using Alibaba's Druid connection pool. The response time is stable for roughly the first 2.5 minutes and then grows longer, and the master begins to report the following error:

ERROR "failed to acquire resources on one or more segments", "could not connect to server: Connection timed out"

The segment logs contain no ERROR or PANIC entries.

Reason

A distributed database generates a large volume of TCP/UDP traffic, and each transmission uses a different port number. From the OS's point of view, each of these ports or connections is a tracked connection (conn). The kernel parameter nf_conntrack_max is the maximum number of such connection-tracking entries the OS can hold at the same time. Because the virtual machines here all run on one physical machine and the virtual network uses NAT, a burst of concurrent queries makes the number of tracked connections grow rapidly and can exceed nf_conntrack_max within a short period, at which point the NIC drops the packets it cannot handle. This also explains why higher query concurrency makes packet loss during that early period more likely.

Solution

Modify kernel parameters

sudo sysctl net.netfilter.nf_conntrack_buckets=262144
sudo sysctl net.netfilter.nf_conntrack_max=1048576
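
These sysctl commands take effect immediately but do not survive a reboot; a sketch for persisting them (whether nf_conntrack_buckets is writable this way depends on the kernel version):

# Persist the conntrack settings across reboots
echo 'net.netfilter.nf_conntrack_buckets = 262144' | sudo tee -a /etc/sysctl.conf
echo 'net.netfilter.nf_conntrack_max = 1048576' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p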

Question 2

Phenomenon

The overhead of the query is high

Reason

  1. The environment is a cloud environment, where UDP processing is inefficient; this can cause high UDP latency or a high packet-loss rate, which indirectly lengthens UDP data transmission time.
  2. The log level is set to debug5, and the large volume of log output affects UDP transmission efficiency.

Solution

  1. Switch the interconnect to TCP
  2. Enable the IC proxy
  3. Lower the log level (a configuration sketch follows this list)
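
A configuration sketch using gpconfig; gp_interconnect_type and log_min_messages are standard server parameters, while the IC proxy additionally requires gp_interconnect_proxy_addresses to be set for every segment, which is omitted here because it depends on the cluster layout:

# 1. Switch the interconnect from UDP to TCP
gpconfig -c gp_interconnect_type -v tcp
# 2. Or use the IC proxy instead (also needs gp_interconnect_proxy_addresses)
# gpconfig -c gp_interconnect_type -v proxy
# 3. Lower the log level from debug5
gpconfig -c log_min_messages -v warning
# Restart the cluster so the interconnect change takes effect
gpstop -ar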

3 Query

Question 1

Phenomenon

A UI client accesses a remote database. After a long-running query, or after the connection has been idle for a long time, sending a query sometimes returns:

server closed the connection unexpectedly

Reason

The client has a query-timeout setting that cancels statements, or an idle-timeout setting that closes the connection.

Solution

Increase or disable the client's query and idle timeout settings.

Question 2

Phenomenon

On a simple partitioned (PARTITION) table, a UNION ALL query with filter conditions is slower than the equivalent IN query.

Reason

With the IN query, partition pruning leaves only the single default partition to scan, but in the UNION ALL query every subquery is pruned down to the default partition, so the default partition is scanned multiple times and the performance impact is obvious.

Solution

For PARTITION tables:

  1. Avoid a default partition where possible
  2. Prefer an IN clause over UNION ALL

Question 3

Phenomenon

After inserting int-type data, a standalone SELECT query runs very fast, but the same query is very slow inside a plpgsql function.

Reason

Queries inside a plpgsql function run through SPI. The SPI plan joins two tables with a nested loop because the planner estimates rows=1; the tables had not been analyzed.

Solution

Run ANALYZE on the tables involved.
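
A sketch; my_db and my_table are placeholders for the actual database and the tables referenced in the function:

# Refresh planner statistics for the tables used inside the function
psql -d my_db -c 'ANALYZE my_table;'
# Or use the Greenplum utility to analyze incrementally
analyzedb -d my_db -t public.my_table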

Question 4

Phenomenon

UPDATE operations on a partitioned table with partition pruning: two sessions that update independently end up blocking each other.

Reason

Distributed deadlock

Solution

Turn on the global distributed deadlock detector:

gpconfig -c gp_enable_global_deadlock_detector -v on
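
This parameter normally requires a cluster restart to take effect; a sketch:

# Restart the cluster so the new setting is applied
gpstop -ar
# Verify the value on all segments
gpconfig -s gp_enable_global_deadlock_detector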

4 Storage

Question 1

Phenomenon

Low data loading performance

Reason

gpcheckperf was used to check disk and network performance; disk throughput was only 80 MB/s.
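
A sketch of how the disk measurement can be reproduced with gpcheckperf; the host file and data directory are placeholders:

# Run the disk I/O and stream (memory bandwidth) tests on all hosts in hostfile
gpcheckperf -f hostfile -d /data/primary -r ds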

Solution

Spread the load across multiple disks to improve I/O throughput, and put the WAL and data directories on separate disks.

Question 2

Phenomenon

mxgate loads 30 tables at the same time and reports an error:

failed to acquire resources on one or more segments, fatal out of memory

Reason

PG/GP uses a process-per-connection model. With a large number of concurrent requests there are too many connections, memory cannot be allocated for the corresponding requests, and the error is reported.

Solution

  1. In /etc/sysctl.conf, set vm.overcommit_memory = 2 (sketched below)
  2. In mxgate, change prepared=10 to prepared=5
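
A sketch of the kernel-parameter change; the mxgate prepared setting is shown only as a comment because how it is passed depends on how mxgate is launched in the deployment:

# Restrict memory overcommit so requests fail early instead of exhausting memory
echo 'vm.overcommit_memory = 2' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
# In the mxgate configuration, reduce the prepared setting:
#   prepared=10  ->  prepared=5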

5 PXF

Question 1

Phenomenon

After PXF is deployed, accessing HDFS reports an error:

remote component error,Failed connect to localhost:5888; Connection refused (libchurl.c:950)

Solution

  1. PXF requires the PXF server to be running not only on the Master node but also on the Segment hosts, because the data files are read through the Segment-side PXF (see the sketch after this list)
  2. pxf/servers/core-site.xml and hdfs-site.xml must match the Hadoop cluster's configuration files
  3. Configure user access rights in pxf/servers/core-site.xml
  4. The user and group of the files on Hadoop must match what is specified in pxf/core-site.xml
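
A sketch of synchronizing the Hadoop configuration and restarting PXF on all hosts; the Hadoop configuration path and the $PXF_BASE variable (older releases use $PXF_CONF) are assumptions that depend on the installation:

# Copy the Hadoop client configuration into the default PXF server directory
cp /etc/hadoop/conf/core-site.xml /etc/hadoop/conf/hdfs-site.xml $PXF_BASE/servers/default/
# Distribute the configuration to every host and restart PXF
pxf cluster sync
pxf cluster restart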

Question 2

Phenomenon

When loading a file, one of the fields contains a newline character. The row is split into two lines and then split by the delimiter, so the number of fields no longer matches. In other words, a single row of data contains two \n characters, one in the middle and one at the end, and the middle one must not be treated as a line break.

Solution

  1. You can add escape 'off' to the format options
  2. You can also use format 'text:multi' (see the sketch after this list)
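
A sketch of the second option as a PXF external table; the table columns, HDFS path, and profile name are assumptions and must match the actual PXF setup:

psql -d my_db <<'SQL'
-- External table for files whose fields may contain embedded newlines
CREATE EXTERNAL TABLE ext_multiline (id int, note text)
LOCATION ('pxf://data/mydir/?PROFILE=hdfs:text:multi')
FORMAT 'CSV';
SQL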

6 Monitoring

Question 1

Phenomenon

Grafana monitoring needs to be installed offline

Solution

Download the Grafana repository dependency package, create a local repository from it, and then install:

# ls
create_repo.sh  grafana-7.3.6-1.x86_64.rpm  grafana_repo

# sh create_repo.sh
Create ymatrix-grafana repo successfully!

#  yum install --disablerepo=* --enablerepo=ymatrix_grafana grafana-7.3.6-1.x86_64.rpm