Phenomenon
yum install matrixdb
An error is reported while installing the package: cpio read error
Reason
The user environment is Windows with a VMware 15 virtual machine. The installation package was downloaded on Windows and then dragged into the virtual machine, which truncated the file.
Solution
Use the VMware shared folder mechanism to transfer the file instead; see https://blog.csdn.net/highning/article/details/106000215
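One way to confirm that the package arrived intact before installing (not part of the original procedure, just a hedged check; the file name below is illustrative) is to compare checksums on both sides:
# compute the checksum inside the VM and compare it with the value computed on Windows or published by the vendor
sha256sum matrixdb.rpm
# rpm can also verify the package's own digests
rpm -K matrixdb.rpm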
Phenomenon
When creating a MARS extension, an error was reported:
could not load library "/usr/local/matrixdb-4.0.0.enterprise/lib/postgresql/mars.so": /lib64/libarrow.so.300: undefined symbol: LZ4F_resetDecompressionContext
Reason
YMatrix 4 depends on Arrow 3.0.0 (libarrow.so.300), and Arrow requires LZ4 version >= 1.8.
Solution
Upgrade LZ4 to version 1.8 or later.
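A hedged sketch of checking and upgrading LZ4 on a yum-based system (the library path and package source may differ; building 1.8+ from source is another option):
# check the installed LZ4 version and whether the missing symbol is present
rpm -q lz4
nm -D /usr/lib64/liblz4.so.1 | grep LZ4F_resetDecompressionContext
# upgrade the system package
sudo yum update lz4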
Phenomenon
An error occurred during initialization:
could not connect to server: No route to host
Is the server running on host "192.168.88.203" and accepting
TCP/IP connections on port 40000?
(seg0 192.168.88.203:40000)
Reason
iptables had been stopped on the 192.168.88.203 machine but not disabled. After the machine was rebooted, the firewall came back up and, by default, did not open the required ports, so the hosts could not communicate during initialization. The symptom is that initialization hangs and never completes.
Solution
Clear the firewall rules on the 203 machine, then stop and disable the iptables service so the network does not become unreachable again after a reboot.
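A minimal sketch of those steps, assuming the firewall runs as the iptables service (on hosts that use firewalld the service name differs):
# flush existing rules, then stop the firewall and keep it from starting at boot
sudo iptables -F
sudo systemctl stop iptables
sudo systemctl disable iptables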
Phenomenon
The following error is reported:
error: could not access directory "/data/mxdata_20221104084534/master/mxseg-1": Permission denied
mxui initialization log:
"error": "execute: do execute: run: initialize_database: 7 errors occurred: *
error execute \"/usr/local/matrixdb-4.5.0.community/bin/initdb\"\n\n STDOUT:
The files belonging to this database system will be owned by user \"mxadmin\".
This user must also own the server process.
The database cluster will be initialized with locale \"en_US.utf8\".\n The default text search configuration will be set to \"english\".
Data page checksums are enabled.
STDERR:
initdb: error: could not access directory \"/data/mxdata_20221104084534/master/mxseg-1\": Permission denied\n * error execute \"/usr/local/matrixdb-4.5.0.community/bin/initdb\"
STDOUT:
The files belonging to this database system will be owned by user \"mxadmin\".
This user must also own the server process.\n\n The database cluster will be initialized with locale \"en_US.utf8\".
The default text search configuration will be set to \"english\".
Data page checksums are enabled.
Reason
The /data directory is owned by root with mode drwx------ (700): only the owner has rwx permissions, and group/other users, including mxadmin, have no access.
[root@mdw ~]# ll /
total 36
lrwxrwxrwx. 1 root root 7 Jun 1 19:38 bin -> usr/bin
dr-xr-xr-x. 5 root root 4096 Oct 26 18:28 boot
drwxr-xr-x 20 root root 3200 Oct 26 14:45 dev
drwxr-xr-x. 80 root root 8192 Oct 28 13:53 etc
drwxr-xr-x. 5 root root 8192 Oct 26 18:17 export
drwxr-xr-x. 5 root root 105 Oct 26 18:28 home
drwx------. 5 root root 105 Oct 26 18:28 data
Solution
Just modify the data directory permissions.
sudo chmod 755 /data
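A quick check that the fix worked, assuming the mxadmin user from the log above:
# verify that mxadmin can now traverse the directory
ls -ld /data
sudo -u mxadmin ls /data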
Phenomenon
setuptools reports an unsupported option:
unknown distribution option: 'long_description_content_type'
Reason
The installed setuptools version is too old to recognize this option.
Solution
sudo python3 -m pip install --upgrade setuptools
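A quick way to confirm the installed version afterwards (support for long_description_content_type was reportedly added in setuptools 38.6.0):
# print the currently installed setuptools version
python3 -c "import setuptools; print(setuptools.__version__)"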
Phenomenon
After uninstalling and reinstalling, the cluster cannot be reinitialized.
Reason
Reinstalling the cluster requires cleaning up what the previous installation left behind.
Solution
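The exact cleanup depends on the deployment; a minimal sketch, assuming the old data directories live under /data as in the earlier example and that no database processes are still running (the path pattern is illustrative, not an official procedure):
# make sure no leftover database processes are still running
ps -ef | grep postgres
# remove the data directories created by the previous initialization
sudo rm -rf /data/mxdata_*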
Phenomenon
The ssh service does not use the default port 22.
Solution
Add the host name, port number and user configuration to the .ssh/config file:
Host mdw
    Hostname mdw
    Port 29022
    User mxadmin

Host sdw1
    Hostname sdw1
    Port 29022
    User mxadmin
Phenomenon
JDBC query stress test with concurrency 50; the connection pool is Alibaba's Druid. Response time is stable for the first 2.5 minutes and then starts to grow, and the master begins to report the following error:
ERROR "failed to acquire resources on one or more segments", "could not connect to server: Connection timed out"
No error or panic appears in the segment logs.
Reason
A distributed database transfers a large amount of TCP/UDP data, and each transfer uses a different port number. From the OS's point of view, each of these ports/connections is a tracked connection (conntrack entry), and the kernel parameter nf_conntrack_max is the maximum number of such entries the OS can track at the same time. Because the virtual machines here all run on one physical machine, the virtual network uses NAT, so when a large number of concurrent queries arrive at once the number of tracked connections grows rapidly and can exceed nf_conntrack_max within a short period, at which point the kernel starts dropping packets it cannot track. This also explains why higher concurrency makes packet loss during that initial period more likely.
Solution
Modify kernel parameters
sudo sysctl net.netfilter.nf_conntrack_buckets=262144
sudo sysctl net.netfilter.nf_conntrack_max=1048576
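The two sysctl calls above only change the running kernel; a hedged sketch of making the values persistent across reboots (assuming the conntrack module is loaded at boot):
# persist the settings so they survive a reboot
echo "net.netfilter.nf_conntrack_buckets = 262144" | sudo tee -a /etc/sysctl.conf
echo "net.netfilter.nf_conntrack_max = 1048576" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p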
Phenomenon
The overhead of the query is high
Reason
Solution
Phenomenon
A UI client accesses a remote database. When it sends a query after a long-running query or a long idle period, the client sometimes receives:
server closed the connection unexpectedly
Reason
The client has a query-timeout setting that cancels long queries, or an idle-timeout setting that closes idle connections.
Solution
Adjust the client's timeout settings or disable the timeouts.
Phenomenon
On a simple partitioned table, a UNION ALL query with Filter operations is slower than the equivalent IN query.
Reason
With the IN query, partition pruning leaves only the single default partition, so it is scanned once. With the UNION ALL query, each subquery is pruned down to that same default partition, so the default partition is scanned multiple times and the performance impact is obvious.
Solution
For partitioned tables, prefer the IN form of the query over UNION ALL so the default partition is scanned only once.
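A hedged way to compare the two plans, using a hypothetical partitioned table t with partition column region (the names are illustrative, not from the original case):
# the IN form should show the default partition scanned once; the UNION ALL form scans it once per branch
psql -c "EXPLAIN SELECT * FROM t WHERE region IN ('a','b','c');"
psql -c "EXPLAIN SELECT * FROM t WHERE region = 'a' UNION ALL SELECT * FROM t WHERE region = 'b' UNION ALL SELECT * FROM t WHERE region = 'c';"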
Phenomenon
Data of int type is inserted; the SELECT query runs very fast on its own, but is very slow when placed inside a plpgsql function.
Reason
The query inside the plpgsql function runs through SPI. The SPI plan joins the two tables with a nested loop and estimates rows=1, because the tables have not been analyzed.
Solution
Run ANALYZE on the tables involved.
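A minimal example with illustrative table names (analyze the tables actually involved in the join):
# refresh planner statistics so the row estimates and join strategy become realistic
psql -c "ANALYZE t1;"
psql -c "ANALYZE t2;"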
Phenomenon
With partition pruning and UPDATE operations, two sessions updating independently end up blocking each other.
Reason
Distributed deadlock.
Solution
Turn on distributed deadlock detection
gpconfig -c gp_enable_global_deadlock_detector -v on
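gp_enable_global_deadlock_detector only takes effect after a restart; a hedged sketch using the standard Greenplum-style utilities (the exact commands in a YMatrix deployment may differ):
# confirm the value, then restart the cluster so it takes effect
gpconfig -s gp_enable_global_deadlock_detector
gpstop -ra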
Phenomenon
Low data loading performance
Reason
gpcheckperf was used to check disk and network performance and found disk throughput of only 80 MB/s.
Solution
Spread the load across multiple disks to improve I/O performance, and put WAL and data on separate disks.
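A hedged example of re-running the check after the change, assuming a host file that lists all hosts (the file path is illustrative):
# -r ds runs the disk I/O and memory-stream tests against the given data directory on every host
gpcheckperf -f /home/mxadmin/hostfile -d /data -r ds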
Phenomenon
mxgate loads 30 tables at the same time and reports an error:
failed to acquire resources on one or more segments, fatal out of memory
Reason
PostgreSQL/Greenplum uses a process-per-connection model. With a large number of concurrent requests there are too many connections, memory cannot be allocated for the corresponding requests, and the error is reported.
Solution
Set vm.overcommit_memory = 2 in /etc/sysctl.conf, and in mxgate change prepared=10 to prepared=5.
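A minimal sketch of applying the kernel setting (how prepared is changed depends on how mxgate is launched or configured in the loading pipeline):
# disable memory overcommit and reload the sysctl settings
echo "vm.overcommit_memory = 2" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p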
Phenomenon
After PXF deployment, an error is reported when accessing HDFS:
remote component error,Failed connect to localhost:5888; Connection refused (libchurl.c:950)
Solution
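The error means nothing is answering on the PXF port (5888 is the PXF default); a hedged first check, assuming the standard pxf command-line utility is installed on the cluster:
# check whether the PXF service is running on every host, and start it if it is not
pxf cluster status
pxf cluster start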
Phenomenon
When loading a file, a field contains an embedded newline character. The newline splits one row of data into two lines, and splitting on the delimiter then produces the wrong number of fields. In other words, one row of data contains two \n characters, one in the middle and one at the end, and the one in the middle must not be treated as a row terminator.
Solution
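One common approach is to load the file as quoted CSV, so a newline inside a quoted field is treated as data rather than a row terminator; a hedged sketch with an illustrative table and file name:
# in CSV format, a newline inside a double-quoted field stays part of the field value
psql -c "\copy t FROM '/path/to/data.csv' WITH (FORMAT csv)"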
Phenomenon
Requires offline installation of Grafana monitoring
Solution
Download the Grafana repository dependency package, create a local repository from it, and then install:
# ls
create_repo.sh grafana-7.3.6-1.x86_64.rpm grafana_repo
# sh create_repo.sh
Create ymatrix-grafana repo successfully!
# yum install --disablerepo=* --enablerepo=ymatrix_grafana grafana-7.3.6-1.x86_64.rpm