Symptom
Installing the matrixdb package with yum install fails with the error: cpio read error
Cause
The user environment is Windows with VMware Workstation 15. The installation package was downloaded on Windows and dragged into the virtual machine, causing file truncation.
Solution
Use the VMware shared folder mechanism to transfer files. See https://blog.csdn.net/highning/article/details/106000215.
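One way to confirm that a transferred package arrived intact is to compare its checksum with the one computed on the Windows side. The sketch below assumes GNU coreutils (sha256sum) and bash 4 or later inside the virtual machine; the helper name verify_sha256 and the file name matrixdb.rpm are hypothetical.

```shell
#!/usr/bin/env bash
# verify_sha256 FILE EXPECTED_HEX
# Succeeds only if the SHA-256 digest of FILE equals EXPECTED_HEX.
# The comparison is case-insensitive because Windows tools may print upper-case hex.
verify_sha256() {
    local file="$1" expected="$2" actual
    actual="$(sha256sum "$file" | awk '{print $1}')"
    [ "${actual,,}" = "${expected,,}" ]
}

# Example (hypothetical file name). On Windows, the digest can be obtained with:
#   certutil -hashfile matrixdb.rpm SHA256
# Then, inside the virtual machine:
#   verify_sha256 matrixdb.rpm "<digest from Windows>" && echo "transfer OK"
# For RPM packages, `rpm -K matrixdb.rpm` also checks the package's internal digests.
```

A mismatch means the file changed in transit and should be copied again before installation.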
Symptom
Error when creating the MARS extension:
could not load library "/usr/local/matrixdb-4.0.0.enterprise/lib/postgresql/mars.so": /lib64/libarrow.so.300: undefined symbol: LZ4F_resetDecompressionContext
Cause
YMatrix 4 depends on Arrow version 300, which requires LZ4 version >= 1.8.
Solution
Upgrade the LZ4 library to version 1.8 or later.
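Before and after the upgrade, the installed version can be compared against the 1.8 minimum. This is a minimal sketch, assuming the lz4 command-line tool ships alongside the library and prints a dotted version number, and that GNU sort (with -V) is available; the helper name version_ge is hypothetical.

```shell
#!/usr/bin/env bash
# version_ge A B: succeed when version A is greater than or equal to version B.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumption: `lz4 --version` reports the same version as the installed library.
installed="$(lz4 --version 2>/dev/null | grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1)"
if [ -z "$installed" ]; then
    echo "could not determine the LZ4 version; query the package manager instead"
elif version_ge "$installed" 1.8; then
    echo "LZ4 $installed satisfies >= 1.8"
else
    echo "LZ4 $installed is older than 1.8; upgrade it"
fi
```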
Symptom
Initialization fails with the error:
could not connect to server: No route to host
Is the server running on host "192.168.88.203" and accepting
TCP/IP connections on port 40000?
(seg0 192.168.88.203:40000)
Cause
The iptables firewall on host 203 had been stopped but not disabled, so it started again automatically after a reboot. The required ports were not open, which blocked inter-node communication and left initialization hanging indefinitely.
Solution
Clear firewall rules on host 203, stop the iptables service, and disable it to prevent automatic startup after reboot.
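The three steps above can be sketched as follows. This assumes iptables runs as a systemd service named iptables on the affected host; the function names disable_iptables and port_open are hypothetical, and port_open relies on bash's /dev/tcp feature and the coreutils timeout command.

```shell
#!/usr/bin/env bash
# Flush all rules, stop iptables now, and keep it from starting at boot.
# Assumption: iptables is managed by systemd under the unit name "iptables".
disable_iptables() {
    sudo iptables -F
    sudo systemctl stop iptables
    sudo systemctl disable iptables
}

# port_open HOST PORT: succeed if a TCP connection can be opened within 3 seconds.
port_open() {
    timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Example: run disable_iptables on host 203, then from another node:
#   port_open 192.168.88.203 40000 && echo "segment port reachable"
```

The reachability check only succeeds while a segment process is listening on the port, so a failure before initialization does not by itself point to the firewall.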
Symptom
Error message:
error: could not access directory \"/data/mxdata_20221104084534/master/mxseg-1\": Permission denied
mxui initialization log:
"error": "execute: do execute: run: initialize_database: 7 errors occurred: *
error execute \"/usr/local/matrixdb-4.5.0.community/bin/initdb\"\n\n STDOUT:
The files belonging to this database system will be owned by user \"mxadmin\".
This user must also own the server process.
The database cluster will be initialized with locale \"en_US.utf8\".\n The default text search configuration will be set to \"english\".
Data page checksums are enabled.
STDERR:
initdb: error: could not access directory \"/data/mxdata_20221104084534/master/mxseg-1\": Permission denied\n * error execute \"/usr/local/matrixdb-4.5.0.community/bin/initdb\"
STDOUT:
The files belonging to this database system will be owned by user \"mxadmin\".
This user must also own the server process.\n\n The database cluster will be initialized with locale \"en_US.utf8\".
The default text search configuration will be set to \"english\".
Data page checksums are enabled.
Cause
The /data directory has restrictive permissions: it is owned by root, and only the owner has read, write, and execute (rwx) permissions. Group and other users, including mxadmin, have no access.
[root@mdw ~]# ll /
total 36
lrwxrwxrwx. 1 root root 7 Jun 1 19:38 bin -> usr/bin
dr-xr-xr-x. 5 root root 4096 Oct 26 18:28 boot
drwxr-xr-x 20 root root 3200 Oct 26 14:45 dev
drwxr-xr-x. 80 root root 8192 Oct 28 13:53 etc
drwxr-xr-x. 5 root root 8192 Oct 26 18:17 export
drwxr-xr-x. 5 root root 105 Oct 26 18:28 home
drwx------. 5 root root 105 Oct 26 18:28 data
Solution
Adjust the data directory permissions:
sudo chmod 755 /data
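When the blocking directory is not obvious, each ancestor of the target path can be checked in turn. The sketch below assumes GNU stat and only inspects the permission bits for "other" users, which is the case that applies when mxadmin is neither the owner nor in the group; the helper name blocked_components is hypothetical.

```shell
#!/usr/bin/env bash
# blocked_components PATH
# Walk from PATH up towards /, printing every existing directory that
# "other" users cannot traverse (no execute bit for others).
blocked_components() {
    local path="$1" mode
    while [ -n "$path" ] && [ "$path" != "/" ] && [ "$path" != "." ]; do
        if [ -d "$path" ]; then
            mode="$(stat -c '%A' "$path")"   # e.g. drwx------
            case "$mode" in
                *x|*t) ;;                     # others may enter this directory
                *) echo "$path $mode" ;;
            esac
        fi
        path="$(dirname "$path")"
    done
}

# Example with the path from the error message:
#   blocked_components /data/mxdata_20221104084534/master/mxseg-1
```

With the listing shown above, /data would be reported with mode drwx------.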
Symptom
Setuptools reports unsupported argument:
unknown distribution option: "long_description_content_type"
Cause
The installed version of setuptools is too old to recognize this option.
Solution
sudo python3 -m pip install --upgrade setuptools
Symptom
After uninstalling and reinstalling, the cluster cannot be reinitialized.
Cause
Reinstallation requires proper cleanup.
Solution
Remove the ~/.matrixdb.env file for the mxadmin user.
Remove /etc/matrixdb/cluster.conf.
Restart the supervisor service: systemctl restart matrixdb.supervisor.service
Symptom
The SSH port is not the default port 22.
Solution
Add host, port, and user configuration in the .ssh/config file:
Host mdw
Hostname mdw
Port 29022
User mxadmin
Host sdw1
Hostname sdw1
Port 29022
User mxadmin
Symptom
In a JDBC stress test with 50 concurrent connections through Alibaba's Druid connection pool, response time remains stable for the first 2.5 minutes and then increases. The Master log shows:
ERROR "failed to acquire resources on one or more segments", "could not connect to server: Connection timed out"
No error or panic logs on Segments.
Cause
Distributed databases generate large volumes of TCP/UDP traffic. Each transmission uses a different port, which the OS treats as a separate connection (one conntrack entry). The kernel parameter nf_conntrack_max limits the total number of connections tracked at the same time. In this setup, multiple VMs run on a single physical host using NAT networking. Under high concurrency, the number of tracked connections rises rapidly and exceeds nf_conntrack_max, at which point the kernel drops packets.
Solution
Adjust kernel parameters:
sudo sysctl net.netfilter.nf_conntrack_buckets=262144
sudo sysctl net.netfilter.nf_conntrack_max=1048576
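To see how close a host is to the limit, the current number of tracked connections can be compared with nf_conntrack_max. This is a minimal sketch, assuming the nf_conntrack module is loaded so that its counters appear under /proc/sys/net/netfilter; the helper name conntrack_pct is hypothetical.

```shell
#!/usr/bin/env bash
# conntrack_pct COUNT MAX: integer percentage of the conntrack table in use.
conntrack_pct() {
    echo $(( $1 * 100 / $2 ))
}

count_file=/proc/sys/net/netfilter/nf_conntrack_count
max_file=/proc/sys/net/netfilter/nf_conntrack_max
if [ -r "$count_file" ] && [ -r "$max_file" ]; then
    count="$(cat "$count_file")"
    max="$(cat "$max_file")"
    echo "conntrack entries: $count of $max ($(conntrack_pct "$count" "$max")%)"
else
    echo "nf_conntrack counters not available (module not loaded?)"
fi
```

Values set with sysctl on the command line last only until the next reboot; adding the same keys to /etc/sysctl.conf keeps them in place. When the table overflows, the kernel log typically contains the message "nf_conntrack: table full, dropping packet".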
Symptom
High overhead observed in motion operations during queries.
Cause
The log level was set to debug5, resulting in excessive logging that interferes with UDP transmission efficiency.
Solution
Lower the log level so that debug5 output is no longer written.
Symptom
When using a UI client to access a remote database, long-running queries or queries after prolonged idle periods sometimes return:
server closed the connection unexpectedly
Cause
Client-side timeout settings may cancel queries or close idle connections.
Solution
Modify client timeout settings to disable query or idle timeouts.
Symptom
For partitioned tables, a UNION ALL query with simple filter conditions performs slower than an equivalent IN query.
Cause
In the IN query, partition pruning results in scanning only the default partition. However, in the UNION ALL query, each subquery independently prunes to the default partition, leading to multiple scans of the same partition and significant performance degradation.
Solution
For partitioned tables:
Prefer IN clauses over UNION ALL.
Symptom
An INSERT into an integer column followed by a SELECT runs fast standalone, but executes slowly when placed inside a PL/pgSQL function.
Cause
Queries within PL/pgSQL functions are executed through SPI (Server Programming Interface). The SPI plan shows a two-table join using a nested loop with an estimated row count of 1, because no ANALYZE has been run on the tables and the planner has no statistics for them.
Solution
Run ANALYZE on the relevant tables.
Symptom
Two concurrent sessions performing updates on a partitioned table cause mutual locking.
Cause
Distributed deadlock.
Solution
Enable distributed deadlock detection:
gpconfig -c gp_enable_global_deadlock_detector -v on
Symptom
Slow data loading performance.
Cause
gpcheckperf reveals disk performance at only 80 MB/s.
Solution
Use multiple disks to improve I/O performance. Separate WAL and data disks to reduce I/O contention.
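After changing the disk layout, a quick sequential-write check on each data directory can show whether throughput improved. The sketch below is a rough stand-in built on GNU dd, not a replacement for gpcheckperf; the helper name write_throughput and the directory /data1 are hypothetical.

```shell
#!/usr/bin/env bash
# write_throughput DIR [SIZE_MIB]
# Write SIZE_MIB MiB (default 1024) of zeros into DIR, flush them to disk,
# and print dd's summary line, which ends with the measured rate.
write_throughput() {
    local dir="$1" size_mib="${2:-1024}" file
    file="$dir/dd_throughput_test.$$"
    LC_ALL=C dd if=/dev/zero of="$file" bs=1M count="$size_mib" conv=fdatasync 2>&1 | tail -n 1
    rm -f "$file"
}

# Example (hypothetical mount point):
#   write_throughput /data1 2048
```

The figure covers sequential writes only and can be inflated on storage that compresses zeros, so treat it as a coarse comparison between disks.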
Symptom
When mxgate loads 30 tables concurrently, the following error occurs:
failed to acquire resources on one or more segments, fatal out of memory
Cause
PostgreSQL/Greenplum uses a multi-process architecture, so every connection is served by its own backend processes. High concurrency creates a large number of connections, and memory requests fail once the available memory is exhausted.
Solution
Adjust /etc/sysctl.conf:
vm.overcommit_memory = 2
Change mxgate prepared=10 to prepared=5.
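With vm.overcommit_memory = 2 the kernel refuses allocations beyond a fixed commit limit, so it helps to know roughly where that limit lies. The sketch below estimates it as swap plus RAM multiplied by vm.overcommit_ratio / 100; it ignores huge pages and assumes vm.overcommit_kbytes is not set. The helper name commit_limit_kb is hypothetical.

```shell
#!/usr/bin/env bash
# commit_limit_kb RAM_KB SWAP_KB RATIO
# Approximate CommitLimit under vm.overcommit_memory=2:
#   swap + RAM * overcommit_ratio / 100   (huge pages ignored)
commit_limit_kb() {
    echo $(( $2 + $1 * $3 / 100 ))
}

if [ -r /proc/meminfo ] && [ -r /proc/sys/vm/overcommit_ratio ]; then
    ram_kb="$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"
    swap_kb="$(awk '/^SwapTotal:/ {print $2}' /proc/meminfo)"
    ratio="$(cat /proc/sys/vm/overcommit_ratio)"
    echo "vm.overcommit_memory=$(cat /proc/sys/vm/overcommit_memory)"
    echo "estimated commit limit: $(commit_limit_kb "$ram_kb" "$swap_kb" "$ratio") kB"
    # Values the kernel itself reports, for comparison with the estimate.
    grep -E '^(CommitLimit|Committed_AS):' /proc/meminfo || true
fi
```

Changes to /etc/sysctl.conf take effect after running sudo sysctl -p or after a reboot.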
Symptom
After deploying PXF, accessing HDFS fails with:
remote component error, Failed connect to localhost:5888; Connection refused (libchurl.c:950)
Solution
Ensure that pxf/servers/core-site.xml and hdfs-site.xml match the Hadoop configuration files exactly.
pxf/servers/core-site.xml.
pxf/core-site.xml.
Symptom
During data ingestion, a field that contains a newline character splits one row into two, so the number of fields no longer matches when the row is parsed by delimiter. The row contains two \n characters, one inside the field and one at the end, and the one inside the field should not be treated as a row delimiter.
Solution
Set escape 'off' in the options.
Use FORMAT 'text:multi'.
Symptom
Grafana monitoring needs to be installed offline.
Solution
Download the Grafana repository package. Create a local repository and install.
# ls
create_repo.sh grafana-7.3.6-1.x86_64.rpm grafana_repo
# sh create_repo.sh
Create ymatrix-grafana repo successfully!
# yum install --disablerepo='*' --enablerepo=ymatrix_grafana grafana-7.3.6-1.x86_64.rpm