Frequently Asked Questions

1 Installation

Problem 1

Symptom

Installing the MatrixDB package with yum install fails with the error: cpio read error

Cause

The user environment is Windows with VMware Workstation 15. The installation package was downloaded on Windows and then dragged and dropped into the virtual machine, which truncated the file.

Solution

Use the VMware shared folder mechanism to transfer files. See https://blog.csdn.net/highning/article/details/106000215.
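
If the package has to be copied into the virtual machine by other means, a quick integrity check before installing helps catch truncation. A minimal sketch, assuming the file name below stands in for the actual downloaded package:

# Compare size and checksum inside the VM against the original download on Windows.
ls -l matrixdb-installer.rpm        # hypothetical file name
sha256sum matrixdb-installer.rpm    # must match the checksum of the source file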

Problem 2

Symptom

Error when creating the MARS extension:

could not load library "/usr/local/matrixdb-4.0.0.enterprise/lib/postgresql/mars.so": /lib64/libarrow.so.300: undefined symbol: LZ4F_resetDecompressionContext

Cause

YMatrix 4 depends on Arrow (libarrow.so.300), which requires LZ4 version >= 1.8.

Solution

Upgrade the system LZ4 library to version 1.8 or later.
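
A minimal sketch of checking and upgrading the library on a yum-based system; the package name, library path, and available version depend on the distribution and configured repositories:

rpm -q lz4                        # currently installed LZ4 version
sudo yum update -y lz4            # upgrade to the newest version available
# Confirm that the symbol required by libarrow is now exported:
nm -D /usr/lib64/liblz4.so.1 | grep LZ4F_resetDecompressionContext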

Problem 3

Symptom

Initialization fails with the error:

could not connect to server: No route to host
 Is the server running on host "192.168.88.203" and accepting
 TCP/IP connections on port 40000?
 (seg0 192.168.88.203:40000)

Cause

The iptables firewall on host 203 had been stopped but not disabled, so it started again automatically after a reboot. The required ports were therefore not open, blocking inter-node communication and leaving initialization hanging indefinitely.

Solution

Clear firewall rules on host 203, stop the iptables service, and disable it to prevent automatic startup after reboot.
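
The steps above correspond roughly to the following commands on host 203; this sketch assumes a CentOS-style system running the iptables service (on firewalld-based systems, stop and disable firewalld instead):

sudo iptables -F                   # clear the current firewall rules
sudo systemctl stop iptables       # stop the iptables service
sudo systemctl disable iptables    # prevent it from starting again after reboot
# From another host, confirm the interconnect port is reachable (requires nc/ncat):
nc -z 192.168.88.203 40000 && echo "port 40000 reachable"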

Problem 4

Symptom

Error message:

error: could not access directory \"/data/mxdata_20221104084534/master/mxseg-1\": Permission denied

mxui initialization log:

"error": "execute: do execute: run: initialize_database: 7 errors occurred: * 
error execute \"/usr/local/matrixdb-4.5.0.community/bin/initdb\"\n\n STDOUT:
The files belonging to this database system will be owned by user \"mxadmin\".
This user must also own the server process.

  The database cluster will be initialized with locale \"en_US.utf8\".\n The default text search configuration will be set to \"english\".

  Data page checksums are enabled.

   STDERR:
      initdb: error: could not access directory \"/data/mxdata_20221104084534/master/mxseg-1\": Permission denied\n * error execute \"/usr/local/matrixdb-4.5.0.community/bin/initdb\"
   STDOUT:
      The files belonging to this database system will be owned by user \"mxadmin\".
      This user must also own the server process.\n\n The database cluster will be initialized with locale \"en_US.utf8\".
      The default text search configuration will be set to \"english\".
      Data page checksums are enabled.

Cause

The data directory has restrictive permissions — only the owner has read, write, and execute (rwx) permissions. Group and other users have no access.

[root@mdw ~]# ll /
total 36
lrwxrwxrwx.   1 root    root       7 Jun  1 19:38 bin -> usr/bin
dr-xr-xr-x.   5 root    root    4096 Oct 26 18:28 boot
drwxr-xr-x   20 root    root    3200 Oct 26 14:45 dev
drwxr-xr-x.  80 root    root    8192 Oct 28 13:53 etc
drwxr-xr-x.   5 root    root    8192 Oct 26 18:17 export
drwxr-xr-x.   5 root    root     105 Oct 26 18:28 home
drwx------.   5 root    root     105 Oct 26 18:28 data

Solution

Adjust the data directory permissions:

sudo chmod 755 /data
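
A quick check that the directory is now traversable by the installation user (mxadmin, as named in the initdb output above):

ls -ld /data                                                      # should show drwxr-xr-x
sudo -u mxadmin test -x /data && echo "mxadmin can access /data"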

Problem 5

Symptom

Setuptools reports an unsupported argument:

unknown distribution option: "long_description_content_type"

Cause

Outdated version of setuptools.

Solution

sudo python3 -m pip install --upgrade setuptools
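
A quick check of the installed version after the upgrade; long_description_content_type is recognized by reasonably recent setuptools releases (roughly 38.6 and later):

python3 -c "import setuptools; print(setuptools.__version__)"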

Problem 6

Symptom

After uninstalling and reinstalling, the cluster cannot be reinitialized.

Cause

Leftover configuration from the previous installation was not cleaned up, so the installer cannot reinitialize the cluster.

Solution

  1. Remove the ~/.matrixdb.env file for the mxadmin user.
  2. Delete /etc/matrixdb/cluster.conf.
  3. Restart the supervisor service:
    • systemctl restart matrixdb.supervisor.service
  4. Refresh the installer page: http://<master host IP>:8240/installer (steps 1 to 3 are consolidated in the sketch below).
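
A consolidated sketch of steps 1 to 3 as shell commands, assuming mxadmin's home directory is /home/mxadmin:

sudo rm -f /home/mxadmin/.matrixdb.env                # step 1: remove the env file
sudo rm -f /etc/matrixdb/cluster.conf                 # step 2: remove the cluster config
sudo systemctl restart matrixdb.supervisor.service    # step 3: restart the supervisor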

Problem 7

Symptom

The SSH service on the cluster hosts does not listen on the default port 22.

Solution

Add host, port, and user entries to the ~/.ssh/config file:

Host mdw
   Hostname mdw
   Port 29022
   User mxadmin
Host sdw1
   Hostname sdw1
   Port 29022
   User mxadmin
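
With these entries in place, a simple connectivity check for the installation user, for example:

ssh mdw hostname      # should connect on port 29022 without specifying it
ssh sdw1 hostname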

2 Network

Problem 1

Symptom

JDBC stress test with 50 concurrent connections using Alibaba's Druid connection pool. Response time remains stable for the first 2.5 minutes, then increases. Master logs show:

ERROR "failed to acquire resources on one or more segments", "could not connect to server: Connection timed out"

No error or panic logs on Segments.

Cause

Distributed databases generate large volumes of TCP/UDP traffic. Each transmission uses a different port and is treated by the OS as a separate connection (conntrack entry). The kernel parameter nf_conntrack_max limits the total number of connections tracked concurrently. In this setup, multiple VMs run on a single physical host using NAT networking. Under high concurrency the number of tracked connections spikes rapidly and exceeds nf_conntrack_max; once the table is full, the kernel drops packets, which surfaces as connection timeouts on the master.

Solution

Adjust kernel parameters:

sudo sysctl net.netfilter.nf_conntrack_buckets=262144
sudo sysctl net.netfilter.nf_conntrack_max=1048576
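
These sysctl commands take effect immediately but do not survive a reboot. A sketch of persisting them and comparing current usage against the limit:

echo "net.netfilter.nf_conntrack_buckets = 262144" | sudo tee -a /etc/sysctl.conf
echo "net.netfilter.nf_conntrack_max = 1048576"    | sudo tee -a /etc/sysctl.conf
sudo sysctl -p                                      # reload the settings
# How many connections are currently being tracked versus the ceiling:
sysctl net.netfilter.nf_conntrack_count net.netfilter.nf_conntrack_max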

Problem 2

Symptom

High overhead observed in motion operations during queries.

Cause

  1. The cloud environment handles UDP traffic inefficiently, increasing UDP latency and packet loss and thus prolonging data transfer.
  2. The log level is set to debug5, producing excessive logging that interferes with UDP transmission efficiency.

Solution

  1. Switch the interconnect to TCP.
  2. Enable IC proxy.
  3. Lower the log level from debug5 (a gpconfig sketch follows this list).
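
A sketch of the corresponding settings using gpconfig; interconnect changes typically require a cluster restart, and proxy mode additionally needs gp_interconnect_proxy_addresses configured for every segment (omitted here):

gpconfig -c gp_interconnect_type -v tcp        # 1. switch the interconnect from UDP to TCP
# gpconfig -c gp_interconnect_type -v proxy    # 2. alternatively, enable IC proxy mode
gpconfig -c log_min_messages -v warning        # 3. lower the log level from debug5
gpstop -ar                                     # restart the cluster to apply the changes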

3 Query

Problem 1

Symptom

When using a UI client to access a remote database, long-running queries or queries issued after a prolonged idle period sometimes return:

server closed the connection unexpectedly

Cause

Client-side timeout settings may cancel queries or close idle connections.

Solution

Modify client timeout settings to disable query or idle timeouts.

Problem 2

Symptom

For partitioned tables, a UNION ALL query with simple filter conditions performs slower than an equivalent IN query.

Cause

In the IN query, partition pruning results in scanning only the default partition. However, in the UNION ALL query, each subquery independently prunes to the default partition, leading to multiple scans of the same partition and significant performance degradation.

Solution

For partitioned tables:

  1. Avoid using a default partition when possible.
  2. Prefer IN clauses over UNION ALL.
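
One way to confirm the pruning behavior described in the Cause above is to compare the two plans; a sketch with a hypothetical partitioned table t, partition column region, and database mydb:

psql -d mydb -c "EXPLAIN SELECT * FROM t WHERE region IN ('a', 'b');"
psql -d mydb -c "EXPLAIN SELECT * FROM t WHERE region = 'a' UNION ALL SELECT * FROM t WHERE region = 'b';"
# In the IN form the default partition is scanned once; in the UNION ALL form it
# appears once per branch of the UNION.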

Problem 3

Symptom

An INSERT into an integer column followed by a SELECT runs fast standalone, but executes slowly when placed inside a PL/pgSQL function.

Cause

Queries inside PL/pgSQL functions run through SPI (Server Programming Interface). The SPI plan shows a two-table join executed as a nested loop with an estimated row count of 1; the tables have never been analyzed, so the planner is working without statistics and picks a poor plan.

Solution

Run ANALYZE on the relevant tables.
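
For example, assuming the two tables joined inside the function are t1 and t2 (hypothetical names) in database mydb:

psql -d mydb -c "ANALYZE t1;"
psql -d mydb -c "ANALYZE t2;"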

Problem 4

Symptom

Two concurrent sessions performing updates on a partitioned table cause mutual locking.

Cause

Distributed deadlock.

Solution

Enable distributed deadlock detection:

gpconfig -c gp_enable_global_deadlock_detector -v on
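
This parameter generally requires a cluster restart to take effect; a sketch of applying and verifying it after running the command above:

gpstop -ar                                        # restart the cluster so the setting is applied
gpconfig -s gp_enable_global_deadlock_detector    # confirm the value is 'on' everywhere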

4 Storage

Problem 1

Symptom

Slow data loading performance.

Cause

gpcheckperf shows disk throughput of only 80 MB/s.

Solution

Use multiple disks to improve I/O throughput, and place WAL and data on separate disks to reduce I/O contention.
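
The 80 MB/s figure comes from gpcheckperf. A sketch of re-running the disk I/O test after reorganizing the disks, assuming a host file named hostfile and example mount points for data and WAL:

# -r d runs the disk I/O test; -D prints per-host results; -d names the directories to test
gpcheckperf -f hostfile -r d -D -d /data/primary -d /data/wal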

Problem 2

Symptom

When mxgate loads 30 tables concurrently, the following error occurs:

failed to acquire resources on one or more segments, fatal out of memory

Cause

PostgreSQL/Greenplum uses a multi-process architecture, so every connection spawns its own backend process. Loading 30 tables concurrently creates a large number of connections, leaving too little memory for each request.

Solution

Adjust /etc/sysctl.conf:

vm.overcommit_memory = 2

Change mxgate prepared=10 to prepared=5.
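
A sketch of applying the kernel setting without a reboot and confirming the value:

echo "vm.overcommit_memory = 2" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p                        # reload so the setting takes effect immediately
sysctl vm.overcommit_memory           # should print vm.overcommit_memory = 2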

5 PXF

Problem 1

Symptom

After deploying PXF, accessing HDFS fails with:

remote component error, Failed connect to localhost:5888; Connection refused (libchurl.c:950)

Solution

  1. The PXF service must be started on the Master node, while the data files must reside on the Segment nodes (a sketch of checking and starting the service follows this list).
  2. Ensure pxf/servers/core-site.xml and hdfs-site.xml match Hadoop configuration files exactly.
  3. Configure user access permissions in pxf/servers/core-site.xml.
  4. The username and group of files on Hadoop must match those specified in pxf/core-site.xml.
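
If the PXF service itself is not running on a host, the connection-refused error above appears regardless of configuration. A sketch using the standard pxf command-line utility (subcommand names can vary between PXF versions); configuration edits under pxf/servers also need to be pushed to all hosts:

pxf cluster status    # check where PXF is (not) running
pxf cluster sync      # push pxf/servers configuration changes to all hosts
pxf cluster start     # start PXF where it is stopped (listens on port 5888 by default)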

Problem 2

Symptom

During data ingestion, a field containing an embedded newline splits a single row into two, so the field count no longer matches when the line is parsed by delimiter. In other words, the record contains two \n characters, one inside a field and one at the end of the record, and the embedded \n should not be treated as a row delimiter.

Solution

  1. Add escape 'off' in the options.
  2. Use FORMAT 'text:multi'.

6 Monitoring

Problem 1

Symptom

Grafana monitoring needs to be installed offline, without internet access.

Solution

Download the Grafana offline package (the RPM, the repository files, and the create_repo.sh script), create a local yum repository, and install from it:

# ls
create_repo.sh  grafana-7.3.6-1.x86_64.rpm  grafana_repo

# sh create_repo.sh
Create ymatrix-grafana repo successfully!

# yum install --disablerepo=* --enablerepo=ymatrix_grafana grafana-7.3.6-1.x86_64.rpm