This document describes common problems in cluster deployment.
error: could not access directory "/data/mxdata_20221104084534/master/mxseg-1": Permission denied
MXUI initialization log:
"error": "execute: do execute: run: initialize_database: 7 errors occurred: *
error execute \"/usr/local/matrixdb-4.5.0.community/bin/initdb\"\n\n STDOUT:
The files belonging to this database system will be owned by user \"mxadmin\".
This user must also own the server process.
The database cluster will be initialized with locale \"en_US.utf8\".\n The default text search configuration will be set to \"english\".
Data page checksums are enabled.
STDERR:
initdb: error: could not access directory \"/data/mxdata_20221104084534/master/mxseg-1\": Permission denied\n * error execute \"/usr/local/matrixdb-4.5.0.community/bin/initdb\"
STDOUT:
The files belonging to this database system will be owned by user \"mxadmin\".
This user must also own the server process.\n\n The database cluster will be initialized with locale \"en_US.utf8\".
The default text search configuration will be set to \"english\".
Data page checksums are enabled.
Problem Analysis
Only the owner of the data directory has rwx permissions, and the group and other users do not have access rights.
[root@mdw ~]# ll /
total 36
lrwxrwxrwx. 1 root root 7 Jun 1 19:38 bin -> usr/bin
dr-xr-xr-x. 5 root root 4096 Oct 26 18:28 boot
drwxr-xr-x 20 root root 3200 Oct 26 14:45 dev
drwxr-xr-x. 80 root root 8192 Oct 28 13:53 etc
drwxr-xr-x. 5 root root 8192 Oct 26 18:17 export
drwxr-xr-x. 5 root root 105 Oct 26 18:28 home
drwx------. 5 root root 105 Oct 26 18:28 data
Solution
Just modify the data directory permissions.
sudo chmod 755 /data
yum reports "cpio read error" when installing the matrixdb package
Problem Analysis
The user environment is Windows running a VMware (vm15) virtual machine. The installation package was downloaded on Windows and then dragged into the virtual machine, which truncated the file.
Solution
Use the VM shared folder mechanism to transfer the file instead.
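To confirm the package arrived intact, you can compare checksums on both sides; a minimal sketch (the exact package file name depends on your download, and on Windows certutil -hashfile can produce the matching hash):
md5sum matrixdb-*.rpm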
could not connect to server: No route to host
An error occurred during initialization:
could not connect to server: No route to host
Is the server running on host "192.168.88.203" and accepting
TCP/IP connections on port 40000?
(seg0 192.168.88.203:40000)
Problem Analysis
iptables had been stopped on the 203 machine, but the service was never disabled. After the machine was rebooted, the firewall started again. Since the required ports are not open by default, the hosts could not communicate during initialization; the symptom is that initialization hangs and never completes.
Solution
Clear the firewall rules on the 203 machine, stop the iptables service, and disable it so that the firewall does not come back up after a reboot; see the commands below.
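A minimal sketch of the commands, assuming the host runs the systemd-managed iptables service:
# iptables -F
# systemctl stop iptables
# systemctl disable iptables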
unknown distribution option:"long_description_content_type"
Problem Analysis
The setuptools version is relatively old.
Solution
sudo python3 -m pip install --upgrade setuptools
Solution
Add the host name, port number, and user configuration to the .ssh/config file:
Host mdw
Hostname mdw
Port 29022
User mxadmin
Host sdw1
Hostname sdw1
Port 29022
User mxadmin
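After saving the configuration, you can verify that SSH picks up the custom port and user, for example (assuming passwordless SSH is already set up):
[mxadmin@mdw ~]$ ssh sdw1 hostname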
Problem Analysis
The same entry appears more than once in /etc/hosts, for example:
<IP address 1> <hostname 1>
<IP address 1> <hostname 1>
<IP address 2> <hostname 2>
<IP address 2> <hostname 2>
Solution
Delete the extra entries in /etc/hosts:
<IP address 1> <hostname 1>
<IP address 2> <hostname 2>
Initialization completes normally after the change.
failed to connect to host=mdw user=mxadmin database=postgres: dial error (dial tcp 192.168.247.132:5432: connect: connection refused)
The following error occurred in the graphical interface:
failed to connect to host=mdw user=mxadmin database=postgres: dial error (dial tcp 192.168.247.132:5432: connect: connection refused)
Problem Analysis
You probably have installed YMatrix once using your browser. For some reason, the previous YMatrix environment has been cleaned up. If this graphic interface is loaded again, the /datastream
path will be added to the URL address by default.
For example: http://192.168.247.132:8240/datastream
Solution
Change the datastream keyword in the URL to installer.
For example: http://192.168.247.132:8240/installer
Then use the graphical interface again to proceed with the installation.
collect: do collect: unmarshal remote: json: cannot unmarshal string into Go struct field Disk.hardware.disk.ineligibleDesc of type mxi18n.Message
When installing and deploying a YMatrix cluster through the graphical interface, adding a host fails with the following error:
Host addition failed collect: do collect: unmarshal remote: json: cannot unmarshal string into Go struct field Disk.hardware.disk.ineligibleDesc of type mxi18n.Message
Problem Analysis
The YMatrix versions installed on the server nodes are inconsistent.
Checking Method
Check the YMatrix version of each server node in turn.
Check the master node mdw YMatrix version.
[root@mdw matrixdb]# ll /usr/local/matrixdb
lrwxrwxrwx 1 root root 25 Dec 9 18:02 /usr/local/matrixdb -> matrixdb-4.7.5.enterprise
Check the data node sdw1 YMatrix version.
[root@sdw1 ~]$ ll /usr/local/matrixdb
lrwxrwxrwx 1 root root 25 Dec 22 17:24 /usr/local/matrixdb -> matrixdb-4.6.2.enterprise
Check the data node sdw2 YMatrix version.
[root@sdw2 ~]# ll /usr/local/matrixdb
lrwxrwxrwx 1 root root 25 Dec 9 18:02 /usr/local/matrixdb -> matrixdb-4.7.5.enterprise
Check the data node sdw3 YMatrix version.
[root@sdw3 ~]# ll /usr/local/matrixdb
lrwxrwxrwx 1 root root 25 Dec 9 18:02 /usr/local/matrixdb -> matrixdb-4.7.5.enterprise
Check results
The database version of the sdw1 node is 4.6.2, and the database version of the other nodes is 4.7.5.
Solution
Upgrade the sdw1 node to the same version as the other nodes. The commands are as follows:
Stop Supervisor service.
[root@sdw1 ~]$ systemctl stop matrixdb.supervisor.service
Uninstall the old version of YMatrix software.
[root@sdw1 ~]$ yum -y remove matrixdb
Install the new version of YMatrix software.
[root@sdw1 ~]$ yum -y install /home/mxadmin/matrixdb-4.7.5.enterprise-1.el7.x86_64.rpm
Start the Supervisor service.
[root@sdw1 ~]$ systemctl start matrixdb.supervisor.service
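After reinstalling, you can confirm that the symlink on sdw1 now points to the same version as the other nodes:
[root@sdw1 ~]# ll /usr/local/matrixdb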
Error message
20221223:09:55:10:001626 mxstart:mdw:mxadmin-[CRITICAL]:-Error occurred: non-zero rc: 1
Command was: 'env GPSESSID=0000000000 GPERA=None $GPHOME/bin/pg_ctl -D /mxdata_20221221165810/master/mxseg-1 -l /mxdata_20221221165810/master/mxseg-1/log/startup.log -w -t 600 -o " -p 5432 -c gp_role=utility " start'
rc=1, stdout='waiting for server to start.... stopped waiting
', stderr='pg_ctl: could not start server
Examine the log output.
'
Problem Analysis
View the log file.
[mxadmin@mdw ~]$ cd /mxdata_20221221165810/master/mxseg-1/log
[mxadmin@mdw log]$ vi startup.log
"FATAL","42501","could not create lock file ""/tmp/.s.PGSQL.5432.lock"": Permission denied",,,,,,,,"CreateLockFile","miscinit.c",994,1 0xd44e33 postgres errstart (elog.c:498)
Check the permissions of the /tmp path. The /tmp path must have 777 permissions, so change it back.
[mxadmin@mdw ~]$ ll / | grep tmp
drw-r-xr-x. 7 root root 8192 12月 23 10:00 tmp
Solution
As the root user, change the permissions of the /tmp path to 777.
[mxadmin@mdw ~]$ exit
[root@mdw ~]# chmod 777 /tmp
Restart the cluster.
[root@mdw ~]# su - mxadmin
[mxadmin@mdw ~]$ mxstart -a
Problem Analysis
View the graphical deployment log files.
[mxadmin@mdw ~]$ cd /var/log/matrixdb/
[mxadmin@mdw matrixdb]$ vi mxui.log
[20221223:10:08:43][INFO] id=1; start: system_setup
[20221223:10:08:43][INFO] id=1; done
[20221223:10:08:43][INFO] id=2; start: create_user_and_directories
[20221223:10:08:43][INFO] id=2; done
[20221223:10:08:43][INFO] id=3; start: initialize_database
[20221223:10:08:44][INFO] id=3; running: 6%
[20221223:10:08:44][INFO] id=3; running: 6%
[20221223:10:08:44][INFO] id=3; running: 6%
[20221223:10:08:44][INFO] id=3; running: 6%
[20221223:10:08:44][INFO] id=3; running: 6%
[20221223:10:08:45][INFO] id=3; running: 6%
[20221223:10:08:45][INFO] id=3; running: 6%
[20221223:10:08:45][INFO] id=3; done
[20221223:10:08:45][INFO] id=4; start: launch_matrixdb
[20221223:10:08:45][ERROR] id=4; failed: launch_matrixdb: error execute "/usr/local/matrixdb-4.7.5.enterprise/bin/pg_ctl -w -l /mxdata_20221223100549/master/mxseg-1/log/startup.log -D /mxdata_20221223100549/master/mxseg-1 -o -i -p 5432 -c gp_role=utility -m start"
STDOUT:
waiting for server to start.... stopped waiting
STDERR:
pg_ctl: could not start server
Examine the log output.
[20221223:10:08:45][INFO] id=4; revert start: launch_matrixdb
[20221223:10:08:45][INFO] id=4; revert done
[20221223:10:08:45][INFO] id=3; revert start: initialize_database
[20221223:10:08:45][INFO] id=3; revert done
[20221223:10:08:45][INFO] id=2; revert start: create_user_and_directories
[20221223:10:08:45][INFO] id=2; revert done
[20221223:10:08:45][INFO] id=1; revert start: system_setup
[20221223:10:08:45][INFO] id=1; revert done
{
"error": "execute: do execute: run: launch_matrixdb: error execute \"/usr/local/matrixdb-4.7.5.enterprise/bin/pg_ctl -w -l /mxdata_20221223100549/master/mxseg-1/log/startup.log -D /mxdata_20221223100549/master/mxseg-1 -o -i -p 5432 -c gp_role=utility -m start\"\n\nSTDOUT:\n waiting for server to start.... stopped waiting\n\nSTDERR:\n pg_ctl: could not start server\nExamine the log output.\n"
}
execute: do execute: run: launch_matrixdb: error execute "/usr/local/matrixdb-4.7.5.enterprise/bin/pg_ctl -w -l /mxdata_20221223100549/master/mxseg-1/log/startup.log -D /mxdata_20221223100549/master/mxseg-1 -o -i -p 5432 -c gp_role=utility -m start"
STDOUT:
waiting for server to start.... stopped waiting
STDERR:
pg_ctl: could not start server
Examine the log output.
[GIN] 2022/12/23 - 10:08:45 | 200 | 148.13µs | 192.168.247.2 | GET "/api/installer/log"
The graphical deployment was running the launch_matrixdb step.
Find the relevant operation in the log and read the surrounding context: the pg_ctl step, which starts the instance, failed. Because the instance could not start, the entire initialization failed and was rolled back.
There are several possible reasons why the instance fails to start; check them one by one against the actual situation. Among them: insufficient /tmp permissions, which prevents the lock file from being created.
Solution
The other possible causes are complicated and depend on the specific scenario, so they are not described in detail here.
For the /tmp permission issue: as the root user, change the permissions of the /tmp path to 777.
[root@mdw ~]# chmod 777 /tmp
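Note that on most Linux systems /tmp also carries the sticky bit; to restore the conventional mode, the equivalent command is:
[root@mdw ~]# chmod 1777 /tmp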
Error message
[root@sdw4 yum.repos.d]# yum -y install /home/mxadmin/matrixdb5-5.0.0.enterprise-1.el7.x86_64.rpm
Loaded plugins: fastestmirror
Examining /home/mxadmin/matrixdb5-5.0.0.enterprise-1.el7.x86_64.rpm: matrixdb5-5.0.0.enterprise-1.el7.x86_64
/home/mxadmin/matrixdb5-5.0.0.enterprise-1.el7.x86_64.rpm will be installed
Resolving Dependencies
--> Checking transactions
---> package matrixdb.x86_64.0.5.0.enterprise-1.el7 will be installed
--> Dependency sysstat is being processed, it is required by package matrixdb5-5.0.0.enterprise-1.el7.x86_64
Loading mirror speeds from cached hostfile
--> Resolving dependency completion
Error: Package: matrixdb5-5.0.0.enterprise-1.el7.x86_64 (/matrixdb5-5.0.0.enterprise-1.el7.x86_64)
Requires: sysstat
You could try using --skip-broken to work around the problem
You could try running: rpm -Va --nofiles --nodigest
Problem Analysis
The sysstat package is missing.
Solution
Configure the yum repository, install the sysstat tool with yum -y install sysstat, and then install the YMatrix package.
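For example, on the failing node, using the package path from the error message above:
[root@sdw4 ~]# yum -y install sysstat
[root@sdw4 ~]# yum -y install /home/mxadmin/matrixdb5-5.0.0.enterprise-1.el7.x86_64.rpm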
Problem Analysis
After the YMatrix deployment completed, the Supervisor process failed to start properly. Checking /var/log/messages reveals the following exception log:
Dec 22 19:08:59 sdw21 systemd: matrixdb.supervisor.service holdoff time over, scheduling restart.
Dec 22 19:08:59 sdw21 systemd: Stopped MatrixDB Supervisord Daemon.
Dec 22 19:08:59 sdw21 systemd: Started MatrixDB Supervisord Daemon.
Dec 22 19:08:59 sdw21 bash: time="2022-12-22T19:08:59+08:00" level=info msg="load configuration from file" file=/etc/matrixdb/supervisor.conf
Dec 22 19:08:59 sdw21 bash: time="2022-12-22T19:08:59+08:00" level=info msg="load config file over, content "
Dec 22 19:09:09 sdw21 bash: panic: timeout to start gRPC service
Dec 22 19:09:09 sdw21 bash: goroutine 1 [running]:
Dec 22 19:09:09 sdw21 bash: main.runServer()
Dec 22 19:09:09 sdw21 bash: /home/runner/work/matrixdb-ci/matrixdb-ci/cmd/supervisor/main.go:151 +0x4f0
Dec 22 19:09:09 sdw21 bash: main.main()
Dec 22 19:09:09 sdw21 bash: /home/runner/work/matrixdb-ci/matrixdb-ci/cmd/supervisor/main.go:216 +0x185
Dec 22 19:09:09 sdw21 systemd: matrixdb.supervisor.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Dec 22 19:09:09 sdw21 systemd: Unit matrixdb.supervisor.service entered failed state.
Dec 22 19:09:09 sdw21 systemd: matrixdb.supervisor.service failed.
Dec 22 19:09:14 sdw21 systemd: matrixdb.supervisor.service holdoff time over, scheduling restart.
Dec 22 19:09:14 sdw21 systemd: Stopped MatrixDB Supervisord Daemon.
Dec 22 19:09:14 sdw21 systemd: Started MatrixDB Supervisord Daemon.
Investigation found that some kernel parameter values in the /etc/sysctl.conf file were set too large, which prevented the Supervisor from running properly. Change them back to normal values, as follows.
Solution
Modify /etc/sysctl.conf so that the Supervisor can start normally:
################## value too large, supervisor startup fail
#net.core.rmem_default = 1800262144
#net.core.wmem_default = 1800262144
#net.core.rmem_max = 2000777216
#net.core.wmem_max = 2000777216
############
net.core.rmem_default = 262144
net.core.wmem_default = 262144
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
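After editing /etc/sysctl.conf, reload the kernel parameters and restart the Supervisor service, for example:
# sysctl -p
# systemctl restart matrixdb.supervisor.service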
You can also use journalctl --no-pager to capture detailed information, including the crash stack.
Error message
Collection of information failed collect: do collect: hardware: GetDisk: createTempUser: error execute useradd: exit status 1 please create "mxadmin" user manually to workaround this issue useradd: Unable to open /etc/passwd
Problem Analysis
Analysis of the error message shows that a permission problem on the /etc/passwd file prevented the operating system user mxadmin from being created.
Solution
Create a test user manually and check the error message:
[root@sdw4 ~]# useradd test1
useradd: cannot open /etc/passwd
Check the /etc/passwd permissions:
[root@sdw4 ~]# ll /etc/passwd
-rw-r--r-- 1 root root 898 Dec 24 01:48 /etc/passwd
The result shows the permissions are 644, which is normal.
Check whether /etc/passwd has special attributes set:
[root@sdw4 ~]# lsattr /etc/passwd
----i--------- /etc/passwd
The result shows that the /etc/passwd file has the immutable attribute "i" set (no user, including root, may modify or delete the file).
Repeat the above steps to check the /etc/group file:
[root@sdw4 ~]# ll /etc/group
-rw-r--r-- 1 root root 460 Dec 24 01:48 /etc/group
[root@sdw4 ~]# lsattr /etc/group
----i---------- /etc/group
Remove the special attribute from the two files /etc/passwd and /etc/group:
[root@sdw4 ~]# chattr -i /etc/passwd
[root@sdw4 ~]# chattr -i /etc/group
Try to create a test user manually again
[root@sdw4 ~]# useradd test1
The user can now be created normally.
Delete the test user
[root@sdw4 ~]# userdel -r test1
Continue installing and deploying YMatrix using the graphical interface.
When initializing with MXUI, the last step fails with the following error:
[20221101:14:59:02][ERROR] id=3; failed: initialize_database: error execute "/opt/ymatrix/matrixdb-5.0.0+enterprise/bin/initdb"
STDOUT:
The files belonging to this database system will be owned by user "mxadmin".
This user must also own the server process.
The database cluster will be initialized with locale "en_US.utf8".
The default text search configuration will be set to "english".
Data page checksums are enabled.
fixing permissions on existing directory /data/mxdata_20221101121858/primary/mxseg0 ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ...
STDERR:
initdb: error: initdb: error 256 from: "/opt/ymatrix/matrixdb-5.0.0+enterprise/bin/postgres" --boot -x0 -F -c max_connections=1500 -c shared_buffers=262144 -c dynamic_shared_memory_type=posix < "/dev/null" > "/dev/null" 2>&1
initdb: removing contents of data directory "/data/mxdata_20221101121858/primary/mxseg0"
{
"error": "execute: do execute: run: initialize_database: error execute \"/opt/ymatrix/matrixdb-5.0.0+enterprise/bin/initdb\"\n\nSTDOUT:\n The files belonging to this database system will be owned by user \"mxadmin\".\nThis user must also own the server process.\n\nThe database cluster will be initialized with locale \"en_US.utf8\".\nThe default text search configuration will be set to \"english\".\n\nData page checksums are enabled.\n\nfixing permissions on existing directory /data/mxdata_20221101121858/primary/mxseg0 ... ok\ncreating subdirectories ... ok\nselecting dynamic shared memory implementation ... posix\nselecting default max_connections ... \nSTDERR:\n initdb: error: initdb: error 256 from: \"/opt/ymatrix/matrixdb-5.0.0+enterprise/bin/postgres\" --boot -x0 -F -c max_connections=1500 -c shared_buffers=262144 -c dynamic_shared_memory_type=posix \u003c \"/dev/null\" \u003e \"/dev/null\" 2\u003e\u00261\ninitdb: removing contents of data directory \"/data/mxdata_20221101121858/primary/mxseg0\"\n"
}
Problem Analysis
Solution
Check the /etc/hosts configuration.
Use the free -g command to check the memory size.
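For example, to view total and available memory in gigabytes on the Master:
[root@mdw ~]# free -g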
The graphical client MXUI provides its service externally at http://<IP>:8240. If you need it to be directly accessible through a domain name such as http://hostname, you can configure a reverse proxy with Nginx.
For example, to expose mxui.ymatrix.cn as the external access address, use the following Nginx configuration.
server
{
listen 80;
server_name mxui.ymatrix.cn; # external domain name
# WebSocket forwarding rules
location /ws {
proxy_pass http://127.0.0.1:8240/ws;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "Upgrade";
proxy_set_header Host $host;
}
# Web forwarding rules
location / {
proxy_pass http://127.0.0.1:8240;
}
}
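After saving the configuration, validate and reload Nginx, for example (assuming Nginx is managed by systemd):
# nginx -t
# systemctl reload nginx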
Notes!
MXUI communicates over the WebSocket API, so a separate forwarding rule must be configured for it to work properly.
Interconnect error writing an outgoing packet: operation not allowed
The system log /var/log/messages contains the following output:
Mar 10 06:26:15 sdw37 kernel: nf_conntrack: table full, dropping packet
Mar 10 06:26:15 sdw37 kernel: nf_conntrack: table full, dropping packet
Mar 10 06:26:15 sdw37 kernel: nf_conntrack: table full, dropping packet
Mar 10 06:26:15 sdw37 kernel: nf_conntrack: table full, dropping packet
Mar 10 06:26:15 sdw37 kernel: nf_conntrack: table full, dropping packet
Mar 10 06:26:15 sdw37 kernel: nf_conntrack: table full, dropping packet
Mar 10 06:26:15 sdw37 kernel: nf_conntrack: table full, dropping packet
Mar 10 06:26:15 sdw37 kernel: nf_conntrack: table full, dropping packet
Mar 10 06:26:15 sdw37 kernel: nf_conntrack: table full, dropping packet
Mar 10 06:26:20 sdw37 kernel: nf_conntrack: table full, dropping packet
Problem Analysis
The kernel parameter net.netfilter.nf_conntrack_max defaults to a maximum of 65536 tracked connections. This error appears when the number of connections is very large. Use the following command to view the currently configured maximum:
cat /proc/sys/net/netfilter/nf_conntrack_max
Solution
Increase the parameter net.netfilter.nf_conntrack_max:
sysctl -w net.netfilter.nf_conntrack_max=655360
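To make the setting persist across reboots, you can also add it to /etc/sysctl.conf and reload, for example:
# echo "net.netfilter.nf_conntrack_max = 655360" >> /etc/sysctl.conf
# sysctl -p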
Please refer to the following steps to deploy.
Notes!
Replace the graphical-interface database deployment section of the standard cluster deployment document with the following commands. These commands must be executed on the Master as the root user.
Switch to root user.
$ sudo su
# echo "" | /opt/ymatrix/matrixdb5/bin/mxctl setup collect > /tmp/collect.1
Then, set the number of instances on each host.
# export MXDEBUG_PRIMARY_PER_HOST=1
Collect information from each host. Hosts must be collected one by one; the Master has already been collected and does not need to be collected again. The last command runs the network check.
# cat /tmp/collect.1 | /opt/ymatrix/matrixdb5/bin/mxctl setup collect --host sxd1 > /tmp/collect.2
# cat /tmp/collect.2 | /opt/ymatrix/matrixdb5/bin/mxctl setup collect --host sxd2 > /tmp/collect.3
# cat /tmp/collect.3 | /opt/ymatrix/matrixdb5/bin/mxctl setup netcheck > /tmp/collect.3c
a. Set sxd2 only to Standby (sxd2 only serves the Standby role and does not store business data).
# tac /tmp/collect.3c | sed '0,/"isSegment":\ true/{s/"isSegment":\ true/"isSegment":\ false/}' | tac > /tmp/collect.3d
# tac /tmp/collect.3d | sed '0,/"isStandby":\ false/{s/"isStandby":\ false/"isStandby":\ true/}' | tac > /tmp/collect.3e
b. Add Standby node instance on sxd2 (sxd2 plays two roles: Segment + Standby).
# tac /tmp/collect.3c | sed '0,/"isStandby":\ false/{s/"isStandby":\ false/"isStandby":\ true/}' | tac > /tmp/collect.3e
In the example we choose to set sxd2 to Segment + Standby.
Adjust disk planning (optional)
# cat /tmp/collect.3e |grep -i disklist
"diskList": []
"diskList": []
"diskList": []
You can modify diskList by editing the collect.3e file generated in the previous step (for example: "diskList": ["/data1", "/data2", "/data3"]).
In this file, diskList defaults to an empty array. If you keep the default, data is automatically stored on the disk with the largest free space.
Enable Mirror
Set up the mirror nodes in the collect.3e file.
# cat /tmp/collect.3e
Modify the relevant content of Mirror settings.
{
"versionID": "iAioSDT2QJnXAXRzBfYsx6",
"genDateTime": "2023-04-12 07:26:49",
"strategy": {
"hostRole": {
"isMaster": true,
"isSegment": false,
"isStandby": false
},
"highAvailability": {
"useMirror": "auto", // auto 是默认值,可用 yes/no 强制启用。如果 Segment 主机有 2 个或更多,YMatrix 会自动启用 Mirror 机制,如果只有 1 个则不启用
"mirrorStrategy": "ring",
"haProvisioner": ""
}
Generate the final deployment plan
# cat /tmp/collect.3e | /opt/ymatrix/matrixdb5/bin/mxbox deployer plan | tee /tmp/p4
Cluster deployment
# cat /tmp/p4 | /opt/ymatrix/matrixdb5/bin/mxbox deployer setup --debug
Error message
collect: do collect: remote host 172.16.100.144: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.16.100.144:4617: connect: connection refused"
Problem phenomenon
Installing and deploying the cluster keeps failing with a connection error, even though the network itself is fine. Checking the supervisor service shows that it periodically flaps between running and failed.
[root@ljb-sdw2 software]# systemctl status matrixdb5.supervisor.service
● matrixdb5.supervisor.service - MatrixDB 5 Supervisord Daemon
Loaded: loaded (/usr/lib/systemd/system/matrixdb5.supervisor.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2023-06-05 17:46:20 CST; 2s ago
Main PID: 14587 (supervisord)
Tasks: 9
CGroup: /system.slice/matrixdb5.supervisor.service
└─14587 /opt/ymatrix/matrixdb5/bin/supervisord -c /etc/matrixdb5/supervisor.conf
Jun 05 17:46:20 ljb-sdw2 systemd[1]: Started MatrixDB 5 Supervisord Daemon.
Jun 05 17:46:20 ljb-sdw2 bash[14587]: time="2023-06-05T17:46:20+08:00" level=info msg="load configuration from file" file=/etc/matrixdb5/supervisor.conf
Jun 05 17:46:20 ljb-sdw2 bash[14587]: time="2023-06-05T17:46:20+08:00" level=info msg="load config file over, content "
[root@ljb-sdw2 software]# systemctl status matrixdb5.supervisor.service
● matrixdb5.supervisor.service - MatrixDB 5 Supervisord Daemon
Loaded: loaded (/usr/lib/systemd/system/matrixdb5.supervisor.service; enabled; vendor preset: enabled)
Active: activating (auto-restart) (Result: exit-code) since Mon 2023-06-05 17:46:23 CST; 4s ago
Process: 14587 ExecStart=/bin/bash -c PATH="$MXHOME/bin:$PATH" exec "$MXHOME"/bin/supervisord -c "$MX_SUPERVISOR_CONF" (code=exited, status=2)
Main PID: 14587 (code=exited, status=2)
Jun 05 17:46:23 ljb-sdw2 systemd[1]: Unit matrixdb5.supervisor.service entered failed state.
Jun 05 17:46:23 ljb-sdw2 systemd[1]: matrixdb5.supervisor.service failed.
Checking the cluster log directory shows that no log file has been generated.
[root@ljb-sdw2 software]# cd /var/log/matrixdb5/
[root@ljb-sdw2 matrixdb5]# ls -l
total 0
View the supervisor log in real time.
[root@ljb-sdw2 software]# journalctl -u matrixdb5.supervisor.service -f
-- Logs began at Sun 2023-05-07 21:33:02 CST. --
Jun 05 17:52:31 ljb-sdw2 bash[15171]: time="2023-06-05T17:52:31+08:00" level=info msg="load configuration from file" file=/etc/matrixdb5/supervisor.conf
Jun 05 17:52:31 ljb-sdw2 bash[15171]: time="2023-06-05T17:52:31+08:00" level=info msg="load config file over, content "
Jun 05 17:52:34 ljb-sdw2 bash[15171]: panic: timeout to start gRPC service
Jun 05 17:52:34 ljb-sdw2 bash[15171]: goroutine 1 [running]:
Jun 05 17:52:34 ljb-sdw2 bash[15171]: main.runServer()
Jun 05 17:52:34 ljb-sdw2 bash[15171]: /home/runner/work/matrixdb-ci/matrixdb-ci/cmd/supervisor/main.go:154 +0x4f0
Jun 05 17:52:34 ljb-sdw2 bash[15171]: main.main()
Jun 05 17:52:34 ljb-sdw2 bash[15171]: /home/runner/work/matrixdb-ci/matrixdb-ci/cmd/supervisor/main.go:219 +0x185
Jun 05 17:52:34 ljb-sdw2 systemd[1]: matrixdb5.supervisor.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Jun 05 17:52:34 ljb-sdw2 systemd[1]: Unit matrixdb5.supervisor.service entered failed state.
Jun 05 17:52:34 ljb-sdw2 systemd[1]: matrixdb5.supervisor.service failed.
Jun 05 17:52:39 ljb-sdw2 systemd[1]: matrixdb5.supervisor.service holdoff time over, scheduling restart.
Jun 05 17:52:39 ljb-sdw2 systemd[1]: Stopped MatrixDB 5 Supervisord Daemon.
Jun 05 17:52:39 ljb-sdw2 systemd[1]: Started MatrixDB 5 Supervisord Daemon.
Jun 05 17:52:40 ljb-sdw2 bash[15186]: time="2023-06-05T17:52:40+08:00" level=info msg="load configuration from file" file=/etc/matrixdb5/supervisor.conf
Jun 05 17:52:40 ljb-sdw2 bash[15186]: time="2023-06-05T17:52:40+08:00" level=info msg="load config file over, content "
Jun 05 17:52:43 ljb-sdw2 bash[15186]: panic: timeout to start gRPC service
Jun 05 17:52:43 ljb-sdw2 bash[15186]: goroutine 1 [running]:
Jun 05 17:52:43 ljb-sdw2 bash[15186]: main.runServer()
Jun 05 17:52:43 ljb-sdw2 bash[15186]: /home/runner/work/matrixdb-ci/matrixdb-ci/cmd/supervisor/main.go:154 +0x4f0
Jun 05 17:52:43 ljb-sdw2 bash[15186]: main.main()
Jun 05 17:52:43 ljb-sdw2 bash[15186]: /home/runner/work/matrixdb-ci/matrixdb-ci/cmd/supervisor/main.go:219 +0x185
Jun 05 17:52:43 ljb-sdw2 systemd[1]: matrixdb5.supervisor.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Jun 05 17:52:43 ljb-sdw2 systemd[1]: Unit matrixdb5.supervisor.service entered failed state.
Jun 05 17:52:43 ljb-sdw2 systemd[1]: matrixdb5.supervisor.service failed.
Solution
Edit the /etc/hosts file and add the localhost configuration:
# vi /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
Then restart the supervisor service:
# sudo systemctl restart matrixdb5.supervisor.service
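After restarting, the service should remain in the running state; you can confirm with:
# systemctl status matrixdb5.supervisor.service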
LOG: gp_role forced to 'utility' in single-user mode Y.sh: line 1: 11865 Illegal instruction error
Error message
"LogCheckpointEnd","xlog.c",8916, LOG: gp_role forced to 'utility' in single-user mode Y.sh: line 1:
11865 Illegal instruction (core dumped) "/opt/ymatrix/matrixdb-5.0.0+community/bin/postgres" --single -F -O -j -c
gp_role=utility -c search_path=pg_catalog -c exit_on_error=true template1 > /dev/null child process exited with exit code 132
initdb: data directory "/mxdata_20231018165815/master/mxseg-1" not removed at user's request * rpc error:
Problem Analysis
Newer versions of the database use SIMD instruction sets for vectorized computation. During installation, the CPU instruction set is detected; if the CPU does not support the required instructions, the above error message appears.
Solution
Check whether the CPU supports the required instruction sets:
cat /proc/cpuinfo | grep -E "mmx|sse|sse2|ssse3|sse4_1|sse4_2|avx|avx2"
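If the command prints matching flags (such as avx or avx2), the CPU supports the corresponding instruction sets; if it prints nothing, the CPU lacks the required instructions and deployment on that machine will fail with the error above.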
# All ports must be open between the cluster's internal IPs; only the required service ports are exposed externally.
# Database firewall configuration. In this example, three servers (10.129.38.230, 10.129.38.231, 10.129.38.232) form the database cluster
# All ports are open between all hosts in the cluster, including TCP and UDP protocols
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="10.129.38.230" port protocol="tcp" port="0-65535" accept'
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="10.129.38.231" port protocol="tcp" port="0-65535" accept'
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="10.129.38.232" port protocol="tcp" port="0-65535" accept'
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="10.129.38.230" port protocol="udp" port="0-65535" accept'
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="10.129.38.231" port protocol="udp" port="0-65535" accept'
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="10.129.38.232" port protocol="udp" port="0-65535" accept'
# Allow ping (ICMP has no ports, so use the protocol element instead of a port rule)
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="10.129.38.230" protocol value="icmp" accept'
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="10.129.38.231" protocol value="icmp" accept'
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="10.129.38.232" protocol value="icmp" accept'
# On the Master and Standby Master nodes, expose only the database service port 5432 and the web interface port 8240 to the public zone
firewall-cmd --zone=public --add-port=5432/tcp --permanent
firewall-cmd --zone=public --add-port=8240/tcp --permanent
# Grafana
firewall-cmd --zone=public --add-port=3000/tcp --permanent
# View firewall rules
firewall-cmd --list-all
# Reload the firewall to make the configuration take effect
firewall-cmd --reload
systemctl restart firewalld.service