This document describes common problems with MatrixDB operation and maintenance.
Problem Description
Forgot your password using MatrixDB.
Solution
=# alter user mxadmin with password 'you password';
/etc/matrixdb/auth.conf
.# sudo cat /etc/matrixdb/auth.conf
Problem Description
When using the gpssh
command or using the ssh login command, you will encounter the following error:
ssh_exchange_identification: Connection closed by remote host
Problem Analysis
The number of ssh connections exceeds the limit.
Solution
# vim /etc/ssh/sshd_config
# MaxStartup 10:30:60
Parameter configuration meaning:
MaxStartup
triple form 10:30:60
10: When the number of connections reach 10, the connection is rejected, but not all of them are rejected.
30: When the number of connections reaching 10, there is a probability that the subsequent connections will be rejected.
60: When the number of connections reaching 60, all subsequent connections are rejected.
#
, MaxStartup 30:30:60
.systemctl restart sshd
# or
service sshd restart
log
psql: WARNING: database "testdb" must be vacuumed within 529926866 transactions (seg35 10.162.115.5:6005 pid=448116)
HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
VACUUM
Problem Analysis
The age of the transaction exceeds the limit.
Solution
$ psql -d testdb -c "vacuum freeze;"
Problem Description
When running SQL statements, you will encounter the following error:
could not read block 3684552 of temporary file: read only 0 of 8192 bytes
Problem Analysis
Disk-related problems will occur if the server is insufficient during SQL calculation.
This problem also occurs when disk exceptions are performed during SQL calculations.
Solution
Use the following command to observe the disk usage in real time during SQL calculation.
du -sh
Use the journalctl
command to find error information about the disk.
// View the total save size of journalctl logs
journalctl --disk-usage
// View logs after 20:20 pm on March 30, 2023
journalctl --since "2023-03-30 20:20:00"
// View all logs from 00:00:00 on March 1, 2023 to 00:00:00 on March 31, 2023
journalctl --since "2023-03-01 00:00:00" --until "2023-03-31 00:00:00"
// You can use "yesterday", "today", "tomorrow" or "now" to get the time period of logs
journalctl --since yesterday
// Get the log after 2023-03-25 Save it to the local disk
journalctl --since "2023-03-25 00:00:00" > journalctl.log
// Find logs about disk errors in journalctl.log log
cat journalctl.log|grep "I/O"
OK.
Modify /etc/hosts
Backup saves the /etc/hosts
file.
cp /etc/hosts /tmp/hosts_bak
Modify the Hostname of MatrixDB in the /etc/hosts
file.
Modify MatrixDB metadata information Switch users.
$ su - mxadmin
Close MatrixDB.
$ gpstop -af
Start the MatrixDB Master node.
$ gpstart -m
Go to the MatrixDB command line and run the command.
PGOPTIONS='-c gp_session_role=utility' psql -U mxadmin postgres
set allow_system_table_mods='ture';
update gp_segment_configuration set hostname=<更改之后的主机名>,address=<更改之后的主机名> where address=<原来的主机名>;
Close MatrixDB.
$ gpstop -mf
Start MatrixDB.
$ gpstart -m
Modify MatrixDB monitoring information Deploying Grafana monitoring requires the following operations:
psql -d matrixmgr
matrixmgr=# SELECT mxmgr_remove_all('local');
matrixmgr=# truncate local.matrix_manager_config;
matrixmgr=# SELECT mxmgr_init_local();
Modify the information related to hostname in dashboard.json
.
For example, the original hostname was dw1 and after modification it was sdw1, all dw1 in dashboard.json
needs to be modified to sdw1 and re-imported Dashboard.
Deploy Prometheus monitoring to perform the following operations
matrixmgr=# SELECT mxmgr_remove_exporter();
matrixmgr=# SELECT mxmgr_remove_gate_exporter();
matrixmgr=# truncate local.matrix_manager_config;
matrixmgr=# SELECT mxmgr_init_exporter();
Problem Analysis
The host is absolute, the cluster startup fails after restart, check the database log, and an error is reported:
FATAL","3D000","database ""postgres"" does not exist","The database subdirectory ""base/13376"" is missing
Check the corresponding file directory and find that the 13376
directory is missing.
Problem recurs
[mxadmin@mdw3 base]$ mxstop -af
13376
directory[mxadmin@mdw3 base]$ mv 13376 13376bak
[mxadmin@mdw3 ~]$ mxstart -a
2023-05-24:23:26:50.264 mxstart:mxadmin:mdw3:017121-[INFO]:-Starting MatrixDB cluster with: [no-prompt:true].
2023-05-24:23:26:50.264 mxstart:mxadmin:mdw3:017121-[INFO]:-Connecting to physical cluster's etcd cluster ...
2023-05-24:23:26:50.269 mxstart:mxadmin:mdw3:017121-[INFO]:-Connected to physical cluster's etcd cluster: [http://192.168.8.28:4679].
2023-05-24:23:26:50.269 mxstart:mxadmin:mdw3:017121-[INFO]:-Collecting database cluster ID ...
2023-05-24:23:26:50.269 mxstart:mxadmin:mdw3:017121-[INFO]:-Collected database cluster ID: AuWFhsrjyywC4xfMahgyor
2023-05-24:23:26:50.271 mxstart:mxadmin:mdw3:017121-[INFO]:-Collecting cluster info ...
2023-05-24:23:26:50.272 mxstart:mxadmin:mdw3:017121-[INFO]:----------------------------
2023-05-24:23:26:50.272 mxstart:mxadmin:mdw3:017121-[INFO]:-Master instance parameters
2023-05-24:23:26:50.272 mxstart:mxadmin:mdw3:017121-[INFO]:----------------------------
2023-05-24:23:26:50.272 mxstart:mxadmin:mdw3:017121-[INFO]:-Master Host = mdw3
2023-05-24:23:26:50.272 mxstart:mxadmin:mdw3:017121-[INFO]:-Master Port = 5432
2023-05-24:23:26:50.272 mxstart:mxadmin:mdw3:017121-[INFO]:-Master Directory = /mxdata_20230514185455/master/mxseg-1
2023-05-24:23:26:50.272 mxstart:mxadmin:mdw3:017121-[INFO]:----------------------------------------
2023-05-24:23:26:50.272 mxstart:mxadmin:mdw3:017121-[INFO]:-Segment instances that will be started
2023-05-24:23:26:50.272 mxstart:mxadmin:mdw3:017121-[INFO]:----------------------------------------
2023-05-24:23:26:50.272 mxstart:mxadmin:mdw3:017121-[INFO]:-Host Port DataDir Role
2023-05-24:23:26:50.272 mxstart:mxadmin:mdw3:017121-[INFO]:-mdw3 6000 /mxdata_20230514185455/primary/mxseg0 PRIMARY
2023-05-24:23:26:50.272 mxstart:mxadmin:mdw3:017121-[INFO]:-mdw3 6001 /mxdata_20230514185455/primary/mxseg1 PRIMARY
2023-05-24:23:26:50.272 mxstart:mxadmin:mdw3:017121-[INFO]:-Commencing parallel primary and mirror segment instance startup, please wait ...
2023-05-24:23:26:50.275 mxstart:mxadmin:mdw3:017121-[INFO]:-Send start to cluster service
2023-05-24:23:26:50.603 mxstart:mxadmin:mdw3:017121-[INFO]:-Retry start for [1/10], caused by rpc error: code = Unknown desc = query segments_configuration on master failed failed to connect to `host=mdw3 user=mxadmin database=postgres`: server error (FATAL: database "postgres" does not exist (SQLSTATE 3D000)).
2023-05-24:23:26:51.605 mxstart:mxadmin:mdw3:017121-[INFO]:-Commencing parallel primary and mirror segment instance startup, please wait ...
2023-05-24:23:26:51.608 mxstart:mxadmin:mdw3:017121-[INFO]:-Send start to cluster service
2023-05-24:23:26:51.748 mxstart:mxadmin:mdw3:017121-[INFO]:-Retry start for [2/10], caused by rpc error: code = Unknown desc = query segments_configuration on master failed failed to connect to `host=mdw3 user=mxadmin database=postgres`: server error (FATAL: database "postgres" does not exist (SQLSTATE 3D000)).
2023-05-24:23:26:52.749 mxstart:mxadmin:mdw3:017121-[INFO]:-Commencing parallel primary and mirror segment instance startup, please wait ...
2023-05-24:23:26:52.752 mxstart:mxadmin:mdw3:017121-[INFO]:-Send start to cluster service
2023-05-24:23:26:52.781 mxstart:mxadmin:mdw3:017121-[INFO]:-Retry start for [3/10], caused by rpc error: code = Unknown desc = query segments_configuration on master failed failed to connect to `host=mdw3 user=mxadmin database=postgres`: server error (FATAL: database "postgres" does not exist (SQLSTATE 3D000)).
2023-05-24:23:26:53.782 mxstart:mxadmin:mdw3:017121-[INFO]:-Commencing parallel primary and mirror segment instance startup, please wait ...
2023-05-24:23:26:53.785 mxstart:mxadmin:mdw3:017121-[INFO]:-Send start to cluster service
2023-05-24:23:26:53.906 mxstart:mxadmin:mdw3:017121-[INFO]:-Retry start for [4/10], caused by rpc error: code = Unknown desc = query segments_configuration on master failed failed to connect to `host=mdw3 user=mxadmin database=postgres`: server error (FATAL: database "postgres" does not exist (SQLSTATE 3D000)).
2023-05-24:23:26:54.907 mxstart:mxadmin:mdw3:017121-[INFO]:-Commencing parallel primary and mirror segment instance startup, please wait ...
2023-05-24:23:26:54.912 mxstart:mxadmin:mdw3:017121-[INFO]:-Send start to cluster service
2023-05-24:23:26:54.933 mxstart:mxadmin:mdw3:017121-[INFO]:-Retry start for [5/10], caused by rpc error: code = Unknown desc = query segments_configuration on master failed failed to connect to `host=mdw3 user=mxadmin database=postgres`: server error (FATAL: database "postgres" does not exist (SQLSTATE 3D000)).
2023-05-24:23:26:55.933 mxstart:mxadmin:mdw3:017121-[INFO]:-Commencing parallel primary and mirror segment instance startup, please wait ...
2023-05-24:23:26:55.936 mxstart:mxadmin:mdw3:017121-[INFO]:-Send start to cluster service
2023-05-24:23:26:55.958 mxstart:mxadmin:mdw3:017121-[INFO]:-Retry start for [6/10], caused by rpc error: code = Unknown desc = query segments_configuration on master failed failed to connect to `host=mdw3 user=mxadmin database=postgres`: server error (FATAL: database "postgres" does not exist (SQLSTATE 3D000)).
2023-05-24:23:26:56.958 mxstart:mxadmin:mdw3:017121-[INFO]:-Commencing parallel primary and mirror segment instance startup, please wait ...
2023-05-24:23:26:56.961 mxstart:mxadmin:mdw3:017121-[INFO]:-Send start to cluster service
2023-05-24:23:26:57.085 mxstart:mxadmin:mdw3:017121-[INFO]:-Retry start for [7/10], caused by rpc error: code = Unknown desc = query segments_configuration on master failed failed to connect to `host=mdw3 user=mxadmin database=postgres`: server error (FATAL: database "postgres" does not exist (SQLSTATE 3D000)).
2023-05-24:23:26:58.086 mxstart:mxadmin:mdw3:017121-[INFO]:-Commencing parallel primary and mirror segment instance startup, please wait ...
2023-05-24:23:26:58.089 mxstart:mxadmin:mdw3:017121-[INFO]:-Send start to cluster service
2023-05-24:23:26:58.114 mxstart:mxadmin:mdw3:017121-[INFO]:-Retry start for [8/10], caused by rpc error: code = Unknown desc = query segments_configuration on master failed failed to connect to `host=mdw3 user=mxadmin database=postgres`: server error (FATAL: database "postgres" does not exist (SQLSTATE 3D000)).
2023-05-24:23:26:59.115 mxstart:mxadmin:mdw3:017121-[INFO]:-Commencing parallel primary and mirror segment instance startup, please wait ...
2023-05-24:23:26:59.118 mxstart:mxadmin:mdw3:017121-[INFO]:-Send start to cluster service
2023-05-24:23:26:59.140 mxstart:mxadmin:mdw3:017121-[INFO]:-Retry start for [9/10], caused by rpc error: code = Unknown desc = query segments_configuration on master failed failed to connect to `host=mdw3 user=mxadmin database=postgres`: server error (FATAL: database "postgres" does not exist (SQLSTATE 3D000)).
2023-05-24:23:27:00.140 mxstart:mxadmin:mdw3:017121-[INFO]:-Commencing parallel primary and mirror segment instance startup, please wait ...
2023-05-24:23:27:00.143 mxstart:mxadmin:mdw3:017121-[INFO]:-Send start to cluster service
2023-05-24:23:27:00.299 mxstart:mxadmin:mdw3:017121-[INFO]:-Retry start for [10/10], caused by rpc error: code = Unknown desc = query segments_configuration on master failed failed to connect to `host=mdw3 user=mxadmin database=postgres`: server error (FATAL: database "postgres" does not exist (SQLSTATE 3D000)).
2023-05-24:23:27:00.301 mxstart:mxadmin:mdw3:017121-[ERROR]:-MatrixDB cluster start failed: 'rpc error: code = Unknown desc = query segments_configuration on master failed failed to connect to `host=mdw3 user=mxadmin database=postgres`: server error (FATAL: database "postgres" does not exist (SQLSTATE 3D000))'. Please confirm database cluster has been deployed.
Error: rpc error: code = Unknown desc = query segments_configuration on master failed failed to connect to `host=mdw3 user=mxadmin database=postgres`: server error (FATAL: database "postgres" does not exist (SQLSTATE 3D000))
Solution
postgres
library file of the Master node to be lost. If there is Standby in this cluster, switch Standby to take over the Master and rebuild Standby.Notes!
It is recommended to configure a complete Standby and Mirror mirroring mechanism to ensure the security and availability of the cluster.
Error message
Error: do execute: deploy rpc: rpc error: code = Unknown desc = deployer execute: rpc error: code = Unknown desc = error execute "/opt/ymatrix/matrixdb5/bin/pg_basebackup"
STDERR:
pg_basebackup: error: could not send replication command "CREATE_REPLICATION_SLOT "internal_wal_replication_slot" PHYSICAL RESERVE_WAL": ERROR: replication slot "internal_wal_replication_slot" already exists
Solution
=# SELECT E'PGOPTIONS=\'-c gp_role\=utility\' psql -h ' ||hostname ||' -p ' ||port FROM gp_segment_configuration where content<>-1;
?column?
----------------------------------------------------------------------------------------------------------------------------------
PGOPTIONS='-c gp_role=utility' psql -h sdw3 -p 6000
PGOPTIONS='-c gp_role=utility' psql -h sdw2 -p 6000
PGOPTIONS='-c gp_role=utility' psql -h sdw1 -p 6000
(3 rows)
slot
informationSELECT * FROM pg_replication_slots ;
slot
SELECT pg_drop_replication_slot('internal_wal_replication_slot');
ERROR: interconnect Error: Could not set up tcp listener socket
Error message
**** con19047721,,seg14,,sx1,"FATAL","58M01","interconnect Error: Could not set up tcp listener socket","bind: Address already in use",,,,0,,"ic_tcp.c",293,
**** con19047731,,seg14,,sx1,"FATAL","58M01","interconnect Error: Could not set up tcp listener socket","bind: Address already in use",,,,0,,"ic_tcp.c",293,
**** con19047723,,seg14,,sx1,"FATAL","58M01","interconnect Error: Could not set up tcp listener socket","bind: Address already in use",,,,0,,"ic_tcp.c",293,
Problem Analysis
Using journalctl
, we found that NetworkManager
has been restarting the dhclient
service, which will cause network interruption during the restart.
NetworkManager[2285]: <info> [1691711564.7553] device (ens10f0): state change: config -> ip-config (rea
NetworkManager[2285]: <info> [1691711564.7562] dhcp4 (ens10f0): activation: beginning transaction (time
NetworkManager[2285]: <info> [1691711564.7599] dhcp4 (ens10f0): dhclient started with pid 88186
dhclient[88186]: DHCPDISCOVER on ens10f0 to 255.255.255.255 port 67 interval 6 (xid=0x34e8d695)
dhclient[88186]: DHCPDISCOVER on ens10f0 to 255.255.255.255 port 67 interval 12 (xid=0x34e8d695)
In CentOS 7, there will be two services related to network configuration: network
and NetworkManager
. network
is configured through a static network interface, while NetworkManager
is a dynamic network to realize network management.
When these two services are started at the same time, conflicts will arise. NetworkManager
will clean up the route when the network is disconnected, causing the network to be disconnected.
Check the NetStat panel on the monitoring to view connection usage.
Solution
NetworkManager
servicesystemctl stop NetworkManager
systemctl disable NetworkManager
Add the IP segment of the target database to the pg_hba
file of all servers where the source database is located.
index ron requires 19160 bvtes, maximum size is 8191
Error message
java. lang.RuntimeException: org.springframework.web.client.HttpServerErrorException$InternalServerError: 500 Internal Server Error:
[ERROR: index row requires 19160 bytes, maximum size is 8191 (seg35 10.17.8.132:6012 pid=15508) (SQLSTATE 54000)]
Problem Analysis
The number of row limit for the btree
index was inserted.
postgres=# CREATE TABLE test ( x text ) DISTRIBUTED BY(x);
CREATE TABLE
Time: 26.768 ms
postgres=# INSERT INTO test(x) VALUES ( repeat('x', 900000) );
INSERT 0 1
Time: 16.707 ms
postgres=# CREATE INDEX test_x ON test USING btree(x);
ERROR: index row requires 10328 bytes, maximum size is 8191 (seg0 172.16.172.148:6000 pid=21674)
Time: 16.257 ms
postgres=# CREATE INDEX test_x ON test USING brin(x);
CREATE INDEX
Time: 19.866 ms
Solution
Reduce the length of the field or replace other indexes.