This document describes frequently asked questions about MatrixDB SQL queries.
EvalPlanQual can not handle subPlan with Motion node
Problem Analysis
When the parameter gp_enable_global_deadlock_detector is set to on, the lock taken for UPDATE may be downgraded to RowExclusiveLock. When an UPDATE modifies the distribution key, the operation is split into DELETE + INSERT. If two transactions concurrently update the distribution key, the DELETE cannot perform the EvalPlanQual recheck and the INSERT would not be blocked, which could produce incorrect data. To prevent this, the error is thrown.
Reproduction steps
session 0: create table test_valplanqual (c1 int, c2 int) distributed by(c1);
CREATE TABLE
session 0: insert into test_valplanqual values(1,1);
INSERT 0 1
session 0: select * from test_valplanqual;
c1 | c2
----+----
1 | 1
(1 row)
session 1: begin;
BEGIN
session 2: begin;
BEGIN
session 1: update test_valplanqual set c1 = c1 + 1 where c1 = 1;
UPDATE 1
session 2: update test_valplanqual set c1 = c1 + 1 where c1 = 1; <waiting ...>
session 1: end;
END
session 2: <... completed>
ERROR: EvalPlanQual can not handle subPlan with Motion node (seg1 127.0.1.1:7003 pid=34629)
session 2: end;
END
session 0: select * from test_valplanqual;
c1 | c2
----+----
2 | 1
(1 row)
session 0: drop table test_valplanqual;
DROP TABLE
Solution
Avoid updating the distribution keys.
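One way to sidestep the issue is to distribute the table on a column that is never updated, so no UPDATE touches the distribution key. A minimal sketch based on the reproduction table above (choosing c2 as the distribution key is an assumption for illustration):

```sql
-- Distribute on c2 (never updated) instead of c1, so this UPDATE is an
-- ordinary in-place update rather than a DELETE + INSERT split.
CREATE TABLE test_valplanqual (c1 int, c2 int) DISTRIBUTED BY (c2);
INSERT INTO test_valplanqual VALUES (1, 1);
UPDATE test_valplanqual SET c1 = c1 + 1 WHERE c1 = 1;
```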
Too many unresolved insertion xids, please do a vacuum
Problem Analysis
The mars2 storage engine limits the number of INSERT XIDs that can exist in one block, computed as follows:
#define INSERTXIDSPERBLOCK ((BLCKSZ - PAGE_RESERVED_SPACE) / sizeof(TransactionId))
#define PAGE_RESERVED_SPACE (MAXALIGN(sizeof(PageHeaderData)) + MAXALIGN(sizeof(SortHeapPageOpaqueData)))
That is roughly (block_size - 24 - 24) / 4 sub-transactions; with the default 8K block size this is about 2036. The error occurs when the number of sub-transactions exceeds this value.
Reproduction steps
session 0: create table test_xids(id int) using mars2;
CREATE TABLE
session 0: create index ON test_xids using mars2_btree (id);
CREATE INDEX
session 0:
DO $$
DECLARE
i int;
_start timestamptz;
BEGIN
FOR i IN 1..3000 LOOP
_start = clock_timestamp();
INSERT INTO test_xids VALUES (i);
RAISE NOTICE 'value: % escape: % ', i, clock_timestamp() - _start;
END LOOP;
END;
$$ language plpgsql;
psql: NOTICE: value: 1 escape: 00:00:00.019476
psql: NOTICE: value: 2 escape: 00:00:00.002501
psql: NOTICE: value: 3 escape: 00:00:00.00218
psql: NOTICE: value: 4 escape: 00:00:00.002098
psql: NOTICE: value: 5 escape: 00:00:00.002938
psql: NOTICE: value: 6 escape: 00:00:00.001891
psql: NOTICE: value: 7 escape: 00:00:00.001794
...
...
psql: NOTICE: value: 2069 escape: 00:00:10.001794
psql: NOTICE: value: 2070 escape: 00:00:10.001495
psql: NOTICE: value: 2071 escape: 00:00:10.001251
psql: NOTICE: ERROR: Too many unresolved insertion xids, please do a vacuum (sortheap_external_sort.c:404) (seg0 172.16.100.197:6000 pid=12992) (sortheap_external_sort.c:404)
CONTEXT: SQL statement "insert into test_xids values(i)"
PL/pgSQL function inline_code_block line 7 at SQL statement
Solution
Avoid executing many separate INSERT INTO tablename VALUES(...) statements inside one transaction. The friendliest approach is to perform only one INSERT per transaction, inserting data in batches with the form INSERT INTO tablename VALUES(...),(...),(...). For MARS2 tables, the officially recommended approach is to write data through the mxgate tool, which is more efficient.
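The single-statement batch form can be sketched against the reproduction table above; one INSERT consumes one insert XID regardless of how many rows it adds:

```sql
-- One batched INSERT instead of 3000 single-row inserts inside a loop.
INSERT INTO test_xids SELECT i FROM generate_series(1, 3000) AS i;
-- Or the multi-row VALUES form for small batches:
INSERT INTO test_xids VALUES (1), (2), (3);
```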
modification of distribution columns in OnConflictUpdate is not supported
Problem Analysis
When the parameter gp_enable_global_deadlock_detector is set to on, the lock mode may be downgraded to RowExclusiveLock. In an INSERT ... ON CONFLICT DO UPDATE SET ... statement, the UPDATE part would have to be split into DELETE + INSERT if it modified the distribution key; because this cannot be handled safely, this error is thrown whenever the statement tries to update the distribution key.
Reproduction steps
session 0: create table test_upsert(id int, name text, primary key(id)) distributed by(id);
CREATE TABLE
session 0: insert into test_upsert select 1, 'a';
INSERT 0 1
session 0: insert into test_upsert select 1, 'a' on conflict (id) do update set id =2;
psql: ERROR: modification of distribution columns in OnConflictUpdate is not supported
session 0: insert into test_upsert select 1, 'a' on conflict (id) do update set name = 2;
INSERT 0 1
session 0: drop table test_upsert;
DROP
Solution
Avoid updating the distribution keys when performing UPSERT.
server closed the connection unexpectedly
A UI client connected to a remote database sometimes sees the following message after a long-running query or a long idle period:
server closed the connection unexpectedly
Problem Analysis
The client has a query-timeout setting that cancels long queries, or an idle-timeout setting that closes idle connections.
Solution
Increase or disable the timeout settings in the client.
Problem Analysis
For an IN query on a partitioned table, partition pruning leaves only the single DEFAULT partition. But if the query is written as UNION ALL, each subquery is pruned to the DEFAULT partition separately, so the DEFAULT partition is scanned multiple times, with an obvious performance impact.
Solution
For partitioned tables: avoid a DEFAULT partition where possible, and prefer an IN clause over UNION/UNION ALL.
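As a sketch (the table and key names are hypothetical), the rewrite looks like this:

```sql
-- Avoid: each UNION ALL branch is pruned to the DEFAULT partition separately,
-- so the DEFAULT partition is scanned once per branch.
SELECT * FROM sales WHERE region = 'a'
UNION ALL
SELECT * FROM sales WHERE region = 'b';

-- Prefer: a single IN predicate prunes once and scans the DEFAULT partition once.
SELECT * FROM sales WHERE region IN ('a', 'b');
```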
Problem Analysis
The query inside the plpgsql function runs through SPI. The SPI plan joins two tables with a nested loop and estimates rows = 1 for the statement, because the tables were never ANALYZEd.
Solution
Run ANALYZE on the tables involved, e.g. ANALYZE tablename;, so the planner has accurate statistics.
Problem Analysis
Distributed deadlock.
Solution
Turn on distributed deadlock detection.
gpconfig -c gp_enable_global_deadlock_detector -v on
Custom type usage
CREATE TYPE public.fhpm AS
(
avgval double precision,
minval double precision,
maxval double precision,
minval_interval integer,
maxval_interval integer
);
CREATE TABLE datapool.test
(
portindex integer,
begintime timestamp without time zone,
a_1 fhpm,
s_2 integer
);
insert into datapool.test values(1,'2022-01-01','(1,1,1,1,1)',1);
insert into datapool.test values(2,'2022-01-01','(2,,2,,)',2);
SELECT * FROM datapool.test;
 portindex |      begintime      |     a_1     | s_2
-----------+---------------------+-------------+-----
         1 | 2022-01-01 00:00:00 | (1,1,1,1,1) |   1
         2 | 2022-01-01 00:00:00 | (2,,2,,)    |   2
(2 rows)
SELECT (a_1).avgval FROM datapool.test;
avgval
--------
1
2
(2 rows)
SELECT (a_1).minval FROM datapool.test;
 minval
--------
      1

(2 rows)
Single table field extension: psql: ERROR: tables can have at most 1600 columns
Custom types can be used to make a single table exceed the 1600-column limit. If you are interested, you can try it yourself; the following is an implementation example.
CREATE TYPE public.fhpm1 AS
(
a0001 double precision,
a0002 double precision,
...
a1600 double precision);
CREATE TABLE datapool.test1
(
portindex integer,
begintime timestamp without time zone,
a_1 fhpm1,
s_2 integer,
a_3 fhpm2
);
Script to generate the 1600-column type:
echo 'CREATE TYPE public.fhpm1 AS (' > test.sql
for i in {1..1599}; do
  echo '  a'$i' double precision,' >> test.sql
done
echo '  a1600 double precision);' >> test.sql
Problem Analysis
The index is corrupt. This can also happen when the table is updated frequently and the index becomes severely bloated. Once this error appears, the index can no longer be used normally.
Solution
Rebuild the index.
drop index tab_idx;
create index tab_idx on &tablename (column1, column2, ...);
too many clients already
Problem Analysis
There are too many session connections, exceeding the database's maximum connection limit.
Solution
Check how many connections are reserved for superusers:
show superuser_reserved_connections;
The default is 10 reserved connections. If the error is caused by non-superuser sessions, a superuser can kill idle sessions in the database, or temporarily restrict login permissions for some low-priority jobs.
If a restart is required: use the gpstop command to stop the cluster and check all instances for residual processes, then use the gpstart command to start the cluster (gpstart -m starts only the master, in maintenance mode).
cannot delete from table "&tablename" because it does not have a replica identity and publishes deletes
Problem Analysis
The table in the error has logical replication enabled. If a published table has no replica identity set, only INSERT is allowed; UPDATE and DELETE report the errors above. This replication feature appeared in earlier PostgreSQL releases and still exists in later versions; it is recommended not to enable it in MatrixDB.
Reproduction steps
test6=# create table test as select * from pg_tables distributed randomly;
SELECT 94
test6=# create publication rep_source for table test;
CREATE PUBLICATION
test6=# insert into test select * from test;
INSERT 0 94
test6=# update test set tablename='pg_class_old' where tablename='pg_class';
psql: ERROR: cannot update table "test" because it does not have a replica identity and publishes updates
HINT: To enable updating the table, set REPLICA IDENTITY using ALTER TABLE.
test6=# delete from test where tablename='pg_class';
psql: ERROR: cannot delete from table "test" because it does not have a replica identity and publishes deletes
HINT: To enable deleting from the table, set REPLICA IDENTITY using ALTER TABLE.
Solution
Set the replica identity so the old-row image contains all columns for UPDATE and DELETE:
alter table &schema.tablename REPLICA IDENTITY FULL;
Or find and drop the corresponding publication:
select a.*,b.prrelid::regclass tablename from pg_publication a,pg_publication_rel b where a.oid=b.prpubid;
drop publication &publication_name;
could not read block in file
Problem Analysis
A file block is corrupt, so the corresponding file cannot be read.
Reproduction steps
test6=# create table test1 as select * from pg_tables distributed randomly;
SELECT 94
test6=# create index idx_tabname on test1(tablename);
CREATE INDEX
test6=# select relname,relfilenode from pg_class where relname in('test1','idx_tabname');
   relname   | relfilenode
-------------+-------------
 idx_tabname |      269422
 test1       |      269421
(2 rows)
Find the corresponding files by relfilenode and corrupt one of them (here, the index file is corrupted).
test6=# select * from test1 where tablename='a';
psql: ERROR: could not read block 0 in file "base/1588803/269422": read only 8123 of 8192 bytes
Solution
From the relfilenode in the error, determine whether the corrupted file belongs to a table or an index:
Case 1: An index file is corrupted. Simply rebuild it with the reindex command.
Case 2: A data table file is corrupted and a Mirror exists. Copy the same file from the Mirror and replace the Primary file with it.
Case 3: A data table file is corrupted and there is no Mirror. Copy out the data from the healthy instances and rebuild the table; the data on the damaged instance will be lost.
pg_hba.conf configuration error causes the cluster to fail to start
Problem Analysis
pg_hba.conf is the configuration file that controls remote access permissions to the database. If its contents do not meet the format requirements, the cluster cannot start.
Reproduction steps
Modify the pg_hba.conf file by adding a line at the end.
[mxadmin@mxd2 ~]$ cd $MASTER_DATA_DIRECTORY
[mxadmin@mxd2 mxseg-1]$ vi pg_hba.conf
host all all 0.0.0.0 md5     ## newly added line; does not follow the required format (the address has no netmask/CIDR)
Start the database and report an error.
[mxadmin@mxd2 mxseg-1]$ gpstart -a
...
20221230:14:08:56:014273 gpstart:mxd2:mxadmin-[CRITICAL]:-Failed to start Master instance in admin mode
20221230:14:08:56:014273 gpstart:mxd2:mxadmin-[CRITICAL]:-Error occurred: non-zero rc: 1
Command was: 'env GPSESSID=0000000000 GPERA=None $GPHOME/bin/pg_ctl -D /mxdata_20220408112227/master/mxseg-1 -l /mxdata_20220408112227/master/mxseg-1/log/startup.log -w -t 600 -o " -p 5432 -c gp_role=utility " start'
rc=1, stdout='waiting for server to start.... stopped waiting
', stderr='pg_ctl: could not start server
Examine the log output.
'
Check the database log.
2022-12-29 22:08:56.932011 PST,,,p14282,th804919424,,,,0,,,seg-1,,,,,"LOG","F0000","invalid IP mask ""md5"": Name or service not known",,,,,"line 104 of configuration file ""/mxdata_20220408112227/master/mxseg-1/pg_hba.conf""",,0,,"hba.c",1254,
2022-12-29 22:08:56.932029 PST,,,p14282,th804919424,,,,0,,,seg-1,,,,,"FATAL","F0000","could not load pg_hba.conf",,,,,,,0,,"postmaster.c",1542,
2022-12-29 22:08:56.932033 PST,,,p14282,th804919424,,,,0,,,seg-1,,,,,"LOG","00000","database system is shut down",,,,,,,0,,"miscinit.c",905,
Solution
Fix the pg_hba.conf configuration:
Case 1: The exception line was added by mistake. Delete it, then start the cluster normally.
Case 2: The exception line is needed but mis-formatted. Confirm the intended access rule, correct the line, then start the cluster normally.
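For reference, a well-formed host entry carries a CIDR netmask in the address field (this is the same rule shape used later in this document):

```
host    all    all    0.0.0.0/0    md5
```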
Remote access permission exception caused by the order of entries in pg_hba.conf
Problem Analysis
pg_hba.conf is the configuration file that controls remote access permissions to the database. Access rules are matched from top to bottom; if they are in the wrong order, permission control fails.
Reproduction steps
Create a test user.
test6=# create role test with login password 'test';
psql: NOTICE: resource queue required -- using default resource queue "pg_default"
CREATE ROLE
Log in to test.
[mxadmin@mxd2 mxseg-1]$ psql test6 -U test -h 192.168.8.12 -p 5432
Password for user test:
psql (12)
Type "help" for help.
test6=>
Restrict remote login for the test user by modifying pg_hba.conf, adding the restriction at the end of the file.
[mxadmin@mxd2 mxseg-1]$ vi pg_hba.conf
#user access rules
host all all 0.0.0.0/0 md5
host all test 0.0.0.0/0 reject    // add permission restriction
[mxadmin@mxd2 mxseg-1]$ mxstop -u
// Test login: it succeeds, so the remote restriction did not take effect
[mxadmin@mxd2 mxseg-1]$ psql test6 -U test -h 192.168.8.12 -p 5432
Password for user test:
psql (12)
Type "help" for help.
test6=>
Solution
Check the current user's rules in pg_hba.conf and place the restriction line before the broader rule, since the first matching entry wins.
// Adjusted pg_hba.conf file
[mxadmin@mxd2 mxseg-1]$ vi pg_hba.conf
#user access rules
host all test 0.0.0.0/0 reject    // add permission restriction
host all all 0.0.0.0/0 md5
[mxadmin@mxd2 mxseg-1]$ mxstop -u
// Test login: it fails, so the remote restriction took effect
[mxadmin@mxd2 mxseg-1]$ psql test6 -U test -h 192.168.8.12 -p 5432
psql: error: could not connect to server: FATAL: pg_hba.conf rejects connection for host "192.168.8.12", user "test", database "test6", SSL off
idle in transaction timeout error
Problem Analysis
An idle-in-transaction connection timed out. The timeout is controlled by the idle_in_transaction_session_timeout parameter; set it with caution.
Reproduction steps
Check the current parameter settings.
test=# show idle_in_transaction_session_timeout ;
idle_in_transaction_session_timeout
-------------------------------------
100s
(1 row)
For demonstration convenience, set this parameter to 10s at the session level.
test=# set idle_in_transaction_session_timeout ='10s';
SET
test=# show idle_in_transaction_session_timeout ;
idle_in_transaction_session_timeout
-------------------------------------
10s
(1 row)
Turn on the transaction and test the impact of this parameter.
test=# begin ;
BEGIN
test=# select count(*) from pg_tables;
count
-------
104
(1 row)
Wait for more than 10 seconds before executing the second time.
test=# select count(*) from pg_tables;
psql: FATAL: terminating connection due to idle-in-transaction timeout
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
Check the database log.
2023-02-22 10:47:07.577478 PST,"mxadmin","test",p43086,th-484071296,"[local]",,2023-02-22 10:44:30 PST,0,con25838,cmd8,seg-1,,dx25863,,sx1,"FATAL","25P03","terminating connection due to idle-in-transaction timeout",,,,,,,0,,"postgres.c",4018,
Solution
System-level parameter modification.
$ gpconfig -c idle_in_transaction_session_timeout -v 10s
$ gpstop -u    // reload the configuration so it takes effect
Or set the parameter at the session level:
set idle_in_transaction_session_timeout = '10s';
insufficient memory reserved for statement
Problem Analysis
The session used more memory than the statement_mem setting allows. The amount of data is not large, but there are very many partitions; scanning that many partitions causes high memory usage even for a simple count(*).
Reproduction steps
Create a partition table with as many partitions as possible.
test=# create table test (id int,read numeric,write numeric,dttime timestamp)
test-# Distributed by (id)
test-# Partition by range(dttime)
test-# (start ('2022-11-01'::date)
test(# end ('2022-12-01'::date)
test(# every ('1 hours'::interval));
CREATE TABLE
Insert a small amount of data into the table.
test=# insert into test
test-# select j as id,random()*10 as read,random()*10 as write,i as dttime from generate_series('2022-11-01', '2022-11-30',interval '1 hours') as i,generate_series(1,100) j;
INSERT 0 69800
Querying the table reports an error.
test=# select count(*) from test;
psql: ERROR: insufficient memory reserved for statement
Solution
Increase the statement_mem parameter at the session level:
set statement_mem = '1000MB';
Or increase it system-wide:
gpconfig -c statement_mem -v 10000MB
When adjusting it, note that statement_mem cannot exceed max_statement_mem.
EvalPlanQual can not handle subPlan with Motion node
Problem Analysis
Different transactions update the same distribution-key row. After the first transaction commits, the other transactions report this error.
Reproduction steps
Create a table.
test=# create table test1 as
test-# select i id,md5(random()::text) tag_id1
test-# from generate_series(1,10) i;
psql: NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s) named 'id' as the Greenplum Database data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
SELECT 10
Session A opens a transaction and updates the row with id=1.
test=# begin;
BEGIN
test=# update test1 set id=11 where id=1;
UPDATE 1
Session B opens a transaction and tries to update the same row with id=1; it blocks waiting for session A.
test=# begin;
BEGIN
test=# update test1 set id=11 where id=1; <waiting ...>
Commit the transaction in session A.
test=# commit;
COMMIT
Session B reports the error.
test=# update test1 set id=11 where id=1;
psql: ERROR: EvalPlanQual can not handle subPlan with Motion node (seg1 192.168.8.12:6001 pid=49302)
Solution
Avoid having concurrent transactions update the distribution key.
relation "ttemp" does not exist
Error message
SELECT * FROM ttemp;
psql: ERROR: relation "ttemp" does not exist
LINE 1: SELECT * FROM ttemp;
^
Problem Analysis
View the error:
psql: ERROR: relation "ttemp" does not exist
LINE 1: SELECT * FROM ttemp;
                      ^
1. Execute \dn to view the schema list and show search_path; to view the current default schema. Confirm which schema the table was created in, and qualify the query as schema.table.
postgres=# \dn
  List of schemas
    Name    |  Owner
------------+---------
 gp_toolkit | mxadmin
 public     | mxadmin
(2 rows)
postgres=# show search_path;
   search_path
-----------------
 "$user", public
(1 row)
The default schema is public.
2. Execute \l to view the database list.
postgres=# \l
                              List of databases
    Name    |  Owner  | Encoding |  Collate   |   Ctype    |  Access privileges
------------+---------+----------+------------+------------+---------------------
 matrixmgr  | mxadmin | UTF8     | en_US.utf8 | en_US.utf8 |
 postgres   | mxadmin | UTF8     | en_US.utf8 | en_US.utf8 |
 template0  | mxadmin | UTF8     | en_US.utf8 | en_US.utf8 | =c/mxadmin         +
            |         |          |            |            | mxadmin=CTc/mxadmin
 template1  | mxadmin | UTF8     | en_US.utf8 | en_US.utf8 | =c/mxadmin         +
            |         |          |            |            | mxadmin=CTc/mxadmin
(4 rows)
It turns out the program connects to the database cluster on port 5432, while the psql session on the Linux host connects to the cluster on port 5433.
postgres=# SHOW port;
port
------
5433
(1 row)
Solution
Close the cluster listening on port 5433.
[mxadmin@mdw ~]$ gpstop -a
Modify the PGPORT and MASTER_DATA_DIRECTORY entries in the environment file.
$ vim ~mxadmin/.matrixdb.env
export PGPORT=5432
export MASTER_DATA_DIRECTORY=/mxdata_20220909145815/master/mxseg-1
Load new environment variables.
$ source ~mxadmin/.matrixdb.env
Problem Description
When executing INSERT INTO table_uuid SELECT ... FROM ..., the number of rows inserted fluctuates between runs, while the SELECT itself returns a stable row count.
In the following example, the INSERT is executed several times and its result fluctuates, while the SELECT stably returns 1 row.
INSERT 0 0
Time: 25.774 ms
INSERT 0 1
Time: 40.934 ms
INSERT 0 2
Time: 34.338 ms
INSERT 0 3
Time: 32.562 ms
Problem Analysis
This problem occurs because of a combination of the following conditions:
1. The statement uses the INSERT INTO ... SELECT form (as in the problem description).
2. The optimizer is enabled: SET OPTIMIZER TO ON.
3. random() is used to generate the UUID.
Solution
Change one of the above conditions to avoid the problem.
Problem Analysis
Red Hat 8 added the sssd service, which authenticates every connection access. This reduces the efficiency of connection creation and authentication; especially in scenarios with a large number of slices, query statement efficiency drops significantly.
Solution
systemctl stop sssd
systemctl stop sssd-kcm.socket
insufficient memory reserved for statement error when executing SQL statements
Problem Analysis
The memory estimated for the query when executing the SQL exceeds the limit set by the statement_mem parameter.
Solution
Increase the statement_mem parameter value so the memory estimated for the query stays within the limit.
Also check whether the queried table is partitioned; if so, consider merging partitions to reduce their number.
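As a sketch of reducing the partition count, the hourly-partitioned table from the earlier statement_mem reproduction could be redefined with daily partitions (the interval choice here is illustrative):

```sql
-- ~30 partitions for the month instead of ~720 hourly partitions,
-- lowering the memory needed to scan all partitions.
CREATE TABLE test (id int, read numeric, write numeric, dttime timestamp)
DISTRIBUTED BY (id)
PARTITION BY RANGE (dttime)
(START ('2022-11-01'::date) END ('2022-12-01'::date)
 EVERY ('1 day'::interval));
```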