Maintenance

Internal Management View

  • mx_stream stores basic information of streams
  • mx_stream_info saves the correspondence between stream and slot and other information
  • pg_class Add two new fields, relisstream identifies whether it is a stream table, relhasdownstream identifies whether there is a downstream stream table

Configuration generation management

  • mx_max_number_streams allows to create up to 8 streams, and the database is restarted when modifying it.
  • max_replication_slots controls the number of replication slots on the segment node, with a stream corresponding to a replication slot.
  • mx_stream_message_level Prints additional stream-related logs, which are default to DEBUG4, and can be modified at the session level, and can be set to LOG when a developer debugs a failure.
  • mxstream.debug_plan and mxstream.debug_plan_verbose If open, additional execution plan information can be printed when creating a stream, which can be used as the basis for performance problems and can be turned on or off at the session level.
  • set mx_stream_message_level to notice, print more detailed flow construction execution log output
  • mxstream.debug_tuple_decoding is turned off by default. Open will print a huge amount of debugging information in the log, and it is only used temporarily for development and failure analysis.
  • mxstream.simple_one.skip_check is closed by default, and on allow creation of any streams, only for developers.
  • mx_stream_internal_modify allows direct DML operation of data in the stream [Hazardous parameters, caution adjustment]
  • mxstream.domino_join_skip_index_check allows no index when double stream is created
  • mxstream.scan_tuples_limit and mxstream.scan_xlog_size_limit,
  • scan_xlog_size_limit is the xlog log that controls the scanned by each flow table process, not the valid log size;
  • scan_tuples_limit is the number of valid records that control the processing of flow forms;
  • It is recommended that scan_xlog_size_limit be set to 0, scan_tuples_limit control the single processing size of the flow according to actual needs.

Common management operations

  • Check what streams are currently available
    =# select /*streamoid,*/streamrelid::regclass,dbid,provider,plugin,querytext from mx_stream;
    streamoid | streamrelid | dbid  | provider |   plugin   | querytext 
    ----------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
       23184 | s8          | 19789 | mxstream | simple_one | 
       23195 | s1          | 19789 | mxstream | domino_agg | 
       23222 | s61         | 19789 | mxstream | simple_one | 
       23230 | s62         | 19789 | mxstream | simple_one | 
       23244 | s51         | 19789 | mxstream | simple_one | 
       23252 | s52         | 19789 | mxstream | simple_one | 
       23266 | s3          | 19789 | mxstream | simple_one | 
       23280 | s4          | 19789 | mxstream | simple_one | 
    (8 rows)
  • Delete streams
    =# drop stream s8;
    DROP STREAM
  • Adjust the upper limit of the number of streams in the library
    
    gpconfig -c max_replication_slots -v 100
    gpconfig -c mx_max_number_streams -v 100

注:调整完上述操作后需要重启数据库 注:增加上限会使 postgres 进程申请额外的共享内存,因此不是越大越好

- Check which objects exist in downstream flow table

=# select relname,relkind,relisstream,relhasdownstream from pg_class where relname ='t8'; relname | relkind | relisstream | relhasdownstream ---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- t8 | r | f | t (1 row)

=# select relname,relkind,relisstream,relhasdownstream from pg_class where relname ='s8'; relname | relkind | relisstream | relhasdownstream ---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- s8 | r | t | f (1 row)

- View flow table definition

    - psql client
\dS+ s8
                                TABLE "public.s1"
 Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 id | integer | | | plain | |
 v1 | integer | | | | plain | |
 v2 | integer | | | | plain | |
STREAM definition:
 SELECT t1.id,
    t1.v1,
    t2.v2
   FROM STREAMING t1
     JOIN STREAMING t2 ON t1.id = t2.id;
DISTRIBUTED BY: (id)
Access method: heap

```
  • Other SQL terminals

      SELECT pg_catalog.pg_get_viewdef('s1');
    
                      pg_get_viewdef
      ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
        SELECT t1.id, +
           t1.v1, +
           t2.v2 +
          FROM (STREAMING t1 +
            JOIN STREAMING t2 ON ((t1.id = t2.id)));
  • Check the data processing efficiency of each flow table through master node logs

    cat  gpdb-2024-08-26_121305.csv|grep -iE "77748"|grep -iE -A1 "process"|awk -F',' '{print $1"-"$19}'|grep -v  '^-'|sed -n '{N;s/\n/\t/p}'|sed 's/processed.*execution took//g' >/tmp/stream1_77748.log
  • Check the processing progress of each flow table replication slot through the segment node

    select slot_name,restart_lsn,pg_wal_lsn_diff(pg_current_wal_lsn(),confirmed_flush_lsn)/8/1024 as currentlsn_diff  from pg_replication_slots;\watch 1
  • Flow table check (all segments are consistent)

    
    for i in {20000..20010};  
    do
     PGOPTIONS='-c gp_role=utility' psql -d yonyou_ap -U ap_test -h 127.0.0.1 -p $i  -c "select count(*) from pg_replication_slots;"; 
    done

for i in {20000..20010}; do
PGOPTIONS='-c gp_role=utility' psql -d yonyou_ap -U ap_test -h 127.0.0.1 -p $i -c "select pg_drop_replication_slot('stream_slot_27534');"; done

for i in {20000..20010};
do PGOPTIONS='-c gp_role=utility' psql -d yonyou_ap -U ap_test -h 127.0.0.1 -p $i -c "select count(*) from pg_class where relisstream is true;"; done