Maintenance

Internal Management Views

  • mx_stream stores basic information about streams.
  • mx_stream_slot maintains the mapping between streams and slots, along with other metadata.
  • Two fields are added to pg_class:
    • relisstream indicates whether a table is a stream.
    • relhasdownstream indicates whether downstream streams exist.

Configuration Management

  • mx_max_number_streams limits the maximum number of streams to 8. Changing this value requires a database restart.
  • max_replication_slots controls the number of replication slots on each segment node. Each stream uses one replication slot.
  • log_min_messages for stream-related logs is set to DEBUG4 by default. This can be adjusted at the session level. Set to LOG when debugging issues.
  • mx_stream_print_plan and mx_stream_print_parsed_plan: When enabled, additional execution plan details are printed during stream creation. Useful for performance troubleshooting. Can be enabled or disabled per session.
  • Set client_min_messages to notice to enable more detailed stream creation execution logs.
  • mx_stream_debug_print is disabled by default. When enabled, it generates extensive debug logs. Use only temporarily for development or fault analysis.
  • mx_stream_allow_any_provider is disabled by default. When enabled, allows creation of any type of stream. For developer use only.
  • mx_stream_allow_dml allows direct DML operations on stream data. This is a dangerous setting; adjust with caution.
  • mx_stream_allow_noindex_join allows creation of dual-stream joins without indexes.
  • mx_stream_xlog_prefetch_bytes and mx_stream_xlog_prefetch_records:
    • mx_stream_xlog_prefetch_bytes controls the amount of XLOG scanned per process for each stream table (not the size of valid log records).
    • mx_stream_xlog_prefetch_records limits the number of valid records processed per batch.
  • It is recommended to set mx_stream_xlog_prefetch_bytes to 0, and adjust mx_stream_xlog_prefetch_records based on actual workload requirements.

Common Management Operations

  • List existing streams
=# select /*streamoid,*/streamrelid::regclass,dbid,provider,plugin,querytext from mx_stream;
 streamoid | streamrelid | dbid  | provider |   plugin   | querytext 
-----------+-------------+-------+----------+------------+-----------
     23184 | s8          | 19789 | mxstream | simple_one | 
     23195 | s1          | 19789 | mxstream | domino_agg | 
     23222 | s61         | 19789 | mxstream | simple_one | 
     23230 | s62         | 19789 | mxstream | simple_one | 
     23244 | s51         | 19789 | mxstream | simple_one | 
     23252 | s52         | 19789 | mxstream | simple_one | 
     23266 | s3          | 19789 | mxstream | simple_one | 
     23280 | s4          | 19789 | mxstream | simple_one | 
(8 rows)
  • Drop a stream
=# drop stream s8;
DROP STREAM
  • Adjust the maximum number of streams in the database
gpconfig -c max_replication_slots -v 100
gpconfig -c mx_max_number_streams -v 100

Note: A database restart is required after these changes.
Note: Increasing the limit causes the postgres process to allocate additional shared memory; therefore, setting it too high is not recommended.
  • Check which objects have downstream stream tables
=# select relname,relkind,relisstream,relhasdownstream  from pg_class where relname ='t8';
 relname | relkind | relisstream | relhasdownstream 
---------+---------+-------------+------------------
 t8      | r       | f           | t
(1 row)

=# select relname,relkind,relisstream,relhasdownstream  from pg_class where relname ='s8';
 relname | relkind | relisstream | relhasdownstream 
---------+---------+-------------+------------------
 s8      | r       | t           | f
(1 row)
  • View stream definition

    • Using the psql client:
\dS+ s8
                                TABLE "public.s1"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+---------+--------------+-------------
 id     | integer |           |          |         | plain   |              |
 v1     | integer |           |          |         | plain   |              |
 v2     | integer |           |          |         | plain   |              |
STREAM definition:
 SELECT t1.id,
    t1.v1,
    t2.v2
   FROM STREAMING t1
     JOIN STREAMING t2 ON t1.id = t2.id;
DISTRIBUTED BY: (id)
Access method: heap
- Using other SQL clients:
SELECT pg_catalog.pg_get_viewdef('s1');

                pg_get_viewdef
-----------------------------------------------
  SELECT t1.id,                               +
     t1.v1,                                   +
     t2.v2                                    +
    FROM (STREAMING t1                        +
      JOIN STREAMING t2 ON ((t1.id = t2.id)));
  • Check data processing efficiency of each stream via the master log
cat gpdb-2024-08-26_121305.csv | grep -iE "77748" | grep -iE -A1 "process" | awk -F',' '{print $1"-"$19}' | grep -v '^-' | sed -n '{N;s/\n/\t/p}' | sed 's/processed.*execution took//g' > /tmp/stream1_77748.log
  • Check replication slot progress for each stream on segment nodes
select slot_name,restart_lsn,pg_wal_lsn_diff(pg_current_wal_lsn(),confirmed_flush_lsn)/8/1024 as currentlsn_diff from pg_replication_slots; \watch 1
  • Stream table consistency check across segments
for i in {20000..20010};  
do
   PGOPTIONS='-c gp_role=utility' psql -d yonyou_ap -U ap_test -h 127.0.0.1 -p $i -c "select count(*) from pg_replication_slots;";
done

for i in {20000..20010}; 
do  
  PGOPTIONS='-c gp_role=utility' psql -d yonyou_ap -U ap_test -h 127.0.0.1 -p $i -c "select pg_drop_replication_slot('stream_slot_27534');"; 
done

for i in {20000..20010};  
do
   PGOPTIONS='-c gp_role=utility' psql -d yonyou_ap -U ap_test -h 127.0.0.1 -p $i -c "select count(*) from pg_class where relisstream is true;";
done