Query Statistics

This document describes the deployment, table structure, usage, configuration, and disabling of the YMatrix query statistics feature. When you notice slow database queries or want to better plan database resource utilization, you can use this feature for optimization.

The mxstat query statistics feature includes:

Execution details of SQL statements, such as query duration.
Statistics on compute resource consumption.

Enabling Query Statistics

After YMatrix cluster initialization, the matrixmgr database and matrixmgr extension are created by default, and this feature is enabled by default.

To manually enable the feature, run the following commands:

=# createdb matrixmgr
=# psql -d matrixmgr
matrixmgr=# CREATE EXTENSION matrixmgr CASCADE;
NOTICE:  installing required extension "matrixts"
CREATE EXTENSION

Available Views

After successful deployment, the following tables and views appear in the matrixmgr database under the matrixmgr_internal schema:

View	Description
mxstat_execute	Provides execution details of SQL statements, such as the segment instance where the query ran, user, query text, minimum and maximum execution times, and timestamps for each phase of the slowest execution. Use this view to analyze execution times across different SQL statements and examine the slowest queries.
mxstat_usage	Provides statistics on compute resource consumption during SQL statement execution.
mx_query_execute_history	Historical data from the `mxstat_execute` view.
mx_query_usage_history	Historical data from the `mxstat_usage` view.
mx_querytext	Stores the mapping between `queryid` and `query`. Use this mapping to retrieve the original SQL text.
mx_query_execute_history_with_text	A joined view of `mx_query_execute_history` and `mx_querytext` using `queryid`.

For detailed information about each view, see matrixgmr Extension.

Examples

The following three examples demonstrate how to progressively examine SQL statement execution and associated resource consumption.

Example 1

Execute the following three SQL statements to pause the database server process for 5, 10, and 15 seconds respectively:

=# SELECT pg_sleep(5);
=# SELECT pg_sleep(10);
=# SELECT pg_sleep(15);

Query the mxstat_execute view for statistics. The results show:

Both query start (sample_start) and end (sample_exec_end) timestamps are recorded.
The query was called 3 times.
The query text is normalized, with time parameters replaced by $1.
Based on timestamps from the slowest execution, there was a 15-second gap between consecutive queries, matching the pg_sleep value.

matrixmgr=# SELECT * FROM matrixmgr_internal.mxstat_execute WHERE query LIKE '%pg_sleep%';
 seg | userid | dbid  |       queryid        | nestlevel |        query        | calls_begin | calls_alive | calls_end | total_time | min_time
| max_time  |     mean_time      |    stddev_time    |    sample_planid     |         sample_start          |       sample_parse_done       |
     sample_plan_done        |       sample_exec_start       |        sample_exec_end
-----+--------+-------+----------------------+-----------+---------------------+-------------+-------------+-----------+------------+----------
+-----------+--------------------+-------------------+----------------------+-------------------------------+-------------------------------+--
-----------------------------+-------------------------------+-------------------------------
  -1 |     10 | 16384 | -2007749946425010549 |         0 | select pg_sleep($1) |           3 |           0 |         3 |      30041 | 5009.054
| 15018.717 | 10013.666666666666 | 4086.427819588182 | -2693056513545111817 | 2022-03-25 13:58:58.503851-04 | 2022-03-25 13:58:58.503933-04 | 2
022-03-25 13:58:58.503994-04 | 2022-03-25 13:58:58.504008-04 | 2022-03-25 13:59:13.522725-04
(1 row)

Join using queryid and query mxstat_usage to view resource usage. The result shows only one record with seg = -1, indicating the query executed only on the Master.

matrixmgr=# SELECT * FROM matrixmgr_internal.mxstat_usage WHERE queryid = -2007749946425010549;
 seg | userid | dbid  |       queryid        | nestlevel | rows | shared_blks_hit | shared_blks_read | shared_blks_dirtied | shared_blks_writte
n | local_blks_hit | local_blks_read | local_blks_dirtied | local_blks_written | temp_blks_read | temp_blks_written | blk_read_time | blk_write
_time | ru_utime | ru_stime | ru_maxrss | ru_ixrss | ru_idrss | ru_isrss | ru_minflt | ru_majflt | ru_nswap | ru_inblock | ru_oublock | ru_msgs
nd | ru_msgrcv | ru_nsignals | ru_nvcsw | ru_nivcsw
-----+--------+-------+----------------------+-----------+------+-----------------+------------------+---------------------+-------------------
--+----------------+-----------------+--------------------+--------------------+----------------+-------------------+---------------+----------
------+----------+----------+-----------+----------+----------+----------+-----------+-----------+----------+------------+------------+--------
---+-----------+-------------+----------+-----------
  -1 |     10 | 16384 | -2007749946425010549 |         0 |    3 |               0 |                0 |                   0 |
0 |              0 |               0 |                  0 |                  0 |              0 |                 0 |             0 |
    0 | 0.001297 | 0.000431 |     20568 |        0 |        0 |        0 |         6 |         0 |        0 |          0 |          0 |
 0 |         0 |           0 |      122 |         0
(1 row)

Example 2

Create a table named test1 using the following command:

=# CREATE TABLE test1(
  c1 int,
  c2 int
  )
  USING MARS3
  DISTRIBUTED BY(c1)
  ORDER BY(c1);

Query statistics from mxstat_execute:

matrixmgr=# SELECT * FROM matrixmgr_internal.mxstat_execute WHERE query LIKE '%create table test1%';
 seg | userid | dbid  |       queryid        | nestlevel |                         query                         | calls_begin | calls_alive |
calls_end | total_time | min_time | max_time | mean_time | stddev_time | sample_planid |         sample_start          | sample_parse_done | sa
mple_plan_done |       sample_exec_start       |        sample_exec_end
-----+--------+-------+----------------------+-----------+-------------------------------------------------------+-------------+-------------+-
----------+------------+----------+----------+-----------+-------------+---------------+-------------------------------+-------------------+---
---------------+-------------------------------+-------------------------------
  -1 |     10 | 16384 | -6276724884379903029 |         0 | create table test1(c1 int, c2 int) distributed by(c1) |           1 |           0 |
        1 |     46.221 |   46.221 |   46.221 |    46.221 |           0 |             0 | 2022-03-25 14:08:51.754458-04 |                   |
               | 2022-03-25 14:08:51.754735-04 | 2022-03-25 14:08:51.800956-04
(1 row)

Join using queryid and query mxstat_usage to view resource usage. Results show entries from all segments, indicating that data is distributed across multiple MXSegment instances.

matrixmgr=# SELECT * FROM matrixmgr_internal.mxstat_usage WHERE queryid = -6276724884379903029;
 seg | userid | dbid  |       queryid        | nestlevel | rows | shared_blks_hit | shared_blks_read | shared_blks_dirtied | shared_blks_writte
n | local_blks_hit | local_blks_read | local_blks_dirtied | local_blks_written | temp_blks_read | temp_blks_written | blk_read_time | blk_write
_time | ru_utime | ru_stime | ru_maxrss | ru_ixrss | ru_idrss | ru_isrss | ru_minflt | ru_majflt | ru_nswap | ru_inblock | ru_oublock | ru_msgs
nd | ru_msgrcv | ru_nsignals | ru_nvcsw | ru_nivcsw
-----+--------+-------+----------------------+-----------+------+-----------------+------------------+---------------------+-------------------
--+----------------+-----------------+--------------------+--------------------+----------------+-------------------+---------------+----------
------+----------+----------+-----------+----------+----------+----------+-----------+-----------+----------+------------+------------+--------
---+-----------+-------------+----------+-----------
  -1 |     10 | 16384 | -6276724884379903029 |         0 |    0 |             295 |               59 |                  21 |
0 |              0 |               0 |                  0 |                  0 |              0 |                 0 |             0 |
    0 | 0.004053 |        0 |     22744 |        0 |        0 |        0 |       429 |         0 |        0 |          0 |          0 |
 0 |         0 |           0 |        6 |         0
   1 |     10 | 16384 | -6276724884379903029 |         0 |    0 |             261 |               82 |                  19 |
0 |              0 |               0 |                  0 |                  0 |              0 |                 0 |             0 |
    0 | 0.001691 | 0.001558 |     19284 |        0 |        0 |        0 |       510 |         0 |        0 |          0 |          0 |
 0 |         0 |           0 |        0 |         1
   0 |     10 | 16384 | -6276724884379903029 |         0 |    0 |             314 |               34 |                  19 |
0 |              0 |               0 |                  0 |                  0 |              0 |                 0 |             0 |
    0 | 0.002537 | 0.000193 |     18508 |        0 |        0 |        0 |       574 |         0 |        0 |          0 |          0 |
 0 |         0 |           0 |        1 |         1
   2 |     10 | 16384 | -6276724884379903029 |         0 |    0 |             261 |               82 |                  19 |
0 |              0 |               0 |                  0 |                  0 |              0 |                 0 |             0 |
    0 | 0.003043 |  2.9e-05 |     19292 |        0 |        0 |        0 |       514 |         0 |        0 |          0 |          0 |
 0 |         0 |           0 |        0 |         2
(4 rows)

Example 3

In Examples 1 and 2, the nestlevel value in mxstat_execute is 0 because there was no nested execution. The following example demonstrates nested execution, typically achieved by calling a User Defined Function (UDF).

Create a UDF named nest_query that defines the SQL statement SELECT 1;, then invoke it:

=# CREATE OR REPLACE FUNCTION nest_query() RETURNS SETOF RECORD
AS $$ SELECT 1;$$
LANGUAGE SQL;

Note!
By default, mxstat does not record nested execution information. You must set the mxstat_statements.track parameter to all.

Query execution details for the inner statement SELECT $1. The results show:

The normalized query text is SELECT $1, as constants like 1 are replaced with parameter placeholders.
The nestlevel value is 1, indicating one level of nesting. Deeper nesting increases this value; matrixmgr records up to level 31.

matrixmgr=# SELECT * FROM matrixmgr_internal.mxstat_execute WHERE query LIKE '%SELECT $1%';
 seg | userid | dbid  |       queryid        | nestlevel |   query   | calls_begin | calls_alive | calls_end | total_time | min_time | max_time
 | mean_time | stddev_time |    sample_planid    |         sample_start          |       sample_parse_done       |       sample_plan_done
  |       sample_exec_start       |        sample_exec_end
-----+--------+-------+----------------------+-----------+-----------+-------------+-------------+-----------+------------+----------+---------
-+-----------+-------------+---------------------+-------------------------------+-------------------------------+-----------------------------
--+-------------------------------+-------------------------------
  -1 |     10 | 16384 | -4554727679305370053 |         1 | SELECT $1 |           1 |           0 |         1 |      0.031 |    0.031 |    0.031
 |     0.031 |           0 | -382705668420232707 | 2022-03-25 14:35:30.668124-04 | 2022-03-25 14:35:30.668373-04 | 2022-03-25 14:35:30.668403-0
4 | 2022-03-25 14:35:30.668408-04 | 2022-03-25 14:35:30.668439-04
(1 row)

Disabling Query Statistics

Note!
Disabling query statistics does not require stopping or restarting the database.

Stop the background collection worker on the Master:

$ psql -d matrixmgr -c "SELECT matrixmgr_internal.mxstat_stop_worker();"

Globally disable collection and apply the change:

$ gpconfig -s mxstat_statements.track 
$ gpconfig -c mxstat_statements.track -v none
$ mxstop -u

Note!
These two steps temporarily disable the feature. It will be re-enabled by default upon cluster restart. To permanently disable it, run DROP EXTENSION after the above steps.

Clear already collected data:

$ psql -d matrixmgr -c "SELECT matrixmgr_internal.mxstat_reset();"

(Optional) Restart the background collection worker:

$ psql -d matrixmgr -c "SELECT matrixmgr_internal.mxstat_start_worker();"

Appendix

Query Classification

matrixmgr does not record every individual query. Instead, it aggregates similar queries and presents statistics based on parsed results.

Only queries differing in parameters are grouped together. For example, the following three INSERT statements all insert into the test1 table with different values. They are grouped into one class and share the same queryid.

=# INSERT INTO test1 VALUES(1);
=# INSERT INTO test1 VALUES(2);
=# INSERT INTO test1 VALUES(3);

The following queries differ only in conditional values and are also grouped:

=# SELECT * FROM test1 WHERE c1 = 1;
=# SELECT * FROM test1 WHERE c1 = 2;
=# SELECT * FROM test1 WHERE c1 = 3;

The following queries, although similar, access different tables and are not grouped:

=# SELECT * FROM test1 WHERE c1 = 1;
=# SELECT * FROM test2 WHERE c1 = 1;
=# SELECT * FROM test3 WHERE c1 = 1;

Configuration

The query statistics feature includes the following parameters to control monitoring behavior.

Parameter	Type	Description
mxstat_statements.max	Integer	Number of hash slots for storing query timing data. Represents the maximum number of queries stored in memory. Default is `1000`.
mxstat_statements.usage_multiple	Integer	Multiplier for hash slots used to store resource usage data relative to timing slots. Default is `2`.
mxstat_statements.track	String	`top`: Track only top-level queries, no nesting (default); `all`: Track all nested levels; `none`: Disable tracking.
mxstat_statements.track_utility	Boolean	Whether to track utility statement execution. Default is `on`.
mxstat_statements.save	Boolean	Whether to dump in-memory statistics to disk during cluster restart. Default is `on`.
mxstat_statements.harvest_interval	Integer	Interval (in minutes) at which statistics are harvested from shared memory into history tables. Default is `5` minutes.
mxstat_statements.harvest_usage	Boolean	Whether to collect resource usage statistics. Default is `on`.
mxstat_statements.ignored_databases	String	Comma-separated list of databases to ignore. Default is `template0,template1`.
mxstat_statements.start_reaper_on_start	Boolean	Whether to start the reaper process at cluster startup. This process periodically writes collected data into the `mx_query_execute_history`, `mx_query_usage_history`, `mx_querytext`, `mx_query_execute_history_with_text`, and related tables.

English

Русский

简体中文

Query Statistics

Enabling Query Statistics

Available Views

Examples

Example 1

Example 2

Example 3

Disabling Query Statistics

Appendix

Query Classification

Configuration

See Also

English Русский 简体中文

Query Statistics

Enabling Query Statistics

Available Views

Examples

Example 1

Example 2

Example 3

Disabling Query Statistics

Appendix

Query Classification

Configuration

See Also

English

Русский

简体中文