Using Resource Groups

This document describes how to use resource groups to manage transaction concurrency, CPU, and memory resource allocation in YMatrix. After defining a resource group, you can assign it to one or more database roles to control the resources that the roles can use.

YMatrix uses Linux-based control groups (cgroups) to manage database resources and uses Runaway for memory statistics, tracking, and management.

Resource Group Attributes and Limits

The following table lists the configurable resource group parameters:

Limit Type | Description | Value Range | Default
--- | --- | --- | ---
CONCURRENCY | Maximum number of concurrent transactions allowed in the resource group, including active and idle transactions. | [0 - max_connections] | 20
CPU_MAX_PERCENT | Maximum percentage of CPU resources that the resource group can use. | [1 - 100] | -1 (not set)
CPU_WEIGHT | Scheduling priority of the resource group. | [1 - 500] | 100
CPUSET | Specific CPU logical cores (or logical threads with hyper-threading) reserved for the resource group. | Depends on the system core configuration | -1
MEMORY_QUOTA | Memory limit specified for the resource group. | Integer (MB) | -1 (not set; statement_mem is used as the memory limit for a single query)
MIN_COST | Minimum query plan cost for a query to be included in the resource group. | Integer | 0
IO_LIMIT | Maximum read/write disk I/O throughput and maximum read/write I/O operations per second, set on a per-tablespace basis. | [2 - 4294967295] or max | -1

Note!
Resource limits are not enforced on SET, RESET, and SHOW commands.
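
As a quick orientation, the following sketch creates a resource group that sets several of these attributes explicitly and then inspects the stored configuration. The group name rg_example and the specific values are illustrative only.

    -- Illustrative group exercising several of the attributes above.
    CREATE RESOURCE GROUP rg_example WITH (
        CONCURRENCY=10,        -- at most 10 concurrent transactions
        CPU_MAX_PERCENT=30,    -- hard cap of 30% of CPU resources
        CPU_WEIGHT=200,        -- scheduling priority relative to other groups
        MEMORY_QUOTA=1000,     -- 1000 MB reserved on each segment
        MIN_COST=100           -- cheaper plans fall back to statement_mem
    );

    -- Verify the stored limits.
    SELECT * FROM gp_toolkit.gp_resgroup_config;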

Setting Descriptions

When a user runs a query, YMatrix evaluates the query status based on a set of limits defined for the resource group.

Concurrency Limit

CONCURRENCY controls the maximum number of concurrent transactions allowed in the resource group. The default limit is 20, and a value of 0 means that the resource group does not allow any queries to run. If the resource group has not reached its concurrency limit, YMatrix runs a submitted query immediately. Once the maximum number of concurrent transactions has been reached, YMatrix queues any transactions submitted afterwards and waits for other queries to complete before running them.

gp_resource_group_queuing_timeout can be used to specify the waiting time before a transaction in the queue is canceled. The default is 0, meaning that transactions are queued indefinitely.
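
For example (a sketch; the group name rg_example and the values are illustrative, and this assumes gp_resource_group_queuing_timeout may be set at the session level):

    -- Lower the concurrency limit of an existing group.
    ALTER RESOURCE GROUP rg_example SET CONCURRENCY 10;

    -- Cancel queued transactions that wait longer than 60 seconds (60000 ms).
    SET gp_resource_group_queuing_timeout = 60000;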

Bypass Resource Group Allocation Limits

  • gp_resource_group_bypass is used to enable or disable the concurrency transaction limit of the resource group, allowing queries to be executed immediately. If set to true, the query will skip the concurrency limit of the resource group. At this time, the memory quota of the query is allocated according to statement_mem. If there is insufficient memory, the query will fail. This parameter can only be set in a single session and cannot be set within a transaction or function.
  • gp_resource_group_bypass_catalog_query determines whether system table (catalog) queries skip the resource group's resource limits. The default is true. This is useful when database GUI clients run catalog queries to obtain metadata. These queries are allocated resources outside the resource group, and the memory quota for each query is statement_mem.
  • gp_resource_group_bypass_direct_dispatch is used to control whether direct dispatch queries skip the limits of the resource group. If set to true, the query will no longer be constrained by the CPU or memory limits allocated by its resource group and can be executed immediately. At this time, the memory quota of the query is allocated according to statement_mem. If there is insufficient memory, the query will fail. This parameter can only be set in a single session and cannot be set within a transaction or function.
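
A minimal sketch of enabling these bypass settings in a session (the parameters are the ones described above; the values are illustrative):

    -- Let queries in this session skip the group's concurrency limit;
    -- each query is then limited to statement_mem.
    SET gp_resource_group_bypass = true;

    -- Let direct dispatch queries in this session skip resource group limits.
    SET gp_resource_group_bypass_direct_dispatch = true;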

CPU Limit

YMatrix allocates CPU resources in the following two ways:

  • By assigning a percentage of CPU resources to resource groups.
  • By assigning specific CPU cores to resource groups.

Different CPU resource allocation modes can be configured for different resource groups on the same cluster, and each resource group can only use one CPU resource allocation mode. The CPU resource allocation mode of the resource group can also be changed at runtime.

Use gp_resource_group_cpu_limit to set the maximum percentage of system CPU resources allocated to resource groups on each segment node.

Assigning CPU Resources by Core

CPUSET is used to set the CPU cores to be reserved for the resource group. When CPUSET is configured for a resource group, YMatrix will disable the CPU_MAX_PERCENT and CPU_WEIGHT of the group and set their values to -1.

Usage Notes:

  • Use a semicolon (;) to separate the CPU core specifications for the master node and the segment nodes. Use a comma (,) to separate a list of single core numbers or number ranges, and enclose the core numbers/range list in single quotes (' '). For example, '1;1,3-4' uses core 1 on the master node and cores 1, 3, and 4 on the segment nodes.
  • Avoid using CPU core0. When assigning cores to resource groups, use the lowest possible core numbers. If you replace a database node and the new node has fewer CPU cores than the original node, or if you back up the database and want to restore it on a cluster with fewer CPU cores, the operation may fail. For example, if the database cluster has 16 cores, assigning cores 1-7 is the best choice. If you create a resource group and assign CPU core 9 to it, restoring the database to an 8-core node will fail.
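
Following these notes, a sketch of reserving cores for a group (the group name and core numbers are illustrative):

    -- Reserve core 1 on the master node and cores 1, 3, and 4 on the segment nodes.
    CREATE RESOURCE GROUP rg_cpuset_demo WITH (CONCURRENCY=5, CPUSET='1;1,3-4');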

Assigning CPU Resources by Percentage

CPU_MAX_PERCENT is used to set the upper limit of CPU resources that a segment can use. For example, setting it to 40 means that at most 40% of the available CPU resources can be used. When tasks in the resource group are idle and do not use any CPU time, the remaining time will be collected into a global pool of unused CPU cycles, from which other resource groups can borrow CPU cycles.

CPU_WEIGHT is used to allocate the scheduling priority of the current group. The default value is 100, and the value range is 1 to 500. This value specifies the relative share of CPU time that tasks in the resource group can obtain.

Usage Notes:

  • If a resource group has a relative share of 100, and the other two resource groups have a relative share of 50, when all processes in the resource groups try to use 100% of the CPU (i.e., the CPU_MAX_PERCENT value of all groups is set to 100), the first resource group will get 50% of the total CPU time, and the other two groups will each get 25%.
  • If another resource group with a relative share of 100 (with CPU_MAX_PERCENT set to 100) is added again, the first group can only use 33% of the CPU, and the remaining groups will get 16.5%, 16.5%, and 33% respectively.

Configuration Example

Consider four resource groups configured as follows:

Group Name | CONCURRENCY | CPU_MAX_PERCENT | CPU_WEIGHT
--- | --- | --- | ---
default_group | 20 | 50 | 10
admin_group | 10 | 70 | 30
system_group | 10 | 30 | 10
test | 10 | 10 | 10

  • Roles in default_group have an available CPU ratio (determined by CPU_WEIGHT) of 10/(10+30+10+10)=16%. This means that they can use at least 16% of the CPU when the system workload is high. When the system has idle CPU resources, they can use more resources, as the hard limit (set by CPU_MAX_PERCENT) is 50%.

  • Roles in admin_group have an available CPU ratio of 30/(10+30+10+10)=50% when the system workload is high. When the system has idle CPU resources, they can use resources up to the hard limit of 70%.

  • Roles in test have a CPU ratio of 10/(10+30+10+10)=16%. However, as the hard limit determined by CPU_MAX_PERCENT is 10%, they can only use up to 10% of the resources even when the system is idle.
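
For reference, the test group in this example might be created and default_group adjusted as follows (a sketch; only the parameters shown in the table are set):

    CREATE RESOURCE GROUP test WITH (CONCURRENCY=10, CPU_MAX_PERCENT=10, CPU_WEIGHT=10);
    ALTER RESOURCE GROUP default_group SET CPU_MAX_PERCENT 50;
    ALTER RESOURCE GROUP default_group SET CPU_WEIGHT 10;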

Memory Limit

MEMORY_QUOTA represents the maximum amount of memory reserved for the resource group on the Segment. This is the total amount of memory that all query worker processes on the Segment can consume during query execution. The amount of memory allocated to a query is the memory limit of the group divided by the concurrency limit of the group: MEMORY_QUOTA / CONCURRENCY

If a query requires more memory, you can set the required amount of memory for the query at the session level using gp_resgroup_memory_query_fixed_mem, which can override and exceed the memory allocated by the resource group.

Usage Notes:

  • If gp_resgroup_memory_query_fixed_mem is set, its value is used and the resource group settings are bypassed.
  • If gp_resgroup_memory_query_fixed_mem is not set, MEMORY_QUOTA / CONCURRENCY is used as the amount of memory allocated to the query.
  • If MEMORY_QUOTA is not set, the memory allocated to a query defaults to statement_mem.
  • For all queries, if there is not enough memory, they spill to disk. YMatrix generates an out of memory (OOM) error if the gp_workfile_limit_files_per_query limit is reached.
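
A sketch of requesting a fixed per-query memory amount for a session (the value is illustrative, and this assumes the parameter accepts memory units in the same way as statement_mem):

    -- Give each query in this session a fixed 800MB quota, bypassing the
    -- group's MEMORY_QUOTA / CONCURRENCY calculation.
    SET gp_resgroup_memory_query_fixed_mem = '800MB';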

Configuration Example

Consider a resource group named adhoc with MEMORY_QUOTA set to 1.5 GB and CONCURRENCY set to 3. By default, each statement submitted to the group is allocated 500MB of memory. Now consider the following series of events:

  • User ADHOC_1 submits query Q1, overriding gp_resgroup_memory_query_fixed_mem to 800MB. The Q1 statement is admitted to the system.
  • User ADHOC_2 submits query Q2, using the default 500MB.
  • With Q1 and Q2 still running, user ADHOC_3 submits query Q3, using the default 500MB.
  • Queries Q1 and Q2 have used 1300MB of the resource group's 1500MB. However, Q3 runs properly as long as the system has enough memory available for it at the time.
  • User ADHOC_4 submits query Q4, using gp_resgroup_memory_query_fixed_mem to set 700MB. Because Q4 bypasses the resource group limits, it runs immediately.
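
For reference, the adhoc group in this example might be defined as follows (a sketch matching the values above; the CPU_MAX_PERCENT value is illustrative, since a CPU limit must be supplied when creating a group):

    CREATE RESOURCE GROUP adhoc WITH (CONCURRENCY=3, CPU_MAX_PERCENT=20, MEMORY_QUOTA=1500);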

Special Precautions

  • If the gp_resource_group_bypass or gp_resource_group_bypass_catalog_query configuration parameter is set to bypass resource group restrictions, the memory limit of the query is statement_mem.
  • When (MEMORY_QUOTA / CONCURRENCY) < statement_mem, statement_mem is used as the fixed amount of memory allocated to the query.
  • The maximum value of statement_mem is capped by max_statement_mem.
  • Query statements whose cost is below the MIN_COST limit use statement_mem as their memory quota.

Disk I/O Limitations

IO_LIMIT limits the maximum read/write disk I/O throughput, and the maximum read/write I/O operations per second, available to queries in a specific resource group. This ensures that high-priority resource groups are served and prevents any group from overusing disk bandwidth. The value of this parameter is set on a per-tablespace basis.

Notes!
IO_LIMIT is only supported with cgroup v2.

When limiting disk I/O, you can configure the following parameters:

  • Set the tablespace name or tablespace object ID (OID) to be limited, or use * to apply the limit to all tablespaces.
  • rbps and wbps limit the maximum read and write disk I/O throughput of the resource group, in MB/s. The default value is max, indicating no limit.
  • riops and wiops limit the maximum read and write I/O operations per second of the resource group. The default value is max, indicating no limit.

Configuration Description

If the IO_LIMIT parameter is not set, rbps, wbps, riops, and wiops default to max, meaning there is no limit on disk I/O. If only some of the IO_LIMIT values are set (for example, rbps), the unset parameters default to max (in this case wbps, riops, and wiops default to max).
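
A sketch of limiting disk I/O for a group on the pg_default tablespace (the group name and values are illustrative; the syntax follows the per-tablespace key=value form described above):

    -- Limit queries in this group to 500 MB/s reads, 300 MB/s writes, and
    -- 1000 read IOPS on pg_default; unset values (here wiops) default to max.
    CREATE RESOURCE GROUP rg_io_demo WITH (
        CONCURRENCY=5,
        CPU_MAX_PERCENT=20,
        IO_LIMIT='pg_default:rbps=500,wbps=300,riops=1000'
    );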

Configure resource groups

  1. Verify the cgroup version configured in your environment by checking the file system mounted by default during system startup:
    stat -fc %T /sys/fs/cgroup/

    For cgroup v1, the output is tmpfs. For cgroup v2, the output is cgroup2fs.

If you do not need to change the cgroup version, skip ahead to Configure cgroup v1 or Configure cgroup v2 to complete the configuration.

If you need to switch from cgroup v1 to v2, run the following command as root:

  • Red Hat 8/Rocky 8/Oracle 8
    grubby --update-kernel=/boot/vmlinuz-$(uname -r) --args="systemd.unified_cgroup_hierarchy=1"
  • Ubuntu
    vim /etc/default/grub
    # add or modify: GRUB_CMDLINE_LINUX="systemd.unified_cgroup_hierarchy=1"
    update-grub

    If you need to switch from cgroup v2 to v1, run the following command as root:

  • Red Hat 8/Rocky 8/Oracle 8
    grubby --update-kernel=/boot/vmlinuz-$(uname -r) --args="systemd.unified_cgroup_hierarchy=0 systemd.legacy_systemd_cgroup_controller"
  • Ubuntu
    vim /etc/default/grub
    # add or modify: GRUB_CMDLINE_LINUX="systemd.unified_cgroup_hierarchy=0"
    update-grub

    If you want to continue using cgroup v1, make sure that no limit is set in any memory.limit_in_bytes file under /sys/fs/cgroup/memory/gpdb (including /sys/fs/cgroup/memory/gpdb/memory.limit_in_bytes and /sys/fs/cgroup/memory/gpdb/[OID]/memory.limit_in_bytes). If a limit is set, run:

    echo -1 >> memory.limit_in_bytes

    After that, restart the host for the changes to take effect.


Configure cgroup v1

  1. Operate on each node in the cluster:

Notes!
You need to be the root user or a user with sudo access to edit this file.

vi /etc/cgconfig.conf

Add the following information to the configuration file.

group gpdb {
    perm {
        task {
            uid = mxadmin;
            gid = mxadmin;
        }
        admin {
            uid = mxadmin;
            gid = mxadmin;
        }
    }
    cpu {
    }
    cpuacct {
    }
    cpuset {
    }
    memory {
    }
}

This configures the CPU, CPU accounting, CPU core set, and memory control groups to be managed by the mxadmin user.

  2. Turn on the cgroup service for each node of the YMatrix cluster.
    cgconfigparser -l /etc/cgconfig.conf 
    systemctl enable cgconfig.service
    systemctl start cgconfig.service
  3. Determine the cgroup directory mount point of the node:
    grep cgroup /proc/mounts 
  • The first line of the output shows the cgroup directory mount point, /sys/fs/cgroup.
    tmpfs /sys/fs/cgroup tmpfs ro,nosuid,nodev,noexec,mode=755 0 0                                                             
    cgroup /sys/fs/cgroup/systemd cgroup rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd 0 0                                                                                                      
    cgroup /sys/fs/cgroup/devices cgroup rw,nosuid,nodev,noexec,relatime,devices 0 0                                           
    cgroup /sys/fs/cgroup/net_cls,net_prio cgroup rw,nosuid,nodev,noexec,relatime,net_prio,net_cls 0 0                         
    cgroup /sys/fs/cgroup/pids cgroup rw,nosuid,nodev,noexec,relatime,pids 0 0                                                 
    cgroup /sys/fs/cgroup/freezer cgroup rw,nosuid,nodev,noexec,relatime,freezer 0 0                                           
    cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,nosuid,nodev,noexec,relatime,cpuacct,cpu 0 0                                   
    cgroup /sys/fs/cgroup/perf_event cgroup rw,nosuid,nodev,noexec,relatime,perf_event 0 0                                     
    cgroup /sys/fs/cgroup/hugetlb cgroup rw,nosuid,nodev,noexec,relatime,hugetlb 0 0                                           
    cgroup /sys/fs/cgroup/memory cgroup rw,nosuid,nodev,noexec,relatime,memory 0 0                                             
    cgroup /sys/fs/cgroup/cpuset cgroup rw,nosuid,nodev,noexec,relatime,cpuset 0 0                                             
    cgroup /sys/fs/cgroup/blkio cgroup rw,nosuid,nodev,noexec,relatime,blkio 0 0  
  4. Verify that the configuration is correct:
    ls -l <cgroup_mount_point>/cpu/gpdb
    ls -l <cgroup_mount_point>/cpuset/gpdb
    ls -l <cgroup_mount_point>/memory/gpdb

    If the directory exists and the owner is mxadmin:mxadmin, it means that the resource management of the YMatrix database is successfully configured with cgroup.


Configure cgroup v2

  1. Configure the system to mount cgroup v2 by default at system startup. Run the following as the root user, using the systemd system and service manager:

    grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=1"
  2. Restart the system to make the changes take effect.

    reboot now
  3. Create the directory /sys/fs/cgroup/matrixdb6.service, add all necessary controllers, and make sure that the mxadmin user has read and write permissions to it.

    mkdir -p /sys/fs/cgroup/matrixdb6.service
    echo "+cpuset +io +cpu +memory" | tee -a /sys/fs/cgroup/cgroup.subtree_control
    chown -R mxadmin:mxadmin /sys/fs/cgroup/matrixdb6.service

    After running the above command, you may encounter an invalid parameter error. This is because cgroup v2 does not support controlling real-time processes, and the cpu controller can only be enabled if all real-time processes are in the root cgroup. In this case, find all real-time processes, move them to the root cgroup, and then re-enable the controller.

  4. Ensure that mxadmin has write permissions to /sys/fs/cgroup/cgroup.procs. This allows the YMatrix processes to be moved into /sys/fs/cgroup/matrixdb6.service/ after the cluster starts, so that the postmaster service and all of its required worker processes can be managed there.

    chmod a+w /sys/fs/cgroup/cgroup.procs

    Because resource groups manage the cgroup files manually, the settings above are lost after a system restart. Add the following bash script as a systemd service so that it runs automatically at system startup. Perform the following steps as root:

  5. Create matrixdb6.service

    vim /etc/systemd/system/matrixdb6.service
  6. Write the following content to matrixdb6.service; if your user is not mxadmin, replace it with the corresponding user.

    
    [Unit]
    Description=Greenplum Cgroup v2 Configuration Service

    [Service]
    Type=simple
    WorkingDirectory=/sys/fs/cgroup/matrixdb6.service
    Delegate=yes
    Slice=-.slice
    # set hierarchies only if cgroup v2 mounted
    ExecCondition=bash -c '[ xcgroup2fs = x$(stat -fc "%%T" /sys/fs/cgroup) ] || exit 1'
    ExecStartPre=bash -ec " \
        chown -R mxadmin:mxadmin .; \
        chmod a+w ../cgroup.procs; \
        mkdir -p helper.scope"
    ExecStart=sleep infinity
    ExecStartPost=bash -ec "echo $MAINPID > /sys/fs/cgroup/cgroup.procs;"

    [Install]
    WantedBy=basic.target

  7. Reload the `systemd` daemon and enable the service:

    systemctl daemon-reload
    systemctl enable matrixdb6.service

## Enable resource groups
1. Set the `gp_resource_manager` server configuration parameter to the value `"group"`

gpconfig -c gp_resource_manager -v "group"

2. Restart the YMatrix database cluster

mxstop -arf

When enabled, any transactions submitted by a role will be directed to the resource group assigned to the role and will be subject to the concurrency, memory, and CPU restrictions of the resource group.

YMatrix creates the resource groups `admin_group`, `default_group`, and `system_group` by default. When resource groups are enabled, any role that is not explicitly assigned a resource group is assigned a default group based on the role's attributes: `SUPERUSER` roles are assigned to `admin_group`, non-admin roles are assigned to `default_group`, and system processes are assigned to `system_group`. No role can be manually assigned to `system_group`.

Each role resource group is configured as follows.

Parameters | admin_group | default_group | system_group
--- | --- | --- | ---
CONCURRENCY | 10 | 5 | 0
CPU_MAX_PERCENT | 10 | 20 | 10
CPU_WEIGHT | 100 | 100 | 100
CPUSET | -1 | -1 | -1
IO_LIMIT | -1 | -1 | -1
MEMORY_QUOTA | -1 | -1 | -1
MIN_COST | 0 | 0 | 0


## Create a resource group

The `CREATE RESOURCE GROUP` command creates a new resource group. When creating a resource group for a role, provide the name and CPU resource allocation pattern (core or percentage). The `CPU_MAX_PERCENT` or `CPUSET` limit value must be provided.

**Usage Example**
Create a resource group named `rgroup1` with a concurrency limit of 20, a CPU limit of 20%, a memory quota of 250 MB, a CPU scheduling weight of 500, and a minimum cost of 50:

CREATE RESOURCE GROUP rgroup1 WITH (CONCURRENCY=20, CPU_MAX_PERCENT=20, MEMORY_QUOTA=250, CPU_WEIGHT=500, MIN_COST=50);

The CPU and memory limits are shared by all roles assigned to `rgroup1`.

The `ALTER RESOURCE GROUP` command updates the limits of a resource group.

ALTER RESOURCE GROUP rg_role_light SET CONCURRENCY 7;
ALTER RESOURCE GROUP exec SET MEMORY_QUOTA 30;
ALTER RESOURCE GROUP rgroup1 SET CPUSET '1;2,4';

> ***Note!***
> The `CONCURRENCY` value of `admin_group` cannot be set or changed to 0.

The `DROP RESOURCE GROUP` command deletes a resource group. Before a resource group can be dropped, it must not be assigned to any role, and it must not contain any active or waiting transactions.

DROP RESOURCE GROUP exec;

## Configure automatic query termination based on memory usage
YMatrix supports Runaway detection. For queries managed by resource groups, the query can be automatically terminated based on the amount of memory used by the query.
The relevant configuration parameters are as follows:
- `gp_vmem_protect_limit`: Sets the amount of memory that all postgres processes in the active segment instance can consume. If a query causes this limit to be exceeded, no memory will be allocated and the query will fail.
- `runaway_detector_activation_percent`: When resource groups are enabled, if the amount of memory used exceeds `gp_vmem_protect_limit` * `runaway_detector_activation_percent`, YMatrix terminates queries managed by resource groups (excluding queries in the `system_group` resource group) based on memory usage. Termination starts with the query that consumes the most memory and continues until memory usage falls below the specified percentage.
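
For example (illustrative numbers), if `gp_vmem_protect_limit` is 8192 MB and `runaway_detector_activation_percent` is 90, Runaway detection begins terminating resource-group-managed queries once tracked memory use on a segment exceeds 8192 MB * 90% = 7372.8 MB, starting with the query that consumes the most memory.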

## Assign resource groups to roles

- Assign resource groups to database roles using the `RESOURCE GROUP` clause of the `CREATE ROLE` or `ALTER ROLE` command.

ALTER ROLE bill RESOURCE GROUP rg_light;
CREATE ROLE mary RESOURCE GROUP exec;

A resource group can be assigned to one or more roles. If a role hierarchy is defined, a resource group assigned to a parent role is not propagated down to the members of that role group.

- To remove a resource group assignment from a role and assign the role its default group, set the role's resource group to `NONE`.

ALTER ROLE mary RESOURCE GROUP NONE;

## Monitor resource group status

- View resource group restrictions

SELECT * FROM gp_toolkit.gp_resgroup_config;

- Check the resource group query status

SELECT * FROM gp_toolkit.gp_resgroup_status;

- Check the memory usage of resource groups on each host

SELECT * FROM gp_toolkit.gp_resgroup_status_per_host;

- View resource groups assigned to roles

SELECT rolname, rsgname FROM pg_roles, pg_resgroup WHERE pg_roles.rolresgroup=pg_resgroup.oid;

- View the running and pending queries of resource groups

SELECT query, rsgname,wait_event_type, wait_event FROM pg_stat_activity;

- Cancel running or queued transactions in resource groups
To manually cancel a running or queued transaction, you must first determine the process id (pid) associated with the transaction. After obtaining the process id, call `pg_cancel_backend()` to end the process.
The specific steps are as follows:
  - First run the following query to view the process information associated with all statements currently active or idle in all resource groups. If the query does not return any results, there are no running or queued transactions in the resource group.
SELECT rolname, g.rsgname, pid, waiting, state, query, datname
FROM pg_roles, gp_toolkit.gp_resgroup_status g, pg_stat_activity
WHERE pg_roles.rolresgroup=g.groupid
AND pg_stat_activity.usename=pg_roles.rolname;
- Query result example

 rolname | rsgname  |  pid  | waiting | state  |          query           | datname
---------+----------+-------+---------+--------+--------------------------+---------
  sammy  | rg_light | 31861 | f       | idle   | SELECT * FROM mytesttbl; | testdb
  billy  | rg_light | 31905 | t       | active | SELECT * FROM topten;    | testdb
  • End the transaction's process:
      SELECT pg_cancel_backend(31905);

    Notes!
    Do not use the operating system KILL command to cancel any YMatrix database process.

## Move queries to a different resource group

Users with superuser privileges can run the gp_toolkit.pg_resgroup_move_query() function to move a running query from one resource group to another without stopping the query. Use this function to speed up long-running queries by moving them to a resource group with a larger resource allocation or more available resources.

pg_resgroup_move_query() moves only the specified query to the target resource group; subsequent queries submitted in the session are still assigned to the original resource group.

Notes!
Only active or running queries can be moved to a new resource group. Queries that are queued or pending in an idle state due to concurrency or memory limits cannot be moved.

pg_resgroup_move_query() requires the process id or pid of the running query, and the name of the resource group to which the query is to be moved.

pg_resgroup_move_query( pid int4, group_name text );

As described in Cancel a running or queued transaction in a resource group, you can use the gp_toolkit.gp_resgroup_status view to list the name, id, and status of each resource group.
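
For example, reusing the pid from the earlier monitoring example and the rgroup1 group created above, a call might look like this:

SELECT gp_toolkit.pg_resgroup_move_query(31905, 'rgroup1');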

When the pg_resgroup_move_query() function is called, the running query is restricted by the target resource group configuration, including concurrent task limits, memory limits, etc.

  • If the target resource group has reached its concurrent task limit, the database queues the query until a slot is available or, if gp_resource_group_queuing_timeout is set, for the specified number of milliseconds.
  • If the target resource group has a free slot, pg_resgroup_move_query() attempts to hand slot control to the target process, for at most the number of milliseconds specified by gp_resource_group_move_timeout. If the target process cannot handle the move request within gp_resource_group_move_timeout, the database returns an error message.
  • If pg_resgroup_move_query() is canceled but the target process has already taken control of the slot, the segment processes are not moved to the new group and the target process retains the slot. This inconsistent state is corrected at the end of the transaction, or by the next command the target process runs in the same transaction.
  • If the target resource group does not have enough available memory to satisfy the query's current memory requirements, the database returns an error message. You can either increase the shared memory allocated to the target resource group, or wait for some running queries to complete before calling the function.

After the query is moved, there is no guarantee that the queries currently running in the target resource group will not exceed the group's memory quota. In that case, one or more running queries in the target group, including the moved query, may fail. You can reduce the likelihood of this by reserving enough global shared memory for resource groups.