This document describes how to use resource groups to manage transaction concurrency, CPU, and memory resource allocation in YMatrix. After defining a resource group, you can assign it to one or more database roles to control the resources that the roles can use.
YMatrix uses Linux-based control groups (cgroups) to manage database resources, and uses the Runaway detection mechanism for memory statistics, tracking, and management.
The following table lists the configurable resource group parameters:
Limit Type | Description | Value Range | Default |
---|---|---|---|
CONCURRENCY | Maximum number of concurrent transactions allowed in the resource group, including active and idle transactions. | [0 - max_connections] | 20 |
CPU_MAX_PERCENT | Maximum percentage of CPU resources that the resource group can use. | [1 - 100] | -1 (not set) |
CPU_WEIGHT | Scheduling priority of the resource group. | [1 - 500] | 100 |
CPUSET | Specific CPU logical cores (or logical threads with hyper-threading) reserved for the resource group. | Depends on the system core configuration | -1 |
MEMORY_QUOTA | Memory limit specified for the resource group. | Integer (MB) | -1 (not set; statement_mem is used as the memory limit for a single query) |
MIN_COST | Minimum cost of a query plan for the query to be included in the resource group. | Integer | 0 |
IO_LIMIT | Maximum read/write disk I/O throughput and maximum read/write I/O operations per second. Set on a per-tablespace basis. | [2 - 4294967295] or max | -1 (not set) |
> ***Note!***
> Resource limits are not enforced on `SET`, `RESET`, and `SHOW` commands.
When a user runs a query, YMatrix evaluates the query status based on a set of limits defined for the resource group.
`CONCURRENCY` controls the maximum number of concurrent transactions allowed in the resource group. The default limit is 20, and a value of 0 means that the resource group does not allow any queries to run.

If the resource limits have not been reached and the query will not exceed the resource group's concurrent transaction limit, YMatrix runs the query immediately. If the maximum number of concurrent transactions in the resource group has already been reached, YMatrix queues any transactions submitted after the `CONCURRENCY` limit is reached and waits for other queries to complete before running them.

`gp_resource_group_queuing_timeout` specifies how long a queued transaction waits before it is canceled. The default is 0, meaning that transactions are queued indefinitely.
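For illustration only, the following sketch shows how these limits might be adjusted; `rg_reporting` is a hypothetical group name, the timeout value is in milliseconds, and it is assumed here that the parameter can be changed at the session level:

```sql
-- Sketch only: rg_reporting is a hypothetical resource group.
-- Allow at most 10 concurrent transactions in the group.
ALTER RESOURCE GROUP rg_reporting SET CONCURRENCY 10;

-- Cancel transactions that have waited in the group's queue for more than 5 minutes
-- (value in milliseconds; the default 0 queues indefinitely).
SET gp_resource_group_queuing_timeout = 300000;
```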
## Bypass Resource Group Allocation Limits
- `gp_resource_group_bypass` enables or disables the resource group's concurrent transaction limit so that queries can be executed immediately. If set to `true`, the query skips the resource group's concurrency limit, and the query's memory quota is allocated according to `statement_mem`; if there is insufficient memory, the query fails. This parameter can only be set in a session and cannot be set within a transaction or a function.
- `gp_resource_group_bypass_catalog_query` determines whether system catalog queries skip the resource group's resource limits. The default is `true`. This is intended for database GUI clients that run catalog queries to obtain metadata: their resources are allocated outside the resource group, and the memory quota for each query is `statement_mem`.
- `gp_resource_group_bypass_direct_dispatch` controls whether direct dispatch queries skip the resource group's limits. If set to `true`, such queries are no longer constrained by the CPU or memory limits of their resource group and can be executed immediately; the query's memory quota is allocated according to `statement_mem`, and if there is insufficient memory, the query fails. This parameter can only be set in a session and cannot be set within a transaction or a function.
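As a minimal, hedged sketch of how the first bypass switch might be used at the session level (`my_table` is a hypothetical table):

```sql
-- Skip the resource group's concurrency limit for this session;
-- the query's memory quota then follows statement_mem.
SET gp_resource_group_bypass = true;
SELECT count(*) FROM my_table;   -- my_table is a hypothetical table
SET gp_resource_group_bypass = false;
```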
YMatrix allocates CPU resources in the following two ways:
Different CPU resource allocation modes can be configured for different resource groups on the same cluster, and each resource group can only use one CPU resource allocation mode. The CPU resource allocation mode of the resource group can also be changed at runtime.
You can determine the maximum percentage of system CPU resources to be allocated to the resource groups on each segment node through `gp_resource_group_cpu_limit`.
## Assigning CPU Resources by Core

`CPUSET` sets the CPU logical cores to be reserved for the resource group. When `CPUSET` is configured for a resource group, YMatrix disables the group's `CPU_MAX_PERCENT` and `CPU_WEIGHT` and sets their values to `-1`.
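As a hedged illustration, the statement below reserves specific cores when creating a group. `rg_dedicated` is a hypothetical group name, and the value format follows the `CPUSET '1;2,4'` example shown later in this document; interpreting the two parts as coordinator cores and segment cores is an assumption:

```sql
-- Sketch only: rg_dedicated is a hypothetical group name.
-- The CPUSET value is assumed to mean core 1 for the coordinator and cores 2 and 4 for the segments.
CREATE RESOURCE GROUP rg_dedicated WITH (CONCURRENCY=5, CPUSET='1;2,4');
```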
## Assigning CPU Resources by Percentage

`CPU_MAX_PERCENT` sets the upper limit of CPU resources that a segment can use. For example, setting it to 40 means that at most 40% of the available CPU resources can be used. When tasks in the resource group are idle and use no CPU time, the remaining time is collected into a global pool of unused CPU cycles, from which other resource groups can borrow CPU cycles.

`CPU_WEIGHT` allocates the scheduling priority of the group. The default value is 100, and the value range is 1 to 500. This value specifies the relative share of CPU time that tasks in the resource group can obtain.
**Usage Notes:**

- For example, suppose there are three resource groups, the first with a `CPU_WEIGHT` of 200 and the other two with a `CPU_WEIGHT` of 100 (the `CPU_MAX_PERCENT` value of all groups is set to 100). The first resource group gets 50% of the total CPU time, and the other two groups each get 25%.
- If a fourth group with a `CPU_WEIGHT` of 200 (and `CPU_MAX_PERCENT` set to 100) is added, the first group can use only 33% of the CPU, and the remaining groups get 16.5%, 16.5%, and 33% respectively.

**Configuration Description**
Group Name | CONCURRENCY | CPU_MAX_PERCENT | CPU_WEIGHT |
---|---|---|---|
default_group | 20 | 50 | 10 |
admin_group | 10 | 70 | 30 |
system_group | 10 | 30 | 10 |
test | 10 | 10 | 10 |
- Roles in `default_group` have an available CPU share (determined by `CPU_WEIGHT`) of 10/(10+30+10+10) = 16%. This means they can use at least 16% of the CPU when the system workload is high. When the system has idle CPU resources, they can use more, because the hard limit (set by `CPU_MAX_PERCENT`) is 50%.
- Roles in `admin_group` have an available CPU share of 30/(10+30+10+10) = 50% when the system workload is high. When the system has idle CPU resources, they can use resources up to the hard limit of 70%.
- Roles in `test` have a CPU share of 10/(10+30+10+10) = 16%. However, because the hard limit determined by `CPU_MAX_PERCENT` is 10%, they can use at most 10% of the CPU even when the system is idle.
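For illustration, the limits in the table above could be applied with statements like the following sketch; the built-in groups already exist and are altered rather than created, `test` is the user-defined group from the example, and it is assumed that each limit can be changed with `ALTER RESOURCE GROUP ... SET`:

```sql
-- Sketch: apply the example limits from the table above.
ALTER RESOURCE GROUP default_group SET CPU_MAX_PERCENT 50;
ALTER RESOURCE GROUP default_group SET CPU_WEIGHT 10;
ALTER RESOURCE GROUP admin_group SET CPU_MAX_PERCENT 70;
ALTER RESOURCE GROUP admin_group SET CPU_WEIGHT 30;

-- Create the user-defined group "test" with its limits.
CREATE RESOURCE GROUP test WITH (CONCURRENCY=10, CPU_MAX_PERCENT=10, CPU_WEIGHT=10);
```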
`MEMORY_QUOTA` represents the maximum amount of memory reserved for the resource group on a segment. This is the total amount of memory that all query worker processes on the segment can consume during query execution. The amount of memory allocated to a query is the group's memory limit divided by the group's concurrency limit: `MEMORY_QUOTA` / `CONCURRENCY`.
If a query requires more memory, you can set the required amount of memory for the query at the session level using `gp_resgroup_memory_query_fixed_mem`, which can override and exceed the memory allocated by the resource group.

**Precautions for use**

- If `gp_resgroup_memory_query_fixed_mem` is set, its value is used, bypassing the resource group settings.
- If `gp_resgroup_memory_query_fixed_mem` is not set, `MEMORY_QUOTA` / `CONCURRENCY` is used as the amount of memory allocated to the query.
- If `MEMORY_QUOTA` is not set, the memory allocated to the query defaults to `statement_mem`.
- If a query needs more memory than it has been allocated, it spills to disk files until the limit set by `gp_workfile_limit_files_per_query` is reached.

**Configuration Example**
Consider a resource group named `adhoc` with `MEMORY_QUOTA` set to 1.5 GB and `CONCURRENCY` set to 3. By default, each statement submitted to the group is allocated 500MB of memory. Now consider the following series of events:

1. User `ADHOC_1` submits query `Q1`, overriding `gp_resgroup_memory_query_fixed_mem` to 800MB. The `Q1` statement is admitted into the system.
2. User `ADHOC_2` submits query `Q2`, using the default 500MB.
3. With `Q1` and `Q2` still running, user `ADHOC_3` submits query `Q3`, using the default 500MB. `Q1` and `Q2` have already used 1300MB of the resource group's 1500MB; however, if the system still has enough memory available for `Q3`, it runs normally.
4. User `ADHOC_4` submits query `Q4` with `gp_resgroup_memory_query_fixed_mem` set to 700MB. Because `Q4` bypasses the resource group limits, it runs immediately.
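A hedged sketch of the session-level override used in step 1 above; the value is written as a plain integer because the parameter's unit is an assumption here (kB is assumed), so check the parameter reference before relying on it:

```sql
-- Sketch: reserve a fixed amount of memory for queries in this session,
-- overriding the resource group's MEMORY_QUOTA / CONCURRENCY allocation.
-- 819200 is 800MB expressed in kB (the kB unit is an assumption).
SET gp_resgroup_memory_query_fixed_mem = 819200;
SELECT * FROM sales_fact;   -- sales_fact is a hypothetical table
RESET gp_resgroup_memory_query_fixed_mem;
```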
**Special precautions**

- If the `gp_resource_group_bypass` or `gp_resource_group_bypass_catalog_query` configuration parameter is set to bypass resource group restrictions, the memory limit of the query is `statement_mem`.
- If `MEMORY_QUOTA` / `CONCURRENCY` is less than `statement_mem`, `statement_mem` is used as the fixed amount of memory allocated to the query.
- The upper limit of `statement_mem` is `max_statement_mem`.
- Queries whose plan cost is below `MIN_COST` are not managed by the resource group and use `statement_mem` as their memory quota.
`IO_LIMIT` limits the maximum read/write disk I/O throughput and the maximum read/write I/O operations per second for queries in a specific resource group. This helps ensure that high-priority resource groups get the disk bandwidth they need and prevents overuse of disk bandwidth. The value of this parameter is set on a per-tablespace basis.
> ***Note!***
> `IO_LIMIT` is supported only with cgroup v2.
When limiting disk I/O, you can configure the following parameters:

- Set the limits for each tablespace individually, or use `*` to set limits for all tablespaces.
- `rbps` and `wbps` limit the maximum read and write disk I/O throughput of the resource group, in MB/s. The default value is `max`, meaning no limit.
- `riops` and `wiops` limit the maximum read and write I/O operations per second of the resource group. The default value is `max`, meaning no limit.

**Configuration Description**
If the `IO_LIMIT` parameter is not set, the default values of `rbps`, `wbps`, `riops`, and `wiops` are `max`, meaning there is no limit on disk I/O. If only some of the `IO_LIMIT` values are set (for example, only `rbps`), the unset parameters default to `max` (in this case `wbps`, `riops`, and `wiops` default to `max`).
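As a hedged sketch only, a per-tablespace limit might be written as a value string built from the parameters described above; the exact string format is an assumption, so confirm it against the `CREATE RESOURCE GROUP` / `ALTER RESOURCE GROUP` reference before use:

```sql
-- Sketch only: the IO_LIMIT value-string format is an assumption.
-- Limit reads and writes to 1000 MB/s on all tablespaces ("*"), leaving IOPS unlimited.
ALTER RESOURCE GROUP rgroup1 SET IO_LIMIT '*:rbps=1000,wbps=1000,riops=max,wiops=max';
```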
You can determine the `cgroup` version configured in your environment by checking the file system mounted by default during system startup:

stat -fc %T /sys/fs/cgroup/

For cgroup v1, the output is `tmpfs`. For cgroup v2, the output is `cgroup2fs`.
If you do not need to change the version of cgroup, just skip to Configure cgroup v1 or Configure cgroup v2 to complete the configuration operation.
If you need to switch from cgroup v1 to v2, run the following command as root:
grubby --update-kernel=/boot/vmlinuz-$(uname -r) --args="systemd.unified_cgroup_hierarchy=1"
vim /etc/default/grub
# add or modify: GRUB_CMDLINE_LINUX="systemd.unified_cgroup_hierarchy=1"
update-grub
If you need to switch from cgroup v2 to v1, run the following command as root:
grubby --update-kernel=/boot/vmlinuz-$(uname -r) --args="systemd.unified_cgroup_hierarchy=0 systemd.legacy_systemd_cgroup_controller"
vim /etc/default/grub
# add or modify: GRUB_CMDLINE_LINUX="systemd.unified_cgroup_hierarchy=0"
update-grub
## Configure cgroup v1

If you want to continue using cgroup v1, make sure that no limit is set in any `memory.limit_in_bytes` file under `/sys/fs/cgroup/memory/gpdb` (including `/sys/fs/cgroup/memory/gpdb/memory.limit_in_bytes` and `/sys/fs/cgroup/memory/gpdb/[OID]/memory.limit_in_bytes`). If a limit is set, run:
echo -1 >> memory.limit_in_bytes
After that, restart the host for the changes to take effect.
Edit the `/etc/cgconfig.conf` file.

> ***Note!***
> You need to be a superuser or a user with `sudo` access to edit this file.

vi /etc/cgconfig.conf

Add the following information to the configuration file:

```
group gpdb {
    perm {
        task {
            uid = mxadmin;
            gid = mxadmin;
        }
        admin {
            uid = mxadmin;
            gid = mxadmin;
        }
    }
    cpu {
    }
    cpuacct {
    }
    cpuset {
    }
    memory {
    }
}
```

This configures the CPU, CPU accounting, CPU core set, and memory control groups to be managed by the `mxadmin` user.
Start the `cgroup` service for each node of the YMatrix cluster:

cgconfigparser -l /etc/cgconfig.conf
systemctl enable cgconfig.service
systemctl start cgconfig.service
Identify the `cgroup` directory mount point of the node:

grep cgroup /proc/mounts

The `cgroup` directory mount point is `/sys/fs/cgroup`:

```
tmpfs /sys/fs/cgroup tmpfs ro,nosuid,nodev,noexec,mode=755 0 0
cgroup /sys/fs/cgroup/systemd cgroup rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd 0 0
cgroup /sys/fs/cgroup/devices cgroup rw,nosuid,nodev,noexec,relatime,devices 0 0
cgroup /sys/fs/cgroup/net_cls,net_prio cgroup rw,nosuid,nodev,noexec,relatime,net_prio,net_cls 0 0
cgroup /sys/fs/cgroup/pids cgroup rw,nosuid,nodev,noexec,relatime,pids 0 0
cgroup /sys/fs/cgroup/freezer cgroup rw,nosuid,nodev,noexec,relatime,freezer 0 0
cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,nosuid,nodev,noexec,relatime,cpuacct,cpu 0 0
cgroup /sys/fs/cgroup/perf_event cgroup rw,nosuid,nodev,noexec,relatime,perf_event 0 0
cgroup /sys/fs/cgroup/hugetlb cgroup rw,nosuid,nodev,noexec,relatime,hugetlb 0 0
cgroup /sys/fs/cgroup/memory cgroup rw,nosuid,nodev,noexec,relatime,memory 0 0
cgroup /sys/fs/cgroup/cpuset cgroup rw,nosuid,nodev,noexec,relatime,cpuset 0 0
cgroup /sys/fs/cgroup/blkio cgroup rw,nosuid,nodev,noexec,relatime,blkio 0 0
```
Check the `gpdb` directories under the `cgroup` mount point:

ls -l <cgroup_mount_point>/cpu/gpdb
ls -l <cgroup_mount_point>/cpuset/gpdb
ls -l <cgroup_mount_point>/memory/gpdb

If the directories exist and their owner is `mxadmin:mxadmin`, resource management for the YMatrix database is successfully configured with `cgroup`.
## Configure cgroup v2

Configure the system to mount cgroup v2 by default at system startup, using the systemd system and service manager as the root user.
grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=1"
Restart the system to make the changes take effect.
reboot now
Create the directory `/sys/fs/cgroup/matrixdb6.service`, add all necessary controllers, and make sure that the `mxadmin` user has read and write permissions to it.
mkdir -p /sys/fs/cgroup/matrixdb6.service
echo "+cpuset +io +cpu +memory" | tee -a /sys/fs/cgroup/cgroup.subtree_control
chown -R mxadmin:mxadmin /sys/fs/cgroup/matrixdb6.service
After running the above commands, you may encounter an "invalid parameter" error. This is because cgroup v2 does not support controlling real-time processes: the cpu controller can only be enabled if all real-time processes are in the root cgroup. In this case, find all real-time processes, move them to the root cgroup, and then re-enable the controller.
Ensure that `mxadmin` has write permission on `/sys/fs/cgroup/cgroup.procs`. This is needed so that, after the cluster starts, the YMatrix processes can be moved into `/sys/fs/cgroup/matrixdb6.service/` to manage the postmaster service and all of its worker processes.
chmod a+w /sys/fs/cgroup/cgroup.procs
Because resource groups manage the `cgroup` files manually, the settings above become invalid after a system restart. Add the following `bash`-based `systemd` service so that the configuration is applied automatically at system startup. Perform the following steps as root:

1. Create `matrixdb6.service`:

vim /etc/systemd/system/matrixdb6.service
2. Write the following content to `matrixdb6.service`; if the user is not `mxadmin`, replace it with the corresponding user.
```
[Unit]
Description=Greenplum Cgroup v2 Configuration Service

[Service]
Type=simple
WorkingDirectory=/sys/fs/cgroup/matrixdb6.service
Delegate=yes
Slice=-.slice
ExecCondition=bash -c '[ xcgroup2fs = x$(stat -fc "%%T" /sys/fs/cgroup) ] || exit 1'
ExecStartPre=bash -ec " \
    chown -R mxadmin:mxadmin .; \
    chmod a+w ../cgroup.procs; \
    mkdir -p helper.scope"
ExecStart=sleep infinity
ExecStartPost=bash -ec "echo $MAINPID > /sys/fs/cgroup/cgroup.procs;"

[Install]
WantedBy=basic.target
```
3. Reload the `systemd` daemon and enable the service:
systemctl daemon-reload
systemctl enable matrixdb6.service
## Enable resource groups
1. Set the `gp_resource_manager` server configuration parameter to the value `"group"`
gpconfig -c gp_resource_manager -v "group"
2. Restart the YMatrix database cluster
mxstop -arf
When enabled, any transactions submitted by a role will be directed to the resource group assigned to the role and will be subject to the concurrency, memory, and CPU restrictions of the resource group.
YMatrix creates the resource groups `admin_group`, `default_group`, and `system_group` by default. When resource groups are enabled, any role that has not been explicitly assigned a resource group is assigned to the default group for its role type: `SUPERUSER` roles are assigned to `admin_group`, non-admin roles are assigned to `default_group`, and system processes are assigned to `system_group`. No role can be manually assigned to `system_group`.
Each role resource group is configured as follows.
Parameters | admin_group | default_group | system_group
--- | --- | --- | ---
CONCURRENCY | 10 | 5 | 0
CPU_MAX_PERCENT | 10 | 20 | 10
CPU_WEIGHT | 100 | 100 | 100
CPUSET | -1 | -1 | -1
IO_LIMIT | -1 | -1 | -1
MEMORY_QUOTA | -1 | -1 | -1
MIN_COST | 0 | 0 | 0
## Create a resource group
The `CREATE RESOURCE GROUP` command creates a new resource group. When creating a resource group for a role, provide the name and CPU resource allocation pattern (core or percentage). The `CPU_MAX_PERCENT` or `CPUSET` limit value must be provided.
**Usage Example**
Create a resource group named `rgroup1` with a concurrency limit of 20, a CPU limit of 20, a memory quota of 250, a CPU weight of 500, and a minimum cost of 50:
CREATE RESOURCE GROUP rgroup1 WITH (CONCURRENCY=20, CPU_MAX_PERCENT=20, MEMORY_QUOTA=250, CPU_WEIGHT=500, MIN_COST=50);
The CPU and memory limits are shared by all roles assigned to `rgroup1`.

The `ALTER RESOURCE GROUP` command updates the limits of a resource group.
ALTER RESOURCE GROUP rg_role_light SET CONCURRENCY 7;
ALTER RESOURCE GROUP exec SET MEMORY_QUOTA 30;
ALTER RESOURCE GROUP rgroup1 SET CPUSET '1;2,4';
> ***Note!***
> The `CONCURRENCY` value of `admin_group` cannot be set or changed to 0.
The `DROP RESOURCE GROUP` command deletes a resource group. To drop a resource group, the group cannot be assigned to any role, and there cannot be any active or waiting transactions in the resource group.
DROP RESOURCE GROUP exec;
## Configure automatic query termination based on memory usage
YMatrix supports Runaway detection: queries managed by resource groups can be automatically terminated based on the amount of memory they use.
The relevant configuration parameters are as follows:
- `gp_vmem_protect_limit`: Sets the amount of memory that all postgres processes in the active segment instance can consume. If a query causes this limit to be exceeded, no memory will be allocated and the query will fail.
- `runaway_detector_activation_percent`: When resource groups are enabled, if the amount of memory used exceeds `gp_vmem_protect_limit` * `runaway_detector_activation_percent`, YMatrix terminates queries managed by resource groups (excluding queries in the `system_group` resource group) based on memory usage. Termination starts with the query that consumes the largest amount of memory and continues until memory usage falls below the specified percentage.
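To check the current values of these parameters, you can run `SHOW` from any SQL session, for example:

```sql
-- Inspect the Runaway-detection parameters described above.
SHOW gp_vmem_protect_limit;
SHOW runaway_detector_activation_percent;
```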
## Assign resource groups to roles
- Assign resource groups to database roles using the `RESOURCE GROUP` clause of the `CREATE ROLE` or `ALTER ROLE` command.
ALTER ROLE bill RESOURCE GROUP rg_light;
CREATE ROLE mary RESOURCE GROUP exec;
Resource groups can be assigned to one or more roles. If the hierarchy of a role has been defined, the resource group assigned to the parent role is not propagated down to members of that role group.
- To remove the resource group assignment from a role and assign the role to the default group, change the role's resource group assignment to `NONE`.
ALTER ROLE mary RESOURCE GROUP NONE;
## Monitor resource group status
- View resource group restrictions
SELECT * FROM gp_toolkit.gp_resgroup_config;
- Check the resource group query status
SELECT * FROM gp_toolkit.gp_resgroup_status;
- Check the memory usage of resource groups on each host
SELECT * FROM gp_toolkit.gp_resgroup_status_per_host;
- View resource groups assigned to roles
SELECT rolname, rsgname FROM pg_roles, pg_resgroup WHERE pg_roles.rolresgroup=pg_resgroup.oid;
- View the running and pending queries of resource groups
SELECT query, rsgname, wait_event_type, wait_event FROM pg_stat_activity;
- Cancel running or queued transactions in resource groups
To manually cancel a running or queued transaction, you must first determine the process id or pid associated with the transaction. After obtaining the process id, you can call `pg_cancel_backend()` to end the process.
The specific steps are as follows:
- First run the following query to view the process information associated with all statements currently active or idle in all resource groups. If the query does not return any results, there are no running or queued transactions in the resource group.
SELECT rolname, g.rsgname, pid, waiting, state, query, datname
FROM pg_roles, gp_toolkit.gp_resgroup_status g, pg_stat_activity
WHERE pg_roles.rolresgroup=g.groupid
AND pg_stat_activity.usename=pg_roles.rolname;
- Query result example
```
 rolname | rsgname  |  pid  | waiting | state  |          query           | datname
---------+----------+-------+---------+--------+--------------------------+---------
 sammy   | rg_light | 31861 | f       | idle   | SELECT * FROM mytesttbl; | testdb
 billy   | rg_light | 31905 | t       | active | SELECT * FROM topten;    | testdb
```

- To cancel a transaction, call `pg_cancel_backend()` with its pid. For example, to cancel the waiting query above:

SELECT pg_cancel_backend(31905);
> ***Note!***
> Do not use the operating system `KILL` command to cancel any YMatrix database process.
Users with superuser permissions can run the `gp_toolkit.pg_resgroup_move_query()` function to move a running query from one resource group to another without stopping the query. Use this function to speed up long-running queries by moving them to a resource group with a larger resource allocation or more available resources.

`pg_resgroup_move_query()` moves only the specified query to the target resource group; subsequent queries submitted in the session are still assigned to the original resource group.
> ***Note!***
> Only active or running queries can be moved to a new resource group. Queued or pending queries that are in an idle state due to concurrency or memory limits cannot be moved.

`pg_resgroup_move_query()` requires the process id (pid) of the running query and the name of the resource group to which the query will be moved:
pg_resgroup_move_query( pid int4, group_name text );
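For example, to move the running query with pid 31905 (from the earlier monitoring example) to another group, here a hypothetical `rg_heavy`:

```sql
-- Move the running query with pid 31905 into the (hypothetical) resource group rg_heavy.
SELECT gp_toolkit.pg_resgroup_move_query(31905, 'rg_heavy');
```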
As described in Cancel running or queued transactions in resource groups, you can use the `gp_toolkit.gp_resgroup_status` view to list the name, id, and status of each resource group.
When the `pg_resgroup_move_query()` function is called, the running query becomes subject to the target resource group's configuration, including the concurrent transaction limit, memory limits, and so on.

- If the target group has no free slot and `gp_resource_group_queuing_timeout` is set, the move request queues for the specified number of milliseconds.
- `pg_resgroup_move_query()` attempts to hand over slot control to the target process for at most the number of milliseconds specified by `gp_resource_group_move_timeout`. If the target process cannot handle the move request within that time, the database returns an error message.
- If the call to `pg_resgroup_move_query()` is canceled but the target process has already obtained all slot control, the segment processes are not moved to the new group and the target process retains the slots. This inconsistent state is fixed at the end of the transaction, or by the next command dispatched by the target process in the same transaction.

After moving a query, there is no guarantee that the queries currently running in the target resource group will not exceed that group's memory quota. In this case, one or more running queries in the target group may fail, including the moved query. You can minimize the chance of this happening by reserving enough resource group global shared memory.