MARS3 Automatic Tiered Storage

This document describes how YMatrix supports automatic data tiering to object storage (such as AWS S3), enabling fully automated hot and cold data tiering.

Note!
This feature is only supported for MARS3 tables.

1 Background

IoT scenarios generate massive amounts of data. While YMatrix emphasizes the importance of hot data (real-time data), it also focuses on optimizing storage performance and usability for cold data (historical data). For example, transaction orders on e-commerce platforms are frequently accessed within 7 days, but access frequency drops significantly after the return/refund period expires. However, these expired orders still consume substantial storage space. In such cases, reducing storage costs becomes a critical concern.

To address cold data storage, YMatrix has developed a data degradation feature. It enables storing hot and cold data on different media: hot data remains on high-performance storage such as standard storage, local SSDs, or HDDs, while cold data is automatically moved to cost-effective capacity-tier storage like object storage. Capacity-tier storage costs only 20% of standard storage, significantly lowering overall storage expenses. Once migrated, this data seamlessly integrates with YMatrix's powerful Massively Parallel Processing (MPP) architecture, achieving efficient data tiering and storage optimization.

Beyond direct storage cost savings, the data degradation process requires no manual intervention, minimizing operational and management overhead.

Most existing tiered storage solutions rely on FDW (Foreign Data Wrapper). In contrast, YMatrix treats object storage as an equivalent storage medium to local file systems and integrates it directly into the MARS3 storage engine. Capabilities available for regular file systems are equally applicable to degraded data.

This means MARS3 tiered storage offers greater flexibility:

 Feature                       | YMatrix (MARS3 Tiered Storage) | FDW-based Solutions
-------------------------------+--------------------------------+-----------------------------
 Transaction Support           | Yes                            | No
 Automatic Cold Data Migration | Yes                            | No
 Management Complexity         | No manual setup required       | Requires complex operations
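
For example, because degraded data keeps full transaction support, ordinary DML against it can be rolled back like any other MARS3 data. A minimal sketch (using the table t1 created in the quick start below, after some rows have been migrated):

=# BEGIN;
=# DELETE FROM t1 WHERE key < 100;  -- may touch rows already migrated to object storage
=# ROLLBACK;                        -- the delete is undone, just as with local data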

2 Quick Start

This section walks you through a complete workflow to help you quickly experience the process and results of data tiering.

2.1 Create Object Storage

We use AWS S3 as an example.

First, log in to the AWS console, click your username at the top right, and select Security credentials. Scroll down to Access keys and create a new access key; since we need to migrate cold data from YMatrix to an AWS S3 bucket, choose Applications running outside of Amazon Web Services as the use case, then complete the creation. Finally, create (or pick) an S3 bucket to hold the degraded data.

After creating the bucket, collect the following information:

  • Access Key
  • Secret Key
  • Region
  • Bucket
  • Endpoint: https://<your_bucket>.s3.<your_region>.amazonaws.com.cn
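
Before proceeding, you can optionally sanity-check the credentials and bucket with the AWS CLI (assuming it is installed and configured with the key pair above):

$ aws s3 ls s3://<your_bucket>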

2.2 Initialize Test Cluster

Create a tablespace directory on both the Master and Segment servers of your test cluster.

$ mkdir -p <your_tablespace_dir>    # e.g. /home/mxadmin/test
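
On a multi-host cluster the directory must exist on every host. A convenient sketch, assuming the gpssh utility and a host file listing all servers:

$ gpssh -f <your_hostfile> -e 'mkdir -p /home/mxadmin/test'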

2.3 Initialize Tiered Storage

Use vi quick.sql to create a file named quick.sql with the following content:

-- Create matrixts extension
CREATE EXTENSION IF NOT EXISTS matrixts;

-- Configure system parameters
\!gpconfig -c mars3.enable_objectstore -v ON --skipvalidation
\!gpconfig -c mars3.degrade_min_run_size -v 0 --skipvalidation
\!gpconfig -c mars3.degrade_probe_interval -v 300 --skipvalidation

-- Create tablespace your_tbs
DROP TABLESPACE IF EXISTS your_tbs;

CREATE TABLESPACE your_tbs LOCATION :tablespace_dir
WITH(
        oss_accessid=:oss_accessid,
        oss_secretkey=:oss_secretkey,
        oss_endpoint=:oss_endpoint,
        oss_region=:oss_region,
        oss_bucket=:oss_bucket
);

Replace the values in angle brackets (including the brackets themselves) and execute:

$ psql -be -d <your_database> \
                  -v tablespace_dir="'<your_tablespace_dir>'" \
                  -v oss_accessid="'<your_oss_accessid>'" \
                  -v oss_secretkey="'<your_oss_secretkey>'" \
                  -v oss_endpoint="'<your_oss_endpoint>'" \
                  -v oss_region="'<your_oss_region>'" \
                  -v oss_bucket="'<your_oss_bucket>'" \
                  -f quick.sql

Reload configuration:

$ mxstop -u
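
You can confirm that the parameters took effect, for example:

$ gpconfig -s mars3.enable_objectstore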

2.4 Create Table with Tiering Enabled

Use vi quick2.sql to create quick2.sql containing:

\set tname t1
\set tnamestr '''t1'''

-- Create test table; configure migration of data older than 7 days
DROP TABLE IF EXISTS :tname;
CREATE TABLE IF NOT EXISTS
:tname(
   key int,
   value int,
   c3 date)
USING mars3
WITH(ttl_interval='7 d', ttl_space='your_tbs')
DISTRIBUTED BY (key)
ORDER BY (key, value);

-- Insert test data
INSERT INTO :tname SELECT i, i, current_date FROM generate_series(1, 12000000) i;

Run quick2.sql:

$ psql -be -d <your_database> -f quick2.sql

2.5 Manual Migration (For Demonstration Only)

Use the built-in function to manually migrate all data to object storage:

$ psql <your_database>
=# SELECT matrixts_internal.mars3_degrade('t1'::regclass, -1);

2.6 Verify Migration Success

After executing the above command, all data from t1 should be stored in object storage. The cluster retains only a "shell" for accessing the data. Verify access using:

=# \set tname t1

SELECT * FROM :tname LIMIT 10;
DELETE FROM :tname WHERE key < 50;
SELECT * FROM :tname WHERE key < 50;

You can also check the object storage console to confirm the data is present. To prove the data really lives there, try deleting the migrated objects from the console and then re-accessing the table:

=# SELECT * FROM :tname WHERE key < 50;

The database returns an error such as key not exist, confirming success: the accessed data now resides entirely in object storage.

3 Full Features

  • Authentication Modes: supports two authentication methods, static AccessId/SecretKey and dynamic AccessId/SecretKey via Security Token Service (STS). Configured through tablespace creation parameters and the system parameter mars3.degrade_credential_update_interval.
  • Migration Policy Management: controls migration policies, including the criteria for identifying historical data and the interval for background checks. Configured through ttl_interval, ttl_space, mars3.enable_objectstore, and mars3.degrade_probe_interval.
  • Query Caching: improves I/O efficiency and query performance by caching data retrieved from object storage. Configured through matrixts.enable_object_cache, mars3.enable_object_prefetch, and oss_cache_size.
  • Operations Tools: a link verification tool that checks whether the environment meets prerequisites, and a garbage scanner that identifies orphaned objects and allows manual cleanup.

3.1 Connecting to Object Storage with Different Authentication Modes

YMatrix supports two authentication modes—choose one:

  1. The quick start uses static credentials (AWS S3):

    =# CREATE TABLESPACE <your_tablespace> LOCATION '<your_tablespace_dir>'
    WITH(
     oss_accessid='<your_accessid>',
     oss_secretkey='<your_secretkey>',
     oss_endpoint='<your_address>',
     oss_bucket='<your_bucket>'
    );
  2. For STS mode, you must additionally provide a script that refreshes the credentials periodically:

-- Create tablespace
CREATE TABLESPACE <your_tablespace> LOCATION '<your_tablespace_dir>'
WITH(
        oss_endpoint='<your_address>',
        oss_bucket='<your_bucket>'
);

Then use a shell script like the following to refresh oss_accessid/oss_secretkey/oss_sessiontoken regularly:

#!/bin/bash
DB="postgres"
SPACE="tbs"

RES=$(aws sts get-session-token)
if [ $? != 0 ]
then
        echo "get credentials fail!"
        exit 1
fi

ak=$(echo "${RES}"|jq -r .Credentials.AccessKeyId)
sk=$(echo "${RES}"|jq -r .Credentials.SecretAccessKey)
token=$(echo "${RES}"|jq -r .Credentials.SessionToken)

if [ "${ak}" == "" ]
then
        echo "AccessKeyId is empty!"
        exit 1
fi

if [ "${sk}" == "" ]
then
        echo "SecretAccessKey is empty!"
        exit 1
fi

if [ "${token}" == "" ]
then
        echo "SessionToken is empty!"
        exit 1
fi

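# Debug output; avoid printing credentials in production logs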
echo $ak
echo $sk
echo $token

psql -d "${DB}" -c "ALTER TABLESPACE ${SPACE} SET (oss_accessid='${ak}', oss_secretkey='${sk}', oss_sessiontoken='${token}');"
if [ $? != 0 ]
then
        echo "alter tablespace fail!"
        exit 1
fi

To schedule periodic credential updates, run this script from a job scheduler such as cron; the related system parameter is mars3.degrade_credential_update_interval (see the feature list above).
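
A hedged example of driving the refresh from cron (script path and interval are placeholders; adjust to your environment):

$ crontab -e
# Refresh STS credentials every 30 minutes (hypothetical path)
*/30 * * * * /home/mxadmin/refresh_sts_credentials.sh >> /home/mxadmin/sts_refresh.log 2>&1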

Note!
See CREATE TABLESPACE for details on tablespace parameters.

3.2 Migration Policy Management

Enable data migration first by turning on mars3.enable_objectstore and setting the background probe interval, then reload the configuration:

$ gpconfig -c mars3.enable_objectstore -v ON --skipvalidation
$ gpconfig -c mars3.degrade_probe_interval -v 300 --skipvalidation
$ mxstop -u

Set migration policy during table creation using ttl_interval and ttl_space:

=# CREATE TABLE <your_table> (
  ...
) 
...
WITH (ttl_interval='<your_interval>', ttl_space='<your_tablespace>')
...
;

Modify policy on existing tables:

=# ALTER TABLE <your_table> SET (ttl_interval='14d');
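
To inspect the policy currently set on a table, you can read its storage options from the catalog (standard PostgreSQL):

=# SELECT reloptions FROM pg_class WHERE relname = '<your_table>';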

3.3 Query Caching

Caching reduces I/O overhead when querying cold data from object storage, improving performance. It is controlled by the session parameters matrixts.enable_object_cache and mars3.enable_object_prefetch, together with the tablespace option oss_cache_size.

Example: set a 10,000MB cache for a tablespace:

=# CREATE TABLESPACE <your_tablespace> LOCATION '<your_tablespace_dir>'
WITH(
    ...
    oss_cache_size=10000,
    ...
);

Enable caching in session:

=# SET matrixts.enable_object_cache TO ON;
=# SET mars3.enable_object_prefetch TO ON;

Note!
Each backend process requires 64MB of memory for the file cache and 64MB for the memory cache, so a query across 10 segments may require up to 1280MB (10 × 128MB) of extra memory. For optimal performance, enable both caches.
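
To enable the caches cluster-wide rather than per session, the same gpconfig pattern used elsewhere in this document should apply (a sketch, not verified for these specific parameters):

$ gpconfig -c matrixts.enable_object_cache -v on --skipvalidation
$ gpconfig -c mars3.enable_object_prefetch -v on --skipvalidation
$ mxstop -u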

3.4 Operations Tools

3.4.1 Data Link Check

Before enabling tiered storage on a production cluster, use the link checker to verify environmental compatibility.

The osscheck tool is located in the $GPHOME/bin directory.

Steps:

  1. Generate config template:

    $ osscheck -g conf
  2. Edit configuration file

Required fields: oss_accessid, oss_secretkey, oss_endpoint, oss_bucket
For STS: include oss_sessiontoken
Optional prefix: oss_keyprefix

Template (replace the values in angle brackets, including the brackets themselves):

oss_accessid='<your_oss_accessid>'
oss_secretkey='<your_oss_secretkey>'
oss_sessiontoken='<your_oss_sessiontoken>'
oss_endpoint='<your_oss_endpoint>'
oss_region='<your_oss_region>'
oss_bucket='<your_oss_bucket>'
oss_keyprefix='<your_oss_keyprefix>'
oss_retry_mode=default
oss_upload_partsize=5
oss_upload_parallelnum=3
oss_retry_times=3
oss_timeout=3000
  3. Run link check:
    $ osscheck -t conf
    put object success
    get object success
    list object success
    delete object success

On success, each check prints success in green; on failure, it prints fail in red:

$ osscheck -t conf.template
ERROR: InvalidAccessKeyId: The Access Key Id you provided does not exist in our records.
put object fail

3.4.2 Garbage Scanner

Identifies unexpected orphaned objects in object storage.

Causes of garbage data:

  • Failed deletion of object data when dropping object tables
  • Direct database or cluster deletion without removing associated object data
  • Manual errors (e.g., forgotten test files)

Notes:

  • Uses matrixts_internal.oss_junkobjects() to query the catalog
  • Requires tablespace configuration; scans only under the current oss_bucket/oss_keyprefix/ path
  • Ensure this path is dedicated to the current cluster

Usage:

=# CREATE EXTENSION IF NOT EXISTS matrixts;
=# SELECT * FROM matrixts_internal.oss_junkobjects('<your_tablespace>');
                 key                 |       lastmodify       |   size
-------------------------------------+------------------------+----------
 49948/2/52751/4/0                   | 2023-10-19 16:00:14+08 |    38964
 49948/2/53021/1/0                   | 2023-10-19 16:04:55+08 |    38964
 16384/2/158581/3/0                  | 2023-09-03 13:16:26+08 |     4080
 16384/2/158581/p2                   | 2023-09-03 15:18:36+08 |        5
 16384/2/202151/3/0                  | 2023-09-21 16:22:09+08 |       80
 49948/1/52751/4/0                   | 2023-10-19 16:00:14+08 |    40620
 49948/1/52861/1/0                   | 2023-10-19 16:01:34+08 |   240000
 49948/1/53021/1/0                   | 2023-10-19 16:04:55+08 |    40620
 49948/1/53038/1/0                   | 2023-10-19 16:05:10+08 |   120000
 16384/1/158581/3/0                  | 2023-09-03 13:16:26+08 |     3864
 16384/1/202151/3/0                  | 2023-09-21 16:22:09+08 |       20
 49948/0/52751/4/0                   | 2023-10-19 16:00:14+08 |    40416
 49948/0/53021/1/0                   | 2023-10-19 16:04:55+08 |    40416
 ......

Fields:

  • key: Orphaned object key
  • lastmodify: Last modified timestamp
  • size: Object size in bytes
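
After reviewing the list, orphaned objects can be removed manually with your object store's CLI. For example, with the AWS CLI (the key below comes from the sample output; prepend your oss_keyprefix if one is configured):

$ aws s3 rm s3://<your_bucket>/49948/2/52751/4/0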

4 Frequently Asked Questions

  1. Must the table have a time column for tiering?

Yes. The time column must be part of the partition key and be of type date or timestamp. Time is used to determine cold data eligibility. Currently, tables without a time-based partition key cannot use tiering.

Note!
Tables with multiple time columns in the partition key are not supported.
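
For instance, a tiering-eligible table might look like the following sketch (names are placeholders; the partition clause follows classic Greenplum syntax, so consult the CREATE TABLE reference for exact MARS3 clause ordering):

=# CREATE TABLE metrics (
   ts timestamp,
   device int,
   value float8)
USING mars3
WITH (ttl_interval='30 d', ttl_space='your_tbs')
DISTRIBUTED BY (device)
ORDER BY (device, ts)
PARTITION BY RANGE (ts)
(START ('2023-01-01') END ('2024-01-01') EVERY (INTERVAL '1 month'));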

  2. How often is migration checked?

The default is every hour; adjust it via mars3.degrade_probe_interval.
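
For example, to probe every 10 minutes instead (the quick start sets this value to 300, suggesting it is expressed in seconds):

$ gpconfig -c mars3.degrade_probe_interval -v 600 --skipvalidation
$ mxstop -u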

  3. How to temporarily disable auto-migration?

Set and reload:

$ gpconfig -c mars3.enable_objectstore -v off --skipvalidation
$ mxstop -u
  4. How many tablespaces can one cluster support?

Theoretically unlimited, but cache performance degrades with multiple tablespaces. Recommend one object storage tablespace per cluster.

  5. Is an object storage tablespace shared across databases?

Yes, it is shared.

  6. Can I enable only the memory cache or only the file cache?

Yes, either can be used independently. However, for best performance, enable both.

  7. Can partition table settings be inherited by child partitions?

No; settings are inherited only when a child partition is created or attached. Running ALTER TABLE on the parent does not propagate changes to existing children.

  8. How much additional memory does tiered storage use?

"Additional memory" refers to: base memory + auto-tiering enabled + concurrent queries + both caches enabled.

Note!
This estimates a theoretical worst-case scenario.

Formula:

Additional memory = 
  Concurrent client queries × 
  Segments per node × 
  Average tables scanned per query × 
  (Parallel Workers + 1) × 
  (64MB + 64MB + 10MB) +
  Number of partitioned tables × 
  Segments per node × 
  64MB
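
As a worked example with hypothetical numbers: 5 concurrent queries × 4 segments per node × 2 tables per query × (2 + 1) parallel workers × 138MB, plus 10 partitioned tables × 4 segments × 64MB, gives 16,560MB + 2,560MB ≈ 18.7GB of additional memory per node in the worst case.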