mxbackup

This document introduces mxbackup, a parallel backup tool for clusters.

1 Description

  • mxbackup makes parallel backups of clusters. Backup data must be restored with mxrestore.
  • mxbackup supports an S3 object storage plug-in, which streams backup data directly to S3 object storage without additional local I/O overhead. Backing up and restoring data via S3 requires an account with the appropriate bucket permissions and a YAML configuration file. See below for the specific configuration parameters and usage.

2 Parameter information

2.1 mxbackup command line parameters

Parameter name Description
--backup-dir directory Absolute path of the directory to write backup files to
--compression-level level Compression level, range 1-9; default 1
--data-only Back up data only, not the schema
--dbname db Database to back up
--debug Print log messages at debug level
--exclude-schema schema Schema to exclude from the backup; may be specified multiple times
--exclude-schema-file file File listing the schemas to exclude
--exclude-table table Table to exclude from the backup; may be specified multiple times
--exclude-table-file file File listing the tables to exclude
--from-timestamp timestamp Starting timestamp for an incremental backup; must be used with the --incremental parameter
--help Show help information
--history Show the historical timestamps in the current backup directory
--include-schema schema Schema to back up; may be specified multiple times
--include-schema-file file File listing the schemas to back up
--include-table table Table to back up; may be specified multiple times
--include-table-file file File listing the tables to back up
--incremental Perform an incremental backup (suitable for AO tables); must be used with the --from-timestamp parameter
--jobs num Number of concurrent connections during backup; default 1
--leaf-partition-data Create a separate data file for each leaf partition when backing up partitioned tables
--metadata-only Back up metadata only, not table data
--no-compression Do not compress table data
--plugin-config file Location of the plug-in configuration file
--quiet Suppress all log messages except warnings and errors
--single-data-file Back up all data to a single file instead of one file per table
--verbose Print detailed log messages
--version Print the tool version and exit
--with-stats Back up statistics
--without-globals Do not back up global metadata
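Several of the flags above are repeatable or only valid in combination (for example, --include-schema may appear multiple times, and --incremental requires --from-timestamp). As a minimal illustrative sketch, the Python helper below assembles an mxbackup argument list from a few of these flags; `build_backup_cmd` is a hypothetical function for illustration, not part of mxbackup.

```python
# Illustrative only: assemble an mxbackup argument list from the flags in the
# table above. build_backup_cmd is a hypothetical helper, not part of mxbackup.
def build_backup_cmd(dbname, backup_dir=None, include_schemas=(), jobs=1):
    cmd = ["mxbackup", "--dbname", dbname]
    if backup_dir:
        cmd += ["--backup-dir", backup_dir]   # absolute path for backup files
    for schema in include_schemas:
        cmd += ["--include-schema", schema]   # repeatable flag, once per schema
    if jobs != 1:
        cmd += ["--jobs", str(jobs)]          # concurrent connections
    return cmd

print(build_backup_cmd("demo", "/home/mxadmin/backup", ["twitter"], jobs=4))
```

The resulting list can be passed to a process launcher such as `subprocess.run` on the host where mxbackup is installed.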

2.2 S3 Object Storage Plug-in Configuration File Parameters

Parameter name Description Required
executablepath Absolute path to the S3 storage plug-in Yes
region Cloud platform region; ignored if endpoint is configured Yes
aws_access_key_id Access key ID used to connect to the S3 bucket Yes
aws_secret_access_key Secret access key paired with the access key ID Yes
bucket S3 bucket used to store mxbackup data files Yes
folder Directory on S3 where the backup data is stored No
endpoint S3 endpoint URL No
encryption Whether SSL encryption is enabled for S3; valid values are on and off, default on No
http_proxy URL of the HTTP proxy server used to connect to S3 No
backup_max_concurrent_requests Number of concurrent requests for mxbackup; default 6 No
backup_multipart_chunksize Maximum cache/chunk size for mxbackup; default 500MB No
restore_max_concurrent_requests Number of concurrent requests for mxrestore; default 6 No
restore_multipart_chunksize Maximum cache/chunk size for mxrestore; default 500MB No

The configuration file template is as follows. Keep only the parameter lines you need, replacing the contents of "<>" or "[]" (including the "<>" and "[]" symbols themselves) with actual values.

executablepath: <absolute-path-to-gpbackup_s3_plugin>
options:
  region: <cloud-platform-region>
  endpoint: <S3-endpoint>
  aws_access_key_id: <access-key-ID>
  aws_secret_access_key: <secret-access-key>
  bucket: <S3-bucket>
  folder: <directory-on-S3-for-backup-data>
  encryption: [on|off]
  backup_max_concurrent_requests: [int]
  backup_multipart_chunksize: [string]
  restore_max_concurrent_requests: [int]
  restore_multipart_chunksize: [string]
  http_proxy: <http://<username>:<secret-key>@proxy.<domain>.com:port>
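Since a typo in a required key only surfaces once a backup is attempted, it can help to render and sanity-check the configuration before use. The sketch below is a minimal illustration, assuming the key names and two-level layout shown in the template above; `render_s3_config` is a hypothetical helper, not part of mxbackup.

```python
# Minimal sketch: render the S3 plug-in configuration file and check the
# required keys from the table above. render_s3_config is hypothetical.
def render_s3_config(executablepath, options):
    missing = {"aws_access_key_id", "aws_secret_access_key", "bucket"} - options.keys()
    if "region" not in options and "endpoint" not in options:
        missing.add("region")  # region is required unless endpoint is configured
    if missing:
        raise ValueError(f"missing required options: {sorted(missing)}")
    lines = [f"executablepath: {executablepath}", "options:"]
    lines += [f"  {key}: {value}" for key, value in options.items()]
    return "\n".join(lines) + "\n"

config = render_s3_config(
    "/usr/local/bin/gpbackup_s3_plugin",   # assumed install path
    {"region": "us-west-2",
     "aws_access_key_id": "test-s3-user",
     "aws_secret_access_key": "asdf1234asdf",
     "bucket": "matrixdb-backup"},
)
print(config)
```

The returned string can then be written to the file passed via --plugin-config.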

3 Examples

3.1 mxbackup Basic Functions

In the following examples, the database is named demo and the schema is named twitter.

Back up the database.

$ mxbackup --dbname demo

Back up the data in the demo database, excluding the twitter schema.

$ mxbackup --dbname demo --exclude-schema twitter

Back up only the twitter schema in the demo database.

$ mxbackup --dbname demo --include-schema twitter

Back up the demo database and store the backup data in the /home/mxadmin/backup directory.

$ mxbackup --dbname demo --backup-dir /home/mxadmin/backup
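When many schemas or tables are involved, the --include-schema-file and --exclude-schema-file flags take a list file instead of repeated flags. The sketch below generates such a file, assuming one schema name per line (verify the expected format against your mxbackup version); the file name `schemas.txt` is illustrative.

```python
import os
import tempfile

# Assumed list-file format: one schema name per line.
schemas = ["twitter", "ads"]
path = os.path.join(tempfile.mkdtemp(), "schemas.txt")
with open(path, "w") as f:
    f.write("\n".join(schemas) + "\n")

# The corresponding invocation would then be:
#   mxbackup --dbname demo --include-schema-file <path>
print(open(path).read())
```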

3.2 S3 object storage plug-in

3.2.1 Preparation for use

Before using S3 object storage to back up and restore your data, you need a cloud platform account with bucket permissions, including but not limited to the following:

  • Upload and delete files in S3.
  • Open, browse, and download files in S3.

3.2.2 Examples of usage

First, prepare the S3 object storage plug-in configuration file s3-config-file.yaml. This example configures some common parameters; for more parameter descriptions, see Section 2.2 above.

executablepath: $GPHOME/bin/mxbackup_s3_plugin  # absolute path to the S3 storage plug-in
options:
  region: us-west-2  # cloud platform region
  aws_access_key_id: test-s3-user  # ID for logging in to S3
  aws_secret_access_key: asdf1234asdf  # secret key for logging in to S3
  bucket: matrixdb-backup  # S3 bucket
  folder: backup3  # directory for the data stored in S3 object storage

Then, use the mxbackup tool to back up the data in the demo database in parallel.

$ mxbackup --dbname demo --plugin-config /tmp/s3-config-file.yaml

After the backup succeeds, a timestamp directory is generated in the S3 object store, and you can then use the mxrestore tool to restore the data from S3.

backup3/backups/20221208/20221208185654
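The final path component is the backup timestamp in YYYYMMDDHHMMSS form, which is also the kind of value passed to --from-timestamp for incremental backups. As a small illustration, it can be parsed with Python's standard library:

```python
from datetime import datetime

# Parse the timestamp component of a generated backup path like the one above.
path = "backup3/backups/20221208/20221208185654"
ts = datetime.strptime(path.rsplit("/", 1)[-1], "%Y%m%d%H%M%S")
print(ts.isoformat())  # → 2022-12-08T18:56:54
```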

Notes!
The log file of the mxbackup tool is <gpadmin_home>/gpAdminLogs/gpbackup_s3_plugin_timestamp.log, where the timestamp format is YYYYMMDDHHMMSS.

Notes!
For more information about backing up and restoring data in YMatrix, see Backup and Restore; for information about the data restore tool, see mxrestore.