mxbackup

This document introduces mxbackup, a parallel backup tool for clusters.

1 Description

  • mxbackup makes parallel backups of clusters. Backup data must be restored with mxrestore.
  • mxbackup supports an S3 object storage plug-in, which streams backup data directly to S3 object storage without additional local I/O overhead. Backing up and restoring data via S3 requires an account with the appropriate bucket permissions and a YAML configuration file. See below for the specific configuration parameters and usage.

2 Parameter information

2.1 mxbackup command line parameters

Parameter name Description
--backup-dir directory Absolute path of the directory to write backup files to
--compression-level level Compression level, range 1-9; default 1
--data-only Back up data only, not the schema
--dbname db Database to back up
--debug Print log messages at debug level
--exclude-schema schema Schema to exclude from the backup; may be specified multiple times
--exclude-schema-file file File listing the schemas to exclude
--exclude-table table Table to exclude from the backup; may be specified multiple times
--exclude-table-file file File listing the tables to exclude
--from-timestamp timestamp Starting timestamp for an incremental backup; must be used with the --incremental parameter
--help Show help information
--history Show the historical timestamps in the current backup directory
--include-schema schema Schema to back up; may be specified multiple times
--include-schema-file file File listing the schemas to back up
--include-table table Table to back up; may be specified multiple times
--include-table-file file File listing the tables to back up
--incremental Perform an incremental backup (suitable for AO tables); must be used with the --from-timestamp parameter
--jobs num Number of concurrent connections during backup; default 1
--leaf-partition-data Create a separate data file for each leaf partition when backing up partitioned tables
--metadata-only Back up metadata only, not table data
--no-compression Do not compress table data
--plugin-config file Location of the plug-in configuration file
--quiet Suppress all log messages except warnings and errors
--single-data-file Back up all data to a single file instead of one file per table
--verbose Print detailed log messages
--version Print the tool version and exit
--with-stats Back up statistics
--without-globals Do not back up global metadata
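Several of the flags above are repeatable or only valid in combination (for example, --include-schema may appear multiple times, and --incremental requires --from-timestamp). As a minimal illustrative sketch, the Python helper below assembles an mxbackup argument list from a few of these flags; `build_backup_cmd` is a hypothetical function for illustration, not part of mxbackup.

```python
# Illustrative only: assemble an mxbackup argument list from the flags in the
# table above. build_backup_cmd is a hypothetical helper, not part of mxbackup.
def build_backup_cmd(dbname, backup_dir=None, include_schemas=(), jobs=1):
    cmd = ["mxbackup", "--dbname", dbname]
    if backup_dir:
        cmd += ["--backup-dir", backup_dir]   # absolute path for backup files
    for schema in include_schemas:
        cmd += ["--include-schema", schema]   # repeatable flag, once per schema
    if jobs != 1:
        cmd += ["--jobs", str(jobs)]          # concurrent connections
    return cmd

print(build_backup_cmd("demo", "/home/mxadmin/backup", ["twitter"], jobs=4))
```

The resulting list can be passed to a process launcher such as `subprocess.run` on the host where mxbackup is installed.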

2.2 S3 Object Storage Plug-in Configuration File Parameters

Parameter name Description Required
executablepath Absolute path to the S3 storage plug-in Yes
region Cloud platform region; ignored if endpoint is configured Yes
aws_access_key_id Access key ID used to connect to the S3 bucket Yes
aws_secret_access_key Secret access key paired with the access key ID Yes
bucket S3 bucket used to store mxbackup data files Yes
folder Directory on S3 where the backup data is stored No
endpoint S3 endpoint URL No
encryption Whether SSL encryption is enabled for S3; valid values are on and off, default on No
http_proxy URL of the HTTP proxy server used to connect to S3 No
backup_max_concurrent_requests Number of concurrent requests for mxbackup; default 6 No
backup_multipart_chunksize Maximum cache/chunk size for mxbackup; default 500MB No
restore_max_concurrent_requests Number of concurrent requests for mxrestore; default 6 No
restore_multipart_chunksize Maximum cache/chunk size for mxrestore; default 500MB No

The configuration file template is as follows. Keep only the parameter lines you need, replacing the contents of "<>" or "[]" (including the "<>" and "[]" symbols themselves) with actual values.

executablepath: <absolute-path-to-gpbackup_s3_plugin>
options:
  region: <cloud-platform-region>
  endpoint: <S3-endpoint>
  aws_access_key_id: <access-key-ID>
  aws_secret_access_key: <secret-access-key>
  bucket: <S3-bucket>
  folder: <directory-on-S3-for-backup-data>
  encryption: [on|off]
  backup_max_concurrent_requests: [int]
  backup_multipart_chunksize: [string]
  restore_max_concurrent_requests: [int]
  restore_multipart_chunksize: [string]
  http_proxy: <http://<username>:<secret-key>@proxy.<domain>.com:port>
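Since a typo in a required key only surfaces once a backup is attempted, it can help to render and sanity-check the configuration before use. The sketch below is a minimal illustration, assuming the key names and two-level layout shown in the template above; `render_s3_config` is a hypothetical helper, not part of mxbackup.

```python
# Minimal sketch: render the S3 plug-in configuration file and check the
# required keys from the table above. render_s3_config is hypothetical.
def render_s3_config(executablepath, options):
    missing = {"aws_access_key_id", "aws_secret_access_key", "bucket"} - options.keys()
    if "region" not in options and "endpoint" not in options:
        missing.add("region")  # region is required unless endpoint is configured
    if missing:
        raise ValueError(f"missing required options: {sorted(missing)}")
    lines = [f"executablepath: {executablepath}", "options:"]
    lines += [f"  {key}: {value}" for key, value in options.items()]
    return "\n".join(lines) + "\n"

config = render_s3_config(
    "/usr/local/bin/gpbackup_s3_plugin",   # assumed install path
    {"region": "us-west-2",
     "aws_access_key_id": "test-s3-user",
     "aws_secret_access_key": "asdf1234asdf",
     "bucket": "matrixdb-backup"},
)
print(config)
```

The returned string can then be written to the file passed via --plugin-config.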

3 Examples

3.1 mxbackup Basic Functions

In the following examples, the database is named demo and the schema is named twitter.

Back up the database.

$ mxbackup --dbname demo

Back up the data in the demo database, excluding the twitter schema.

$ mxbackup --dbname demo --exclude-schema twitter

Back up only the twitter schema in the demo database.

$ mxbackup --dbname demo --include-schema twitter

Back up the demo database and store the backup data in the /home/mxadmin/backup directory.

$ mxbackup --dbname demo --backup-dir /home/mxadmin/backup
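When many schemas or tables are involved, the --include-schema-file and --exclude-schema-file flags take a list file instead of repeated flags. The sketch below generates such a file, assuming one schema name per line (verify the expected format against your mxbackup version); the file name `schemas.txt` is illustrative.

```python
import os
import tempfile

# Assumed list-file format: one schema name per line.
schemas = ["twitter", "ads"]
path = os.path.join(tempfile.mkdtemp(), "schemas.txt")
with open(path, "w") as f:
    f.write("\n".join(schemas) + "\n")

# The corresponding invocation would then be:
#   mxbackup --dbname demo --include-schema-file <path>
print(open(path).read())
```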

3.2 S3 object storage plug-in

3.2.1 Preparation for use

Before using S3 object storage to back up and restore your data, you need a cloud platform account with bucket permissions, including but not limited to the following:

  • Upload and delete files in S3.
  • Open, browse, and download files in S3.

3.2.2 Examples of usage

First, prepare the S3 object storage plug-in configuration file s3-config-file.yaml. This example configures some common parameters; for more parameter descriptions, see Section 2.2 above.

executablepath: $GPHOME/bin/mxbackup_s3_plugin  # absolute path to the S3 storage plug-in
options:
  region: us-west-2  # cloud platform region
  aws_access_key_id: test-s3-user  # ID for logging in to S3
  aws_secret_access_key: asdf1234asdf  # secret key for logging in to S3
  bucket: matrixdb-backup  # S3 bucket
  folder: backup3  # directory for the data stored in S3 object storage

Then, use the mxbackup tool to back up the data in the demo database in parallel.

$ mxbackup --dbname demo --plugin-config /tmp/s3-config-file.yaml

After the backup succeeds, a timestamp directory is generated in the S3 object store, and you can then use the mxrestore tool to restore the data from S3.

backup3/backups/20221208/20221208185654
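The final path component is the backup timestamp in YYYYMMDDHHMMSS form, which is also the kind of value passed to --from-timestamp for incremental backups. As a small illustration, it can be parsed with Python's standard library:

```python
from datetime import datetime

# Parse the timestamp component of a generated backup path like the one above.
path = "backup3/backups/20221208/20221208185654"
ts = datetime.strptime(path.rsplit("/", 1)[-1], "%Y%m%d%H%M%S")
print(ts.isoformat())  # → 2022-12-08T18:56:54
```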

Notes!
The log file of the mxbackup tool is <gpadmin_home>/gpAdminLogs/gpbackup_s3_plugin_timestamp.log, where the timestamp format is YYYYMMDDHHMMSS.

Notes!
For more information about backing up and restoring data in YMatrix, see Backup and Restore; for information about the data restore tool, see mxrestore.