YMatrix and Greenplum TPC-H benchmark comparison test report

Test Overview


This report compares the performance of YMatrix and Greenplum (GPDB for short) in TPC-H analytical query scenarios. The test results show that YMatrix performs far better than GPDB at both the 100 and 1000 data scales, outperforming it by roughly 13x and 12x respectively.

TPC-H is a decision support benchmark consisting of a suite of business-oriented ad hoc queries and concurrent data modifications. The queries and the data populating the database have broad industry relevance. The benchmark demonstrates a decision support system's ability to examine large volumes of data, execute highly complex queries, and answer critical business questions, and it reflects many facets of a database system's query-processing capability.

Test environment


Hardware environment

| Data Scale | Machine | vCPU | RAM | Bandwidth | EBS |
| --- | --- | --- | --- | --- | --- |
| 100 | AWS EC2, m5.8xlarge | 32 | 128GB | 10Gbps | gp3, iops = 3000, throughput = 125MB/s |
| 1000 | AWS EC2, m5.8xlarge | 32 | 128GB | 10Gbps | io1, iops = 64000 |

Software Environment

The YMatrix TPC-H benchmark uses a single-server deployment with the default deployment configuration automatically selected by the YMatrix product. On the hardware environment described above, this yields 6 Segment nodes.
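
The number of primary Segment instances can be verified after deployment with a quick check like the following (a minimal sketch, assuming a psql connection to the coordinator; gp_segment_configuration is the standard Greenplum catalog, which YMatrix, being Greenplum-based, is assumed to expose as well):

     psql -c "SELECT count(*) FROM gp_segment_configuration WHERE role = 'p' AND content >= 0;"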

  • Operating system kernel: 3.10.0-1160.66.1.el7.x86_64

  • Operating system version: CentOS Linux release 7.9.2009

  • YMatrix: matrixdb5-5.0.0+enterprise_5.0.0 (Enterprise Edition); YMatrix cluster deployment instructions are available in the YMatrix official website documentation and in YMatrix's TPC-H benchmark tool.

  • Greenplum: open-source-greenplum-db-6.23.3-rhel7-x86_64

Test plan


YMatrix and GPDB were each tested at the 100 and 1000 data scales. Both were configured with 6 segments, and all test results were measured by YMatrix engineers on the same machine configuration.

The 100-scale and 1000-scale tests differ as follows:

  • Because the 100-scale data set can be fully cached in 128GB of memory, the 100-scale test uses gp3 disks. YMatrix was tested with both the lz4 and zstd compression formats (zstd with compresslevel=1); since open-source GPDB does not support the quicklz compression format, it was tested with zstd compression only (compresslevel=1).
  • At the 1000 data scale, higher-performance io1 disks are used, and both YMatrix and GPDB use zstd compression with compresslevel=1.
  • In this test, statement_mem is set to 1GB at the 100 data scale and to 2GB at the 1000 data scale.

Test data


TPC-H 100 data sizes:

| Table | Records | YMatrix (lz4) | YMatrix (zstd) | GPDB (zstd) |
| --- | --- | --- | --- | --- |
| nation | 25 | 5 MB | 5 MB | 1 MB |
| region | 5 | 4 MB | 4 MB | 1 MB |
| part | 20,000,000 | 1 GB | 690 MB | 592 MB |
| partsupp | 80,000,000 | 4 GB | 3 GB | 3 GB |
| supplier | 1,000,000 | 97 MB | 70 MB | 55 MB |
| customer | 15,000,000 | 1 GB | 969 MB | 861 MB |
| orders | 150,000,000 | 7 GB | 5 GB | 4 GB |
| lineitem | 600,037,902 | 34 GB | 19 GB | 18 GB |

TPC-H 1000 data sizes:

| Table | Records | YMatrix (zstd) | GPDB (zstd) |
| --- | --- | --- | --- |
| nation | 25 | 5 MB | 1 MB |
| region | 5 | 4 MB | 1 MB |
| part | 200,000,000 | 5 GB | 5 GB |
| partsupp | 800,000,000 | 29 GB | 31 GB |
| supplier | 10,000,000 | 616 MB | 538 MB |
| customer | 150,000,000 | 8 GB | 8 GB |
| orders | 1,500,000,000 | 46 GB | 46 GB |
| lineitem | 5,999,989,709 | 185 GB | 184 GB |

Test steps


  1. Prepare the test environment

    • Initialize the virtual machine environment on AWS as described in the "Hardware environment" section above.
    • Install the YMatrix test cluster according to the YMatrix official website documentation.
    • Download GPDB 6.23.3, then install and initialize the GPDB cluster.
    • It is recommended to configure gp_vmem_protect_limit and statement_mem appropriately for the server hardware, for example as shown below.
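     # Illustrative example for GPDB using gpconfig (the values are assumptions; size them to your hardware):
     gpconfig -c gp_vmem_protect_limit -v 16384   # per-segment memory limit, in MB
     gpconfig -c statement_mem -v '1GB'           # per-query memory; this report used 1GB at scale 100 and 2GB at scale 1000
     gpstop -ar                                   # restart the cluster so both settings take effect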
  2. Download the TPC-H benchmark tool

    git clone https://github.com/ymatrix-data/TPC-H.git

    Notes!
    YMatrix's TPC-H tool is open source and easy to use, and you are welcome to try it out.

  3. Execute the tpch.sh script

  • Configure the database environment variables: specify the database port (PGPORT) and the database name (PGDATABASE).
     export PGPORT=5432
     export PGDATABASE=tpch_s100
  • Execute the tpch.sh script to generate the tpch_variable.sh configuration file. The -d parameter can select the database type, such as matrixdb, greenplum, postgresql; the -s parameter can specify the data size.
     ./tpch.sh -d matrixdb -s 100
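     # Other illustrative invocations (the database types and scale factors are those named in this report):
     ./tpch.sh -d greenplum -s 100    # GPDB at the 100 data scale
     ./tpch.sh -d matrixdb -s 1000    # YMatrix at the 1000 data scale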
  • After modifying the configuration file, execute tpch.sh again with a single command. The script will automatically generate data, create tables, load data, execute all TPC-H queries, and report the query execution times.
     ./tpch.sh

    Notes!
    When tpch.sh loads data, YMatrix uses the MatrixGate-based loading method, while GPDB uses the gpfdist-based loading method.

  4. Important parameters of the TPC-H benchmark tool

You can tailor a run to your specific needs by customizing the tpch_variable.sh configuration file:

  • Configure RUN_GEN_DATA="true" to generate the test data.
  • Configure RUN_DDL="true" to create tables and indexes.
  • Configure RUN_LOAD="true" to load the data.
  • Configure RUN_SQL="true" to execute all TPC-H queries.
  • Configure PREHEATING_DATA="true" to run a warm-up pass first so that the data files are cached.
  • Configure SINGLE_USER_ITERATIONS="2". With PREHEATING_DATA="true", each TPC-H query is then executed 3 times in total: the warm-up result is discarded, and the minimum of the remaining two results is taken.
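
For reference, a full end-to-end run would enable all of these stages at once; the following is an illustrative combination of the options listed above (the actual tpch_variable.sh may contain additional settings):

`RUN_GEN_DATA="true"`
`RUN_DDL="true"`
`RUN_LOAD="true"`
`RUN_SQL="true"`
`PREHEATING_DATA="true"`
`SINGLE_USER_ITERATIONS="2"`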

Example:

If you only want to re-run the TPC-H queries, you just need to modify the following configuration items in the tpch_variable.sh configuration file:

`RUN_COMPILE_TPCH="false"`
`RUN_GEN_DATA="false"`
`RUN_INIT="false"`
`RUN_LOAD="false"`
`RUN_SQL="true"`
`RUN_SINGLE_USER_REPORT="true"`

Then execute tpch.sh again.

Test Conclusion


Multiple groups of tests were run for both YMatrix and GPDB, and the best result of each was used for the performance comparison.

The YMatrix TPC-H 100 performance data uses the lz4 compression format, while the GPDB TPC-H 100 performance data uses zstd compression with compresslevel=1; because open-source GPDB does not support the quicklz compression format, zstd is currently the only option for the comparison. The TPC-H 1000 performance data for both YMatrix and GPDB uses zstd compression with compresslevel=1.

In terms of performance, YMatrix is far ahead of GPDB: it is more than 10 times faster at both TPC-H 100 and TPC-H 1000.

| Data Scale | GPDB (ms) | YMatrix (ms) | Increase ratio |
| --- | --- | --- | --- |
| 100 | 930,071 | 70,044 | 1300% |
| 1000 | 15,273,254 | 1,265,478 | 1200% |

The detailed test results are shown below. YMatrix TPC-H 100 was tested with two compression formats, lz4 and zstd. Of the two, lz4 performs better: because the server memory used in this round of tests is large enough to cache all data files, the lower compression ratio of lz4 matters little, while its faster decompression yields better overall performance.

GPDB was tested with the ORCA optimizer both enabled and disabled; ORCA performs worse at TPC-H 100 and better at TPC-H 1000. All YMatrix tests were conducted with ORCA disabled (i.e., using the planner).

| (ms) | YMatrix TPC-H 100 planner lz4 | YMatrix TPC-H 100 planner zstd | GPDB TPC-H 100 orca zstd | GPDB TPC-H 100 planner zstd | YMatrix TPC-H 1000 planner zstd | GPDB TPC-H 1000 orca zstd | GPDB TPC-H 1000 planner zstd |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Q01 | 4,200 | 4,846 | 94,271 | 90,473 | 53,291 | 929,166 | 907,474 |
| Q02 | 1,228 | 1,417 | 13,134 | 11,005 | 12,960 | 163,967 | 132,898 |
| Q03 | 4,409 | 4,860 | 31,654 | 32,057 | 50,194 | 406,933 | 456,339 |
| Q04 | 4,965 | 4,947 | 40,743 | 30,522 | 103,699 | 492,440 | 429,417 |
| Q05 | 4,405 | 5,226 | 43,100 | 40,094 | 88,930 | 787,161 | 569,668 |
| Q06 | 183 | 254 | 4,066 | 3,995 | 2,852 | 40,141 | 38,985 |
| Q07 | 1,865 | 2,219 | 29,294 | 28,879 | 29,921 | 340,402 | 402,481 |
| Q08 | 2,239 | 3,123 | 51,852 | 49,998 | 41,305 | 610,720 | 650,542 |
| Q09 | 7,012 | 8,229 | 84,506 | 91,597 | 248,033 | 1,072,529 | 1,719,890 |
| Q10 | 3,861 | 4,469 | 61,953 | 28,238 | 64,568 | 810,094 | 395,927 |
| Q11 | 470 | 569 | 5,937 | 10,010 | 6,475 | 54,006 | 97,012 |
| Q12 | 2,319 | 2,486 | 27,271 | 30,032 | 26,964 | 326,579 | 335,811 |
| Q13 | 4,610 | 4,458 | 34,345 | 26,018 | 72,861 | 631,285 | 651,340 |
| Q14 | 588 | 696 | 5,591 | 3,318 | 7,277 | 48,476 | 47,320 |
| Q15 | 1,310 | 1,249 | 9,579 | 12,001 | 31,236 | 93,387 | 172,448 |
| Q16 | 1,471 | 1,584 | 8,493 | 22,038 | 25,295 | 141,958 | 492,614 |
| Q17 | 1,613 | 1,960 | 154,488 | 143,057 | 28,158 | 3,299,179 | 3,272,970 |
| Q18 | 7,225 | 6,950 | 78,451 | 89,587 | 93,391 | 1,064,011 | 1,276,977 |
| Q19 | 3,225 | 4,173 | 22,224 | 21,027 | 40,080 | 217,796 | 208,500 |
| Q20 | 850 | 1,004 | 24,920 | 24,818 | 9,596 | 293,892 | 421,818 |
| Q21 | 10,219 | 10,529 | 149,483 | 128,112 | 205,788 | 2,427,732 | 2,420,413 |
| Q22 | 1,777 | 1,858 | 19,866 | 13,196 | 22,603 | 226,963 | 172,399 |
| SUM | 70,044 | 77,107 | 995,221 | 930,071 | 1,265,478 | 14,478,829 | 15,273,254 |