Glossary

This document provides official explanations for unique or complex terms used in YMatrix for user reference.

A

ACID

ACID refers to the four essential properties that a Database Management System (DBMS) must ensure during data writing or updating to guarantee reliable transactions: Atomicity, Consistency, Isolation, and Durability.

  • Atomicity: All operations within a transaction are completed successfully, or none are executed. If an error occurs during execution, the transaction is rolled back to its initial state, as if it never happened.
  • Consistency: The database remains in a consistent state before and after the transaction. Written data must comply with all predefined rules, including accuracy and integrity constraints, so that subsequent operations find the database in a valid state.
  • Isolation: The database allows multiple concurrent transactions to read and modify data simultaneously. Isolation prevents data inconsistency due to concurrent execution. Transaction isolation levels include Read Uncommitted, Read Committed, Repeatable Read, and Serializable.
  • Durability: Once a transaction is completed, the data modifications are permanent and will not be lost even in the event of a system failure.
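
A minimal illustration of atomicity in standard SQL, using a hypothetical accounts table:

```sql
-- A transfer as one atomic unit: both updates take effect, or neither.
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
-- On any error before COMMIT, issuing ROLLBACK (or letting the abort
-- roll the transaction back) returns the database to its prior state.
```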

AO

AO stands for Append-Optimized, a storage format optimized for appending data. Tables using this format are called AO tables.

AO tables support bulk data loading and reading, offering performance advantages over HEAP tables. Both row-based and column-based AO tables can be compressed. Row-based AO tables are abbreviated as AORO/AORS, while column-based ones are AOCO/AOCS.
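
A minimal sketch using Greenplum-style table options (table names, column definitions, and the zstd compresstype are illustrative; exact options may vary by version):

```sql
-- Row-oriented AO table with compression.
CREATE TABLE events_row (ts timestamp, payload text)
WITH (appendoptimized=true, orientation=row, compresstype=zstd)
DISTRIBUTED BY (ts);

-- Column-oriented AO table (AOCO), also compressed.
CREATE TABLE events_col (ts timestamp, payload text)
WITH (appendoptimized=true, orientation=column, compresstype=zstd)
DISTRIBUTED BY (ts);
```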

B

Encoding Chain Compression Algorithm (Mxcustom)

The full English name is MatrixCustom, a customized compression algorithm developed by YMatrix.

The encoding chain leverages the characteristics of time-series data to achieve deep compression of table data. It can be used for column-level compression, table-level compression, and adaptive encoding (Auto encoding) in MARS2/3 tables.
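
As a hypothetical sketch only: assuming mxcustom is accepted as a column-level compresstype value in a Greenplum-style ENCODING clause, a time-series table might opt in per column (verify the exact DDL against your version's documentation):

```sql
-- Hypothetical DDL: mxcustom as a column compresstype on a
-- column-oriented table; names and options are illustrative.
CREATE TABLE sensor_data (
    ts  timestamp ENCODING (compresstype=mxcustom),
    val float8    ENCODING (compresstype=mxcustom)
)
WITH (appendoptimized=true, orientation=column)
DISTRIBUTED BY (ts);
```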

C

Hyperconvergence

Hyperconvergence is an emerging database concept proposed by YMatrix. Compared with other database architectures, hyperconvergence integrates multiple data types and workloads, enabling a single database to support diverse data types and scenarios with high performance and eliminating information silos.

Internally, YMatrix uses a microkernel architecture: on top of shared foundational components, different combinations of storage and execution engines form distinct microkernels for different business scenarios, improving targeted write, storage, and query performance.

Continuous View (CV)

The full English name is Continuous View, abbreviated as CV, a mechanism for quickly responding to aggregation queries.

After a continuous view is created, the system automatically performs real-time aggregation on data as it is written, keeping the view synchronized with the source table at the transaction level.
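
The exact DDL is version-specific; purely as an illustration, with the WITH (continuous) option and all table and column names being assumptions, a CV maintaining per-device aggregates might look like this:

```sql
-- Illustrative only: the option name and all identifiers are assumed.
CREATE VIEW device_stats WITH (continuous) AS
SELECT device_id,
       count(*)  AS samples,
       avg(temp) AS avg_temp
FROM readings
GROUP BY device_id;
-- Subsequent INSERTs into readings update device_stats in step with
-- the writing transaction; querying it answers the aggregate directly.
```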

D

Heap (HEAP)

HEAP is a storage engine provided by PostgreSQL, using row-based storage. Heap tables support highly concurrent reads and writes, transactions, indexes, and more. In a heap table, data is stored in no particular order, so indexes are needed to speed up queries. Rows are initially stored in insertion order, but the database engine may move them for efficient storage, making row order unpredictable. To retrieve rows in a guaranteed order, use the ORDER BY clause.

To give stored rows a specific physical order, users can cluster the table on an index rather than rely on heap order; note that PostgreSQL applies such clustering once and does not maintain the order for subsequent writes.
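
In PostgreSQL terms, that one-time physical ordering is done with the CLUSTER command; a minimal sketch with hypothetical names:

```sql
CREATE TABLE readings (ts timestamp, val float8);
CREATE INDEX readings_ts_idx ON readings (ts);

-- Rewrite the heap in index order (a one-time operation).
CLUSTER readings USING readings_ts_idx;

-- Ordered retrieval should still state its order explicitly.
SELECT * FROM readings ORDER BY ts;
```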

E

ETCD

ETCD is a distributed key-value store used in distributed systems for storing and retrieving data. It uses the Raft consensus algorithm to ensure data consistency and reliability. Designed for high availability, ETCD has strong fault-recovery capabilities. It exposes a simple key-value API (gRPC in v3, with an HTTP/JSON gateway), allowing applications to easily access and manipulate the key-value pairs stored within it.

Raft is an algorithm for solving consensus problems in distributed systems. It allows a group of machines to work as a cohesive unit, continuing to operate even if some machines fail. Thus, consensus algorithms play a crucial role in building reliable large-scale software systems.

Related concepts:

  • Leader: The elected, unique manager of the ETCD cluster.
  • Follower: Followers replicate log entries received from the Leader. An ETCD node starts in this state.
  • Candidate: A node that may initiate a leader election.

F

FTP Server

FTP (File Transfer Protocol) server is a server software used for file transfer over computer networks. It provides a standard method for uploading and downloading files and supports file and directory management operations.

G

Failover

Refers to the mechanism by which the automated operations system retrieves node-status diagnostics from the ETCD cluster and switches between primary and standby nodes to move service off a failed node.

The ETCD cluster is a core component of the YMatrix cluster status service, responsible for managing the status information of all nodes. When any node in the cluster fails, the database system automatically performs node failover without manual intervention.

Failback

After a failover completes, if the affected node group has only a Primary/Master left without a healthy standby, it cannot survive another failure. The mxrecover tool is therefore used to generate a healthy Mirror/Standby for the new Primary/Master.

H

Sliding Window

A common feature of stream computing scenarios, in which aggregation operations are performed continuously over data within a recent time window.

Sliding windows are often used for monitoring and alerting: when the data within a recent time window meets preset conditions, the data server sends alert messages to clients. For example, compute each device's average temperature per minute and alert when it exceeds 90 degrees.
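
As a rough illustration in plain SQL (readings, device_id, ts, and temp are hypothetical names), the per-minute check above could look like the query below; a streaming system would evaluate the equivalent continuously:

```sql
-- Per-device, per-minute average temperature over a recent window;
-- rows returned are the ones that would trigger an alert.
SELECT device_id,
       date_trunc('minute', ts) AS minute,
       avg(temp)                AS avg_temp
FROM readings
WHERE ts >= now() - interval '10 minutes'
GROUP BY device_id, date_trunc('minute', ts)
HAVING avg(temp) > 90;
```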

Ring Mirroring

Ring mirroring is the default mirror distribution strategy in YMatrix: with at least two hosts, the sequence of Primaries on a host is treated as a one-dimensional strip and folded in the middle, and the Mirrors are then laid out evenly, clockwise, across the other hosts starting from the fold point.

Adding new Primaries during scaling does not affect previously completed mirror loops but starts a new mirror ring.

In addition to ring mirroring, YMatrix supports the following two mirroring distribution strategies:

  • Spread Mirroring: The Mirrors of each host are spread across the remaining hosts in the cluster. For the distribution to be even, the number of independent hosts must be greater than the number of node instances (Primaries) on each host.
  • Group Mirroring: All Mirrors of the Primaries on the same host are grouped and placed together on another host in the cluster.

Volcano Model

Also known as the Pull-Based execution model. The Volcano execution engine supported by YMatrix uses this model for iterative computation.

SQL queries are parsed in the database to generate a query tree, with each execution node of the tree being an algebraic operator (Operator).

The Volcano model treats Operators as iterators, each providing a next() function as an interface, implemented in three steps:

  1. Call the next() interface of the child Operator to obtain one tuple.
  2. Perform the Operator's own processing on the tuple.
  3. Return the processed tuple, or NULL if there is none.

Thus, query execution involves calling the next() function from top to bottom of the query tree, with data being pulled and processed from bottom to top. The advantage is clear processing logic, with each Operator only needing to focus on its own logic, resulting in low coupling. The disadvantage is that data is processed row by row, which is not conducive to CPU cache utilization. Additionally, processing each row requires multiple next() function calls, incurring high overhead.

The vectorized model is similar to the Volcano model, both generating query trees and pulling for execution. The difference is that each iteration of the vectorized model returns a set of Tuples instead of one. Its advantages include reduced iteration counts and the ability to leverage new hardware features like SIMD for performance improvements, especially friendly to columnar storage.

I

Interconnect

Refers to the network layer of the database architecture: the inter-process communication between Segments and the network infrastructure that this communication relies on, typically a standard Ethernet switching network.

J

Cost-Based Optimizer (CBO)

Short for Cost-Based Optimization, abbreviated CBO. YMatrix uses this optimizer by default to generate query plans.

Optimizers generally work in two stages: rule-based optimization (RBO) and cost-based optimization (CBO). Unlike RBO, CBO depends on accurate and timely statistics and adjusts execution plans promptly as the data changes.

Downgrade Storage

Downgrade storage is the method YMatrix uses to achieve fully automated hot and cold data tiering, integrated with the MARS3 storage engine: cold data is automatically downgraded and moved to object storage, while remaining seamlessly integrated with the MPP foundation for efficient analysis. The downgrade process requires no manual intervention, minimizing management and usage costs.

Object Storage is a computer data-storage architecture that manages data as objects, unlike file systems, which manage data as file hierarchies, or block storage, which manages data as blocks within sectors and tracks. Object storage is a comparatively low-cost storage method.


Job (MatrixGate Job)

In MatrixGate (mxgate), a Job is responsible for writing data to the corresponding table. Each Job has a slot scheduler, ensuring that only one slot's write transaction is executed for each Job at any given time. Whether using a configuration file or command line to start mxgate, the target table name for data writing needs to be specified (via the --target parameter). After mxgate verifies the existence of the table, it triggers the creation of a Job exclusive to that table. The Job activates slots and connects them to various data node instances (Segments) for parallel data writing. A table can only trigger one Job, but a Job can activate multiple slots.

Decoupling

Decoupling refers to reducing the coupling between different parts or modules of a system, allowing them to be developed, tested, and maintained more independently. This technical term is commonly used in software development, system architecture design, and network communication. By reducing dependencies between components, system reliability, scalability, and maintainability can be improved.

M

MARS

YMatrix's self-developed family of storage engines; the full name is Matrix Append-optimized Resilient Storage.

Includes MARS2 and MARS3:

  • MARS2, with its physically ordered merge method, reduces I/O seek times, thereby improving query performance on table data. It supports encoding chain compression, columnar storage, and other features.
  • MARS3 is a storage engine built on MARS2, adding support for both AP and TP scenarios and further enhancing write performance.

Master

Generally refers to the primary node instance of the cluster, mainly serving the following purposes:

  • Responsible for establishing and managing session connections with clients.
  • Responsible for parsing SQL statements and forming query plans.
  • Distributes query plans to Segments, monitors query execution, and collects feedback results to return to clients.
  • The Master does not store business data, only the data dictionary, i.e., the definitions and attributes of all data elements used in the system.
  • In a cluster, there can only be one Master, which can be configured as one primary and one standby, with the standby node called Standby.

MPP

MPP (Massively Parallel Processing) refers to a database architecture in which each node of a shared-nothing cluster has its own independent disk storage and memory. Business data is distributed across the nodes according to the database model and application characteristics. The data nodes (Segment Nodes) are interconnected via proprietary or commodity networks and collaborate to provide database services. Shared-nothing database clusters offer advantages in scalability, high availability, performance, and cost-effectiveness.

In simple terms, the MPP architecture distributes tasks in parallel across multiple servers and nodes. After computation on each node is completed, the results are aggregated to obtain the final result.
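
In Greenplum-derived systems such as YMatrix, how business data spreads across Segments is declared per table. A minimal sketch (table and column names illustrative):

```sql
-- Hash-distribute rows across Segments by device_id; each Segment
-- stores and processes only its own share of the data.
CREATE TABLE readings (
    device_id int,
    ts        timestamp,
    temp      float8
) DISTRIBUTED BY (device_id);
```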

MVCC

Multi-Version Concurrency Control (MVCC) uses a multi-version model to maintain data consistency. MVCC implements transaction isolation for each database session, with each query transaction seeing a snapshot of the data. This ensures that the transaction sees consistent data unaffected by other concurrent transactions.
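
A minimal sketch of snapshot behavior, assuming a hypothetical accounts table and two concurrent sessions:

```sql
-- Session A pins a snapshot:
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT balance FROM accounts WHERE id = 1;   -- say it returns 100

-- Meanwhile, session B updates the same row and commits:
--   UPDATE accounts SET balance = 0 WHERE id = 1;

-- Session A keeps reading its snapshot, without blocking session B:
SELECT balance FROM accounts WHERE id = 1;   -- still 100
COMMIT;
```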

O

ORCA

The default optimizer for Greenplum databases; it extends the planning and optimization capabilities of traditional optimizers and achieves better performance in multi-core environments. In YMatrix it must be enabled by setting the optimizer parameter, as shown below. In version 5.2.0 and above it can be used together with the vectorized execution engine.
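
The parameter follows the Greenplum convention of a session-level setting:

```sql
-- Enable ORCA (GPORCA) for the current session.
SET optimizer = on;

-- Fall back to the PostgreSQL-based planner.
SET optimizer = off;
```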

GPORCA enhances query performance tuning in the following areas:

  • Queries on partitioned tables
  • Queries containing Common Table Expressions (CTE)
  • Queries containing subqueries

R

Runtime Filter

Runtime Filter is one of the key technologies for improving the performance of YMatrix's execution engine. It refers to filters constructed dynamically during execution, after the optimizer has generated the physical plan, as opposed to filters planned in advance by the optimizer.

RPO

Recovery Point Objective (RPO) is a metric used to measure the amount of data loss after a system failure, i.e., the maximum tolerable data loss for the system. The smaller the value, the lower the risk of data loss.

RTO

Recovery Time Objective (RTO) refers to the maximum time required to restore the system to a state supporting normal operations after a crash. It reflects the recovery capability of the business system after a disaster.

S

Transaction

A series of operations executed as a single logical unit of work. Multiple operations are submitted to the database system as an indivisible whole, either all executed or none executed, and must fully possess ACID properties.

In relational databases, a transaction can be a single SQL statement, a set of SQL statements, or an entire program. It is the basic unit of database recovery and concurrency control, typically starting with a BEGIN/START TRANSACTION statement and ending with COMMIT/END.

Databases allow multiple concurrent transactions to read and modify data simultaneously. Isolation prevents data inconsistency due to concurrent execution. The SQL standard defines four transaction isolation levels, from highest to lowest strictness:

  • Serializable
  • Repeatable Read
  • Read Committed
  • Read Uncommitted
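
For example, a transaction can request a stricter isolation level at its start (the accounts table is hypothetical):

```sql
-- Run one transaction at an explicit isolation level.
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;
```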

Segment

Mostly refers to Segment Instance, a data node instance, essentially an independent PostgreSQL database, each storing a portion of the data and performing the main part of query processing. In rare cases, it refers to Segment Node, the physical node storing data, with one or more data node instances on each node.

Data node instances typically include Primary Instance and Mirror Instance (if mirroring is enabled). Among them:

  • Primary Instance: The active instance that serves reads and writes.
  • Mirror Instance: The replica that maintains a copy of the primary's data for failover.

SIMD

SIMD stands for Single Instruction Multiple Data, a computer instruction set architecture. This architecture allows multiple processing units to execute the same instruction simultaneously, but each processing unit operates on different data.

SIMD is widely used in image processing, audio processing, scientific computing, and other fields, significantly improving computational efficiency.


Slot

In MatrixGate (mxgate), a slot is a process used to write data into data node instances (Segments). After mxgate receives a data-writing task (Job), it creates the corresponding number of slots according to the --stream-prepared parameter. Each slot writes external data into YMatrix's Segments by executing INSERT INTO dest SELECT * FROM external. The duration of each send is configured via the --interval parameter; if --interval is set to 100, a slot sends data for 100 ms at a time.

Standby

Refers to the standby node (instance) of the primary Master node.

Users can optionally deploy a backup or mirror of the Master instance on a different host from the Master. When the Master host becomes unavailable, the backup Master host serves as a warm standby.

Standby maintains synchronization with the Master using streaming replication. The replication process runs on the Standby, responsible for synchronizing data between the Master and Standby hosts.

T

Graphical User Interface (MatrixUI)

YMatrix's graphical interactive interface. Mainly used for graphical installation, operations, and monitoring, with rich features and simple operations.

Currently includes the following functions/interfaces:

  • Multi-platform installation and deployment
  • Simulating time-series scenarios
  • Scaling
  • Kafka data stream writing
  • Query monitoring
  • Cluster management
  • Health monitoring
  • Self-service inspection
  • Load analysis
  • SQL editor

U

UDF

UDF stands for User-Defined Function, a function that the database allows users to create. UDFs are created with the CREATE FUNCTION statement, modified with the ALTER FUNCTION statement, and deleted with the DROP FUNCTION statement. Each fully qualified user-defined function name (database_name.schema_name.function_name) must be unique.
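
A minimal PostgreSQL-style lifecycle for a UDF (the function here is illustrative):

```sql
-- Create a simple SQL-language UDF.
CREATE FUNCTION add_one(x integer) RETURNS integer AS $$
    SELECT x + 1;
$$ LANGUAGE sql;

SELECT add_one(41);  -- returns 42

-- Modify, then delete it.
ALTER FUNCTION add_one(integer) RENAME TO increment;
DROP FUNCTION increment(integer);
```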

W

WAL

WAL (Write-Ahead Logging) is an efficient logging technique used by databases. In non-in-memory databases, insert, update, and delete operations must be recorded in a dedicated log file before they are actually applied. If an error occurs during execution, the database can be restored to a known state from the write-ahead log, ensuring data consistency and integrity.

X

Vectorized Model

The vectorized execution engine supported by YMatrix uses this model for iterative computation.

Vectorized models are similar to volcano models, both generating query trees and pulling for execution. The difference is that each iteration of the vectorized model returns a set of Tuples instead of one. Its advantages include reduced iteration counts and the ability to leverage new hardware features such as SIMD for performance improvements, especially friendly to columnar storage.

Vectorized Execution Engine, also known as MatrixVector, is a high-performance execution engine specifically designed for columnar storage engines (such as MARS3, MARS2, AOCO). For common queries, it offers one to two orders of magnitude performance improvement compared to traditional row-oriented execution engines.

Compared to scalar execution engines (such as volcano execution engines), the performance benefits of vectorized executors come from the following aspects:

  1. Batch processing reduces execution overhead.
  2. Small batch processing improves data locality, enhancing data access performance by keeping data in the CPU cache.
  3. Selecting the best processing path based on data characteristics.
  4. Reducing function call overhead and further improving efficiency by utilizing CPU SIMD instructions.
  5. Column-by-column loading to avoid loading unnecessary data.

Sparse Index

Sparse Index is an index structure used to optimize data storage and retrieval. In applications such as databases and search engines, indexes are used to quickly locate and access data stored in data structures. Unlike Dense Index, sparse indexes only contain index entries for part of the data items (those physically close when stored in the table). Sparse indexes are often used in cases with large datasets and highly duplicate values, where low-frequency values are not indexed to save index storage space.

Z

State Data Management Service (Cluster Service)

Refers to the service that ensures database high availability by collecting and managing node status information.

YMatrix implements this service with an ETCD cluster: when a database node goes down, the cluster service uses the node-status data stored in ETCD to determine which healthy node should take over, and promotes that node to keep the whole cluster available.

For example, if the Master goes down, the Standby is promoted to Master; if the Standby goes down, it has no impact on the overall cluster. Similarly, if a Primary goes down, its Mirror is promoted to Primary; a Mirror's downtime has no impact on the overall cluster.

This service mainly comprises two functions: Failover, and Failback supported by the mxrecover tool. Together, these two functions provide a complete recovery process for node failures.

Automated Partition Management (APM)

Full English name: Auto Partition Management. Refers to the UDFs provided by YMatrix for managing partitions, sparing database administrators from maintaining partitions by hand and reducing partition maintenance costs. It includes the following UDFs (a hedged invocation sketch appears at the end of this entry):

  • auto_partitioning
  • auto_splitting
  • auto_partitioning_ex

Main functions include:

  • Automatic partition creation and deletion
  • Automatic default partition management
  • Bulk partition creation
  • Forced retention of specific historical partitions
  • Customizing automatic partition operation periods
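
Because these are ordinary UDFs, they are invoked with SELECT. The sketch below is hypothetical: auto_partitioning is a real function name from the list above, but the argument list shown is illustrative, not its actual signature:

```sql
-- Hypothetical invocation; consult the APM reference for the real
-- signature. All arguments below are assumptions.
SELECT auto_partitioning(
    'public.readings',  -- hypothetical: target table
    'ts',               -- hypothetical: partition key column
    '1 day'             -- hypothetical: partition granularity
);
```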