Glossary

This document provides official explanations of proprietary and complex terms used in YMatrix.

A

ACID

ACID refers to the four essential properties that a database management system (DBMS) must guarantee to ensure transaction reliability during data writes or updates: Atomicity, Consistency, Isolation, and Durability.

  • Atomicity: All operations within a transaction either complete fully or not at all. If an error occurs during execution, the transaction is rolled back to its initial state, as if it never occurred.
  • Consistency: The database remains in a consistent state before and after a transaction. Data written must comply with all predefined rules, including accuracy constraints, referential integrity, and any dependent actions that must be carried out automatically.
  • Isolation: Ensures that concurrent transactions do not interfere with each other, preventing data inconsistency due to interleaved execution. Isolation levels include Read Uncommitted, Read Committed, Repeatable Read, and Serializable.
  • Durability: Once a transaction completes, its changes are permanently stored, even in the event of system failure.
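
For example, atomicity is what makes a funds transfer safe: either both updates below take effect, or neither does. A minimal sketch using hypothetical table and column names:

    BEGIN;
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;
    UPDATE accounts SET balance = balance + 100 WHERE id = 2;
    -- If anything goes wrong before COMMIT, issue ROLLBACK and the
    -- database returns to its state before the transaction began.
    COMMIT;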

AO

AO stands for Append-Optimized, a storage format optimized for append operations. Tables using this format are called AO tables.

AO tables support bulk data loading and reading, offering performance advantages over HEAP tables. Both row-oriented and column-oriented AO tables can be compressed. Row-oriented AO tables are abbreviated as AORO/AORS, and column-oriented ones as AOCO/AOCS.
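
A minimal sketch of creating a column-oriented, compressed AO table using Greenplum-style storage options, which YMatrix inherits (the table, columns, and exact option spellings are illustrative and may vary by version):

    CREATE TABLE sensor_readings (
        device_id   int,
        ts          timestamptz,
        temperature float8
    )
    WITH (appendoptimized = true, orientation = column,
          compresstype = zstd, compresslevel = 5);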

B

MatrixCustom Compression Algorithm (Mxcustom)

Short for MatrixCustom, this is a proprietary compression algorithm developed by YMatrix.

The encoding chain leverages characteristics of time-series data to achieve deep compression. It supports column-level and table-level compression for MARS2/3 tables, as well as auto encoding.

C

Hyper-Convergence

Hyper-convergence is an emerging database concept introduced by YMatrix. Compared to other architectures, hyper-convergence integrates multiple data types and operations within a single database, enabling high-performance support across diverse data types and scenarios, thereby eliminating data silos.

Internally, YMatrix features a micro-kernel architecture. On top of common infrastructure components, it provides tailored combinations of storage and execution engines for different business scenarios, forming specialized micro-kernels to enhance write, storage, and query performance.

Continuous View (CV)

Short for Continuous View, CV is a mechanism that enables fast responses to aggregation queries.

After a CV is created, the system automatically performs real-time aggregation during data ingestion, synchronized with the base table at the transaction level.

D

Heap (HEAP)

HEAP is the storage engine provided by PostgreSQL, using row-oriented storage. HEAP tables support high-concurrency read/write operations, transactions, and indexing.

In heap tables, data has no inherent order and relies on index pages to improve query performance. Data is initially stored in insertion order, but the database engine may relocate rows within the heap for efficient storage. Therefore, row order cannot be predicted. To ensure ordered output, users should use the ORDER BY clause.

To give stored rows a defined physical order, users can cluster the table on an index rather than rely on heap storage order; in PostgreSQL-family databases this reordering is applied once and is not maintained automatically for rows inserted afterwards.
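
For example (hypothetical table and index names), ordered output is requested with ORDER BY, and the PostgreSQL CLUSTER command can rewrite a heap table once in index order:

    -- Ordered output must be requested explicitly.
    SELECT * FROM events ORDER BY created_at;

    -- Rewrite the heap table in index order; this is a one-time
    -- operation and is not maintained for rows inserted later.
    CLUSTER events USING idx_events_created_at;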

E

ETCD

ETCD is a distributed key-value store used for storing and retrieving data in distributed systems. It uses the Raft consensus algorithm to ensure data consistency and reliability. Designed for high availability, ETCD offers robust fault recovery capabilities. It provides a simple RESTful API, allowing applications to easily access and manipulate key-value pairs.

Raft is an algorithm that solves consensus problems in distributed systems. It enables a group of machines to function as a single unit, continuing operation even if some nodes fail. Thus, consensus algorithms play a critical role in building reliable large-scale software systems.

Related concepts:

  • Leader: The node that manages the ETCD cluster. It is elected through consensus, and at most one Leader exists at any time.
  • Follower: Replicates log entries received from the Leader. This is the default state of a node when ETCD starts.
  • Candidate: A Follower that has stopped receiving heartbeats from the Leader and is campaigning to be elected as the new Leader.

F

FTP Server

An FTP (File Transfer Protocol) server is software used for transferring files over computer networks. It provides a standard method for uploading and downloading files and supports file and directory management operations.

G

Failover

A mechanism in automated operations that detects node status via the ETCD cluster and automatically switches primary and standby nodes to handle failures.

ETCD is a core component of YMatrix's cluster state service, managing the status of all nodes. When any node fails, the database system automatically performs failover without manual intervention.

Failback

After failover completes, the affected segment may be left with only a Primary/Master instance and no healthy Mirror/Standby, which means another failure on that node could not be recovered from. Therefore, the mxrecover tool must be used to create a healthy Mirror/Standby for the new Primary/Master.

H

Sliding Window

A common feature in streaming computation scenarios. It allows continuous aggregation over data within a recent time window.

Sliding windows are often used with monitoring and alerting. When data within the window meets predefined conditions, the server sends alerts to clients. For example, compute the average temperature per device every minute and trigger an alert if it exceeds 90 degrees.
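
The check described above corresponds to an aggregation of roughly this shape (a generic SQL sketch with hypothetical table and column names, not YMatrix-specific sliding-window syntax):

    -- Average temperature per device over the most recent minute;
    -- any row returned here represents an alert condition.
    SELECT device_id, avg(temperature) AS avg_temp
    FROM sensor_readings
    WHERE ts >= now() - interval '1 minute'
    GROUP BY device_id
    HAVING avg(temperature) > 90;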

Ring Mirroring

The default mirroring distribution strategy in YMatrix. With at least two hosts, the Primary segment sequence on one host is treated as a linear tape, folded in half. Mirrors are then placed clockwise on other hosts starting from the midpoint.

Adding new Primaries during expansion does not disrupt existing mirror rings but creates a new one.

Besides ring mirroring, YMatrix supports:

  • Spread Mirroring: Distributes the Mirrors of each host's Primaries across the remaining hosts. For even distribution, the number of hosts must exceed the number of Primary (segment) instances per host.
  • Group Mirroring: Places all Mirrors of a host’s Primaries together on another host as a group.

Volcano Model

Also known as the pull-based execution model. The Volcano Execution Engine, one of the execution engines supported by YMatrix, implements iterative computation using this model.

After SQL parsing, a query tree is generated, where each node represents an algebraic operator.

In the Volcano model, operators act as iterators, each providing a next() interface. The implementation involves three steps:

  1. Call the child operator’s next() to fetch one tuple.
  2. Process the tuple according to the operator’s logic.
  3. Return the processed tuple, or NULL when no more tuples are available.

During execution, next() is called top-down from the root of the query tree, while data flows bottom-up. Advantages include clear logic and low coupling. However, processing one row at a time hinders CPU cache utilization, and frequent next() calls incur high overhead.

The vectorized model is similar to the Volcano model: it also generates a query tree and uses pull-based execution. The key difference is that each iteration returns a batch of tuples instead of a single one. This reduces iteration overhead and enables the use of hardware features such as SIMD, which is especially beneficial for columnar storage.

I

Interconnect

Refers to the network layer in the database architecture, enabling inter-process communication between Segments and relying on standard Ethernet switching infrastructure.

J

Cost-Based Optimizer (CBO)

Also known as Cost-Based Optimization, CBO is the default optimizer in YMatrix for generating query plans.

Optimizers typically perform two stages: Rule-Based Optimization (RBO) and Cost-Based Optimization (CBO). Unlike RBO, CBO relies on accurate and up-to-date statistics, allowing execution plans to adapt dynamically to data changes.

Downgrade Storage

A method used by YMatrix to automate cold-hot data tiering, integrated with the MARS3 storage engine. Cold data is automatically downgraded and stored in object storage, seamlessly accessible via the MPP layer for efficient cold data analysis. The downgrade process requires no manual intervention, minimizing management and operational costs.

Object Storage is a data storage architecture that manages data as objects, differing from file systems (which use hierarchical files) and block storage (which uses sectors and blocks). It is highly cost-effective, making it well suited to storing large volumes of cold data.


Job (MatrixGate job, Job)

In MatrixGate (mxgate), a Job is responsible for writing data into a target table. Each Job has a slot scheduler, ensuring only one slot’s write transaction executes at a time. Whether started via configuration or command line, mxgate requires specifying the target table name (via --target). Once mxgate verifies the table exists, it triggers a table-specific Job. The Job activates slots and connects them to Segment instances for parallel data writes. One table triggers one Job, and one Job can activate multiple slots.

Decoupling

Decoupling refers to reducing dependencies between system components or modules, allowing them to be developed, tested, and maintained independently. This technique is widely used in software development, system architecture, and network communication. By minimizing inter-component dependencies, decoupling improves system reliability, scalability, and maintainability.

M

MARS

A series of storage engines developed by YMatrix, standing for Matrix Append-Optimized Resilient Storage.

Includes MARS2 and MARS3:

  • MARS2: Uses physically ordered merging to reduce I/O seek operations, improving query performance. Supports encoding chain compression and columnar storage.
  • MARS3: Built on MARS2, adds support for both AP and TP workloads, further enhancing write performance.

Master

Refers to the primary node instance in a cluster, with the following responsibilities and characteristics:

  • Manages client session connections.
  • Parses SQL statements and generates query plans.
  • Distributes query plans to Segments, monitors execution, and returns results to clients.
  • Does not store business data; stores only the system catalog (metadata of all data elements). A catalog query sketch follows this list.
  • Only one Master is allowed per cluster, optionally with a Standby for redundancy.
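
Because the Master holds the system catalog, cluster topology can be inspected there. A sketch using the Greenplum-compatible gp_segment_configuration catalog, assuming YMatrix inherits it:

    -- List every instance (Master, Standby, Primaries, Mirrors) with
    -- its role, status, host, and port.
    SELECT content, role, preferred_role, status, hostname, port
    FROM gp_segment_configuration
    ORDER BY content, role;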

MPP

Stands for Massively Parallel Processing. In a Shared-Nothing database cluster, each node has independent disk and memory systems. Business data is partitioned across nodes, which are interconnected via dedicated or commercial networks to collaboratively perform computations and provide database services.

In short, MPP architecture distributes tasks across multiple servers and nodes, computes in parallel, and aggregates results for the final output.
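
In practice, how business data is partitioned across nodes is declared when a table is created. A minimal sketch using Greenplum-style syntax and a hypothetical table:

    -- Rows are hash-distributed across Segments by customer_id, so each
    -- node stores and processes only its own share of the data.
    CREATE TABLE orders (
        order_id    bigint,
        customer_id bigint,
        amount      numeric
    ) DISTRIBUTED BY (customer_id);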

MVCC

Multi-Version Concurrency Control (MVCC) maintains data consistency using a multi-version model. MVCC provides transaction isolation for each session, ensuring each query sees a consistent snapshot of data unaffected by concurrent transactions.
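
A minimal two-session sketch of snapshot behavior, using standard PostgreSQL syntax and a hypothetical table:

    -- Session 1
    BEGIN ISOLATION LEVEL REPEATABLE READ;
    SELECT count(*) FROM orders;   -- snapshot is established here

    -- Session 2, concurrently
    INSERT INTO orders VALUES (1001, 42, 99.50);

    -- Session 1, again
    SELECT count(*) FROM orders;   -- still sees the original count
    COMMIT;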

O

ORCA

The default optimizer in Greenplum Database, extending traditional query planning and optimization capabilities for better performance in multi-core environments. In YMatrix, ORCA is enabled by setting the optimizer parameter and is supported in versions 5.2.0 and above, integrating with the vectorized execution engine.
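
For example, the optimizer parameter (a Greenplum-compatible setting, assuming YMatrix exposes it unchanged) can be toggled per session:

    -- Use ORCA (GPORCA) for subsequent queries in this session.
    SET optimizer = on;

    -- Fall back to the PostgreSQL-based planner.
    SET optimizer = off;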

GPORCA enhances query performance in the following areas:

  • Queries on partitioned tables
  • Queries with Common Table Expressions (CTE)
  • Queries with subqueries

R

Runtime Filter

A key technology for improving YMatrix execution engine performance. Runtime Filters are dynamically built during execution based on cost estimation, differing from filters pre-planned by the optimizer.

RPO

Recovery Point Objective (RPO) measures the maximum tolerable data loss after a system failure. A smaller RPO indicates lower data loss risk.

RTO

Recovery Time Objective (RTO) is the maximum acceptable time to restore the system to normal operation after a crash. It reflects the business system’s recovery capability after a disaster.

S

Transaction

A sequence of operations executed as a single logical unit. Multiple operations are submitted to the database as an indivisible whole—either all succeed or all fail—and must fully satisfy ACID properties.

In relational databases, a transaction can be a single SQL statement, a group of statements, or an entire program. It forms the basic unit for database recovery and concurrency control, typically starting with BEGIN/START TRANSACTION and ending with END/COMMIT.

Databases allow multiple concurrent transactions to read, write, and modify data. Isolation prevents data inconsistency caused by concurrent execution. SQL defines four isolation levels, in descending order of strictness:

  • Serializable
  • Repeatable Read
  • Read Committed
  • Read Uncommitted
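
A minimal sketch of choosing an isolation level for a single transaction, using standard PostgreSQL syntax:

    BEGIN;
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    -- queries that must not observe anomalies from concurrent transactions
    COMMIT;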

Segment

Most commonly refers to Segment Instance—a data node instance, essentially an independent PostgreSQL database storing a portion of data and performing most query processing. Occasionally refers to Segment Node—the physical machine hosting one or more Segment Instances.

A Segment typically consists of a Primary Instance and, if mirroring is enabled, a corresponding Mirror Instance:

  • Primary Instance: The active instance that stores data and serves queries.
  • Mirror Instance: A replica of the Primary that can take over if the Primary fails.

SIMD

Single Instruction, Multiple Data (SIMD) is a form of processor-level parallelism in which a single instruction operates on multiple data elements simultaneously.

SIMD is widely used in image processing, audio processing, and scientific computing, significantly improving computational efficiency.


Slot

In MatrixGate (mxgate), a slot is a process that writes data to Segment instances. After receiving a data-loading job, mxgate creates a number of slots based on the configured --stream-prepared parameter. Each slot executes INSERT INTO dest SELECT * FROM external to import external data into YMatrix Segments. The interval between data transmissions is set by the --interval parameter; for example, if --interval is set to 100, each slot transmits data every 100 ms.

Standby

Refers to the standby node (instance) of the Master.

Users can optionally deploy a backup or mirrored Master instance on a host separate from the primary Master. When the Master becomes unavailable, the Standby takes over as a warm standby.

The Standby stays synchronized with the Master via streaming replication. A replication process runs on the Standby, responsible for synchronizing data between the Master and Standby hosts.
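
Replication status can be checked from the Master using the standard PostgreSQL statistics view, assuming YMatrix exposes it as PostgreSQL does:

    -- One row per connected replica (for example, the Standby),
    -- showing its replication and synchronization state.
    SELECT application_name, state, sync_state
    FROM pg_stat_replication;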

T

Graphical User Interface (MatrixUI)

The graphical interface of YMatrix, used for visual installation, operations, and monitoring. It offers rich functionality with a simple interface.

Current features include:

  • Multi-platform deployment
  • Simulated time-series scenarios
  • Cluster expansion
  • Kafka data stream ingestion
  • Query monitoring
  • Cluster management
  • Health checks
  • Self-service inspection
  • Workload analysis
  • SQL editor

U

UDF

User-Defined Function. A function created by users in the database.

UDFs are created with the CREATE FUNCTION statement, modified with ALTER FUNCTION, and dropped with DROP FUNCTION. Each fully qualified UDF name (together with its argument types) must be unique within a database.
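
A minimal sketch of this lifecycle with a hypothetical function:

    -- Create a simple SQL-language UDF.
    CREATE FUNCTION add_one(i integer) RETURNS integer
        AS $$ SELECT i + 1; $$
        LANGUAGE SQL IMMUTABLE;

    -- Modify one of its attributes.
    ALTER FUNCTION add_one(integer) STABLE;

    -- Remove it.
    DROP FUNCTION add_one(integer);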

W

WAL

Write-Ahead Logging (WAL) is an efficient logging mechanism in databases. For non-in-memory databases, insert, update, and delete operations are first recorded in a dedicated log file before being applied to the actual data. In case of failure, the database can be restored to a known consistent state using the WAL, ensuring data integrity.

X

Vectorized Model

The vectorized execution engine in YMatrix uses this model for iterative computation.

Similar to the Volcano model, it generates a query tree and uses pull-based execution. The key difference is that each iteration returns a batch of tuples instead of one. Benefits include reduced iteration overhead and better utilization of hardware features like SIMD, making it especially suitable for columnar storage.

The Vectorized Execution Engine, also known as MatrixVector, is a high-performance engine designed for column-oriented storage (e.g., MARS3, MARS2, AOCO). It delivers one to two orders of magnitude performance improvement over traditional row-based execution engines for common queries.

Performance gains of the vectorized executor over scalar engines (e.g., Volcano) come from:

  1. Batch processing to reduce execution overhead
  2. Small-batch processing to improve data locality and cache utilization
  3. Selecting optimal processing paths based on data characteristics
  4. Reducing function call overhead and leveraging CPU SIMD instructions
  5. Column-wise processing with on-demand loading to avoid reading unused data

Sparse Index

A sparse index is an indexing structure designed to optimize data storage and retrieval. Unlike dense indexes, which contain an entry for every data item, a sparse index keeps entries only for selected items, typically one entry per group of physically adjacent rows. Because far fewer entries are stored, sparse indexes save space and are commonly used on very large datasets.

Z

Cluster Service (State Data Management Service)

A service that ensures high availability by collecting and managing node state information.

YMatrix uses an ETCD cluster to implement this service. When a database node fails, the cluster service uses the state data stored in ETCD to promote a healthy standby node as the new primary, maintaining cluster availability.

For example:

  • If the Master fails, its Standby is promoted to Master.
  • If the Standby fails, the cluster remains unaffected.
  • If a Primary fails, its Mirror is promoted to Primary.
  • A Mirror failure does not affect the cluster.

This service comprises two main capabilities: automatic failover and failback (supported by the mxrecover tool), together enabling complete node recovery.

Auto Partition Management (APM)

Short for Auto Partition Management.

A set of UDFs provided by YMatrix for managing partitions, sparing DBAs from maintaining partitions manually and reducing maintenance cost. Functions include:

  • auto_partitioning
  • auto_splitting
  • auto_partitioning_ex

Key features:

  • Automatic creation and deletion of partitions
  • Automatic splitting of default partitions
  • Bulk partition creation
  • Forced retention of specific historical partitions
  • Custom scheduling of automatic partition operations