YMatrix - Hyper-Converged Database

Basic Features

YMatrix is a distributed database built on the open-source PostgreSQL/Greenplum databases. Its core characteristics are:

  • Supports clusters with up to 100 nodes, enabling multi-node, multi-core parallel computing.
  • The cluster supports online expansion.
  • The cluster has financial-grade high availability and can complete automatic failover within 3 seconds.
  • Suitable for TB to PB level data processing.
  • Integrates analysis, transaction, and time series capabilities, and is widely used in intelligent manufacturing, finance, and Internet of Vehicles scenarios.

In addition to the commercial version, YMatrix offers a free community version. You are welcome to try it and share your feedback: https://ymatrix.cn/download

“Hyper-Convergence” Concept

A hyper-converged database is a database product that integrates transactional (OLTP), analytical (OLAP), time-series, and data lake capabilities.

YMatrix's hyper-convergence philosophy rejects fragmented data processing and instead integrates computing, storage, and network resources into a single system. Taking into account the source database types and versions, the cluster topology, and the business characteristics of each scenario, YMatrix combines different storage and execution engines on top of the database's common core components. Each combination forms a distinct microkernel that delivers targeted improvements in write, storage, and query performance.

Integration Capabilities

YMatrix holds that a database should deliver functionality and performance across the full range of scenarios, including writing, querying, analysis, and machine learning. Integrating these capabilities into a single product lets it handle a variety of complex scenarios and gives business applications multi-model support, scalability, and cost control.

  • Analysis Capabilities
  • Transaction Capabilities
  • Time Series Capabilities

Unified Interface

YMatrix exposes SQL as the single, unified interface to all data for upper-layer applications.

Open Architecture

YMatrix is highly extensible.

On one hand, through iterative development YMatrix has expanded into a growing number of business scenarios, including Internet of Vehicles, smart manufacturing, finance, and vector computing. On the other hand, it provides capabilities such as machine learning and data federation through database extensions, so that more heterogeneous and cross-source workloads can run efficiently on YMatrix.


YMatrix hyper-converged databases help users simplify their infrastructure architecture, significantly reducing the complexity of the technology stack and improving the performance of data infrastructure across scenarios. They also reduce the risks that arise when multiple systems coexist and interact, which helps enterprises build a comprehensive data governance mechanism and realize the full potential of their data.

Proprietary Core Technologies

YMatrix drives the implementation of the “hyper-convergence” concept in its products through a number of key proprietary technologies.

Storage Engine: MARS3

MARS3 is designed to serve analysis, transaction, and time series scenarios at the same time. It offers two modes: column storage and hybrid row-column storage. Besides excellent storage performance (including compression and status diagnosis), the hybrid row-column mode also ensures high-performance writes. Both modes implement MVCC and, for partitioned tables, support automatic partition management and automatic storage degradation.

Execution Engine: Vectorization

The vectorized execution engine is a high-performance execution engine designed specifically for column-oriented storage engines (such as MARS3, MARS2, and AOCO). For common queries, it offers a performance improvement of one to two orders of magnitude compared to traditional row-oriented execution engines.
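
As a rough illustration of why batch-oriented execution suits column storage, the following Python sketch computes the same filtered sum two ways: one row at a time over row tuples, and one column batch at a time. It is a conceptual toy, not YMatrix's engine; the sample table, the batch size, and the function names are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical sample data: (device_id, temperature) readings.
ROWS: List[Tuple[int, float]] = [(i % 4, 20.0 + i) for i in range(10)]

# The same data in a column-oriented layout: one list per column.
COLUMNS: Dict[str, list] = {
    "device_id": [r[0] for r in ROWS],
    "temperature": [r[1] for r in ROWS],
}


def sum_row_at_a_time(rows, device_id):
    """Row-oriented execution: every step handles a single row."""
    total = 0.0
    for dev, temp in rows:
        if dev == device_id:
            total += temp
    return total


def sum_vectorized(columns, device_id, batch_size=4):
    """Batch-oriented execution: every step handles a slice of one column."""
    total = 0.0
    n = len(columns["device_id"])
    for start in range(0, n, batch_size):
        end = min(start + batch_size, n)
        dev_batch = columns["device_id"][start:end]
        temp_batch = columns["temperature"][start:end]
        # Step 1: evaluate the filter over the whole batch (selection mask).
        mask = [d == device_id for d in dev_batch]
        # Step 2: aggregate only the selected values of the batch.
        total += sum(t for t, keep in zip(temp_batch, mask) if keep)
    return total
```

Both functions return the same result for the same input. In a real engine the per-batch steps run as tight loops over contiguous column data, which pure Python cannot show; the sketch conveys only the shape of the computation.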

High Availability Architecture: ALOHA

ALOHA (Advanced Least Operation High Availability) is the cluster state data management service introduced in YMatrix 5.X. It operates independently of the cluster, allowing separate disk and monitoring configurations. Even in harsh environments, it ensures low-latency node state detection and management, completing automatic failover within 3 seconds.

Platform Capabilities

MatrixUI: Visual Installation and Operation

  • Graphical Installation: Complete cluster deployment in 10 minutes; simulate time series scenario queries and writes in 3 minutes.
  • Graphical Operations Monitoring: One-click self-service inspection and one-click scaling in seconds.

MatrixGate: High-Concurrency Writing

  • Low Latency, High Concurrency: Supports parallel writing of massive amounts of data, compresses data to make full use of available bandwidth, and can improve writing speed by up to 100 times.
  • Supports integration with different data sources/types.
  • Supports batch writing and streaming writing of data.
  • Supports UPSERT: addresses complex write problems, such as out-of-order data and data arriving in separate batches, in scenarios where data is merged in batches.
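
The effect of UPSERT on out-of-order batches can be sketched in Python as follows. This is a toy model of upsert semantics, not MatrixGate's implementation or interface: the (device, timestamp) key, the rule that non-null fields from a later batch overwrite earlier values, and all names are assumptions.

```python
from typing import Dict, Iterable, Optional, Tuple

Key = Tuple[str, int]              # (device_id, timestamp), assumed unique key
Row = Dict[str, Optional[float]]   # metric name -> value (None = not reported)


def upsert(table: Dict[Key, Row], batch: Iterable[Tuple[Key, Row]]) -> None:
    """Insert new keys; for existing keys, merge in the non-null fields."""
    for key, row in batch:
        existing = table.get(key)
        if existing is None:
            table[key] = dict(row)       # INSERT path: key not seen before
        else:
            for name, value in row.items():
                if value is not None:    # UPDATE path: nulls keep the old value
                    existing[name] = value


table: Dict[Key, Row] = {}
# Batch 1 arrives first but carries the later timestamp and only one metric.
upsert(table, [(("dev1", 200), {"temp": 21.5, "rpm": None})])
# Batch 2 arrives afterwards with an earlier timestamp and the missing metric.
upsert(table, [
    (("dev1", 100), {"temp": 20.0, "rpm": 900.0}),
    (("dev1", 200), {"temp": None, "rpm": 950.0}),
])

# Reading in key order yields a complete, time-ordered series.
ordered = sorted(table.items())
```

After both batches, each (device, timestamp) key holds exactly one row with both metrics filled in, regardless of the order in which the batches arrived.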

MatrixShift: Point-to-Point Migration

  • Efficient point-to-point migration: enables data transmission from Segment to Segment, eliminating single points of failure that may occur in typical migration operations.
  • Full-scenario migration: supports full, incremental, and conditional filtering migration scenarios.
  • Greenplum replacement: supports migration of cluster data from Greenplum 4.3.X/5/6 to YMatrix.

Enterprise-Level Security

  • Authentication: Supports a rich set of authentication methods, including trust authentication, password authentication, and PAM authentication.
  • Permission control: Uses role-based access control, with roles simplifying the association between users and permissions.
  • Encryption: Provides encryption at different levels: password storage encryption; encryption for specified fields; SSL host authentication; client encryption; network encryption of data; cross-network encryption of passwords; database partition encryption.
  • Auditing: Records user logins, logouts, and post-login database operations. Audit levels can be set according to security level.
  • Resource Control: Enforces strict address access restrictions so that connections come only from trusted sources; supports a configurable maximum number of concurrent connections per user and a default connection timeout policy.

Enhanced Compatibility

  • Fully compatible with the PostgreSQL/Greenplum ecosystem toolchain.

Support for Multiple Business Scenarios

Super Data Warehouse Scenario

Powerful analytical computing capabilities

The primary query scenario for a data warehouse is historical data analysis. Traditionally this is done with the Hadoop ecosystem for data production and consumption: historical data is first stored on the Hadoop platform, and Spark is then used to calculate report metrics, which makes for a complex process.

YMatrix uses its hyper-converged capabilities to replace this complex ecosystem, and it improves analytical performance through targeted optimizations. By integrating structured and unstructured data types, federated data access, and other methods, it handles business intelligence (BI) and reporting workloads in classic OLAP sectors such as finance, telecommunications, government, energy, and manufacturing. Query optimization techniques such as vectorization, Runtime Filter, sliding windows, and continuous aggregation provide its analytical computing power.
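
Of the techniques named above, continuous aggregation is perhaps the easiest to illustrate: the aggregate is updated incrementally as rows arrive, so a query reads a small precomputed state instead of rescanning the raw data. The Python sketch below shows the general idea only. It does not reflect how YMatrix implements the feature, and the class name, the 60-second bucket width, and the sample values are assumptions.

```python
from typing import Dict, Tuple

BUCKET_SECONDS = 60  # assumed bucket width for this illustration


class ContinuousAvg:
    """Keeps a per-bucket (sum, count) state up to date as rows are inserted."""

    def __init__(self) -> None:
        self._state: Dict[int, Tuple[float, int]] = {}

    def insert(self, ts: int, value: float) -> None:
        bucket = ts - ts % BUCKET_SECONDS   # start time of the row's bucket
        total, count = self._state.get(bucket, (0.0, 0))
        self._state[bucket] = (total + value, count + 1)

    def average(self, bucket: int) -> float:
        # Reads only the maintained state; raises KeyError for an empty bucket.
        total, count = self._state[bucket]
        return total / count


agg = ContinuousAvg()
# Rows may arrive out of time order; each one updates only its own bucket.
for ts, value in [(0, 10.0), (30, 20.0), (61, 40.0), (59, 30.0)]:
    agg.insert(ts, value)
```

Here `agg.average(0)` covers the three readings with timestamps 0, 30, and 59, and `agg.average(60)` covers the single reading at 61. The cost of a query is independent of how many raw rows were inserted.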

Complex Time Series Analysis Scenarios

Balancing high-speed writing, low-cost storage, and real-time queries

Due to the real-time nature of time series data, time series scenarios place high demands on database write, storage, and query capabilities.

YMatrix is optimized for time series data. The MARS series storage engines support physical sorting as well as uploads at different frequencies and in batches, and MatrixGate provides highly concurrent, high-performance batch writes. Together these let YMatrix meet enterprise time series requirements for real-time ingestion, real-time queries, and transactional guarantees.

YMatrix supports graphical scaling that takes only simple operations and completes in seconds. Scaling is also smooth: it does not interrupt business operations, which reduces downtime losses and risk.

Converged Technology Stack Scenarios

Leveraging hyper-converged capabilities to integrate data pipelines

Data silos are common in traditional industrial enterprises. When data cannot flow and be put to use, enterprise management, operations, and development are constrained and competitiveness suffers. Overcoming this is a critical challenge in enterprise digital transformation.

YMatrix's hyper-converged architecture has been applied in production scenarios such as factory data foundations, data warehouses for large corporate groups, intelligent connected vehicles, and intelligent operation of IoT devices. It significantly lowers the technical barriers an enterprise faces in selecting, procuring, using, and maintaining its data infrastructure, and has received positive feedback. For example, in smart manufacturing, a single database can handle the collection, storage, computation, modeling, querying, and analysis of data from enterprise resource planning (ERP) systems, manufacturing execution systems (MES), and equipment.