How MARS3 Works: Hybrid Row/Column Storage for High-Frequency Writes and Fast Analytics

2026-01-09 · YMatrix Team

Preface

Traditional storage formats each have their strengths and their costs.

  • Row stores write fast, but analytics can be slow because every query tends to read more data than it needs.

  • Column stores excel at analytics, but ingestion can suffer—especially under high-frequency, small-batch, continuously appended workloads.

In smart factory scenarios, you typically need both: efficient time-series ingestion and near-real-time analytics on the same dataset. Pure row-store or pure column-store approaches often fail to meet these combined requirements.

In this article, we start from real production-line workloads and unpack how MARS3, the storage engine in YMatrix, is designed to solve the “time-series + analytics” challenge.

1) Why production-line data can overwhelm a database

A smart factory usually involves tens of thousands of IoT devices spread across hundreds of production lines. The data has distinctive characteristics:

  1. Continuous ingestion (24/7): Device states and sensor metrics are written around the clock.

  2. Timestamp-first access patterns: The primary question is rarely “what is the ID?” but “when did it happen?”

  3. Analyze immediately after writing: Operations monitoring, quality analysis, and anomaly tracing are not “run offline jobs tomorrow.” They are often interactive and time-windowed.

These characteristics translate into a set of database pressures:

  • Extremely high write rates and rapid data growth—how do you keep ingestion stable and efficient?

  • Queries almost always include time predicates—how do you store data to match that access pattern?

  • Analytical queries typically touch only a few metric columns—how do you minimize wasted I/O?

2) Row store vs. column store: why neither is enough

A typical smart-factory table might look like:

ts | device_id | metric_1 | metric_2 | metric_3 | ...

Row store: the default for classic OLTP

A row store packs each row’s fields together—think of each row as a “boxed shipment.” This works well for OLTP workloads like payments or order creation.

In smart-factory ingestion, a row store means:

  • Every device report writes a full row
  • Point lookups can be fast

But analytics is where it hurts. If you ask:

  • “Show the temperature trend over the past 24 hours”
  • “Compute the distribution of a metric across 1,000 devices”

…you may care about only one column, but the engine still reads entire rows. That creates large amounts of irrelevant I/O, one of the most common performance killers in time-series analytics.
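As a rough illustration of that wasted I/O, the sketch below compares how many bytes a scan touches under each layout. The five-column schema follows the example table above; the fixed 8-byte value size and the helper names are assumptions made for the arithmetic, not properties of any particular engine.

```python
# Hypothetical schema: ts, device_id, and three metrics.
COLUMNS = ["ts", "device_id", "metric_1", "metric_2", "metric_3"]
VALUE_SIZE = 8  # bytes per value, assumed fixed-width for simplicity


def bytes_read_row_layout(num_rows: int) -> int:
    """A row layout reads every field of every scanned row."""
    return num_rows * len(COLUMNS) * VALUE_SIZE


def bytes_read_column_layout(num_rows: int, needed: list[str]) -> int:
    """A column layout reads only the columns the query references."""
    return num_rows * len(needed) * VALUE_SIZE


rows = 1_000_000
row_bytes = bytes_read_row_layout(rows)
col_bytes = bytes_read_column_layout(rows, ["ts", "metric_1"])
print(row_bytes, col_bytes, row_bytes / col_bytes)
```

With only five columns the row layout already reads 2.5 times as much data for a single-metric trend query. Real sensor tables often carry far more metric columns, so the gap widens accordingly.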

Column store: the default for analytics

A column store organizes values by column: all ts values together, all device_id values together, and so on. It shines for large scans, aggregation, and historical analysis.

In smart-factory analytics, a column store helps because:

  • Similar data types per column enable high compression

  • Queries can read only the needed metric columns, reducing I/O

But ingestion is the bottleneck:

  • High-frequency, small-batch, append-heavy writes often cause expensive column reorganization

  • This can lead to write amplification and visible ingestion latency

So the smart-factory pattern is neither pure OLTP nor pure OLAP. It’s a mixed workload (TP + AP) that demands a storage format combining the best of both worlds.

3) MARS3: a Row→Column storage engine for “time-series + analytics”

MARS3 is YMatrix’s hybrid row/column architecture designed specifically for time-series + analytics workloads. It is not a simple “compromise” between row and column stores. Instead, it introduces a dual-path storage workflow—Row → Column—to achieve:

Ingest like a row store, analyze like a column store.

3.1 Ingestion first: accept data fast, then optimize

In smart factories, the top priority for ingestion is:

  • no blocking
  • no jitter
  • no data loss

In MARS3, new data is written into a row-oriented staging area—similar to a buffer or draft zone:

  • No heavy compression
  • No immediate sorting requirements

This design makes it easier to absorb:

  • high-frequency sensor reports
  • continuous device status updates
  • streaming production-line data
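To make the idea of a row-oriented staging area concrete, here is a minimal sketch of an append-only buffer that accepts rows in arrival order and only reports when it has grown large enough to hand off. The class name, the byte accounting (8 bytes per field), and the method names are all hypothetical; this is not the MARS3 implementation.

```python
from dataclasses import dataclass, field

FLUSH_THRESHOLD_BYTES = 64 * 1024 * 1024  # the article's example size


@dataclass
class RowStagingBuffer:
    """Hypothetical append-only row buffer (illustration only)."""
    threshold: int = FLUSH_THRESHOLD_BYTES
    rows: list = field(default_factory=list)
    size_bytes: int = 0

    def append(self, ts: int, device_id: int, metrics: tuple) -> bool:
        """Append one row as-is: no sorting, no compression.

        Returns True once the buffer is large enough to be reorganized.
        """
        self.rows.append((ts, device_id, *metrics))
        self.size_bytes += 8 * (2 + len(metrics))  # assume 8 bytes per field
        return self.size_bytes >= self.threshold

    def drain(self) -> list:
        """Hand the accumulated rows to a background step and reset."""
        rows, self.rows, self.size_bytes = self.rows, [], 0
        return rows


buf = RowStagingBuffer(threshold=64)              # tiny threshold for the demo
first_ready = buf.append(1002, 7, (21.5, 0.3))    # 32 bytes buffered
second_ready = buf.append(1001, 9, (21.7, 0.4))   # 64 bytes, threshold reached
```

Note that the second row carries an earlier timestamp than the first and is still accepted without any reordering. Deferring that work is what keeps the write path cheap.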

3.2 Background “columnization”: turn raw streams into analytics-friendly data

When the row-store staging data reaches a certain size (e.g., 64 MB), MARS3 automatically reorganizes it:

  1. Sort by timestamp
  2. Convert to columnar format
  3. Split into ranges (range-based chunking)
  4. Compress metric columns

This aligns naturally with time-series data properties:

  • timestamps are often monotonic
  • metrics evolve gradually over time
  • adjacent values are highly similar

That’s why columnar compression is a strong fit for time-series workloads.
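The four reorganization steps can be sketched in a few lines. The example below assumes integer (ts, metric) rows and uses delta encoding followed by zlib as a stand-in compression scheme; the article does not say which encodings MARS3 actually uses, and the function names are illustrative.

```python
import struct
import zlib


def delta_encode(values: list[int]) -> list[int]:
    """Keep the first value, then the differences between neighbours."""
    return values[:1] + [b - a for a, b in zip(values, values[1:])]


def pack(values: list[int]) -> bytes:
    """Serialize as little-endian 64-bit integers and compress with zlib."""
    return zlib.compress(struct.pack(f"<{len(values)}q", *values))


def columnize(rows: list[tuple[int, int]], range_size: int) -> list[dict]:
    """Sort by ts, split into ranges, and store each column compressed."""
    ordered = sorted(rows, key=lambda r: r[0])         # 1. sort by timestamp
    ranges = []
    for start in range(0, len(ordered), range_size):   # 3. range-based chunking
        chunk = ordered[start:start + range_size]
        ts_col = [r[0] for r in chunk]                 # 2. columnar layout
        metric_col = [r[1] for r in chunk]
        ranges.append({
            "min_ts": ts_col[0],
            "max_ts": ts_col[-1],
            "ts": pack(delta_encode(ts_col)),          # 4. compress columns
            "metric": pack(delta_encode(metric_col)),
        })
    return ranges


# Monotonic timestamps at a fixed 1-second interval (in milliseconds).
timestamps = [1_700_000_000_000 + 1000 * i for i in range(1000)]
raw_size = len(pack(timestamps))
delta_size = len(pack(delta_encode(timestamps)))
print(raw_size, delta_size)

ranges = columnize([(3, 30), (1, 10), (2, 20), (4, 40), (5, 50)], range_size=2)
```

Because the timestamps advance by a constant step, the delta-encoded column is almost entirely one repeated value, which compresses far better than the raw column. The per-range min_ts and max_ts recorded here are the kind of metadata that later lets a query skip whole ranges.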

3.3 Merge and maintenance: performance that doesn’t degrade over time

With continuous ingestion, an engine can degrade if it accumulates:

  • overlapping data segments that queries must repeatedly scan
  • increasing read amplification and write amplification

MARS3 performs automatic merge operations to:

  • merge overlapping data
  • remove expired data and apply updates/deletes
  • control read amplification over time

Crucially, this process is designed to avoid blocking writes or reads, helping long-running factory systems stay stable and performant.
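A simplified view of such a merge, assuming each data segment is a list of (ts, device_id, value) rows already sorted by ts: the sketch combines overlapping segments into one sorted segment while dropping expired and deleted rows. The function and parameter names are hypothetical, updates are not modeled, and the non-blocking behavior described above is out of scope here.

```python
import heapq


def merge_runs(runs, deleted_keys, expire_before):
    """Merge sorted segments into one sorted segment.

    runs          -- lists of (ts, device_id, value), each sorted by ts
    deleted_keys  -- set of (ts, device_id) pairs marked as deleted
    expire_before -- rows with ts below this value are dropped
    """
    merged = []
    # heapq.merge streams the inputs in ts order without re-sorting them.
    for row in heapq.merge(*runs, key=lambda r: r[0]):
        ts, device_id, _ = row
        if ts < expire_before:
            continue  # remove expired data
        if (ts, device_id) in deleted_keys:
            continue  # apply deletes
        merged.append(row)
    return merged


run_a = [(100, 1, 20.0), (300, 1, 20.4)]
run_b = [(50, 2, 19.0), (200, 1, 20.2), (250, 2, 19.5)]  # overlaps run_a in time
merged = merge_runs([run_a, run_b], deleted_keys={(250, 2)}, expire_before=100)
print(merged)
```

Before the merge, a query for ts between 100 and 300 has to open both segments. Afterwards it reads one, which is the read-amplification benefit in miniature.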

3.4 Query path: the timestamp becomes a “natural index”

Smart-factory queries are typically time-windowed, for example:

WHERE ts >= now() - interval '1 hour'

MARS3 uses a four-layer filtering approach to minimize irrelevant I/O and accelerate queries:

  1. Run filtering: Use MIN/MAX metadata to skip entire runs (files) that cannot match the time range.

  2. Range filtering: Within a run, use MIN/MAX per range block to target only relevant blocks.

  3. Column pruning: Read only the necessary columns (e.g., only ts + selected metrics), avoiding unrelated data.

  4. Version filtering: Skip invalid rows that have been deleted or updated, preventing wasted computation.

The result is a query path that can outperform traditional column stores on typical time-series analytics patterns, because it skips more data earlier and reads less overall.
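The four layers can be sketched over an in-memory layout. The nested dictionaries below (runs containing ranges, each range holding column arrays plus a set of deleted row positions) are an assumed structure chosen for readability, not the on-disk format MARS3 uses.

```python
def scan(runs, t_start, t_end, columns):
    """Return rows with t_start <= ts < t_end, reading only `columns`."""
    out = []
    for run in runs:
        if run["max_ts"] < t_start or run["min_ts"] >= t_end:
            continue  # 1. run filtering: skip the whole run
        for rng in run["ranges"]:
            if rng["max_ts"] < t_start or rng["min_ts"] >= t_end:
                continue  # 2. range filtering: skip the block
            # 3. column pruning: touch only ts and the requested columns
            cols = {name: rng["columns"][name] for name in {"ts", *columns}}
            for i, ts in enumerate(cols["ts"]):
                if i in rng["deleted"]:
                    continue  # 4. version filtering: skip invalid rows
                if t_start <= ts < t_end:
                    out.append({name: cols[name][i] for name in ("ts", *columns)})
    return out


def make_range(ts, temp, pressure, deleted=()):
    """Build one hypothetical range block with its MIN/MAX metadata."""
    return {"min_ts": ts[0], "max_ts": ts[-1], "deleted": set(deleted),
            "columns": {"ts": ts, "temp": temp, "pressure": pressure}}


runs = [
    {"min_ts": 0, "max_ts": 99,
     "ranges": [make_range([0, 50, 99], [20.0, 20.1, 20.2], [1.0, 1.0, 1.1])]},
    {"min_ts": 100, "max_ts": 299,
     "ranges": [
         make_range([100, 150, 199], [20.3, 20.4, 20.5], [1.1, 1.1, 1.2], deleted=[1]),
         make_range([200, 250, 299], [20.6, 20.7, 20.8], [1.2, 1.2, 1.3]),
     ]},
]

result = scan(runs, t_start=120, t_end=260, columns=["temp"])
print(result)
```

In this run the first file is skipped from metadata alone, the pressure column is never read, and the deleted row at ts = 150 is filtered out before it reaches the result.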

4) Summary

MARS3’s advantages can be summarized in four simple statements:

  • No waiting on ingestion: stable handling of continuous, high-frequency time-series writes

  • Hybrid row/column: automatic row-to-column conversion without manual intervention

  • Analytics without scanning everything: multiple filters + column pruning reduce query latency

  • No long-term degradation: automated merges and maintenance keep performance steady

These properties are among the key reasons MARS3 can handle the demands that smart-factory “time-series + analytics” workloads place on a database.

YMatrix also provides a mature Smart Factory solution, already applied across many manufacturing enterprises.