Time-series Modeling Approach

This document is the second in the "Time-series Data Modeling" section. YMatrix believes that data model design directly impacts the value of data consumption and usage. Therefore, beyond technical introductions, this entire section aims to provide you with a clear understanding of the concept, applications, and evolution of time-series data models.

The first article, "What Is a Time-series Data Model?", answers a series of progressively deeper questions to help you clearly understand the concept of time-series data models.
This second article, "Time-series Modeling Approach", provides theoretical guidance on relational model design principles from YMatrix.
The third and fourth articles present modeling examples for connected vehicle and smart home scenarios. These follow the "Time-series Modeling Approach" to demonstrate best practices for different time-series use cases in YMatrix.

1 First: Understand the Time-series Data Model

Before building your own time-series data model, please read the first article in this section: What Is a Time-series Data Model?, to understand how YMatrix defines a time-series data model.

2 Second: Understand Time-series Scenario Requirements

YMatrix believes that understanding common time-series query patterns equates to understanding core scenario requirements. A real-world time-series scenario typically involves a combination of several query types. The table below uses connected vehicles as an example to illustrate these application scenarios.

Query Pattern	Description	Application Scenario
Single Device
Single-device Point Query	Retrieve one or more metrics of a single device at a specific point in time. Most commonly used to get the latest or last non-null values of certain metrics.	Real-time speed of a vehicle during driving
Single-device Detail Query	Retrieve detailed data of a single device over a time range, possibly involving one or multiple metrics.	Detailed analysis such as root cause investigation in after-sales service
Single-device Aggregation	Retrieve aggregated values (e.g., max, min, first, last, average) of certain metrics for a single device over a time interval.	Average speed of a vehicle over a given period
Multiple Devices
Multi-device Point Query	Retrieve metric values across multiple devices at a specific timestamp, possibly involving one or more metrics.	Combined signal queries; status monitoring (e.g., trigger low battery alert); compare metric a now with metric b from 10 seconds ago to determine if a vehicle is aging
Multi-device Detail Query	Retrieve detailed data of multiple devices over a time range, possibly involving one or more metrics.	Discovery of data patterns, such as root cause analysis or new product tracking. Typically used by R&D teams
Multi-device Aggregation	Retrieve aggregated values (e.g., max, min, first, last, average) of certain metrics across multiple devices over a time interval, grouped by device.	Check network conditions across regions; compute average speeds to verify whether multiple vehicles experience poor connectivity at a location
Other Queries
Multi-dimensional Query	Query latest, detailed, or aggregated values of devices matching tag or attribute filters. If multiple devices are involved, results are grouped by device.	Monitor current state of vehicles
Multi-table Join	Join queries clarify the semantic meaning of time-series data. The joined result acts as a set of qualifiers defining the context of time-series records. In YMatrix, this is achieved using the `JOIN` statement.	Cross-analysis between different vehicle business tables
Advanced Analytics	Subqueries, window functions, Cube, CTE, cumulative sums, incremental calculations, growth rate computation, change point detection, signal jump difference queries, percentile functions, pivot/unpivot operations, cumulative distribution functions, `CASE...WHEN...` expressions, sliding windows, linear interpolation, peak detection, recursive queries, etc.	Pattern recognition, trend forecasting, root cause analysis, threshold adjustment, anomaly detection, post-sale fault analysis, dashboard visualization, ad-hoc analysis, battery alerts, spike detection alerts, anomaly cleaning, null value imputation, top-N peak values for specified metrics of a device within a time range
Federated Query	Access external databases such as MySQL or Hive	Some data resides in MySQL; support join queries with such databases. Offline data stored in Hive; support federated queries with those systems
Machine Learning	Linear regression prediction	Use data from Jan 1–10 to train a linear regression model and predict metric values for Jan 11
Spatio-temporal Queries	`first` function, `last` function, window queries, sometimes combined with spatial data	Initial fuel level, final fuel level

3 Finally: Design the Time-series Data Model

Now that you have inferred your required time-series query patterns and data sources (from data collection origins) based on desired visualization outcomes, you are ready to enter the formal model design phase—congratulations.

From the above information, you should be able to estimate key parameters: number of devices, data volume, number of metrics, and data types. Indeed, these factors shape your modeling approach.

3.1 Core Decision Factors

Core Decision Factor	Description
Number of Devices	The term "device" may vary by scenario. In connected vehicles, a "device" could mean a car or a sensor.
Metric Type	Data type of metrics, e.g., int, float, text
Number of Metrics	Total count or scale of metrics

YMatrix identifies three core decision factors for time-series modeling: number of devices, metric type, and number of metrics.

These factors impact early-stage model design in different ways:

First, all three affect the model’s capacity to store data points. Additionally, data ingestion frequency and retention duration also influence capacity estimation. As business evolves, the number of devices, metric types, and counts will grow. Different businesses have varying needs for data retention and sampling frequency. During initial design, it is essential to forecast the dynamic storage capacity requirements over a foreseeable period.
Second, the certainty of metric types and counts during early design directly affects the choice between wide table, wide table variant, narrow table, or narrow table variant models. When both metric types and counts are highly predictable, wide tables or their variants are preferred—they deliver optimal query performance. When metric counts are unpredictable but types are stable, consider a narrow table variant. When both are uncertain, use a narrow table model to maximize write flexibility.

3.2 Selecting the Time-series Model Type

The first article in this chapter, What Is a Time-series Data Model?, explains the relationship between non-relational and relational models and briefly introduces differences among wide, wide variant, narrow, and narrow variant models. This document focuses on how to choose the appropriate model type.

3.2.1 Scenarios with Known Metrics and Types — Wide Table

A wide table model results in a time-series table containing many metric name and value columns. Depending on the number of metrics, the table can become very wide—hence the name. With proper design, wide tables offer fast query performance. It's common to see tables with 200 or more columns. Below is a simple example:

Timestamp	Device ID	Temperature	Wind Speed	Humidity
2021-11-16 13:57:13	1	29.2	3.2	55
2021-11-16 13:57:13	1	21.5	8.5	35
2021-11-16 13:57:13	1	22.3	7.2	40
2021-11-16 13:57:13	2	24.1	6.8	21
2021-11-16 13:57:13	2	19.8	2.9	38
2021-11-16 13:57:13	2	20.7	4.5	56
......

The wide table is a strength of YMatrix. Each row carries more data, while only one timestamp and one device ID column exist (since there is only one metric table), reducing redundancy. Index creation overhead is significantly higher for narrow tables, whereas wide tables have fewer rows and thus lower indexing cost, benefiting query performance. For instance, to retrieve temperature data per device, you can create a Lookup table mapping devices to metrics, enabling efficient single- or partial multi-device queries.

Does a single-table wide layout achieve optimal storage utilization? Not necessarily—it depends on how often null values are written. Whether a value column is null or not depends on actual data collection methods and frequencies. In scenarios with frequent null writes (also known as ultra-wide sparse tables), nulls consume considerable storage space.

Adding new "metric name + value" column pairs or introducing new metric value types into a well-designed wide table is not straightforward. Thus, the upfront design cost is higher than for narrow tables, requiring careful planning. However, once properly designed, the model delivers excellent query performance later.

It is best suited for scenarios where metrics and their types are relatively fixed during the design phase, such as certain connected vehicle applications.

3.2.2 Clearly Tiered Query Frequency Scenarios — Wide Table Variant

In some scenarios, query frequencies differ significantly across metrics. YMatrix addresses this with a structured + semi-structured wide table variant workflow. High-frequency queried data and low-frequency queried data are written differently—protecting query performance for hot data while maintaining write convenience for cold data.

"Structured" refers to traditional wide table storage in relational databases. "Semi-structured" means using a semi-structured format (JSON / MXKV) in the last column of an existing wide table (approaching maximum column limits) to store extended metric names and values, enabling flexible scalability for ultra-wide tables. A simple example:

Timestamp	Temperature	Wind Speed	Humidity	Tags
2021-11-16 13:57:13	29.2	3.2	55	Weather Sensor, Model A, Beijing, Huairou District
2021-11-16 13:57:13	21.5	8.5	35	Weather Sensor, Model A, Shanghai, Chongming District
2021-11-16 13:57:13	22.3	7.2	40	Weather Sensor, Model A, Beijing, Haidian District
2021-11-16 13:57:13	24.1	6.8	21	Weather Sensor, Model B, Beijing, Huairou District
2021-11-16 13:57:13	19.8	2.9	38	Weather Sensor, Model B, Shanghai, Chongming District
2021-11-16 13:57:13	20.7	4.5	56	Weather Sensor, Model B, Beijing, Haidian District
......

3.2.3 Scenarios with Uncertain Data Metrics and Types — Narrow Table

Narrow table layout is the opposite of wide table. A typical feature of narrow tables is having very few value columns—usually just one. Below is a simple example:

Timestamp	Device ID	Metric Name	Metric Value
2021-11-16 13:57:13	1	Temperature	29.2
2021-11-16 13:57:13	1	Wind Speed	3.2
2021-11-16 13:57:13	2	Temperature	21.5
2021-11-16 13:57:13	2	Wind Speed	8.5
......

As shown, apart from the fixed "timestamp + tag (device ID)" pair, the narrow table has only one metric name column and one metric value column. The metric name column stores all metric identifiers, and the value column holds all metric values of the same data type. To add a new metric or device, simply insert a new row. If additional metric value types need to be stored, either add corresponding value-type columns or create separate tables. This requires altering the schema or creating extra tables. Alternatively, if you frequently use JSONB, you may choose to store valid JSON values in this column, but note this incurs additional JSON parsing overhead.

The narrow table model excels in scenarios like general IoT or smart homes, where the number, category, or metrics of devices are highly uncertain during early design. Data storage, ingestion, and model extension are simple—but under the assumption that metric value types are relatively uniform. Each new value type requires a new table with redundant columns except for the value itself.

3.2.4 3.2.4 Scenarios with Uncertain Metrics but Defined Types — Narrow Table Variant

In some real-world scenarios, metric value types increase with evolving business needs. Traditional narrow tables can only scale by adding more tables—a clearly limited approach. Hence, we can design a narrow table variant: a model between narrow and wide tables. The basic principle is to create one column per required data type. For example, if two data types exist—integer and floating-point (say, float8)—you end up with two separate value columns:

Timestamp	Device ID	Metric Name	Value_int	Value_float
2021-11-16 13:57:13	1	Temperature		29.2
2021-11-16 13:57:13	1	Wind Speed		3.2
2021-11-16 13:57:13	1	Humidity	55
2021-11-16 13:57:13	2	Temperature		21.5
2021-11-16 13:57:13	2	Wind Speed		8.5
2021-11-16 13:57:13	2	Humidity	35
......

This design usually requires an additional Lookup table:

Metric Name	Data Type Column
Temperature	Value_float
Wind Speed	Value_float
Humidity	Value_int

During write or query operations, use the JOIN statement to join the two tables and locate the correct data column.
To support more value types, simply add more columns. While predicting all possible metrics in advance is difficult, predicting all possible value types is easier—provided you know the mapping between metric names and their data types.

Note!
For concrete modeling examples, refer to Best Practices for Connected Vehicle Scenarios and Best Practices for Smart Home Scenarios.

← Previous

Overview

Example: Vehicle Networking

English Русский 简体中文