This document is the second in the "Time-series Data Modeling" section. YMatrix believes that data model design directly impacts the value of data consumption and usage. Therefore, beyond technical introductions, this entire section aims to provide you with a clear understanding of the concept, applications, and evolution of time-series data models.
Before building your own time-series data model, please read the first article in this section: What Is a Time-series Data Model?, to understand how YMatrix defines a time-series data model.
YMatrix believes that understanding common time-series query patterns equates to understanding core scenario requirements. A real-world time-series scenario typically involves a combination of several query types. The table below uses connected vehicles as an example to illustrate these application scenarios.
| Query Pattern | Description | Application Scenario |
|---|---|---|
| Single Device | ||
| Single-device Point Query | Retrieve one or more metrics of a single device at a specific point in time. Most commonly used to get the latest or last non-null values of certain metrics. | Real-time speed of a vehicle during driving |
| Single-device Detail Query | Retrieve detailed data of a single device over a time range, possibly involving one or multiple metrics. | Detailed analysis such as root cause investigation in after-sales service |
| Single-device Aggregation | Retrieve aggregated values (e.g., max, min, first, last, average) of certain metrics for a single device over a time interval. | Average speed of a vehicle over a given period |
| Multiple Devices | ||
| Multi-device Point Query | Retrieve metric values across multiple devices at a specific timestamp, possibly involving one or more metrics. | Combined signal queries; status monitoring (e.g., trigger low battery alert); compare metric a now with metric b from 10 seconds ago to determine if a vehicle is aging |
| Multi-device Detail Query | Retrieve detailed data of multiple devices over a time range, possibly involving one or more metrics. | Discovery of data patterns, such as root cause analysis or new product tracking. Typically used by R&D teams |
| Multi-device Aggregation | Retrieve aggregated values (e.g., max, min, first, last, average) of certain metrics across multiple devices over a time interval, grouped by device. | Check network conditions across regions; compute average speeds to verify whether multiple vehicles experience poor connectivity at a location |
| Other Queries | ||
| Multi-dimensional Query | Query latest, detailed, or aggregated values of devices matching tag or attribute filters. If multiple devices are involved, results are grouped by device. | Monitor current state of vehicles |
| Multi-table Join | Join queries clarify the semantic meaning of time-series data. The joined result acts as a set of qualifiers defining the context of time-series records. In YMatrix, this is achieved using the JOIN statement. |
Cross-analysis between different vehicle business tables |
| Advanced Analytics | Subqueries, window functions, Cube, CTE, cumulative sums, incremental calculations, growth rate computation, change point detection, signal jump difference queries, percentile functions, pivot/unpivot operations, cumulative distribution functions, CASE...WHEN... expressions, sliding windows, linear interpolation, peak detection, recursive queries, etc. |
Pattern recognition, trend forecasting, root cause analysis, threshold adjustment, anomaly detection, post-sale fault analysis, dashboard visualization, ad-hoc analysis, battery alerts, spike detection alerts, anomaly cleaning, null value imputation, top-N peak values for specified metrics of a device within a time range |
| Federated Query | Access external databases such as MySQL or Hive | Some data resides in MySQL; support join queries with such databases. Offline data stored in Hive; support federated queries with those systems |
| Machine Learning | Linear regression prediction | Use data from Jan 1–10 to train a linear regression model and predict metric values for Jan 11 |
| Spatio-temporal Queries | first function, last function, window queries, sometimes combined with spatial data |
Initial fuel level, final fuel level |
Now that you have inferred your required time-series query patterns and data sources (from data collection origins) based on desired visualization outcomes, you are ready to enter the formal model design phase—congratulations.
From the above information, you should be able to estimate key parameters: number of devices, data volume, number of metrics, and data types. Indeed, these factors shape your modeling approach.
| Core Decision Factor | Description |
|---|---|
| Number of Devices | The term "device" may vary by scenario. In connected vehicles, a "device" could mean a car or a sensor. |
| Metric Type | Data type of metrics, e.g., int, float, text |
| Number of Metrics | Total count or scale of metrics |
YMatrix identifies three core decision factors for time-series modeling: number of devices, metric type, and number of metrics.
These factors impact early-stage model design in different ways:
The first article in this chapter, What Is a Time-series Data Model?, explains the relationship between non-relational and relational models and briefly introduces differences among wide, wide variant, narrow, and narrow variant models. This document focuses on how to choose the appropriate model type.
A wide table model results in a time-series table containing many metric name and value columns. Depending on the number of metrics, the table can become very wide—hence the name. With proper design, wide tables offer fast query performance. It's common to see tables with 200 or more columns. Below is a simple example:
| Timestamp | Device ID | Temperature | Wind Speed | Humidity |
|---|---|---|---|---|
| 2021-11-16 13:57:13 | 1 | 29.2 | 3.2 | 55 |
| 2021-11-16 13:57:13 | 1 | 21.5 | 8.5 | 35 |
| 2021-11-16 13:57:13 | 1 | 22.3 | 7.2 | 40 |
| 2021-11-16 13:57:13 | 2 | 24.1 | 6.8 | 21 |
| 2021-11-16 13:57:13 | 2 | 19.8 | 2.9 | 38 |
| 2021-11-16 13:57:13 | 2 | 20.7 | 4.5 | 56 |
| ...... |
The wide table is a strength of YMatrix. Each row carries more data, while only one timestamp and one device ID column exist (since there is only one metric table), reducing redundancy. Index creation overhead is significantly higher for narrow tables, whereas wide tables have fewer rows and thus lower indexing cost, benefiting query performance. For instance, to retrieve temperature data per device, you can create a Lookup table mapping devices to metrics, enabling efficient single- or partial multi-device queries.
Does a single-table wide layout achieve optimal storage utilization? Not necessarily—it depends on how often null values are written. Whether a value column is null or not depends on actual data collection methods and frequencies. In scenarios with frequent null writes (also known as ultra-wide sparse tables), nulls consume considerable storage space.
Adding new "metric name + value" column pairs or introducing new metric value types into a well-designed wide table is not straightforward. Thus, the upfront design cost is higher than for narrow tables, requiring careful planning. However, once properly designed, the model delivers excellent query performance later.
It is best suited for scenarios where metrics and their types are relatively fixed during the design phase, such as certain connected vehicle applications.
In some scenarios, query frequencies differ significantly across metrics. YMatrix addresses this with a structured + semi-structured wide table variant workflow. High-frequency queried data and low-frequency queried data are written differently—protecting query performance for hot data while maintaining write convenience for cold data.
"Structured" refers to traditional wide table storage in relational databases. "Semi-structured" means using a semi-structured format (JSON / MXKV) in the last column of an existing wide table (approaching maximum column limits) to store extended metric names and values, enabling flexible scalability for ultra-wide tables. A simple example:
| Timestamp | Temperature | Wind Speed | Humidity | ...... | Tags |
|---|---|---|---|---|---|
| 2021-11-16 13:57:13 | 29.2 | 3.2 | 55 | Weather Sensor, Model A, Beijing, Huairou District | |
| 2021-11-16 13:57:13 | 21.5 | 8.5 | 35 | Weather Sensor, Model A, Shanghai, Chongming District | |
| 2021-11-16 13:57:13 | 22.3 | 7.2 | 40 | Weather Sensor, Model A, Beijing, Haidian District | |
| 2021-11-16 13:57:13 | 24.1 | 6.8 | 21 | Weather Sensor, Model B, Beijing, Huairou District | |
| 2021-11-16 13:57:13 | 19.8 | 2.9 | 38 | Weather Sensor, Model B, Shanghai, Chongming District | |
| 2021-11-16 13:57:13 | 20.7 | 4.5 | 56 | Weather Sensor, Model B, Beijing, Haidian District | |
| ...... |
Narrow table layout is the opposite of wide table. A typical feature of narrow tables is having very few value columns—usually just one. Below is a simple example:
| Timestamp | Device ID | Metric Name | Metric Value |
|---|---|---|---|
| 2021-11-16 13:57:13 | 1 | Temperature | 29.2 |
| 2021-11-16 13:57:13 | 1 | Wind Speed | 3.2 |
| 2021-11-16 13:57:13 | 2 | Temperature | 21.5 |
| 2021-11-16 13:57:13 | 2 | Wind Speed | 8.5 |
| ...... |
As shown, apart from the fixed "timestamp + tag (device ID)" pair, the narrow table has only one metric name column and one metric value column. The metric name column stores all metric identifiers, and the value column holds all metric values of the same data type. To add a new metric or device, simply insert a new row. If additional metric value types need to be stored, either add corresponding value-type columns or create separate tables. This requires altering the schema or creating extra tables. Alternatively, if you frequently use JSONB, you may choose to store valid JSON values in this column, but note this incurs additional JSON parsing overhead.
The narrow table model excels in scenarios like general IoT or smart homes, where the number, category, or metrics of devices are highly uncertain during early design. Data storage, ingestion, and model extension are simple—but under the assumption that metric value types are relatively uniform. Each new value type requires a new table with redundant columns except for the value itself.
In some real-world scenarios, metric value types increase with evolving business needs. Traditional narrow tables can only scale by adding more tables—a clearly limited approach. Hence, we can design a narrow table variant: a model between narrow and wide tables. The basic principle is to create one column per required data type. For example, if two data types exist—integer and floating-point (say, float8)—you end up with two separate value columns:
| Timestamp | Device ID | Metric Name | Value_int | Value_float |
|---|---|---|---|---|
| 2021-11-16 13:57:13 | 1 | Temperature | 29.2 | |
| 2021-11-16 13:57:13 | 1 | Wind Speed | 3.2 | |
| 2021-11-16 13:57:13 | 1 | Humidity | 55 | |
| 2021-11-16 13:57:13 | 2 | Temperature | 21.5 | |
| 2021-11-16 13:57:13 | 2 | Wind Speed | 8.5 | |
| 2021-11-16 13:57:13 | 2 | Humidity | 35 | |
| ...... |
This design usually requires an additional Lookup table:
| Metric Name | Data Type Column |
|---|---|
| Temperature | Value_float |
| Wind Speed | Value_float |
| Humidity | Value_int |
During write or query operations, use the JOIN statement to join the two tables and locate the correct data column.
To support more value types, simply add more columns. While predicting all possible metrics in advance is difficult, predicting all possible value types is easier—provided you know the mapping between metric names and their data types.
Note!
For concrete modeling examples, refer to Best Practices for Connected Vehicle Scenarios and Best Practices for Smart Home Scenarios.