This document provides examples of typical queries in time-series scenarios. Since most real-world query patterns are composite in nature, the classification here is not strictly exclusive.
Note!
Understanding the query requirements of a scenario is one of the key steps in building an accurate data model.
Query the value of one or more metrics for a single device at a specific point in time. The most common case is retrieving the latest or last non-null value of certain metrics — for example, the current speed of a moving vehicle:
=# SELECT timestamp, car_id, value FROM car
WHERE car_id = '2'
ORDER BY timestamp DESC
LIMIT 1;
Retrieve detailed data for a single device over a specified time interval. This may involve one or multiple metrics and is typically used for in-depth analysis, such as root cause investigation during after-sales service:
=# SELECT * FROM analysis
WHERE device_id = '2'
AND timestamp >= '2023-10-08 00:00:00'
AND timestamp <= '2023-10-08 09:00:00';
Compute aggregated values (e.g., count, sum, average, max, min) of certain metrics for a single device within a time range:
=# SELECT count(1), sum(c1), avg(c1), max(c2), min(c2) FROM metrics
WHERE device_id = '1'
AND timestamp >= '2023-10-08 00:00:00'
AND timestamp <= '2023-10-08 09:00:00';
Query metric values across multiple devices at a given time point. This may include one or more metrics, such as combined signal checks:
=# SELECT timestamp, device_id, value FROM metrics
WHERE (device_id = '2' OR device_id = '3')
AND timestamp = '2023-10-08 00:00:00'
ORDER BY device_id, timestamp DESC;
Tip
The query returns rows satisfying the conditions, sorted first in ascending order by device_id. If there are multiple records with the same device_id, they are then sorted in descending order by timestamp. To sort both columns in descending order, modify the SQL to use ORDER BY device_id DESC, timestamp DESC, as shown below.
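For reference, a sketch of that modified query:

=# SELECT timestamp, device_id, value FROM metrics
WHERE (device_id = '2' OR device_id = '3')
AND timestamp = '2023-10-08 00:00:00'
ORDER BY device_id DESC, timestamp DESC;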
Fetch detailed data from multiple devices over a time window. This type of query supports use cases like anomaly detection, root cause analysis, or new product tracking, and is commonly used by R&D teams:
=# SELECT d.device_id, d.timestamp, d.value, s.device_name, l.location_name
FROM device_data AS d
JOIN device_status AS s ON d.device_id = s.device_id
JOIN device_location AS l ON d.device_id = l.device_id
WHERE d.device_id IN ('device1', 'device2', 'device3')
AND d.timestamp >= '2023-11-01 00:00:00'
AND d.timestamp <= '2023-11-10 23:59:59'
ORDER BY d.device_id, d.timestamp;
This is an example of root cause analysis, assuming the following tables exist:

- device_data: device data, with columns device_id, timestamp, value
- device_status: device status, with columns device_id, device_name
- device_location: device location information, with columns device_id, location_name
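For concreteness, a minimal DDL sketch of these assumed tables; the column types are illustrative, not prescribed by this document:

=# CREATE TABLE device_data (device_id text, timestamp timestamp, value float8);  -- types assumed
=# CREATE TABLE device_status (device_id text, device_name text);                 -- types assumed
=# CREATE TABLE device_location (device_id text, location_name text);             -- types assumed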
Key Points:

- The JOIN clauses link the related tables using the device ID.
- The WHERE clause filters data for the specific device IDs and time range.

Compute aggregated metrics (e.g., max, min, sum, average) across multiple devices within a time interval, grouped by device or other attributes, such as checking network conditions per region:
=# SELECT l.location_name, COUNT(d.device_id) AS total_devices, AVG(d.value) AS average_value
FROM device_data AS d
JOIN device_location AS l ON d.device_id = l.device_id
WHERE l.location_name IN ('location1', 'location2', 'location3')
AND d.timestamp >= '2023-11-01 00:00:00'
AND d.timestamp <= '2023-11-10 23:59:59'
GROUP BY l.location_name;
This example assumes two tables:
- device_data: device data, with columns device_id, timestamp, value
- device_location: device location, with columns device_id, location_name

Key Points:
- The JOIN clause connects the tables via the device ID.
- The WHERE clause filters data by region and time.
- The GROUP BY clause groups results by location.
- The COUNT and AVG aggregate functions compute the device count and average value.

Queries involving temporal functions such as first() and last(), sometimes combined with spatial data, support use cases like tracking start and end fuel levels.
To use YMatrix's time-series functions, create the extension first:
=# CREATE EXTENSION matrixts;
The extension is database-level; create it once per database.
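To confirm the extension is installed in the current database, you can query the standard PostgreSQL catalog:

=# SELECT extname, extversion FROM pg_extension WHERE extname = 'matrixts';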
first() returns the value of the specified column from the earliest record, ordered by the given time column:
=# SELECT id, first(c1, ts) AS c
FROM t
GROUP BY id
ORDER BY id;
last() returns the value of the specified column from the latest record, ordered by the given time column:
=# SELECT device_id, last(value, timestamp) AS last_value
FROM t
GROUP BY device_id
ORDER BY device_id;
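Combining the two functions covers the start/end fuel-level scenario mentioned earlier; a minimal sketch, assuming a hypothetical fuel table with columns vehicle_id, ts, and fuel_level:

=# SELECT vehicle_id,
       first(fuel_level, ts) AS start_fuel, -- fuel level at the earliest record in the window
       last(fuel_level, ts) AS end_fuel     -- fuel level at the latest record in the window
FROM fuel                                   -- hypothetical table, not defined in this document
WHERE ts >= '2023-10-08 00:00:00'
AND ts < '2023-10-09 00:00:00'
GROUP BY vehicle_id;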
=# SELECT time_bucket('5 s', timestamp) AS five_second, count(*) FROM t
GROUP BY five_second;
The above SQL samples table t every 5 seconds using count(*).
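If the aggregated results need to be kept for historical analysis, a common pattern is to materialize them into a summary table; a minimal sketch, assuming a summary table t_5s that you create yourself:

=# CREATE TABLE t_5s (bucket timestamp, cnt bigint);  -- hypothetical summary table
=# INSERT INTO t_5s
SELECT time_bucket('5 s', timestamp), count(*)
FROM t
GROUP BY 1;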
Key Point:
- Using GROUP BY to group raw data into wider time intervals and compute summary statistics is known as downsampling. Downsampling reduces storage cost while preserving key data characteristics, enabling historical trend analysis and forecasting.

Query the latest non-null value of each metric for every device:

=# SELECT
device_id,
max(timestamp) as max_timestamp,
last_not_null_value(value1, timestamp) as lastvalue1,
last_not_null_value(value2, timestamp) as lastvalue2,
last_not_null_value(value3, timestamp) as lastvalue3
FROM t
GROUP BY device_id;
Or:
=# SELECT
device_id,
max(timestamp) as max_timestamp,
last_not_null(value1, timestamp) as lastvalue1,
last_not_null(value2, timestamp) as lastvalue2,
last_not_null(value3, timestamp) as lastvalue3
FROM t
GROUP BY device_id;
Key Points:
- Use the last_not_null() or last_not_null_value() function.
- last_not_null() returns both the value and its timestamp as a string formatted [<value>, <time>], while last_not_null_value() returns only the value.

Calculate differences between consecutive values to detect changes, for example when monitoring stock behavior:
=# SELECT
device_id,
timestamp,
value,
lag(value) OVER(PARTITION BY device_id ORDER BY timestamp),
lead(value) OVER(PARTITION BY device_id ORDER BY timestamp),
lead(value) OVER(PARTITION BY device_id ORDER BY timestamp) - value AS lead_value
FROM t;
Key Points:
- lag() retrieves the previous row's value.
- lead() retrieves the next row's value.
- The PARTITION BY device_id clause applies each window function within its device partition.
- lead(value) - value computes the difference between the next and current value, aliased lead_value.

Fill gaps in the raw data over fixed time buckets and interpolate the missing values:

=# SELECT time_bucket_gapfill('45 minutes', timestamp) AS five_min,
locf(AVG(value)) AS locf_value,
interpolate(AVG(value)) AS interpolate_value
FROM t
WHERE device_id = '1'
AND timestamp >= '2021-12-01 00:00:00'
AND timestamp < '2021-12-02 06:00:00'
GROUP BY five_min
ORDER BY five_min;
Key Points:
- Use time_bucket_gapfill() along with LOCF (Last Observation Carried Forward) and linear interpolation to fill missing values in table t.
- time_bucket_gapfill('45 minutes', timestamp): Groups timestamps into fixed 45-minute buckets and fills gaps to ensure continuity.
- locf(AVG(value)): Applies the LOCF method to fill each bucket with the last observed non-null value.
- interpolate(AVG(value)): Uses linear interpolation to estimate the value from surrounding non-null values.

Detect anomalies using shifts in cumulative average or median:
-- Cumulative sum
=# SELECT device_id, timestamp, value,
SUM(value) OVER(PARTITION BY device_id, timestamp::date ORDER BY timestamp) AS accumulation_value
FROM t;
-- Cumulative average
=# SELECT device_id, timestamp, value,
AVG(value) OVER(PARTITION BY device_id, timestamp::date ORDER BY timestamp) AS accumulation_value
FROM t;
-- Median
=# SELECT device_id,
percentile_cont(0.5) WITHIN GROUP (ORDER BY value) AS accumulation_value
FROM t
GROUP BY device_id
ORDER BY device_id;
Compute increments for monotonic sequences or simple changes between consecutive records, useful for anomaly detection.
Example 1
Compute the change in disk reads for the disk with tag_id = 1 between 2021-04-10 21:00:00 and 21:01:00:
=# SELECT
time,
(
CASE WHEN lag(read) OVER (ORDER BY time) IS NULL THEN NULL
ELSE round(read - lag(read) OVER (ORDER BY time))
END
) AS read
FROM disk
WHERE time BETWEEN '2021-04-10 21:00:00'::timestamp AND '2021-04-10 21:01:00'::timestamp
AND tag_id = 1
ORDER BY time;
Key Points:
- lag() retrieves the prior value of read; the CASE expression returns NULL for the first row, which has no predecessor.

Example 2
Calculates daily sales increment using the sales table:
=# SELECT
date,
sales_amount,
sales_amount - LAG(sales_amount) OVER (ORDER BY date) AS sales_increment
FROM
sales
ORDER BY
date;
Key Points:

- LAG(sales_amount) OVER (ORDER BY date) retrieves the previous day's sales; subtracting it from the current sales_amount yields the daily increment sales_increment.
Example 1
Divide the increment by time interval to get rate of change:
=# SELECT
time,
(
CASE WHEN lag(read) OVER (ORDER BY time) IS NULL THEN NULL
ELSE round(read - lag(read) OVER (ORDER BY time))
END
) / extract(epoch from time - lag(time) OVER (ORDER BY time)) AS read_rate,
extract(epoch from time - lag(time) OVER (ORDER BY time)) AS "time lag"
FROM disk
WHERE time BETWEEN '2021-04-10 21:00:00'::timestamp AND '2021-04-10 21:01:00'::timestamp
AND tag_id = 1
ORDER BY time;
Key Points:
- Increment divided by time interval equals the rate of change; the interval in seconds is obtained with extract(epoch from ...) on the timestamp difference.

Identify jumps in metric values, returning timestamps, device IDs, and pre/post-jump values, useful for auditing stock trades or detecting sudden state changes:
=# SELECT stock_id, timestamp, value, lag_value, lead_value, lag_diff_value, lead_diff_value FROM (
SELECT stock_id, timestamp, value,
lag(value) OVER (PARTITION BY stock_id ORDER BY timestamp) AS lag_value, -- Previous value in value column
lead(value) OVER (PARTITION BY stock_id ORDER BY timestamp) AS lead_value, -- Next value in value column
value - lag(value) OVER (PARTITION BY stock_id ORDER BY timestamp) AS lag_diff_value,
value - lead(value) OVER (PARTITION BY stock_id ORDER BY timestamp) AS lead_diff_value
FROM t
WHERE 1=1
AND stock_id in ('1')
AND timestamp >= '2021-12-01 00:00:00'
AND timestamp < '2021-12-02 23:59:59'::timestamp
AND value IS NOT NULL
) ht
WHERE abs(lag_diff_value) > 3 OR abs(lead_diff_value) > 3
ORDER BY stock_id, timestamp;
Key Points:
- lag() and lead() fetch the previous and next values within each stock_id partition, ordered by timestamp.
- lag_value: Previous value of value.
- lead_value: Next value of value.
- lag_diff_value: Difference between the current value and the previous value.
- lead_diff_value: Difference between the current value and the next value.

Note!
Jump detection identifies abrupt changes in a metric exceeding a threshold between adjacent timestamps for a given device within a time window.
Use percentile functions to compute real-time dashboard metrics — for example, using the 90th percentile to reflect stock trends:
=# SELECT stock_id,
percentile_cont(0.9) WITHIN GROUP (ORDER BY value) AS value_9
FROM t
GROUP BY stock_id
ORDER BY stock_id;
Use CDF to compute the cumulative distribution of prices or metrics over time. This helps assess relative positioning and define price ranges and statistical properties — ideal for displaying distribution statistics on real-time dashboards:
=# SELECT stock_id, timestamp, price,
cume_dist() OVER(PARTITION BY stock_id, timestamp::date ORDER BY timestamp) AS cume_dist
FROM t;
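For intuition, cume_dist() can be tried on a small inline data set (standard SQL, independent of table t):

=# SELECT v, cume_dist() OVER (ORDER BY v) AS cd
FROM (VALUES (10), (20), (20), (30)) AS s(v);
-- v=10 -> 0.25; v=20 -> 0.75 for both peer rows; v=30 -> 1.0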
Key Points:
- The cume_dist() function computes the cumulative distribution: cumulative distribution value = (number of partition rows preceding or peer with the current row) / (total number of rows in the partition).

In time-series scenarios, pivot operations (row-to-column or column-to-row transformations) support flexible ad hoc analysis.
=# SELECT name,
max(CASE WHEN attribute='age' THEN value ELSE 0 END) AS age,
max(CASE WHEN attribute='height' THEN value ELSE 0 END) AS height,
max(CASE WHEN attribute='weight' THEN value ELSE 0 END) AS weight,
max(CASE WHEN attribute='shoe_size' THEN value ELSE 0 END) AS shoe_size
FROM t
GROUP BY name
ORDER BY age DESC;
Sample structure of table t:
+------+-----------+-------+
| name | attribute | value |
+------+-----------+-------+
| John | age       | 30    |
| John | height    | 175   |
| John | weight    | 70    |
| John | shoe_size | 9.5   |
| Mary | age       | 25    |
| Mary | height    | 160   |
| Mary | weight    | 55    |
| Mary | shoe_size | 8     |
+------+-----------+-------+
Row-to-column output example:
+------+-----+--------+--------+-----------+
| name | age | height | weight | shoe_size |
+------+-----+--------+--------+-----------+
| John | 30  | 175    | 70     | 9.5       |
| Mary | 25  | 160    | 55     | 8         |
+------+-----+--------+--------+-----------+
Key Points:
- CASE WHEN expressions map each attribute to its own column, and aggregate functions (max) extract the corresponding values.

Convert columns back into rows (column-to-row):

=# SELECT currenttimestamp,
deviceid,
devicetemplatecode,
statisticstype,
(b.rec).key AS key,
(b.rec).value AS value
FROM
(SELECT currenttimestamp,
deviceid,
devicetemplatecode,
statisticstype,
jsonb_each_text(row_to_json(t.*)::jsonb-'currenttimestamp'-'deviceid'-'devicetemplatecode'-'statisticstype') AS rec
FROM t
) b
WHERE (b.rec).value IS NOT NULL;
Sample structure of table t:
+---------------------+----------+--------------------+----------------+--------+--------+--------+
| currenttimestamp    | deviceid | devicetemplatecode | statisticstype | key1   | key2   | key3   |
+---------------------+----------+--------------------+----------------+--------+--------+--------+
| 2023-11-13 08:30:45 | 1234567  | template1          | type1          | value1 | value2 | value3 |
+---------------------+----------+--------------------+----------------+--------+--------+--------+
Column-to-row output example:
+---------------------+----------+--------------------+----------------+------+--------+
| currenttimestamp    | deviceid | devicetemplatecode | statisticstype | key  | value  |
+---------------------+----------+--------------------+----------------+------+--------+
| 2023-11-13 08:30:45 | 1234567  | template1          | type1          | key1 | value1 |
| 2023-11-13 08:30:45 | 1234567  | template1          | type1          | key2 | value2 |
| 2023-11-13 08:30:45 | 1234567  | template1          | type1          | key3 | value3 |
+---------------------+----------+--------------------+----------------+------+--------+
Key Points:
Use row_to_json() to convert each row into a JSON object, then jsonb_each_text() to expand it into key-value pairs, achieving the column-to-row transformation.
3.7 Real-time Metric Labeling
Display the status of a machine, account, or individual:
=# SELECT first_name,
last_name,
salary,
CASE
WHEN salary >= 80000 THEN 'senior'
WHEN salary >= 60000 THEN 'intermediate'
ELSE 'junior'
END AS employee_level
FROM employees;
Key Points:
- Use a CASE expression to derive the label:
  - If salary is greater than or equal to 80000, employee_level is labeled 'senior'.
  - If salary is greater than or equal to 60000 and less than 80000, employee_level is labeled 'intermediate'.
  - If salary is less than 60000, employee_level is labeled 'junior'.

Identify the top 10 peaks for a specified metric of a given stock within a time range, along with their timestamps:
=# SELECT stock_id, timestamp, value FROM (
SELECT stock_id, timestamp, value,
row_number() OVER (PARTITION BY stock_id ORDER BY value DESC) AS rn
FROM t_test
WHERE stock_id ='1'
AND timestamp >= '2021-12-01 00:00:00'
AND timestamp < '2021-12-03 00:00:00'
) AS derived_table
WHERE rn <= 10;
Key Points:
- Use row_number() to rank the rows of table t_test by value in descending order within each stock_id partition; the outer query keeps the top 10 rows (rn <= 10).

Used for data provenance tracing and traversal of metrics in JSON-type fields:
=# WITH RECURSIVE json_recursive AS (
SELECT id,
'person' AS key,
data->'person' AS value
FROM my_table
UNION ALL
SELECT j.id,
k AS key,
v AS value
FROM json_recursive j,
jsonb_each(j.value) AS kv(k, v)
WHERE jsonb_typeof(j.value) = 'object'
)
SELECT id, key, value
FROM json_recursive;
Key Points:
- The query uses a recursive CTE (WITH RECURSIVE) with two subqueries.
- The non-recursive term selects id, a fixed key 'person', and the JSON value data->'person' from table my_table as the initial result set.
- The recursive term joins the json_recursive CTE with the result of the jsonb_each() function: in each iteration it decomposes the current JSON object (value) into key-value pairs, and the resulting key (k) and value (v) feed the next iteration.
- The WHERE condition jsonb_typeof(j.value) = 'object' ensures that only records whose JSON value is an object are expanded, providing a termination condition for the recursion.
- The final SELECT returns id, key, and value from the json_recursive CTE to provide complete provenance information.

Machine learning can be used for prediction and classification tasks.
For example, use data from December 1 to December 10 to build a linear regression model and predict the metric value on December 11:
=# SELECT '2021-12-11 00:00:01' AS timestamp,
extract(epoch FROM '2021-12-11 00:00:01'::timestamp) * slope + intercept AS c1_value
FROM (
SELECT regr_slope(value, extract(epoch from timestamp)) AS slope,
regr_intercept(value, extract(epoch from timestamp)) AS intercept
FROM t_test
WHERE vin = '1'
AND timestamp >= '2021-12-01 00:00:00'
AND timestamp < '2021-12-11 00:00:00'
) AS linear_regression;
Key Points:
- Use regr_slope() and regr_intercept() to perform linear regression on table t_test.
- The subquery filters rows where vin equals '1' and timestamp is between '2021-12-01 00:00:00' (inclusive) and '2021-12-11 00:00:00' (exclusive), then calculates the slope and intercept between the value column and the Unix epoch time extracted from timestamp.
- The outer query sets timestamp to '2021-12-11 00:00:01' and computes the predicted c1_value using the linear equation: slope * epoch_time + intercept.