Typical Queries for IoT Time-Series Scenarios

This document provides examples of typical queries in time-series scenarios. Since most real-world query patterns are composite in nature, the classification here is not strictly exclusive.

Note!
Understanding the query requirements of a scenario is one of the key steps in building an accurate data model.


1 Single-Device Basic Queries

1.1 Single-Device Point Query

Query the value of one or more metrics for a single device at a specific point in time. The most common case is retrieving the latest or last non-null value of certain metrics — for example, the current speed of a moving vehicle:

=# SELECT timestamp, car_id, value FROM car
    WHERE car_id = '2' 
    ORDER BY timestamp DESC
    LIMIT 1;
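
If the metric column can contain NULLs, the last non-null reading can be fetched without any extension by adding a NULL filter; a minimal sketch against the same car table (the time-series functions in section 3.1.4 offer a more direct alternative):

=# SELECT timestamp, car_id, value FROM car
    WHERE car_id = '2'
    AND value IS NOT NULL
    ORDER BY timestamp DESC
    LIMIT 1;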

1.2 Single-Device Detailed Query

Retrieve detailed data for a single device over a specified time interval. This may involve one or multiple metrics and is typically used for in-depth analysis, such as root cause investigation during after-sales service:

=# SELECT * FROM analysis
    WHERE device_id = '2' 
    AND timestamp >= '2023-10-08 00:00:00' 
    AND timestamp <= '2023-10-08 09:00:00';

1.3 Single-Device Aggregation Query

Compute aggregated values (e.g., count, sum, average, max, min) of certain metrics for a single device within a time range:

=# SELECT count(1), sum(c1), avg(c1), max(c2), min(c2) FROM metrics
    WHERE device_id = '1'
    AND timestamp >= '2023-10-08 00:00:00' 
    AND timestamp <= '2023-10-08 09:00:00';


2 Multi-Device Basic Queries

2.1 Multi-Device Point Query

Query metric values across multiple devices at a given time point. This may include one or more metrics, such as combined signal checks:

=# SELECT timestamp, device_id, value FROM metrics
    WHERE device_id IN ('2', '3')
    AND timestamp = '2023-10-08 00:00:00'
    ORDER BY device_id, timestamp DESC;

Tip
The query returns rows satisfying the conditions, sorted first in ascending order by device_id; rows sharing the same device_id are then sorted in descending order by timestamp. To sort both columns in descending order, use ORDER BY device_id DESC, timestamp DESC, as shown below.
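
For reference, the fully descending variant of the query above would read:

=# SELECT timestamp, device_id, value FROM metrics
    WHERE device_id IN ('2', '3')
    AND timestamp = '2023-10-08 00:00:00'
    ORDER BY device_id DESC, timestamp DESC;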

2.2 Multi-Device Detailed Query

Fetch detailed data from multiple devices over a time window. This type of query supports use cases like anomaly detection, root cause analysis, or new product tracking, and is commonly used by R&D teams:

=# SELECT d.device_id, d.timestamp, d.value, s.device_name, l.location_name
FROM device_data AS d
JOIN device_status AS s ON d.device_id = s.device_id
JOIN device_location AS l ON d.device_id = l.device_id
WHERE d.device_id IN ('device1', 'device2', 'device3')
  AND d.timestamp >= '2023-11-01 00:00:00'
  AND d.timestamp <= '2023-11-10 23:59:59'
ORDER BY d.device_id, d.timestamp;

This is an example of root cause analysis, assuming the following tables exist:

  • device_data: Table containing device data with columns device_id, timestamp, value
  • device_status: Table storing device status with columns device_id, device_name
  • device_location: Table storing device location information with columns device_id, location_name

Key Points:

  • The query joins all three tables to retrieve device data, status, and location.
    • The JOIN clause links related tables using the device ID.
    • The WHERE clause filters data for specific device IDs and time range.

2.3 Multi-Device Aggregation Query

Compute aggregated metrics (e.g., max, min, sum, average) across multiple devices within a time interval, grouped by device or other attributes — for instance, checking network conditions per region:

=# SELECT l.location_name, COUNT(d.device_id) AS total_devices, AVG(d.value) AS average_value
FROM device_data AS d
JOIN device_location AS l ON d.device_id = l.device_id
WHERE l.location_name IN ('location1', 'location2', 'location3')
  AND d.timestamp >= '2023-11-01 00:00:00'
  AND d.timestamp <= '2023-11-10 23:59:59'
GROUP BY l.location_name;

This example assumes two tables:

  • device_data: Contains device data with columns device_id, timestamp, value
  • device_location: Stores device location with columns device_id, location_name

Key Points:

  • The query joins the two tables to get device data and location.
    • The JOIN clause connects tables via device ID.
    • The WHERE clause filters data by region and time.
    • The GROUP BY clause groups results by location.
    • The COUNT and AVG aggregate functions compute device count and average value.

3 Advanced Queries


3.1 Spatio-Temporal Queries

Queries involving temporal functions such as first() and last(), sometimes combined with spatial data — for example, tracking start and end fuel levels.

To use YMatrix's time-series functions, create the extension first:

=# CREATE EXTENSION matrixts;

The extension is database-level; create it once per database.
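
To check whether the extension is already installed in the current database, the standard pg_extension catalog can be queried:

=# SELECT extname FROM pg_extension WHERE extname = 'matrixts';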


3.1.1 Query Earliest Value

Returns the value from the first record in the specified column:

=# SELECT id, first(c1, ts) AS c
    FROM t
    GROUP BY id
    ORDER BY id;


3.1.2 Query Latest Value

Returns the value from the last record in the specified column:

=# SELECT device_id, last(value, timestamp) AS last_value
    FROM t
    GROUP BY device_id
    ORDER BY device_id;


3.1.3 Compute Average Over Time Interval

=# SELECT time_bucket('5 s', timestamp) AS five_second, count(*), avg(value) FROM t
    GROUP BY five_second;

The above SQL groups table t into 5-second buckets and computes the row count and average value within each bucket.

Key Point:

  • Using GROUP BY to group raw data into wider time intervals and compute summary statistics is known as downsampling. Downsampling reduces storage cost while preserving key data characteristics, enabling historical trend analysis and forecasting; a minimal sketch follows below.
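
As an illustration of downsampling, here is a minimal sketch that rolls raw data up into 1-hour buckets per device, assuming table t has columns device_id, timestamp, and value:

=# SELECT device_id,
          time_bucket('1 hour', timestamp) AS one_hour,
          avg(value) AS avg_value,
          max(value) AS max_value
    FROM t
    GROUP BY device_id, one_hour
    ORDER BY device_id, one_hour;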


3.1.4 Last Non-Null Value

=# SELECT 
    device_id,
    max(timestamp) as max_timestamp,
    last_not_null_value(value1, timestamp) as lastvalue1,
    last_not_null_value(value2, timestamp) as lastvalue2,
    last_not_null_value(value3, timestamp) as lastvalue3
   FROM t
     GROUP BY device_id;

Or:

=# SELECT 
    device_id,
    max(timestamp) as max_timestamp,
    last_not_null(value1, timestamp) as lastvalue1,
    last_not_null(value2, timestamp) as lastvalue2,
    last_not_null(value3, timestamp) as lastvalue3
   FROM t
     GROUP BY device_id;

Key Points:

  • Use the last_not_null() or last_not_null_value() function.
  • Compared to last_not_null(), last_not_null_value() returns both value and timestamp. The return type is string, formatted as [<value>, <time>].


3.1.5 Difference Calculation

Calculate differences between consecutive values to detect changes — for example, monitoring stock behavior:

=# SELECT
    device_id,
    timestamp, 
    value,
    lag(value) OVER(PARTITION BY device_id ORDER BY timestamp),
    lead(value) OVER(PARTITION BY device_id ORDER BY timestamp),
    lead(value) OVER(PARTITION BY device_id ORDER BY timestamp) - value AS lead_value 
FROM t;

Key Points:

  • Window function lag() retrieves the previous row’s value.
  • Window function lead() retrieves the next row’s value.
  • The PARTITION BY clause with device_id applies the window function within each device partition.
  • lead(value) - value computes the difference between the next and current value, named lead_value.


3.2 Anomaly Data Cleaning

=# SELECT time_bucket_gapfill('45 minutes', timestamp) AS bucket_45min,
          locf(AVG(value)) AS locf_value,
          interpolate(AVG(value)) AS interpolate_value
   FROM t
       WHERE device_id = '1'
       AND timestamp >= '2021-12-01 00:00:00'
       AND timestamp < '2021-12-02 06:00:00'
   GROUP BY bucket_45min
   ORDER BY bucket_45min;

Key Points:

  • The query uses time_bucket_gapfill() along with LOCF (Last Observation Carried Forward) and linear interpolation to fill missing values in table t.
    • time_bucket_gapfill('45 minutes', timestamp): Groups timestamps into fixed 45-minute buckets and fills gaps to ensure continuity.
    • locf(AVG(value)): Applies the LOCF method, filling each empty bucket with the last observed non-null value.
    • interpolate(AVG(value)): Uses linear interpolation to estimate values from the surrounding non-null buckets (a variant that also masks out-of-range readings is sketched below).
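
To clean anomalous readings rather than only missing ones, one option is to null out out-of-range values first so that locf() treats them as gaps. A minimal sketch, assuming (hypothetically) that valid readings lie in [0, 100]:

=# SELECT time_bucket_gapfill('45 minutes', timestamp) AS bucket_45min,
          locf(AVG(CASE WHEN value BETWEEN 0 AND 100 THEN value END)) AS locf_value
   FROM t
       WHERE device_id = '1'
       AND timestamp >= '2021-12-01 00:00:00'
       AND timestamp < '2021-12-02 06:00:00'
   GROUP BY bucket_45min
   ORDER BY bucket_45min;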


3.3 Anomaly Detection

3.3.1 Cumulative Sum, Average, and Median

Detect anomalies using shifts in cumulative average or median:

-- Cumulative sum
=# SELECT device_id, timestamp, value,
          SUM(value) OVER(PARTITION BY device_id, timestamp::date ORDER BY timestamp) AS accumulation_value
   FROM t;

-- Cumulative average
=# SELECT device_id, timestamp, value,
       AVG(value) OVER(PARTITION BY device_id, timestamp::date ORDER BY timestamp) AS accumulation_value
   FROM t;

-- Median
=# SELECT device_id,
       percentile_cont(0.5) WITHIN GROUP (ORDER BY value) AS median_value
   FROM t
   GROUP BY device_id
   ORDER BY device_id;
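
To turn these running statistics into an actual anomaly flag, a sketch can compare each reading against its running average and keep only large deviations; the 20% threshold below is an arbitrary assumption:

=# SELECT device_id, timestamp, value, running_avg
   FROM (
    SELECT device_id, timestamp, value,
           AVG(value) OVER(PARTITION BY device_id, timestamp::date ORDER BY timestamp) AS running_avg
    FROM t
   ) s
   WHERE abs(value - running_avg) > 0.2 * abs(running_avg);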

3.3.2 Increment Calculation

Compute increments for monotonic sequences or simple changes between consecutive records, useful for anomaly detection.

Example 1

Computes the change in disk reads for the disk with tag_id = 1 between 2021-04-10 21:00:00 and 21:01:00:

=# SELECT
      time,
      (
       CASE WHEN lag(read) OVER (ORDER BY time) IS NULL THEN NULL
       ELSE round(read - lag(read) OVER (ORDER BY time))
       END
      ) AS read
   FROM disk
      WHERE time BETWEEN '2021-04-10 21:00:00'::timestamp AND '2021-04-10 21:01:00'::timestamp
      AND tag_id = 1
      ORDER BY time;

Key Points:

  • Uses window function lag() to get prior value.
  • Positive values indicate an increase and negative values a decrease relative to the previous reading.

Example 2

Calculates daily sales increment using the sales table:

=# SELECT
      date,
      sales_amount,
      sales_amount - LAG(sales_amount) OVER (ORDER BY date) AS sales_increment
   FROM
      sales
   ORDER BY
      date;

Key Points:

  • Computes daily sales increment by subtracting the previous day’s sales from the current day’s.

3.3.3 Rate-of-Change Calculation

Example 1

Divide the increment by time interval to get rate of change:

=# SELECT
    time,
    (
     CASE WHEN lag(read) OVER (ORDER BY time) IS NULL THEN NULL
     ELSE round(read - lag(read) OVER (ORDER BY time))
     END
    ) / extract(epoch from time - lag(time) OVER (ORDER BY time)) AS read_rate,
    extract(epoch from time - lag(time) OVER (ORDER BY time)) AS "time lag"
    FROM disk
    WHERE time BETWEEN '2021-04-10 21:00:00'::timestamp AND '2021-04-10 21:01:00'::timestamp
    AND tag_id = 1
    ORDER BY time;

Key Points:

  • Rate of change = increment / time interval. For example, if read grows by 600 over a 60-second interval, the rate is 10 per second.


3.4 Metric Jump Detection

Identify jumps in metric values, returning timestamps, device IDs, and pre/post-jump values — useful for auditing stock trades or detecting sudden state changes:

=# SELECT stock_id, timestamp, value, lag_value, lead_value, lag_diff_value, lead_diff_value FROM (
    SELECT stock_id, timestamp, value,
        lag(value) OVER (PARTITION BY stock_id ORDER BY timestamp) AS lag_value,  -- Previous value in value column
        lead(value) OVER (PARTITION BY stock_id ORDER BY timestamp) AS lead_value,  -- Next value in value column
        value - lag(value) OVER (PARTITION BY stock_id ORDER BY timestamp) AS lag_diff_value,
        value - lead(value) OVER (PARTITION BY stock_id ORDER BY timestamp) AS lead_diff_value
    FROM t
       WHERE 1=1
       AND stock_id in ('1')
       AND timestamp >= '2021-12-01 00:00:00'
       AND timestamp < '2021-12-02 23:59:59'::timestamp
       AND value IS NOT NULL
   ) ht 
   WHERE abs(lag_diff_value) > 3 OR abs(lead_diff_value) > 3
   ORDER BY stock_id, timestamp;

Key Points:

  • Subquery uses window functions lag() and lead() to fetch previous and next values within each stock_id partition ordered by timestamp.
    • lag_value: Previous value of value.
    • lead_value: Next value of value.
    • lag_diff_value: Difference between current value and previous value.
    • lead_diff_value: Difference between current value and next value.
  • Outer query filters rows where absolute difference exceeds 3.

Note!
Jump detection identifies abrupt changes in a metric exceeding a threshold between adjacent timestamps for a given device within a time window.


3.5 Monitoring Dashboard

3.5.1 Percentile Functions

Use percentile functions to compute real-time dashboard metrics — for example, using the 90th percentile to reflect stock trends:

=# SELECT stock_id,
          percentile_cont(0.9) WITHIN GROUP (ORDER BY value) AS value_9
    FROM t
    GROUP BY stock_id
    ORDER BY stock_id;
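
Several percentiles can also be computed in a single pass, which is convenient when a dashboard shows, say, p50/p90/p99 side by side; a sketch over the same table:

=# SELECT stock_id,
          percentile_cont(0.5)  WITHIN GROUP (ORDER BY value) AS value_5,
          percentile_cont(0.9)  WITHIN GROUP (ORDER BY value) AS value_9,
          percentile_cont(0.99) WITHIN GROUP (ORDER BY value) AS value_99
    FROM t
    GROUP BY stock_id
    ORDER BY stock_id;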

3.5.2 Cumulative Distribution Function (CDF)

Use CDF to compute the cumulative distribution of prices or metrics over time. This helps assess relative positioning and define price ranges and statistical properties — ideal for displaying distribution statistics on real-time dashboards:

=# SELECT stock_id, timestamp, price, 
          cume_dist() OVER(PARTITION BY stock_id, timestamp::date ORDER BY timestamp) AS cume_dist
   FROM t;

Key Points:

  • The cume_dist() function computes the cumulative distribution: cumulative distribution value = (number of rows in the partition before or peer with the current row) / (total number of rows in the partition).


3.6 Ad Hoc Analysis

In time-series scenarios, pivot operations (row-to-column or column-to-row transformations) support flexible ad hoc analysis.

3.6.1 Row-to-Column Transformation

=# SELECT name,
          max(CASE WHEN attribute='age' THEN value ELSE 0 END) AS age,
          max(CASE WHEN attribute='height' THEN value ELSE 0 END) AS height,
          max(CASE WHEN attribute='weight' THEN value ELSE 0 END) AS weight,
          max(CASE WHEN attribute='shoe_size' THEN value ELSE 0 END) AS shoe_size
   FROM t
   GROUP BY name
   ORDER BY age DESC;

Sample structure of table t:

+------+-----------+-------+
| name | attribute | value |
+------+-----------+-------+
| John | age       | 30    |
| John | height    | 175   |
| John | weight    | 70    |
| John | shoe_size | 9.5   |
| Mary | age       | 25    |
| Mary | height    | 160   |
| Mary | weight    | 55    |
| Mary | shoe_size | 8     |
+------+-----------+-------+

Row-to-column output example:

+------+-----+--------+--------+-----------+
| name | age | height | weight | shoe_size |
+------+-----+--------+--------+-----------+
| John | 30  | 175    | 70     | 9.5       |
| Mary | 25  | 160    | 55     | 8         |
+------+-----+--------+--------+-----------+

Key Points:

  • This transformation is useful when comparing or presenting row-based attributes as columns. Conditional expressions convert attributes into columns, and aggregate functions (e.g., max) extract their values. An equivalent formulation using the FILTER clause is sketched below.
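
As a variant, the same pivot can be written with the SQL-standard FILTER clause (supported in PostgreSQL 9.4 and later); a minimal sketch:

=# SELECT name,
          max(value) FILTER (WHERE attribute='age') AS age,
          max(value) FILTER (WHERE attribute='height') AS height,
          max(value) FILTER (WHERE attribute='weight') AS weight,
          max(value) FILTER (WHERE attribute='shoe_size') AS shoe_size
    FROM t
    GROUP BY name
    ORDER BY age DESC;

Note that this form yields NULL instead of 0 for attributes a given name lacks, which is often the preferable behavior.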

3.6.2 Column-to-Row Transformation

=# SELECT currenttimestamp, 
          deviceid, 
          devicetemplatecode,
          statisticstype,
          (b.rec).key AS key, 
          (b.rec).value AS value 
   FROM
     (SELECT currenttimestamp, 
             deviceid, 
             devicetemplatecode,
             statisticstype,
             jsonb_each_text(row_to_json(t.*)::jsonb-'currenttimestamp'-'deviceid'-'devicetemplatecode'-'statisticstype') AS rec  
     FROM t
   ) b
   WHERE (b.rec).value IS NOT NULL;

Sample structure of table t:

+---------------------+----------+-------------------+----------------+--------+--------+--------+
| currenttimestamp    | deviceid | devicetemplatecode| statisticstype | key1   | key2   | key3   |
+---------------------+----------+-------------------+----------------+--------+--------+--------+
| 2023-11-13 08:30:45 | 1234567  | template1         | type1          | value1 | value2 | value3 |
+---------------------+----------+-------------------+----------------+--------+--------+--------+

Column-to-row output example:

+---------------------+----------+-------------------+----------------+------+--------+
| currenttimestamp    | deviceid | devicetemplatecode| statisticstype | key  | value  |
+---------------------+----------+-------------------+----------------+------+--------+
| 2023-11-13 08:30:45 | 1234567  | template1         | type1          | key1 | value1 |
| 2023-11-13 08:30:45 | 1234567  | template1         | type1          | key2 | value2 |
| 2023-11-13 08:30:45 | 1234567  | template1         | type1          | key3 | value3 |
+---------------------+----------+-------------------+----------------+------+--------+

Key Points:

  • Uses row_to_json() to convert each row into a JSON object, then jsonb_each_text() to expand it into key-value pairs, achieving column-to-row transformation.


3.7 Real-time Metric Labeling

Display the status of a machine, account, or individual:

=# SELECT first_name,
          last_name,
          salary,
    CASE
     WHEN salary >= 80000 THEN 'senior'
     WHEN salary >= 60000 THEN 'intermediate'
    ELSE 'junior'
    END AS employee_level
   FROM employees;

Key Points:

  • Use a CASE expression to derive the label column:
    • If salary is greater than or equal to 80000, the employee_level column will be labeled 'senior'.
    • If salary is greater than or equal to 60000 and less than 80000, the employee_level column will be labeled 'intermediate'.
    • Otherwise, when salary is less than 60000, the employee_level column will be labeled 'junior'.


3.8 Peak Detection

Identify the top 10 peaks for a specified metric of a given stock within a time range, along with their timestamps:

=# SELECT  stock_id, timestamp, value FROM (
    SELECT stock_id, timestamp, value,
           row_number() OVER (PARTITION BY stock_id ORDER BY value DESC) AS rn
    FROM t_test
      WHERE stock_id ='1'
      AND timestamp >= '2021-12-01 00:00:00'
      AND timestamp < '2021-12-03 00:00:00'
   ) AS derived_table
   WHERE rn <= 10;

Key Points:

  • The query uses a subquery and the window function row_number() to filter and sort data from table t_test.
  • It returns the top 10 rows with the highest value for each stock_id.


3.9 Data Provenance Query

Used for data provenance tracing and traversal of metrics in JSON-type fields:

=# WITH RECURSIVE json_recursive AS (
    SELECT id,
           'person' AS key,
           data->'person' AS value
    FROM my_table
    UNION ALL
    SELECT j.id,
           k AS key,
           v AS value
    FROM json_recursive j,
         jsonb_each(j.value) AS kv(k, v)
    WHERE jsonb_typeof(j.value) = 'object'
   )
   SELECT id, key, value
   FROM json_recursive;

Key Points:

  • The query uses a recursive CTE (WITH RECURSIVE) and two subqueries.
    • The anchor member (first subquery) selects the id, a fixed key 'person', and the JSON value data->'person' from table my_table as the initial result set.
    • The recursive member (second subquery) iteratively joins the json_recursive CTE with the result of the jsonb_each() function. In each iteration, it decomposes the current JSON object (value) into key-value pairs using jsonb_each(). The resulting key (k) and value (v) become part of the next iteration.
    • The WHERE condition jsonb_typeof(j.value) = 'object' ensures that only records where the JSON value is an object are processed, providing a termination condition for the recursion.
  • The final output selects id, key, and value from the json_recursive CTE to provide complete provenance information.


3.10 Machine Learning

Machine learning can be used for prediction and classification tasks.

For example, use data from December 1 to December 10 to build a linear regression model and predict the metric value on December 11:

=# SELECT '2021-12-11 00:00:01' AS timestamp,
    extract(epoch FROM '2021-12-11 00:00:01'::timestamp) * slope + intercept AS c1_value
   FROM (
    SELECT regr_slope(value, extract(epoch from timestamp)) AS slope,
         regr_intercept(value, extract(epoch from timestamp)) AS intercept
    FROM t_test
    WHERE vin = '1'
        AND timestamp >= '2021-12-01 00:00:00'
        AND timestamp < '2021-12-11 00:00:00'
   ) AS linear_regression;

Key Points:

  • This query uses a subquery and the linear regression functions regr_slope() and regr_intercept() to fit a linear model over table t_test.
  • These functions compute the slope and intercept of a linear model, enabling basic machine learning tasks directly in SQL.
    • The inner subquery filters records where vin equals '1', and timestamp is between '2021-12-01 00:00:00' (inclusive) and '2021-12-11 00:00:00' (exclusive). It calculates the slope and intercept between the value column and the Unix epoch time extracted from timestamp.
    • The outer query sets the timestamp to '2021-12-11 00:00:01', then computes the predicted c1_value using the linear equation: slope * epoch_time + intercept.