This document describes the functions and expressions provided by YMatrix for querying data.
In addition to basic query statements, YMatrix provides many advanced analytical functions and SQL expressions.
Window functions support complex reporting and analytical operations, commonly used in OLAP and time-series scenarios.
A "window" defines a set of rows related to the current row. The key difference between window functions and other functions is the presence of the OVER clause. If a function includes an OVER clause, it becomes a window function. This clause specifies the window over which the function operates.
The following table highlights the differences between aggregate window functions and specialized window functions:
| Aggregate Window Functions | Specialized Window Functions | |
|---|---|---|
| Output | One row per input row | Multiple rows |
| Functions | max(), min(), count(), sum(), avg(), sum(), etc. |
In addition to standard aggregates, specialized functions such as row_number(), rank(), dense_rank(), first_value(), last_value(), etc. |
| Usage | Typically used with GROUP BY; can also act as window functions when combined with OVER |
Used with OVER. Usually, the OVER clause follows immediately after the function name and arguments. Within OVER, you may use PARTITION BY, ORDER BY, or ROWS BETWEEN. Using only PARTITION BY, or combining PARTITION BY and ORDER BY, creates a static window (fixed size and position). When PARTITION BY is used along with ROWS BETWEEN, or both ROWS BETWEEN and ORDER BY, a sliding window is formed—applying a fixed-size window across each input row sequentially. |
Like aggregate functions, window functions appear in the SELECT clause of a table expression:
=# SELECT
<function_name(arguments)> OVER (
PARTITION BY …
ORDER BY …
)
FROM …;
The OVER clause defines the window scope:
PARTITION BY list divides rows into partitions; each partition acts as a separate window. If omitted, all rows belong to one partition.ORDER BY clause determines the ordering of rows within a partition.YMatrix supports the following specialized window functions:
| Function | Description |
|---|---|
| row_number() | Assigns a unique sequential integer to each row within a result set partition, starting from 1 |
| rank() | Assigns a rank value to each row in a partition. Tied values receive the same rank, and subsequent ranks are incremented accordingly. Ranks may not be consecutive when ties exist |
| dense_rank() | Assigns a rank value to each row in a partition. Tied values receive the same rank, but the next rank continues without gaps. Thus, ranks remain consecutive even with ties |
| first_value( |
Returns the first value in the ordered partition |
| last_value( |
Returns the last value in the ordered partition |
| ntile( |
Divides the ordered partition into n buckets (groups) of equal size |
| cume_dist() | Computes the proportion of rows in the current partition with values greater than or equal to the current row (including itself) |
| percent_rank() | Computes the relative rank of a row: the proportion of rows in the partition (excluding the current row) with values higher than the current row |
| lag( |
Accesses the value of a column from a specified number of rows before the current row |
| lead( |
Accesses the value of a column from a specified number of rows after the current row |
Syntax:
=# SELECT …,
ntile(<n>) OVER (ORDER BY …),
cume_dist() OVER (ORDER BY …),
percent_rank() OVER (ORDER BY …)
FROM …;
Note!
The cume_dist() and percent_rank() functions differ in both numerator and denominator:
- The denominator of cume_dist() is the total number of rows in the current partition, while the denominator of percent_rank() is the total number of rows in the current partition minus one.
- The numerator of cume_dist() is the number of rows with values greater than or equal to the current row's value plus one (includes the current row itself), whereas the numerator of percent_rank() is the number of rows with values greater than the current row's value (excludes the current row itself).
lag() and lead() are used to access data from previous or future rows. They are often applied in comparisons between current and historical data.
Syntax:
=# SELECT
lag(<offset_column>,<offset_rows>) OVER (ORDER BY …) AS …,
lead(<offset_column>,<offset_rows>) OVER (ORDER BY …) AS …
FROM …;
YMatrix supports, but is not limited to, the following aggregate functions:
| Function | Description |
|---|---|
| count( |
Counts non-null values (count(*) includes nulls) |
| sum( |
Sum of values |
| avg( |
Average of values |
| min( |
Minimum value |
| max( |
Maximum value |
Using aggregate functions with OVER differs from using them with GROUP BY:
GROUP BY groups input rows first, then applies aggregation per group, returning one output row per group.OVER clause, every input row corresponds to a window. The aggregate function computes results over each window, producing one result per input row.A window frame restricts the range of rows included in a window calculation. It applies to both specialized and aggregate window functions.
It follows the ORDER BY clause. Full syntax:
=# <aggregate_function(…)> OVER (… ORDER BY … ROWS|RANGE {
UNBOUNDED PRECEDING
| n PRECEDING
| CURRENT ROW
| BETWEEN <frame_boundary> AND <frame_boundary> })
Valid window frame boundaries are listed below. Here, n can be a number or any expression evaluating to a number:
| Frame Boundary | Description |
|---|---|
| UNBOUNDED PRECEDING | First row of the partition |
| n PRECEDING | n rows before the current row |
| CURRENT ROW | Current row |
| n FOLLOWING | n rows after the current row |
| UNBOUNDED FOLLOWING | Last row of the partition |
Note!
For usage examples of window functions, see Typical Queries in OLAP Scenarios.
Ordered-set aggregate functions perform aggregations on sorted sets of data. These functions are useful when order-sensitive computations are required.
| Function | Description | Syntax |
|---|---|---|
| mode() | Returns the most frequent input value (first one if multiple values have the same frequency) | mode() WITHIN GROUP (ORDER BY …) |
| percentile_cont( |
Computes continuous percentile(s): ࠾ Single percentile: returns a value corresponding to the given fractional position in the sorted dataset. If no exact match exists, linear interpolation is used ࠾ Multiple percentiles: returns an array shaped like the input fraction array, with each element replaced by its corresponding percentile value |
Single: percentile_cont(<分数>) WITHIN GROUP (ORDER BY …)Multiple: percentile_cont(<分数数组>) WITHIN GROUP (ORDER BY …) |
| percentile_disc( |
Computes discrete percentile(s): ࠾ Single percentile: returns the first actual input value whose position in the sorted order is equal to or exceeds the specified fraction. No interpolation is performed ࠾ Multiple percentiles: returns an array where each non-null element is replaced by the corresponding input value at that percentile |
Single: percentile_disc(<分数>) WITHIN GROUP (ORDER BY …)Multiple: percentile_disc(<分数数组>) WITHIN GROUP (ORDER BY …) |
To ensure correct and consistent use of ordered-set aggregate functions, note the following:
0 and 1. Providing a value outside this range causes an error.Note!
For usage examples, see Continuous Percentile.
Note!
These functions require thematrixtsextension to be created before use.
| Function | Description |
|---|---|
| time_bucket(' |
Groups time-series data into fixed intervals for aggregation |
| time_bucket_gapfill(' |
Fills missing data points to create smooth time-series distributions. Supports two strategies: locf (Last Observation Carried Forward) and interpolate |
| first( |
Returns the first observed value of the metric. Note: second parameter need not be timestamp; labeled as such for time-series context |
| last( |
Returns the last observed value of the metric. Same note applies |
| last_not_null_value( |
Returns the last non-NULL value of the specified column |
| last_not_null( |
Returns the last non-NULL value and its timestamp |
This function works with aggregate functions to compute aggregated values over fixed time intervals.
| Parameter | Supported Data Types | Supported Units |
|---|---|---|
| Interval | int16; int32; int64; interval |
us / ms / s / minute(min, minutes) / hour / day, e.g., 1 day |
| Timestamp Column | int16; int32; int64; date; timestamp; timestamptz |
Example syntax:
=# SELECT time_bucket('<interval>', <timestamp_column>) AS …,
<aggregate_function> AS …
FROM …
WHERE …
GROUP BY …
ORDER BY …;
Similar to time_bucket(), this function fills gaps in time-series data. Use it when data is missing and needs smoothing for analysis.
| Parameter | Supported Data Types | Supported Units |
|---|---|---|
| Interval | int16; int32; int64; interval |
us / ms / s / minute(min, minutes) / hour / day, e.g., 1 day |
| Timestamp Column | int16; int32; int64; date; timestamp; timestamptz |
|
| Value Column | Original column type | Original unit |
Two fill strategies are supported: locf and interpolate:
locf: Uses the last observed non-missing value (LOCF). Preserves trend and returns existing actual values.interpolate: Performs linear interpolation between known points. Returns computed estimates.Syntax examples:
=# SELECT time_bucket_gapfill('<interval>', <timestamp_column>) AS …,
<aggregate_function> AS …
FROM …
WHERE …
GROUP BY …
ORDER BY …;
locf strategy=# SELECT time_bucket_gapfill('<interval>', <timestamp_column>) AS …,
locf(<aggregate_function>) AS …
FROM …
WHERE …
GROUP BY …
ORDER BY …;
interpolate strategy=# SELECT time_bucket_gapfill('<interval>', <timestamp_column>) AS …,
interpolate(<aggregate_function>) AS …
FROM …
WHERE …
GROUP BY …
ORDER BY …;
Note!
For usage examples, see Spatiotemporal Queries.
A Common Table Expression (CTE) allows defining a temporary view to simplify complex queries. It is implemented using the WITH keyword. The following table compares CTEs with CREATE VIEW statements:
| CTE | CREATE VIEW | |
|---|---|---|
| Definition | Not a standalone SQL statement; part of a larger expression | Standalone SQL statement |
| Scope | Temporary view accessible only within the enclosing query | View available globally for all queries |
Syntax:
=# WITH <cte_name> (<column_list>, …) AS (
SELECT … FROM …
) SELECT … FROM …;
Note!
For usage examples, see Metric Comparison Analysis.
Subqueries can appear in various parts of a query:
In the SELECT list, for example:
=# SELECT (SELECT …) AS … FROM …;
In the FROM list, for example:
=# SELECT …
FROM … AS …, (SELECT … FROM …) AS …
WHERE …;
Note!
Subqueries in theFROMlist must have an alias.
WHERE expressions, for example:=# SELECT …
FROM …
WHERE … IN (SELECT … FROM … WHERE …);
When a subquery returns a single value, it can appear in the SELECT/FROM list or in WHERE conditions. It can be used with comparison operators such as =, <, <=, >, >=, !=.
Assume tables t1 (with column c1) and t2 (with column c2):
=# SELECT (SELECT max(c1) FROM t1), t2.*
FROM t2
WHERE c2 > (SELECT avg(c2) FROM t2);
Here, scalar aggregate subqueries appear in both the SELECT list and the WHERE condition.
Execution proceeds as follows: First, the inner query SELECT avg(c2) FROM t2 computes the average of column t2 in table c2. Then, the outer query filters rows based on this average and other conditions c2 > (SELECT avg(c2) FROM t2). For each matching row, it returns all columns from t2 and the result of SELECT max(c1) FROM t1.
If a subquery returns multiple rows, it can act as a temporary table in the FROM list and participate in joins (JOIN). Like scalar subqueries, they can also be used with comparison operators.
Assume tables ttt1 (columns c1, c4) and ttt2 (columns c2, c3):
=# SELECT t1.c1, t2.c2
FROM ttt1 AS t1
INNER JOIN (SELECT c3, c2 FROM ttt2) AS t2
ON t1.c4 = t2.c3;
Execution steps:
SELECT c3, c2 FROM ttt2, selecting columns c3 and c2 from table ttt2, resulting in temporary table t2.ttt1 with t2 using an inner join.t1.c1 and t2.c2.Set-returning subqueries can also be used in WHERE conditions via:
Assume table t1:
=# CREATE TABLE t1 (
c1 int,
c2 text
);
=# INSERT INTO t1 VALUES (1,'content');
INSERT INTO t1 VALUES (2,'content');
INSERT INTO t1 VALUES (3,'content');
INSERT INTO t1 VALUES (4,'text');
INSERT INTO t1 VALUES (5,'text');
ALL Example=# SELECT * FROM t1
WHERE c1 >= ALL (
SELECT c1 FROM t1
WHERE c2 = 'content'
ORDER by c1
);
Since the subquery yields (1,2,3), the above is equivalent to:
=# SELECT * FROM t1
WHERE c1 >= 1
AND c1 >= 2
AND c1 >= 3
ORDER by c1;
c1 | c2
----+---------
3 | content
4 | text
5 | text
(3 rows)
ANY Example=# SELECT * FROM t1
WHERE c1 >= ANY (
SELECT c1 FROM t1
WHERE c2 = 'content'
);
Equivalent to:
=# SELECT * FROM t1
WHERE c1 >= 1
OR c1 >= 2
OR c1 >= 3;
IN Example=# SELECT * FROM t1
WHERE c1 IN (
SELECT c1 FROM t1
WHERE c2 = 'content'
);
Equivalent to:
=# SELECT * FROM t1
WHERE c1 = 1
OR c1 = 2
OR c1 = 3;
EXISTS Example=# SELECT * FROM t1
WHERE EXISTS (
SELECT * FROM t1
WHERE c1 < 60
AND c2 = 'content'
);
Since the subquery returns non-empty results:
=# SELECT * FROM t1
WHERE true;
If empty:
=# SELECT * FROM t1
WHERE false;
A correlated subquery depends on values from the outer query. For each row in the outer query, the subquery is re-evaluated.
Execution process:
WHERE condition using the subquery result.Assume tables t1 (column c1) and t2 (column c2):
=# SELECT * FROM t1
WHERE NOT EXISTS (
SELECT * FROM t2
WHERE t2.c2 = t1.c1
);
For each row in outer table t1, the subquery checks whether a matching row exists in t2. The NOT EXISTS expression determines whether to include the t1 row in the final result.
Note!
For usage examples, see Ad Hoc Analysis.
In Business Intelligence (BI), users analyze datasets across multiple dimensions and visualize insights for decision-making. The data cube feature enables multidimensional analysis by generating a multi-dimensional array of grouped data.
_1696736236.png)
Data cubes use a model based on facts and dimensions:
GROUPING SETS performs multiple groupings in a single query. Aggregation occurs after FROM and WHERE evaluation, computing results for each specified grouping and returning them together.
Assume table t1 (columns c1, c2, c3):
=# SELECT c1, c2, sum(c3) AS sum
FROM t1
GROUP BY GROUPING SETS (c1, c2)
ORDER BY (c1,c2);
c1 | c2 | sum
----+----+----
a | | 6
b | | 15
c | | 24
| aa | 12
| bb | 15
| cc | 18
(6 rows)
Results show sums grouped by:
c1: Total of c3 for a, b, cc2: Total of c3 for aa, bb, ccUsing GROUPING SETS is equivalent to running multiple UNION ALL queries, but more concise and standardized.
--- SPLIT ---
6.2 ROLLUP Clause
In addition to GROUPING SETS, YMatrix provides other simplified methods to specify commonly used types of grouping sets.
Aggregation results grouped by the ROLLUP clause are rolled up layer by layer in the order of the grouping keys.
In fact, ROLLUP ( c1, c2, … ) is equivalent to:
=# GROUPING SETS (
( c1, c2, … ),
…
( c1 ),
( )
)
where the last empty grouping key represents aggregation over all returned rows.
Assume we have a table t1 with columns c1, c2, and c3:
=# SELECT c1, c2, sum(c3) AS sum
FROM t1
WHERE c1 = 'a' OR c1 = 'b'
GROUP BY ROLLUP (c1, c2)
ORDER BY (c1,c2);
c1 | c2 | sum
----+----+----
a | aa | 1
a | bb | 2
a | cc | 3
a | | 6
b | aa | 4
b | bb | 5
b | cc | 6
b | | 15
| | 21
(9 rows)
From the result:
(c1,c2) (in this example, these are the individual values of c3);(c1) (i.e., the totals for rows 1–3 and 5–7 respectively);The ROLLUP clause is typically used for historical data analysis, such as calculating total salaries by department, region, and company-wide levels. The results can be visualized using multi-level pie charts or similar representations.
Aggregation results grouped by the CUBE clause compute roll-ups for each combination of grouping keys, then take the union and remove duplicates.
CUBE ( c1, c2 ) is equivalent to:
=# GROUPING SETS (
( c1, c2 ),
( c1 ),
( c2 ),
( )
)
where the last empty grouping set represents aggregation over all returned rows.
Assume we have a table t1 with columns c1, c2, and c3:
=# SELECT c1, c2, sum(c3) AS sum
FROM t1
WHERE c1 = 'a' OR c1 = 'b'
GROUP BY CUBE (c1, c2)
ORDER BY (c1,c2);
c1 | c2 | sum
----+----+-----
a | aa | 1
a | bb | 2
a | cc | 3
a | | 6
b | aa | 4
b | bb | 5
b | cc | 6
b | | 15
| aa | 5
| bb | 7
| cc | 9
| | 21
(12 rows)
As shown, in addition to the same results as ROLLUP (rows 1–8 and the last row):
(c2) (e.g., aa = 1+4, bb = 2+5).Results from the CUBE clause can be rendered into rich visualizations and interactive dashboards using BI tools, such as Sankey diagrams.
Note!
For an example of data cube usage, see Multidimensional Data Query.