This document introduces the basic query statement classification and syntax composition supported by YMatrix.
Table expressions are usually FROM
clause that can calculate tables (SELECT
followed by list content is also a table expression, but they are not the focus of this section, so they will not be emphasized). The FROM clause can be freely selected followed by the WHERE
/GROUP BY
/HAVING
clause or subquery.
A simple table expression simply refers to a table stored on disk (with temporary tables or views), which is the so-called base table. However, in actual use, we can use more complex expressions to modify or combine the base table in various ways to achieve query purposes.
Here are some examples of table expressions:
Table expression syntax for reading data from a single table:
=# FROM <table name>
Joint table syntax:
=# FROM t1 <connection type> t2 [Join Condition]
Where <connection type>
can be CROSS JOIN
/INNER JOIN
/LEFT JOIN
/RIGHT JOIN
/FULL JOIN
, INNER JOIN
is an inner connection, and the last three are outer connections.
FROM t1 CROSS JOIN t2
is equivalent to FROM t1 INNER JOIN t2 ON TRUE
, equivalent to FROM t1,t2
.
Suppose there is a table t1
(including c1
, c2
columns), t2
(including c1
, c2
columns):
=# FROM t1 INNER JOIN t2 ON t1.c1 = t2.c1 AND t1.c2 = t2.c2
=# FROM t1 LEFT JOIN t2 USING (c1,c2)
When an outer query contains one or more inner query, it is called a nested query. The outer query is called the main query, and the inner query is called the subquery.
Subqueries can be nested in multiple layers, and they can appear almost anywhere in the query, including SELECT
/GROUP BY
list, FROM
table expression, WHERE
/ON
/HAVING
conditional expression. Here is an example of subquery in the FROM
table expression.
grammar:
=# FROM (SELECT * FROM <table name>) AS <Alias>
Suppose there is a table table1
:
=# FROM (SELECT * FROM table1) AS t1
Equivalent to FROM table1 AS t1
.
The WHERE clause can be used for query condition filtering, and the syntax is as follows:
=# WHERE <Filter Condition>
Where <Filter Condition>
can be any value expression that returns a numerical or logical value.
Suppose there are tables t
(including c1
columns), t1
(including c1
columns), t2
(including c1
, c2
, c3
columns):
=# SELECT ... FROM t1 WHERE c1 > 5;
=# SELECT ... FROM t1 WHERE c1 IN (1, 2, 3);
=# SELECT ... FROM t1 WHERE c1 IN (SELECT c1 FROM t2);
=# SELECT ... FROM t1 WHERE c1 IN (SELECT c3 FROM t2 WHERE c2 = t.c1 + 10);
=# SELECT ... FROM t1 WHERE c1 BETWEEN (SELECT c3 FROM t2 WHERE c2 = t.c1 + 10) AND 100;
=# SELECT ... FROM t1 WHERE EXISTS (SELECT c1 FROM t2 WHERE c2 > t.c1);
Used to perform aggregation calculations on each group after grouping. When the GROUP BY
statement appears in the query statement, you can only use the HAVING
clause to add filter conditions, not WHERE
. Because the WHERE
filter condition can only act on the original dataset, while the HAVING
filter condition can act on the aggregated dataset.
YMatrix provides the following basic aggregation functions that can be used for group aggregation calculations:
Functions | Description |
---|---|
count( |
Number of valid values (only count(*) contains null values) |
sum( |
cumulative sum |
avg( |
Specify the average column |
min( |
Specify the minimum column value |
max( |
Specify the column maximum value |
Suppose there is a table t1
(including c1
column and c2
column):
=# SELECT count(*),
max(c1),
c2
FROM t1
GROUP BY c2;
Notice! For advanced aggregation functions, see Advanced Query.
Notice!
The non-aggregated function column that appears in the SELECT
list must appear in the GROUP BY
list, otherwise the query will report an error.
Suppose there is a table t1
(including c1
column, c2
column, c3
column):
-- Correct syntax
=# SELECT c1, avg(c2)
FROM t1
GROUP BY c1;
-- Correct syntax
=# SELECT c1, c2, avg(c3)
FROM t1
GROUP BY (c1,c2);
-- Error syntax
=# SELECT c1, c2, avg(c3)
FROM t1
GROUP BY c1;
The results of the aggregation function cannot be filtered. The regular WHERE
clause is not used, and a special HAVING
clause is required. The reason is that the WHERE
filter condition can only act on the original dataset, while the HAVING
filter condition can act on the aggregated dataset.
Suppose there is a table t1
(including c1
column and c2
column):
-- Correct syntax
=# SELECT c1, max(c2)
FROM t1
GROUP BY c1
HAVING avg(c2) < 60;
-- Error syntax
=# SELECT c1, max(c2)
FROM t1
GROUP BY c1
WHERE avg(c2) < 60;
The SELECT list can create a temporary virtual table through calculations and determine which columns will be displayed in the final output result.
For example, query the c1
, c2
, c3
columns of the t1
table:
=# SELECT c1, c2, c3 FROM t1;
Query the c1
, c2
columns of the t1
table and the c1
columns of the t2
table. If there are columns with the same name in multiple tables, you need to specify the table name before the column:
=# SELECT t1.c1, t2.c1, t1.c2 FROM t1,t2;
Use the AS
keyword to define an alias for subsequent expressions using the SELECT
list:
=# SELECT c1 AS value, c2 + 1 AS sum FROM t1,t2;
The AS
keyword can be omitted, but in this example, since value
is also one of the keywords of YMatrix, you need to add double quotes to distinguish:
=# SELECT c1 "value", c2 + 1 AS sum FROM t1,t2;
Adding the DISTINCT
keyword after the SELECT
keyword can deduplicate the query result set and ensure the uniqueness of the result set:
=# SELECT DISTINCT c1 FROM t1;
Combined queries refer to combining the result sets of two queries using the UNION
, INTERSECT
or EXCEPT
keywords, respectively, in a way that computes union, intersection, or complement.
We can combine the results of two queries with the following syntax:
=# <Query1> UNION [ALL] <Query2>
=# <Query1> INTERSECT [ALL] <Query2>
=# <Query1> EXCEPT [ALL] <Query2>
Explicitly specifying ALL
retains duplicate lines in the output result.
Query 1 and Query 2 refer to queries that use one of any features discussed based on this point.
UNION
effectively appends the result of query 2 to query 1, that is, the union of query 1 and query 2.INTERSECT
returns all result rows that belong to both query 1 and query 2, that is, the intersection of query 1 and query 2.EXCEPT
returns all rows that belong to query 1 but not to query 2, that is, the complement set of query 2 in the two result sets.UNION
syntax example, the other two combination keywords can also be referenced:
=# SELECT c1 FROM t1 UNION SELECT c2 FROM t2 LIMIT 10;
LIMIT
in this query is a restriction on all the aforementioned calculation results, i.e. (SELECT c1 FROM t1 UNION SELECT c2 FROM t2) LIMIT 10;
.
Use the ORDER BY
clause to select columns for sorting.
Use the AS
keyword to define column alias and sort them with them (AS
can be omitted):
=# SELECT c1 + c2 AS sum FROM t1 ORDER BY sum;
Use the ASC
keyword to sort the result set in ascending order (from small to large):
=# SELECT c1 FROM t1 ORDER BY c1 ASC;
Use the DESC
keyword to sort the result set in descending order (from large to small):
=# SELECT c1 FROM t1 ORDER BY c1 DESC;
The sorting option works independently for each sorting column, so if there are multiple columns that need to be sorted, you need to specify their sorting method separately. For example, the following two SQLs can implement ascending columns and descending columns in c2
:
=# SELECT c1, c2 FROM t1 ORDER BY c1 ASC, c2 DESC;
=# SELECT c1, c2 FROM t1 ORDER BY c1, c2 DESC;
When no ascending or descending sort is specified, the default ascending sort is sorted.
Use LIMIT
and OFFSET
to limit the output of the query statement. LIMIT
can limit the number of rows returned. For example, LIMIT 1
means that the query will eventually return 1 row result, and LIMIT ALL
is equivalent to no limit. OFFSET
is used to specify the offset of the final result. After the query calculation is completed, the specified number of rows in the result set is skipped and the final result starts from the next row of the specified number of rows.
Instructions for use:
OFFSET
clause must always be used with the LIMIT
clause, and if LIMIT
is omitted, the OFFSET
clause will be ignored and return all rows.OFFSET
and only use LIMIT
, which is equivalent to OFFSET 0
.OFFSET
must be a non-negative integer, otherwise an error will be thrown.OFFSET
specifies that skipped rows still need to be calculated before the final return, so specifying a large offset may be inefficient.grammar:
=# SELECT <List>
FROM <Table Expression>
[ ORDER BY ... ]
[ LIMIT { <Number> | ALL } ] [ OFFSET <Number> ]
Suppose there is a table t1
(including c1
column), and only the first 10 pieces of data sorted in descending order (the ten pieces with the largest value):
=# SELECT c1 FROM t1
ORDER BY c1 DESC
LIMIT 10;
View only the 11th to 20th data:
=# SELECT c1 FROM t1
ORDER BY c1 DESC
LIMIT 10 OFFSET 10;
This query operation will first offset the sorted results from the first data, with an offset of 10
, that is, the limit number is calculated from the 11th data, so the final output results are from the 11th data.
You can use the JOIN
clause and specify the connection key with USING
to perform multi-table join operations:
Suppose there is a table t1
(including c1
column, c2
column), t2
(including c2
column):
=# SELECT t1.c1, t2.* FROM t2 JOIN t1 USING (c2) ORDER BY t2.c2 DESC LIMIT 10;