Basic Query

This document introduces the basic query statement classification and syntax composition supported by YMatrix.


1 Table expression

Table expressions are usually FROM clause that can calculate tables (SELECT followed by list content is also a table expression, but they are not the focus of this section, so they will not be emphasized). The FROM clause can be freely selected followed by the WHERE/GROUP BY/HAVING clause or subquery.

A simple table expression simply refers to a table stored on disk (with temporary tables or views), which is the so-called base table. However, in actual use, we can use more complex expressions to modify or combine the base table in various ways to achieve query purposes.

Here are some examples of table expressions:

Table expression syntax for reading data from a single table:

=# FROM <table name>

1.1 FROM clause

1.1.1 Connection table

Joint table syntax:

=# FROM t1 <connection type> t2 [Join Condition]

Where <connection type> can be CROSS JOIN/INNER JOIN/LEFT JOIN/RIGHT JOIN/FULL JOIN, INNER JOIN is an inner connection, and the last three are outer connections. FROM t1 CROSS JOIN t2 is equivalent to FROM t1 INNER JOIN t2 ON TRUE, equivalent to FROM t1,t2.

Suppose there is a table t1 (including c1, c2 columns), t2 (including c1, c2 columns):

=# FROM t1 INNER JOIN t2 ON t1.c1 = t2.c1 AND t1.c2 = t2.c2
=# FROM t1 LEFT JOIN t2 USING (c1,c2)

1.1.2 Subquery

When an outer query contains one or more inner query, it is called a nested query. The outer query is called the main query, and the inner query is called the subquery.

Subqueries can be nested in multiple layers, and they can appear almost anywhere in the query, including SELECT/GROUP BY list, FROM table expression, WHERE/ON/HAVING conditional expression. Here is an example of subquery in the FROM table expression.

grammar:

=# FROM (SELECT * FROM <table name>) AS <Alias>

Suppose there is a table table1:

=# FROM (SELECT * FROM table1) AS t1

Equivalent to FROM table1 AS t1.

1.2 WHERE clause

The WHERE clause can be used for query condition filtering, and the syntax is as follows:

=# WHERE <Filter Condition>

Where <Filter Condition> can be any value expression that returns a numerical or logical value.

Suppose there are tables t (including c1 columns), t1 (including c1 columns), t2 (including c1, c2, c3 columns):

=# SELECT ... FROM t1 WHERE c1 > 5;
=# SELECT ... FROM t1 WHERE c1 IN (1, 2, 3);
=# SELECT ... FROM t1 WHERE c1 IN (SELECT c1 FROM t2);
=# SELECT ... FROM t1 WHERE c1 IN (SELECT c3 FROM t2 WHERE c2 = t.c1 + 10);
=# SELECT ... FROM t1 WHERE c1 BETWEEN (SELECT c3 FROM t2 WHERE c2 = t.c1 + 10) AND 100;
=# SELECT ... FROM t1 WHERE EXISTS (SELECT c1 FROM t2 WHERE c2 > t.c1);

1.3 GROUP BY and HAVING clauses

Used to perform aggregation calculations on each group after grouping. When the GROUP BY statement appears in the query statement, you can only use the HAVING clause to add filter conditions, not WHERE. Because the WHERE filter condition can only act on the original dataset, while the HAVING filter condition can act on the aggregated dataset.

1.3.1 Basic aggregation function that can be used for grouping aggregation calculations

YMatrix provides the following basic aggregation functions that can be used for group aggregation calculations:

Functions Description
count(/*) Number of valid values ​​​​​(only count(*) contains null values)
sum() cumulative sum
avg() Specify the average column
min() Specify the minimum column value
max() Specify the column maximum value

Suppose there is a table t1 (including c1 column and c2 column):

=#  SELECT count(*), 
           max(c1),
           c2 
    FROM t1
    GROUP BY c2;

Notice! For advanced aggregation functions, see Advanced Query.

1.3.2 GROUP BY clause

Notice! The non-aggregated function column that appears in the SELECT list must appear in the GROUP BY list, otherwise the query will report an error.

Suppose there is a table t1 (including c1 column, c2 column, c3 column):

-- Correct syntax
=# SELECT c1, avg(c2)
    FROM t1
    GROUP BY c1;

-- Correct syntax
=# SELECT c1, c2, avg(c3)
    FROM t1
    GROUP BY (c1,c2);

-- Error syntax
=# SELECT c1, c2, avg(c3)
    FROM t1
    GROUP BY c1;

1.3.3 HAVING clause

The results of the aggregation function cannot be filtered. The regular WHERE clause is not used, and a special HAVING clause is required. The reason is that the WHERE filter condition can only act on the original dataset, while the HAVING filter condition can act on the aggregated dataset.

Suppose there is a table t1 (including c1 column and c2 column):

-- Correct syntax
=# SELECT c1, max(c2)
    FROM t1
    GROUP BY c1
    HAVING avg(c2) < 60;

-- Error syntax
=# SELECT c1, max(c2)
    FROM t1
    GROUP BY c1
    WHERE avg(c2) < 60;


2 SELECT List

The SELECT list can create a temporary virtual table through calculations and determine which columns will be displayed in the final output result.

For example, query the c1, c2, c3 columns of the t1 table:

=# SELECT c1, c2, c3 FROM t1;

Query the c1, c2 columns of the t1 table and the c1 columns of the t2 table. If there are columns with the same name in multiple tables, you need to specify the table name before the column:

=# SELECT t1.c1, t2.c1, t1.c2 FROM t1,t2;

Use the AS keyword to define an alias for subsequent expressions using the SELECT list:

=# SELECT c1 AS value, c2 + 1 AS sum FROM t1,t2;

The AS keyword can be omitted, but in this example, since value is also one of the keywords of YMatrix, you need to add double quotes to distinguish:

=# SELECT c1 "value", c2 + 1 AS sum FROM t1,t2;

Adding the DISTINCT keyword after the SELECT keyword can deduplicate the query result set and ensure the uniqueness of the result set:

=# SELECT DISTINCT c1 FROM t1;


3 Combination Query

Combined queries refer to combining the result sets of two queries using the UNION, INTERSECT or EXCEPT keywords, respectively, in a way that computes union, intersection, or complement.

We can combine the results of two queries with the following syntax:

=# <Query1> UNION [ALL] <Query2>
=# <Query1> INTERSECT [ALL] <Query2>
=# <Query1> EXCEPT [ALL] <Query2>

Explicitly specifying ALL retains duplicate lines in the output result.

Query 1 and Query 2 refer to queries that use one of any features discussed based on this point.

  • UNION effectively appends the result of query 2 to query 1, that is, the union of query 1 and query 2.
  • INTERSECT returns all result rows that belong to both query 1 and query 2, that is, the intersection of query 1 and query 2.
  • EXCEPT returns all rows that belong to query 1 but not to query 2, that is, the complement set of query 2 in the two result sets.

UNION syntax example, the other two combination keywords can also be referenced:

=# SELECT c1 FROM t1 UNION SELECT c2 FROM t2 LIMIT 10;

LIMIT in this query is a restriction on all the aforementioned calculation results, i.e. (SELECT c1 FROM t1 UNION SELECT c2 FROM t2) LIMIT 10;.


4 row sorting

Use the ORDER BY clause to select columns for sorting.

Use the AS keyword to define column alias and sort them with them (AS can be omitted):

=# SELECT c1 + c2 AS sum FROM t1 ORDER BY sum;

Use the ASC keyword to sort the result set in ascending order (from small to large):

=# SELECT c1 FROM t1 ORDER BY c1 ASC;

Use the DESC keyword to sort the result set in descending order (from large to small):

=# SELECT c1 FROM t1 ORDER BY c1 DESC;

The sorting option works independently for each sorting column, so if there are multiple columns that need to be sorted, you need to specify their sorting method separately. For example, the following two SQLs can implement ascending columns and descending columns in c2:

=# SELECT c1, c2 FROM t1 ORDER BY c1 ASC, c2 DESC;
=# SELECT c1, c2 FROM t1 ORDER BY c1, c2 DESC;

When no ascending or descending sort is specified, the default ascending sort is sorted.


5 LIMIT and OFFSET

Use LIMIT and OFFSET to limit the output of the query statement. LIMIT can limit the number of rows returned. For example, LIMIT 1 means that the query will eventually return 1 row result, and LIMIT ALL is equivalent to no limit. OFFSET is used to specify the offset of the final result. After the query calculation is completed, the specified number of rows in the result set is skipped and the final result starts from the next row of the specified number of rows.

Instructions for use:

  • The OFFSET clause must always be used with the LIMIT clause, and if LIMIT is omitted, the OFFSET clause will be ignored and return all rows.
  • It is OK to omit OFFSET and only use LIMIT, which is equivalent to OFFSET 0.
  • OFFSET must be a non-negative integer, otherwise an error will be thrown.
  • OFFSET specifies that skipped rows still need to be calculated before the final return, so specifying a large offset may be inefficient.

grammar:

=# SELECT <List>
    FROM <Table Expression>
    [ ORDER BY ... ]
    [ LIMIT { <Number> | ALL } ] [ OFFSET <Number> ]

Suppose there is a table t1 (including c1 column), and only the first 10 pieces of data sorted in descending order (the ten pieces with the largest value):

=# SELECT c1 FROM t1
    ORDER BY c1 DESC
    LIMIT 10;

View only the 11th to 20th data:

=# SELECT c1 FROM t1
    ORDER BY c1 DESC
    LIMIT 10 OFFSET 10;

This query operation will first offset the sorted results from the first data, with an offset of 10, that is, the limit number is calculated from the 11th data, so the final output results are from the 11th data.


6 JOIN and USING

You can use the JOIN clause and specify the connection key with USING to perform multi-table join operations:

Suppose there is a table t1 (including c1 column, c2 column), t2 (including c2 column):

=# SELECT t1.c1, t2.* FROM t2 JOIN t1 USING (c2) ORDER BY t2.c2 DESC LIMIT 10;