Basic Queries

This document describes the classification and syntax structure of basic query statements supported by YMatrix.


1 Table Expressions

A table expression is typically a FROM clause that computes a table. While other clauses following FROM (such as lists) are also considered table expressions, they are not the focus of this section and thus not emphasized. The FROM clause may be followed by one or more JOIN, ON, USING clauses, or subqueries.

A simple table expression refers to a base table — a table stored on disk (excluding temporary tables or views). However, in practice, more complex expressions can be used to modify or combine base tables in various ways to achieve desired query results.

Examples of table expressions:

Syntax for reading data from a single table:

=# FROM <table_name>

1.1 FROM Clause

1.1.1 Joined Tables

Join syntax:

=# FROM t1 <join_type> t2 [join_condition]

Here, <join_type> can be INNER JOIN, LEFT JOIN, RIGHT JOIN, or FULL JOIN. The first is an inner join; the latter three are outer joins.
INNER JOIN is equivalent to JOIN, which is also equivalent to simply writing JOIN.

Assume two tables: t1 (with columns c1, c2) and t2 (with columns c1, c2):

=# FROM t1 INNER JOIN t2 ON t1.c1 = t2.c1 AND t1.c2 = t2.c2
=# FROM t1 LEFT JOIN t2 USING (c1,c2)

1.1.2 Subqueries

When a query contains one or more inner queries, it is called a nested query. The outer query is known as the main query, and the inner query is referred to as a subquery.

Subqueries can be nested multiple levels deep and can appear almost anywhere in a query, including in SELECT, WHERE lists, table expressions, and HAVING, FROM, ORDER BY conditions. This section focuses on subqueries within table expressions.

Syntax:

=# FROM (SELECT * FROM <table_name>) AS <alias>

Assume a table table1:

=# FROM (SELECT * FROM table1) AS t1

This is equivalent to FROM table1.

1.2 WHERE Clause

The WHERE clause filters rows based on specified conditions. Syntax:

=# WHERE <condition>

Here, <condition> can be any value expression that returns a numeric or Boolean result.

Assume tables t1 (with column c1), t2 (with column c2), and t3 (with columns c1, c2, c3):

=# SELECT ... FROM t1 WHERE c1 > 5;
=# SELECT ... FROM t1 WHERE c1 IN (1, 2, 3);
=# SELECT ... FROM t1 WHERE c1 IN (SELECT c1 FROM t2);
=# SELECT ... FROM t1 WHERE c1 IN (SELECT c3 FROM t2 WHERE c2 = t.c1 + 10);
=# SELECT ... FROM t1 WHERE c1 BETWEEN (SELECT c3 FROM t2 WHERE c2 = t.c1 + 10) AND 100;
=# SELECT ... FROM t1 WHERE EXISTS (SELECT c1 FROM t2 WHERE c2 > t.c1);

1.3 GROUP BY and HAVING Clauses

These clauses are used to group rows and perform aggregate calculations per group. When a query includes an aggregate function, use HAVING to filter grouped results instead of WHERE. This is because WHERE filters operate on raw data before grouping, while HAVING filters apply after aggregation.

1.3.1 Basic Aggregate Functions

YMatrix provides the following basic aggregate functions for grouped computations:

Function Description
count(/*) Number of non-null values (count(*) includes nulls)
sum() Sum of values
avg() Average of values
min() Minimum value
max() Maximum value

Assume table t1 (with columns c1, c2):

=#  SELECT count(*), 
           max(c1),
           c2 
    FROM t1
    GROUP BY c2;

Note!
For advanced aggregate functions, see Advanced Queries.

1.3.2 GROUP BY Clause

Note!
Any non-aggregate column in the SELECT list must appear in the GROUP BY list; otherwise, the query will fail.

Assume table t1 (with columns c1, c2, c3):

-- Correct syntax
=# SELECT c1, avg(c2)
    FROM t1
    GROUP BY c1;

-- Correct syntax
=# SELECT c1, c2, avg(c3)
    FROM t1
    GROUP BY (c1,c2);

-- Incorrect syntax
=# SELECT c1, c2, avg(c3)
    FROM t1
    GROUP BY c1;

1.3.3 HAVING Clause

To filter results of aggregate functions, use the HAVING clause instead of WHERE. As noted, WHERE applies before grouping, while HAVING applies after aggregation.

Assume table t1 (with columns c1, c2):

-- Correct syntax
=# SELECT c1, max(c2)
    FROM t1
    GROUP BY c1
    HAVING avg(c2) < 60;

-- Incorrect syntax
=# SELECT c1, max(c2)
    FROM t1
    GROUP BY c1
    WHERE avg(c2) < 60;


2 SELECT List

The SELECT list constructs a temporary virtual table through computation and determines which columns appear in the final output.

For example, select columns c1, c2, c3 from table t1:

=# SELECT c1, c2, c3 FROM t1;

Select columns c1, c2 from t1 and column c1 from t2. If multiple tables have columns with the same name, qualify them with the table name:

=# SELECT t1.c1, t2.c1, t1.c2 FROM t1,t2;

Use the AS keyword in the SELECT list to define aliases for use in subsequent expressions:

=# SELECT c1 AS value, c2 + 1 AS sum FROM t1,t2;

The AS keyword is optional, but in this example, since value is also a reserved keyword in YMatrix, double quotes are required:

=# SELECT c1 "value", c2 + 1 AS sum FROM t1,t2;

Add the DISTINCT keyword after SELECT to remove duplicates and ensure result uniqueness:

=# SELECT DISTINCT c1 FROM t1;


3 Combined Queries

Combined queries use the UNION, INTERSECT, or EXCEPT keywords to merge the result sets of two queries by computing their union, intersection, or difference.

Syntax for combining two queries:

=# <query1> UNION [ALL] <query2>
=# <query1> INTERSECT [ALL] <query2>
=# <query1> EXCEPT [ALL] <query2>

Specifying ALL explicitly preserves duplicate rows in the output.

Query 1 and Query 2 refer to any queries using features discussed herein.

  • UNION appends the result of Query 2 to Query 1 — i.e., the union of both.
  • INTERSECT returns rows present in both Query 1 and Query 2 — i.e., their intersection.
  • EXCEPT returns rows in Query 1 that are not in Query 2 — i.e., the set difference.

Example using UNION; similar syntax applies to the other operators:

=# SELECT c1 FROM t1 UNION SELECT c2 FROM t2 LIMIT 10;

In this query, LIMIT 10 applies to the final combined result — i.e., (SELECT c1 FROM t1 UNION SELECT c2 FROM t2) LIMIT 10;.


4 Row Sorting

Use the ORDER BY clause to sort results by specified columns.

Define a column alias with AS and sort by it (AS is optional):

=# SELECT c1 + c2 AS sum FROM t1 ORDER BY sum;

Use ASC to sort in ascending order (smallest to largest):

=# SELECT c1 FROM t1 ORDER BY c1 ASC;

Use DESC to sort in descending order (largest to smallest):

=# SELECT c1 FROM t1 ORDER BY c1 DESC;

Sort options act independently per column. For multiple columns, specify each sort direction separately. The following two SQL statements both sort c1 ascending and c2 descending:

=# SELECT c1, c2 FROM t1 ORDER BY c1 ASC, c2 DESC;
=# SELECT c1, c2 FROM t1 ORDER BY c1, c2 DESC;

By default, sorting is in ascending order if no direction is specified.


5 LIMIT and OFFSET

Use LIMIT and OFFSET to constrain the number of returned rows. LIMIT restricts the output count — e.g., LIMIT 1 returns only one row; LIMIT ALL imposes no limit. OFFSET specifies how many rows to skip before returning results. After query execution, the system skips the specified number of rows and returns results starting from the next row.

Usage notes:

  • The OFFSET clause must be used with LIMIT. If OFFSET is omitted, LIMIT operates from the first row.
  • Omitting OFFSET and using only LIMIT is valid and equivalent to OFFSET 0.
  • OFFSET must be a non-negative integer; otherwise, an error is thrown.
  • Rows skipped by OFFSET are still computed, so large offsets may be inefficient.

Syntax:

=# SELECT <list>
    FROM <table_expression>
    [ ORDER BY ... ]
    [ LIMIT { <number> | ALL } ] [ OFFSET <number> ]

Assume table t1 (with column c1). To view the top 10 rows in descending order (the 10 largest values):

=# SELECT c1 FROM t1
    ORDER BY c1 DESC
    LIMIT 10;

To view rows 11 through 20:

=# SELECT c1 FROM t1
    ORDER BY c1 DESC
    LIMIT 10 OFFSET 10;

This query first sorts the data, then skips the first 10 rows (OFFSET 10), and limits the output to the next 10 rows — hence returning rows 11 to 20.


6 JOIN and USING

Use the JOIN clause with USING to perform multi-table joins on specified key columns.

Assume table t1 (with columns c1, c2) and table t2 (with column c2):

=# SELECT t1.c1, t2.* FROM t2 JOIN t1 USING (c2) ORDER BY t2.c2 DESC LIMIT 10;