Mainly introduces the use of Select syntax
grammar:
Syntax Description:
select_expr, … Columns retrieved and displayed in the result, when using an alias, as is optional.
select_expr, … Retrieved target table (one or more tables (including temporary tables generated by subqueries)
where_definition retrieves the condition (expression), if there is a WHERE clause, the condition filters the row data. where_condition is an expression that evaluates to true for each row to be selected. Without the WHERE clause, the statement selects all rows. In WHERE expressions, you can use any MySQL supported functions and operators except aggregate functions
: to refresh the result set, all is all, distinct/distinctrow will refresh the duplicate columns, the default is all
INTO OUTFILE 'file_name'
: save the result to a new file (which did not exist before), the difference lies in the save format.Group [asc/desc]by having
: Group the result set, and brush the result of group by when having appears.Order by
: Sort the final result, Order by sorts the result set by comparing the size of one or more columns.Order by is a time-consuming and resource-intensive operation, because all data needs to be sent to 1 node before it can be sorted, and the sorting operation requires more memory than the non-sorting operation.
If you need to return the top N sorted results, you need to use the LIMIT clause; in order to limit memory usage, if the user does not specify the LIMIT clause, the first 65535 sorted results are returned by default.
Limit n
: limit the number of lines in the output result,limit m,n
means output n records starting from the mth line.The
Having
clause does not filter the row data in the table, but filters the results produced by the aggregate function.Typically
having
is used with aggregate functions (eg :COUNT(), SUM(), AVG(), MIN(), MAX()
) andgroup by
clauses.SELECT supports explicit partition selection using PARTITION containing a list of partitions or subpartitions (or both) following the name of the table in
table_reference
Syntax constraints:
- SELECT can also be used to retrieve calculated rows without referencing any table.
- All clauses must be ordered strictly according to the above format, and a HAVING clause must be placed after the GROUP BY clause and before the ORDER BY clause.
- The alias keyword AS is optional. Aliases can be used for group by, order by and having
- Where clause: The WHERE statement is executed to determine which rows should be included in the GROUP BY section, and HAVING is used to determine which rows in the result set should be used.
- The HAVING clause can refer to the total function, but the WHERE clause cannot refer to, such as count, sum, max, min, avg, at the same time, the where clause can refer to other functions except the total function. Column aliases cannot be used in the Where clause to define conditions.
- Group by followed by with rollup can count the results one or more times.
Join query:
Doris supports JOIN syntax
JION
table_references:
table_reference [, table_reference] …
table_reference:
table_factor
| join_table
table_factor:
tbl_name [[AS] alias]
[{USE|IGNORE|FORCE} INDEX (key_list)]
| ( table_references )
| { OJ table_reference LEFT OUTER JOIN table_reference
ON conditional_expr }
join_table:
table_reference [INNER | CROSS] JOIN table_factor [join_condition]
| table_reference STRAIGHT_JOIN table_factor
| table_reference STRAIGHT_JOIN table_factor ON condition
| table_reference LEFT [OUTER] JOIN table_reference join_condition
| table_reference NATURAL [LEFT [OUTER]] JOIN table_factor
| table_reference RIGHT [OUTER] JOIN table_reference join_condition
| table_reference NATURAL [RIGHT [OUTER]] JOIN table_factor
join_condition:
ON conditional_expr
UNION Grammar:
SELECT ...
UNION [ALL| DISTINCT] SELECT ......
[UNION [ALL| DISTINCT] SELECT ...]
UNION
is used to combine the results of multiple SELECT
statements into a single result set.
The column names in the first SELECT
statement are used as the column names in the returned results. The selected columns listed in the corresponding position of each SELECT
statement should have the same data type. (For example, the first column selected by the first statement should be of the same type as the first column selected by other statements.)
WITH statement:
To specify common table expressions, use the WITH
clause with one or more comma-separated clauses. Each subclause provides a subquery that generates the result set and associates the name with the subquery. The following example defines WITH
clauses in CTEs named cte1
and cte2
, and refers to the WITH
clause below their top-level SELECT
:
WITH
cte1 AS(SELECT a,b FROM table1),
cte2 AS(SELECT c,d FROM table2)
SELECT b,d FROM cte1 JOIN cte2
WHERE cte1.a = cte2.c;
In a statement containing the WITH
clause, each CTE name can be referenced to access the corresponding CTE result set.
CTE names can be referenced in other CTEs, allowing CTEs to be defined based on other CTEs.
A CTE can refer to itself to define a recursive CTE. Common applications of recursive CTEs include sequence generation and traversal of hierarchical or tree-structured data.
GROUP BY Example
--Query the tb_book table, group by type, and find the average price of each type of book,
select type,avg(price) from tb_book group by type;
DISTINCT Use
--Query the tb_book table to remove duplicate type data
select distinct type from tb_book;
ORDER BY Example
Sort query results in ascending (default) or descending (DESC) order. Ascending NULL is first, descending NULL is last
--Query all records in the tb_book table, sort them in descending order by id, and display three records
select * from tb_book order by id desc limit 3;
LIKE fuzzy query
Can realize fuzzy query, it has two wildcards:
%
and_
,%
can match one or more characters,_
can match one characterLIMIT limits the number of result rows
--1. Display 3 records in descending order
select * from tb_book order by price desc limit 3;
--2. Display 4 records from id=1
select * from tb_book where id limit 1,4;
CONCAT join multiple columns
--Combine name and price into a new string output
select id,concat(name,":",price) as info,type from tb_book;
Using functions and expressions
--Calculate the total price of various books in the tb_book table
select sum(price) as total,type from tb_book group by type;
--20% off price
select *,(price * 0.8) as "20%" from tb_book;
UNION Example
SELECT a FROM t1 WHERE a = 10 AND B = 1 ORDER by LIMIT 10
UNION
SELECT a FROM t2 WHERE a = 11 AND B = 2 ORDER by LIMIT 10;
WITH clause example
WITH cte AS
(
SELECT 1 AS col1, 2 AS col2
SELECT 3, 4
)
SELECT col1, col2 FROM cte;
JOIN Exampel
SELECT * FROM t1 LEFT JOIN (t2, t3, t4)
ON (t2.a = t1.a AND t3.b = t1.b AND t4.c = t1.c)
Equivalent to
SELECT * FROM t1 LEFT JOIN (t2 CROSS JOIN t3 CROSS JOIN t4)
ON (t2.a = t1.a AND t3.b = t1.b AND t4.c = t1.c)
INNER JOIN
LEFT JOIN
SELECT left_tbl.*
FROM left_tbl LEFT JOIN right_tbl ON left_tbl.id = right_tbl.id
WHERE right_tbl.id IS NULL;
RIGHT JOIN
mysql> SELECT * FROM t1 RIGHT JOIN t2 ON (t1.a = t2.a);
+------+------+------+------+
| a | b | a | c |
+------+------+------+------+
| NULL | NULL | 3 | w |
+------+------+------+------+
SELECT
-
An alias can be specified for select_expr using AS alias_name. Aliases are used as column names in expressions and can be used in GROUP BY, ORDER BY or HAVING clauses. The AS keyword is a good habit to use when specifying aliases for columns.
table_references after FROM indicates one or more tables participating in the query. If more than one table is listed, a JOIN operation is performed. And for each specified table, you can define an alias for it
The selected column after SELECT can be referenced in ORDER IN and GROUP BY by column name, column alias or integer (starting from 1) representing the column position
SELECT college, region, seed FROM tournament
ORDER BY region, seed;
SELECT college, region AS r, seed AS s FROM tournament
ORDER BY r, s;
SELECT college, region, seed FROM tournament
ORDER BY 2, 3;
If ORDER BY appears in a subquery and also applies to the outer query, the outermost ORDER BY takes precedence.
If GROUP BY is used, the grouped columns are automatically sorted in ascending order (as if there was an ORDER BY statement followed by the same columns). If you want to avoid the overhead of GROUP BY due to automatic sorting, adding ORDER BY NULL can solve it:
SELECT a, COUNT(b) FROM test_table GROUP BY a ORDER BY NULL;
When sorting columns in a SELECT using ORDER BY or GROUP BY, the server sorts values using only the initial number of bytes indicated by the max_sort_length system variable.
Having clauses are generally applied last, just before the result set is returned to the MySQL client, and is not optimized. (while LIMIT is applied after HAVING)
The SQL standard requires: HAVING must refer to a column in the GROUP BY list or used by an aggregate function. However, MySQL extends this by allowing HAVING to refer to columns in the Select clause list, as well as columns from outer subqueries.
A warning is generated if the column referenced by HAVING is ambiguous. In the following statement, col2 is ambiguous:
SELECT COUNT(col1) AS col2 FROM t GROUP BY col2 HAVING col2 = 2;
Remember not to use HAVING where WHERE should be used. HAVING is paired with GROUP BY.
The HAVING clause can refer to aggregate functions, while WHERE cannot.
SELECT user, MAX(salary) FROM users
GROUP BY user HAVING MAX(salary) > 10;
The LIMIT clause can be used to constrain the number of rows returned by a SELECT statement. LIMIT can have one or two arguments, both of which must be non-negative integers.
SELECT…INTO allows query results to be written to a file
Modifiers of the SELECT keyword
deduplication
The ALL and DISTINCT modifiers specify whether to deduplicate rows in the result set (should not be a column).
ALL is the default modifier, that is, all rows that meet the requirements are to be retrieved.
DISTINCT removes duplicate rows.
The main advantage of subqueries
- Subqueries allow structured queries so that each part of a statement can be isolated.
- Some operations require complex unions and associations. Subqueries provide other ways to perform these operations
Speed up queries
- Use Doris’s partition and bucket as data filtering conditions as much as possible to reduce the scope of data scanning
- Make full use of Doris’s prefix index fields as data filter conditions to speed up query speed
UNION
Using only the union keyword has the same effect as using union disitnct. Since the deduplication work is more memory-intensive, the query speed using the union all operation will be faster and the memory consumption will be less. If users want to perform order by and limit operations on the returned result set, they need to put the union operation in the subquery, then select from subquery, and finally put the subquery and order by outside the subquery.
select * from (select age from student_01 union all select age from student_02) as t1
order by age limit 4;
+-------------+
| age |
+-------------+
| 18 |
| 19 |
| 20 |
| 21 |
+-------------+
-
- In the inner join condition, in addition to supporting equal-valued joins, it also supports unequal-valued joins. For performance reasons, it is recommended to use equal-valued joins.
- Other joins only support equivalent joins