分区裁剪

    分区裁剪是只有当目标表为分区表时,才可以进行的一种优化方式。分区裁剪通过分析查询语句中的过滤条件,只选择可能满足条件的分区,不扫描匹配不上的分区,进而显著地减少计算的数据量。

    例如:

    1. | id | estRows | task | access object | operator info |
    2. +----------------------------+---------+-----------+------------------------+------------------------------------------------+
    3. | PartitionUnion_8 | 80.00 | root | | |
    4. | ├─TableReader_10 | 40.00 | root | | data:TableRangeScan_9 |
    5. | └─TableRangeScan_9 | 40.00 | cop[tikv] | table:t1, partition:p0 | range:[80,120], keep order:false, stats:pseudo |
    6. | └─TableReader_12 | 40.00 | root | | data:TableRangeScan_11 |
    7. | └─TableRangeScan_11 | 40.00 | cop[tikv] | table:t1, partition:p1 | range:[80,120], keep order:false, stats:pseudo |
    8. +----------------------------+---------+-----------+------------------------+------------------------------------------------+
    9. 5 rows in set (0.00 sec)

    分区表有 Range 分区和 hash 分区两种形式,分区裁剪对两种分区表也有不同的使用场景。

    Hash 分区表上可以使用分区裁剪的场景

    只有等值比较的查询条件能够支持 Hash 分区表的裁剪。

    1. create table t (x int) partition by hash(x) partitions 4;
    2. explain select * from t where x = 1;
    1. +-------------------------+----------+-----------+-----------------------+--------------------------------+
    2. | id | estRows | task | access object | operator info |
    3. +-------------------------+----------+-----------+-----------------------+--------------------------------+
    4. | TableReader_8 | 10.00 | root | | data:Selection_7 |
    5. | └─Selection_7 | 10.00 | cop[tikv] | | eq(test.t.x, 1) |
    6. | └─TableFullScan_6 | 10000.00 | cop[tikv] | table:t, partition:p1 | keep order:false, stats:pseudo |
    7. +-------------------------+----------+-----------+-----------------------+--------------------------------+

    在这条 SQL 中,由条件 x = 1 可以知道所有结果均在一个分区上。数值 1 在经过 Hash 后,可以确定其在分区 p1 中。因此只需要扫描分区 p1 ,而无需访问一定不会出现相关结果的 p2p3p4 分区。从执行计划来看,其中只出现了一个 TableFullScan 算子,且在 access object 中指定了 p1 分区,确认 partition pruning 生效了。

    Hash 分区表上不能使用分区裁剪的场景

    场景一
    1. create table t (x int) partition by hash(x) partitions 4;
    2. explain select * from t where x > 2;
    1. +------------------------------+----------+-----------+-----------------------+--------------------------------+
    2. | id | estRows | task | access object | operator info |
    3. +------------------------------+----------+-----------+-----------------------+--------------------------------+
    4. | Union_10 | 13333.33 | root | | |
    5. | ├─TableReader_13 | 3333.33 | root | | data:Selection_12 |
    6. | └─Selection_12 | 3333.33 | cop[tikv] | | gt(test.t.x, 2) |
    7. | └─TableFullScan_11 | 10000.00 | cop[tikv] | table:t, partition:p0 | keep order:false, stats:pseudo |
    8. | ├─TableReader_16 | 3333.33 | root | | data:Selection_15 |
    9. | └─TableFullScan_14 | 10000.00 | cop[tikv] | table:t, partition:p1 | keep order:false, stats:pseudo |
    10. | ├─TableReader_19 | 3333.33 | root | | data:Selection_18 |
    11. | └─Selection_18 | 3333.33 | cop[tikv] | | gt(test.t.x, 2) |
    12. | └─TableFullScan_17 | 10000.00 | cop[tikv] | table:t, partition:p2 | keep order:false, stats:pseudo |
    13. | └─TableReader_22 | 3333.33 | root | | data:Selection_21 |
    14. | └─Selection_21 | 3333.33 | cop[tikv] | | gt(test.t.x, 2) |
    15. | └─TableFullScan_20 | 10000.00 | cop[tikv] | table:t, partition:p3 | keep order:false, stats:pseudo |
    16. +------------------------------+----------+-----------+-----------------------+--------------------------------+

    在这条 SQL 中,x > 2 条件无法确定对应的 Hash Partition,所以不能使用分区裁剪。

    场景二

    由于分区裁剪的规则优化是在查询计划的生成阶段,对于执行阶段才能获取到过滤条件的场景,无法利用分区裁剪的优化。

    1. +--------------------------------------+----------+-----------+------------------------+----------------------------------------------+
    2. | id | estRows | task | access object | operator info |
    3. | Projection_13 | 9990.00 | root | | test.t2.x |
    4. | └─Apply_15 | 9990.00 | root | | inner join, equal:[eq(test.t2.x, test.t1.x)] |
    5. | ├─TableReader_18(Build) | 9990.00 | root | | data:Selection_17 |
    6. | └─Selection_17 | 9990.00 | cop[tikv] | | not(isnull(test.t2.x)) |
    7. | └─TableFullScan_16 | 10000.00 | cop[tikv] | table:t2 | keep order:false, stats:pseudo |
    8. | └─Selection_19(Probe) | 0.80 | root | | not(isnull(test.t1.x)) |
    9. | └─MaxOneRow_20 | 1.00 | root | | |
    10. | └─Union_21 | 2.00 | root | | |
    11. | ├─TableReader_24 | 2.00 | root | | data:Selection_23 |
    12. | └─Selection_23 | 2.00 | cop[tikv] | | eq(test.t2.x, test.t1.x), lt(test.t2.x, 2) |
    13. | └─TableFullScan_22 | 2500.00 | cop[tikv] | table:t1, partition:p0 | keep order:false, stats:pseudo |
    14. | └─TableReader_27 | 2.00 | root | | data:Selection_26 |
    15. | └─Selection_26 | 2.00 | cop[tikv] | | eq(test.t2.x, test.t1.x), lt(test.t2.x, 2) |
    16. | └─TableFullScan_25 | 2500.00 | cop[tikv] | table:t1, partition:p1 | keep order:false, stats:pseudo |
    17. +--------------------------------------+----------+-----------+------------------------+----------------------------------------------+

    这个查询每从 t2 读取一行,都会去分区表 t1 上进行查询,理论上这时会满足 t1.x = val 的过滤条件,但实际上由于分区裁剪只作用于查询计划生成阶段,而不是执行阶段,因而不会做裁剪。

    分区裁剪在 Range 分区表上的应用

    Range 分区表上可以使用分区裁剪的场景

    场景一

    等值比较的查询条件可以使用分区裁剪。

    1. create table t (x int) partition by range (x) (
    2. partition p0 values less than (5),
    3. partition p1 values less than (10),
    4. partition p2 values less than (15)
    5. );
    6. explain select * from t where x = 3;
    1. +-------------------------+----------+-----------+-----------------------+--------------------------------+
    2. | id | estRows | task | access object | operator info |
    3. +-------------------------+----------+-----------+-----------------------+--------------------------------+
    4. | TableReader_8 | 10.00 | root | | data:Selection_7 |
    5. | └─Selection_7 | 10.00 | cop[tikv] | | eq(test.t.x, 3) |
    6. | └─TableFullScan_6 | 10000.00 | cop[tikv] | table:t, partition:p0 | keep order:false, stats:pseudo |
    7. +-------------------------+----------+-----------+-----------------------+--------------------------------+

    使用 in 条件的等值比较查询条件也可以使用分区裁剪。

    1. create table t (x int) partition by range (x) (
    2. partition p0 values less than (5),
    3. partition p2 values less than (15)
    4. );
    5. explain select * from t where x in(1,13);
    1. +-----------------------------+----------+-----------+-----------------------+--------------------------------+
    2. | id | estRows | task | access object | operator info |
    3. +-----------------------------+----------+-----------+-----------------------+--------------------------------+
    4. | Union_8 | 40.00 | root | | |
    5. | ├─TableReader_11 | 20.00 | root | | data:Selection_10 |
    6. | └─Selection_10 | 20.00 | cop[tikv] | | in(test.t.x, 1, 13) |
    7. | └─TableFullScan_9 | 10000.00 | cop[tikv] | table:t, partition:p0 | keep order:false, stats:pseudo |
    8. | └─TableReader_14 | 20.00 | root | | data:Selection_13 |
    9. | └─Selection_13 | 20.00 | cop[tikv] | | in(test.t.x, 1, 13) |
    10. | └─TableFullScan_12 | 10000.00 | cop[tikv] | table:t, partition:p2 | keep order:false, stats:pseudo |
    11. +-----------------------------+----------+-----------+-----------------------+--------------------------------+
    场景二

    区间比较的查询条件如 between, > < = >= <= 可以使用分区裁剪。

    1. +-----------------------------+----------+-----------+-----------------------+-----------------------------------+
    2. | id | estRows | task | access object | operator info |
    3. +-----------------------------+----------+-----------+-----------------------+-----------------------------------+
    4. | Union_8 | 500.00 | root | | |
    5. | ├─TableReader_11 | 250.00 | root | | data:Selection_10 |
    6. | └─Selection_10 | 250.00 | cop[tikv] | | ge(test.t.x, 7), le(test.t.x, 14) |
    7. | └─TableFullScan_9 | 10000.00 | cop[tikv] | table:t, partition:p1 | keep order:false, stats:pseudo |
    8. | └─TableReader_14 | 250.00 | root | | data:Selection_13 |
    9. | └─Selection_13 | 250.00 | cop[tikv] | | ge(test.t.x, 7), le(test.t.x, 14) |
    10. | └─TableFullScan_12 | 10000.00 | cop[tikv] | table:t, partition:p2 | keep order:false, stats:pseudo |
    11. +-----------------------------+----------+-----------+-----------------------+-----------------------------------+
    场景三

    分区表达式为 fn(col) 的简单形式,查询条件是 > < = >= <= 之一,且 fn 是单调函数,可以使用分区裁剪。

    关于 fn 函数,对于任意 x y,如果 x > y,则 fn(x) > fn(y),那么这种是严格递增的单调函数。非严格递增的单调函数也可以符合分区裁剪要求,只要函数 fn 满足:对于任意 x y,如果 x > y,则 fn(x) >= fn(y)。理论上,所有满足单调条件(严格或者非严格)的函数都支持分区裁剪。目前,TiDB 支持的单调函数如下:

    1. unix_timestamp
    2. to_days

    例如,分区表达式是 fn(col) 形式,fn 为我们支持的单调函数 to_days,就可以使用分区裁剪:

    1. create table t (id datetime) partition by range (to_days(id)) (
    2. partition p0 values less than (to_days('2020-04-01')),
    3. partition p1 values less than (to_days('2020-05-01')));
    4. explain select * from t where id > '2020-04-18';
    1. +-------------------------+----------+-----------+-----------------------+-------------------------------------------+
    2. | id | estRows | task | access object | operator info |
    3. +-------------------------+----------+-----------+-----------------------+-------------------------------------------+
    4. | TableReader_8 | 3333.33 | root | | data:Selection_7 |
    5. | └─Selection_7 | 3333.33 | cop[tikv] | | gt(test.t.id, 2020-04-18 00:00:00.000000) |
    6. | └─TableFullScan_6 | 10000.00 | cop[tikv] | table:t, partition:p1 | keep order:false, stats:pseudo |
    7. +-------------------------+----------+-----------+-----------------------+-------------------------------------------+

    Range 分区表上不能使用分区裁剪的场景

    由于分区裁剪的规则优化是在查询计划的生成阶段,对于执行阶段才能获取到过滤条件的场景,无法利用分区裁剪的优化。

    1. create table t1 (x int) partition by range (x) (
    2. partition p0 values less than (5),
    3. partition p1 values less than (10));
    4. explain select * from t2 where x < (select * from t1 where t2.x < t1.x and t2.x < 2);