EXPLAIN ANALYZE

注意

在使用 EXPLAIN ANALYZE 执行 DML 语句时，数据的修改操作会被正常执行。但目前 DML 语句还无法展示执行计划。

ExplainSym

ExplainStmt

ExplainableStmt

EXPLAIN ANALYZE 输出格式

和 EXPLAIN 不同，EXPLAIN ANALYZE 会执行对应的 SQL 语句，记录其运行时信息，和执行计划一并返回出来。因此，可以将 EXPLAIN ANALYZE 视为 EXPLAIN 语句的扩展。EXPLAIN ANALYZE 语句的返回结果相比 EXPLAIN，增加了 actRows，execution info，memory，disk 这几列信息：

CREATE TABLE t1 (id INT NOT NULL PRIMARY KEY AUTO_INCREMENT, c1 INT NOT NULL);

Query OK, 0 rows affected (0.12 sec)

INSERT INTO t1 (c1) VALUES (1), (2), (3);

Query OK, 3 rows affected (0.02 sec)
Records: 3  Duplicates: 0  Warnings: 0

+-------------+---------+---------+------+---------------+----------------------------------------------------------------+---------------+--------+------+
| id          | estRows | actRows | task | access object | execution info                                                 | operator info | memory | disk |
+-------------+---------+---------+------+---------------+----------------------------------------------------------------+---------------+--------+------+
| Point_Get_1 | 1.00    | 1       | root | table:t1      | time:757.205µs, loops:2, Get:{num_rpc:1, total_time:697.051µs} | handle:1      | N/A    | N/A  |
+-------------+---------+---------+------+---------------+----------------------------------------------------------------+---------------+--------+------+
1 row in set (0.01 sec)

EXPLAIN ANALYZE SELECT * FROM t1;

+-------------------+----------+---------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------+-----------+------+
| id                | estRows  | actRows | task      | access object | execution info                                                                                                                                                                                                                            | operator info                  | memory    | disk |
+-------------------+----------+---------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------+-----------+------+
| TableReader_5     | 10000.00 | 3       | root      |               | time:278.2µs, loops:2, cop_task: {num: 1, max: 437.6µs, proc_keys: 3, rpc_num: 1, rpc_time: 423.9µs, copr_cache_hit_ratio: 0.00}                                                                                                          | data:TableFullScan_4           | 251 Bytes | N/A  |
| └─TableFullScan_4 | 10000.00 | 3       | cop[tikv] | table:t1      | tikv_task:{time:0s, loops:1}, scan_detail: {total_process_keys: 3, total_process_keys_size: 111, total_keys: 4, rocksdb: {delete_skipped_count: 0, key_skipped_count: 3, block: {cache_hit_count: 0, read_count: 0, read_byte: 0 Bytes}}} | keep order:false, stats:pseudo | N/A       | N/A  |
+-------------------+----------+---------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------+-----------+------+
2 rows in set (0.00 sec)

算子执行信息介绍

execution info 信息除了基本的 time 和 loop 信息外，还包含算子特有的执行信息，主要包含了该算子发送 RPC 请求的耗时信息以及其他步骤的耗时。

Point_Get 算子可能包含以下执行信息：

Get:{num_rpc:1, total_time:697.051µs}：向 TiKV 发送 Get 类型的 RPC 请求的数量 (num_rpc) 和所有 RPC 请求的总耗时 (total_time)。
ResolveLock:{num_rpc:1, total_time:12.117495ms}：读数据遇到锁后，进行 resolve lock 的时间。一般在读写冲突的场景下会出现。
regionMiss_backoff:{num:11, total_time:2010 ms},tikvRPC_backoff:{num:11, total_time:10691 ms}：RPC 请求失败后，会在等待 backoff 的时间后重试，包括了 backoff 的类型（如 regionMiss，tikvRPC），backoff 等待的总时间 (total_time) 和 backoff 的总次数 (num)。

Batch_Point_Get

Batch_Point_get 算子的执行信息和 Point_Get 算子类似，不过 Batch_Point_Get 一般向 TiKV 发送 BatchGet 类型的 RPC 请求来读取数据。

BatchGet:{num_rpc:2, total_time:83.13µs}：向 TiKV 发送 BatchGet 类型的 RPC 请求的数量 (num_rpc) 和所有 RPC 请求的总耗时 (total_time)。

TableReader

TableReader 算子可能包含以下执行信息：

cop_task: {num: 6, max: 1.07587ms, min: 844.312µs, avg: 919.601µs, p95: 1.07587ms, max_proc_keys: 16, p95_proc_keys: 16, tot_proc: 1ms, tot_wait: 1ms, rpc_num: 6, rpc_time: 5.313996ms, copr_cache_hit_ratio: 0.00}

cop_task：包含 cop task 的相关信息，如：
- num：cop task 的数量
- max,min,avg,：所有 cop task 中执行时间的最大值，最小值，平均值和 P95 值。
- rpc_num, rpc_time：向 TiKV 发送 Cop 类型的 RPC 请求总数量和总时间。
- copr_cache_hit_ratio：cop task 请求的 Coprocessor Cache 缓存命中率。。
backoff：包含不同类型的 backoff 以及等待总耗时。

Insert 算子可能包含以下执行信息：

prepare：准备写入前的耗时，包括表达式，默认值相关的计算等。
check_insert：这个信息一般出现在 insert ignore 和 insert on duplicate 语句中，包含冲突检查和写入 TiDB 事务缓存的耗时。注意，这个耗时不包含事务提交的耗时。具体包含以下信息：
- total_time：check_insert 步骤的总耗时。
- mem_insert_time：将数据写入 TiDB 事务缓存的耗时。
- prefetch：从 TiKV 中获取需要检查冲突的数据的耗时，该步骤主要是向 TiKV 发送 BatchGet 类型的 RPC 请求的获取数据。
- rpc：向 TiKV 发送 RPC 请求的总耗时，一般包含 BatchGet 和 Get 两种类型的 RPC 耗时，其中：
  - BatchGet 请求是 prefetch 步骤发送的 RPC 请求。
  - Get 请求是 insert on duplicate 语句在执行 duplicate update 时发送的 RPC 请求。
backoff：包含不同类型的 backoff 以及等待总耗时。

IndexJoin

IndexJoin 算子有 1 个 outer worker 和 N 个 inner worker 并行执行，其 join 结果的顺序和 outer table 的顺序一致，具体执行流程如下：

Outer worker 读取 N 行 outer table 的数据，然后包装成一个 task 发送给 result channel 和 inner worker channel。
Inner worker 从 inner worker channel 里面接收 task，然后根据 task 生成需要读取 inner table 的 key ranges 范围，然后读取相应范围的 inner table 行数据，并生成一个 inner table row 的 hash table。
IndexJoin 的主线程从 result channel 中接收 task，然后等待 inner worker 执行完这个 task。
IndexJoin 的主线程用 outer table rows 和 inner table rows 的 hash table 做 join。

inner:{total:4.297515932s, concurrency:5, task:17, construct:97.96291ms, fetch:4.164310088s, build:35.219574ms}, probe:53.574945ms

inner：inner worker 的执行信息，具体如下：
- total：inner worker 的总耗时。
- concurrency：inner worker 的数量。
- task：inner worker 处理 task 的总数量。
- construct：inner worker 读取 task 对应的 inner table rows 之前的准备时间。
- fetch：inner worker 读取 inner table rows 的总耗时。
- build: inner worker 构造 inner table rows 对应的 hash table 的总耗时。
probe：IndexJoin 主线程用 outer table rows 和 inner table rows 的 hash table 做 join 的总耗时。

IndexHashJoin

IndexHashJoin 算子和 IndexJoin 算子执行流程类似，也有 1 个 outer worker 和 N 个 inner worker 并行执行，但是其 join 结果的顺序是不和 outer table 一致。具体执行流程如下：

Outer worker 读取 N 行 out table 的数据，然后包装成一个 task 发送给 inner worker channel。
Inner worker 从 inner worker channel 里面接收 task，然后做以下三件事情，其中步骤 a 和 b 是并行执行。 a. 用 outer table rows 生成一个 hash table。 b. 根据 task 生成 key 的范围后，读取 inner table 相应范围的行数据。 c. 用 inner table rows 和 outer table rows 的 hash table 做 join，然后把 join 结果发送给 result channel。
IndexHashJoin 的主线程从 result channel 中接收 join 结果。

IndexHashJoin 算子包含以下执行信息：

inner:{total:4.429220003s, concurrency:5, task:17, construct:96.207725ms, fetch:4.239324006s, build:24.567801ms, join:93.607362ms}

inner：inner worker 的执行信息，具体如下：
- total：inner worker 的总耗时。
- concurrency：inner worker 的数量。
- task：inner worker 处理 task 的总数量。
- construct：inner worker 读取 task 对应的 inner table rows 之前的准备时间。
- fetch：inner worker 读取 inner table rows 的总耗时。
- build: inner worker 构造 outer table rows 对应的 hash table 的总耗时。
- join: inner worker 用 inner table rows 和 outer table rows 的 hash table 做 join 的总耗时。

HashJoin 算子有一个 inner worker，一个 outer worker 和 N 个 join worker，其具体执行逻辑如下：

inner worker 读取 inner table rows 并构造 hash table。
outer worker 读取 outer table rows，然后包装成 task 发送给 join worker。
等待第 1 步的 hash table 构造完成。
join worker 用 task 里面的 outer table rows 和 hash table 做 join，然后把 join 结果发送给 result channel。
HashJoin 的主线程从 result channel 中接收 join 结果。

HashJoin 算子包含以下执行信息：

build_hash_table:{total:146.071334ms, fetch:110.338509ms, build:35.732825ms}, probe:{concurrency:5, total:857.162518ms, max:171.48271ms, probe:125.341665ms, fetch:731.820853ms}

probe: join worker 的执行信息：
- concurrency：join worker 的数量。
- total：所有 join worker 执行的总耗时。
- max：单个 join worker 执行的最大耗时。
- probe: 用 outer table rows 和 hash table 做 join 的总耗时。
- fetch：join worker 等待读取 outer table rows 数据的总耗时。

TableFullScan (TiFlash)

在 TiFlash 节点上执行的 TableFullScan 算子包含以下执行信息：

tiflash_scan: {
  dtfile: {
    total_scanned_packs: 2, 
    total_skipped_packs: 1, 
    total_scanned_rows: 16000, 
    total_skipped_rows: 8192, 
    total_rough_set_index_load_time: 2ms, 
    total_read_time: 20ms
  }, 
  total_create_snapshot_time: 1ms
}

dtfile：扫表过程中与 DTFile （即 DeltaTree File）相关的信息；这基本反映了 TiFlash 在 Stable 层数据的读取情况。
- total_scanned_packs：DTFile 内累计读取的 Pack 的数量；Pack 是 TiFlash DTFile 读取的最小粒度，默认情况下每 8192 行构成一个 Pack。
- total_skipped_packs：DTFile 内累计跳过的 Pack 的数量；Pack 会由于 WHERE 条件命中粗糙索引或主键范围过滤而被跳过。
- total_scanned_rows：DTFile 内累计读取的行数；若存在 MVCC 带来的多版本更新或删除，则每个版本独立计数。
- total_skipped_rows：DTFile 内累计跳过的行数。
- total_rs_index_load_time：读取 DTFile 粗糙索引的累计耗时。
- total_read_time：读取 DTFile 数据的累计耗时。
total_create_snapshot_time：扫表过程中创建快照的总耗时。

lock_keys 执行信息

在悲观事务中执行 DML 语句时，算子的执行信息还可能包含 lock_keys 的执行信息，示例如下：

time：执行 lock_keys 操作的总耗时。
region：执行 lock_keys 操作涉及的 Region 数量。
keys：需要 Lock 的 Key 的数量。
lock_rpc：向 TiKV 发送 Lock 类型的 RPC 总耗时。因为可以并行发送多个 RPC 请求，所以总 RPC 耗时可能比 lock_keys 操作总耗时大。
rpc_count：向 TiKV 发送 Lock 类型的 RPC 总数量。

在 autocommit=1 的事务中执行写入类型的 DML 语句时，算子的执行信息还会包括事务提交的耗时信息，示例如下：

commit_txn: {prewrite:48.564544ms, wait_prewrite_binlog:47.821579, get_commit_ts:4.277455ms, commit:50.431774ms, region_num:7, write_keys:16, write_byte:536}

prewrite：事务 2PC 提交阶段中 prewrite 阶段的耗时。
wait_prewrite_binlog:：等待写 prewrite Binlog 的耗时。
get_commit_ts：获取事务提交时间戳的耗时。
commit：事务 2PC 提交阶段中，commit 阶段的耗时。
write_keys：事务中写入 key 的数量。
write_byte：事务中写入 key-value 的总字节数量，单位是 byte。

其它常见执行信息

Coprocessor 算子通常包含 cop_task 和 tikv_task 两部分执行时间信息。前者是 TiDB 端记录的时间，从发出请求到接收回复；后者是 TiKV Coprocessor 算子自己记录的时间。两者相差较大可能说明在等待、gRPC 或网络上耗时较长。

EXPLAIN ANALYZE

EXPLAIN ANALYZE

EXPLAIN ANALYZE 输出格式

算子执行信息介绍

Batch_Point_Get

TableReader

IndexJoin

IndexHashJoin

TableFullScan (TiFlash)

lock_keys 执行信息

其它常见执行信息

另请参阅