MySQL Optimizer的子查询优化行为

    xiaoxiao2025-12-10  4

    MySQL Optimizer

    先谈谈Optimizer。

    Optimizer即数据库优化器,在MySQL中大家普遍称之为查询优化器。在任何一款数据库中,Optimizer一直占据着最为核心的角色,有些人将它比喻为数据库的大脑,真是太贴切了,因为从一定程度上来讲优化器的设计原则决定着一款数据库的行为。那么优化的前提是了解优化器,优化的作用是辅助优化器

    MySQL Optimizer属于CBO,并且在不断地改进中已经渐渐开始支持各种主流的优化技术,在生成执行计划之前进行部分优化工作。mySQL中的SQL处理过程大致如下,

    语法分析器处理的结果是将SQL语句解析为查询树(select_lex),优化器以JOIN::prepare方法作为入口,最终将语法树转变为最优的执行计划。期间交错进行着逻辑优化与物理优化,各个组件及数据结构彼此配合,共同完成查询优化以及执行计划生成这一个完整的工作流程。

    再来说一下逻辑优化与物理优化。

    逻辑优化,主要功能是基于关系代数以及启发式规则,找出SQL语句等价的变换形式,使得SQL执行更高效;物理优化,即根据代价估算模型,选择最优的表连接顺序以及数据访问方式,这个阶段依赖于对数据的了解。

    从我们的角度来看,逻辑优化就是优化器对我们的SQL进行了一次改进优化,那么优化器就不单单只是简单地解析SQL,为了让生活更美好,它会选择自己更乐意接受的语法形式

    子查询优化技术

    在逻辑优化阶段,其中一个核心工作就是进行子查询的优化,因为子查询实在是太消耗性能了,MySQL查询优化器在演进历程中渐渐支持了子查询的优化,这还是相当有必要的。在MySQL Optimizer中,子查询的处理贯穿在整个优化器代码中,

    在JOIN::prepare查询准备阶段就进行了一部分特殊子查询的优化,如

    remove_redundant_subquery_clauses //去冗余子句 resolve_subquery //简单格式的semi-join优化,否则调用select_transformer select_transformer //其他类型子查询优化,如IN/ALL/ANY等价转换

    JOIN::optimize方法中,融合了semi-join、Materialization以及EXISTS strategy这三种主要的优化策略,如

    flatten_subqueries //子查询展开,即子查询上拉,无法上拉则使用EXISTS strategy方式 make_join_statistics //完成多表连接,此函数会进行嵌套连接的Materialization优化,并确定子查询的优化策略

    子查询优化的主流技术有三种:

    子查询合并。即把多个子查询合并成一个子查询,作用是减少元组扫描的次数

    子查询反嵌套。也称作子查询上拉,即将子查询重写为等价的多表连接

    聚集子查询消除。即聚集函数上推,将消除聚集函数的子查询与父查询做外连接

    其中,子查询反嵌套是一种最常用的子查询优化技术,同时MySQL中也只支持这种优化技术。MySQL 5.6中处理子查询的思路是,基于查询重写技术的规则,尽可能将子查询转换为连接(semi-join Semi-Join Transformations),并配合基于代价估算的Materialize、exists 等优化策略让子查询执行更优

    接下来会对各种技术以及实现场景进行测试验证,测试用表如下,

    [root@sakila 11:49:54]> desc customer_t; +-------------+----------------------+------+-----+-------------------+-----------------------------+ | Field | Type | Null | Key | Default | Extra | +-------------+----------------------+------+-----+-------------------+-----------------------------+ | | smallint(5) unsigned | NO | PRI | NULL | | ... ... 9 rows in set (0.07 sec) [root@sakila 02:38:29]> desc payment_t; +--------------+----------------------+------+-----+-------------------+-----------------------------+ | Field | Type | Null | Key | Default | Extra | +--------------+----------------------+------+-----+-------------------+-----------------------------+ | payment_id | smallint(5) unsigned | NO | PRI | NULL | | | | smallint(5) unsigned | NO | MUL | NULL | | ... ... 7 rows in set (0.00 sec)

    不支持的优化技术

    1、子查询合并技术

    对于两个相似的子查询作为where条件的查询,MySQL不支持合并处理。如下,

    [root@sakila 05:00:22]> explain extended select customer_t.store_id from customer_t where customer_id in (select customer_id from payment_t where payment_id>1000) or customer_id in (select customer_id from payment_t where payment_id<10); +----+-------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+ | 1 | PRIMARY | customer_t | ALL | NULL | NULL | NULL | NULL | 644 | 100.00 | Using where | | 3 | SUBQUERY | payment_t | range | PRIMARY | PRIMARY | 2 | NULL | 9 | 100.00 | Using where | | 2 | SUBQUERY | payment_t | range | PRIMARY | PRIMARY | 2 | NULL | 8272 | 100.00 | Using where | +----+-------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+

    改写后的SQL语句如下,

    /* select#1 */ select `sakila`.`customer_t`.`store_id` AS `store_id` from `sakila`.`customer_t` where (<in_optimizer>(`sakila`.`customer_t`.`customer_id`,`sakila`.`customer_t`.`customer_id` in ( <materialize> (/* select#2 */ select `sakila`.`payment_t`.`customer_id` from `sakila`.`payment_t` where (`sakila`.`payment_t`.`payment_id` > 1000) ), <primary_index_lookup>(`sakila`.`customer_t`.`customer_id` in <temporary table> on <auto_key> where ((`sakila`.`customer_t`.`customer_id` = `materialized-subquery`.`customer_id`))))) or <in_optimizer>(`sakila`.`customer_t`.`customer_id`,`sakila`.`customer_t`.`customer_id` in ( <materialize> (/* select#3 */ select `sakila`.`payment_t`.`customer_id` from `sakila`.`payment_t` where (`sakila`.`payment_t`.`payment_id` < 10) ), <primary_index_lookup>(`sakila`.`customer_t`.`customer_id` in <temporary table> on <auto_key> where ((`sakila`.`customer_t`.`customer_id` = `materialized-subquery`.`customer_id`))))))

    从转换结果来看,优化器使用了多种优化策略来获得更快的元组扫描速率,但是两个子查询仍然单独进行,表payment_t被扫描了两次,而没有将两个子查询进行合并处理

    2、聚集子查询消除。

    子查询包含聚集函数,MySQL不支持消除,如下,

    [root@sakila 09:38:54]> explain extended select * from customer_t where customer_id in (select min(customer_id) from payment_t); +----+-------------+------------+------+---------------+------+---------+------+-------+----------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+------------+------+---------------+------+---------+------+-------+----------+-------------+ | 1 | PRIMARY | customer_t | ALL | NULL | NULL | NULL | NULL | 644 | 100.00 | Using where | | 2 | SUBQUERY | payment_t | ALL | NULL | NULL | NULL | NULL | 15448 | 100.00 | NULL | +----+-------------+------------+------+---------------+------+---------+------+-------+----------+-------------+

    优化后语句如下,

    /* select#1 */ select `sakila`.`customer_t`.`customer_id` AS `customer_id`,`sakila`.`customer_t`.`store_id` AS `store_id`,`sakila`.`customer_t`.`first_name` AS `first_name`,`sakila`.`customer_t`.`last_name` AS `last_name`,`sakila`.`customer_t`.`email` AS `email`,`sakila`.`customer_t`.`address_id` AS `address_id`,`sakila`.`customer_t`.`active` AS `active`,`sakila`.`customer_t`.`create_date` AS `create_date`,`sakila`.`customer_t`.`last_update` AS `last_update` from `sakila`.`customer_t` where <in_optimizer>(`sakila`.`customer_t`.`customer_id`,`sakila`.`customer_t`.`customer_id` in ( <materialize> (/* select#2 */ select min(`sakila`.`payment_t`.`customer_id`) from `sakila`.`payment_t` having 1 ), <primary_index_lookup>(`sakila`.`customer_t`.`customer_id` in <temporary table> on <auto_key> where ((`sakila`.`customer_t`.`customer_id` = `materialized-subquery`.`min(customer_id)`))))) 可以发现被优化后的查询仍然有子查询,select_type值为SUBQUERY,这里用到了materialize、auto_key等各种优化手段,但是没有实现聚集子查询消除

    特殊类型子查询的优化

    1.IN类型。

    MySQL支持简单in查询的优化,主要策略为semi-join、Materialization,否则会使用EXISTS strategy优化。这里用两个测试场景来分析优化行为,

    场景一:... out_expr in (select primary_key ... )

    [root@sakila 08:30:46]> explain extended select * from payment_t where customer_id in (select customer_id from customer_t); +----+-------------+------------+--------+---------------+---------+---------+------------------------------+-------+----------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+------------+--------+---------------+---------+---------+------------------------------+-------+----------+-------------+ | 1 | SIMPLE | payment_t | ALL | NULL | NULL | NULL | NULL | 15448 | 100.00 | NULL | | 1 | SIMPLE | customer_t | eq_ref | PRIMARY | PRIMARY | 2 | sakila.payment_t.customer_id | 1 | 100.00 | Using index | +----+-------------+------------+--------+---------------+---------+---------+------------------------------+-------+----------+-------------+改写后的语句如下, /* select#1 */ select `sakila`.`payment_t`.`payment_id` AS `payment_id`,`sakila`.`payment_t`.`customer_id` AS `customer_id`,`sakila`.`payment_t`.`staff_id` AS `staff_id`,`sakila`.`payment_t`.`rental_id` AS `rental_id`,`sakila`.`payment_t`.`amount` AS `amount`,`sakila`.`payment_t`.`payment_date` AS `payment_date`,`sakila`.`payment_t`.`last_update` AS `last_update` from `sakila`.`customer_t` join `sakila`.`payment_t` where (`sakila`.`customer_t`.`customer_id` = `sakila`.`payment_t`.`customer_id`) 通过优化器的转换结果,我们发现这是一个标准的子查询反嵌套,子查询被上拉至外部表,句式被转换成内表与外表的inner join,并且执行计划中也没有subquery。

    场景二:... primary_key in (select inner_expr ... )

    [root@sakila 08:30:21]> explain extended select * from customer_t where customer_id in (select customer_id from payment_t); +----+--------------+-------------+--------+---------------+------------+---------+-------------------------------+-------+----------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+--------------+-------------+--------+---------------+------------+---------+-------------------------------+-------+----------+-------------+ | 1 | SIMPLE | customer_t | ALL | PRIMARY | NULL | NULL | NULL | 644 | 100.00 | Using where | | 1 | SIMPLE | <subquery2> | eq_ref | <auto_key> | <auto_key> | 2 | sakila.customer_t.customer_id | 1 | 100.00 | NULL | | 2 | MATERIALIZED | payment_t | ALL | NULL | NULL | NULL | NULL | 15448 | 100.00 | NULL | +----+--------------+-------------+--------+---------------+------------+---------+-------------------------------+-------+----------+-------------+ 转换后的语句如下, /* select#1 */ select `sakila`.`customer_t`.`customer_id` AS `customer_id`,`sakila`.`customer_t`.`store_id` AS `store_id`,`sakila`.`customer_t`.`first_name` AS `first_name`,`sakila`.`customer_t`.`last_name` AS `last_name`,`sakila`.`customer_t`.`email` AS `email`,`sakila`.`customer_t`.`address_id` AS `address_id`,`sakila`.`customer_t`.`active` AS `active`,`sakila`.`customer_t`.`create_date` AS `create_date`,`sakila`.`customer_t`.`last_update` AS `last_update` from `sakila`.`customer_t` semi join (`sakila`.`payment_t`) where (`<subquery2>`.`customer_id` = `sakila`.`customer_t`.`customer_id`)

    这个语句和前面一个很类似,但是处理结果有一些区别。从执行计划看,好像还是有subquery,但是结合转换结果看,这种子查询被物化处理,然后进行了与外部表的semi join转换。

    这两个语句的差别在哪里呢?

    语句本身无差异,但是转换成join就会有很大区别,就是场景二的SQL转换为join之后还需要进行消重处理,这里的处理策略是物化为临时表,然后semi join消重。在这种场景下MySQL优化器还提供多种优化策略,如FirstMatch、LooseScan等,根据成本选择。

    这两种场景都实现了MySQL的子查询反嵌套技术,只是在后一种场景下进行了更细致的处理,让SQL的处理消耗更少的资源。在MySQL中,与各种独特优化策略相结合的子查询上拉技术,被统称为Semi-Join Transformations。

    2.ANY/ALL/SOME。

    MySQL对于ANY/ALL/SOME类型的优化,主要策略是基于查询重写技术,将查询转换成更易被优化的min()/max(),= ANY则等价于in,= ALL一般会通过EXISTS strategy进行优化

    场景一:outer_expr > ANY(select inner_expr ... )

    [root@sakila 10:32:10]> explain extended select * from customer_t where customer_id > ANY(select customer_id from payment_t); +----+-------------+------------+------+---------------+------+---------+------+-------+----------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+------------+------+---------------+------+---------+------+-------+----------+-------------+ | 1 | PRIMARY | customer_t | ALL | NULL | NULL | NULL | NULL | 644 | 100.00 | Using where | | 2 | SUBQUERY | payment_t | ALL | NULL | NULL | NULL | NULL | 15448 | 100.00 | NULL | +----+-------------+------------+------+---------------+------+---------+------+-------+----------+-------------+ 转换后的语句如下, /* select#1 */ select `sakila`.`customer_t`.`customer_id` AS `customer_id`,`sakila`.`customer_t`.`store_id` AS `store_id`,`sakila`.`customer_t`.`first_name` AS `first_name`,`sakila`.`customer_t`.`last_name` AS `last_name`,`sakila`.`customer_t`.`email` AS `email`,`sakila`.`customer_t`.`address_id` AS `address_id`,`sakila`.`customer_t`.`active` AS `active`,`sakila`.`customer_t`.`create_date` AS `create_date`,`sakila`.`customer_t`.`last_update` AS `last_update` from `sakila`.`customer_t` where <nop>((`sakila`.`customer_t`.`customer_id` > (/* select#2 */ select min(`sakila`.`payment_t`.`customer_id`) from `sakila`.`payment_t`)))

    从等价转换后的句式可以看到,子查询还是存在的,但是ANY被转换为>min()。那么这种转换的意义在哪里呢?

    现在对payment_t表添加一个索引:alter table payment_t add key(customer_id),然后再看看这条查询的执行计划

    [root@sakila 12:02:16]> explain extended select * from customer_t where customer_id > ANY(select customer_id from payment_t); +----+-------------+------------+------+---------------+------+---------+------+------+----------+------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+------------+------+---------------+------+---------+------+------+----------+------------------------------+ | 1 | PRIMARY | customer_t | ALL | NULL | NULL | NULL | NULL | 644 | 100.00 | Using where | | 2 | SUBQUERY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away | +----+-------------+------------+------+---------------+------+---------+------+------+----------+------------------------------+

    可以发现,在customer_id字段添加了索引之后,优化器直接通过索引获取min(),Extra信息为Select tables optimized away。那么优化器的转换思路就是,朝更容易被使用索引进行物理优化的方向进行改写。

    场景二:... outer_expr < ALL(select inner_expr ... )

    [root@sakila 12:02:21]> explain extended select * from customer_t where customer_id < ALL(select customer_id from payment_t); 转换后的语句如下, /* select#1 */ select `sakila`.`customer_t`.`customer_id` AS `customer_id`,`sakila`.`customer_t`.`store_id` AS `store_id`,`sakila`.`customer_t`.`first_name` AS `first_name`,`sakila`.`customer_t`.`last_name` AS `last_name`,`sakila`.`customer_t`.`email` AS `email`,`sakila`.`customer_t`.`address_id` AS `address_id`,`sakila`.`customer_t`.`active` AS `active`,`sakila`.`customer_t`.`create_date` AS `create_date`,`sakila`.`customer_t`.`last_update` AS `last_update` from `sakila`.`customer_t` where <not>((`sakila`.`customer_t`.`customer_id` >= (/* select#2 */ select min(`sakila`.`payment_t`.`customer_id`) from `sakila`.`payment_t`)))和场景一类似,将其改写成更易获取的min()形式

    场景三:... outer_expr = ANY(select inner_expr ... )

    [root@sakila 12:06:15]> explain extended select * from customer_t where customer_id = ANY (select customer_id from payment_t); +----+--------------+-------------+--------+---------------+------------+---------+-------------------------------+-------+----------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+--------------+-------------+--------+---------------+------------+---------+-------------------------------+-------+----------+-------------+ | 1 | SIMPLE | customer_t | ALL | PRIMARY | NULL | NULL | NULL | 644 | 100.00 | Using where | | 1 | SIMPLE | <subquery2> | eq_ref | <auto_key> | <auto_key> | 2 | sakila.customer_t.customer_id | 1 | 100.00 | NULL | | 2 | MATERIALIZED | payment_t | ALL | NULL | NULL | NULL | NULL | 15448 | 100.00 | NULL | +----+--------------+-------------+--------+---------------+------------+---------+-------------------------------+-------+----------+-------------+ 转换后如下, /* select#1 */ select `sakila`.`customer_t`.`customer_id` AS `customer_id`,`sakila`.`customer_t`.`store_id` AS `store_id`,`sakila`.`customer_t`.`first_name` AS `first_name`,`sakila`.`customer_t`.`last_name` AS `last_name`,`sakila`.`customer_t`.`email` AS `email`,`sakila`.`customer_t`.`address_id` AS `address_id`,`sakila`.`customer_t`.`active` AS `active`,`sakila`.`customer_t`.`create_date` AS `create_date`,`sakila`.`customer_t`.`last_update` AS `last_update` from `sakila`.`customer_t` semi join (`sakila`.`payment_t`) where (`<subquery2>`.`customer_id` = `sakila`.`customer_t`.`customer_id`)此场景下,= ANY被作为IN处理,然后使用物化策略以及半连接转换进行了优化

    场景四:... outer_expr = ALL(select inner_expr ... )

    [root@sakila 12:06:21]> explain extended select * from customer_t where customer_id = ALL (select customer_id from payment_t); +----+--------------------+------------+------+---------------+------+---------+------+-------+----------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+--------------------+------------+------+---------------+------+---------+------+-------+----------+-------------+ | 1 | PRIMARY | customer_t | ALL | NULL | NULL | NULL | NULL | 644 | 100.00 | Using where | | 2 | DEPENDENT SUBQUERY | payment_t | ALL | NULL | NULL | NULL | NULL | 15448 | 100.00 | Using where | +----+--------------------+------------+------+---------------+------+---------+------+-------+----------+-------------+ 转换后句式如下, /* select#1 */ select `sakila`.`customer_t`.`customer_id` AS `customer_id`,`sakila`.`customer_t`.`store_id` AS `store_id`,`sakila`.`customer_t`.`first_name` AS `first_name`,`sakila`.`customer_t`.`last_name` AS `last_name`,`sakila`.`customer_t`.`email` AS `email`,`sakila`.`customer_t`.`address_id` AS `address_id`,`sakila`.`customer_t`.`active` AS `active`,`sakila`.`customer_t`.`create_date` AS `create_date`,`sakila`.`customer_t`.`last_update` AS `last_update` from `sakila`.`customer_t` where <not>(<in_optimizer>(`sakila`.`customer_t`.`customer_id`,<exists>(/* select#2 */ select 1 from `sakila`.`payment_t` where (<cache>(`sakila`.`customer_t`.`customer_id`) <> `sakila`.`payment_t`.`customer_id`))))从优化后的查询语句来看,= ALL被转换为exists语句,这说明使用了EXISTS strategy进行优化

    非特殊子查询

    1.Derived Tables。

    MySQL支持对派生表形式的子查询进行优化。

    对于派生类子查询的处理,优化器进一步优化了Materialization策略,只有当派生表中的数据必须加载时才可能被物化;对于join派生表的子查询类型,MySQL Optimizer支持auto_key策略优化,该策略会在物化的子查询结果集上添加索引,从而加速访问效率。

    模拟SQL场景如下,

    [root@sakila 02:32:16]> explain extended select customer_t.customer_id,derived_t.payment_id from customer_t join (select * from payment_t) as derived_t on customer_t.customer_id=derived_t.customer_id where derived_t.customer_id <100; +----+-------------+------------+-------+---------------+-------------+---------+-------------------------------+-------+----------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+------------+-------+---------------+-------------+---------+-------------------------------+-------+----------+--------------------------+ | 1 | PRIMARY | customer_t | range | PRIMARY | PRIMARY | 2 | NULL | 98 | 100.00 | Using where; Using index | | 1 | PRIMARY | <derived2> | ref | <auto_key0> | <auto_key0> | 2 | sakila.customer_t.customer_id | 23 | 100.00 | NULL | | 2 | DERIVED | payment_t | ALL | NULL | NULL | NULL | NULL | 15448 | 100.00 | NULL | +----+-------------+------------+-------+---------------+-------------+---------+-------------------------------+-------+----------+--------------------------+ 优化器转换后如下, /* select#1 */ select `sakila`.`customer_t`.`customer_id` AS `customer_id`,`derived_t`.`payment_id` AS `payment_id` from `sakila`.`customer_t` join (/* select#2 */ select `sakila`.`payment_t`.`payment_id` AS `payment_id`,`sakila`.`payment_t`.`customer_id` AS `customer_id`, `sakila`.`payment_t`.`staff_id` AS `staff_id`,`sakila`.`payment_t`.`rental_id` AS `rental_id`,`sakila`.`payment_t`.`amount` AS `amount`, `sakila`.`payment_t`.`payment_date` AS `payment_date`,`sakila`.`payment_t`.`last_update` AS `last_update` from `sakila`.`payment_t`) `derived_t` where ((`derived_t`.`customer_id` = `sakila`.`customer_t`.`customer_id`) and (`sakila`.`customer_t`.`customer_id` < 100))可以看到,payment_t表首先被扫描,而在join操作时,派生表被使用索引方式访问,大大减少了不必要的扫描开销。但实际上这个SQL我们可以进行更高性能的上拉改写,从逻辑优化角度看,优化器能做到的并不多。

    2.其他复杂子查询

    MySQL Optimizer对具有UNION、ORDER BY ...LIMIT、GROUP BY等操作的子查询支持的也不是很好。大家可以找几个复杂的例子做下测试。

    另外,对于NULL值的忠告,当一个涉及多表操作的子查询的外部表达式包含NULL值时(如NULL in (select inner_expe ...)),NULL值将会在join optimizer中进行单独处理,无法享受到优化待遇,大量NULL值可能会带来严重的性能问题。

    辅助优化策略

    知道了优化器怎么做,我们才知道如何让它更好的工作,那么想要构造高性能的子查询,使用策略主要是以下几点,

    1、NOT NULL约束。不管是否为子查询,NULL值得处理都会是一个棘手的问题

    2、建立合适的索引。优化器总是朝更容易利用索引进行优化的方向进行转换

    3、多用连接而不是子查询。优化器处理子查询总是想方设法的将它转换为join,看来join才是他更喜欢的形式

    4、不要太复杂。要知道,目前的MySQL查询优化器对复杂查询的优化支持的并不好

    最新回复(0)