Dissecting the Ox | Illustrated MySQL 8.0 Optimizer: Query Transformations

Published: 2024/8/23 | Database | 豆豆

Abstract: This article describes the complex transformation processes for subqueries, partitioned tables, and JOINs.

I. Background and Architecture

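In the article "Dissecting the Ox: Illustrated MySQL 8.0 Optimizer, Query Parsing" we focused on how the latest MySQL version, 8.0.25, parses, sets up, and transforms the basic SQL elements: tables, columns, functions, aggregates, grouping, and ordering. This article continues with the more complex transformations of subqueries, partitioned tables, and JOINs. The outline is as follows: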

Transformation

  • remove_redundant_subquery_clause: Permanently remove redundant parts from the query if 1) this is a subquery and 2) a view is not being normalized. Removal should take place when a query involving a view is optimized, not when the view is created.
  • remove_base_options: Remove the SELECT_DISTINCT option from a query block if DISTINCT can be skipped.
  • resolve_subquery: Resolve predicates involving subqueries and perform early unconditional subquery transformations:
    • Convert the subquery predicate into a semi-join, or
    • Mark the subquery for execution using materialization, or
    • Perform the IN->EXISTS transformation, or
    • Rewrite >/< ALL/ANY comparisons into MIN/MAX subqueries
    • Substitute a trivial scalar-context subquery with its value
  • transform_scalar_subqueries_to_join_with_derived: Transform eligible scalar subqueries into derived tables.
  • flatten_subqueries: Convert candidate semi-join subquery predicates into semi-join join nests. This transformation is performed once in the query's lifetime and is irreversible.
  • apply_local_transforms:
    • delete_unused_merged_columns: If the query block contains one or more merged derived tables/views, walk through the columns in the select lists and remove unused ones.
    • simplify_joins: Convert outer joins to inner joins where possible.
    • prune_partitions: Perform partition pruning for a given table and condition.
  • push_conditions_to_derived_tables: Push conditions down to derived tables. This must be done after the validity checks of grouped queries performed by apply_local_transforms().
  • Window::eliminate_unused_objects: Eliminate unused window definitions, redundant sorts, etc.

II. Detailed Transformation Process

1. Resolving subqueries (resolve_subquery)

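This step resolves statements whose conditions contain subqueries and performs some early, unconditional subquery transformations, including:

  • Marking whether the subquery can be converted into a semi-join

Conditions for the conversion:

  • The optimizer switch (OPTIMIZER_SWITCH_SEMIJOIN) and hints do not prohibit it.
  • The subquery is an IN/=ANY or EXISTS predicate.
  • The subquery is a simple query block, not a UNION.
  • The subquery has no implicit or explicit GROUP BY.
  • The subquery has no HAVING clause and no window functions.
  • The resolve phase is Query_block::RESOLVE_CONDITION or Query_block::RESOLVE_JOIN_NEST, and the new hypergraph optimizer is not in use.
  • The outer query block supports semijoins.
  • The subquery has at least one table, i.e. it is not something like "SELECT 1".
  • No strategy has been chosen for the subquery yet (it is still Subquery_strategy::UNSPECIFIED).
  • The parent query also has at least one table.
  • Neither the parent query nor the subquery uses STRAIGHT_JOIN.
  • The parent query block does not prohibit semijoin.
  • The IN predicate is deterministic (it does not depend on RAND()).
  • Depending on whether the subquery's result must be converted to TRUE or FALSE and whether it may be NULL, decide whether an antijoin or a semijoin is possible.
  • If an antijoin is required, antijoin must be supported; otherwise the predicate is handled as a semijoin.
  • OFFSET and LIMIT are compatible with semijoin: OFFSET starts from the first row and LIMIT is not 0.

If all of these hold, the strategy is set to Subquery_strategy::CANDIDATE_FOR_SEMIJOIN and the subquery is added to sj_candidates.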
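Taken together, the checks above form one long conjunction: if any single condition fails, the strategy stays unset and the later options (derived table, materialization, IN->EXISTS) get their turn. The C++ sketch below condenses a subset of these checks into one predicate. The `SubqueryInfo` struct, its field names, and `classify` are hypothetical and do not correspond to MySQL's data members; the resolve-phase, hypergraph-optimizer, and antijoin checks are omitted for brevity.

```cpp
#include <cassert>

// Hypothetical summary of a subquery predicate. Field names are illustrative
// only and are not MySQL's actual data members.
struct SubqueryInfo {
  bool semijoin_allowed_by_switch_and_hints;  // semijoin=on, no hint forbids it
  bool is_in_or_exists_predicate;             // IN / =ANY / EXISTS
  bool is_union;                              // subquery is a UNION
  bool has_group_by_or_aggregates;            // explicit or implicit GROUP BY
  bool has_having_or_window;                  // HAVING clause or window functions
  int inner_table_count;                      // 0 for something like "SELECT 1"
  int outer_table_count;                      // tables in the parent query
  bool uses_straight_join;                    // in either query block
  bool is_deterministic;                      // e.g. no RAND()
  bool has_offset;                            // OFFSET must start at the first row
  bool has_limit_zero;                        // LIMIT 0 disqualifies the subquery
};

enum class Strategy { UNSPECIFIED, CANDIDATE_FOR_SEMIJOIN };

// Every condition must hold; a single failure leaves the strategy UNSPECIFIED
// so that later steps can try other strategies.
Strategy classify(const SubqueryInfo& s) {
  const bool ok = s.semijoin_allowed_by_switch_and_hints &&
                  s.is_in_or_exists_predicate && !s.is_union &&
                  !s.has_group_by_or_aggregates && !s.has_having_or_window &&
                  s.inner_table_count > 0 && s.outer_table_count > 0 &&
                  !s.uses_straight_join && s.is_deterministic &&
                  !s.has_offset && !s.has_limit_zero;
  return ok ? Strategy::CANDIDATE_FOR_SEMIJOIN : Strategy::UNSPECIFIED;
}
```

Each field maps to one bullet above, so extending the sketch to the omitted checks only means adding more flags to the conjunction.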

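  • Marking whether the subquery will be executed using materialization
  • If the subquery does not qualify for semijoin conversion, materialization is tried instead. Conditions:
  • The optimizer switch subquery_to_derived=on.
  • The subquery is an IN/=ANY or EXISTS predicate.
  • The subquery is a simple query block, not a UNION.
  • For [NOT] EXISTS, the subquery must have no aggregation.
  • The subquery predicate is in the WHERE clause (the ON clause is not supported yet) and sits in an expression tree of ANDs or ORs.
  • The parent query block supports semijoins.
  • No strategy has been chosen for the subquery yet (it is still Subquery_strategy::UNSPECIFIED).
  • The parent query has at least one table, so that a LEFT JOIN can be formed.
  • The parent query block does not prohibit semijoin.
  • The IN predicate is deterministic (it does not depend on RAND()).
  • Depending on whether the subquery's result must be converted to TRUE or FALSE and whether it may be NULL, decide whether an antijoin or a semijoin is possible.
  • The left operand must not be a multi-column subquery; WHERE (outer_subq) = ROW(derived.col1, derived.col2) is not supported.
  • The subquery has not been flagged as impossible to convert into a derived table (m_subquery_to_derived_is_impossible).
  • If these hold, the strategy is set to Subquery_strategy::CANDIDATE_FOR_DERIVED_TABLE and the subquery is added to sj_candidates.
  • If neither of the two strategies above applies, a transformer is chosen according to the subquery type:
  • Item_singlerow_subselect::select_transformer
  • For a simple scalar subquery, the subquery is replaced directly by its result: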
select * from t1 where a = (select 1); => select * from t1 where a = 1;

Item_in_subselect/Item_allany_subselect::select_transformer->select_in_like_transformer

  • The select_in_like_transformer function handles the IN/ALL/ANY/SOME subquery transformations.
  • Handles "SELECT 1" (Item_in_optimizer).

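  • If no execution strategy has been chosen for the subquery yet, i.e. it cannot be executed as a semijoin/antijoin, the IN->EXISTS transformation is performed. In essence this keeps open the choice between materialized execution and iterative loop execution. The IN form means a non-correlated subquery is executed only once and its result is materialized into a temporary table, which is then probed whenever a result is needed; the EXISTS form means the subquery is executed once for every row of the outer table, i.e. iterative loop execution. The subquery strategy is set to Subquery_strategy::CANDIDATE_FOR_IN2EXISTS_OR_MAT.
  • Rewrite single-column IN/ALL/ANY subqueries (single_value_transformer):
oe $cmp$ (SELECT ie FROM ... WHERE subq_where ... HAVING subq_having)
=>
 - oe $cmp$ (SELECT MAX(...) )   // handled by Item_singlerow_subselect
 - oe $cmp$ <max>(SELECT ...)    // handled by Item_maxmin_subselect
fails => Item_in_optimizer
 - if the materialization strategy has already been chosen, do not transform
 - otherwise, transform IN into EXISTS via an equi-join
  • For a single-value ALL/ANY subquery predicate, try rewriting it with a MIN/MAX subquery: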
SELECT * FROM t1 WHERE a < ANY (SELECT a FROM t1); => SELECT * FROM t1 WHERE a < (SELECT MAX(a) FROM t1)
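The MIN/MAX rewrite rests on a simple equivalence: `a < ANY (S)` holds exactly when `a` is smaller than the largest element of S, and `a < ALL (S)` exactly when it is smaller than the smallest. The sketch below checks this on plain integers. It deliberately models only non-empty, NULL-free sets: for an empty S, `MAX`/`MIN` return NULL in SQL while `a < ALL` over an empty set is TRUE, so the server has to treat those cases separately. The function names are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Direct definition: a < ANY(S) is true if a is smaller than at least one element.
bool less_than_any(int a, const std::vector<int>& s) {
  return std::any_of(s.begin(), s.end(), [a](int x) { return a < x; });
}

// Rewritten form a < (SELECT MAX(...)); valid only for a non-empty, NULL-free S.
bool less_than_max(int a, const std::vector<int>& s) {
  assert(!s.empty());
  return a < *std::max_element(s.begin(), s.end());
}

// Direct definition: a < ALL(S); vacuously true for an empty S, as in SQL.
bool less_than_all(int a, const std::vector<int>& s) {
  return std::all_of(s.begin(), s.end(), [a](int x) { return a < x; });
}

// Rewritten form a < (SELECT MIN(...)); valid only for a non-empty, NULL-free S.
bool less_than_min(int a, const std::vector<int>& s) {
  assert(!s.empty());
  return a < *std::min_element(s.begin(), s.end());
}
```

Replacing a per-row scan of S with one precomputed aggregate is what makes the rewrite attractive: the subquery collapses into a single scalar value.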

  • Otherwise, call single_value_in_to_exists_transformer to convert IN into EXISTS:
  • Because the transformation turns the subquery into a correlated subquery, the UNCACHEABLE_DEPENDENT flag is set.
  • If the subquery contains aggregate functions, window functions, GROUP BY, or HAVING, the comparison is added to the HAVING clause, and ref_or_null_helper is used to distinguish a NULL result from FALSE. If NULL IN (SELECT ...) has to be handled, the condition is additionally wrapped in an Item_func_trig_cond trigger.
SELECT ... FROM t1 WHERE t1.b IN (SELECT <expr of SUM(t1.a)> FROM t2)
=>
SELECT ... FROM t1 WHERE t1.b IN (SELECT <expr of SUM(t1.a)> FROM t2 [trigcond] HAVING t1.b=ref-to-<expr of SUM(t1.a)>)
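Why the rewrite has to tell NULL apart from FALSE becomes clear from the three-valued result of `oe IN (SELECT ie ...)`. The sketch below evaluates that result directly, modelling SQL NULL as `std::nullopt`; `Tri`, `in_subquery`, `sql_not`, and `where_keeps` are hypothetical names. In a top-level WHERE, UNKNOWN and FALSE both reject the row, so the simple form is enough, but under NOT (or in a SELECT list) the two diverge. The `ie IS NULL` disjunct, `is_not_null_test(ie)`, and the trigcond guards in the rewritten EXISTS are there to reproduce these UNKNOWN cases.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// SQL three-valued logic result.
enum class Tri { kFalse, kTrue, kUnknown };

// Reference semantics of `oe IN (SELECT ie ...)`; std::nullopt models SQL NULL.
Tri in_subquery(std::optional<int> oe, const std::vector<std::optional<int>>& ie) {
  if (ie.empty()) return Tri::kFalse;  // empty subquery: FALSE even when oe is NULL
  if (!oe) return Tri::kUnknown;       // NULL IN (non-empty set) is NULL
  bool saw_null = false;
  for (const auto& v : ie) {
    if (!v) saw_null = true;
    else if (*v == *oe) return Tri::kTrue;
  }
  // No match: a NULL in the set "might" have matched, so the answer is UNKNOWN.
  return saw_null ? Tri::kUnknown : Tri::kFalse;
}

// NOT swaps TRUE and FALSE but leaves UNKNOWN unchanged.
Tri sql_not(Tri t) {
  if (t == Tri::kUnknown) return t;
  return t == Tri::kTrue ? Tri::kFalse : Tri::kTrue;
}

// A WHERE clause keeps a row only when the predicate is TRUE.
bool where_keeps(Tri t) { return t == Tri::kTrue; }
```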

  • If the subquery contains no aggregate functions, window functions, or GROUP BY, the comparison goes into the WHERE condition; if NULL must be handled, part of it still goes into the HAVING clause (Item_func_trig_cond + Item_is_not_null_test).
Subqueries that need not distinguish NULL from FALSE:
  SELECT 1 FROM ... WHERE (oe $cmp$ ie) AND subq_where
Subqueries that must distinguish them:
  SELECT 1 FROM ... WHERE subq_where AND trigcond((oe $cmp$ ie) OR (ie IS NULL))
  HAVING trigcond(<is_not_null_test>(ie))
  • JOIN::optimize() later chooses between materialization and the EXISTS form based on their costs, and m_subquery_to_derived_is_impossible is set to true.
  • Row values are transformed through Item_in_optimizer; ALL/ANY/SOME are not supported (row_value_transformer).
  • Item_in_subselect::row_value_in_to_exists_transformer
for (each left operand)
  create the equi-join condition
  if (is_having_used || !abort_on_null)
    create the "is null" and is_not_null_test items
  if (is_having_used)
    add the equi-join and the null tests to HAVING
  else
    add the equi-join and the "is null" to WHERE
    add the is_not_null_test to HAVING
  • Without a HAVING expression:
(l1, l2, l3) IN (SELECT v1, v2, v3 ... WHERE where)
=>
EXISTS (SELECT ... WHERE where AND
            (l1 = v1 OR v1 IS NULL) AND
            (l2 = v2 OR v2 IS NULL) AND
            (l3 = v3 OR v3 IS NULL)
        [ HAVING is_not_null_test(v1) AND
                 is_not_null_test(v2) AND
                 is_not_null_test(v3) ])
<-- the HAVING part can be dropped if the values are guaranteed not to be NULL
  • With a HAVING expression:
(l1, l2, l3) IN (SELECT v1, v2, v3 ... HAVING having)
=>
EXISTS (SELECT ... HAVING having AND
            (l1 = v1 OR v1 IS NULL) AND
            (l2 = v2 OR v2 IS NULL) AND
            (l3 = v3 OR v3 IS NULL) AND
            is_not_null_test(v1) AND
            is_not_null_test(v2) AND
            is_not_null_test(v3))
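The loop in the row_value_in_to_exists_transformer pseudocode can be sketched as a small string builder that produces the extra WHERE and HAVING conjuncts for each (li, vi) pair. This is an illustration only: `ExistsConds`, `and_add`, and `row_in_to_exists` are hypothetical names, conditions are plain strings rather than Item trees, and the trigcond() guards that MySQL adds for nullable left operands are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Conjuncts added to the subquery when (l1..lN) IN (SELECT v1..vN ...) becomes EXISTS.
struct ExistsConds {
  std::string where;   // added to the subquery's WHERE
  std::string having;  // added to the subquery's HAVING
};

// Appends c to an AND-list stored as a string.
static void and_add(std::string& acc, const std::string& c) {
  if (!acc.empty()) acc += " AND ";
  acc += c;
}

// Sketch of the per-operand loop; trigcond() guards are intentionally omitted.
ExistsConds row_in_to_exists(const std::vector<std::string>& left,
                             const std::vector<std::string>& right,
                             bool is_having_used, bool abort_on_null) {
  assert(left.size() == right.size());
  ExistsConds out;
  std::string null_tests;  // is_not_null_test() items, appended to HAVING at the end
  for (std::size_t i = 0; i < left.size(); ++i) {
    const std::string eq = left[i] + " = " + right[i];
    const std::string eq_or_null = "(" + eq + " OR " + right[i] + " IS NULL)";
    const std::string nn_test = "is_not_null_test(" + right[i] + ")";
    if (is_having_used) {           // everything goes to HAVING
      and_add(out.having, eq_or_null);
      and_add(null_tests, nn_test);
    } else if (!abort_on_null) {    // NULL must be told apart from FALSE
      and_add(out.where, eq_or_null);
      and_add(null_tests, nn_test);
    } else {                        // NULL and FALSE are equivalent here
      and_add(out.where, eq);
    }
  }
  if (!null_tests.empty()) and_add(out.having, null_tests);
  return out;
}
```

With is_having_used = false and abort_on_null = false the output matches the "without HAVING" rewrite above; with abort_on_null = true only the bare equalities remain.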

2. Transforming scalar subqueries into derived tables (transform_scalar_subqueries_to_join_with_derived)

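This feature was introduced in 8.0.16 to better support analytics pushdown to secondary engines (HeatWave), strengthening MySQL's subquery transformation capabilities. First, compare the execution plans with and without the transformation: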

root:test> set optimizer_switch = 'subquery_to_derived=off';
Query OK, 0 rows affected (0.00 sec)

root:test> EXPLAIN SELECT b, MAX(a) AS ma FROM t4 GROUP BY b HAVING ma < (SELECT MAX(t2.a) FROM t2 WHERE t2.b=t4.b);
+----+--------------------+-------+------------+------+---------------+------+---------+------+------+----------+-----------------+
| id | select_type        | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra           |
+----+--------------------+-------+------------+------+---------------+------+---------+------+------+----------+-----------------+
|  1 | PRIMARY            | t4    | NULL       | ALL  | NULL          | NULL | NULL    | NULL |   10 |   100.00 | Using temporary |
|  2 | DEPENDENT SUBQUERY | t2    | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    3 |    33.33 | Using where     |
+----+--------------------+-------+------------+------+---------------+------+---------+------+------+----------+-----------------+
2 rows in set, 3 warnings (0.00 sec)

root:test> set optimizer_switch = 'subquery_to_derived=on';
Query OK, 0 rows affected (0.00 sec)

root:test> EXPLAIN SELECT b, MAX(a) AS ma FROM t4 GROUP BY b HAVING ma < (SELECT MAX(t2.a) FROM t2 WHERE t2.b=t4.b);
+----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+
| id | select_type | table      | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                      |
+----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+
|  1 | PRIMARY     | t4         | NULL       | ALL  | NULL          | NULL | NULL    | NULL |   10 |   100.00 | Using temporary                            |
|  1 | PRIMARY     | <derived2> | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    3 |   100.00 | Using where; Using join buffer (hash join) |
|  2 | DERIVED     | t2         | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    3 |   100.00 | Using temporary                            |
+----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+
3 rows in set, 3 warnings (0.01 sec)
  • transform_scalar_subqueries_to_join_with_derived performs the transformation as follows:
  • First, collect the transformable scalar subqueries from the JOIN conditions, WHERE condition, HAVING condition, and SELECT list (Item::collect_scalar_subqueries).
  • Iterate over these subqueries and decide whether an additional transformation (transform_grouped_to_derived) can be applied: turning implicitly grouped scalar subqueries into derived tables.
SELECT SUM(a), (SELECT SUM(b) FROM t3) scalar FROM t1;
is transformed into =>
SELECT derived0.summ, derived1.scalar
FROM (SELECT SUM(a) AS summ FROM t1) AS derived0
     LEFT JOIN
     (SELECT SUM(b) AS scalar FROM t3) AS derived1
     ON TRUE

The execution plan:

explain SELECT SUM(a), (SELECT SUM(c1) FROM t3) scalar FROM t1;
+----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+
| id | select_type | table      | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                      |
+----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+
|  1 | PRIMARY     | <derived3> | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    1 |   100.00 | NULL                                       |
|  1 | PRIMARY     | <derived4> | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    1 |   100.00 | Using where; Using join buffer (hash join) |
|  4 | DERIVED     | t3         | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    1 |   100.00 | NULL                                       |
|  3 | DERIVED     | t1         | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    2 |   100.00 | NULL                                       |
+----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+
  • Collect the list of unique aggregate function items (collect_aggregates); these items will be replaced by columns of the new derived table.
  • Also add every field that references these items, whether directly in the SELECT list, in window function arguments, ORDER BY, or PARTITION BY, as well as the columns in this query block's ORDER BY, because all of them will be moved into the derived table.
  • Create the Query_expression/Query_block needed by the derived table (create_query_expr_and_block).
  • Add the derived table to the query block and to top_join_list.
  • Preserve the old subquery units: units that contain something transformable are moved under the derived table's Query_block; the others stay in the original query block.
  • Insert the aggregate function items collected earlier into the derived table's query block.
  • Collect the columns outside the grouped aggregate expressions; since these fields have been moved into the derived table, remove the now-invalid field references.
  • After collecting all unique columns and view references, add them to the new derived table's column list.
  • Run flatten_subqueries/setup_tables on the new derived table.
  • Call resolve_placeholder_tables again, skipping the subqueries that have already been transformed.
  • Handle the aggregate items in the HAVING condition newly added to the derived table, making them reference new_derived->base_ref_items through Item_aggregate_refs instead of the parent query block's base_ref_items.
  • Permanently replace the aggregate function list in the parent query block with columns of the derived table, and delete the originals.
  • The unique columns and view references that were saved and added to the derived table earlier are likewise replaced by references to the new fields.

  • 但目前不支持HAVING表達式中包含該子查詢,其實也是可以轉(zhuǎn)換的。
SELECT SUM(a), (SELECT SUM(b) FROM t3) scalar FROM t1 HAVING SUM(a) > scalar; 轉(zhuǎn)換為=> SELECT derived0.summ, derived1.scalar FROM (SELECT SUM(a) AS summ FROM t1) AS derived0LEFT JOIN(SELECT SUM(b) AS scalar FROM t3) AS derived1ON TRUE WHERE derived0.sum > derived1.scalar;
  • 接下來遍歷所有可以轉(zhuǎn)換的子查詢,把他們轉(zhuǎn)換成derived tables,并替換相應(yīng)的表達式變成列(transform_subquery_to_derived)。
  • 生成derived table的TABLE_LIST(synthesize_derived)。
  • 將可以移動到derived table的where_cond設(shè)置到j(luò)oin_cond上。
  • 添加derived table到查詢塊的表集合中。
  • decorrelate_derived_scalar_subquery_pre
  • 添加非相關(guān)引用列(NCF)到SELECT list,這些條件被JOIN條件所引用,并且還有另外一個fields包含了外查詢相關(guān)的列,我們稱之為'lifted_where'
  • 添加COUNT(*)到SELECT list,這樣轉(zhuǎn)換的查詢塊可以進行cardinality的檢查。比如沒有任何聚合函數(shù)在子查詢中。如果確定包含聚合函數(shù),返回一行一定是NCF同時在GROUP BY列表中。
  • 添加NCF到子查詢的GROUP列表中,如果已經(jīng)在了,需要加到最后,如果發(fā)生GROUP BY的列由于依賴性檢查失敗,還要加Item_func_any_value(非聚合列)到SELECT list。對于NCF會創(chuàng)建 derived.field和derived.`count(field)` 。
  • 設(shè)置物化的一些準(zhǔn)備(setup_materialized_derived)。
  • decorrelate_derived_scalar_subquery_post:
  • 創(chuàng)建對應(yīng)的'lifted_fields'。
  • 更新JOIN條件中相關(guān)列的引用,不在引用外查詢而換成Derived table相關(guān)的列。
  • 代替WHERE、JOIN、HAVING條件和SELECT list中的子查詢的表達式變成對應(yīng)的Derived Table里面列。
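The effect of decorrelating the t4/t2 example earlier in this section can be checked on toy data. The sketch below evaluates `HAVING ma < (SELECT MAX(t2.a) FROM t2 WHERE t2.b = t4.b)` twice: once per outer group, as the DEPENDENT SUBQUERY plan does, and once by pre-aggregating t2 grouped by the correlated column and probing that result the way the transformed plan joins `<derived2>`. Rows are (a, b) pairs and all names are hypothetical. The COUNT(*) cardinality check the server adds is not modelled, because grouping by b already yields at most one row per key.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <optional>
#include <utility>
#include <vector>

using Row = std::pair<int, int>;  // (a, b)

// SELECT b, MAX(a) ... GROUP BY b, as a map from b to MAX(a).
std::map<int, int> group_max(const std::vector<Row>& t) {
  std::map<int, int> g;
  for (const auto& [a, b] : t) {
    auto it = g.find(b);
    if (it == g.end()) g[b] = a;
    else it->second = std::max(it->second, a);
  }
  return g;
}

// Correlated scalar subquery: re-evaluated for every outer group.
std::optional<int> max_a_where_b(const std::vector<Row>& t2, int outer_b) {
  std::optional<int> m;  // nullopt models SQL NULL (no matching rows)
  for (const auto& [a, b] : t2)
    if (b == outer_b) m = m ? std::max(*m, a) : a;
  return m;
}

// Original query: a NULL subquery result makes the comparison reject the group.
std::vector<int> correlated(const std::vector<Row>& t4, const std::vector<Row>& t2) {
  std::vector<int> out;
  for (const auto& [b, ma] : group_max(t4)) {
    auto sub = max_a_where_b(t2, b);
    if (sub && ma < *sub) out.push_back(b);
  }
  return out;
}

// Transformed query: derived table grouped by t2.b, computed once, then
// LEFT JOINed on b; a missing key plays the role of the NULL-extended row.
std::vector<int> with_derived(const std::vector<Row>& t4, const std::vector<Row>& t2) {
  const std::map<int, int> derived = group_max(t2);
  std::vector<int> out;
  for (const auto& [b, ma] : group_max(t4)) {
    auto it = derived.find(b);
    if (it != derived.end() && ma < it->second) out.push_back(b);
  }
  return out;
}
```

The payoff is that t2 is scanned once instead of once per t4 group, which is what the change from DEPENDENT SUBQUERY to DERIVED in the plans above reflects.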

The figure below illustrates the transformation process and result of this function:

3. Flattening subqueries (flatten_subqueries)

This function mainly converts semi-join subqueries into nested JOINs. The conversion happens only once and is irreversible.

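  • In simplified terms, the steps are:
  • Create the SEMI JOIN (it1 ... itN) construct and add it to the outer query block's execution plan.
  • Add the subquery's WHERE condition and JOIN conditions to the parent query's WHERE condition.
  • Remove the subquery predicate from the parent query's predicates.
  • Because MySQL limits the number of tables that can be joined in one query block (MAX_TABLES), not every sj_candidate can be flattened, so a priority order decides which subqueries are unnested first. The priority rules are:
  • Correlated subqueries before non-correlated ones
  • Subqueries with more inner tables before those with fewer
  • Earlier subqueries before later ones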
subq_item->sj_convert_priority =
    (((dependent * MAX_TABLES_FOR_SIZE) +  // dependent subqueries first
      child_query_block->leaf_table_count) *
     65536) +           // then with many tables
    (65536 - subq_no);  // then based on position
  • In addition, because flatten_subqueries is invoked recursively bottom-up, lower-level subqueries are unnested into the next outer query block one level at a time:
for SELECT#1 WHERE X IN (SELECT #2 WHERE Y IN (SELECT#3)) :

Query_block::prepare() (select#1)
   -> fix_fields() on IN condition
       -> Query_block::prepare() on subquery (select#2)
           -> fix_fields() on IN condition
                -> Query_block::prepare() on subquery (select#3)
                <- Query_block::prepare()
           <- fix_fields()
           -> flatten_subqueries: merge #3 in #2
           <- flatten_subqueries
       <- Query_block::prepare()
   <- fix_fields()
   -> flatten_subqueries: merge #2 in #1
  • Iterate over the subquery list, drop the subqueries that Item::clean_up_after_removal has marked Subquery_strategy::DELETED, set sj_convert_priority according to the priority rules, and sort by priority.
  • Iterate over the sorted list; for subqueries with strategy Subquery_strategy::CANDIDATE_FOR_DERIVED_TABLE, convert the subquery ([NOT] {IN, EXISTS}) into a JOIN with a derived table (transform_table_subquery_to_join_with_derived):
FROM [tables] WHERE ... AND/OR oe IN (SELECT ie FROM it) ...
=>
FROM (tables) LEFT JOIN (SELECT DISTINCT ie FROM it) AS derived
     ON oe = derived.ie
WHERE ... AND/OR derived.ie IS NOT NULL ...
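  • Set the strategy to Subquery_strategy::DERIVED_TABLE.
  • A semijoin subquery and an antijoin subquery cannot be nested inside each other; if they are, or if the outer query already exceeds MAX_TABLES tables, no conversion is done. Otherwise the subquery is marked with the Subquery_strategy::SEMIJOIN strategy.
  • Check whether the subquery's WHERE condition is constant. If it is always FALSE, the subquery result is always empty; in that case Item::clean_up_after_removal marks the subquery Subquery_strategy::DELETED and it is removed.
  • Subqueries that can be neither marked Subquery_strategy::DELETED nor given the Subquery_strategy::SEMIJOIN strategy are reset to Subquery_strategy::UNSPECIFIED, and processing continues with the next one.
  • Replace the subquery predicate in the outer query's WHERE condition (replace_subcondition):
  • If the condition inside the subquery is not always FALSE, or it is always FALSE but the subquery has to be rewritten as an antijoin (in which case the subquery result is always empty and the outer condition always passes), the predicate is replaced with an always-TRUE condition.
  • If the subquery is always FALSE and is not an antijoin, the condition in the outer query is replaced with an always-FALSE condition.
  • Item_subselect::EXISTS_SUBS does not support aggregation.
  • The convert_subquery_to_semijoin function handles SQL of the following pattern:
  • IN/=ANY predicate
  • If the condition can be decorrelated, decorrelate it (decorrelate_condition):
  • Add the decorrelated inner expressions to the SELECT list.
  • Collect the derived tables or join conditions in the FROM clause that are correlated with outer tables.
  • Remove the UNCACHEABLE_DEPENDENT correlation flag and update the used tables.
  • Add the SELECT_DISTINCT flag to the derived table's subquery.
  • Convert the subquery into a derived table and insert it into the FROM clause of the query block it belongs to (transform_subquery_to_derived):
  • Create the derived table and its join condition.
  • Walk the parent query block's WHERE and replace the subquery item with the derived table (replace_subcondition).
  • Iterate over the sorted subquery list; for subqueries with strategy Subquery_strategy::CANDIDATE_FOR_SEMIJOIN:
  • Decide whether they can be converted into semijoins.
  • Iterate over the sorted list; for subqueries with strategy Subquery_strategy::SEMIJOIN, perform the conversion to semijoin/antijoin (convert_subquery_to_semijoin).
  • The convert_subquery_to_semijoin function handles SQL of the following patterns:
  • IN/=ANY predicate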
SELECT ...
FROM ot1 ... otN
WHERE (oe1, ... oeM) IN (SELECT ie1, ..., ieM
                         FROM it1 ... itK
                         [WHERE inner-cond])
  [AND outer-cond]
[GROUP BY ...] [HAVING ...] [ORDER BY ...]
=>
SELECT ...
FROM (ot1 ... otN) SJ (it1 ... itK)
     ON (oe1, ... oeM) = (ie1, ..., ieM)
        [AND inner-cond]
[WHERE outer-cond]
[GROUP BY ...] [HAVING ...] [ORDER BY ...]
  • EXISTS predicate
SELECT ...
FROM ot1 ... otN
WHERE EXISTS (SELECT expressions
              FROM it1 ... itK
              [WHERE inner-cond])
  [AND outer-cond]
[GROUP BY ...] [HAVING ...] [ORDER BY ...]
=>
SELECT ...
FROM (ot1 ... otN) SJ (it1 ... itK)
     [ON inner-cond]
[WHERE outer-cond]
[GROUP BY ...] [HAVING ...] [ORDER BY ...]
  • NOT EXISTS predicate
SELECT ...
FROM ot1 ... otN
WHERE NOT EXISTS (SELECT expressions
                  FROM it1 ... itK
                  [WHERE inner-cond])
  [AND outer-cond]
[GROUP BY ...] [HAVING ...] [ORDER BY ...]
=>
SELECT ...
FROM (ot1 ... otN) AJ (it1 ... itK)
     [ON inner-cond]
[WHERE outer-cond AND is-null-cond(it1)]
[GROUP BY ...] [HAVING ...] [ORDER BY ...]
  • NOT IN predicate
SELECT ...
FROM ot1 ... otN
WHERE (oe1, ... oeM) NOT IN (SELECT ie1, ..., ieM
                             FROM it1 ... itK
                             [WHERE inner-cond])
  [AND outer-cond]
[GROUP BY ...] [HAVING ...] [ORDER BY ...]
=>
SELECT ...
FROM (ot1 ... otN) AJ (it1 ... itK)
     ON (oe1, ... oeM) = (ie1, ..., ieM)
        [AND inner-cond]
[WHERE outer-cond]
[GROUP BY ...] [HAVING ...] [ORDER BY ...]
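  • Find where the semi-join nest and its generated condition can be inserted. For example, for t1 LEFT JOIN t2, embedding_join_nest is t2, and t2 may itself be nested, as in t1 LEFT JOIN (t2 JOIN t3).
  • Create a new TABLE_LIST for the semijoin nest.
  • Handle antijoin.
  • Merge the subquery's underlying tables into the join nest created above (TABLE_LIST::merge_underlying_tables).
  • Append the subquery's leaf tables after the current query block's leaf tables, renumber them, and reset the outer tables they depend on; then reset the subquery's leaf table list.
  • For an outer join, propagate nullability through the join list (propagate_nullability).
  • Decorrelate the correlated conditions in the inner subquery; these conditions are added to the semijoin's lists. They must be deterministic, and only simple predicates or ANDs of simple predicates are supported (Query_block::decorrelate_condition):
  • Check whether the left and right operands depend only on the outer and inner tables respectively, and add them to the semijoin's outer and inner expression lists (decorrelate_equality).
  • Decorrelate the inner query's join conditions (Query_block::decorrelate_condition).
  • Remove the subquery expression from the parent query's AST (Query_expression::exclude_level).
  • Update the table bitmaps of the WHERE/JOIN conditions produced by the semi-join nest (Query_block::fix_tables_after_pullout).
  • Pull up the subquery's WHERE condition and update its used-table information (Item_cond_and::fix_after_pullout()).
  • Build an AND condition from the semijoin condition list (Query_block::build_sj_cond): constant TRUE conditions are dropped, and if a constant FALSE condition is found, everything else is discarded because the conjunction can never be true.
  • Add the resulting semijoin condition to the outer query's WHERE condition.
  • Finally, iterate over the sorted list once more. Subqueries that were not transformed and still have strategy Subquery_strategy::UNSPECIFIED get the IN->EXISTS rewrite (select_transformer). If the original subquery has been replaced by a new item, replace_subcondition is called to resolve it and add it to the appropriate WHERE or ON clause.
  • Clear the sj_candidates list.
  • Semi-join has five execution strategies. This article does not cover the optimization and execution phases; see the semijoin material in the references for details. Finally, here are the optimizer switches that control semijoin optimization and execution, of which semijoin=on/off is the master switch: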
SELECT @@optimizer_switch\G
*************************** 1. row ***************************
@@optimizer_switch: ......materialization=on,semijoin=on,loosescan=on,firstmatch=on,subquery_materialization_cost_based=on,......
  • The example below illustrates this transformation:
SELECT * FROM t1 WHERE t1.a in (SELECT t2.c1 FROM t2 where t2.c1 > 0);
=>
/* select#1 */ SELECT `t1`.`a` AS `a` FROM `t1` SEMI JOIN (`t2`) WHERE ((`t1`.`a` = `t2`.`c1`) and (`t2`.`c1` > 0))

The execution plan:

explain SELECT * FROM t1 WHERE t1.a in (SELECT t2.c1 FROM t2 where t2.c1 > 0);
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-----------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                                     |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-----------------------------------------------------------+
|  1 | SIMPLE      | t2    | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    1 |   100.00 | Using where; Start temporary                              |
|  1 | SIMPLE      | t1    | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    2 |    50.00 | Using where; End temporary; Using join buffer (hash join) |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-----------------------------------------------------------+
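To see how the sj_convert_priority formula quoted earlier in this section orders candidates, the sketch below computes it for a few hypothetical subqueries and sorts them in descending order. It assumes MAX_TABLES_FOR_SIZE is 64, the bit width of MySQL's 64-bit table_map; any value larger than the maximum leaf table count yields the same ordering. The `Candidate` struct and function names are illustrative, not server types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Assumption: MAX_TABLES_FOR_SIZE equals the bit width of table_map (64).
constexpr std::uint64_t kMaxTablesForSize = 64;

struct Candidate {
  int subq_no;             // position of the subquery in the query block
  bool dependent;          // correlated with the outer query
  unsigned leaf_tables;    // leaf_table_count of the child query block
  std::uint64_t priority = 0;
};

// Same arithmetic as the quoted formula:
// dependent first, then more tables, then earlier position.
std::uint64_t sj_convert_priority(const Candidate& c) {
  return ((static_cast<std::uint64_t>(c.dependent) * kMaxTablesForSize +
           c.leaf_tables) * 65536) +
         (65536 - static_cast<std::uint64_t>(c.subq_no));
}

// Sorts so that the highest-priority candidate is flattened first.
void sort_candidates(std::vector<Candidate>& cands) {
  for (auto& c : cands) c.priority = sj_convert_priority(c);
  std::stable_sort(cands.begin(), cands.end(),
                   [](const Candidate& x, const Candidate& y) {
                     return x.priority > y.priority;
                   });
}
```

Because dependent contributes a multiple of MAX_TABLES_FOR_SIZE * 65536, a correlated subquery always outranks a non-correlated one, and since subq_no stays below 65536 the position term only breaks ties between equal table counts.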

4. Applying local transforms to the query block (apply_local_transforms)

This function is called bottom-up after flatten_subqueries and consists mainly of the following steps:

Deleting unused columns (delete_unused_merged_columns)

If some derived tables/views have already been merged away from the query block, walk the columns of the SELECT list and delete the unnecessary ones.

Simplifying JOINs (simplify_joins)

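This function simplifies the nested joins in the Query_block's top_join_list into a flat join list. Nested joins include both table1 JOIN table2 and forms such as table1, (table2, table3). The simplification process is shown in the figure: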
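Besides flattening nested joins, simplify_joins performs the outer-to-inner conversion listed in the outline. The usual justification is that a WHERE predicate on the inner table that can never be TRUE for a NULL-complemented row (a null-rejecting predicate) discards exactly the rows that distinguish a LEFT JOIN from an inner join. The sketch below checks that on toy data for `t1 LEFT JOIN t2 ON t1.key = t2.key WHERE t2.value > 0`; the row types and function names are hypothetical.

```cpp
#include <cassert>
#include <optional>
#include <utility>
#include <vector>

using Row = std::pair<int, int>;                    // t2 row: (key, value)
using Joined = std::pair<int, std::optional<int>>;  // (t1.key, t2.value or NULL)

// t1 LEFT JOIN t2 ON t1.key = t2.key: unmatched t1 rows get a NULL t2.value.
std::vector<Joined> left_join(const std::vector<int>& t1, const std::vector<Row>& t2) {
  std::vector<Joined> out;
  for (int k : t1) {
    bool matched = false;
    for (const auto& [k2, v] : t2)
      if (k == k2) { out.push_back({k, v}); matched = true; }
    if (!matched) out.push_back({k, std::nullopt});
  }
  return out;
}

// t1 JOIN t2 ON t1.key = t2.key.
std::vector<Joined> inner_join(const std::vector<int>& t1, const std::vector<Row>& t2) {
  std::vector<Joined> out;
  for (int k : t1)
    for (const auto& [k2, v] : t2)
      if (k == k2) out.push_back({k, v});
  return out;
}

// WHERE t2.value > 0 is null-rejecting: it is never TRUE when t2.value is NULL.
std::vector<Joined> where_positive(const std::vector<Joined>& rows) {
  std::vector<Joined> out;
  for (const auto& r : rows)
    if (r.second && *r.second > 0) out.push_back(r);
  return out;
}
```

Converting to an inner join matters because it frees the optimizer to reorder the two tables, which a LEFT JOIN would forbid.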

Static partition pruning (prune_partitions)

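Since pruning differs for HASH/RANGE/LIST partitioning and for subpartitions, only the general pruning process is outlined here. prune_partitions is currently called in both the prepare and the optimize phase; by the time of the latter, certain constant subqueries have already been evaluated.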

struct TABLE {
  ......
  partition_info *part_info{nullptr}; /* Partition related information */
  /* If true, all partitions have been pruned away */
  bool all_partitions_pruned_away{false};
  ......
};

SQL transformation phase
SELECT_LEX::apply_local_transforms
--> prune_partitions

for example, select * from employee where company_id = 1000 ;

SQL optimizer phase
JOIN::prune_table_partitions
--> prune_partitions
------> based on tbl->join_cond_optim() or JOIN::where_cond

for example, explain select * from employee where company_id = (select c1 from t1);
  • Take the following RANGE pruning as an example:
root:ref> CREATE TABLE R2 (
    -> a INT,
    -> d INT
    -> ) PARTITION BY RANGE(a) (
    -> PARTITION p20 VALUES LESS THAN (20),
    -> PARTITION p40 VALUES LESS THAN (40),
    -> PARTITION p60 VALUES LESS THAN (60),
    -> PARTITION p80 VALUES LESS THAN (80),
    -> PARTITION p100 VALUES LESS THAN MAXVALUE
    -> );
Query OK, 0 rows affected (0.09 sec)

root:ref> Select * From R2 where a > 40 and a < 80;
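  • The pruning process in detail:
  • Because pruning has to intersect the results produced by different conditions, a bitmap such as read_partitions records whether each partition is used. Pruning also proceeds iteratively, so a part_iterator holds the start, end, and current positions, together with an endpoint function that gets the interval range and a next function that returns the next value. Function pointers are used cleverly here to accommodate the different partition types (HASH/RANGE/LIST), as shown in the figure below:

  • Get the SEL_TREE red-black tree for join_cond or m_where_cond (get_mm_tree).
  • Call find_used_partitions to get the matching partitions. For each interval of the SEL_TREE: 1. get the interval's left and right endpoints; 2. starting from the left endpoint, keep fetching the next matching partition until the right endpoint is reached. Each matching partition's bit in part_info->read_partitions is set with bitmap_set_bit.
  • find_used_partitions recurses according to the structure of the SEL_TREE: as in the diagram, it first walks next_key_part from left to right (AND conditions) and then the left/right links of the SEL_TREE (up/down in the diagram, OR conditions), recursing depth-first.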
(start)
 |                           $
 |   Partitioning keyparts   $  subpartitioning keyparts
 |                           $
 |     ...          ...      $
 |      |            |       $
 | +---------+  +---------+  $  +-----------+  +-----------+
 \-| par1=c1 |--| par2=c2 |-----| subpar1=c3|--| subpar2=c5|
   +---------+  +---------+  $  +-----------+  +-----------+
        |                    $        |              |
        |                    $        |        +-----------+
        |                    $        |        | subpar2=c6|
        |                    $        |        +-----------+
        |                    $        |
        |                    $  +-----------+  +-----------+
        |                    $  | subpar1=c4|--| subpar2=c8|
        |                    $  +-----------+  +-----------+
        |                    $
        |                    $
   +---------+               $  +------------+  +------------+
   | par1=c2 |------------------| subpar1=c10|--| subpar2=c12|
   +---------+               $  +------------+  +------------+
        |                    $
       ...                   $

For example, for the first row (par1=c1 and par2=c2 and subpar1=c3 and subpar2=c5), the call stack is:
  in find_used_partitions(key_tree = "subpar2=c5") (***)
  in find_used_partitions(key_tree = "subpar1=c3")
  in find_used_partitions(key_tree = "par2=c2")   (**)
  in find_used_partitions(key_tree = "par1=c1")
  in prune_partitions(...)
It then continues with the following conditions, and so on:
  or(par1=c1 and par2=c2 and subpar1=c3 and subpar2=c6)
  or(par1=c1 and par2=c2 and subpar1=c4 and subpar2=c8)
  or(par1=c2 and subpar1=c10 and subpar2=c12)
  • The figure below shows the structure and process of pruning:
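As a concrete companion to the R2 example, the sketch below prunes the RANGE-partitioned table for `a > 40 AND a < 80`, which on an INT column is the closed interval [41, 79]. It locates the partitions holding the two endpoints and sets every bit in between, in the spirit of setting bits in part_info->read_partitions. It handles only a single interval over RANGE partitions; it is not MySQL's find_used_partitions, which walks the SEL_TREE and uses the part_iterator callbacks, and all names here are hypothetical.

```cpp
#include <algorithm>
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

// R2 from the example: p20, p40, p60, p80, p100 (VALUES LESS THAN MAXVALUE).
constexpr std::size_t kPartitions = 5;

// Upper bounds of the VALUES LESS THAN clauses, excluding MAXVALUE.
const std::vector<int> kLessThan{20, 40, 60, 80};

// Index of the partition holding v: the first partition whose bound exceeds v.
// Values >= 80 fall through to index 4, the MAXVALUE partition.
std::size_t partition_of(int v) {
  return static_cast<std::size_t>(
      std::upper_bound(kLessThan.begin(), kLessThan.end(), v) - kLessThan.begin());
}

// Marks every partition overlapping the closed integer interval [lo, hi].
std::bitset<kPartitions> prune_range(int lo, int hi) {
  std::bitset<kPartitions> read_partitions;
  if (lo > hi) return read_partitions;  // empty interval: everything pruned
  for (std::size_t p = partition_of(lo); p <= partition_of(hi); ++p)
    read_partitions.set(p);
  return read_partitions;
}
```

For [41, 79] the result marks bits 2 and 3, i.e. p60 and p80, the only two partitions that can contain values between 41 and 79.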

5. Pushing conditions down to derived tables (push_conditions_to_derived_tables)

This function pushes conditions down into derived tables; for details, see WL#8084 - Condition pushdown to materialized derived table.

root:test> set optimizer_switch = 'derived_merge=off'; -- turn off derived_merge to test pushdown
Query OK, 0 rows affected (0.00 sec)

root:test> EXPLAIN FORMAT=tree SELECT * FROM (SELECT c1,c2 FROM t1) as dt WHERE c1 > 10;
+------------------------------------------------------------+
| EXPLAIN                                                    |
+------------------------------------------------------------+
| -> Table scan on dt  (cost=2.51..2.51 rows=1)
    -> Materialize  (cost=2.96..2.96 rows=1)
        -> Filter: (t1.c1 > 10)  (cost=0.35 rows=1)
            -> Table scan on t1  (cost=0.35 rows=1)
 |
+------------------------------------------------------------+

The process is as follows:

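  • Iterate over the derived tables and check whether conditions can be pushed down (can_push_condition_to_derived). Pushdown is not possible in the following cases:
  • The derived table contains a UNION.
  • The derived table has a LIMIT.
  • The derived table is the inner table of an outer join, because pushing down would produce more NULL-complemented rows.
  • The derived table is defined by a CTE.
  • Build the WHERE condition that can be pushed into the derived table (Condition_pushdown::make_cond_for_derived).
  • Keep the remaining conditions that cannot be pushed down (Condition_pushdown::get_remainder_cond).
  • Call push_conditions_to_derived_tables recursively, top-down.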
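The eligibility checks and the pushed/remainder split can be summarized in a short sketch. It assumes a simplified model in which each AND-ed term of the outer WHERE records the table aliases it references, and a term is pushed only if it references the derived table alone. `DerivedTable`, `Conjunct`, `can_push`, and `split_conditions` are hypothetical names; the real Condition_pushdown also decides whether a pushed term lands in the derived table's WHERE or HAVING and rewrites column references, none of which is modelled here.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of a derived table for the eligibility checks above.
struct DerivedTable {
  std::string alias;
  bool has_union = false;
  bool has_limit = false;
  bool is_inner_of_outer_join = false;
  bool is_cte = false;
};

// One AND-ed term of the outer WHERE and the table aliases it references.
struct Conjunct {
  std::string text;
  std::set<std::string> tables;
};

// Mirrors the list of cases in which pushdown is not possible.
bool can_push(const DerivedTable& dt) {
  return !dt.has_union && !dt.has_limit && !dt.is_inner_of_outer_join && !dt.is_cte;
}

// Terms that reference only the derived table are pushed into it; everything
// else stays in the outer WHERE as the remainder.
void split_conditions(const DerivedTable& dt, const std::vector<Conjunct>& where,
                      std::vector<std::string>& pushed,
                      std::vector<std::string>& remainder) {
  for (const auto& c : where) {
    const bool only_dt = c.tables.size() == 1 && c.tables.count(dt.alias) == 1;
    if (can_push(dt) && only_dt) pushed.push_back(c.text);
    else remainder.push_back(c.text);
  }
}
```

In the EXPLAIN above, `c1 > 10` references only dt, so it ends up as the Filter below Materialize rather than above the table scan on dt.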

The figure below illustrates the process in detail:

III. Summary

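These two articles have focused on the rule-based part of the optimizer and have not covered much of the cost-based optimization. When applying a rule is known to speed up execution, the transformation is applied directly, especially for transformations that change the query structure, such as merge_derived. When a rule alone cannot tell whether execution will be faster, the optimizer keeps some temporary structures to give the later cost estimation more options, as with the IN/EXISTS/materialization transformations. Transformations that both change the query structure and cannot be judged beneficial by rules alone are not yet supported by MySQL. Although this article is detailed, it cannot cover every case; it is meant as a starting point, and readers are encouraged to use a debugger to explore how specific classes of SQL are processed.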

IV. References

An In-Depth Look at the Latest MySQL 8.0 Server-Layer Architecture (《MySQL 8.0 Server層最新架構詳解》)

WL#13520: Transform correlated scalar subqueries

WL#8084: Condition pushdown to materialized derived table

WL#2980: Subquery optimization: Semijoin

  • WL#3740: Subquery optimization: Semijoin: Pull-out of inner tables
  • WL#3741: Subquery optimization: Semijoin: Duplicate elimination strategy
  • WL#3750: Subquery optimization: Semijoin: First-match strategy
  • WL#3751: Subquery optimization: Semijoin: Inside-out strategy

WL#4389: Subquery optimizations: Make IN optimizations also handle EXISTS

WL#4245: Subquery optimization: Transform NOT EXISTS and NOT IN to anti-join

WL#2985: Perform Partition Pruning of Range conditions

MySQL · Source Code Analysis · Semi-join Optimization and Execution Code (《MySQL · 源碼分析 · Semi-join優化執行代碼分析》)

MySQL · Source Code Analysis · Subquery Optimization (《MySQL·源碼分析·子查詢優化源碼分析》)

Optimizing Subqueries, Derived Tables, View References, and Common Table Expressions

This article is original content from Alibaba Cloud and may not be reproduced without permission.
