[Spark SQL Basics] -- Basic syntax: select [hints ...]

    xiaoxiao2022-07-14  164

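    Background

          Today I happened to see that a former colleague had used the mapjoin small-table broadcast optimization in a join. That prompted me to dig into the hints part of the select syntax, and I am sharing what I found here for reference. Corrections are welcome!

          Note: this article is based on Spark 2.4.5; the API may have changed in newer versions of Spark.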

    Contents

    1 Basic structure of the select syntax
    2 Origin of hints
    3 Hint syntax and options
    4 Combining hints

    Content

    1 Basic structure of the select syntax

    SELECT [hints, ...] [ALL|DISTINCT] named_expression[, named_expression, ...]
      FROM relation[, relation, ...]
      [lateral_view[, lateral_view, ...]]
      [WHERE boolean_expression]
      [aggregation [HAVING boolean_expression]]
      [ORDER BY sort_expressions]
      [CLUSTER BY expressions]
      [DISTRIBUTE BY expressions]
      [SORT BY sort_expressions]
      [WINDOW named_window[, WINDOW named_window, ...]]
      [LIMIT num_rows]

    named_expression:
      : expression [AS alias]

    relation:
      | join_relation
      | (table_name|query|relation) [sample] [AS alias]
      : VALUES (expressions)[, (expressions), ...]
            [AS (column_name[, column_name, ...])]

    expressions:
      : expression[, expression, ...]

    sort_expressions:
      : expression [ASC|DESC][, expression [ASC|DESC], ...]
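As a rough illustration of how the clauses in this grammar fit together, here is a minimal sketch. It uses an inline VALUES relation so that it does not depend on any existing table; the alias t and the column names id and name are made up for this example and do not come from the article.

```sql
-- Inline relation (VALUES ...) with a hypothetical alias t(id, name).
-- Exercises named_expression with AS, WHERE, ORDER BY and LIMIT.
SELECT t.id, t.name AS user_name
FROM VALUES (1, 'a'), (2, 'b'), (3, 'c') AS t(id, name)
WHERE t.id > 1
ORDER BY id DESC
LIMIT 2;
-- Returns the rows (3, 'c') and (2, 'b'), in that order.
```

The optional hints slot sits directly after the SELECT keyword, before ALL or DISTINCT and the select list.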

    2 Origin of hints

    The hint framework was proposed by Databricks co-founder Reynold Xin and has been part of Spark since version 2.2.

    Patch: https://issues.apache.org/jira/browse/SPARK-20857

     

    3 Hint syntax and options

    SELECT /*+ MAPJOIN(table_name) */
    SELECT /*+ BROADCASTJOIN(table_name) */
    SELECT /*+ BROADCAST(table_name) */

    -- Added in Spark 2.4.0
    -- Proposed and contributed by Chinese contributors
    -- https://issues.apache.org/jira/browse/SPARK-24940
    SELECT /*+ REPARTITION(number) */
    SELECT /*+ COALESCE(number) */
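To show one of these hints inside a complete statement, here is a small sketch. The temporary views t_test and t_map, their columns, and their contents are assumptions made purely for illustration. With tables this small, Spark would likely choose a broadcast join even without the hint, so the hint matters mainly for tables larger than spark.sql.autoBroadcastJoinThreshold.

```sql
-- Hypothetical sample data, registered as temporary views.
CREATE OR REPLACE TEMPORARY VIEW t_test AS
SELECT * FROM VALUES (1, 'x'), (2, 'y'), (3, 'z') AS t(id, name);

CREATE OR REPLACE TEMPORARY VIEW t_map AS
SELECT * FROM VALUES (1, 'one'), (2, 'two') AS t(id, label);

-- Ask the planner to broadcast the relation aliased as b.
-- MAPJOIN(b) and BROADCASTJOIN(b) are alternative spellings of the same hint.
EXPLAIN
SELECT /*+ BROADCAST(b) */ a.id, a.name, b.label
FROM t_test a
JOIN t_map b ON a.id = b.id;
```

If the hint is honored, the physical plan printed by EXPLAIN should contain a BroadcastHashJoin node.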

    4 Combining hints

    mapjoin used with a join: select /*+ mapjoin(a) */ a.*, b.* from t_test a join t_map b on a.id = b.id;
    repartition and coalesce used with group by, to change the parallelism and the number of partitions of the result
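The repartition and coalesce case might look like the following sketch. The inline data, the alias t(id, name), and the partition counts 4 and 1 are arbitrary choices for illustration only.

```sql
-- REPARTITION(4): shuffle the aggregated result into 4 partitions.
SELECT /*+ REPARTITION(4) */ name, count(*) AS cnt
FROM VALUES (1, 'x'), (2, 'y'), (3, 'x') AS t(id, name)
GROUP BY name;

-- COALESCE(1): merge the aggregated result into a single partition
-- without a full shuffle.
SELECT /*+ COALESCE(1) */ name, count(*) AS cnt
FROM VALUES (1, 'x'), (2, 'y'), (3, 'x') AS t(id, name)
GROUP BY name;
```

Both hints act on the partitioning of the query result, which is useful for controlling the number of output files when the result is written out. The shuffle parallelism of the aggregation itself is still governed by spark.sql.shuffle.partitions.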

     

    References

    https://docs.databricks.com/spark/latest/spark-sql/language-manual/select.html

    https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-hint-framework.html

    https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala

    https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/hints.scala

    https://issues.apache.org/jira/browse/SPARK-16475

    https://issues.apache.org/jira/browse/SPARK-20857

    https://issues.apache.org/jira/browse/SPARK-24940

     

     
