SQL数据处理

    xiaoxiao2024-10-14  58

    DBeaver数据库操作软件,SQL语言

    SELECT …… FROM…… AS

    select 表内容 form 数据库名 order by asc/desc 升序或降序排列 只能出现一个selcet

    从一个/多个表中检索数据。 select 是sql中的数据查询关键词 from 和 where是后缀的定位词 。 select from【<数据库名!>】<表名>【,【<数据库名!>】<表名>……】 where<连接条件> SELECT后跟想检索的内容,一/多个字段,即可以是字段名也可以是函数(系统自定义的);也可以是星号,表示输出表中所有的字段。 FROM是检索内容的来源,即表的名称。 WHERE是指定查询条件,把满足逻辑表达式的座位查询结果,是可选项。

    SELECT …… FROM…… WHERE

    如:SELECT 学号,成绩 FROM 学生选课 WHERE 学号 like ‘2006*’ 在学生选课表中查询学号以2006开头的学生的学号和成绩。

    select distinct :去重操作,当多行重复时,只取出一行

    如: select DISTINCT t1.send_stockorg_code as org_id, t2.mng_org_name as mng_reg_id from edw_ai.dim_stockorg_relationship as t1 left join (select store_code, mng_org_name from edw_ai.dim_mng_org_store) as t2 on t1.receive_stockorg_code = t2.store_code ##

    两个应用,一个是SELECT…FROM…AS… 一个是,作为连接语句的操作符。 如sql中:create table tablename as select * from tablename2; 解释:先获取tablename表中的所有记录,之后创建一张tablename表,结构和tablename2表相同,记录为后面语句的查询结果。

    WITH… AS…

    with 公用表达式名字 as (select * from xxx where 条件) select * from 公用表达式名字 如:with A as (select * from class) select *from A 这个语句的意思就是,先执行select * from class 得到一个结果,将这个结果记录为A ,再执行select *from A 语句。A 表只是一个别名。 也就是将重复用到的大批量 的SQL语句,放到with as 中,加一个别名,在后面用到的时候就可以直接用。(也就是说,with as 类似于子函数)

    对于大批量的SQL数据,起到优化的作用。

    WITH的相关总结

    with as 的使用范围仅限制在,with as 紧跟着的下一条SQL。之后with as 公用SQL失效。 1.使用with子句可以让子查询重用相同的with查询块,通过select调用(with子句只能被select查询块引用),一般在with查询用到多次情况下。在引用的select语句之前定义,同级只能定义with关键字只能使用一次,多个用逗号分割。 2.with子句的返回结果存到用户的临时表空间中,只做一次查询,反复使用,提高效率。 3.在同级select前有多个查询定义的时候,第1个用with,后面的不用with,并且用逗号隔开。 4.最后一个with 子句与下面的查询之间不能有逗号,只通过右括号分割,with 子句的查询必须用括号括起来。 5.如果定义了with子句,而在查询中不使用,那么会报ora-32035 错误:未引用在with子句中定义的查询名。(至少一个with查询的name未被引用,解决方法是移除未被引用的with查询),注意:只要后面有引用的就可以,不一定非要在主查询中引用,比如后面的with查询也引用了,也是可以的。 6.前面的with子句定义的查询在后面的with子句中可以使用。但是一个with子句内部不能嵌套with子句。 7.当一个查询块名字和一个表名或其他的对象相同时,解析器从内向外搜索,优先使用子查询块名字。 8.with查询的结果列有别名,引用的时候必须使用别名或*。 ​

    多表合并的时候,经常会用到 WITH xxx AS和JOIN命令,比如说:

    JOIN SELECT t1.product_code AS prod_id, t1.color_code AS color_id, t1.size_code AS size_id, t1.size_order, FROM edw_ai.dim_sku AS t1 JOIN (SELECT product_year, product_quarter FROM edw_ai.config_target_product WHERE day_date = '2019-04-15') AS t2 ON t1.sku_year = t2.product_year AND t1.sku_quarter = t2.product_quarter

    注释:

    ’‘’用product,color,size的code来代替其id 2.’’'在form的表里的2019-04-15一天里找到poduct_year及product_quarter 3.JOIN ON 语句几位连接t1和t2表

    WITH……AS和JOIN一起使用,

    WITH……AS相当于建立了一个临时表

    WITH prod_info_targ AS (SELECT t1.product_code AS prod_id, t1.color_code AS color_id FROM edw_ai.dim_sku AS t1 JOIN (SELECT product_year, product_quarter FROM edw_ai.config_target_product WHERE day_date = '{0}') AS t2 ON t1.sku_year = t2.product_year AND t1.sku_quarter = t2.product_quarter) SELECT t1.product_code AS prod_id, t1.color_code AS color_id, t1.size_code AS size_id, t1.size_code_num AS lfs FROM edw.skc_main_size AS t1 JOIN prod_info_targ AS t2 ON t1.product_code = t2.prod_id AND t1.color_code = t2.color_id with t1 as (SELECT stockorg_code as store_id, org_code AS mng_reg_id FROM edw_ai.dim_stockorg), t2 as (SELECT send_stockorg_code as org_id, receive_stockorg_code, t1.store_id, t1.mng_reg_id FROM edw_ai.dim_stockorg_relationship join t1 on t1.store_id = edw_ai.dim_stockorg_relationship.receive_stockorg_code where move_type = '发货') select store_id, mng_reg_id, org_id, receive_stockorg_code from t2

    偶尔还会用到的命令:

    COALESCE(t1.qty, 0) 这个函数主要用来进行空值处理。即返回其参数中第一个非空表达式,如果所有参数均为 NULL,则 COALESCE 返回 NULL。至少应有一个 Null 值为 NULL 类型。 sql = ''' WITH prod_info_targ AS (SELECT t1.product_code AS prod_id, t1.color_code AS color_id, t1.size_code AS size_id FROM edw_ai.dim_sku AS t1 JOIN (SELECT product_year, product_quarter FROM edw_ai.config_target_product WHERE day_date = '{1}') AS t2 ON t1.sku_year = t2.product_year AND t1.sku_quarter = t2.product_quarter), org_info_targ AS (SELECT DISTINCT stockorg_code AS org_id FROM edw_ai.dim_stockorg WHERE status = '正常' AND org_flag = '1') SELECT t1.product_code AS prod_id, t1.color_code AS color_id, t1.size_code AS size_id, t1.org_code AS org_id, t1.sale_date AS date_sell, SUM(COALESCE(t1.qty, 0)) AS s FROM edw_ai.fct_sales AS t1 JOIN prod_info_targ AS t2 ON t1.product_code = t2.prod_id AND t1.color_code = t2.color_id AND t1.size_code = t2.size_id JOIN org_info_targ AS t3 ON t1.org_code = t3.org_id WHERE t1.sale_date BETWEEN '{0}' AND '{1}' GROUP BY t1.product_code, t1.color_code, t1.size_code, t1.org_code, t1.sale_date '''.format((date_decision - relativedelta(weeks=4)) .strftime('%Y-%m-%d'), (date_decision - relativedelta(days=1)) .strftime('%Y-%m-%d')) sales_info = spark.sql(sql).toPandas() sql = ''' select product_code as prod_id,color_code as color_id ,receive_org_code as org_id ,min(receive_date)as first_date from edw_ai.fct_stock_move where send_date BETWEEN '{0}' and '{1}' group by product_code,color_code,receive_org_code '''.format((date_decision - relativedelta(years=1)) .strftime('%Y-%m-%d'), (date_decision - relativedelta(days=1)) .strftime('%Y-%m-%d')) data_type = {'dt': ['first_date'], 'int': [], 'float': [], 'str': ['prod_id', 'color_id', 'org_id']} first_rec_date = spark.sql(sql).toPandas() first_rec_date = self.trans_data(first_rec_date, data_type) # 转换类型函数 def trans_data(a,b) df_t = df.copy() for col in data_type['dt']: if col in df_t.columns: df_t[col] = pd.to_datetime(df_t[col]) for col in data_type['int']: if col in df_t.columns: df_t[col] = df_t[col].astype('float').astype('int') for col in data_type['float']: if col in df_t.columns: df_t[col] = df_t[col].astype('float') for col in data_type['str']: if col in df_t.columns: df_t[col] = df_t[col].astype('str') SELECT stockorg_code AS org_id, province AS prov_id, city AS city_id, reserved1 AS dist_id, org_code AS mng_reg_id, brand_code, (CASE org_flag WHEN 1 THEN 1 ELSE 0 END) AS is_store FROM edw_ai.dim_stockorg WHERE status = '正常' and ((stockorg_code <> '9686' and org_flag= 2) OR org_flag = 1 )

    COALESCE ( expression,value1,value2……,valuen) COALESCE()函数的第一个参数expression为待检测的表达式,而其后的参数个数不定。 COALESCE()函数将会返回包括expression在内的所有参数中的第一个非空表达式。如果 expression不为空值则返回expression;否则判断value1是否是空值,如果value1不为空值则返 回value1;否则判断value2是否是空值,如果value2不为空值则返回value3;……以此类推, 如果所有的表达式都为空值,则返回NULL。

    GROUP BY 统计某一个组的信息。如: SELECT Customer,SUM(OrderPrice) FROM Orders GROUP BY Customer 即统计customer的orderprice之和。

    BETWEEN ‘2019-04-15’ AND ‘2019-05-01’ 选取介于两个值之间的数据范围。这些值可以是数值、文本或者日期。 有的是介于两者之间,不包含边界;有的是包含左界,不包含右届。

    SUM(t1.qty) AS q

    最新回复(0)