Hive SQL - Real-Time Maximum Number of Online Users

xiaoxiao 2023-09-30

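We attach a marker value of 1 to each login time and a marker of -1 to each logout time. After sorting all of these events by time, a running sum of the marker gives the number of users online at each moment, and the maximum of that running sum is the result we want.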
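The +1/-1 idea is a sweep over time-ordered events. The sketch below shows it outside of SQL, assuming each session is a hypothetical `(login_ts, logout_ts)` pair of Unix timestamps. Events that share a timestamp are applied together before the peak is checked, which matches how the SQL window behaves by default (a logout and a login in the same second net out).

```python
from itertools import groupby
from typing import Iterable, Tuple


def max_concurrent(sessions: Iterable[Tuple[int, int]]) -> int:
    """Peak number of simultaneously online users.

    sessions: hypothetical (login_ts, logout_ts) pairs as Unix timestamps.
    """
    # Turn each session into two events: +1 at login, -1 at logout
    events = []
    for login_ts, logout_ts in sessions:
        events.append((login_ts, 1))
        events.append((logout_ts, -1))
    events.sort(key=lambda e: e[0])

    online = 0
    peak = 0
    # Apply every event sharing a timestamp at once, then update the peak
    for _, group in groupby(events, key=lambda e: e[0]):
        online += sum(delta for _, delta in group)
        peak = max(peak, online)
    return peak


if __name__ == "__main__":
    print(max_concurrent([(0, 10), (5, 15), (10, 20)]))  # 2
```

In the usage line, the first session ends at second 10 exactly when the third begins, so the peak stays at 2 rather than 3.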

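```sql
select max(max_index)
from (
    -- running total of index from the first row through the current row
    -- (default RANGE frame: rows sharing a timestamp get the same total)
    select sum(index) over(order by `timestamp`) as max_index
    from (
        select order_id,
               unix_timestamp(login_time) as `timestamp`,
               1 as index
        from connection_detail
        where dt = '20190101' and is_td_finish = 1
        union all
        select order_id,
               unix_timestamp(logout_time) as `timestamp`,
               -1 as index
        from connection_detail
        -- note: is_td_finish = 1 filters only the login branch; add it here
        -- too if logouts should cover the same set of connections
        where dt = '20190101'
    ) a  -- login and logout times turned from two columns into rows
) b
```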
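When a window has `order by` but no explicit frame, Hive and Spark SQL default to `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`, so all rows with the same `timestamp` are peers and receive the same running total. The minimal Python emulation below (not Hive itself; the `(ts, index)` rows are hypothetical) contrasts that with a `ROWS` frame, where the result for tied timestamps depends on the arbitrary order of the tied rows.

```python
from typing import List, Tuple


def running_sum_range(rows: List[Tuple[int, int]]) -> List[int]:
    """Emulate sum(index) over(order by ts) with the default RANGE frame."""
    rows = sorted(rows, key=lambda r: r[0])
    totals: List[int] = []
    running = 0
    i = 0
    while i < len(rows):
        j = i
        # Add every peer row (same timestamp) before emitting any total
        while j < len(rows) and rows[j][0] == rows[i][0]:
            running += rows[j][1]
            j += 1
        totals.extend([running] * (j - i))
        i = j
    return totals


def running_sum_rows(rows: List[Tuple[int, int]]) -> List[int]:
    """Emulate a ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW frame."""
    # sorted() is stable, so tied rows keep input order, standing in for
    # whatever order a SQL engine happens to produce
    rows = sorted(rows, key=lambda r: r[0])
    totals: List[int] = []
    running = 0
    for _, index in rows:
        running += index
        totals.append(running)
    return totals


if __name__ == "__main__":
    # One login at 0, a login and a logout both at 10, a logout at 20
    rows = [(0, 1), (10, 1), (10, -1), (20, -1)]
    print(running_sum_range(rows))  # [1, 1, 1, 0]
    print(running_sum_rows(rows))   # [1, 2, 1, 0]
```

With the `ROWS` frame the maximum becomes 2 only because the login at second 10 happened to sort before the logout; the `RANGE` frame gives 1 regardless of tie order.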

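What may keep people from arriving at this approach is the sum() over() construct. For each row, this window function computes the sum from the first row up to that row (with the default frame, rows that share the same timestamp are included together). It is widely documented online, so readers unfamiliar with it can look it up. On data at the scale of tens of millions of rows, this query ran in 65 seconds on Spark SQL, which is an acceptable range. Readers who follow the code will notice that along the way we obtain the number of online users at each moment (subquery b). Visualizing that data gives an intuitive picture of how server load changes over time.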
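The per-moment counts in subquery b can be collapsed to one point per distinct timestamp, which is the shape a step chart of server load needs. The sketch below assumes hypothetical `(login_ts, logout_ts)` session pairs and leaves the plotting tool open; it only builds the `(timestamp, online_count)` series.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def online_series(sessions: Iterable[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Online-user count after each distinct event timestamp (a step function).

    sessions: hypothetical (login_ts, logout_ts) pairs as Unix timestamps.
    """
    # Net change in online users at each timestamp (+1 login, -1 logout)
    deltas: Counter = Counter()
    for login_ts, logout_ts in sessions:
        deltas[login_ts] += 1
        deltas[logout_ts] -= 1

    series: List[Tuple[int, int]] = []
    online = 0
    # Walk timestamps in order, carrying the running count forward
    for ts in sorted(deltas):
        online += deltas[ts]
        series.append((ts, online))
    return series


if __name__ == "__main__":
    print(online_series([(0, 10), (5, 15), (10, 20)]))
    # [(0, 1), (5, 2), (10, 2), (15, 1), (20, 0)]
```

The maximum of the second element in each pair is the same peak the outer `select max(max_index)` returns.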

Original article: https://blog.csdn.net/adrian_wang/article/details/89840671
