Reuse the result of a select expression in the "GROUP BY" clause?

Question

In MySQL, I can write a query like this:

select  
    cast(from_unixtime(t.time, '%Y-%m-%d %H:00') as datetime) as timeHour
    , ... 
from
    some_table t 
group by
    timeHour, ...
order by
    timeHour, ...

Here, timeHour in the GROUP BY clause is the alias of a select expression.

But when I tried a similar query in Spark SQL, I got this error:

Error: org.apache.spark.sql.AnalysisException: 
cannot resolve '`timeHour`' given input columns: ...

My Spark SQL query was this:

select  
      cast(t.unixTime as timestamp) as timeHour
    , ...
from
    another_table as t
group by
    timeHour, ...
order by
    timeHour, ...

Is this construct possible in Spark SQL?

Recommended answer

Is this construct possible in Spark SQL?

Yes, it is. There are two ways to make it work in Spark SQL so that the new column can be used in the GROUP BY and ORDER BY clauses.

Approach 1, using a subquery:

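-- The inner query only derives timeHour; the aggregation happens in the
-- outer query, where timeHour is an ordinary column that can be grouped on.
SELECT timeHour, sum(...) AS someThing FROM (SELECT  
      from_unixtime((starttime/1000)) AS timeHour
    , starttime
    , ...                               -- the column(s) the aggregate needs
FROM
    some_table) AS t
WHERE
    starttime >= 1000*unix_timestamp('2017-09-16 00:00:00')
      AND starttime <= 1000*unix_timestamp('2017-09-16 04:00:00')
GROUP BY
    timeHour
ORDER BY
    timeHour
LIMIT 10;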

Approach 2, using WITH (the more elegant way):

-- create alias: the CTE only derives timeHour, it does not aggregate
WITH table_aliase AS(SELECT  
      from_unixtime((starttime/1000)) AS timeHour
    , starttime
    , ...                               -- the column(s) the aggregate needs
FROM
    some_table)

-- use the alias as a table and aggregate here
SELECT timeHour, sum(...) AS someThing FROM table_aliase
WHERE
    starttime >= 1000*unix_timestamp('2017-09-16 00:00:00')
      AND starttime <= 1000*unix_timestamp('2017-09-16 04:00:00')
GROUP BY
    timeHour
ORDER BY
    timeHour
LIMIT 10;
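Both approaches above wrap the expression in an outer query. As a minimal sketch of a third option, the wrapper can be skipped by either repeating the expression in GROUP BY or referring to the select column by position. The table another_table and the column t.unixTime are taken from the question; count(*) and the alias rowCount are stand-ins for whatever aggregate the real query needs. Grouping by position assumes the spark.sql.groupByOrdinal and spark.sql.orderByOrdinal settings are left at their defaults.

```sql
-- Option A: repeat the select expression in GROUP BY.
SELECT
      cast(t.unixTime AS timestamp) AS timeHour
    , count(*)                      AS rowCount   -- stand-in aggregate
FROM
    another_table AS t
GROUP BY
    cast(t.unixTime AS timestamp)
ORDER BY
    timeHour;   -- ORDER BY may refer to a select alias

-- Option B: group and order by the position of the select column.
SELECT
      cast(t.unixTime AS timestamp) AS timeHour
    , count(*)                      AS rowCount   -- stand-in aggregate
FROM
    another_table AS t
GROUP BY
    1
ORDER BY
    1;
```

As far as I know, newer Spark releases can also resolve select aliases directly in GROUP BY (controlled by spark.sql.groupByAliases), so the original query may work unchanged there; it is worth checking against the version in use.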

Alternative using the Spark DataFrame API (without SQL) in Scala:

// Assumes a SparkSession named `spark` is in scope (as in spark-shell).
import org.apache.spark.sql.functions._
import spark.implicits._   // enables the $"column" syntax

val df = .... //load the actual table as df

df.withColumn("timeHour", from_unixtime($"starttime"/1000))
  .groupBy($"timeHour")
  .agg(sum("...").as("someThing"))
  .orderBy($"timeHour")
  .show()

//another way - as per eliasah comment
df.groupBy(from_unixtime($"starttime"/1000).as("timeHour"))
  .agg(sum("...").as("someThing"))
  .orderBy($"timeHour")
  .show()
