Spark _28_ Window Functions

Published: 2024/2/28

Window Functions

Note:

The row_number() window function partitions rows by one column and then takes the top few values of another column within each partition; in effect, it is a grouped top-N.

If a SQL statement uses a window function, it must be executed with HiveContext, which by default cannot be created in local mode. (In Spark 2.x and later, a SparkSession built with enableHiveSupport() takes over this role, as the Scala example in this article shows.) MySQL 8 has also added window functions.

Window function syntax:

row_number() over (partition by XXX order by XXX)
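Before the Spark code, the grouped top-N that this pattern computes can be sketched in plain Java, with no Spark dependency. This is only an illustration of the semantics; the class name, the sample (category, amount) pairs, and the cutoff of 3 are assumptions chosen to mirror the sales example below:

```java
import java.util.*;
import java.util.stream.*;

public class TopNPerGroup {
    public static void main(String[] args) {
        // (category, amount) records, like the sales table used later
        List<String[]> sales = Arrays.asList(
            new String[]{"A", "200"}, new String[]{"A", "100"},
            new String[]{"A", "80"},  new String[]{"A", "60"},
            new String[]{"B", "100"}, new String[]{"B", "90"},
            new String[]{"B", "80"},  new String[]{"B", "70"});

        // "partition by category": group amounts per category (TreeMap for stable order)
        Map<String, List<Integer>> top3 = sales.stream()
            .collect(Collectors.groupingBy(r -> r[0], TreeMap::new,
                Collectors.mapping(r -> Integer.parseInt(r[1]), Collectors.toList())));

        // "order by amount desc ... where row_number <= 3": sort each group, keep top 3
        top3.values().forEach(v -> {
            v.sort(Comparator.reverseOrder());
            if (v.size() > 3) v.subList(3, v.size()).clear();
        });
        System.out.println(top3); // {A=[200, 100, 80], B=[100, 90, 80]}
    }
}
```

In SQL the same effect needs the subquery-plus-filter shape shown in the examples below, because a window function cannot appear directly in a WHERE clause.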

Sample data:

Java:

SparkConf conf = new SparkConf();
conf.setAppName("windowfun");
JavaSparkContext sc = new JavaSparkContext(conf);
HiveContext hiveContext = new HiveContext(sc);
hiveContext.sql("use spark");
hiveContext.sql("drop table if exists sales");
hiveContext.sql("create table if not exists sales (riqi string,leibie string,jine Int) "
        + "row format delimited fields terminated by '\t'");
hiveContext.sql("load data local inpath '/root/test/sales' into table sales");
/**
 * Window function syntax:
 * row_number() over (partition by XXX order by XXX)
 */
DataFrame result = hiveContext.sql("select riqi,leibie,jine "
        + "from ("
        + "select riqi,leibie,jine,"
        + "row_number() over (partition by leibie order by jine desc) rank "
        + "from sales) t "
        + "where t.rank<=3");
result.show();
sc.stop();

Scala API:

import org.apache.spark.sql.{SaveMode, SparkSession}

/**
 * over window function
 * row_number() over(partition by xx order by xx) as rank
 * rank starts from 1 within each partition
 */
object OverFunctionOnHive {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("over").enableHiveSupport().getOrCreate()
    spark.sql("use spark")
    spark.sql("create table if not exists sales (riqi string,leibie string,jine Int) " +
      "row format delimited fields terminated by '\t'")
    spark.sql("load data local inpath '/root/test/sales' into table sales")
    /**
     * rank starts from 1 within each partition:
     * 5 A 200 --- 1
     * 3 A 100 --- 2
     * 4 A 80  --- 3
     * 7 A 60  --- 4
     *
     * 1 B 100 --- 1
     * 8 B 90  --- 2
     * 6 B 80  --- 3
     * 1 B 70  --- 4
     */
    val result = spark.sql("select" +
      " riqi,leibie,jine " +
      "from (" +
      "select " +
      "riqi,leibie,jine,row_number() over (partition by leibie order by jine desc) rank " +
      "from sales) t " +
      "where t.rank<=3")
    result.write.mode(SaveMode.Append).saveAsTable("salesResult")
    result.show(100)
  }
}


Here is an easier-to-understand example:

Rank

Data preparation:

{"name":"喬治","pro":"語文","score":87}
{"name":"喬治","pro":"數(shù)學(xué)","score":95}
{"name":"喬治","pro":"英語","score":68}
{"name":"大海","pro":"語文","score":94}
{"name":"大海","pro":"數(shù)學(xué)","score":56}
{"name":"大海","pro":"英語","score":84}
{"name":"宋宋","pro":"語文","score":64}
{"name":"宋宋","pro":"數(shù)學(xué)","score":86}
{"name":"宋宋","pro":"英語","score":84}
{"name":"婷婷","pro":"語文","score":65}
{"name":"婷婷","pro":"數(shù)學(xué)","score":85}
{"name":"婷婷","pro":"英語","score":78}

Scala API:

package com.lianxi

import org.apache.spark.sql.{DataFrame, SparkSession}

object RankDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("rank").getOrCreate()
    spark.sparkContext.setLogLevel("error")
    val df: DataFrame = spark.read.json("./data/rank")
    // df.show()
    df.createOrReplaceTempView("score")
    spark.sql("select " +
      "name," +
      "pro," +
      "score," +
      "rank() over(partition by pro order by score desc) rp," +
      "dense_rank() over(partition by pro order by score desc) drp," +
      "row_number() over(partition by pro order by score desc) rmp " +
      "from score").show()
    spark.stop()
  }
}

Result:

The syntax is exactly the same as what we used earlier in Hive.

Hive: Niche but Useful Query Functions (3), with Examples (Rank)

https://georgedage.blog.csdn.net/article/details/102923071

Summary:

RANK() 排序相同時會重復(fù),總數(shù)不會變

DENSE_RANK() 排序相同時會重復(fù),總數(shù)會減少

ROW_NUMBER() 會根據(jù)順序計算
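The difference only shows up on ties. A minimal plain-Java sketch (no Spark needed; the score list and class name are made up for illustration) computes all three numberings over one descending-sorted partition:

```java
import java.util.*;

public class RankKinds {
    public static void main(String[] args) {
        int[] scores = {95, 90, 90, 80};           // one partition, already sorted desc
        int[] rank = new int[scores.length];       // RANK(): ties share, gaps follow
        int[] denseRank = new int[scores.length];  // DENSE_RANK(): ties share, no gaps
        int[] rowNumber = new int[scores.length];  // ROW_NUMBER(): always consecutive
        int dense = 0;
        for (int i = 0; i < scores.length; i++) {
            rowNumber[i] = i + 1;
            if (i > 0 && scores[i] == scores[i - 1]) {
                // tie: copy both rank values from the previous row
                rank[i] = rank[i - 1];
                denseRank[i] = denseRank[i - 1];
            } else {
                rank[i] = i + 1;   // position-based, so a gap appears after ties
                dense++;           // increments only on a new distinct value
                denseRank[i] = dense;
            }
        }
        System.out.println("rank       = " + Arrays.toString(rank));       // [1, 2, 2, 4]
        System.out.println("dense_rank = " + Arrays.toString(denseRank));  // [1, 2, 2, 3]
        System.out.println("row_number = " + Arrays.toString(rowNumber));  // [1, 2, 3, 4]
    }
}
```

The same three sequences appear in the English-subject partition of the rank demo above, where 大海 and 宋宋 both score 84.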
