Spark _25.plus _ Loading Hive data into a DataFrame/DataSet from IDEA (Part 4)
This is a correction to Spark _25 _ Loading Hive data into a DataFrame/DataSet (Part 4):
https://georgedage.blog.csdn.net/article/details/103091542
That's right, it came completely out of nowhere.
I forgot to cover running in local mode from IDEA:
First, the directory structure.

The Java API:
package com.henu;

import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.hive.HiveContext;

/**
 * @author George
 * @description
 * Read data from Hive and load it as a DataFrame
 */
public class HiveDemo {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf();
        conf.setMaster("local");
        conf.setAppName("hive");
        SparkContext sc = new SparkContext(conf);
        // HiveContext is a subclass of SQLContext.
        /*
         * Friendly reminder: as of 2.3.1, HiveContext has been replaced by
         * SparkSession.builder().enableHiveSupport()
         */
        HiveContext hiveContext = new HiveContext(sc);
        hiveContext.sql("use spark");
        hiveContext.sql("drop table if exists student_infos");
        // Create the student_infos table in Hive
        hiveContext.sql("create table if not exists student_infos(name String,age Int) " +
                "row format delimited fields terminated by ' '");
        hiveContext.sql("load data local inpath './student_info' into table student_infos");
        hiveContext.sql("drop table if exists student_scores");
        hiveContext.sql("create table if not exists student_scores (name String,score Int) " +
                "row format delimited fields terminated by ' '");
        hiveContext.sql("load data local inpath './student_scores' into table student_scores");
        /*
         * Query the tables to produce a Dataset
         */
        Dataset<Row> dataset = hiveContext.sql("SELECT si.name, si.age, ss.score "
                + "FROM student_infos si "
                + "JOIN student_scores ss "
                + "ON si.name=ss.name "
                + "WHERE ss.score>=80");
        // dataset.show();
        hiveContext.sql("drop table if exists good_student_infos");
        dataset.registerTempTable("goodstudent");
        Dataset<Row> sql = hiveContext.sql("select * from goodstudent");
        sql.show();
        /*
         * Save the result to the Hive table good_student_infos
         */
        dataset.write().mode(SaveMode.Overwrite).saveAsTable("good_student_infos");
        sc.stop();
    }
}

That's right, just run it.
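To make the HiveQL query above concrete: the JOIN matches rows from the two tables by name, and the WHERE clause keeps only scores of 80 or higher. Here is a minimal plain-Java sketch of that same logic, using made-up sample rows (the names and numbers are illustrative, not the actual table contents):

```java
import java.util.*;
import java.util.stream.*;

public class GoodStudents {
    // Mirrors: SELECT si.name, si.age, ss.score FROM student_infos si
    //          JOIN student_scores ss ON si.name = ss.name WHERE ss.score >= 80
    static List<String> join(Map<String, Integer> infos, Map<String, Integer> scores) {
        return infos.entrySet().stream()
                .filter(si -> scores.containsKey(si.getKey()))   // ON si.name = ss.name
                .filter(si -> scores.get(si.getKey()) >= 80)     // WHERE ss.score >= 80
                .map(si -> si.getKey() + "," + si.getValue() + "," + scores.get(si.getKey()))
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical sample rows standing in for the two Hive tables
        Map<String, Integer> infos  = Map.of("George", 22, "kangkang", 20, "limu", 19);
        Map<String, Integer> scores = Map.of("George", 100, "kangkang", 100, "limu", 60);
        System.out.println(join(infos, scores)); // limu is filtered out (score 60 < 80)
    }
}
```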
Then head over to Hive on the Linux machine:
0: jdbc:hive2://henu2:10000> show tables;
+---------------------+--+
|      tab_name       |
+---------------------+--+
| good_student_infos  |
| student_infos       |
| student_scores      |
+---------------------+--+
3 rows selected (0.176 seconds)
0: jdbc:hive2://henu2:10000> select * from good_student_infos;
+--------------------------+-------------------------+---------------------------+--+
| good_student_infos.name  | good_student_infos.age  | good_student_infos.score  |
+--------------------------+-------------------------+---------------------------+--+
| George                   | 22                      | 100                       |
| kangkang                 | 20                      | 100                       |
| GeorgeDage               | 28                      | 98                        |
| limu                     | 1                       | 120                       |
+--------------------------+-------------------------+---------------------------+--+

Nice!!!
Then the Scala API:
package com.henu

import org.apache.spark.sql.{SaveMode, SparkSession}

object HiveScalaDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CreateDataFrameFromHive")
      .master("local")
      .enableHiveSupport()
      .getOrCreate()
    spark.sql("use spark")
    spark.sql("drop table if exists student_infos")
    // Create the student_infos table in Hive
    spark.sql("create table if not exists student_infos(name String,age Int) " +
      "row format delimited fields terminated by ' '")
    spark.sql("load data local inpath './student_info' into table student_infos")
    spark.sql("drop table if exists student_scores")
    spark.sql("create table if not exists student_scores (name String,score Int) " +
      "row format delimited fields terminated by ' '")
    spark.sql("load data local inpath './student_scores' into table student_scores")
    val df = spark.sql("select si.name,si.age,ss.score from student_infos si,student_scores ss where si.name = ss.name")
    spark.sql("drop table if exists good_student_infos")
    /*
     * Write the result into the Hive table
     */
    df.write.mode(SaveMode.Overwrite).saveAsTable("good_student_infos")
    spark.stop()
  }
}

Go to Hive on the Linux machine to check the result:
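Both versions write with SaveMode.Overwrite, which drops and recreates the target table rather than appending to it. That is why re-running the job replaces the previous contents of good_student_infos. The effect can be sketched with a hypothetical in-memory "table store" (this is only an illustration of the semantics, not Spark's actual implementation):

```java
import java.util.*;

public class SaveModeSketch {
    // Hypothetical in-memory stand-in for the metastore: table name -> rows
    static Map<String, List<String>> store = new HashMap<>();

    // overwrite=true mimics SaveMode.Overwrite (drop-and-recreate);
    // overwrite=false mimics SaveMode.Append
    static void saveAsTable(String table, List<String> rows, boolean overwrite) {
        if (overwrite) {
            store.put(table, new ArrayList<>(rows));
        } else {
            store.computeIfAbsent(table, k -> new ArrayList<>()).addAll(rows);
        }
    }

    public static void main(String[] args) {
        saveAsTable("good_student_infos", List.of("old,1,1"), true);
        saveAsTable("good_student_infos", List.of("George,22,100"), true); // replaces the old row
        System.out.println(store.get("good_student_infos"));
    }
}
```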
0: jdbc:hive2://henu2:10000> show tables;
+---------------------+--+
|      tab_name       |
+---------------------+--+
| good_student_infos  |
| student_infos       |
| student_scores      |
+---------------------+--+
3 rows selected (0.17 seconds)
0: jdbc:hive2://henu2:10000> select * from good_student_infos;
+--------------------------+-------------------------+---------------------------+--+
| good_student_infos.name  | good_student_infos.age  | good_student_infos.score  |
+--------------------------+-------------------------+---------------------------+--+
| George                   | 22                      | 100                       |
| kangkang                 | 20                      | 100                       |
| GeorgeDage               | 28                      | 98                        |
| new                      | 1                       | 100000                    |
+--------------------------+-------------------------+---------------------------+--+
4 rows selected (0.245 seconds)

(As you can see, I swapped in a different row so the two runs can be told apart.)
Done!!!
If you run into insufficient JVM memory, don't panic, I've got you covered: https://georgedage.blog.csdn.net/article/details/103092637
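The linked post covers this in detail. In short, since these examples run with master("local"), the driver lives inside IDEA's own JVM, so the heap is controlled by the run configuration's VM options rather than by SparkConf. A typical setting looks like this (the value is illustrative, adjust to your machine):

```
-Xmx2048m
```

Put it in Run > Edit Configurations > VM options for the class you are launching.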