Spark data types
RDD
Creating an RDD
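A minimal sketch of the two usual ways to create an RDD: parallelizing a local collection and reading a text file. The local-mode `SparkSession` and the file path are illustrative, not part of the original notes.

```scala
import org.apache.spark.sql.SparkSession

object CreateRddExample {
  def main(args: Array[String]): Unit = {
    // Local-mode session for illustration; on a cluster this comes from spark-submit
    val spark = SparkSession.builder().master("local[*]").appName("create-rdd").getOrCreate()
    val sc = spark.sparkContext

    // 1) From an in-memory collection
    val nums = sc.parallelize(Seq(1, 2, 3, 4, 5))
    println(nums.count())

    // 2) From a text file (placeholder path; one RDD element per line)
    // val lines = sc.textFile("hdfs:///path/to/input.txt")

    spark.stop()
  }
}
```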
RDD operations
- union
- intersection
- distinct
- groupByKey
- reduceByKey
- sortByKey
- join / leftOuterJoin / rightOuterJoin
- aggregate
- reduce
- count
- first
- take
- takeSample
- takeOrdered
- saveAsTextFile
- countByKey
- foreach
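The operations above split into transformations (union, reduceByKey, sortByKey, join, ...) and actions (reduce, count, first, take, ...). A hedged sketch exercising a few of them on a toy pair RDD (the data and names are made up for illustration):

```scala
import org.apache.spark.sql.SparkSession

object RddOpsExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("rdd-ops").getOrCreate()
    val sc = spark.sparkContext

    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

    // reduceByKey: merge values per key -> ("a", 4), ("b", 2)
    val sums = pairs.reduceByKey(_ + _)

    // sortByKey orders by key; collect is an action that pulls results to the driver
    val sorted = sums.sortByKey().collect()
    println(sorted.mkString(", "))

    // count / first / take are also actions
    println(pairs.count())

    // join matches two pair RDDs on their keys -> ("a", (4, "x"))
    val other = sc.parallelize(Seq(("a", "x")))
    println(sums.join(other).collect().mkString(", "))

    spark.stop()
  }
}
```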
DataFrame
DataSet
Differences between RDD, DataFrame and DataSet

| | RDD | DataFrame | DataSet |
| --- | --- | --- | --- |
| Difference 1 | does not support Spark SQL | supported | supported |
| Difference 2 | - | a DataFrame is a DataSet[Row] | - |
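In Scala Spark 2.x, `DataFrame` is literally a type alias for `Dataset[Row]`, which can be checked directly; this minimal sketch assumes a local-mode session:

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

object DataFrameIsDatasetRow {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("df-is-ds-row").getOrCreate()
    import spark.implicits._

    val df: DataFrame = Seq((1, "a"), (2, "b")).toDF("id", "name")
    // Because DataFrame = Dataset[Row], this assignment compiles with no conversion
    val ds: Dataset[Row] = df
    println(ds.count())

    spark.stop()
  }
}
```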
Converting between RDD, DataFrame and DataSet
| From \ To | RDD | DataFrame | DataSet |
| --- | --- | --- | --- |
| RDD | - | val rdd = sc.textFile(""); case class Person(name: String, age: String); val a = rdd.map(_.split(",")).map(line => Person(line(0), line(1))).toDF | val rdd = sc.textFile(""); case class Person(name: String, age: String); val a = rdd.map(_.split(",")).map(line => Person(line(0), line(1))).toDS |
| DataFrame | val rdd1 = testDF.rdd | - | val testDS = testDF.as[Coltest] |
| DataSet | val rdd2 = testDS.rdd | val testDF = testDS.toDF | - |
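The conversions in the table can be chained end to end; the `Person` case class and the sample data below are assumptions for illustration. Note that `toDF` / `toDS` / `as[...]` all require `import spark.implicits._`:

```scala
import org.apache.spark.sql.SparkSession

object ConversionExample {
  // Case class defined outside main so Spark can derive an Encoder for it
  case class Person(name: String, age: Long)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("conversions").getOrCreate()
    import spark.implicits._

    val rdd = spark.sparkContext.parallelize(Seq(Person("ann", 30), Person("bob", 25)))

    val df   = rdd.toDF()      // RDD       -> DataFrame
    val ds   = df.as[Person]   // DataFrame -> DataSet
    val back = ds.rdd          // DataSet   -> RDD

    println(back.count())
    spark.stop()
  }
}
```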
Data types
- LabeledPoint to Libsvm
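An RDD of `LabeledPoint` can be written out in libsvm format with `MLUtils.saveAsLibSVMFile` from the RDD-based mllib API; the sample points and output directory below are placeholders (the directory must not already exist):

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.sql.SparkSession

object LibsvmExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("libsvm").getOrCreate()
    val sc = spark.sparkContext

    // A LabeledPoint is a label plus a feature vector; libsvm stores it
    // as "label index:value index:value ..." with 1-based indices
    val points = sc.parallelize(Seq(
      LabeledPoint(1.0, Vectors.dense(0.5, 0.0, 2.0)),
      LabeledPoint(0.0, Vectors.sparse(3, Array(1), Array(1.5)))
    ))

    // Placeholder output directory; fails if it already exists
    MLUtils.saveAsLibSVMFile(points, "/tmp/libsvm-out")

    spark.stop()
  }
}
```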
Supported input file formats

json, parquet, jdbc, orc, libsvm, csv, text
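All of these formats go through the same `DataFrameReader` / `DataFrameWriter` interface. A sketch that writes and reads parquet so it runs without external files; the `/tmp` path and the commented-out reads are illustrative:

```scala
import org.apache.spark.sql.SparkSession

object ReadWriteFormats {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("formats").getOrCreate()
    import spark.implicits._

    val df = Seq(("ann", 30), ("bob", 25)).toDF("name", "age")

    // Round-trip through parquet (placeholder path)
    val path = "/tmp/people-parquet"
    df.write.mode("overwrite").parquet(path)
    val back = spark.read.parquet(path)
    println(back.count())

    // The same reader handles the other formats, e.g.:
    // spark.read.option("header", "true").csv("people.csv")
    // spark.read.format("libsvm").load("sample.txt")
    spark.stop()
  }
}
```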
val peopleDF = spark.read.format("json").load("examples/src/main/resources/people.json")

Reference

https://blog.csdn.net/gongpulin/article/details/77622107
Summary