
StructField, StructType, and Schema

Published: 2023/12/8

1. StructField

Source definition:

case class StructField(
    name: String,
    dataType: DataType,
    nullable: Boolean = true,
    metadata: Metadata = Metadata.empty) {}

A StructField is a field inside a StructType.
name: The name of this field.
dataType: The data type of this field.
nullable: Indicates if values of this field can be null values.
metadata: The metadata of this field. The metadata should be preserved during transformation if the content of the column is not modified, e.g., in selection.

A StructField inside a struct is like a column in SQL: it carries the full description of that field. See the following example:

import org.apache.spark.sql.types._

def schema_StructField() = {
  /**
   * StructField is a case class. nullable defaults to true and
   * metadata is initially empty.
   * It describes a single field of a StructType.
   */
  val sf = StructField("b", IntegerType)
  println(sf.name)     // b
  println(sf.dataType) // IntegerType
  println(sf.nullable) // true
  println(sf.metadata) // {}
}
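The example above leaves metadata at its default. As a minimal sketch of the fourth parameter, the following attaches a metadata entry to a field with MetadataBuilder. The key "comment" and its value are hypothetical and chosen only for illustration; they do not come from the original article.

```scala
import org.apache.spark.sql.types._

object StructFieldMetadataSketch {
  // Hypothetical metadata entry; the key "comment" is an assumption for this sketch.
  val meta: Metadata = new MetadataBuilder()
    .putString("comment", "user age in years")
    .build()

  // All four StructField parameters are given explicitly here.
  val age: StructField =
    StructField("age", IntegerType, nullable = false, metadata = meta)

  def main(args: Array[String]): Unit = {
    println(age.nullable)                      // false
    println(age.metadata.contains("comment"))  // true
    println(age.metadata.getString("comment")) // user age in years
  }
}
```

Because metadata is an ordinary constructor parameter, it travels with the StructField wherever the field is copied or placed into a StructType.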

2. StructType

A StructType object can be constructed by

StructType(fields: Seq[StructField])

A StructType object can hold multiple StructFields, and a field can be extracted by its name, much as a Map returns a value for a key. The difference is that StructType returns the full description of the field.

In the source, StructType is a case class:

case class StructType(fields: Array[StructField]) extends DataType with Seq[StructField] {}

It extends Seq, so it supports every Seq operation; each element of the sequence is a StructField.

package Dataset

import org.apache.spark.sql.types._

/**
 * Created by root on 9/21/16.
 */
object schemaAnalysis {

  //---------------- StructType analysis ----------------
  val struct = StructType(
    StructField("a", IntegerType) ::
    StructField("b", LongType, false) ::
    StructField("c", BooleanType, false) :: Nil)

  def schema_StructType() = {
    // A schema can also be built field by field with add.
    val schemaTyped = new StructType()
      .add("a", "int")
      .add("b", "string")
    schemaTyped.foreach(println)
    // StructField(a,IntegerType,true)
    // StructField(b,StringType,true)
  }

  def structType_extracted() = {
    // Extract a single StructField.
    val singleField_a = struct("a")
    println(singleField_a)
    // nullable was omitted, so it defaults to true:
    // StructField(a,IntegerType,true)

    val singleField_b = struct("b")
    println(singleField_b)
    // StructField(b,LongType,false)

    // val nonExisting = struct("d")
    // println(nonExisting)
    // java.lang.IllegalArgumentException: Field "d" does not exist.

    // Extract multiple StructFields. Field names are provided in a set.
    // A StructType object will be returned.
    val twoFields = struct(Set("b", "c"))
    println(twoFields)
    // StructType(StructField(b,LongType,false), StructField(c,BooleanType,false))

    // Any names without matching fields will be ignored.
    // For the case shown below, "d" will be ignored and
    // it is treated as struct(Set("b", "c")).
    val ignoreNonExisting = struct(Set("b", "c", "d"))
    println(ignoreNonExisting)
    // ignoreNonExisting: StructType =
    //   StructType(List(StructField(b,LongType,false), StructField(c,BooleanType,false)))

    // Note: for a field that does not exist, the official documentation says a
    // single-name lookup returns null and a multi-name lookup ignores the missing name.
    // In my experiment, however, it failed with: Field d does not exist.
    // The source goes through the apply method, which does not handle this case yet.
    // I used the initial Spark 2.0 release.
  }

  def structType_opration() = {
    /**
     * Source: case class StructType(fields: Array[StructField]) extends DataType with Seq[StructField] {
     * It extends Seq, so StructType has every Seq operation.
     * See the Scala Seq API: http://www.scala-lang.org/api/current/#scala.collection.Seq
     */
    val tmpStruct = StructType(StructField("d", IntegerType) :: Nil)

    // Collection with collection
    println(struct ++ tmpStruct)
    // println(struct ++: tmpStruct)
    // List(StructField(a,IntegerType,true), StructField(b,LongType,false), StructField(c,BooleanType,false), StructField(d,IntegerType,true))

    // Collection with element
    println(struct :+ StructField("d", IntegerType))

    // add can be used as well
    println(struct.add("e", IntegerType))
    // StructType(StructField(a,IntegerType,true), StructField(b,LongType,false), StructField(c,BooleanType,false), StructField(e,IntegerType,true))

    // First element
    println(struct.head)
    // StructField(a,IntegerType,true)

    // Last element
    println(struct.last)
    // StructField(c,BooleanType,false)

    println(struct.apply("a"))
    // StructField(a,IntegerType,true)

    println(struct.treeString)
    /**
     * root
     *  |-- a: integer (nullable = true)
     *  |-- b: long (nullable = false)
     *  |-- c: boolean (nullable = false)
     */

    println(struct.contains(StructField("f", IntegerType)))
    // false

    println(struct.mkString)
    // StructField(a,IntegerType,true)StructField(b,LongType,false)StructField(c,BooleanType,false)

    println(struct.prettyJson)
    /**
     * {
     *   "type" : "struct",
     *   "fields" : [ {
     *     "name" : "a",
     *     "type" : "integer",
     *     "nullable" : true,
     *     "metadata" : { }
     *   }, {
     *     "name" : "b",
     *     "type" : "long",
     *     "nullable" : false,
     *     "metadata" : { }
     *   }, {
     *     "name" : "c",
     *     "type" : "boolean",
     *     "nullable" : false,
     *     "metadata" : { }
     *   } ]
     * }
     */

    // More operations are listed in the API:
    // http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.types.StructType
  }

  def main(args: Array[String]): Unit = {
    // schema_StructType()
    // structType_extracted()
    structType_opration()
  }
}
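Since looking up a missing field by name throws an IllegalArgumentException in the experiment above, one way to avoid the exception is to search the fields array directly. The following is a minimal sketch of that idea; the helper names fieldOption and selectExisting are hypothetical and are not part of the Spark API.

```scala
import org.apache.spark.sql.types._

object SafeFieldLookupSketch {
  val struct: StructType = StructType(
    StructField("a", IntegerType) ::
    StructField("b", LongType, false) ::
    StructField("c", BooleanType, false) :: Nil)

  // Hypothetical helper: returns None instead of throwing when the field is missing.
  def fieldOption(schema: StructType, name: String): Option[StructField] =
    schema.fields.find(_.name == name)

  // Hypothetical helper: keeps the requested fields that exist and ignores unknown names.
  def selectExisting(schema: StructType, names: Set[String]): StructType =
    StructType(schema.fields.filter(f => names.contains(f.name)))

  def main(args: Array[String]): Unit = {
    println(fieldOption(struct, "a").isDefined) // true
    println(fieldOption(struct, "d").isDefined) // false
    println(selectExisting(struct, Set("b", "c", "d")).fieldNames.mkString(",")) // b,c
  }
}
```

Both helpers only use the public fields array, so they should behave the same regardless of how a given Spark version implements apply.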

3. Schema

A schema is the description of the structure of our data.

A schema describes a data structure (for example, the layout of a JSON file). It can be inferred implicitly at runtime or specified at compile time. It is expressed as a StructType, which is a collection of StructField objects, each viewed as a triple of name, type, and nullability. A StructField actually carries four pieces of information, so why call it a triple? The fourth, metadata, is still there and can be retrieved when needed.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.StructField

// Defined outside the method so that Spark can derive an encoder for it.
case class Person(name: String, age: Long)

def schema_op() = {
  val sparkSession = SparkSession.builder()
    .appName("data set example")
    .master("local")
    .getOrCreate()
  import sparkSession.implicits._

  val rdd = sparkSession.sparkContext.textFile("hdfs://master:9000/src/main/resources/people.txt")
  val dataSet = rdd.map(_.split(",")).map(p => Person(p(0), p(1).trim.toLong)).toDS()
  println(dataSet.schema)
  // StructType(StructField(name,StringType,true), StructField(age,LongType,false))

  /**
   * def schema: StructType = queryExecution.analyzed.schema
   *
   * def apply(name: String): StructField = {
   *   nameToField.getOrElse(name,
   *     throw new IllegalArgumentException(s"""Field "$name" does not exist."""))
   * }
   */
  val tmp: StructField = dataSet.schema("name")
  println(tmp)          // StructField(name,StringType,true)
  println(tmp.name)     // name
  println(tmp.dataType) // StringType
  println(tmp.nullable) // true
  println(tmp.metadata) // {}
}
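To make the "triple" view concrete, here is a minimal sketch that builds a schema by hand and projects each StructField onto (name, dataType, nullable), then reads the metadata separately. The field names "name" and "age" are assumptions chosen for illustration.

```scala
import org.apache.spark.sql.types._

object SchemaTripleSketch {
  // A schema specified explicitly rather than inferred from data.
  val schema: StructType = new StructType()
    .add("name", StringType, nullable = true)
    .add("age", LongType, nullable = false)

  // The three-element view of each field.
  val triples: Seq[(String, DataType, Boolean)] =
    schema.fields.toSeq.map(f => (f.name, f.dataType, f.nullable))

  def main(args: Array[String]): Unit = {
    triples.foreach(println)
    // (name,StringType,true)
    // (age,LongType,false)

    // The fourth piece of information, metadata, is still available on each field.
    schema.fields.foreach(f => println(s"${f.name}: ${f.metadata}"))
    // name: {}
    // age: {}
  }
}
```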
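The example above infers the schema from a case class and depends on an HDFS file. As a hedged, self-contained sketch of the other route, the following supplies the schema explicitly and builds a DataFrame from in-memory rows. The sample rows and the object name are assumptions for illustration, and it assumes a local Spark installation on the classpath.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types._

object ExplicitSchemaSketch {
  // Schema written out by hand instead of being derived from a case class.
  val schema: StructType = StructType(Seq(
    StructField("name", StringType, nullable = true),
    StructField("age", LongType, nullable = false)))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("explicit schema sketch")
      .master("local[1]")
      .getOrCreate()

    // Hypothetical sample data standing in for people.txt.
    val rows = spark.sparkContext.parallelize(Seq(Row("Ann", 30L), Row("Bob", 25L)))
    val df = spark.createDataFrame(rows, schema)

    df.printSchema()
    println(df.schema("age").nullable) // false

    spark.stop()
  }
}
```

Supplying the schema up front makes nullability and types explicit choices rather than a result of inference.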
