spark-streaming first insight

Published: 2023/12/13

I.

Spark Streaming is built on top of the Spark core API. It is a scalable, high-throughput, fault-tolerant stream processing module.

1) It supports many data sources, such as Kafka, Flume, sockets, and files:

  • Basic sources: Sources directly available in the StreamingContext API. Examples: file systems, and socket connections.
  • Advanced sources: Sources like Kafka, Flume, Kinesis, etc. are available through extra utility classes. These require linking against extra dependencies.

2) Processed data can be written to many destinations, such as Kafka, HDFS, and local files.

DStream:

Spark Streaming provides a high-level abstraction for continuously arriving data:

It represents a continuous stream of data.

A DStream is represented by a continuous series of RDDs; each RDD in a DStream contains data from a certain interval.

Any operation applied on a DStream translates to operations on the underlying RDDs.
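
The relationship between a DStream and its RDDs can be pictured with a small pure-Python model. This is only a sketch and does not use the Spark API: each inner list stands in for one RDD (one batch interval), and `map_stream` is a hypothetical helper showing how one stream-level operation becomes the same operation on every batch.

```python
from typing import Callable, List

# Hypothetical stand-in: each inner list plays the role of one RDD,
# holding the records that arrived during one batch interval.
Batch = List[str]


def map_stream(batches: List[Batch], fn: Callable[[str], str]) -> List[Batch]:
    """Apply fn to every record, one batch (RDD) at a time."""
    return [[fn(record) for record in batch] for batch in batches]


# Three batch intervals' worth of data; the second interval received nothing.
stream = [["a", "b"], [], ["c"]]
upper = map_stream(stream, str.upper)
print(upper)
```

The result keeps one output batch per input batch, including the empty one, which mirrors the idea that a stream operation is applied interval by interval.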

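What is an RDD?

RDD stands for Resilient Distributed Dataset. It is the most important concept in Spark.

An RDD is a read-only, partitioned, fault-tolerant collection of data.

What does "resilient" mean?

An RDD can move freely between memory and disk.

An RDD can be transformed into other RDDs and can be derived from other RDDs.

An RDD can store data of any type.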
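
As a rough illustration of "read-only, partitioned, and derived from other RDDs", here is a toy model in plain Python. `MiniRDD` is a hypothetical class invented for this sketch, not part of Spark; it only shows that a transformation returns a new dataset and leaves the original untouched, while keeping a link to its parent.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class MiniRDD:
    """Hypothetical toy model of an RDD: immutable, partitioned, with a parent link."""
    partitions: Tuple[Tuple[object, ...], ...]
    parent: Optional["MiniRDD"] = None

    def map(self, fn: Callable[[object], object]) -> "MiniRDD":
        # A transformation never mutates this dataset; it builds a new one
        # that remembers which dataset it was derived from.
        new_parts = tuple(tuple(fn(x) for x in part) for part in self.partitions)
        return MiniRDD(new_parts, parent=self)


numbers = MiniRDD(((1, 2), (3,)))
doubled = numbers.map(lambda x: x * 2)
print(doubled.partitions)
```

In Spark itself, this kind of parent link (the lineage) is what allows a lost partition to be recomputed; the toy class only records the link and does not implement recovery.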


II. Basic concepts

1) Add the dependency

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming_2.11</artifactId>
  <version>2.3.1</version>
</dependency>

To look up other related dependencies:

https://search.maven.org/search?q=g:org.apache.spark%20AND%20v:2.2.0

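2) How are files monitored when they are used as a DStream source?

1) The files must all have the same data format.

2) A file becomes part of the stream based on its modification time, not its creation time.

3) Once a file has been processed, changes to it within the current window are not picked up, i.e. the file is not reread.

4) FileSystem.setTimes() can be called to change a file's timestamp so that it is processed in a later window, even if its contents have not changed.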
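
The modification-time rule can be tried out locally with the Python standard library. This is a sketch under assumptions, not Spark's file-monitoring code: `files_in_window` is a hypothetical helper, and `os.utime` plays the role that FileSystem.setTimes() plays on a Hadoop file system.

```python
import os
import tempfile
import time
from typing import List


def files_in_window(directory: str, start: float, end: float) -> List[str]:
    """Return names of files whose modification time lies in [start, end)."""
    selected = []
    for name in sorted(os.listdir(directory)):
        mtime = os.stat(os.path.join(directory, name)).st_mtime
        if start <= mtime < end:
            selected.append(name)
    return selected


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "events.log")
    with open(path, "w") as f:
        f.write("hello\n")

    # Pretend the file was last modified 100 seconds ago.
    now = time.time()
    os.utime(path, (now - 100, now - 100))
    old_window = files_in_window(d, now - 10, now + 10)

    # Bump only the timestamp; the contents stay the same.
    os.utime(path, (now, now))
    new_window = files_in_window(d, now - 10, now + 10)

print(old_window, new_window)
```

The file is skipped while its modification time lies outside the window and is selected once only its timestamp is updated, which is the effect described in point 4.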

III. Transform operation

Window operation

Spark Streaming also provides windowed computations, which allow you to apply transformations over a sliding window of data.

Every time the window slides over a source DStream, the source RDDs that fall within the window are combined and operated upon to produce the RDDs of the windowed DStream.

In other words, the RDDs that fall within one time window are combined into a single RDD for processing.

Any window operation needs to specify two parameters:

window length: The duration of the window.

sliding interval: The interval at which the window operation is performed.

Both parameters must be multiples of the batch interval of the source DStream.
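
How the two parameters interact can be sketched in plain Python. This is not the Spark API: `windowed` is a hypothetical helper, each inner list stands in for one batch RDD, and both parameters are counted in batch intervals rather than in seconds.

```python
from typing import List


def windowed(batches: List[List[int]], window_length: int, slide: int) -> List[List[int]]:
    """Combine batches into windows.

    window_length and slide are counted in batch intervals.
    """
    windows = []
    # A window is emitted every `slide` batches and covers up to the last
    # `window_length` batches seen so far.
    for end in range(slide, len(batches) + 1, slide):
        start = max(0, end - window_length)
        combined = [x for batch in batches[start:end] for x in batch]
        windows.append(combined)
    return windows


batches = [[1], [2], [3], [4], [5], [6]]
result = windowed(batches, window_length=3, slide=2)
print(result)
```

With a window length of 3 and a sliding interval of 2, the sketch produces `[[1, 2], [2, 3, 4], [4, 5, 6]]`: consecutive windows overlap by one batch because the window is longer than the slide.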

IV. Output operation

Using foreachRDD

dstream.foreachRDD is a powerful primitive that allows data to be sent out to external systems. However, it is important to understand how to use this primitive correctly and efficiently.
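
One efficiency concern that is often raised for this kind of output is how many connections get opened. The sketch below is a plain-Python model, not Spark code: `FakeConnection` and `send_partition` are hypothetical names, and the nested list stands in for the partitions of one RDD. It illustrates the idea of opening one connection per partition and reusing it for all records, rather than one connection per record; in Spark this idea is usually expressed with `rdd.foreachPartition` inside `foreachRDD`.

```python
from typing import Iterable, List


class FakeConnection:
    """Hypothetical stand-in for a connection to an external system."""
    opened = 0  # counts how many connections were created

    def __init__(self) -> None:
        FakeConnection.opened += 1
        self.sent: List[str] = []

    def send(self, record: str) -> None:
        self.sent.append(record)

    def close(self) -> None:
        pass


def send_partition(records: Iterable[str]) -> int:
    """Open one connection per partition and reuse it for every record."""
    conn = FakeConnection()
    try:
        count = 0
        for record in records:
            conn.send(record)
            count += 1
        return count
    finally:
        conn.close()


# One "RDD" with two partitions and five records in total.
rdd_partitions = [["a", "b", "c"], ["d", "e"]]
sent = sum(send_partition(p) for p in rdd_partitions)
print(sent, FakeConnection.opened)
```

Five records are sent over two connections here; a per-record approach would have opened five.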

Checkpoint concept


Performance Tuning


Fault-tolerance Semantics


Reposted from: https://www.cnblogs.com/gm-201705/p/9533271.html
