日韩性视频-久久久蜜桃-www中文字幕-在线中文字幕av-亚洲欧美一区二区三区四区-撸久久-香蕉视频一区-久久无码精品丰满人妻-国产高潮av-激情福利社-日韩av网址大全-国产精品久久999-日本五十路在线-性欧美在线-久久99精品波多结衣一区-男女午夜免费视频-黑人极品ⅴideos精品欧美棵-人人妻人人澡人人爽精品欧美一区-日韩一区在线看-欧美a级在线免费观看

歡迎訪問 生活随笔!

生活随笔

當前位置: 首頁 > 编程资源 > 编程问答 >内容正文

编程问答

Spark Streaming的窗口操作

發(fā)布時間:2024/1/17 编程问答 47 豆豆
生活随笔 收集整理的這篇文章主要介紹了 Spark Streaming的窗口操作 小編覺得挺不錯的,現(xiàn)在分享給大家,幫大家做個參考.

2019獨角獸企業(yè)重金招聘Python工程師標準>>>

Spark Streaming的窗口操作 博客分類: spark ?

Spark Streaming的Window Operation可以理解為定時的進行一定時間段內(nèi)的數(shù)據(jù)的處理。

不要怪我語文不太好。。下面上原理圖吧,一圖勝千言:

滑動窗口在監(jiān)控和統(tǒng)計應(yīng)用的場景比較廣泛,比如每隔一段時間(2s)統(tǒng)計最近3s的請求量或者異常次數(shù),根據(jù)請求或者異常次數(shù)采取相應(yīng)措施

如圖:

1. 紅色的矩形就是一個窗口,窗口hold的是一段時間內(nèi)的數(shù)據(jù)流。

2.這里面每一個time都是時間單元,在官方的例子中,每個窗口大小(window size)是3時間單元 (time unit), 而且每隔2個單位時間,窗口會slide(滑動)一次。

所以基于窗口的操作,需要指定2個參數(shù):

?

  • window length?- The duration of the window (3 in the figure)
  • slide interval?- The interval at which the window-based operation is performed (2 in the figure). ?
1.窗口大小,個人感覺是一段時間內(nèi)數(shù)據(jù)的容器。 2.滑動間隔,就是我們可以理解的cron表達式吧。 - -! 舉個例子吧: 還是以最著名的wordcount舉例,每隔10秒,統(tǒng)計一下過去30秒過來的數(shù)據(jù)。 [java]? view plain copy
  • //?Reduce?last?30?seconds?of?data,?every?10?seconds??
  • val?windowedWordCounts?=?pairs.reduceByKeyAndWindow(_?+?_,?Seconds(30),?Seconds(10))??

  • 這里的paris就是一個MapedRDD, 類似(word,1) [java]? view plain copy
  • reduceByKeyAndWindow?//?這個類似RDD里面的reduceByKey,就是對RDD應(yīng)用function??
  • 在這里是根據(jù)key,對至進行聚合,然后累加。 下面粘貼一下它的API,僅供參考:
    window(windowLength,?slideInterval)Return a new DStream which is computed based on windowed batches of the source DStream.
    countByWindow(windowLength,slideInterval)Return a sliding window count of elements in the stream.
    reduceByWindow(func,?windowLength,slideInterval)Return a new single-element stream, created by aggregating elements in the stream over a sliding interval using?func. The function should be associative so that it can be computed correctly in parallel.
    reduceByKeyAndWindow(func,windowLength,?slideInterval, [numTasks])When called on a DStream of (K, V) pairs, returns a new DStream of (K, V) pairs where the values for each key are aggregated using the given reduce function?func?over batches in a sliding window.?Note:?By default, this uses Spark's default number of parallel tasks (2 for local machine, 8 for a cluster) to do the grouping. You can pass an optional?numTasks?argument to set a different number of tasks.
    reduceByKeyAndWindow(func,?invFunc,windowLength,?slideInterval, [numTasks])A more efficient version of the above?reduceByKeyAndWindow()?where the reduce value of each window is calculated incrementally using the reduce values of the previous window. This is done by reducing the new data that enter the sliding window, and "inverse reducing" the old data that leave the window. An example would be that of "adding" and "subtracting" counts of keys as the window slides. However, it is applicable to only "invertible reduce functions", that is, those reduce functions which have a corresponding "inverse reduce" function (taken as parameterinvFunc. Like in?reduceByKeyAndWindow, the number of reduce tasks is configurable through an optional argument.
    countByValueAndWindow(windowLength,slideInterval, [numTasks])When called on a DStream of (K, V) pairs, returns a new DStream of (K, Long) pairs where the value of each key is its frequency within a sliding window. Like in?reduceByKeyAndWindow, the number of reduce tasks is configurable through an optional argument.
    ??

    Output Operations

    When an output operator is called, it triggers the computation of a stream. Currently the following output operators are defined:

    print()Prints first ten elements of every batch of data in a DStream on the driver.
    foreachRDD(func)The fundamental output operator. Applies a function,?func, to each RDD generated from the stream. This function should have side effects, such as printing output, saving the RDD to external files, or writing it over the network to an external system.
    saveAsObjectFiles(prefix, [suffix])Save this DStream's contents as a?SequenceFile?of serialized objects. The file name at each batch interval is generated based on?prefix?and?suffix:?"prefix-TIME_IN_MS[.suffix]".
    saveAsTextFiles(prefix, [suffix])Save this DStream's contents as a text files. The file name at each batch interval is generated based on?prefix?and?suffix:?"prefix-TIME_IN_MS[.suffix]".
    saveAsHadoopFiles(prefix, [suffix])Save this DStream's contents as a Hadoop file. The file name at each batch interval is generated based on?prefix?and?suffix:?"prefix-TIME_IN_MS[.suffix]".
    原創(chuàng),轉(zhuǎn)載請注明出處 http://blog.csdn.net/oopsoom/article/details/23776477

    轉(zhuǎn)載于:https://my.oschina.net/xiaominmin/blog/1599578

    總結(jié)

    以上是生活随笔為你收集整理的Spark Streaming的窗口操作的全部內(nèi)容,希望文章能夠幫你解決所遇到的問題。

    如果覺得生活随笔網(wǎng)站內(nèi)容還不錯,歡迎將生活随笔推薦給好友。