Streaming Big Data: Storm, Spark and Samza (repost)

Published: 2025/4/5

Original article: http://www.javacodegeeks.com/2015/02/streaming-big-data-storm-spark-samza.html

There are a number of distributed computation systems that can process Big Data in real time or near-real time. This article starts with a short description of three Apache frameworks, then gives a quick, high-level overview of some of their similarities and differences.

Apache Storm

In Storm, you design a graph of real-time computation called a topology and feed it to the cluster, where the master node distributes the code among worker nodes for execution. In a topology, data is passed between spouts, which emit data streams as immutable sets of key-value pairs called tuples, and bolts, which transform those streams (count, filter, etc.). Bolts can themselves optionally emit data to other bolts further down the processing pipeline.
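The spout-and-bolt flow can be sketched in a few lines of plain Python. This is only a toy model of the idea, not Storm's API: the function names (`sentence_spout`, `split_bolt`, `count_bolt`, `run_topology`) are hypothetical, tuples are modeled as dicts, and everything runs in a single process rather than on a cluster.

```python
from collections import Counter

def sentence_spout(sentences):
    """Spout: the source of the stream; emits one tuple per sentence."""
    for sentence in sentences:
        yield {"sentence": sentence}

def split_bolt(stream):
    """Bolt: turns each sentence tuple into one tuple per word."""
    for tup in stream:
        for word in tup["sentence"].split():
            yield {"word": word.lower()}

def count_bolt(stream):
    """Bolt: keeps a running count per word and emits the updated count."""
    counts = Counter()
    for tup in stream:
        counts[tup["word"]] += 1
        yield {"word": tup["word"], "count": counts[tup["word"]]}

def run_topology(sentences):
    """Wire spout -> split bolt -> count bolt and collect the latest counts."""
    latest = {}
    for tup in count_bolt(split_bolt(sentence_spout(sentences))):
        latest[tup["word"]] = tup["count"]
    return latest

result = run_topology(["the quick fox", "the lazy dog"])
print(result)
```

Each stage handles one tuple at a time and passes its output downstream, which is the property that distinguishes this model from the micro-batch approach.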

Apache Spark

Spark Streaming (an extension of the core Spark API) doesn't process events one at a time like Storm. Instead, it slices the stream into small batches by time interval before processing them. The Spark abstraction for a continuous stream of data is called a DStream (for Discretized Stream). A DStream is a sequence of micro-batches, each represented as an RDD (Resilient Distributed Dataset). RDDs are distributed collections that can be operated on in parallel by arbitrary functions and by transformations over a sliding window of data (windowed computations).
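To illustrate the micro-batch idea, here is a minimal plain-Python sketch. It does not use Spark: `discretize` and `windowed_sum` are hypothetical helpers, events are assumed to be `(timestamp_in_seconds, value)` pairs with non-negative timestamps, and the window length and slide are expressed in numbers of batches.

```python
from collections import defaultdict

def discretize(events, batch_interval):
    """Group (timestamp, value) events into consecutive fixed-length batches."""
    by_index = defaultdict(list)
    for timestamp, value in events:
        by_index[int(timestamp // batch_interval)].append(value)
    if not by_index:
        return []
    # Keep empty batches too, so that batch position still reflects time.
    return [by_index.get(i, []) for i in range(max(by_index) + 1)]

def windowed_sum(batches, window_length, slide):
    """Sum the values in a sliding window measured in batches."""
    results = []
    for end in range(slide, len(batches) + 1, slide):
        start = max(0, end - window_length)
        results.append(sum(v for batch in batches[start:end] for v in batch))
    return results

events = [(0.2, 1), (0.7, 2), (1.1, 3), (2.5, 4), (3.9, 5)]
batches = discretize(events, batch_interval=1)
sums = windowed_sum(batches, window_length=2, slide=1)
print(batches)  # [[1, 2], [3], [4], [5]]
print(sums)     # [3, 6, 7, 9]
```

Results become available only once a batch is complete, which is why this style trades some latency for the convenience of treating each batch as an ordinary collection.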

Apache Samza

Samza's approach to streaming is to process messages as they are received, one at a time. Samza's stream primitive is not a tuple or a DStream, but a message. Streams are divided into partitions, and each partition is an ordered sequence of read-only messages, each with a unique ID (offset). The system also supports batching, i.e. consuming several messages from the same stream partition in sequence. Samza's execution and streaming modules are both pluggable, although Samza typically relies on Hadoop's YARN (Yet Another Resource Negotiator) and Apache Kafka.
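The partition-and-offset model can be sketched as follows. This is an in-memory illustration in plain Python, not Samza's or Kafka's API: `PartitionedStream` and `consume` are hypothetical names, and choosing the partition by hashing the message key is an assumption made for the example.

```python
import zlib

class PartitionedStream:
    """A stream split into partitions; each partition is an append-only log."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def send(self, key, value):
        """Append a message and return its (partition, offset)."""
        # A stable hash sends every message with the same key to the same partition.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[partition].append((key, value))
        return partition, len(self.partitions[partition]) - 1

    def read(self, partition, offset):
        """Return the message at the given offset, or None past the end of the log."""
        log = self.partitions[partition]
        return log[offset] if offset < len(log) else None

def consume(stream, partition, start_offset, process):
    """Process messages one at a time, in order; return the next offset to read."""
    offset = start_offset
    while True:
        message = stream.read(partition, offset)
        if message is None:
            return offset
        process(message)
        offset += 1

stream = PartitionedStream(num_partitions=2)
placements = [stream.send("user-1", text) for text in ("login", "click", "logout")]
partition = placements[0][0]
seen = []
next_offset = consume(stream, partition, 0, seen.append)
print(seen, next_offset)
```

Because the offset fully identifies a consumer's position in a partition, a consumer that records `next_offset` can resume from that point after a restart.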

Common Ground

All three real-time computation systems are open-source, low-latency, distributed, scalable and fault-tolerant. They all allow you to run your stream processing code through parallel tasks distributed across a cluster of computing machines with fail-over capabilities. They also provide simple APIs to abstract the complexity of the underlying implementations.

The three frameworks use different vocabularies for similar concepts:

Comparison Matrix

A few of the differences are summarized in the table below:

There are three general categories of delivery patterns:

  • At-most-once: messages may be lost. This is usually the least desirable outcome.
  • At-least-once: messages may be redelivered (no loss, but duplicates). This is good enough for many use cases.
  • Exactly-once: each message is delivered once and only once (no loss, no duplicates). This is a desirable feature although difficult to guarantee in all cases.

Another aspect is state management. There are different strategies to store state. Spark Streaming writes data into the distributed file system (e.g. HDFS). Samza uses an embedded key-value store. With Storm, you'll have to either roll your own state management at your application layer, or use a higher-level abstraction called Trident.
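To make the delivery patterns and the role of state concrete, the sketch below simulates at-least-once delivery and shows how a consumer that remembers message IDs can get an exactly-once effect on its result. This is one common technique (idempotent processing through deduplication), not a description of how any of the three frameworks implement their guarantees; `deliver_at_least_once` and `DedupingCounter` are hypothetical names.

```python
def deliver_at_least_once(messages, unacked_ids):
    """Yield every message, redelivering those whose first acknowledgment was lost."""
    for msg_id, payload in messages:
        yield msg_id, payload
        if msg_id in unacked_ids:
            # The sender never saw an ack, so it sends the message again.
            yield msg_id, payload

class DedupingCounter:
    """Sums payloads while ignoring message IDs that were already processed."""

    def __init__(self):
        self.seen_ids = set()  # state that must survive for deduplication to work
        self.total = 0

    def process(self, msg_id, amount):
        """Apply the message once; return False if it was a duplicate."""
        if msg_id in self.seen_ids:
            return False
        self.seen_ids.add(msg_id)
        self.total += amount
        return True

messages = [(1, 10), (2, 20), (3, 30)]
delivered = list(deliver_at_least_once(messages, unacked_ids={2}))

naive_total = sum(amount for _, amount in delivered)  # counts the duplicate
counter = DedupingCounter()
for msg_id, amount in delivered:
    counter.process(msg_id, amount)

print(naive_total)    # 80
print(counter.total)  # 60
```

The set of seen IDs is itself state, so the quality of the guarantee depends on where and how reliably that state is stored.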

Use Cases

All three frameworks are particularly well-suited to efficiently process continuous, massive amounts of real-time data. So which one to use? There are no hard rules, at most a few general guidelines.

If you want a high-speed event processing system that allows for incremental computations, Storm is a good fit. If you also need to run distributed computations on demand, while the client waits synchronously for the results, you'll have Distributed RPC (DRPC) out of the box. Last but not least, because Storm uses Apache Thrift, you can write topologies in any programming language. If you need state persistence and/or exactly-once delivery though, you should look at the higher-level Trident API, which also offers micro-batching.

A few companies using Storm: Twitter, Yahoo!, Spotify, The Weather Channel...

Speaking of micro-batching, if you must have stateful computations and exactly-once delivery, and don't mind a higher latency, you could consider Spark Streaming, especially if you also plan for graph operations, machine learning or SQL access. The Apache Spark stack lets you combine several libraries with streaming (Spark SQL, MLlib, GraphX) and provides a convenient unifying programming model. In particular, streaming algorithms (e.g. streaming k-means) allow Spark to facilitate decisions in real time.

A few companies using Spark: Amazon, Yahoo!, NASA JPL, eBay Inc., Baidu…

If you have a large amount of state to work with (e.g. many gigabytes per partition), Samza co-locates storage and processing on the same machines, allowing you to work efficiently with state that won't fit in memory. The framework also offers flexibility with its pluggable API: its default execution, messaging and storage engines can each be replaced with your choice of alternatives. Moreover, if you have a number of data processing stages from different teams with different codebases, Samza's fine-grained jobs would be particularly well-suited, since they can be added or removed with minimal ripple effects.

A few companies using Samza: LinkedIn, Intuit, Metamarkets, Quantiply, Fortscale…

Conclusion

We have only scratched the surface of The Three Apaches. We didn't cover a number of other features and more subtle differences between these frameworks. Also, it's important to keep in mind the limits of the above comparisons, as these systems are constantly evolving.

Reposted from: https://www.cnblogs.com/davidwang456/p/4892213.html
