
The Author's New Book: Table of Contents of 《大数据日知录:架构与算法》 (Big Data Daily Notes: Architecture and Algorithms)

Published: 2024/2/28

Chapter 0 What We Talk About When We Talk About Big Data
  0.1 What Is Big Data
  0.2 The Wings of Big Data: A Shift in Technical Paradigms
  0.3 The Business Alchemy of Big Data
  0.4 "Big Data" on the Road

Chapter 1 Data Partitioning and Routing
  1.1 Abstract Model
  1.2 Hash Partition
    1.2.1 Round Robin
    1.2.2 Virtual Buckets
    1.2.3 Consistent Hashing
  1.3 Range Partition
  References

Chapter 2 Data Replication and Consistency
  2.1 Basic Principles and Design Philosophy
    2.1.1 CAP Fundamentalism
    2.1.2 CAP Reloaded
    2.1.3 ACID Principles
    2.1.4 BASE Principles
    2.1.5 The Relationship Among CAP, ACID, and BASE
    2.1.6 Idempotence
  2.2 Classification of Consistency Models
    2.2.1 Strong Consistency
    2.2.2 Eventual Consistency
    2.2.3 Causal Consistency
    2.2.4 "Read Your Writes" Consistency
    2.2.5 Session Consistency
    2.2.6 Monotonic Read Consistency
    2.2.7 Monotonic Write Consistency
  2.3 Replica Update Strategies
    2.3.1 Simultaneous Update
    2.3.2 Master-Slave Update
    2.3.3 Update at Any Node
  2.4 Consistency Protocols
    2.4.1 Two-Phase Commit (2PC)
    2.4.2 Vector Clock
    2.4.3 RWN Protocol
    2.4.4 Paxos Protocol
    2.4.5 Raft Protocol
  References

Chapter 3 Common Algorithms and Data Structures for Big Data
  3.1 Bloom Filter
    3.1.1 Basic Principles
    3.1.2 False Positive Rate and Related Calculations
    3.1.3 Improvement: Counting Bloom Filter
    3.1.4 Applications
  3.2 SkipList
  3.3 LSM Tree
  3.4 Merkle Hash Tree
    3.4.1 Basic Principles of the Merkle Tree
    3.4.2 Application in Dynamo
    3.4.3 Application in Bitcoin
  3.5 Snappy and the LZSS Algorithm
    3.5.1 LZSS Algorithm
    3.5.2 Snappy
  3.6 Cuckoo Hashing
    3.6.1 Basic Principles
    3.6.2 Application: The SILT Storage System
  References

Chapter 4 Cluster Resource Management and Scheduling
  4.1 Abstract Model of Resource Management
    4.1.1 Conceptual Model
    4.1.2 General Architecture
  4.2 Basic Issues in Scheduling System Design
    4.2.1 Resource Heterogeneity and Workload Heterogeneity
    4.2.2 Data Locality
    4.2.3 Preemptive and Non-Preemptive Scheduling
    4.2.4 Allocation Granularity
    4.2.5 Starvation and Deadlock
    4.2.6 Resource Isolation Methods
  4.3 Paradigms of Resource Management and Scheduling Systems
    4.3.1 Monolithic Scheduler
    4.3.2 Two-Level Scheduler
    4.3.3 Shared-State Scheduler
  4.4 Resource Scheduling Policies
    4.4.1 FIFO Scheduling
    4.4.2 Fair Scheduler
    4.4.3 Capacity Scheduler
    4.4.4 Delay Scheduling
    4.4.5 Dominant Resource Fair Scheduling
  4.5 Mesos
  4.6 YARN
  References

Chapter 5 Distributed Coordination Systems
  5.1 Chubby Lock Service
    5.1.1 System Architecture
    5.1.2 Data Model
    5.1.3 Sessions and the KeepAlive Mechanism
    5.1.4 Client-Side Caching
  5.2 ZooKeeper
    5.2.1 Architecture
    5.2.2 Data Model
    5.2.3 API
    5.2.4 Typical Application Scenarios of ZooKeeper
    5.2.5 ZooKeeper in Practice
  References

Chapter 6 Distributed Communication
  6.1 Serialization and Remote Procedure Call Frameworks
    6.1.1 Protocol Buffer and Thrift
    6.1.2 Avro
  6.2 Message Queues
    6.2.1 Common Message Queue Systems
    6.2.2 Kafka
  6.3 Application-Level Multi-Broadcast
    6.3.1 Overview
    6.3.2 Gossip Protocol
  References

Chapter 7 Data Channels
  7.1 Log Data Collection
    7.1.1 Chukwa
    7.1.2 Scribe
  7.2 Data Bus
    7.2.1 Databus
    7.2.2 Wormhole
  7.3 Data Import/Export
  References

Chapter 8 Distributed File Systems
  8.1 Google File System (GFS)
    8.1.1 GFS Design Principles
    8.1.2 GFS Overall Architecture
    8.1.3 GFS Master Server
    8.1.4 System Interactions
    8.1.5 Colossus
  8.2 HDFS
    8.2.1 HDFS Overall Architecture
    8.2.2 HA Solutions
    8.2.3 NameNode Federation
  8.3 HayStack Storage System
    8.3.1 HayStack Overall Architecture
    8.3.2 Directory Service
    8.3.3 HayStack Cache
    8.3.4 Implementation of the HayStack Storage System
  8.4 File Storage Layout
    8.4.1 Row-Oriented Storage
    8.4.2 Column-Oriented Storage
    8.4.3 Hybrid Storage
  8.5 Erasure Code
    8.5.1 Reed-Solomon Coding
    8.5.2 LRC Coding
    8.5.3 HDFS-RAID Architecture
  References

Chapter 9 In-Memory KV Databases
  9.1 RAMCloud
    9.1.1 RAMCloud Overall Architecture
    9.1.2 Replica Management and Data Recovery
  9.2 Redis
  9.3 MemBase
  References

Chapter 10 Column-Oriented Databases
  10.1 BigTable
    10.1.1 BigTable's Data Model
    10.1.2 BigTable's Overall Structure
    10.1.3 Management Data in BigTable
    10.1.4 Master Server
    10.1.5 Tablet Server
  10.2 PNUTS Storage System
    10.2.1 PNUTS Overall Architecture
    10.2.2 Storage Units
    10.2.3 Tablet Controller and Data Router
    10.2.4 Yahoo Message Broker
    10.2.5 Data Consistency
  10.3 MegaStore
    10.3.1 Entity Group Partitioning
    10.3.2 Data Model
    10.3.3 Data Reads/Writes and Backup
  10.4 Spanner
    10.4.1 SpanServer Software Stack
    10.4.2 Data Model
    10.4.3 TrueTime
  References

Chapter 11 Large-Scale Batch Processing Systems
  11.1 MapReduce Computation Model and Architecture
    11.1.1 Computation Model
    11.1.2 System Architecture
    11.1.3 Characteristics and Shortcomings of MapReduce
  11.2 MapReduce Computation Patterns
    11.2.1 Summarization Pattern
    11.2.2 Filtering Pattern
    11.2.3 Data Organization Pattern
    11.2.4 Join Pattern
  11.3 DAG Computation Model
    11.3.1 The Three-Layer Structure of DAG Computation Systems
    11.3.2 Dryad
    11.3.3 FlumeJava and Tez
  References

Chapter 12 Stream Computing
  12.1 Stream Computing System Architectures
    12.1.1 Master-Slave Architecture
    12.1.2 P2P Architecture
    12.1.3 Samza Architecture
  12.2 DAG Topology
    12.2.1 Compute Nodes
    12.2.2 Data Streams
    12.2.3 Topology Structures
  12.3 Delivery Guarantees
    12.3.1 Storm's Delivery Guarantee Mechanism
    12.3.2 MillWheel's "Exactly-Once Delivery" Mechanism
  12.4 State Persistence
    12.4.1 Three Modes of Fault Tolerance
    12.4.2 State Persistence in Storm
    12.4.3 State Persistence in MillWheel and Samza
  References

Chapter 13 Interactive Data Analysis
  13.1 Hive-Family Data Warehouses
    13.1.1 Hive
    13.1.2 Stinger Initiative
  13.2 Shark-Family Data Warehouses
    13.2.1 Shark Architecture
    13.2.2 Partial DAG Execution Engine (PDE)
    13.2.3 Data Co-Partitioning
  13.3 Dremel-Family Data Warehouses
    13.3.1 Dremel
    13.3.2 PowerDrill
    13.3.3 Impala
    13.3.4 Presto
  13.4 Hybrid Data Warehouses
  References

Chapter 14 Graph Databases: Architecture and Algorithms
  14.1 Online Query Graph Databases
    14.1.1 Three-Layer Structure
    14.1.2 TAO Graph Database
  14.2 Common Graph Mining Problems
    14.2.1 PageRank Computation
    14.2.2 Single Source Shortest Path
    14.2.3 Maximum Bipartite Matching
  14.3 Data Partitioning for Offline Mining
    14.3.1 Edge-Cut
    14.3.2 Vertex-Cut
  14.4 Computation Models for Offline Mining
    14.4.1 Vertex-Centric Programming Model
    14.4.2 GAS Programming Model
    14.4.3 Synchronous Execution Model
    14.4.4 Asynchronous Execution Model
  14.5 Offline Mining Graph Databases
    14.5.1 Pregel
    14.5.2 Giraph
    14.5.3 GraphChi
    14.5.4 PowerGraph
  References

Chapter 15 Machine Learning: Paradigms and Architectures
  15.1 Distributed Machine Learning
    15.1.1 Introduction to Machine Learning
    15.1.2 Data Parallelism vs. Model Parallelism
  15.2 Distributed Machine Learning Paradigms
    15.2.1 Three Paradigms
    15.2.2 MapReduce Iterative Computation Model
    15.2.3 BSP Computation Model
    15.2.4 SSP Model
  15.3 Distributed Machine Learning Architectures
    15.3.1 The MapReduce Family
    15.3.2 Spark and MLBase
    15.3.3 Parameter Server
  References

Chapter 16 Machine Learning: Distributed Algorithms
  16.1 Computational Advertising: Logistic Regression
    16.1.1 Logistic Regression (LR)
    16.1.2 Parallel Stochastic Gradient Descent
    16.1.3 Batch-Learning Parallel Logistic Regression
  16.2 Recommender Systems: Matrix Factorization
    16.2.1 Matrix Factorization Methods
    16.2.2 ALS-WR Algorithm
    16.2.3 Parallel ALS-WR Algorithm
  16.3 Search Engines: Learning to Rank
    16.3.1 Introduction to Learning to Rank
    16.3.2 LambdaMART
    16.3.3 Distributed LambdaMART
  16.4 Natural Language Processing: Document Similarity Computation
  16.5 Social Mining: Spectral Clustering
    16.5.1 A Social Mining Example
    16.5.2 Spectral Clustering
    16.5.3 Parallel Spectral Clustering
  16.6 Deep Learning: DistBelief
    16.6.1 Introduction to Deep Learning
    16.6.2 DistBelief
  References

Chapter 17 Incremental Computation
  17.1 Incremental Computation Patterns
    17.1.1 Two Computation Modes
    17.1.2 The General Pattern of Incremental Computation on Hadoop
  17.2 Percolator
    17.2.1 Transaction Support
    17.2.2 The "Observe/Notify" Architecture
  17.3 Kineograph
    17.3.1 Overall Architecture
    17.3.2 Incremental Computation Mechanism
  17.4 DryadInc
  References

Appendix A Hardware Architecture and Common Performance Metrics
Appendix B Essential Reading on Big Data
