Storage Systems

Published: 2024/1/18

Reference: Computer Architecture (6th Edition)

Bus
Disk Storage
Use Arrays of Small Disks?
RAID
- RAID 0: Striping
- RAID 1: Disk Mirroring/Shadowing
- RAID 2: Bit-interleaved Hamming-code Array
- RAID 3: Bit-interleaved Parity Disk
- RAID 4: Block-interleaved Parity Disk
- RAID 5: Block-interleaved Distributed Parity
- RAID 6: Independent-access Array with Two-dimensional Parity
- RAID Implementation
Storage Environment
- Direct Attached Storage (DAS)
- Network Attached Storage (NAS)
- Storage Area Network (SAN)

Memory (the memory system): main memory
Storage Systems: external (secondary) storage, which is persistent and non-volatile

Bus

I/O buses tap into the processor-memory bus via bus adaptors: the adaptors match speeds (by buffering) and provide the interface

Main components of Intel Chipset: Pentium 4

Northbridge (adaptor for high-speed devices): handles memory and graphics
Southbridge (adaptor for low-speed devices): I/O, PCI bus, disk controllers, USB controllers, audio, serial I/O, interrupt controller, timers

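IMC (Integrated Memory Controller)

CPU integration keeps increasing: the memory controller has moved inside the CPU, and the northbridge has disappeared. L1 and L2 caches are integrated into each core, and the L3 cache, shared by the four cores, is also on the CPU
QPI (QuickPath Interconnect) supports multiple system-bus links and replaces the front-side bus (FSB)

The next step: integrate memory into the CPU as well…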

The move from Parallel to Serial I/O

Parallel I/O (ISA bus, PCI, SCSI, IDE)
- Parallel bus clock rate limited by clock skew across long bus (~100MHz)
- High power to drive large number of loaded bus lines
- Central bus arbiter adds latency to each transaction, sharing limits throughput
- Expensive parallel connectors and backplanes/cables (all devices pay costs)
Dedicated Point-to-point Serial Links (Ethernet, Infiniband, PCI Express, SATA, USB, Firewire)
- Point-to-point links run at multi-gigabit speed using advanced clock/signal encoding (requires lots of circuitry at each end)
- Lower power since only one well-behaved load
- Multiple simultaneous transfers
- Cheap cables and connectors (trade greater endpoint transistor cost for lower physical wiring cost), customize bandwidth per device using multiple links in parallel
Example: hard-disk interfaces moved from IDE (parallel) $\rightarrow$ SATA (serial)

Disk Storage

Storage emphasizes reliability and scalability as well as cost-performance
Which software is “king”, i.e. determines which hardware features are actually used?
- Compiler for processor
- Operating System for storage

Flash: The future of disks? (solid-state drives)

Flash drive advantages: lower power, much faster seek time, 100X I/Os per second, greater reliability, and lower noise, all because there are no moving parts
Flash disadvantages: cost (20-100x disk cost/GB); slow writes with current designs (competitive with disks); limited write endurance (a location wears out after being written too many times), which is not an issue for most applications because wear leveling, handled in software, spreads writes across the blocks on the chip

Disk Figure of Merit: Areal Density

Bits recorded along a track; Metric is Bits Per Inch (BPI)
Number of tracks per surface; Metric is Tracks Per Inch (TPI)
Bit density per unit area; Metric is Bits Per Square Inch: Areal Density $= \textrm{BPI} \times \textrm{TPI}$

Disk Drive Performance

Disk Service Time: the time a disk takes to complete an I/O request is the sum of
- Seek time, rotational latency, and data transfer time (request size divided by the data transfer rate in MB/s)
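The sum can be sketched numerically as follows. This is a rough illustration only: the function name `disk_service_time_ms`, the parameter values, and the choice of half a revolution as the average rotational latency are assumptions, not figures from these notes. Transfer time is taken as request size divided by transfer rate.

```python
def disk_service_time_ms(seek_ms, rpm, request_kb, transfer_mb_per_s):
    """Return the service time in milliseconds for one I/O request."""
    # Assumption: average rotational latency is half a revolution.
    rotational_ms = 0.5 * 60_000.0 / rpm
    # Transfer time: request size divided by the sustained transfer rate.
    transfer_ms = (request_kb / 1024.0) / transfer_mb_per_s * 1000.0
    return seek_ms + rotational_ms + transfer_ms


# Hypothetical drive: 4 ms seek, 15,000 RPM, 100 MB/s, one 64 KB request.
t = disk_service_time_ms(seek_ms=4.0, rpm=15_000, request_kb=64,
                         transfer_mb_per_s=100.0)
print(f"service time = {t:.3f} ms")  # 4.0 + 2.0 + 0.625 = 6.625 ms
```

For small requests the mechanical terms (seek and rotation) dominate; the transfer term only matters for large sequential requests.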

Utilization vs. Response time

The higher the utilization (I/O request rate), the longer the response time
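One common way to put numbers on this trend is a single-server M/M/1 queueing model. That model is an assumption brought in here for illustration (the notes only state the qualitative relationship), and `mm1_response_time` is a hypothetical helper name.

```python
def mm1_response_time(service_time_ms, utilization):
    """Mean response time (queueing + service) under an assumed M/M/1 model."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    # M/M/1 result: response time = service time / (1 - utilization).
    return service_time_ms / (1.0 - utilization)


# Hypothetical disk with a 10 ms mean service time.
for u in (0.1, 0.5, 0.8, 0.9):
    print(f"utilization {u:.0%}: {mm1_response_time(10.0, u):.1f} ms")
```

Under this model the response time grows slowly at low utilization and very steeply as utilization approaches 100%.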

Parameters that characterize the reliability of storage devices

Reliability: the ability of a system to deliver service continuously, starting from an initial state
- Measured by the mean time to failure (MTTF)
Availability: the fraction of time the system is working correctly, measured over the interval between two consecutive failures
- Measured by $\frac{\textrm{MTTF}}{\textrm{MTTF} +\textrm{MTTR}}$, where MTTR is the mean time to repair (for storage, repair $\rightarrow$ data recovery)
- MTTF + MTTR = MTBF (mean time between failures)
Dependability: the degree to which the service can justifiably be trusted to be reliable
- Dependability is not quantifiable
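A minimal sketch of the availability and MTBF formulas. The function names and the sample MTTF/MTTR values are hypothetical, chosen only to illustrate the arithmetic.

```python
def availability(mttf_hours, mttr_hours):
    """Availability = MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


def mtbf(mttf_hours, mttr_hours):
    """Mean time between failures = MTTF + MTTR."""
    return mttf_hours + mttr_hours


# Hypothetical device: fails on average every 1,000,000 hours, 24 hours to repair.
print(f"availability = {availability(1_000_000.0, 24.0):.6f}")
print(f"MTBF         = {mtbf(1_000_000.0, 24.0):.0f} hours")
```

Because MTTR appears only in the denominator, shortening the repair (data recovery) time raises availability even when MTTF is unchanged.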

Use Arrays of Small Disks?

Replace Small Number of Large Disks with Large Number of Small Disks!

Disk Arrays have potential for large data and I/O rates, high MB per cu. ft., high MB per KW, but what about reliability?

Array Reliability

Reliability of $N$ disks = Reliability of 1 Disk $\div\ N$
Arrays (without redundancy) too unreliable to be useful!
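To see how quickly reliability drops, here is a small sketch that treats reliability as MTTF and assumes independent disk failures at a constant rate. The function name and the disk figures are hypothetical.

```python
def array_mttf_hours(disk_mttf_hours, n_disks):
    """MTTF of a non-redundant array: any single disk failure fails the array."""
    if n_disks < 1:
        raise ValueError("n_disks must be at least 1")
    # Assumption: independent failures, so failure rates simply add up.
    return disk_mttf_hours / n_disks


# Hypothetical disks with a 50,000-hour MTTF each.
for n in (1, 10, 70):
    hours = array_mttf_hours(50_000.0, n)
    print(f"{n:3d} disks: {hours:8.0f} hours (~{hours / 24:.0f} days)")
```

With these assumed numbers, a single disk lasts years on average, while a 70-disk array without redundancy fails roughly once a month.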

RAID

Redundant Arrays of (Inexpensive) Disks

Files are “striped” across multiple disks
Redundancy yields high data availability (disks will still fail)
- Availability: service still provided to user, even if some components failed
Contents reconstructed from data redundantly stored in the array
- Capacity penalty to store redundant info
- Bandwidth penalty to update redundant info

RAID 0: Striping

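RAID 0: a non-redundant disk array that stores no redundant information.
Data is divided into stripes, and the stripes are interleaved across the disks, forming one larger-capacity logical disk whose members work in parallel (Stripe0, Stripe1, … denote consecutive stripes; their size is called the stripe width)

All disks can be read in parallel, so performance is high; but no redundancy is provided, so the failure of any one disk stops the whole system from working
- Suited to workloads that need high-bandwidth disk access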
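The interleaving can be sketched as a simple round-robin mapping from stripe number to disk. The helper names below are hypothetical, and round-robin placement is the assumed layout.

```python
def raid0_locate(stripe_index, n_disks):
    """Map a logical stripe number to (disk number, slot on that disk)."""
    return stripe_index % n_disks, stripe_index // n_disks


def raid0_layout(n_stripes, n_disks):
    """List which stripes end up on each disk, in on-disk order."""
    disks = [[] for _ in range(n_disks)]
    for s in range(n_stripes):
        disk, _slot = raid0_locate(s, n_disks)
        disks[disk].append(f"Stripe{s}")
    return disks


for disk, stripes in enumerate(raid0_layout(n_stripes=6, n_disks=4)):
    print(f"Disk{disk}: {stripes}")
```

Consecutive stripes land on different disks, which is why a large sequential request can be served by all disks at once.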

RAID 1: Disk Mirroring/Shadowing

Each disk is fully duplicated onto its “mirror”: Very high availability can be achieved

Bandwidth sacrifice on write: Logical write = two physical writes (the disk and its mirror are written in parallel and no parity has to be computed, so writes are faster than at the higher RAID levels)
Reads may be optimized: when reading from RAID 1, the disk and its mirror can work independently and simultaneously, and whichever returns the data first supplies it
Most expensive solution: 100% capacity overhead

RAID 2: Bit-interleaved Hamming-code Array

Each data disk stores one bit of every data word, bit-interleaved: Disk0 holds bit 0 of all data words, Disk1 holds bit 1, and so on. A Hamming code is computed over the corresponding bits of the data disks, and the code bits are stored at the corresponding positions on multiple check (ECC) disks
When data is read from the data disks, the Hamming code is read as well and used to detect and correct errors (a Hamming code can correct 1-bit errors and detect 2-bit errors)

Several disks are needed to hold the Hamming check bits; the number of redundant disks grows with the logarithm of the number of data disks (about $\log_2 m$, where $m$ is the number of data disks)
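As a rough check on the logarithmic growth, the sketch below computes the number of check bits for a single-error-correcting Hamming code: the smallest $r$ with $2^r \ge m + r + 1$. The function name is hypothetical, and mapping one check bit to one check disk is an assumption of this sketch.

```python
def hamming_check_disks(m):
    """Smallest r such that 2**r >= m + r + 1 (single-error-correcting code)."""
    r = 0
    while 2 ** r < m + r + 1:
        r += 1
    return r


for m in (4, 8, 10, 32):
    print(f"{m:2d} data disks -> {hamming_check_disks(m)} check disks")
```

The redundancy overhead therefore shrinks as the array gets wider, for example 3 check disks for 4 data disks but only 6 for 32.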

RAID 3: Bit-interleaved Parity Disk

When a disk fails, the disk controller itself can tell which disk failed, so a complex Hamming code is unnecessary; simple parity is enough

Logically, a single high capacity, high transfer rate disk: good for large transfers. It provides single-disk fault tolerance with parallel transfer (the array is fine-grained, i.e. the stripe width is small (1 byte or 1 bit), so nearly every I/O request is served by all disks in the array, which gives a very high data transfer rate)
$1 / N$ capacity cost for parity if $N$ data disks and $1$ parity disk
- Wider arrays reduce capacity costs, but decrease reliability/availability
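The parity idea can be sketched with XOR: the parity disk holds the XOR of the data disks, and a failed disk is rebuilt by XOR-ing the survivors with the parity. This sketch works bytewise on small in-memory blocks purely for illustration; the helper name and data values are made up.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Four hypothetical data disks, two bytes each.
data = [b"\x0f\xf0", b"\x33\x55", b"\xaa\x01", b"\x00\xff"]
parity = xor_blocks(data)

# Disk 2 fails; the controller knows which one, so XOR of the rest rebuilds it.
survivors = data[:2] + data[3:] + [parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data[2])
```

Parity alone cannot say which disk is wrong; it works here because the controller already identifies the failed disk.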

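RAID 3 read/write characteristics

Assume 4 data disks and one redundant disk
- Reading data takes 5 disk reads in total (the 4 data disks and the redundant disk are read at the same time)
- Writing data takes 3 disk reads and 2 disk writes (to update one data disk, read the other 3 data disks to recompute parity, then write the data disk and the redundant disk)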

RAID 4: Block-interleaved Parity Disk

Inspiration for RAID 4

In RAID 3, every disk access operates on all disks in the array. RAID 4 aims to involve fewer disks per access so that the array can service several requests in parallel

RAID 4 stores data block-interleaved across the disks and keeps the parity on one dedicated disk (the parity disk). The redundancy cost is the same as RAID 3, but the array is coarse-grained: interleaving and parity computation use a relatively large stripe (a block) as the unit, and data is accessed differently from RAID 3
- Small read: every block has an error detection field, so each disk performs reads independently; this allows independent reads to different disks simultaneously (the parity disk is read only when a disk has failed, in order to rebuild the data)
  - To catch errors on read, rely on error detection field vs. the parity disk
- Large write: the parity must be recomputed, so the write touches almost all disks

RAID 5: Block-interleaved Distributed Parity

Inspiration for RAID 5

Small writes (write to one disk): since P has old sum, compare old data to new data, add the difference to P

Small Write Algorithm

1 Logical Write = 2 Physical Reads + 2 Physical Writes
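A minimal sketch of this small-write algorithm, with parity kept as the XOR of the data blocks. The helper names and block contents are hypothetical. The two physical reads fetch the old data and old parity; the two physical writes store the new data and new parity.

```python
def xor_bytes(a, b):
    """Bytewise XOR of two equally sized byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def small_write(old_data, old_parity, new_data):
    """Return the new parity after overwriting one data block."""
    difference = xor_bytes(old_data, new_data)  # compare old data to new data
    return xor_bytes(old_parity, difference)    # add the difference to P


# Hypothetical stripe of three data blocks plus parity.
stripe = [b"\x01\x02", b"\x10\x20", b"\x0a\x0b"]
parity = b"\x00\x00"
for block in stripe:
    parity = xor_bytes(parity, block)

new_block = b"\xff\x00"
new_parity = small_write(stripe[1], parity, new_block)

# Cross-check against recomputing the parity from the whole updated stripe.
full = xor_bytes(xor_bytes(stripe[0], new_block), stripe[2])
print(new_parity == full)
```

The other data disks are never touched, which is what keeps a small write at four physical I/Os regardless of array width.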

Problems of Disk Arrays: Small Writes

Small writes are limited by Parity Disk:
- Writes to $D_0$ and $D_5$ both also write to the P disk (so $D_0$ and $D_5$ still cannot be written at the same time)

RAID 5: High I/O Rate Interleaved Parity

To solve the problem above, the parity is spread over all disks in the array and there is no dedicated redundant disk. The parity block of each row of data blocks is shifted to a different disk in turn, in rotation, so the parity is evenly distributed over all disks
- Independent writes possible because of interleaved parity
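One possible rotation can be sketched as follows. The specific placement rule (parity moves one disk to the left per row) and the function names are assumptions for illustration; real arrays use several different layouts.

```python
def raid5_parity_disk(row, n_disks):
    """Disk holding the parity block of a given row (assumed rotation rule)."""
    return (n_disks - 1 - row) % n_disks


def raid5_row(row, n_disks):
    """Labels of the blocks stored on each disk for one row of the array."""
    parity_disk = raid5_parity_disk(row, n_disks)
    next_block = row * (n_disks - 1)  # each row holds n_disks - 1 data blocks
    cells = []
    for disk in range(n_disks):
        if disk == parity_disk:
            cells.append(f"P{row}")
        else:
            cells.append(f"D{next_block}")
            next_block += 1
    return cells


for row in range(5):
    print(raid5_row(row, n_disks=5))
```

In this assumed layout, D0 sits on disk 0 with its parity on disk 4, while D5 sits on disk 1 with its parity on disk 3, so the two small writes use disjoint disks and can proceed in parallel.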

RAID 6: Independent-access Array with Two-dimensional Parity

Inspiration:

Recovering from 2 failures

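RAID 6 characteristics

Independent-access array with two-dimensional parity: RAID 6 adds a second, independent piece of check information to RAID 5, stored on another check disk. A write accesses 1 data disk and 2 redundant disks, and the array tolerates two simultaneous disk failures
Data is block-interleaved across the disks, and the error detection and correction information is evenly distributed over all disks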

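RAID Implementation

Software: the array-management software runs on the host
- Advantage: low cost
- Disadvantage: it takes up too much host time, and bandwidth cannot be pushed high
Array card: the RAID management software is built into the firmware of an I/O controller card, so it takes no host time; generally used in workstations and PCs
Subsystem: an open platform based on a general-purpose interface bus that can be used with a variety of host platforms and network systems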

Storage Environment

Direct Attached Storage (DAS)

Servers connect directly to the disk array, typically via a SCSI interface.

Network Attached Storage (NAS)

Network attached storage: a file system on the network

Servers provide the services, while a separate dedicated system handles storage
NAS Devices access the disks in an array via direct connection or through external connectivity

Storage Area Network (SAN)

Storage area network: disks on the network

Servers access the disk array through a dedicated network designated as SAN, which consists of Fibre Channel switches (a separate network is built specifically for traffic between the storage media and the servers)
