Presentation material for the session "PostgreSQLアーキテクチャ入門" (Introduction to PostgreSQL Architecture) at INSIGHT OUT 2011, held October 19-21, 2011.
For details on INSIGHT OUT 2011, see:
http://www.insight-tec.com/insight-out-2011.html
Mainly an introduction to the paper "Weak Consistency: A Generalized Theory and Optimistic Implementations for Distributed Transactions".
https://pmg.csail.mit.edu/pubs/adya99__weak_consis-abstract.html
This document discusses the internals of the WalB driver, a storage driver developed by Cybozu Labs. It records only redo logs, not undo logs, to avoid performance degradation. WalB completes a write I/O once its redo log has been written to the log storage, without reading the current data or generating undo logs. This lets it overlap and parallelize log flushing and data I/O for efficient write performance.
An Efficient Backup and Replication of Storage, by Takashi Hoshino
This document describes WalB, a Linux kernel device driver that provides efficient backup and replication of storage using block-level write-ahead logging (WAL). It has negligible performance overhead and avoids issues like fragmentation. WalB works by wrapping a block device and writing redo logs to a separate log device. It then extracts diffs for backup/replication. The document discusses WalB's architecture, algorithm, performance evaluation and future work.
WalB is a block device driver that uses write-ahead logging (WAL) to provide efficient incremental backups. It aims to address the lack of a good backup solution that works online, with low overhead, across various applications, and using commodity hardware and free software. WalB acts as a wrapper device that logs writes to a separate log device to enable consistent incremental backups of the data device.
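6. Methods for obtaining diffs
• Full scan
– Compare the old image (a b c d e) with the new image (a b’ c d e’); diff: (1, b’), (4, e’)
• Using a diff bitmap
– The bitmap changes from 00000 to 01001 and marks the changed blocks; diff: (1, b’), (4, e’)
• Using a WAL (Write-Ahead Log)
– The log itself holds the records (1, b’), (4, e’)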
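The three approaches above can be compared with a small sketch. This is an illustration only, not WalB code: blocks are modeled as strings, and the names `diff_full_scan`, `BitmapDevice`, and `WalDevice` are hypothetical. It uses the slide's own example (blocks 1 and 4 change to b' and e').

```python
from typing import Dict, List, Tuple

Block = str
Diff = Dict[int, Block]  # block index -> new contents


def diff_full_scan(old: List[Block], new: List[Block]) -> Diff:
    """Full scan: read and compare every block of both images."""
    return {i: n for i, (o, n) in enumerate(zip(old, new)) if o != n}


class BitmapDevice:
    """Diff bitmap: one dirty bit per block, set on every write."""

    def __init__(self, blocks: List[Block]) -> None:
        self.blocks = list(blocks)
        self.dirty = [0] * len(self.blocks)

    def write(self, index: int, data: Block) -> None:
        self.blocks[index] = data
        self.dirty[index] = 1

    def extract_diff(self) -> Diff:
        # Only blocks whose bit is set are read back from the device.
        diff = {i: self.blocks[i] for i, bit in enumerate(self.dirty) if bit}
        self.dirty = [0] * len(self.blocks)
        return diff


class WalDevice:
    """WAL: every write is appended to a log before it reaches the data."""

    def __init__(self, blocks: List[Block]) -> None:
        self.blocks = list(blocks)
        self.log: List[Tuple[int, Block]] = []

    def write(self, index: int, data: Block) -> None:
        self.log.append((index, data))  # log record first
        self.blocks[index] = data

    def extract_diff(self) -> Diff:
        # The diff comes from reading the log in order; a later record
        # for the same block replaces an earlier one.
        diff: Diff = {}
        for index, data in self.log:
            diff[index] = data
        self.log.clear()
        return diff
```

In this model the full scan reads both images, the bitmap variant reads only the marked blocks of the data device, and the WAL variant never touches the data device to produce the diff.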
8. WalB Architecture
• WalB Dev Controller: sends Control requests to the wrapper device
• Any Application (File System, DBMS, etc): issues Read and Write to the wrapper device
• WalB Log Extractor: reads the Log
• Wrapper Block Device (WalB Dev), placed on top of two devices:
– Any Block Device for Data (Data device): Not special format
– Any Block Device for Log (Log device): An original format
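10. Properties WalB must satisfy
• Read the latest written data
– Old data that should have been overwritten must never be read
• Storage state uniqueness
– History must not change (IO ordering of Log and Data)
• Durability of flushed data
– Satisfy the requirements of the durability-guarantee flags
• Crash recovery without undo
– No undo logs; consistency is ensured with redo logs only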
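The "crash recovery without undo" property can be illustrated with a minimal sketch. It assumes that each redo record carries the full new contents of its range and that records are replayed in log order from a checkpoint. The names `RedoRecord`, `lsid`, and `redo` are hypothetical and do not describe WalB's on-disk format.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RedoRecord:
    lsid: int     # hypothetical log sequence id; defines the replay order
    offset: int   # byte offset on the data device
    data: bytes   # full new contents for [offset, offset + len(data))


def redo(data_dev: bytearray, log: List[RedoRecord], checkpoint: int) -> int:
    """Replay every record with lsid >= checkpoint in lsid order.

    Returns the lsid from which the next recovery would have to start.
    Because records hold new contents only, replaying one that was already
    applied before the crash rewrites the same bytes, so no undo is needed.
    """
    for rec in sorted(log, key=lambda r: r.lsid):
        if rec.lsid < checkpoint:
            continue
        data_dev[rec.offset:rec.offset + len(rec.data)] = rec.data
        checkpoint = rec.lsid + 1
    return checkpoint
```

Replaying the same log twice leaves the data device unchanged the second time, which is why a redo-only log is sufficient in this model.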
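12. IO processing flow (Easy)
Write
• WalB write IO response: from Submitted to Completed
• Events on the time axis: Packed → Log submitted → Log completed → (Wait for overlapped IOs done) → Data submitted → Data completed
• Log IO response: from Log submitted to Log completed
• Data IO response: from Data submitted to Data completed
Read
• Events on the time axis: Submitted → Data submitted → Data completed → Completed
• Data IO response: from Data submitted to Data completed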
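13. IO processing flow (Fast)
Write
• WalB write IO response: from Submitted to Completed, followed by (Wait for overlapped IOs done)
• Events on the time axis: Packed → Log submitted → Log completed → Data submitted → Data completed
• Log IO response: from Log submitted to Log completed
• Data IO response: from Data submitted to Data completed
• Pdata events: Pdata inserted, later Pdata deleted
Read
• Events on the time axis: Submitted → (Data submitted) → (Data completed) → Completed
• Data IO response: from Data submitted to Data completed (both parenthesized in the figure)
• Pdata event: Pdata copied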
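14. Pdata for Deferred Write
• Needed to satisfy "Read the latest written data"
– Required only by the Fast algorithm
• Read IO behavior
– If overlapping Write IOs exist in pdata, copy the overlapping regions in order from oldest to newest
– Regions that are not in pdata are read from the data device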
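The read rule above can be sketched as follows. This is a simplified model under stated assumptions: pdata is a plain list of pending writes kept oldest first, the log write is assumed to have already completed when `write` returns, and the class and method names (`FastWriteDevice`, `flush_oldest`) are hypothetical rather than taken from the driver.

```python
from typing import List, Tuple


class FastWriteDevice:
    """Toy model of deferred data writes with a pdata overlay for reads."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)               # stands in for the data device
        self.pdata: List[Tuple[int, bytes]] = []  # pending writes, oldest first

    def write(self, offset: int, buf: bytes) -> None:
        # Assumption: the redo log for this IO is already persistent.
        # The write is only recorded in pdata; the data IO is deferred.
        self.pdata.append((offset, bytes(buf)))

    def flush_oldest(self) -> None:
        # Deferred data IO: apply the oldest pending write, then drop it.
        offset, buf = self.pdata.pop(0)
        self.data[offset:offset + len(buf)] = buf

    def read(self, offset: int, length: int) -> bytes:
        # Start from the data device, then copy overlapping pending writes
        # oldest first so that the newest write wins on every byte.
        buf = bytearray(self.data[offset:offset + length])
        end = offset + length
        for w_off, w_buf in self.pdata:
            lo = max(offset, w_off)
            hi = min(end, w_off + len(w_buf))
            if lo < hi:
                buf[lo - offset:hi - offset] = w_buf[lo - w_off:hi - w_off]
        return bytes(buf)
```

Copying from oldest to newest is what makes a read return the latest written data even while the corresponding data IOs are still pending.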
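15. Serializing the execution of overlapped IOs
• Events on the time axis: Oldata inserted → (Wait for overlapped IOs done) → Got notice → Data submitted → Data completed → Oldata deleted → Sent notice
• Data IO response: from Data submitted to Data completed
• Required for Storage State Uniqueness
• If oldata insert/delete is FIFO, the serialization can be controlled with a single counter per IO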
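One way to read the single-counter idea is sketched below. It is an assumption-laden illustration, not the driver's implementation: oldata is a list kept in insertion order, each IO counts the earlier overlapping IOs still in oldata, and a notice from a finished IO decrements that counter. The names `WriteIO`, `n_overlapped`, and `OverlapSerializer` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(eq=False)  # compare by identity so equal-looking IOs stay distinct
class WriteIO:
    name: str
    offset: int
    length: int
    n_overlapped: int = 0  # earlier overlapping IOs that have not finished

    def overlaps(self, other: "WriteIO") -> bool:
        return (self.offset < other.offset + other.length
                and other.offset < self.offset + self.length)


class OverlapSerializer:
    def __init__(self) -> None:
        self.oldata: List[WriteIO] = []  # in-flight IOs, insertion order
        self.submitted: List[str] = []   # order in which data IOs start

    def insert(self, io: WriteIO) -> None:
        # Oldata inserted: count the overlapping IOs already in flight.
        io.n_overlapped = sum(1 for prev in self.oldata if prev.overlaps(io))
        self.oldata.append(io)
        if io.n_overlapped == 0:
            self._submit(io)

    def complete(self, io: WriteIO) -> None:
        # Oldata deleted, then a notice is sent to later overlapping IOs.
        idx = self.oldata.index(io)
        later = self.oldata[idx + 1:]
        del self.oldata[idx]
        for nxt in later:
            if nxt.overlaps(io):
                nxt.n_overlapped -= 1       # got notice
                if nxt.n_overlapped == 0:
                    self._submit(nxt)       # all earlier overlaps are done

    def _submit(self, io: WriteIO) -> None:
        self.submitted.append(io.name)
```

In this model an IO starts its data write only after every earlier overlapping IO has completed, so overlapping regions are always written in insertion order while non-overlapping IOs proceed independently.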
18. Pros and Cons
                WalB                                Snapshot
Read            Pdata/oldata search/copy            Index search
Write           Write twice (easy);                 Index modification
                Pdata/oldata modification (fast)    (+ old data copy)
Get diffs       Sequential read                     Index search and non-sequential read
Typical Soft.   ---                                 ZFS, BtrFS, (LVM)
WalB + Index ~= Block device with snapshot access