This document proposes ReTSO, a lock-free transaction management system for large-scale storage systems. ReTSO uses a single transaction status oracle to coordinate transactions across storage nodes in a lock-free manner, reducing load and latency. Initial results show ReTSO can process over 70,000 transactions per second with sub-millisecond latency while providing fault tolerance through write-ahead logging. Integration with systems like HBase and query engines is planned to handle real-world workloads.
1 of 23
Downloaded 16 times
More Related Content
Retso hotdep-2011
1. Lock-free transactional
support for large-scale
storage systems
Flavio Junqueira, Benjamin Reed, Maysam Yabandeh
Yahoo! Research
June 2011
2. Big data
Large data sets
Unstructured, semi-structured data
Critical for business logic
Examples of such data
Web logs, server logs, social media, etc
June 2011 2
3. Big data
+160% clicks +43% clicks
vs. one-size 鍖ts all vs. editor selected
Eric Baldeschwieler @IBM Big Data, May 2011
June 2011 3
4. Big data: Hadoop
Eric Baldeschwieler @IBM Big Data, May 2011
June 2011 4
5. Background
Database generations in batches
Input e.g.,
Input Hours of Output Hours of Output Hbase,
Input DB MapReduce DB MapReduce DB HDFS
Online concurrent updates
Input txn
Output Require transactional
Input txn DB support
June 2011 5
6. Examples
Mutable tables
Various indexes: Web, news, shopping, coupons
User and content models
Characteristics
Concurrency
Losing updates is undesirable
There are concurrent reads and they must be consistent
June 2011 6
7. Semantics
Read only previously committed values
Txn
w(x,v)
r(x) = v
w(x,v) Time
June 2011 7
8. Semantics
No concurrent writes to the same row
Txn
w(x,v)
w(x,v) Time
At least one must abort
June 2011 8
9. Snapshot Isolation
Known in the database realm
Con鍖icting transactions
Write to the same element (e.g., row)
Time range between start and commit overlap
Ef鍖cient implementation by versioning
June 2011 9
10. Locks?
Previous approaches: Lock data to modify
Convoy effect [Percolator, OSDI10]
Delays of several seconds
Higher overhead on data servers
Our approach
Lock-free, centralized transaction manager
Single point of failure, potential bottleneck?
June 2011 10
11. Transaction Status Oracle
Single process
Processes client inquiries about transactions
Includes a timestamp oracle
Keeps state
TO TSO about committed
rows
DB1 Client DB2
June 2011 11
12. Transaction Status Oracle
Single process
Processes client inquiries about transactions
Includes a timestamp oracle
Keeps state
TO TSO about committed
rows
Ts
DB1 Client DB2
June 2011 11
13. Transaction Status Oracle
Single process
Processes client inquiries about transactions
Includes a timestamp oracle
Keeps state
TO TSO about committed
rows
Ts
w(r2, v2, Ts(txnr)) r(r1)
DB1 ACK Client v1, Ts(txnw), DB2
Ts(txnw) < Ts(txnr)
June 2011 11
14. Transaction Status Oracle
Single process
Processes client inquiries about transactions
Includes a timestamp oracle
Keeps state
TO TSO about committed
rows
Ts Tc(txnw) < Ts(txnr)?
w(r2, v2, Ts(txnr)) r(r1)
DB1 ACK Client v1, Ts(txnw), DB2
Ts(txnw) < Ts(txnr)
June 2011 11
15. Transaction Status Oracle
Single process
Processes client inquiries about transactions
Includes a timestamp oracle
Keeps state
TO TSO about committed
rows
Ts Commit r2 Tc(txnw) < Ts(txnr)?
w(r2, v2, Ts(txnr)) r(r1)
DB1 ACK Client v1, Ts(txnw), DB2
Ts(txnw) < Ts(txnr)
June 2011 11
16. Transaction Status Oracle
Single process
Processes client inquiries about transactions
Includes a timestamp oracle
Keeps state
TO TSO about committed
rows
Ts Commit r2 Tc(txnw) < Ts(txnr)?
w(r2, v2, Ts(txnr)) r(r1)
DB1 ACK Client v1, Ts(txnw), DB2
Ts(txnw) < Ts(txnr)
Cleanup(r2, txnr)
June 2011 11
17. ReTSO: Design choices
TSO
Keeps state of modi鍖ed rows
In-memory state
Highest commit timestamp of all garbage-collected rows
Auto-GC Hash map
Lazy garbage-collection
Upon a hit
June 2011 12
18. ReTSO: Increasing dependability
Remote write-ahead log
Inquiries Backup
ReTSO ReTSO
Warm or cold
Updates
Writes to WAL
are synchronous but do WAL e.g., NFS, BookKeeper
not block other txns
[http://zookeeper.apache.org/bookkeeper]
June 2011 13
19. Preliminary results
Coded in Java
Except for hash map (C++ with JNI interface)
Uses BookKeeper for WAL
10 identical servers
2.13 Dual Core Intel Xeon
4GB of RAM
1 Gigabit interfaces
June 2011 14
20. Preliminary results
Average throughput observed
3 clients, 1,000 concurrent transactions
81k TPS
Average latency
1 client, 1 txn
0.87 ms (with WAL)
0.17 ms (without WAL)
June 2011 15
21. Preliminary results
Increasing the load of the system
1 to 16 clients 18
ReTSO
16 WAL-disabled
Max is 72k TPS 14
12
Latency in ms
10
8
6
4
2
0
20000 40000 60000 80000 100000 120000
Throughput in TPS
June 2011 16
22. Whats baking?
Integration
HBase
Query engine
Real workloads
June 2011 17
23. Summary
Transaction management for large-scale data
repositories
Lock-based vs. Lock-free
ReTSO is lock-free and dependable
Reduced load on storage nodes
Low latency despite faults
Performance suf鍖cient for realistic applications
June 2011 18