18. Dynamic Migration
One-time Migration
- e.g. Shard Rebalancing
Real-time Migration (as a Service?)
- e.g. MySQL => HBase in Real-time
19. Ex: Add new shard
(Diagram: Shard1 + Shard2 => Shard1 + Shard2 + Shard3)
20. Ex: Change shard rule
Range rule
- ID: 1 - 1000
- ID: 1001 - 2000
- ID: 2001 - 3000
Modulus rule
- ID mod 3: 0
- ID mod 3: 1
- ID mod 3: 2
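The two rules above can be sketched as routing functions (a minimal sketch; the function names are illustrative, not ADT's API):

```python
# Sketch of the two shard rules; names are illustrative, not ADT's API.

def shard_by_range(row_id: int) -> int:
    # Range rule: IDs 1-1000 -> shard 0, 1001-2000 -> shard 1, 2001-3000 -> shard 2
    return (row_id - 1) // 1000

def shard_by_modulus(row_id: int) -> int:
    # Modulus rule: shard = ID mod 3
    return row_id % 3

# The same ID can land on a different shard under each rule, which is why
# changing the rule means physically moving rows between shards.
print(shard_by_range(1500), shard_by_modulus(1500))  # 1 0
```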
21. Ex: Copy to diff. DBMS
MySQL => HBase / MongoDB
NO side effect
22. Ex: Copy to diff. schema
Source (MySQL):
ID | AGE | CNT | V
1 | 30 | 5 | msg1
2 | 29 | 10 | msg2

Destination 1 (MySQL):
ID | CNT | V
1 | 5 | msg1

Destination 2 (MySQL):
ID | CNT | V
2 | 10 | msg2

NO side effect
23. Ex: Write w/ custom processing
(Diagram: row events from the source MySQL are enriched by a custom handler before being written to the destination MySQL. Steps: 1) REQ, 2) INFO, 3) GET INFO, 4) MERGE.)

Source table:
ID | AGE | CNT | V
1 | 30 | 5 | msg1
2 | 29 | 10 | msg2

Merged-in data:
ID | FRIENDS
1 | 4,5,6,7,8,9
2 | 3,4,5,8,10
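The merge step above can be sketched as a handler that enriches each row event before it is written to the destination. `FRIENDS_DB` and the function names are hypothetical stand-ins for the real info source:

```python
# Hypothetical stand-in for the service that answers "GET INFO".
FRIENDS_DB = {1: "4,5,6,7,8,9", 2: "3,4,5,8,10"}

def fetch_friends(row_id):
    # 3) GET INFO: look up the extra data for this row
    return FRIENDS_DB.get(row_id, "")

def handle_row_event(after_image):
    # 4) MERGE: combine the binlog row image with the fetched data;
    # the merged row is what gets written to the destination.
    merged = dict(after_image)
    merged["FRIENDS"] = fetch_friends(after_image["ID"])
    return merged

row = {"ID": 1, "AGE": 30, "CNT": 5, "V": "msg1"}
print(handle_row_event(row))  # row 1 now carries FRIENDS = "4,5,6,7,8,9"
```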
35. Req 1. Row Format
The binlog must record both the Before and After images of each row (ROW format).

Event | Before | After
INSERT | N/A | 1 : a=1, b=2, c=3
UPDATE | 1 : a=1, b=99, c=99 | 1 : a=1, b=2, c=3
DELETE | 1 : a=1, b=99, c=99 | N/A
36. Req 2. Primary Key
Source event | Query to Destination
INSERT | Delete Old / Insert New
UPDATE | Delete Old / Insert New
DELETE | Delete Old

Every table must have a Primary Key.
37. ADT
Req 3. Custom Data Handler
(Diagram: the Binlog Processor and Data Crawler read the binlog and current data from the Current Master; a pluggable Custom Data Handler (e.g., shard distribution) decides how rows are loaded into the New Masters.)
47. How Data Crawler Works
1. SELECT a batch of rows
2. SELECT the next batch using the PK of the previously selected rows
3. INSERT into the destination
INSERT IGNORE is required
if the Binlog Processor runs concurrently
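The crawler loop above can be sketched as follows, with SQLite standing in for MySQL. Each batch SELECT is keyed on the largest PK of the previous batch, and the insert ignores duplicates (SQLite's `INSERT OR IGNORE` playing the role of MySQL's `INSERT IGNORE`) so rows already written by a concurrent Binlog Processor are left alone. Table and column names are illustrative:

```python
import sqlite3

def crawl(src, dst, batch=100):
    # Keyset-paginated crawl: resume each SELECT after the last PK seen.
    last_pk = 0
    while True:
        rows = src.execute(
            "SELECT id, v FROM t WHERE id > ? ORDER BY id LIMIT ?",
            (last_pk, batch)).fetchall()
        if not rows:
            break
        # INSERT OR IGNORE: do not clobber rows the Binlog Processor wrote.
        dst.executemany("INSERT OR IGNORE INTO t VALUES (?, ?)", rows)
        last_pk = rows[-1][0]  # next SELECT starts after this PK

src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
for c in (src, dst):
    c.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
src.executemany("INSERT INTO t VALUES (?, ?)",
                [(i, f"m{i}") for i in range(1, 251)])
dst.execute("INSERT INTO t VALUES (7, 'from-binlog')")  # applied earlier by Binlog Processor
crawl(src, dst)
print(dst.execute("SELECT COUNT(*) FROM t").fetchone()[0])      # 250
print(dst.execute("SELECT v FROM t WHERE id = 7").fetchone()[0])  # from-binlog
```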
48. How to Handle Binlog? (1/4)
Row Event Type | Query to Dest. (normally)
WRITE | insert( after )
DELETE | delete( before )
UPDATE | update( before, after )
Normally binlog events are handled like this.
49. How to Handle Binlog? (2/4)
However, we should consider
- Unexpected restart
- Data inserted by Crawler
Overwriting!
50. How to Handle Binlog? (3/4)
Row Event Type | Query to Dest. (Overwriting)
WRITE | replace( after )
DELETE | delete( before )
UPDATE | if( before.pk != after.pk ){ delete( before ) }; replace( after )
51. How to Handle Binlog? (4/4)
Normal Query
UPDATE … SET @1=after.1, @2=after.2, … WHERE pk_col=before.pk
Transformation 1: Unrolling
DELETE FROM … WHERE pk_col=before.pk;
INSERT INTO … VALUES(after.1, after.2, …);
Transformation 2: Overwriting
DELETE FROM … WHERE pk_col=before.pk;
REPLACE INTO … VALUES(after.1, after.2, …);
Transformation 3: Reducing
Delete [before] only if the PK has changed
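A sketch of how one UPDATE row event goes through transformations 2 and 3 above (the SQL text is schematic and the names illustrative; real code would build statements from the binlog table metadata):

```python
def transform_update(table, pk_col, before, after):
    """Turn an UPDATE row event into overwriting statements."""
    stmts = []
    if before[pk_col] != after[pk_col]:
        # Transformation 3 (Reducing): only when the PK changed does the
        # old row need an explicit DELETE; otherwise REPLACE covers it.
        stmts.append(f"DELETE FROM {table} WHERE {pk_col} = {before[pk_col]}")
    values = ", ".join(str(v) for v in after.values())
    # Transformation 2 (Overwriting): REPLACE instead of INSERT/UPDATE.
    stmts.append(f"REPLACE INTO {table} VALUES ({values})")
    return stmts

# PK unchanged: a single REPLACE is enough
print(transform_update("t", "id", {"id": 1, "b": 99}, {"id": 1, "b": 2}))
# PK changed: the old row must be deleted first
print(transform_update("t", "id", {"id": 1, "b": 2}, {"id": 7, "b": 2}))
```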
52. Strategy 1
Run sequentially
1. Crawl Data
2. Process Binary Log (after 1 is finished)
Binlog file (created before starting) is required
If crawling takes more than 3 days, then? (the old binlog files may already be purged)
53. Strategy 2
Run in parallel with the master DB
1. Start binlog processor
2. Start Data Crawler (ASAP after 1 is started)
with INSERT IGNORE
Problem: conflicts
55. Strategy 2-1
Data Crawler
- SELECT FOR UPDATE when crawling
- ROLLBACK after INSERT IGNORE
Binlog Processor
- Just acts normally, because there are no binlog events for locked rows
Problem: multi-row locks are dangerous for the master DB
56. Strategy 2-2
Binlog Processor
- Cache the delete history for a few minutes
Data Crawler
- If a delete-history entry exists, skip the INSERT
Problems
- Complicated: a lock is necessary for the history cache
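Strategy 2-2 above can be sketched as a small TTL cache shared by the two components; the lock around it is exactly the complexity the slide warns about. All names are illustrative:

```python
import threading
import time

class DeleteHistory:
    """TTL cache of recently deleted PKs (a sketch of Strategy 2-2)."""
    def __init__(self, ttl_sec=300.0):
        self.ttl = ttl_sec
        self._lock = threading.Lock()  # the lock the slide calls necessary
        self._hist = {}                # pk -> deletion timestamp

    def record_delete(self, pk):
        # Called by the Binlog Processor for every DELETE it applies.
        with self._lock:
            self._hist[pk] = time.monotonic()

    def recently_deleted(self, pk):
        # Called by the Data Crawler: if True, skip the INSERT for this pk.
        with self._lock:
            ts = self._hist.get(pk)
            if ts is None or time.monotonic() - ts > self.ttl:
                self._hist.pop(pk, None)  # expired entries dropped lazily
                return False
            return True

hist = DeleteHistory()
hist.record_delete(42)
print(hist.recently_deleted(42))  # True: the crawler must not re-insert row 42
print(hist.recently_deleted(7))   # False: safe to insert
```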
57. Strategy 2-3
Same as Strategy 2-1, except using a slave DB
Once the sync through the slave has finished,
restart with a new config
- Receive the binlog from the master
59. Test Scenario
Split into 2 shards from 1 master DB
Binary log only
Run many updates against the master DB
- 1K active sessions issuing random DML & data
Inject failures
- Master DB: ifdown > mysql restart > ifup
Compare data
60. Test DML List
INSERT INTO
INSERT IGNORE INTO
INSERT INTO … ON DUPLICATE KEY UPDATE
REPLACE INTO
UPDATE
DELETE
61. Test Table Schema
Column Name | Type | Table 1 Primary | Table 1 Unique | Table 2 Primary | Table 2 Unique | Value (when exec. DML)
no | int | 1 | 2 | 1 | | Random(1~n)
seq | int | 2 | | | | Random(1~n)
uk | int | | 1 | | 1 | Random(1~n)
update_cnt | int | | | | | Incremented on update
random_val | text | | | | | Random string
62. Step1. Split into 2 Shards
(Diagram: a Query Tool issues queries against the source MySQL; ADT reads the binlog events and writes the rows into the destination MySQL, split into shards.)
67. Wish to Apply for
Shard reconstruction (default)
MySQL binary log > NoSQL
Copy data change history into OLAP
MySQL binary log > Push Notification
Reconstruct shards by GPS point (Kakao Taxi?)
69. Next Dev. Plans
Change language: Java > GoLang
Control Tower: Admin & Monitoring
Is ADT alive?
Save checkpoint for ungraceful restart
Support Multiple DB Types
Redis, PgSQL, …