
Indexing Delight
Thinking Cap of Fractal-tree Indexes
BohuTANG@2012/12
overred.shuttler@gmail.com

B-tree
Invented in 1972: 40 years old!

B-tree

[Figure: a B-tree of disk blocks. Block0 is the root, Block1 to Block3 are its children, and Block4, Block5, ... are leaves.]

File on disk: ... Block0 ... ... Block3 ... Block5 ...

B-tree Insert
Insert x

[Figure: animation of inserting x. The insert seeks from Block0 down through an inner block to the leaf blocks (Block4, Block5, ...), one seek per level.]

Inserting one item causes many random seeks!

B-tree Search
Search x

[Figure: searching for x seeks from Block0 down through an inner block to the leaf blocks (Block4, Block5, ...).]

Queries are fast: the I/O cost is O(log_B N)

B-tree Conclusions
• Search: O(log_B N) block transfers.
• Insert: O(log_B N) block transfers (slow).
• B-tree range queries are slow.
• IMPORTANT:
-- Parent and child blocks are scattered across the disk.
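
To put O(log_B N) in numbers, here is a rough sketch that counts the levels a lookup has to walk. The function name, the key count and the fanout values are illustrative assumptions, not figures from the slides.

```python
def btree_height(n_keys, fanout):
    """Levels needed so that fanout**height >= n_keys (one block read per level)."""
    height, reach = 1, fanout
    while reach < n_keys:
        reach *= fanout
        height += 1
    return height


# Assumed example: one billion keys.
print(btree_height(10**9, 1024))  # wide blocks: few levels, so few seeks
print(btree_height(10**9, 2))     # binary tree: many more levels
```

Each level is a separate block, and because parent and child blocks are scattered on disk, each level can cost one random seek.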

A Simplified Fractal-tree
Cache-Oblivious Lookahead Array (COLA), invented by MITers

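COLA

[Figure: log_2 N sorted arrays stacked as levels, each level larger than the one above.]

Binary search in each level: O(log_2 N) per level, O((log_2 N)^2) overall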

COLA (Using Fractional Cascading)

[Figure: the log_2 N levels of the COLA.]

• Search: O(log_2 N) block transfers.
• Insert: O((1/B) log_2 N) amortized block transfers.
• Data is stored in log_2 N arrays of sizes 2, 4, 8, 16, ...
• Balanced Binary Search Tree

COLA Conclusions

• Search: O(log_2 N) block transfers (using Fractional Cascading).
• Insert: O((1/B) log_2 N) amortized block transfers.
• Data is stored in log_2 N arrays of sizes 2, 4, 8, 16, ...
• Balanced Binary Search Tree
• Lookahead (Prefetch), Data-Intensive!
• BUT the bottom level grows bigger and bigger, so merging becomes expensive.
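
The "arrays of doubling sizes" idea can be sketched in a few lines. This is a toy in-memory model under stated assumptions, not the published COLA: the class name Cola is hypothetical, levels hold 1, 2, 4, ... keys instead of 2, 4, 8, ..., and there is no fractional cascading, so a search does a plain binary search in every level.

```python
from bisect import bisect_left
from heapq import merge


class Cola:
    """Toy COLA: level k is either empty or a sorted list of exactly 2**k keys."""

    def __init__(self):
        self.levels = []

    def insert(self, key):
        carry = [key]  # sorted run being pushed down
        k = 0
        while True:
            if k == len(self.levels):
                self.levels.append([])
            if not self.levels[k]:
                self.levels[k] = carry  # free slot: the run settles here
                return
            # Level is full: merge it into the run and keep going down.
            carry = list(merge(self.levels[k], carry))
            self.levels[k] = []
            k += 1

    def contains(self, key):
        # Without fractional cascading: one binary search per level.
        for level in self.levels:
            i = bisect_left(level, key)
            if i < len(level) and level[i] == key:
                return True
        return False


cola = Cola()
for key in [7, 3, 9, 1, 8, 2, 6, 0, 5, 4]:
    cola.insert(key)
print([len(level) for level in cola.levels])
```

Every merge reads and writes its arrays sequentially, which is where the cheap amortized insert comes from; the last merge into the bottom level touches about half of all the data, which is the expensive part noted above.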

COLA vs B-tree
• Search:
-- (log_2 N)/(log_B N)
= log_2 B times slower than a B-tree (in theory)
• Insert:
-- (log_B N)/((1/B) log_2 N)
= B/(log_2 B) times faster than a B-tree (in theory)
If B = 4KB:
COLA search is 12 times slower than a B-tree
COLA insert is 341 times faster than a B-tree
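
The two ratios can be checked with a few lines of arithmetic. This sketch assumes that "B = 4KB" means B = 4096 in the formulas; the variable names are illustrative.

```python
import math

B = 4096  # assumption: "4KB" is read as B = 4096

# (log_2 N) / (log_B N) simplifies to log_2 B, independent of N.
search_slowdown = math.log2(B)

# (log_B N) / ((1/B) log_2 N) simplifies to B / log_2 B.
insert_speedup = B / math.log2(B)

print(search_slowdown)      # 12.0
print(int(insert_speedup))  # 341
```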

LSM-tree

[Figure: an in-memory buffer on top of levels of on-disk buffers, each level holding more buffers than the one above.]

• Lazy insertion; data is sorted before it moves down
• Level i is the buffer of level i+1
• Search: O(log_B N) * O(log N)
• Insert: O((log_B N)/B)

LSM-tree (Using Fractional Cascading)

[Figure: an in-memory buffer on top of a level of on-disk buffers.]

• Search: O(log_B N) (using FC)
• Insert: O((log_B N)/B)
• 0.618 Fractal-tree? But NOT Cache Oblivious...

LSM-tree (Merging)

[Figure: an in-memory buffer on top of a level of on-disk buffers, with merges running between them.]

A lot of I/O is wasted during merging!
Like a headless fly buzzing around... Zzz...

Fractal-tree Indexes
Just Fractal. Patented by Tokutek...

Fractal-tree Indexes

Search: O(log_B N)   Insert: O((log_B N)/B) (amortized)
Search costs the same as in a B-tree, but inserts are faster

Fractal-tree Indexes (Block size)

[Figure: animation of a tree of blocks in which a block becomes "full" as the tree grows from two levels to four.]

B is 4MB...

Fractal! 4MB one seek...
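
One way to picture the "full" block is a node that buffers inserts and pushes them down only when the buffer fills, so many inserts share one block write. This is a hedged toy model, not Tokutek's implementation: the Node class, the tiny capacity and the flush-everything policy are assumptions, and leaves never split.

```python
from bisect import bisect_left, bisect_right, insort


class Node:
    """Toy buffered node: internal nodes queue inserts, leaves keep sorted keys."""

    def __init__(self, pivots=None, children=None, capacity=4):
        self.pivots = pivots or []      # pivots[i] separates child i from child i+1
        self.children = children or []  # an empty list means this node is a leaf
        self.capacity = capacity        # assumed buffer size, in keys
        self.buffer = []                # pending inserts (internal nodes only)
        self.keys = []                  # sorted keys (leaves only)

    def insert(self, key):
        if not self.children:
            insort(self.keys, key)
            return
        self.buffer.append(key)
        if len(self.buffer) >= self.capacity:
            self.flush()

    def flush(self):
        # Buffer is "full": route every pending key to its child in one batch.
        pending, self.buffer = self.buffer, []
        for key in pending:
            self.children[bisect_right(self.pivots, key)].insert(key)

    def contains(self, key):
        if not self.children:
            i = bisect_left(self.keys, key)
            return i < len(self.keys) and self.keys[i] == key
        if key in self.buffer:
            return True
        return self.children[bisect_right(self.pivots, key)].contains(key)


root = Node(pivots=[10, 20], children=[Node(), Node(), Node()], capacity=4)
for key in [5, 15, 25]:
    root.insert(key)  # stays in the root buffer, no child is touched
root.insert(12)       # fourth key fills the buffer and triggers one flush
print([leaf.keys for leaf in root.children])
```

With a 4MB block the same batching applies at disk scale: a full buffer moves down in one large sequential transfer rather than one seek per key.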

B^ε-tree
Just a constant factor on Block fanout...

B^ε-tree

[Figure: "Optimal Curve". Search speed (slow to fast) plotted against insert speed (slow to fast), with points marked for B-tree, ε=1/2 and AOF.]

B^ε-tree

              insert                  search
B-tree (ε=1)  O(log_B N)              O(log_B N)
ε=1/2         O((log_B N)/√B)         O(log_B N)
ε=0           O((log N)/B)            O(log N)

If we want optimal point queries + very fast inserts, we
should choose ε=1/2
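
To see the trade-off in numbers, the sketch below plugs sample values into a simplified cost model: fanout B^ε (at least 2), height log_fanout N, and each block write carrying about B/fanout inserts per child. The function name, the model and the values of N and B are assumptions chosen for illustration, and constant factors are ignored.

```python
import math


def b_epsilon_costs(n, b, eps):
    """Return (insert_cost, search_cost) in block transfers, up to constants."""
    fanout = max(2.0, b ** eps)   # eps = 0 degenerates to a binary tree
    height = math.log(n, fanout)  # levels walked by a search
    insert = height / (b / fanout)  # amortized: inserts move down in batches
    return insert, height


N, B = 2 ** 40, 4096  # assumed sample values
for eps in (1.0, 0.5, 0.0):
    insert, search = b_epsilon_costs(N, B, eps)
    print(f"eps={eps}: insert={insert:.4f} search={search:.2f}")
```

In this model ε=1/2 only doubles the search cost of a B-tree while dividing the insert cost by roughly √B/2.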

B^ε-tree

So, if the block size is B, the fanout should be √B

Cache Oblivious Data Structure
All of the above are JUST Cache Oblivious Data Structures...

Cache Oblivious Data Structure
Question:
Reading a sequence of k consecutive blocks
at once is not much more expensive than
reading a single block. How to take advantage
of this feature?

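Cache Oblivious Data Structure
My Questions (translated from Chinese):
Q1:
With only 1MB of memory, how do you merge two 64MB sorted
files into a single sorted file?

Q2:
On most mechanical disks, reading several consecutive blocks
costs about the same as reading a single block. How can Q1
take advantage of this?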
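
One possible way to approach Q1 and Q2 together is a streaming two-way merge that reads and writes in large sequential chunks. This is only a sketch under assumptions: the function name is hypothetical, each file is assumed to hold one integer key per line, and the three 256KB buffers are a choice that keeps buffered data under 1MB.

```python
import heapq


def merge_sorted_files(path_a, path_b, path_out, buffer_bytes=256 * 1024):
    """Merge two sorted files of integers (one per line) into path_out."""
    # Two input buffers plus one output buffer: 3 * 256KB stays under 1MB.
    with open(path_a, "rb", buffering=buffer_bytes) as fa, \
         open(path_b, "rb", buffering=buffer_bytes) as fb, \
         open(path_out, "wb", buffering=buffer_bytes) as out:
        keys_a = (int(line) for line in fa)  # lazy: never loads the whole file
        keys_b = (int(line) for line in fb)
        for key in heapq.merge(keys_a, keys_b):
            out.write(b"%d\n" % key)
```

Calling merge_sorted_files("a.txt", "b.txt", "merged.txt") with two sorted inputs produces one sorted output. The larger the buffers, the more consecutive blocks each disk access covers, which is the advantage Q2 asks about.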

nessDB
You should agree that the VFS caches better than your own cache!
https://github.com/shuttler/nessDB

nessDB

[Figure: two rows of blocks.]

Each block is a small splittable tree

nessDB, What's going on?

[Figure: four levels of blocks, widening from top to bottom.]

From the line to the plane..

Thanks!
Most of the references are from:
Tokutek & MIT CSAIL & Stony Brook.

Drafted By BohuTANG using Google Drive, @2012/12/12
