ºÝºÝߣ

ºÝºÝߣShare a Scribd company logo
Indexing Delight
   Thinking Cap of Fractal-tree Indexes
                    BohuTANG@2012/12
                 overred.shuttler@gmail.com
B-tree
Invented in 1972, 40 years!
B-tree

                                                                 Block0




                           Block1                                 Block2                                           Block3
                                                ....                                         ....




            Block4                                                                                                                   Block5
                                    .....................................................................................




File on disk:        ...       Block0                  ...             ...           Block3                 ...             Block5     ...
B-tree Insert
                                                               Insert x

                                                                 Block0

                                                                                                          seek



                           Block1                                 Block2                                           Block3
                                                ....                                         ....




            Block4                                                                                                                   Block5
                                    .....................................................................................




File on disk:        ...       Block0                  ...             ...           Block3                 ...             Block5     ...
B-tree Insert
                                                               Insert x

                                                                 Block0

                                                                                                          seek



                           Block1                                 Block2                                           Block3
                                                ....                                         ....

                                                                                                                                     seek



            Block4                                                                                                                   Block5
                                    .....................................................................................




File on disk:        ...       Block0                  ...             ...           Block3                 ...             Block5     ...
B-tree Insert
                                                               Insert x

                                                                 Block0

                                                                                                          seek



                           Block1                                 Block2                                           Block3
                                                ....                                         ....

                                                                                                                                     seek



            Block4                                                                                                                   Block5
                                    .....................................................................................




File on disk:        ...       Block0                  ...             ...           Block3                 ...             Block5     ...



                           Insert one item causes many random seeks!
B-tree Search
                                             Search x

                                                Block0

                                                                                         seek



          Block1                                 Block2                                           Block3
                               ....                                         ....

                                                                                                           seek



 Block4                                                                                                    Block5
                   .....................................................................................




                     Query is fast, I/Os costs O(logBN)
B-tree Conclusions
¡ñ   Search: O(logBN ) block transfers.
¡ñ   Insert: O(logBN ) block transfers(slow).
¡ñ   B-tree range queries are slow.
¡ñ   IMPORTANT:
     --Parent and child blocks sparse in disk.
A Simpli?ed Fractal-tree
Cache Oblivious Lookahead Array, invented by MITers
COLA


                                        log2N




           ...........


Binary Search in one level:O(log2N) 2
COLA (Using Fractional Cascading)


                                                      log2N




         ...........


¡ñ   Search: O(log2N) block transfers.
¡ñ   Insert: O((1/B)log2N) amortized block transfers.
¡ñ   Data is stored in log2N arrays of sizes 2, 4, 8, 16,..
¡ñ   Balanced Binary Search Tree
COLA Conclusions

¡ñ Search: O(log2N) block transfers(Using Fractional
  Cascading).
¡ñ Insert: O((1/B)log2N) amortized block transfers.
¡ñ Data is stored in log2N arrays of sizes 2, 4, 8, 16,..
¡ñ Balanced Binary Search Tree
¡ñ Lookahead(Prefetch), Data-Intensive!
¡ñ BUT, the bottom level will be big and bigger,
  merging expensive.
COLA vs B-tree
¡ñ Search:
  -- (log2N)/(logBN)
     = log2B times slower than B-tree(In theory)
¡ñ Insert:
  --(logBN)/((1/B)log2N)
     = B/(log2B) times faster than B-trees(In theory)
if B = 4KB:
      COLA search is 12 times slower than B-tree
      COLA insert is 341 times faster than B-tree
LSM-tree
LSM-tree
                                                       In memory
                                 buffer



               buffer             ...                    buffer



      buffer     ...    buffer          ...   buffer        ...    buffer




¡ñ   Lazy insertion, Sorted before
¡ñ   Leveli is the buffer of Leveli+1
¡ñ   Search: O(logBN) * O(logN)
¡ñ   Insert:O((logBN)/B)
LSM-tree (Using Fractional Cascading)
                                                     In memory
                               buffer



             buffer             ...                    buffer



    buffer     ...    buffer          ...   buffer        ...    buffer




¡ñ Search: O(logBN) (Using FC)
¡ñ Insert:O((logBN)/B)
¡ñ 0.618 Fractal-tree?But NOT Cache Oblivious...
LSM-tree (Merging)
                                                           In memory
                                 buffer



             buffer               ...                        buffer
   merge              merge                   merge

    buffer     ...      buffer          ...       buffer        ...        buffer




A lot of I/O wasted during merging!
Like a headless fly flying...                                          Zzz...
Fractal-tree Indexes
Just Fractal. Patented by Tokutek...
Fractal-tree Indexes




Search: O(logBN) Insert: O((logBN)/B) (amortized)
Search is same as B-tree, but insert faster than B-tree
Fractal-tree Indexes (Block size)



                    ....


            ....     ....    ....




               B is 4MB...
Fractal-tree Indexes (Block size)


                    full


                     ....


            ....      ....   ....




               B is 4MB...
Fractal-tree Indexes (Block size)



            full     ....


            ....      ....   ....




                   B is 4MB...
Fractal-tree Indexes (Block size)

                                    ..

                            ..      ..         ..



                full




           ..                    ... ... ...             ..

      ..   ..          ..                           ..   ..   ..




           Fractal! 4MB one seek...
¦Å
B -tree
Just a constant factor on Block fanout...
¦Å
B -tree
             B-tree
      Fast                                ¦Å=1/2



 Search




      Slow
                                                  AOF
              Slow
                                                   Fast
                      Inserts


                          Optimal Curve
¦Å
B -tree

                          insert            search

        B-tree           O(logBN)          O(logBN)
        (?=1)

        ?=1/2         O((logBN)/¡ÌB)        O(logBN)

         ?=0            O((logN)/B)         O(logN)


 if we want optimal point queries + very fast inserts, we
 should choose ?=1/2
¦Å
B -tree




     So, if block size is B, the fanout should be ¡ÌB
Cache Oblivious Data
Structure
All the above is JUST Cache Oblivious Data Structures...
Cache Oblivious Data Structure
Question:
   Reading a sequence of k consecutive blocks
at once is not much more expensive than
reading a single block. How to take advantage
of this feature?
Cache Oblivious Data Structure
My Questions(In Chinese):
Q1£º
  Ö»ÓÐ1MBÄڴ棬ÔõÑù°ÑÁ½¸ö64MBÓÐÐòÎļþºÏ
²¢³ÉÒ»¸öÓÐÐòÎļþ£¿

Q2£º
  ´ó¶àÊý»úе´ÅÅÌ£¬Á¬Ðø¶ÁÈ¡¶à¸öBlockºÍ¶ÁÈ¡
µ¥¸öBlock»¨·ÑÏà²î²»´ó£¬ÔÚQ1ÖÐÈçºÎÀûÓÃÕâ¸ö
ÓÅÊÆ£¿
nessDB
You should agree that VFS do better than yourself cache!
https://github.com/shuttler/nessDB
nessDB


             ..         ... ... ...        ..

     ..      ..   ..                  ..   ..   ..




          Each Block is Small-Splittable Tree
nessDB, What's going on?

                             ..

                     ..      ..         ..




         ..               ... ... ...             ..

    ..   ..     ..                           ..   ..   ..




              From the line to the plane..
Thanks!
Most of the references are from:
Tokutek & MIT CSAIL & Stony Brook.

Drafted By BohuTANG using Google Drive, @2012/12/12

More Related Content

Recently uploaded (20)

William Maclyn Murphy McRae - A Seasoned Professional Renowned
William Maclyn Murphy McRae - A Seasoned Professional RenownedWilliam Maclyn Murphy McRae - A Seasoned Professional Renowned
William Maclyn Murphy McRae - A Seasoned Professional Renowned
William Maclyn Murphy McRae
?
DevNexus - Building 10x Development Organizations.pdf
DevNexus - Building 10x Development Organizations.pdfDevNexus - Building 10x Development Organizations.pdf
DevNexus - Building 10x Development Organizations.pdf
Justin Reock
?
Combining Lexical and Semantic Search with Milvus 2.5
Combining Lexical and Semantic Search with Milvus 2.5Combining Lexical and Semantic Search with Milvus 2.5
Combining Lexical and Semantic Search with Milvus 2.5
Zilliz
?
Teaching Prompting and Prompt Sharing to End Users.pptx
Teaching Prompting and Prompt Sharing to End Users.pptxTeaching Prompting and Prompt Sharing to End Users.pptx
Teaching Prompting and Prompt Sharing to End Users.pptx
Michael Blumenthal (Microsoft MVP)
?
Leadership u automatizaciji: RPA pri?e iz prakse!
Leadership u automatizaciji: RPA pri?e iz prakse!Leadership u automatizaciji: RPA pri?e iz prakse!
Leadership u automatizaciji: RPA pri?e iz prakse!
UiPathCommunity
?
Webinar: LF Energy GEISA: Addressing edge interoperability at the meter
Webinar: LF Energy GEISA: Addressing edge interoperability at the meterWebinar: LF Energy GEISA: Addressing edge interoperability at the meter
Webinar: LF Energy GEISA: Addressing edge interoperability at the meter
DanBrown980551
?
Caching for Performance Masterclass: Caching at Scale
Caching for Performance Masterclass: Caching at ScaleCaching for Performance Masterclass: Caching at Scale
Caching for Performance Masterclass: Caching at Scale
ScyllaDB
?
AI Trends and Fun Demos ¨C Sotheby¡¯s Rehoboth Presentation
AI Trends and Fun Demos ¨C Sotheby¡¯s Rehoboth PresentationAI Trends and Fun Demos ¨C Sotheby¡¯s Rehoboth Presentation
AI Trends and Fun Demos ¨C Sotheby¡¯s Rehoboth Presentation
Ethan Holland
?
Not a Kubernetes fan? The state of PaaS in 2025
Not a Kubernetes fan? The state of PaaS in 2025Not a Kubernetes fan? The state of PaaS in 2025
Not a Kubernetes fan? The state of PaaS in 2025
Anthony Dahanne
?
Mastering ChatGPT & LLMs for Practical Applications: Tips, Tricks, and Use Cases
Mastering ChatGPT & LLMs for Practical Applications: Tips, Tricks, and Use CasesMastering ChatGPT & LLMs for Practical Applications: Tips, Tricks, and Use Cases
Mastering ChatGPT & LLMs for Practical Applications: Tips, Tricks, and Use Cases
Sanjay Willie
?
EaseUS Partition Master Crack 2025 + Serial Key
EaseUS Partition Master Crack 2025 + Serial KeyEaseUS Partition Master Crack 2025 + Serial Key
EaseUS Partition Master Crack 2025 + Serial Key
kherorpacca127
?
UiPath Automation Developer Associate Training Series 2025 - Session 1
UiPath Automation Developer Associate Training Series 2025 - Session 1UiPath Automation Developer Associate Training Series 2025 - Session 1
UiPath Automation Developer Associate Training Series 2025 - Session 1
DianaGray10
?
AI in Medical Diagnostics ¨C The Future of Healthcare
AI in Medical Diagnostics ¨C The Future of HealthcareAI in Medical Diagnostics ¨C The Future of Healthcare
AI in Medical Diagnostics ¨C The Future of Healthcare
Vadim Nareyko
?
ISOIEC 42001 AI Management System ºÝºÝߣs
ISOIEC 42001 AI Management System ºÝºÝߣsISOIEC 42001 AI Management System ºÝºÝߣs
ISOIEC 42001 AI Management System ºÝºÝߣs
GilangRamadhan884333
?
Build with AI on Google Cloud Session #3
Build with AI on Google Cloud Session #3Build with AI on Google Cloud Session #3
Build with AI on Google Cloud Session #3
Margaret Maynard-Reid
?
Predictive vs. Preventive Maintenance ¡ª Which One is Right for Your Factory
Predictive vs. Preventive Maintenance ¡ª Which One is Right for Your FactoryPredictive vs. Preventive Maintenance ¡ª Which One is Right for Your Factory
Predictive vs. Preventive Maintenance ¡ª Which One is Right for Your Factory
Diagsense ltd
?
UiPath Document Understanding - Generative AI and Active learning capabilities
UiPath Document Understanding - Generative AI and Active learning capabilitiesUiPath Document Understanding - Generative AI and Active learning capabilities
UiPath Document Understanding - Generative AI and Active learning capabilities
DianaGray10
?
MIND Revenue Release Quarter 4 2024 - Finacial Presentation
MIND Revenue Release Quarter 4 2024 - Finacial PresentationMIND Revenue Release Quarter 4 2024 - Finacial Presentation
MIND Revenue Release Quarter 4 2024 - Finacial Presentation
MIND CTI
?
Deno ...................................
Deno ...................................Deno ...................................
Deno ...................................
Robert MacLean
?
Computational Photography: How Technology is Changing Way We Capture the World
Computational Photography: How Technology is Changing Way We Capture the WorldComputational Photography: How Technology is Changing Way We Capture the World
Computational Photography: How Technology is Changing Way We Capture the World
HusseinMalikMammadli
?
William Maclyn Murphy McRae - A Seasoned Professional Renowned
William Maclyn Murphy McRae - A Seasoned Professional RenownedWilliam Maclyn Murphy McRae - A Seasoned Professional Renowned
William Maclyn Murphy McRae - A Seasoned Professional Renowned
William Maclyn Murphy McRae
?
DevNexus - Building 10x Development Organizations.pdf
DevNexus - Building 10x Development Organizations.pdfDevNexus - Building 10x Development Organizations.pdf
DevNexus - Building 10x Development Organizations.pdf
Justin Reock
?
Combining Lexical and Semantic Search with Milvus 2.5
Combining Lexical and Semantic Search with Milvus 2.5Combining Lexical and Semantic Search with Milvus 2.5
Combining Lexical and Semantic Search with Milvus 2.5
Zilliz
?
Leadership u automatizaciji: RPA pri?e iz prakse!
Leadership u automatizaciji: RPA pri?e iz prakse!Leadership u automatizaciji: RPA pri?e iz prakse!
Leadership u automatizaciji: RPA pri?e iz prakse!
UiPathCommunity
?
Webinar: LF Energy GEISA: Addressing edge interoperability at the meter
Webinar: LF Energy GEISA: Addressing edge interoperability at the meterWebinar: LF Energy GEISA: Addressing edge interoperability at the meter
Webinar: LF Energy GEISA: Addressing edge interoperability at the meter
DanBrown980551
?
Caching for Performance Masterclass: Caching at Scale
Caching for Performance Masterclass: Caching at ScaleCaching for Performance Masterclass: Caching at Scale
Caching for Performance Masterclass: Caching at Scale
ScyllaDB
?
AI Trends and Fun Demos ¨C Sotheby¡¯s Rehoboth Presentation
AI Trends and Fun Demos ¨C Sotheby¡¯s Rehoboth PresentationAI Trends and Fun Demos ¨C Sotheby¡¯s Rehoboth Presentation
AI Trends and Fun Demos ¨C Sotheby¡¯s Rehoboth Presentation
Ethan Holland
?
Not a Kubernetes fan? The state of PaaS in 2025
Not a Kubernetes fan? The state of PaaS in 2025Not a Kubernetes fan? The state of PaaS in 2025
Not a Kubernetes fan? The state of PaaS in 2025
Anthony Dahanne
?
Mastering ChatGPT & LLMs for Practical Applications: Tips, Tricks, and Use Cases
Mastering ChatGPT & LLMs for Practical Applications: Tips, Tricks, and Use CasesMastering ChatGPT & LLMs for Practical Applications: Tips, Tricks, and Use Cases
Mastering ChatGPT & LLMs for Practical Applications: Tips, Tricks, and Use Cases
Sanjay Willie
?
EaseUS Partition Master Crack 2025 + Serial Key
EaseUS Partition Master Crack 2025 + Serial KeyEaseUS Partition Master Crack 2025 + Serial Key
EaseUS Partition Master Crack 2025 + Serial Key
kherorpacca127
?
UiPath Automation Developer Associate Training Series 2025 - Session 1
UiPath Automation Developer Associate Training Series 2025 - Session 1UiPath Automation Developer Associate Training Series 2025 - Session 1
UiPath Automation Developer Associate Training Series 2025 - Session 1
DianaGray10
?
AI in Medical Diagnostics ¨C The Future of Healthcare
AI in Medical Diagnostics ¨C The Future of HealthcareAI in Medical Diagnostics ¨C The Future of Healthcare
AI in Medical Diagnostics ¨C The Future of Healthcare
Vadim Nareyko
?
ISOIEC 42001 AI Management System ºÝºÝߣs
ISOIEC 42001 AI Management System ºÝºÝߣsISOIEC 42001 AI Management System ºÝºÝߣs
ISOIEC 42001 AI Management System ºÝºÝߣs
GilangRamadhan884333
?
Build with AI on Google Cloud Session #3
Build with AI on Google Cloud Session #3Build with AI on Google Cloud Session #3
Build with AI on Google Cloud Session #3
Margaret Maynard-Reid
?
Predictive vs. Preventive Maintenance ¡ª Which One is Right for Your Factory
Predictive vs. Preventive Maintenance ¡ª Which One is Right for Your FactoryPredictive vs. Preventive Maintenance ¡ª Which One is Right for Your Factory
Predictive vs. Preventive Maintenance ¡ª Which One is Right for Your Factory
Diagsense ltd
?
UiPath Document Understanding - Generative AI and Active learning capabilities
UiPath Document Understanding - Generative AI and Active learning capabilitiesUiPath Document Understanding - Generative AI and Active learning capabilities
UiPath Document Understanding - Generative AI and Active learning capabilities
DianaGray10
?
MIND Revenue Release Quarter 4 2024 - Finacial Presentation
MIND Revenue Release Quarter 4 2024 - Finacial PresentationMIND Revenue Release Quarter 4 2024 - Finacial Presentation
MIND Revenue Release Quarter 4 2024 - Finacial Presentation
MIND CTI
?
Deno ...................................
Deno ...................................Deno ...................................
Deno ...................................
Robert MacLean
?
Computational Photography: How Technology is Changing Way We Capture the World
Computational Photography: How Technology is Changing Way We Capture the WorldComputational Photography: How Technology is Changing Way We Capture the World
Computational Photography: How Technology is Changing Way We Capture the World
HusseinMalikMammadli
?

Featured (20)

Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
Pixeldarts
?
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
ThinkNow
?
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
marketingartwork
?
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
Skeleton Technologies
?
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
Neil Kimberley
?
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
contently
?
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
Albert Qian
?
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
Kurio // The Social Media Age(ncy)
?
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
Search Engine Journal
?
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
SpeakerHub
?
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
Clark Boyd
?
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
Tessa Mero
?
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Lily Ray
?
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
Rajiv Jayarajah, MAppComm, ACC
?
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
Christy Abraham Joy
?
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
Vit Horky
?
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
MindGenius
?
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
RachelPearson36
?
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Applitools
?
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work
GetSmarter
?
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
Pixeldarts
?
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
ThinkNow
?
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
marketingartwork
?
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
Neil Kimberley
?
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
contently
?
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
Albert Qian
?
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
Search Engine Journal
?
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
SpeakerHub
?
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
Clark Boyd
?
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
Tessa Mero
?
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Lily Ray
?
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
Vit Horky
?
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
MindGenius
?
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
RachelPearson36
?
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Applitools
?
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work
GetSmarter
?

Indexing delight --thinking cap of fractal-tree indexes

  • 1. Indexing Delight Thinking Cap of Fractal-tree Indexes BohuTANG@2012/12 overred.shuttler@gmail.com
  • 3. B-tree Block0 Block1 Block2 Block3 .... .... Block4 Block5 ..................................................................................... File on disk: ... Block0 ... ... Block3 ... Block5 ...
  • 4. B-tree Insert Insert x Block0 seek Block1 Block2 Block3 .... .... Block4 Block5 ..................................................................................... File on disk: ... Block0 ... ... Block3 ... Block5 ...
  • 5. B-tree Insert Insert x Block0 seek Block1 Block2 Block3 .... .... seek Block4 Block5 ..................................................................................... File on disk: ... Block0 ... ... Block3 ... Block5 ...
  • 6. B-tree Insert Insert x Block0 seek Block1 Block2 Block3 .... .... seek Block4 Block5 ..................................................................................... File on disk: ... Block0 ... ... Block3 ... Block5 ... Insert one item causes many random seeks!
  • 7. B-tree Search Search x Block0 seek Block1 Block2 Block3 .... .... seek Block4 Block5 ..................................................................................... Query is fast, I/Os costs O(logBN)
  • 8. B-tree Conclusions ¡ñ Search: O(logBN ) block transfers. ¡ñ Insert: O(logBN ) block transfers(slow). ¡ñ B-tree range queries are slow. ¡ñ IMPORTANT: --Parent and child blocks sparse in disk.
  • 9. A Simpli?ed Fractal-tree Cache Oblivious Lookahead Array, invented by MITers
  • 10. COLA log2N ........... Binary Search in one level:O(log2N) 2
  • 11. COLA (Using Fractional Cascading) log2N ........... ¡ñ Search: O(log2N) block transfers. ¡ñ Insert: O((1/B)log2N) amortized block transfers. ¡ñ Data is stored in log2N arrays of sizes 2, 4, 8, 16,.. ¡ñ Balanced Binary Search Tree
  • 12. COLA Conclusions ¡ñ Search: O(log2N) block transfers(Using Fractional Cascading). ¡ñ Insert: O((1/B)log2N) amortized block transfers. ¡ñ Data is stored in log2N arrays of sizes 2, 4, 8, 16,.. ¡ñ Balanced Binary Search Tree ¡ñ Lookahead(Prefetch), Data-Intensive! ¡ñ BUT, the bottom level will be big and bigger, merging expensive.
  • 13. COLA vs B-tree ¡ñ Search: -- (log2N)/(logBN) = log2B times slower than B-tree(In theory) ¡ñ Insert: --(logBN)/((1/B)log2N) = B/(log2B) times faster than B-trees(In theory) if B = 4KB: COLA search is 12 times slower than B-tree COLA insert is 341 times faster than B-tree
  • 15. LSM-tree In memory buffer buffer ... buffer buffer ... buffer ... buffer ... buffer ¡ñ Lazy insertion, Sorted before ¡ñ Leveli is the buffer of Leveli+1 ¡ñ Search: O(logBN) * O(logN) ¡ñ Insert:O((logBN)/B)
  • 16. LSM-tree (Using Fractional Cascading) In memory buffer buffer ... buffer buffer ... buffer ... buffer ... buffer ¡ñ Search: O(logBN) (Using FC) ¡ñ Insert:O((logBN)/B) ¡ñ 0.618 Fractal-tree?But NOT Cache Oblivious...
  • 17. LSM-tree (Merging) In memory buffer buffer ... buffer merge merge merge buffer ... buffer ... buffer ... buffer A lot of I/O wasted during merging! Like a headless fly flying... Zzz...
  • 18. Fractal-tree Indexes Just Fractal. Patented by Tokutek...
  • 19. Fractal-tree Indexes Search: O(logBN) Insert: O((logBN)/B) (amortized) Search is same as B-tree, but insert faster than B-tree
  • 20. Fractal-tree Indexes (Block size) .... .... .... .... B is 4MB...
  • 21. Fractal-tree Indexes (Block size) full .... .... .... .... B is 4MB...
  • 22. Fractal-tree Indexes (Block size) full .... .... .... .... B is 4MB...
  • 23. Fractal-tree Indexes (Block size) .. .. .. .. full .. ... ... ... .. .. .. .. .. .. .. Fractal! 4MB one seek...
  • 24. ¦Å B -tree Just a constant factor on Block fanout...
  • 25. ¦Å B -tree B-tree Fast ¦Å=1/2 Search Slow AOF Slow Fast Inserts Optimal Curve
  • 26. ¦Å B -tree insert search B-tree O(logBN) O(logBN) (?=1) ?=1/2 O((logBN)/¡ÌB) O(logBN) ?=0 O((logN)/B) O(logN) if we want optimal point queries + very fast inserts, we should choose ?=1/2
  • 27. ¦Å B -tree So, if block size is B, the fanout should be ¡ÌB
  • 28. Cache Oblivious Data Structure All the above is JUST Cache Oblivious Data Structures...
  • 29. Cache Oblivious Data Structure Question: Reading a sequence of k consecutive blocks at once is not much more expensive than reading a single block. How to take advantage of this feature?
  • 30. Cache Oblivious Data Structure My Questions(In Chinese): Q1£º Ö»ÓÐ1MBÄڴ棬ÔõÑù°ÑÁ½¸ö64MBÓÐÐòÎļþºÏ ²¢³ÉÒ»¸öÓÐÐòÎļþ£¿ Q2£º ´ó¶àÊý»úе´ÅÅÌ£¬Á¬Ðø¶ÁÈ¡¶à¸öBlockºÍ¶ÁÈ¡ µ¥¸öBlock»¨·ÑÏà²î²»´ó£¬ÔÚQ1ÖÐÈçºÎÀûÓÃÕâ¸ö ÓÅÊÆ£¿
  • 31. nessDB You should agree that VFS do better than yourself cache! https://github.com/shuttler/nessDB
  • 32. nessDB .. ... ... ... .. .. .. .. .. .. .. Each Block is Small-Splittable Tree
  • 33. nessDB, What's going on? .. .. .. .. .. ... ... ... .. .. .. .. .. .. .. From the line to the plane..
  • 34. Thanks! Most of the references are from: Tokutek & MIT CSAIL & Stony Brook. Drafted By BohuTANG using Google Drive, @2012/12/12