Clusterpoint Inside-Out
Jurģis Orups
Development stages

Planning – Idea

Infant – Minimum Viable Product

Child – Trial and error

Teenager – Pivot & Execute

Grown ups – ... soon (:
Inspiration (2001)

FTS for Sybase & FoxPro

First distributed design & implementation

– trying to bite Google (:
 Folk song search portal www.dainuskapis.lv
Long long time ago
Talk is cheap.
Show me the code.
(c) Linus Torvalds
Inverted Index

Problem – real time updates to index
Pierpaolo Basile, Information Access with Lucene, slideshare.net
Infant (2006)

Clusterpoint (2006) – first startup in LV

Seeded by Imprimatur Capital

Team of 2.5 developers and 0.5 CEO

6 months wicked C/C++ coding

biting Google again – search appliance vertical
- “didn't go well”
Inverted index

Two types of FTS indices:
− Memory (mutable)
− Disk based (immutable)

Dump memory index when full

Merge the dumps

Problem solved – real time updates!
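
The scheme above can be sketched in a few lines. This is only an illustration, not Clusterpoint's C/C++ code: the class name, the `max_memory_docs` threshold and the use of plain dicts in place of disk files are all assumptions.

```python
from collections import defaultdict


class SegmentedIndex:
    """Toy inverted index: one mutable memory segment plus immutable dumps."""

    def __init__(self, max_memory_docs=2):
        self.max_memory_docs = max_memory_docs  # hypothetical "memory full" limit
        self.memory = defaultdict(set)          # term -> doc ids (mutable)
        self.memory_docs = 0
        self.segments = []                      # immutable dumps (stand-ins for disk files)

    def add(self, doc_id, text):
        for term in text.lower().split():
            self.memory[term].add(doc_id)
        self.memory_docs += 1
        if self.memory_docs >= self.max_memory_docs:
            self.dump()

    def dump(self):
        # Freeze the memory index into an immutable segment and start a fresh one.
        if self.memory:
            self.segments.append(
                {term: frozenset(ids) for term, ids in self.memory.items()}
            )
        self.memory = defaultdict(set)
        self.memory_docs = 0

    def merge(self):
        # Combine all dumped segments into one, so a query touches fewer segments.
        merged = defaultdict(set)
        for segment in self.segments:
            for term, ids in segment.items():
                merged[term] |= ids
        self.segments = (
            [{term: frozenset(ids) for term, ids in merged.items()}] if merged else []
        )

    def search(self, term):
        # The memory segment is searched too, so new documents are visible at once.
        term = term.lower()
        hits = set(self.memory.get(term, ()))
        for segment in self.segments:
            hits |= segment.get(term, frozenset())
        return hits


index = SegmentedIndex(max_memory_docs=2)
index.add(1, "js developer Dublin")
print(index.search("dublin"))  # {1}: searchable while still only in memory
```

Dumped segments are never modified, so they need no locking on reads; only the small memory segment takes writes.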
Query language

Simple query
js developer Dublin

Advanced query
js developer
<sex>="female"</sex>
<salary>2000 .. 5000</salary>
<place>="Dublin"</place>

Aggregation (SQL like)
SELECT sex, count(sex) GROUP BY sex
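
To make the three query forms concrete, here is what they would compute over a handful of documents, written as plain Python rather than the Clusterpoint query language. The sample documents are made up, and treating the salary range as inclusive is an assumption.

```python
from collections import Counter

# Hypothetical documents; field names follow the query example above.
documents = [
    {"text": "senior js developer", "sex": "female", "salary": 3000, "place": "Dublin"},
    {"text": "js developer", "sex": "male", "salary": 4000, "place": "Dublin"},
    {"text": "js developer", "sex": "female", "salary": 6000, "place": "Dublin"},
    {"text": "java developer", "sex": "female", "salary": 2500, "place": "Riga"},
]


def matches_text(doc, query):
    # Simple query: every term must occur somewhere in the document.
    words = " ".join(str(value) for value in doc.values()).lower().split()
    return all(term in words for term in query.lower().split())


def advanced_query(docs):
    # Full-text terms combined with exact-match and range conditions on fields.
    return [
        d for d in docs
        if matches_text(d, "js developer")
        and d["sex"] == "female"
        and 2000 <= d["salary"] <= 5000  # range assumed inclusive
        and d["place"] == "Dublin"
    ]


def count_by(docs, field):
    # Equivalent of: SELECT field, count(field) GROUP BY field
    return Counter(d[field] for d in docs)


print(len([d for d in documents if matches_text(d, "js developer Dublin")]))  # 3
print(advanced_query(documents)[0]["text"])  # senior js developer
```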
Lookup tables (column-stores)

Associative array/hash map

Constant access/modify time

Memory mapped

Append only

Perfect when accessing data by column,
i.e. aggregation, faceting, filtering
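
A rough sketch of one such lookup table follows. It assumes a Python list standing in for the memory-mapped, append-only file and a dict as the hash map; `LookupTable` and its methods are illustrative names, not Clusterpoint's.

```python
class LookupTable:
    """Toy append-only column: one value list plus a doc id -> slot hash map."""

    def __init__(self):
        self.values = []   # append-only storage (stand-in for a memory-mapped file)
        self.slot_of = {}  # doc id -> index of the current value

    def set(self, doc_id, value):
        # A modification appends a new value and repoints the id;
        # nothing already written is overwritten.
        self.slot_of[doc_id] = len(self.values)
        self.values.append(value)

    def get(self, doc_id):
        # Average constant time: one hash lookup, one list index.
        return self.values[self.slot_of[doc_id]]

    def facet(self):
        # Count documents per distinct value by scanning only this column.
        counts = {}
        for slot in self.slot_of.values():
            value = self.values[slot]
            counts[value] = counts.get(value, 0) + 1
        return counts


place = LookupTable()
place.set(1, "Dublin")
place.set(2, "Riga")
place.set(1, "Riga")   # update: appended, the old "Dublin" slot is left behind
print(place.facet())   # {'Riga': 2}
```

Faceting reads only the one column it needs, which is why this layout suits aggregation and filtering.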
Child (2008)

Trust in enterprise sales model

First commercial customers
(directories, portals, e-shops, public sector)

Positioning as a database was challenging

NoSQL – heard nothing about it

... mhm maybe we are NoSQL ?!
The San Francisco NOSQL Meetup on June 11, 2009 was important to the trend's development.
(Wikipedia)
“F”
Market Trends
Teenager (these days)

Less trust in enterprise model

Shift to free software & Cloud

Grow customer base

Innovate

Develop for developers
Transactions – for what?

ATM cash withdrawal

Checkout

Transfer of goods (monies, credits, lives :)

Booking
Transactions – example

Begin

Retrieve value for A1

Retrieve value for A2

Check

Update value for A1

Update value for A2

Commit
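
The same steps written out for a money transfer, as a single-threaded sketch. The `accounts` dict and the `transfer` function are hypothetical and say nothing about the actual Transactions API.

```python
class InsufficientFunds(Exception):
    pass


def transfer(accounts, source, target, amount):
    pending = {}                         # Begin: buffer writes until commit
    a1 = accounts[source]                # Retrieve value for A1
    a2 = accounts[target]                # Retrieve value for A2
    if a1 < amount:                      # Check
        raise InsufficientFunds(source)  # rollback: nothing was applied yet
    pending[source] = a1 - amount        # Update value for A1
    pending[target] = a2 + amount        # Update value for A2
    accounts.update(pending)             # Commit: apply both updates together


accounts = {"A1": 100, "A2": 50}
transfer(accounts, "A1", "A2", 30)
print(accounts)  # {'A1': 70, 'A2': 80}
```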
Transactions – behind the scenes

Begin – fix the “view of the world”

Retrieve A1 (version v1)

Retrieve A2 (version v2)

Check

Update A1: if v1' != v1 then rollback else
continue

Update A2: if v2' != v2 then rollback else
continue
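
A minimal sketch of this version check, under a few stated assumptions: all names are invented, every key is retrieved before it is updated, and the checks are batched at commit time rather than done on each update as above, so that a rollback only has to discard buffered writes.

```python
class Conflict(Exception):
    """Raised when a value changed after the transaction read it."""


class VersionedStore:
    def __init__(self):
        self.data = {}  # key -> (version, value)

    def put(self, key, value):
        version, _ = self.data.get(key, (0, None))
        self.data[key] = (version + 1, value)

    def read(self, key):
        return self.data[key]  # (version, value)


class Transaction:
    def __init__(self, store):
        self.store = store
        self.read_versions = {}  # key -> version v seen by this transaction
        self.writes = {}         # buffered updates, invisible until commit

    def retrieve(self, key):
        version, value = self.store.read(key)
        self.read_versions[key] = version
        return value

    def update(self, key, value):
        self.writes[key] = value

    def commit(self):
        # If the current version v' differs from the version v that was read,
        # someone else committed in between: roll back.
        for key in self.writes:
            current_version, _ = self.store.read(key)
            if current_version != self.read_versions[key]:
                self.writes.clear()
                raise Conflict(key)
        for key, value in self.writes.items():
            self.store.put(key, value)


store = VersionedStore()
store.put("A1", 100)
store.put("A2", 50)
```

No locks are held between retrieve and commit; two transactions that touch the same key race, and the one that commits second gets the `Conflict`.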
Transactions – “view of the world”
D1: TID1, TID6
D2: TID2
D3: TID3, TID8
D4: TID4
D5: TID5
Shard1
D6: TID1, TID6
D7: TID2, TID8
D8: TID3
D9: TID4
D10: TID5
Shard2
TID1: D1,D6
TID2: D2, D7
TID3: D3, D8
TID4: D4, D9
TID5: D5, D10
TID6: D1, D6
TID7: D9, D8
TID8: D3, D7
Transaction Log
1.
2. Retrieve
3.
Transactions - distributed

Tough because of sharding & replication

Transaction log – no SPOF and it scales via
sharding & replication

Optimistic locking – high concurrency

Isolation – phantom reads
Transactions API
Benchmarks
(single shard)

Ingestion (structured) – 25'000 ops

Ingestion (text) – 1'800 ops

Query (fts) – 4'700 ops

Transactions (2r + 2w) – 3'500 ops
Cloud

6 months of stacking & racking & wiring

800 CPU Cores/250TB Storage/3TB RAM

Real on-demand resources

Pay per use model
Lots of hardware
How does it work?
Once a database is stored in Clusterpoint Cloud, it
is broken up into many shards and distributed
among many servers.
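
The slides do not say how documents are assigned to shards. One common approach, shown here purely as an assumption, is to hash the document id; `shard_for` and `distribute` are illustrative names.

```python
import hashlib


def shard_for(doc_id, shard_count):
    # A stable hash, so the same id maps to the same shard on every run.
    digest = hashlib.sha1(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def distribute(doc_ids, shard_count):
    # Group document ids by the shard they hash to.
    shards = [[] for _ in range(shard_count)]
    for doc_id in doc_ids:
        shards[shard_for(doc_id, shard_count)].append(doc_id)
    return shards


shards = distribute([f"doc-{i}" for i in range(100)], 4)
print([len(shard) for shard in shards])  # documents per shard
```

Each shard would then be placed on a server and replicated; a lookup by id only has to contact the shard that `shard_for` returns.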
Try it

Sign up for Cloud
http://cloud.clusterpoint.com

Attendees: 3 months of free access, up to
100 GB of storage

Be part of community

Have fun!
twitter.com/clusterpoint


"Clusterpoint Inside-Out" by Jurģis Orups at NoSQL focused XXVIII DevClub.lv event
