際際滷

際際滷Share a Scribd company logo
Recruiting SolutionsRecruiting SolutionsRecruiting Solutions
Espresso - distributed document store
By Shahnawaz Saifi & Kiran Chand  Site Reliability Engineering - DDS
DevOps BLR Meet-up, 03/28/2015
Agenda
則р Motivation
則р Why Espresso?
則р Data Model and API
則р Architecture
2
Motivation
則р Schema evolution
則р Provision shards
則р Data center failover
則р Cost
Lets Brew..
則р Elasticity
則р Consistency
則р Distributed
則р Fault Tolerant
則р Secondary Indexing
則р Schema Evolution
則р Change Capture Stream
則р Bulk Ingests
Data Model - Database
則р database is a container for tables
則р database schema contains important metadata about a database
則р defines database traffic quotas e.g., read/write QPS, volume of data read/written etc
則р A table is a container of homogeneously typed documents
則р Every table schema defines a key-structure which can have multiple parts
則р The key-structure defines how documents are accessed.
則р Every fully specified key is the primary key for a single document
則р The leading key in the table schema is also called the partitioning key.
Data Model - Table
Data Model and API
Partitioning
Document based data model
A fully specified key uniquely identifies a single document. A document
schema is an Avro schema. Internally Espresso stores documents as
Avro serialized binary data blobs. The "indexType" attribute implies that
a secondary index has to be built on that field.
from : {
name : "Chris",
email : "chris@linkedin.com"
}
subject : "Go Giants!"
body : "World Series 2012! w00t!"
unread : true
Messages
mailboxID : String
messageID : long
from : {
name : String
email : String
}
subject : String
body : String
unread : boolean
REST based API
則р Get a single message from Bobs mailbox
 GET /MailboxDB/Messages/bob/1
則р Multi-GET several messages
 GET /MailboxDB/Messages/bob/(1,2,3)
則р Query for a page of unread messages
 GET /MailboxDB/Messages/bob/?query=+isUnread:true
&start=0&count=15
則р Write a new message
PUT/MailboxDB/Messages/bob
Content-Type: application/json
Content-Length: 137
{from : , subject : , body : }
10
Partial Updates and Conditional Operations
11
則р Mark a message as read (partial update)
POST /MailboxDB/Messages/bob/1
Content-Type: application/json
Content-Length: 21
{unread : false}
則р Get a message, only if recently updated
GET /MailboxDB/Messages/bob/1
If-Match: Wed, 31 Oct 2012 02:54:12 GMT
Architecture
Generic Cluster Manager : Apache Helix
則р Automatic assignment of resources and partitions to nodes
則р Node failure detection and recovery
則р Dynamic addition of resources
則р Dynamic addition of nodes to the cluster
則р Pluggable distributed state machine to manage the state of a
resource via state transitions
則р Automatic load balancing and throttling of transitions
則р Optional pluggable rebalancing for user-defined assignment of
resources and partitions
則р More Info:
 http://helix.apache.org
Espresso state model
則р Every partition must have only 1 master. Every partition can have up
to 'n' configurable slaves.
則р Partitions are distributed evenly across all storage nodes.
則р No replicas of the same partition may be present on the same node.
則р Upon master failover, one of the slaves must be promoted to master.
Router
則р REST API
則р Helix client (Spectator)
則р Constructs storage node requests
Storage Node
則р Query Processing
則р Storage Engine
則р Secondary Indexes
則р Handling State Transitions
則р Local Transactional Support
則р Replication Commit Log
則р Utility functions
則р Scheduled Backups
Databus
Databus is used for several purposes by Espresso:
則р Deliver events to downstream consumers i.e., search indexes,
caches etc..
則р Espresso multi datacenter replication - each locally originated write is
forwarded on to remote data centers. This is discussed in more detail
in the data replicator section.
Cross Colo Replication and ETL
則р Data Replicator
 forwards commits between geo-replicated Espresso clusters.
 a Databus consumer that consumes events for each database
partition within a cluster.
 contains a clustered set of stateless instances managed by Helix.
則р Snapshot service
Espresso_DevOps_BLR_Meetup_28_Mar_2015

More Related Content

Espresso_DevOps_BLR_Meetup_28_Mar_2015

  • 1. Recruiting SolutionsRecruiting SolutionsRecruiting Solutions Espresso - distributed document store By Shahnawaz Saifi & Kiran Chand Site Reliability Engineering - DDS DevOps BLR Meet-up, 03/28/2015
  • 2. Agenda 則р Motivation 則р Why Espresso? 則р Data Model and API 則р Architecture 2
  • 3. Motivation 則р Schema evolution 則р Provision shards 則р Data center failover 則р Cost
  • 4. Lets Brew.. 則р Elasticity 則р Consistency 則р Distributed 則р Fault Tolerant 則р Secondary Indexing 則р Schema Evolution 則р Change Capture Stream 則р Bulk Ingests
  • 5. Data Model - Database 則р database is a container for tables 則р database schema contains important metadata about a database 則р defines database traffic quotas e.g., read/write QPS, volume of data read/written etc
  • 6. 則р A table is a container of homogeneously typed documents 則р Every table schema defines a key-structure which can have multiple parts 則р The key-structure defines how documents are accessed. 則р Every fully specified key is the primary key for a single document 則р The leading key in the table schema is also called the partitioning key. Data Model - Table
  • 9. Document based data model A fully specified key uniquely identifies a single document. A document schema is an Avro schema. Internally Espresso stores documents as Avro serialized binary data blobs. The "indexType" attribute implies that a secondary index has to be built on that field. from : { name : "Chris", email : "chris@linkedin.com" } subject : "Go Giants!" body : "World Series 2012! w00t!" unread : true Messages mailboxID : String messageID : long from : { name : String email : String } subject : String body : String unread : boolean
  • 10. REST based API 則р Get a single message from Bobs mailbox GET /MailboxDB/Messages/bob/1 則р Multi-GET several messages GET /MailboxDB/Messages/bob/(1,2,3) 則р Query for a page of unread messages GET /MailboxDB/Messages/bob/?query=+isUnread:true &start=0&count=15 則р Write a new message PUT/MailboxDB/Messages/bob Content-Type: application/json Content-Length: 137 {from : , subject : , body : } 10
  • 11. Partial Updates and Conditional Operations 11 則р Mark a message as read (partial update) POST /MailboxDB/Messages/bob/1 Content-Type: application/json Content-Length: 21 {unread : false} 則р Get a message, only if recently updated GET /MailboxDB/Messages/bob/1 If-Match: Wed, 31 Oct 2012 02:54:12 GMT
  • 13. Generic Cluster Manager : Apache Helix 則р Automatic assignment of resources and partitions to nodes 則р Node failure detection and recovery 則р Dynamic addition of resources 則р Dynamic addition of nodes to the cluster 則р Pluggable distributed state machine to manage the state of a resource via state transitions 則р Automatic load balancing and throttling of transitions 則р Optional pluggable rebalancing for user-defined assignment of resources and partitions 則р More Info: http://helix.apache.org
  • 14. Espresso state model 則р Every partition must have only 1 master. Every partition can have up to 'n' configurable slaves. 則р Partitions are distributed evenly across all storage nodes. 則р No replicas of the same partition may be present on the same node. 則р Upon master failover, one of the slaves must be promoted to master.
  • 15. Router 則р REST API 則р Helix client (Spectator) 則р Constructs storage node requests
  • 16. Storage Node 則р Query Processing 則р Storage Engine 則р Secondary Indexes 則р Handling State Transitions 則р Local Transactional Support 則р Replication Commit Log 則р Utility functions 則р Scheduled Backups
  • 17. Databus Databus is used for several purposes by Espresso: 則р Deliver events to downstream consumers i.e., search indexes, caches etc.. 則р Espresso multi datacenter replication - each locally originated write is forwarded on to remote data centers. This is discussed in more detail in the data replicator section.
  • 18. Cross Colo Replication and ETL 則р Data Replicator forwards commits between geo-replicated Espresso clusters. a Databus consumer that consumes events for each database partition within a cluster. contains a clustered set of stateless instances managed by Helix. 則р Snapshot service