This document discusses Espresso, a distributed document store. It provides elasticity, consistency, fault tolerance and other features. Espresso uses a document-based data model with a REST API. Documents are stored as serialized Avro data with fully qualified keys. The architecture uses Apache Helix for cluster management and partitions data across storage nodes. It also includes a router, storage nodes, databus for events, and cross-datacenter replication.
5. Data Model - Database
• A database is a container for tables.
• A database schema contains important metadata about a database.
• The schema also defines database traffic quotas, e.g., read/write QPS and the volume of data read/written.
6. Data Model - Table
• A table is a container of homogeneously typed documents.
• Every table schema defines a key structure, which can have multiple parts.
• The key structure defines how documents are accessed.
• Every fully specified key is the primary key for a single document.
• The leading key in the table schema is also called the partitioning key.
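A minimal sketch of how a multi-part key might be split and mapped to a partition using only its leading key. The hash function (MD5), the partition count, and the helper names `parse_key` and `partition_for` are assumptions for illustration; the slides do not say which partitioning function Espresso uses.

```python
import hashlib

NUM_PARTITIONS = 8  # assumed value for illustration


def parse_key(resource_path):
    """Split '/Database/Table/part1/part2/...' into database, table, and key parts."""
    database, table, *key_parts = resource_path.strip("/").split("/")
    return database, table, key_parts


def partition_for(key_parts, num_partitions=NUM_PARTITIONS):
    """Map a key to a partition using only its leading (partitioning) key part."""
    digest = hashlib.md5(key_parts[0].encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


db, table, key = parse_key("/MailboxDB/Messages/bob/1")
print(db, table, key)
# Documents that share a leading key land in the same partition.
print(partition_for(["bob", "1"]) == partition_for(["bob", "2"]))
```

Because only the leading key part is hashed, all documents under one leading key stay together in one partition.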
9. Document based data model
A fully specified key uniquely identifies a single document. A document
schema is an Avro schema. Internally, Espresso stores documents as
Avro-serialized binary blobs. An "indexType" attribute on a field
indicates that a secondary index has to be built on that field.
Example document
    from : {
        name : "Chris",
        email : "chris@linkedin.com"
    }
    subject : "Go Giants!"
    body : "World Series 2012! w00t!"
    unread : true
Messages
    mailboxID : String
    messageID : long
    from : {
        name : String
        email : String
    }
    subject : String
    body : String
    unread : boolean
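As an illustration, the Messages document could be described by an Avro record schema along the following lines, written here as a Python dictionary. This is a sketch under assumptions: the record names, the choice of which fields carry "indexType", and the values "text" and "attribute" are hypothetical, and the key parts (mailboxID, messageID) are left out on the assumption that they belong to the table's key structure rather than the document body.

```python
import json

# Hypothetical Avro record schema for a Messages document.
# The "indexType" values below are assumed for illustration only.
MESSAGE_SCHEMA = {
    "type": "record",
    "name": "Message",
    "fields": [
        {
            "name": "from",
            "type": {
                "type": "record",
                "name": "Sender",
                "fields": [
                    {"name": "name", "type": "string"},
                    {"name": "email", "type": "string"},
                ],
            },
        },
        {"name": "subject", "type": "string", "indexType": "text"},
        {"name": "body", "type": "string", "indexType": "text"},
        {"name": "unread", "type": "boolean", "indexType": "attribute"},
    ],
}


def indexed_fields(schema):
    """Return the names of fields that request a secondary index."""
    return [f["name"] for f in schema["fields"] if "indexType" in f]


print(json.dumps(MESSAGE_SCHEMA, indent=2))
print(indexed_fields(MESSAGE_SCHEMA))
```

Avro permits extra attributes on field definitions, which is what makes an annotation such as "indexType" possible without changing the serialized data.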
10. REST based API
• Get a single message from Bob's mailbox
GET /MailboxDB/Messages/bob/1
• Multi-GET several messages
GET /MailboxDB/Messages/bob/(1,2,3)
• Query for a page of unread messages
GET /MailboxDB/Messages/bob/?query=+isUnread:true&start=0&count=15
• Write a new message
PUT /MailboxDB/Messages/bob
Content-Type: application/json
Content-Length: 137
{from : , subject : , body : }
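The URL patterns above can be assembled mechanically from the database name, table name, and key parts. The helpers below are a hypothetical client-side sketch (the function names are not part of Espresso); they only build request paths and do not contact a server.

```python
from urllib.parse import urlencode


def document_url(database, table, *key_parts):
    """Path for a single document identified by a fully specified key."""
    return "/" + "/".join([database, table, *map(str, key_parts)])


def multi_get_url(database, table, leading_key, ids):
    """Path for several documents that share the same leading key."""
    id_list = ",".join(str(i) for i in ids)
    return f"/{database}/{table}/{leading_key}/({id_list})"


def query_url(database, table, leading_key, query, start=0, count=15):
    """Path for one page of query results under a single leading key."""
    # Keep '+' and ':' unescaped so the query reads as in the examples above.
    params = urlencode({"query": query, "start": start, "count": count}, safe="+:")
    return f"/{database}/{table}/{leading_key}/?{params}"


print(document_url("MailboxDB", "Messages", "bob", 1))
print(multi_get_url("MailboxDB", "Messages", "bob", [1, 2, 3]))
print(query_url("MailboxDB", "Messages", "bob", "+isUnread:true"))
```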
11. Partial Updates and Conditional Operations
• Mark a message as read (partial update)
POST /MailboxDB/Messages/bob/1
Content-Type: application/json
Content-Length: 21
{unread : false}
• Get a message, only if recently updated
GET /MailboxDB/Messages/bob/1
If-Modified-Since: Wed, 31 Oct 2012 02:54:12 GMT
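The two operations above can be sketched as follows. This is an assumed, simplified model rather than Espresso's implementation: a partial update overwrites only the fields supplied in the request body, and a conditional read returns the document only if it was modified after the date carried in the request header. The function names and the use of status code 304 for the "not modified" case are illustrative choices.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def apply_partial_update(document, changes):
    """Return a copy of the document with only the supplied fields replaced."""
    updated = dict(document)
    updated.update(changes)
    return updated


def conditional_get(document, last_modified, header_date):
    """Return (status, body); the body is None unless the document changed after header_date."""
    threshold = parsedate_to_datetime(header_date)
    if last_modified <= threshold:
        return 304, None
    return 200, document


message = {"subject": "Go Giants!", "unread": True}
print(apply_partial_update(message, {"unread": False}))

header = "Wed, 31 Oct 2012 02:54:12 GMT"
modified_later = datetime(2012, 11, 1, tzinfo=timezone.utc)
print(conditional_get(message, modified_later, header))
```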
13. Generic Cluster Manager : Apache Helix
• Automatic assignment of resources and partitions to nodes
• Node failure detection and recovery
• Dynamic addition of resources
• Dynamic addition of nodes to the cluster
• Pluggable distributed state machine to manage the state of a resource via state transitions
• Automatic load balancing and throttling of transitions
• Optional pluggable rebalancing for user-defined assignment of resources and partitions
• More info: http://helix.apache.org
14. Espresso state model
• Every partition must have exactly one master and can have up to 'n' slaves, where 'n' is configurable.
• Partitions are distributed evenly across all storage nodes.
• No two replicas of the same partition may be present on the same node.
• Upon master failover, one of the slaves must be promoted to master.
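The constraints above can be illustrated with a toy placement routine. This is a simplified stand-in, not Helix's rebalancing algorithm: it assigns replicas round-robin so that each partition has one master, its replicas sit on distinct nodes, and masters are spread evenly, then promotes a surviving slave when a master's node fails. The function names and data layout are hypothetical.

```python
def assign_replicas(num_partitions, nodes, num_slaves):
    """Place one master and num_slaves slaves per partition on distinct nodes."""
    if num_slaves + 1 > len(nodes):
        raise ValueError("need at least num_slaves + 1 nodes")
    assignment = {}
    for p in range(num_partitions):
        # Consecutive nodes (modulo the node count) are always distinct.
        replicas = [nodes[(p + i) % len(nodes)] for i in range(num_slaves + 1)]
        assignment[p] = {"master": replicas[0], "slaves": replicas[1:]}
    return assignment


def fail_node(assignment, failed):
    """Remove a failed node and promote a surviving slave wherever it was master."""
    for state in assignment.values():
        state["slaves"] = [n for n in state["slaves"] if n != failed]
        if state["master"] == failed:
            # A partition with no surviving slave is left without a master (None).
            state["master"] = state["slaves"].pop(0) if state["slaves"] else None
    return assignment


cluster = assign_replicas(6, ["n1", "n2", "n3"], num_slaves=1)
fail_node(cluster, "n1")
for partition, state in cluster.items():
    print(partition, state)
```

A real cluster manager would also create replacement slaves on the remaining nodes to restore the replication factor; that step is omitted here.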
16. Storage Node
• Query Processing
• Storage Engine
• Secondary Indexes
• Handling State Transitions
• Local Transactional Support
• Replication Commit Log
• Utility functions
• Scheduled Backups
17. Databus
Espresso uses Databus for several purposes:
• Delivering events to downstream consumers, e.g., search indexes and caches.
• Espresso multi-datacenter replication: each locally originated write is forwarded to remote data centers. This is discussed in more detail in the data replicator section.
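To make the event-delivery idea concrete, here is a toy in-memory model of an ordered, per-partition change stream with checkpointing consumers. It is not the Databus API; the classes `ChangeStream` and `Consumer`, the sequence numbering, and the event payloads are all assumptions made for illustration.

```python
from collections import defaultdict


class ChangeStream:
    """Hypothetical in-memory stand-in for an ordered, per-partition change stream."""

    def __init__(self):
        self._events = defaultdict(list)  # partition -> [(sequence, payload), ...]

    def append(self, partition, payload):
        """Record a committed write and return its sequence number."""
        sequence = len(self._events[partition]) + 1
        self._events[partition].append((sequence, payload))
        return sequence

    def read_since(self, partition, checkpoint):
        """Return events whose sequence number is greater than the checkpoint."""
        return [e for e in self._events[partition] if e[0] > checkpoint]


class Consumer:
    """Applies events in commit order and remembers the last sequence seen per partition."""

    def __init__(self, name):
        self.name = name
        self.checkpoints = defaultdict(int)
        self.applied = []

    def poll(self, stream, partition):
        for sequence, payload in stream.read_since(partition, self.checkpoints[partition]):
            self.applied.append(payload)
            self.checkpoints[partition] = sequence


stream = ChangeStream()
stream.append(0, "PUT bob/1")
stream.append(0, "POST bob/1 unread=false")

search_index = Consumer("search-index")
search_index.poll(stream, 0)
print(search_index.applied, search_index.checkpoints[0])
```

Because each consumer keeps its own checkpoint, a search index, a cache, and a cross-datacenter replicator could each read the same stream independently and resume where they left off.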
18. Cross Colo Replication and ETL
• Data Replicator
  - Forwards commits between geo-replicated Espresso clusters.
  - Is a Databus consumer that consumes events for each database partition within a cluster.
  - Contains a clustered set of stateless instances managed by Helix.
• Snapshot service