Apache Cassandra
Ko-Chih Wu
Agenda
- Cassandra basics
- Consistency
- Data modeling
- Real-world use cases
- Demo
What is Cassandra
- A distributed database for managing large amounts of structured data across many commodity servers, while providing a highly available service with no single point of failure
- Used by Apple, Comcast, Instagram, Spotify, eBay, Rackspace, Netflix
Features
- Masterless architecture
- Fault tolerant
- Replication across nodes or data centers
- Linear scalability
- Tunable consistency
Tunable Consistency
- Replication factor (RF)
  - Number of copies of each partition stored across the cluster
- Consistency level
  - Number of replicas that must respond before a result is returned to the client
  - Can be set independently for reads and writes
  - ONE, QUORUM (floor(RF / 2) + 1), ALL
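A minimal Python sketch of the arithmetic behind these levels, assuming only the three levels listed above (Cassandra also defines others, such as TWO and LOCAL_QUORUM). The function name replicas_required is hypothetical and is not part of any Cassandra driver.

```python
def replicas_required(level: str, rf: int) -> int:
    """Replica acknowledgements needed for a consistency level (hypothetical helper).

    Covers only ONE, QUORUM and ALL; QUORUM uses integer division,
    i.e. floor(RF / 2) + 1.
    """
    if rf < 1:
        raise ValueError("replication factor must be at least 1")
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1
    if level == "ALL":
        return rf
    raise ValueError(f"unsupported consistency level: {level}")


if __name__ == "__main__":
    for rf in (1, 2, 3, 5):
        counts = {lvl: replicas_required(lvl, rf) for lvl in ("ONE", "QUORUM", "ALL")}
        print(f"RF={rf}: {counts}")
```

With RF = 2, QUORUM already needs both replicas, so it tolerates no replica outage; with RF = 3 it tolerates one.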
Scenario 1
RF = 3, Read = ONE, Write = ONE
[Diagrams: INSERT, then Read]
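As a rough illustration of what ONE/ONE allows, here is a toy Python model. It assumes each replica stores a single (timestamp, value) pair and that, at read time, the write has reached only the replica that acknowledged it; apply_write and read_from are hypothetical names. It shows one possible outcome, not Cassandra's actual replication path.

```python
# Hypothetical model: each replica holds a (write_timestamp, value) pair.
RF = 3

def apply_write(replicas, acked_by, ts, value):
    """Return new replica states after a write acknowledged by `acked_by`.

    Simplification: replicas not in `acked_by` have not received the
    mutation yet (it is still in flight or was dropped).
    """
    return [(ts, value) if i in acked_by else state
            for i, state in enumerate(replicas)]

def read_from(replicas, responders):
    """Return the newest value among the responding replicas."""
    return max(replicas[i] for i in responders)[1]

replicas = [(1, "old")] * RF
# Write = ONE: the client is acknowledged once replica 0 has the write.
replicas = apply_write(replicas, {0}, ts=2, value="new")
# Read = ONE: the coordinator happens to ask replica 2, which is behind.
print(read_from(replicas, {2}))  # old
print(read_from(replicas, {0}))  # new
```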
Scenario 2
RF = 3, Read = QUORUM, Write = QUORUM
[Diagrams: INSERT, then Read]
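The same toy model (redefined so the snippet runs on its own) applied to QUORUM/QUORUM: the write reaches two of three replicas, and the coordinator reconciles the two read responses by keeping the one with the newest write timestamp, which is how Cassandra resolves conflicting replica responses (last write wins). Replica numbering and helper names are hypothetical.

```python
from itertools import combinations

RF = 3
QUORUM = RF // 2 + 1  # 2 when RF = 3

def apply_write(replicas, acked_by, ts, value):
    """Apply a write only to the replicas that acknowledged it (simplified)."""
    return [(ts, value) if i in acked_by else state
            for i, state in enumerate(replicas)]

def reconcile(replicas, responders):
    """Keep the response with the highest write timestamp (last write wins)."""
    return max(replicas[i] for i in responders)[1]

replicas = [(1, "old")] * RF
# Write = QUORUM: acknowledged once replicas 0 and 1 have the new value.
replicas = apply_write(replicas, {0, 1}, ts=2, value="new")

# Read = QUORUM: try every pair of replicas the coordinator could ask.
results = {pair: reconcile(replicas, pair) for pair in combinations(range(RF), QUORUM)}
print(results)  # {(0, 1): 'new', (0, 2): 'new', (1, 2): 'new'}
```

Even the pair (1, 2), which includes the stale replica 2, returns the new value because replica 1 carries the newer timestamp.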
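Strong consistency or eventual consistency?
- Strong consistency requires the read and write replica sets to overlap:
  (nodes_written + nodes_read) > replication_factor
- Examples
  - RF = 3, Read = QUORUM, Write = QUORUM -> 2 + 2 > 3
  - RF = 3, Read = ONE, Write = ALL -> optimized for reads (a write fails if any replica is down)
  - RF = 3, Read = ALL, Write = ONE -> optimized for writes (a read fails if any replica is down)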
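The condition above is an overlap argument: if W + R > RF, any set of W replicas and any set of R replicas must share at least one replica. This brute-force Python check is a sketch that ignores failures, hinted handoff and read repair; it confirms the rule for small replication factors and evaluates the three examples plus ONE/ONE. always_overlaps is a hypothetical helper.

```python
from itertools import combinations

def always_overlaps(rf: int, w: int, r: int) -> bool:
    """True if every w-replica write set shares a replica with every r-replica read set."""
    nodes = range(rf)
    return all(set(ws) & set(rs)
               for ws in combinations(nodes, w)
               for rs in combinations(nodes, r))

# Exhaustively confirm the rule (W + R > RF) for RF = 1..5.
for rf in range(1, 6):
    for w in range(1, rf + 1):
        for r in range(1, rf + 1):
            assert always_overlaps(rf, w, r) == (w + r > rf)

RF = 3
configs = {  # name: (replicas written, replicas read)
    "Read = QUORUM, Write = QUORUM": (2, 2),
    "Read = ONE, Write = ALL": (3, 1),
    "Read = ALL, Write = ONE": (1, 3),
    "Read = ONE, Write = ONE": (1, 1),
}
for name, (w, r) in configs.items():
    print(f"{name}: overlap guaranteed = {always_overlaps(RF, w, r)}")
```

Only the ONE/ONE configuration prints False, matching the stale read possible in Scenario 1.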
Scenario 3
RF = 3, Read = QUORUM, Write = QUORUM
[Diagrams: INSERT, then Read]
Data modeling
Compared to an RDBMS
- No JOINs -> prefer denormalization
- Model your data around the queries
- Limited transaction support (lightweight transactions offer conditional writes, not full ACID across tables)
Key concepts
- Keyspace
  - Similar to a schema in an RDBMS
- Table
  - Primary key = partition key + clustering columns
- Partition key
  - Hashed to a token that determines which nodes store the data
- Clustering column
  - Defines the sort order of rows within a partition
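To make "the partition key decides the node" concrete, here is a toy token ring in Python. It assumes four hypothetical nodes with hand-picked tokens and SimpleStrategy-style placement: the first node whose token is at or after the key's token owns the key, and the next RF - 1 nodes clockwise hold the other replicas. MD5 stands in for the hash; Cassandra's default Murmur3Partitioner uses Murmur3 with signed 64-bit tokens, so real token values differ.

```python
import hashlib
from bisect import bisect_left

# Hypothetical 4-node ring: (token, node name), sorted by token.
RING = [(0, "node-a"), (2**62, "node-b"), (2**63, "node-c"), (3 * 2**62, "node-d")]

def token_for(partition_key: str) -> int:
    """Hash a partition key to an unsigned 64-bit token (MD5 as a stand-in)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def replicas_for(partition_key: str, rf: int) -> list[str]:
    """Owner is the first node at or after the key's token; walk clockwise for the rest."""
    tokens = [t for t, _ in RING]
    start = bisect_left(tokens, token_for(partition_key)) % len(RING)  # wrap past the last token
    return [RING[(start + i) % len(RING)][1] for i in range(rf)]

for key in ("Alice", "Bob"):
    print(key, replicas_for(key, rf=3))
```

Every row with the same partition key hashes to the same token and therefore lands on the same replicas; clustering columns only order rows inside that partition.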
Phonebook
- A person has a name and a phone number
- Look up by name
- Look up by phone number
Create table
CREATE TABLE person (
    person_id uuid,
    name text,
    phone text,
    PRIMARY KEY (person_id)  -- partition key only, no clustering column
);
Create table
CREATE TABLE person_by_phone (
    name text,
    phone text,
    PRIMARY KEY (phone, name)  -- partition key: phone, clustering column: name
);
Create table
CREATE TABLE person_by_name (
    name text,
    phone text,
    PRIMARY KEY (name, phone)  -- partition key: name, clustering column: phone
);
Insert data
INSERT INTO person (person_id, name, phone) VALUES (uuid(), 'Alice', '1000');
INSERT INTO person_by_name (name, phone) VALUES ('Alice', '1000');
INSERT INTO person_by_phone (name, phone) VALUES ('Alice', '1000');
INSERT INTO person (person_id, name, phone) VALUES (uuid(), 'Alice', '2000');
INSERT INTO person_by_name (name, phone) VALUES ('Alice', '2000');
INSERT INTO person_by_phone (name, phone) VALUES ('Alice', '2000');
Query by name
select * from person_by_name where name='Alice';
name | phone
-------+-------
Alice | 1000
Alice | 2000
Query by phone
select * from person_by_phone where phone='1000';
phone | name
-------+-------
1000 | Alice
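The denormalized layout can be mimicked in memory to see why each query is a single-partition lookup. This Python sketch uses a hypothetical QueryTable class (not a Cassandra or driver API) that groups rows by partition key and returns them sorted by clustering column, assuming tables whose only columns are the primary key, as in person_by_name and person_by_phone.

```python
from collections import defaultdict

class QueryTable:
    """Toy model of a table with PRIMARY KEY (partition_key, clustering_column)."""

    def __init__(self):
        # partition key -> set of clustering values; a set gives upsert semantics,
        # since inserting an existing primary key again does not create a duplicate.
        self.partitions = defaultdict(set)

    def insert(self, partition_key, clustering_key):
        self.partitions[partition_key].add(clustering_key)

    def select(self, partition_key):
        # Single-partition read, returned in clustering order.
        rows = self.partitions.get(partition_key, set())
        return [(partition_key, c) for c in sorted(rows)]

person_by_name = QueryTable()   # PRIMARY KEY (name, phone)
person_by_phone = QueryTable()  # PRIMARY KEY (phone, name)

for name, phone in [("Alice", "1000"), ("Alice", "2000")]:
    # Denormalization: each write goes to every query table.
    person_by_name.insert(name, phone)
    person_by_phone.insert(phone, name)

print(person_by_name.select("Alice"))  # [('Alice', '1000'), ('Alice', '2000')]
print(person_by_phone.select("1000"))  # [('1000', 'Alice')]
```

Because phone is a text column, clustering order is lexicographic: "10000" would sort before "2000".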
Demo
Real-world use cases
- Netflix recommendations
  - Real-time data pipeline with Spark + Cassandra + Kafka
  - https://www.youtube.com/watch?v=SxU0CJJ2nVE
  - http://www.slideshare.net/DataStax/netflix-recommendations-using-spark-cassandra
- Multi-datacenter deployment at Uber
  - With Apache Mesos
  - https://www.youtube.com/watch?v=4Ap-1VT2ChU
  - http://www.slideshare.net/DataStax/cassandra-on-mesos-across-multiple-datacenters-at-uber-abhishek-verma-c-summit-2016
Resources
http://docs.datastax.com/en/landing_page/doc/landing_page/current.html
http://www.planetcassandra.org/try-cassandra/
http://www.ecyrd.com/cassandracalculator/
Questions?
