Agenda
● Cassandra basics
○ Consistency
○ Data modeling
● Real-world use cases
● Demo

What is Cassandra
● A distributed database for managing large amounts of structured data across many commodity servers, while providing highly available service and no single point of failure
● Used by Apple, Comcast, Instagram, Spotify, eBay, Rackspace, Netflix

Features
● Masterless architecture
● Fault tolerant
○ Replication across nodes or data centers
● Linear scalability
● Tunable consistency

Tunable Consistency
● Replication factor
○ Number of copies of each piece of data stored across the cluster
● Consistency level
○ Number of replicas that must respond before the result is returned to the client
○ Can be different for read and write
○ ONE, QUORUM (floor(RF/2) + 1), ALL
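
The consistency levels above can be turned into replica counts with a little arithmetic. The following is a minimal sketch, not Cassandra code; the function name `replicas_required` is made up for illustration, and only the three levels listed on this slide are covered.

```python
def replicas_required(level: str, rf: int) -> int:
    """Replicas that must respond for a consistency level (illustrative helper)."""
    if rf < 1:
        raise ValueError("replication factor must be at least 1")
    required = {
        "ONE": 1,
        "QUORUM": rf // 2 + 1,  # integer division: a majority of the replicas
        "ALL": rf,
    }
    return required[level]


for level in ("ONE", "QUORUM", "ALL"):
    print(level, replicas_required(level, 3))
```

With RF = 3, QUORUM needs 2 replicas, so one replica can be slow or unavailable and the request can still succeed.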

Scenario 1
RF = 3
Read = ONE
Write = ONE
INSERT

Scenario 1
RF = 3
Read = ONE
Write = ONE
Read

Scenario 2
RF = 3
Read = QUORUM
Write = QUORUM
INSERT

Scenario 2
RF = 3
Read = QUORUM
Write = QUORUM
Read

Strong consistency or Eventual consistency?
● If strong consistency is desired
(nodes_written + nodes_read) > replication_factor
● Examples
○ RF=3, Read=QUORUM, Write=QUORUM
○ RF=3, Read=ONE, Write=ALL -> Optimized for reads
○ RF=3, Read=ALL, Write=ONE -> Optimized for writes
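
The rule above is easy to check mechanically. This is a small sketch under the same assumptions as the slide; `is_strongly_consistent` and `quorum` are illustrative names, not part of any Cassandra API.

```python
def quorum(rf: int) -> int:
    """Majority of the replicas: floor(RF/2) + 1."""
    return rf // 2 + 1


def is_strongly_consistent(nodes_written: int, nodes_read: int, replication_factor: int) -> bool:
    """True when every read set must overlap every write set in at least one replica."""
    return nodes_written + nodes_read > replication_factor


rf = 3
print(is_strongly_consistent(quorum(rf), quorum(rf), rf))  # Write=QUORUM, Read=QUORUM
print(is_strongly_consistent(rf, 1, rf))                   # Write=ALL, Read=ONE
print(is_strongly_consistent(1, rf, rf))                   # Write=ONE, Read=ALL
print(is_strongly_consistent(1, 1, rf))                    # Write=ONE, Read=ONE
```

The first three combinations satisfy the rule. Write=ONE with Read=ONE does not (1 + 1 is not greater than 3), so a read may land on a replica that has not yet received the write; that combination is eventually consistent.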

Scenario 3
RF = 3
Read = QUORUM
Write = QUORUM
INSERT

Scenario 3
RF = 3
Read = QUORUM
Write = QUORUM
Read

Compare to RDBMS
● No JOINs -> Prefer denormalization
● Model your data around the queries
● Limited transaction support

Key concepts
● Keyspace
○ Similar to a schema in RDBMS
● Table
● Primary key = Partition key + optional clustering columns
● Partition key
○ Defines the node on which the data is stored
● Clustering column
○ Defines the order of rows stored within a partition
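
To make the two key roles concrete, here is a toy in-memory model. It is a deliberate simplification: the node names are hypothetical, and the hash-modulo placement stands in for Cassandra's real token ring (whose default partitioner is Murmur3-based), so treat it only as an illustration of the idea.

```python
import hashlib
from bisect import insort

NODES = ["node-a", "node-b", "node-c"]  # hypothetical three-node cluster


def node_for(partition_key: str) -> str:
    """Pick a node from a hash of the partition key (simplified placement)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    token = int.from_bytes(digest[:8], "big")
    return NODES[token % len(NODES)]


class ToyTable:
    """Rows grouped by partition key and kept sorted by clustering value."""

    def __init__(self):
        self.partitions = {}

    def insert(self, partition_key: str, clustering_value: str) -> None:
        rows = self.partitions.setdefault(partition_key, [])
        if clustering_value not in rows:    # same primary key: nothing new to add
            insort(rows, clustering_value)  # keep the partition in clustering order

    def select(self, partition_key: str) -> list:
        return list(self.partitions.get(partition_key, []))


table = ToyTable()
table.insert("Alice", "2000")
table.insert("Alice", "1000")
print(node_for("Alice"), table.select("Alice"))
```

All rows with the same partition key end up on the same node, and within that partition they come back in clustering order regardless of insertion order.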

Phonebook
● A person has a name and a phone number
● Look up by name
● Look up by phone number

Create table
CREATE TABLE person (
  person_id uuid,
  name text,
  phone text,
  PRIMARY KEY (person_id)
);

Create table
CREATE TABLE person_by_phone (
  name text,
  phone text,
  PRIMARY KEY (phone, name)
);

Create table
CREATE TABLE person_by_name (
  name text,
  phone text,
  PRIMARY KEY (name, phone)
);

Insert data
INSERT INTO person (person_id, name, phone) VALUES (uuid(), 'Alice', '1000');
INSERT INTO person_by_name (name, phone) VALUES ('Alice', '1000');
INSERT INTO person_by_phone (name, phone) VALUES ('Alice', '1000');
INSERT INTO person (person_id, name, phone) VALUES (uuid(), 'Alice', '2000');
INSERT INTO person_by_name (name, phone) VALUES ('Alice', '2000');
INSERT INTO person_by_phone (name, phone) VALUES ('Alice', '2000');
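
Because the data is denormalized, every new contact has to be written to each of the three tables. An application would typically wrap that in one helper. The sketch below only builds the statements and their parameters; `phonebook_inserts` is a hypothetical helper name, and the `%s` placeholders assume the style used for non-prepared statements in the DataStax Python driver.

```python
import uuid


def phonebook_inserts(name: str, phone: str) -> list:
    """Return one (cql, params) pair per phonebook table for a single contact."""
    return [
        ("INSERT INTO person (person_id, name, phone) VALUES (%s, %s, %s)",
         (uuid.uuid4(), name, phone)),
        ("INSERT INTO person_by_name (name, phone) VALUES (%s, %s)",
         (name, phone)),
        ("INSERT INTO person_by_phone (name, phone) VALUES (%s, %s)",
         (name, phone)),
    ]


for cql, params in phonebook_inserts("Alice", "1000"):
    print(cql, params)
```

Keeping the three writes in one place makes it harder to forget a table when the set of query tables grows.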

Query by name
SELECT * FROM person_by_name WHERE name='Alice';
name | phone
-------+-------
Alice | 1000
Alice | 2000

Query by phone
SELECT * FROM person_by_phone WHERE phone='1000';
phone | name
-------+-------
1000 | Alice

Real-world use cases
● Netflix recommendations
○ Real-time data pipeline with Spark + Cassandra + Kafka
○ https://www.youtube.com/watch?v=SxU0CJJ2nVE
○ http://www.slideshare.net/DataStax/netflix-recommendations-using-spark-cassandra
● Multi-datacenter deployment at Uber
○ With Apache Mesos
○ https://www.youtube.com/watch?v=4Ap-1VT2ChU
○ http://www.slideshare.net/DataStax/cassandra-on-mesos-across-multiple-datacenters-at-uber-abhishek-verma-c-summit-2016

Resources
http://docs.datastax.com/en/landing_page/doc/landing_page/current.html
http://www.planetcassandra.org/try-cassandra/
http://www.ecyrd.com/cassandracalculator/
