
@WrathOfChris github.com/WrathOfChris blog.wrathofchris.com
Time Series Metrics
with Cassandra

About Me
• Chris Maxwell
• @WrathOfChris
• Sr Systems Engineer @ Ubiquiti Networks
• Cloud Guy
• DevOps

Mission
• Metrics service for internal services
• Deliver 90 60 30 days of system and app metrics
• Gain experience with Cassandra

History
Ancient Designs
Aging Tools
Pitfalls
https://flic.kr/p/6pqVnP

Graphite (v1)
• Single instance
• carbon-relay + (2-4) carbon-cache processes (=cpu)

Graphite (v1)
Problems:
• Single point of SUCCESS!
• Can grow to 16-32 cores, but hits I/O saturation
• Carbon write-amplifies 10x (flushes every 10s)

Graphite (v2)
• Frontend: carbon-relay
• Backend: carbon-relay + 4x carbon-cache
• m3.2xlarge ephemeral SSD
• Manual consistent-hash by IP
• Replication 3
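To make "consistent-hash with replication 3" concrete, here is a minimal sketch of a hash ring that returns three distinct owners for a metric path. It is a generic illustration, not carbon-relay's actual hashing code; the `HashRing` class, the `vnodes` count, the node IPs, and the metric name are all assumptions made up for the example.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a ring position in the range 0 .. 2**32 - 1."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest()[:8], 16)


class HashRing:
    """Hypothetical ring: each node owns several positions (vnodes)."""

    def __init__(self, nodes, vnodes=100, replication=3):
        unique = sorted(set(nodes))
        self.replication = min(replication, len(unique))
        # Several positions per node even out the key distribution.
        self.ring = sorted(
            (_hash(f"{node}:{i}"), node) for node in unique for i in range(vnodes)
        )
        self.positions = [pos for pos, _ in self.ring]

    def nodes_for(self, metric: str):
        """Walk clockwise from the metric's position, collecting distinct nodes."""
        start = bisect.bisect(self.positions, _hash(metric))
        owners = []
        for offset in range(len(self.ring)):
            node = self.ring[(start + offset) % len(self.ring)][1]
            if node not in owners:
                owners.append(node)
                if len(owners) == self.replication:
                    break
        return owners


# Made-up backend addresses and metric path.
ring = HashRing(["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4", "10.0.0.5"])
owners = ring.nodes_for("servers.web01.cpu.user")
```

The same metric always maps to the same three backends, which is why changing the node list forces data to move between them.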

Graphite (v2)
Problems:
• Kind of like a Dynamo, but not
• Replacing node requires full partition key shuffle
• Adding 5 nodes took 6 days on 1Gbps to re-replicate ring
• Less than 50% disk free means pain during reshuffle

Limitations
• Cloud Native
• Avoid Manual Intervention
• Ephemeral SSD > EBS
https://flic.kr/p/2hZy6P

Design
What we set out to build
https://flic.kr/p/2spiXb

Graphite (v3)
…it got complicated…

Graphite (v3)
Ingest:
• carbon-c-relay
  https://github.com/grobian/carbon-c-relay
• cyanite
  https://github.com/pyr/cyanite
• cassandra

Graphite (v3)
Retrieval:
• graphite-api
  https://github.com/brutasse/graphite-api
• grafana
  https://github.com/grafana/grafana
• cyanite
  https://github.com/pyr/cyanite
• elasticsearch (metric path cache)

Journey
Lessons learned along the way
https://flic.kr/p/hjY15L

Size Tiered Compaction
• Sorted String Table (SSTable) is an immutable data file
• New data written to small SSTables
• Periodically merged into larger SSTables

• Merge 4 similarly sized SSTables into 1 new SSTable
• Data migrates into larger SSTables that are compacted less regularly
• Disk space required: Sum of 4 largest SSTables
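The merge rule above can be sketched as a toy simulation. It assumes every flush produces an SSTable of size 1 and treats "similarly sized" as "exactly equal", which real Cassandra does not; the function names are hypothetical and only illustrate the tiering idea and the free-space rule of thumb from the slide.

```python
from collections import Counter

MIN_THRESHOLD = 4  # merge once 4 similarly sized SSTables exist (from the slide)


def flush(sstables, size=1):
    """Add one flushed SSTable, then compact until no tier has 4 members."""
    sstables = sstables + [size]  # copy, so the caller's list is untouched
    while True:
        counts = Counter(sstables)
        full = [s for s, n in counts.items() if n >= MIN_THRESHOLD]
        if not full:
            return sorted(sstables, reverse=True)
        tier = min(full)
        # Remove 4 SSTables of this size and write one merged replacement.
        for _ in range(MIN_THRESHOLD):
            sstables.remove(tier)
        sstables.append(tier * MIN_THRESHOLD)


def compaction_headroom(sstables):
    """Slide rule of thumb: free space needed is the sum of the 4 largest."""
    return sum(sorted(sstables, reverse=True)[:MIN_THRESHOLD])


tables = []
for _ in range(21):
    tables = flush(tables)
```

After 21 flushes the toy model holds SSTables of size 16, 4 and 1: old data sits in the big file that is rarely touched, while new data keeps landing in small ones.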

• Updating a partition frequently may cause it to be spread between SSTables
• Metrics workload writes to all partitions, every period

all partitions,
every period
• Range queries that spanned 50+ SSTables !!!

• Getting to the older data…
• Ingest 25% more data
• Major Compaction:
  • Requires 50% free space
  • Compacts all SSTables into 1 large SSTable

Aside: DELETE
• DELETE is the INSERT of a TOMBSTONE to the end of a partition
• INSERTs with TTL become tombstones in the future
• Tombstones live for at least gc_grace_seconds
• Data is only deleted during compaction
https://flic.kr/p/35RACf

gc_grace_seconds
Grace is getting something you don't deserve
(time to nodetool repair a node that is down)

gc_grace_seconds
deleted data reappears!

Time To Live
• INSERT with TTL becomes tombstone after expiry
• 10s for 6 hours
• 60s for 3 days
• 300s for 30 days
https://flic.kr/p/6Fxv7M
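As a quick sanity check on the retention tiers above, this sketch counts how many live data points a single metric holds in each tier. The tier values come from the slide; the function name and the assumption of exactly one point per interval are illustrative.

```python
# Retention tiers from the slide: (seconds per point, retention in seconds).
TIERS = [
    (10, 6 * 3600),      # 10s resolution kept for 6 hours
    (60, 3 * 86400),     # 60s resolution kept for 3 days
    (300, 30 * 86400),   # 300s resolution kept for 30 days
]


def points_per_metric(tiers):
    """Number of live points one metric holds per tier, one point per interval."""
    return [retention // resolution for resolution, retention in tiers]


points = points_per_metric(TIERS)
total = sum(points)
```

Under these assumptions a metric carries 2160, 4320 and 8640 points in the three tiers, about 15,000 live cells in total, every one of which later turns into a tombstone.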

TTL
• gc_grace_seconds is 10 days (by default)
• 10s for 6 hours → 10.25 days
• 60s for 3 days → 13 days
• 300s for 30 days → 40 days
https://flic.kr/p/gBLHYf
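The on-disk lifetimes above are simply TTL plus gc_grace_seconds. A minimal sketch of that arithmetic, assuming the default 10-day grace period and ignoring however long compaction takes to get around to the data (the function and dictionary names are made up for the example):

```python
GC_GRACE_DAYS = 10  # default gc_grace_seconds is 864000, i.e. 10 days


def days_on_disk(ttl_days, gc_grace_days=GC_GRACE_DAYS):
    """Earliest age at which compaction may purge an expired cell."""
    return ttl_days + gc_grace_days


# TTL per tier in days, from the slide (6 hours = 0.25 days).
ttl_days = {"10s": 0.25, "60s": 3, "300s": 30}
on_disk = {tier: days_on_disk(ttl) for tier, ttl in ttl_days.items()}
```

The 10s tier is the striking case: data that is only useful for 6 hours occupies disk for at least 41 times that long.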

https://flic.kr/p/4LNiXg
https://flic.kr/p/35RACf
1.4TB
Disks

Levelled Compaction
based on Google's LevelDB implementation

Levelled Compaction
• Data is ingested at Level 0
• Immediately compacted and merged with L1
• Partitions are merged up to Ln
• 90% of partition data guaranteed to be in same level
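To give a feel for how many levels "Ln" means in practice, here is a rough sizing sketch. It assumes fixed-size SSTables of 160 MB (the `sstable_size_in_mb` default in Cassandra 2.x, as far as I know) and a 10x growth factor per level; both constants and the function names are assumptions, not figures from the talk.

```python
SSTABLE_SIZE_MB = 160  # assumed default sstable_size_in_mb
FANOUT = 10            # assumed: each level holds 10x the previous one


def level_capacity_mb(level):
    """Capacity of level n (n >= 1): 10**n SSTables of fixed size."""
    return (FANOUT ** level) * SSTABLE_SIZE_MB


def levels_needed(data_mb):
    """Smallest n such that levels 1..n together can hold data_mb."""
    level, total = 1, level_capacity_mb(1)
    while total < data_mb:
        level += 1
        total += level_capacity_mb(level)
    return level
```

Under these assumptions, `levels_needed(1_000_000)` (roughly 1 TB) comes out at 4, so a freshly written cell may be rewritten once per level on its way up.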

Levelled Compaction
all partitions,
every period
• Immediately rolled up to L1

Levelled Compaction
all partitions,
every period
• 1 batch of writes -> 5 writes

Increasing Write rate
Constant Ingest rate

Increasing Write rate
Constant Ingest rate
https://flic.kr/p/4LNiXg

compaction_throughput_mb_per_sec: 128
…then 0 (unlimited)

Speeding Compactions
Don't Do This
multithreaded: true
cassandra_in_memory_compaction_limit_in_mb: 256M

Date Tiered Compaction

Date Tiered Compaction
• Written by Björn Hegerfors at Spotify
• Experimental!
• Released in 2.0.11 / 2.1.1
• Group data by time
• Compact by time
• Drop expired data by time

Compact SSTables by date window
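The "group by time, drop by time" idea can be sketched with fixed-size windows. This is a deliberate simplification: the real strategy uses windows that grow with age, and this sketch also ignores gc_grace_seconds. The window size, SSTable names, timestamps and function names are all made up for illustration.

```python
from collections import defaultdict

WINDOW_SECONDS = 3600  # hypothetical fixed window; real windows grow with age


def group_by_window(sstables, window=WINDOW_SECONDS):
    """Group (name, max_timestamp) pairs by the time window they fall into."""
    windows = defaultdict(list)
    for name, max_ts in sstables:
        windows[max_ts // window].append(name)
    return dict(windows)


def droppable(sstables, now, ttl):
    """SSTables whose newest cell has expired can be deleted without a merge."""
    return [name for name, max_ts in sstables if max_ts + ttl <= now]


# (name, newest write timestamp in seconds), made-up values.
sstables = [("a", 100), ("b", 3500), ("c", 3700), ("d", 7300)]
groups = group_by_window(sstables)
expired = droppable(sstables, now=10000, ttl=6400)
```

Only SSTables in the same window are candidates for merging with each other, and a file whose newest cell has expired is removed whole instead of being rewritten.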

- but the docs say 8GB maximum heap!
MAX_HEAP_SIZE=16G
HEAP_NEWSIZE=2048M

- Rick Branson, Instagram
http://www.slideshare.net/planetcassandra/cassandra-summit-2014-cassandra-at-instagram-2014
-XX:+CMSScavengeBeforeRemark
-XX:CMSMaxAbortablePrecleanTime=60000
-XX:CMSWaitDuration=30000

All systems normal
Inadvertently tested 30,000 writes/sec during launch

Cloud Native
http://wattsupwiththat.com/2015/03/17/spaceship-lenticular-cloud-maybe-the-coolest-cloud-picture-evah/

Cloud Native
Ec2MultiRegionSnitch

Cloud Native
Ephemeral RAID0
-Djava.io.tmpdir=/mnt/cassandra/tmp

Disable AutoScaling Terminate Process:
aws autoscaling suspend-processes --auto-scaling-group-name <asg-name> --scaling-processes Terminate

Cloud Native
This design works up to 50 instances per region

Security Groups
IAM instance-profile role
Security Group + (per region) Security Group

Management (OpsCenter)
IAM instance-profile role
Security Group + (per region) Security Group

Internode Encryption
server_encryption_options:
    internode_encryption: all
• keytool -genkeypair -alias test-cass -keyalg RSA -validity 3650 -keystore test-cass.keystore
• keytool -export -alias test-cass -keystore test-cass.keystore -rfc -file test-cass.crt
• keytool -import -alias test-cass -file test-cass.crt -keystore test-cass.truststore

Seeds
Cheated…

Seeds
• selects first 3 nodes from each region using Autoscale Group order
• ignores (self) as a seed for bootstrapping first 3 nodes in each region
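The seed rule above might look something like the following. It is a sketch only: it assumes the per-region node lists have already been fetched in Autoscale Group order, and the `select_seeds` function, region names and addresses are hypothetical, not the actual tooling from the talk.

```python
SEEDS_PER_REGION = 3  # from the slide


def select_seeds(nodes_by_region, self_ip, per_region=SEEDS_PER_REGION):
    """Pick the first N nodes of each region, skipping this node's own address."""
    seeds = []
    for region in sorted(nodes_by_region):
        candidates = nodes_by_region[region][:per_region]
        seeds.extend(ip for ip in candidates if ip != self_ip)
    return seeds


# Made-up cluster layout, lists assumed to be in Autoscale Group order.
cluster = {
    "us-east-1": ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"],
    "us-west-2": ["10.1.0.1", "10.1.0.2"],
}
seeds = select_seeds(cluster, self_ip="10.0.0.2")
```

Leaving the node's own address out of its seed list means that one of the first 3 nodes in a region still goes through the normal bootstrap path instead of treating itself as a seed.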

General
• >= 4 Cores per node always
• >= 8 Cores as soon as feasible
• EC2 sweet spots:
  • m3.2xlarge (8c/160GB) for small workloads
  • i2.2xlarge (8c/1.6TB) for production
• Avoid c3.2xlarge - CPU:Mem ratio is too high

Breaking News!
Dense-storage Instances for EC2

Questions?

d2 instances
Joining a node - system/network

d2 instances
Joining a node - disk performance

General
Metrics

General
Cassandra Metrics

Metrics
CPU - DateTiered

Metrics
JVM - DateTiered

Metrics
Compaction/CommitLog - DateTiered


Cassandra meetup 20150331
