Apache Cassandra operations have a reputation for being simple on single-datacenter deployments and/or low-volume clusters, but they become far more complex on high-latency multi-datacenter clusters with high volume and/or high throughput: basic Apache Cassandra operations such as repairs, compactions, or hints delivery can have dramatic consequences even on a healthy high-latency multi-datacenter cluster.
In this presentation, Julien will first go through Apache Cassandra multi-datacenter concepts, then show multi-datacenter operations essentials in detail: bootstrapping new nodes and/or datacenters, repair strategy, Java GC tuning, OS tuning, Apache Cassandra configuration, and monitoring.
Based on his three years of experience managing a multi-datacenter cluster running Apache Cassandra 2.0, 2.1, 2.2, and 3.0, Julien will give you tips on how to anticipate and prevent or mitigate issues related to basic Apache Cassandra operations on a multi-datacenter cluster.
About the Speaker
Julien Anguenot, VP Software Engineering, iland Internet Solutions, Corp
Julien currently serves as iland's Vice President of Software Engineering. Prior to joining iland, Mr. Anguenot held tech leadership positions at several open source content management vendors and tech startups in Europe and in the U.S. Julien is a long-time open source software advocate, contributor, and speaker: a Zope, ZODB, and Nuxeo contributor and a member of the Zope and OpenStack foundations, he has given talks at ApacheCon, Cassandra Summit, OpenStack Summit, The WWW Conference, and EuroPython.
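Presentation material from "Cassandra Study Meetup in Tokyo, July 25th".
"Repair Architecture in Apache Cassandra 4.0"
In the new version 4.0, expected to be released during 2019, the repair architecture has been substantially refined. Working from the 4.0 internal implementation, this talk explains how the repair architecture, which has a major impact on operating Apache Cassandra, has been changed and refined in 4.0.
冨田 和孝 (Kazutaka Tomita)
President and CEO of INTHEFOREST Co., Ltd.; member of the Japan Cassandra community; database engineer and system architect
Has been responsible for building, operating, and maintaining database-centric systems for Gurunavi, foreign exchange services, ISPs, and others. His strength is building platforms for high-load, high-capacity, large-scale data processing and data analysis.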
The document discusses changes to Apache Cassandra's networking layer in version 4.0. Specifically, the layer has moved from blocking I/O to non-blocking I/O using NIO. This reduces context-switching overhead and improves performance as the number of nodes increases. Non-blocking I/O now also applies to communication between nodes, whereas previously it was used only for client-to-node communication. System administrators should review these changes and perform any necessary tuning when deploying Cassandra in public clouds.
Compaction in Apache Cassandra is the process of merging SSTables to reclaim disk space used by deleted or overwritten data. It occurs automatically in the background after memtables are flushed to disk or manually via nodetool. There are minor, major, and single-SSTable compactions. The compaction strategy, such as size-tiered, leveled, or date-tiered, determines how SSTables are merged.
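The merge step described above can be sketched in a few lines of Scala. This is a simplified, hypothetical model rather than Cassandra's implementation: `CompactionSketch`, `Cell`, `SSTable`, and `compact` are illustrative names, an SSTable is reduced to an in-memory map, and a deletion marker (tombstone) is modeled as a cell with no value. Real compaction also waits for a grace period (`gc_grace_seconds`) before purging tombstones, which the sketch ignores.

```scala
// Hypothetical model: names and types are illustrative, not Cassandra internals.
object CompactionSketch {
  // A cell is a value written at a timestamp; value = None models a tombstone.
  final case class Cell(value: Option[String], timestamp: Long)

  // An SSTable is reduced here to an immutable key -> cell map.
  type SSTable = Map[String, Cell]

  // Merge several SSTables into one: for each key, keep only the cell with
  // the highest timestamp (overwritten data is discarded). When
  // purgeTombstones is true, keys whose winning cell is a tombstone are
  // dropped from the output, reclaiming the space used by deleted data.
  def compact(tables: Seq[SSTable], purgeTombstones: Boolean): SSTable = {
    val merged: SSTable = tables.flatten.groupBy(_._1).map {
      case (key, entries) => key -> entries.map(_._2).maxBy(_.timestamp)
    }
    if (purgeTombstones) merged.filter { case (_, cell) => cell.value.isDefined }
    else merged
  }
}
```

Calling `CompactionSketch.compact` on an older and a newer table keeps only the newest cell per key. Which tables are picked for merging is what the compaction strategy decides, and that selection is outside the scope of this sketch.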
Cassandra provides row-level isolation where a transaction is atomic for a single query executed on one node. A transaction in Cassandra updates either all columns in a row or none of them. Complex multi-row or multi-query transactions are not supported. The implementation uses a copy-on-write approach utilizing a concurrent data structure called SnapTree to clone columns being updated, ensuring atomic and isolated updates at the row level for a single query.
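The copy-on-write idea can be illustrated with a minimal Scala sketch. It does not use SnapTree: as a stand-in, it keeps the row as an immutable map behind a `java.util.concurrent.atomic.AtomicReference` and publishes each update with a compare-and-set. `RowIsolationSketch`, `RowHolder`, `read`, and `update` are hypothetical names chosen for illustration.

```scala
import java.util.concurrent.atomic.AtomicReference
import scala.annotation.tailrec

// Hypothetical stand-in for the SnapTree-based approach: an immutable map
// swapped atomically, so readers see either the old row or the new row.
object RowIsolationSketch {
  type Row = Map[String, String]

  final class RowHolder(initial: Row) {
    private val current = new AtomicReference[Row](initial)

    // Readers get an immutable snapshot that later updates never modify.
    def read(): Row = current.get()

    // Build the updated copy off to the side, then publish it in one step.
    // If another writer published first, retry against the newer row.
    @tailrec
    def update(columns: Row): Unit = {
      val before = current.get()
      val after = before ++ columns
      if (!current.compareAndSet(before, after)) update(columns)
    }
  }
}
```

A reader that called `read()` before an update keeps its old snapshot, and a reader that calls it afterwards sees every updated column at once. No reader can observe a row in which only some of the columns of a single update have changed.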
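4. What is IoT data?
IoT and M2M
Machine to Machine (M2M): a system in which machines connected to a computer network exchange information with one another without human intervention, and optimal control is carried out automatically.
Internet of Things (IoT): a mechanism in which uniquely identifiable "things" are connected to the Internet/cloud and control one another by exchanging information.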
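16. The power of Spark
• Spark Streaming
Sequential time-series processing of data
Spark Streaming with Cassandra
[Diagram: social stream, Spark Streaming, Cassandra store; sample data "Hello World", "Hello", "World"]
* Short batches can be run one after another at short intervals.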
17. The power of Spark
• Spark SQL
SQL-like DSL
Spark SQL with Cassandra
[Diagram: Spark SQL and Cassandra]
var rdd = cc.sql("SELECT * FROM test2.words a JOIN test2.phrase b ON a.word = b.phrase")