Cassandra is used as the backend database for Scandit's barcode and product scanning platform. It provides the high scalability and availability needed to store large volumes of product data and scan data. Cassandra's data model uses a column family structure and allows data to be stored flexibly in column names. It is optimized for write-heavy workloads and scales out by adding more nodes.
Cassandra 2012 scandit
1. Cassandra for Barcodes, Products and Scans: The Backend Infrastructure at Scandit
Christof Roduner
Co-founder and COO
christof@scandit.com
@scandit
www.scandit.com
February 1, 2012
3. WHAT IS SCANDIT?
Scandit provides developers best-in-class tools to build, analyze and monetize product-centric apps.
• IDENTIFY: Products
• ANALYZE: User Interest
• MONETIZE: Apps
4. ANALYZE: THE SCANALYTICS PLATFORM
• Tool for app publishers
• App-specific usage statistics
• Insights into consumer behavior:
  • What do users scan?
    • Product categories? Groceries, electronics, books, cosmetics, …?
  • Where do users scan?
    • At home? Or while in a retail store?
  • Top products and brands
• Identify new opportunities:
  • Customer engagement
  • Product interest
  • Cross-selling and up-selling
5. BACKEND REQUIREMENTS
• Product database
  • Many millions of products
  • Many different data sources
  • Curation of product data (filtering, etc.)
• Analysis of scans
  • Accept and store high volumes of scans
  • Generate statistics over extended time periods
  • Correlate with product data
  • Provide reports to developers
6. BACKEND DESIGN GOALS
• Scalability
  • High-volume storage
  • High-volume throughput
  • Support large number of concurrent client requests (app)
• Availability
• Low maintenance
7. WHICH DATABASE?
Apache Cassandra
• Large, distributed key-value store (DHT)
• "NoSQL" / polyglot persistence
• Inspired by:
  • Amazon's Dynamo distributed storage system
  • Google's BigTable data model
• Originally developed at Facebook
  • Inbox search
8. WHY DID WE CHOOSE IT?
• Looked very fast
  • Even when data is much larger than RAM
• Performs well in write-heavy environment
• Proven scalability
  • Without downtime
• Tunable replication
• Easy to run and maintain
  • No sharding
  • All nodes are the same - no coordinators, masters, slaves, …
• Data model
  • YMMV…
9. WHAT YOU HAVE TO GIVE UP
• Joins
• Referential integrity
• Transactions
• Expressive query language
• Consistency (tunable, but…)
• Limited support for:
  • Schema
  • Secondary indices
10. CASSANDRA DATA MODEL
Disclaimer: I tend to say "hash" when I mean "dictionary, map, associative array" (Can you tell my favorite language?)
• Column families
• Rows
• Columns
• (Supercolumns)
  • We'll skip them - Cassandra developers don't like them
11. COLUMNS AND ROWS
• Column:
  • Is a name-value pair
  • row_key, CF, column, timestamp and value
• Row:
  • Has exactly one key
  • Contains any number of columns
  • Columns are always automatically sorted by their name
• Column family:
  • A collection of any number of rows (!)
  • Has a name
  • "Like a table"
12. EXAMPLE COLUMN FAMILY
"users": {
    "christof": {                            <- row with key "christof"
        "email": "christof@scandit.com",
        "phone": "123-456-7890"
    }
    "moritz": {
        "email": "moritz@scandit.com",       <- two columns, automatically
        "web": "www.example.com"                sorted by their names
    }                                           ("email", "web")
}
• A column family "users" containing two rows
• Columns can be different in every row
  • First row has a column named "phone", second row does not
• Rows can have many columns
  • You can add millions of them
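The structure above can be mimicked with nested dictionaries. The following is only a toy in-memory sketch: the `ColumnFamily` class and its `insert` and `get` methods are hypothetical names chosen for illustration, not part of any Cassandra client library, and columns are sorted when read rather than kept sorted in storage as Cassandra does.

```python
class ColumnFamily:
    """Toy in-memory column family: row key -> {column name: value}."""

    def __init__(self, name):
        self.name = name
        self.rows = {}

    def insert(self, row_key, columns):
        """Add or overwrite columns in a row, creating the row if needed."""
        self.rows.setdefault(row_key, {}).update(columns)

    def get(self, row_key):
        """Return the row's columns as (name, value) pairs sorted by name."""
        return sorted(self.rows.get(row_key, {}).items())


users = ColumnFamily("users")
# Columns are inserted in arbitrary order and differ from row to row.
users.insert("christof", {"phone": "123-456-7890", "email": "christof@scandit.com"})
users.insert("moritz", {"web": "www.example.com", "email": "moritz@scandit.com"})

for key in ("christof", "moritz"):
    print(key, [name for name, _ in users.get(key)])
```

Each row carries its own set of column names, and reading a row always yields them in name order.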
13. DATA IN COLUMN NAMES
• Column names can be used to store data
• Frequent pattern in Cassandra
• Takes advantage of column sorting
"logins": {
    "christof": {
        "2012-01-29 16:22:30 +0100": "208.115.113.86",
        "2012-01-30 07:48:03 +0100": "66.249.66.183",
        "2012-01-30 18:06:55 +0100": "208.115.111.70",
        "2012-01-31 12:37:26 +0100": "66.249.66.183"
    }
    "moritz": {
        "2012-01-23 01:12:49 +0100": "205.209.190.116"
    }
}
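Because the column names are timestamps and columns are kept sorted, "all logins in a time window" becomes a slice over column names. A minimal sketch of that idea follows, using a plain dictionary in place of a Cassandra row; `column_slice` is a hypothetical helper, and it assumes all timestamps share one format and UTC offset so that string order matches time order.

```python
from bisect import bisect_left, bisect_right

logins = {
    "christof": {
        "2012-01-29 16:22:30 +0100": "208.115.113.86",
        "2012-01-30 07:48:03 +0100": "66.249.66.183",
        "2012-01-30 18:06:55 +0100": "208.115.111.70",
        "2012-01-31 12:37:26 +0100": "66.249.66.183",
    },
    "moritz": {
        "2012-01-23 01:12:49 +0100": "205.209.190.116",
    },
}


def column_slice(row, first, last):
    """Return (name, value) pairs whose column name lies in [first, last]."""
    names = sorted(row)  # Cassandra stores columns already sorted by name
    lo = bisect_left(names, first)
    hi = bisect_right(names, last)
    return [(name, row[name]) for name in names[lo:hi]]


# All logins of "christof" on 2012-01-30: a prefix sorts before any
# full timestamp that starts with it, so these bounds bracket that day.
for name, ip in column_slice(logins["christof"], "2012-01-30", "2012-01-31"):
    print(name, ip)
```

The lookup touches a single row, so it does not depend on how rows are distributed across nodes.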
14. SCHEMA AND DATA TYPES
• Schema is optional
• Data type can be defined for:
  • Keys
  • The values of all columns with a given name
  • The column names in a CF
• By default, data type BLOB is used
• Data Types
  • BLOB (default)
  • ASCII text
  • UTF8 text
  • Timestamp
  • Boolean
  • UUID
  • Integer (arbitrary length)
  • Float
  • Double
  • Decimal
15. CLUSTER ORGANIZATION
[Diagram: four nodes arranged in a ring]
• Node 1: token 0
• Node 2: token 64
• Node 3: token 128
• Node 4: token 192
• Range 1-64 is stored on node 2
• Range 65-128 is stored on node 3
16. STORING A ROW
1. Calculate md5 hash for row key
   Example: md5("foobar") = 48
2. Determine data range for hash
   Example: 48 lies within range 1-64
3. Store row on node responsible for range
   Example: store on node 2
[Diagram: ring with node 1 at token 0, node 2 at token 64, node 3 at token 128 and node 4 at token 192; range 1-64 is stored on node 2, range 65-128 on node 3]
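The three steps can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not Cassandra's implementation: the token space is shrunk to 0-255 to match the four tokens in the diagram, the md5 digest is reduced modulo 256, and `token_for` and `node_for_token` are hypothetical names. The value 48 for "foobar" on the slide is itself only an example, so the sketch looks up token 48 directly.

```python
import hashlib
from bisect import bisect_left

# Ring from the diagram: each node owns the range that ends at its token.
RING = [(0, "node 1"), (64, "node 2"), (128, "node 3"), (192, "node 4")]
TOKENS = [token for token, _ in RING]
TOKEN_SPACE = 256  # assumption: toy token space matching the diagram


def token_for(row_key):
    """Step 1: hash the row key with md5 and map it into the token space."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % TOKEN_SPACE


def node_for_token(token):
    """Steps 2 and 3: find the first node whose token is >= the hash,
    wrapping around to the first node past the highest token."""
    index = bisect_left(TOKENS, token)
    return RING[index % len(RING)][1]


print(node_for_token(48))                   # hash 48 lies in range 1-64
print(node_for_token(token_for("foobar")))  # placement under the toy hash
```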
17. IMPLICATIONS
• Cluster automatically balanced
  • Load is shared equally between nodes
  • No hotspots
• Scaling out?
  • Easy
  • Divide data ranges by adding more nodes
  • Cluster rebalances itself automatically
• Range queries not possible
  • You can't retrieve "all rows from A-C"
  • Rows are not stored in their "natural" order
  • Rows are stored in order of their md5 hashes
18. IF YOU NEED RANGE QUERIES…
Option 1: "Order Preserving Partitioner" (OPP)
• OPP determines node based on a row's key instead of its hash
• Don't use it…
  • Manually balancing a cluster is hard
  • Hotspots
  • Balancing cluster for one column family creates hotspot for another
Option 2: Use columns instead of rows
• Columns are always sorted
• Rows can store millions of columns
19. REPLICATION
• Tunable replication factor (RF)
  • RF > 1: rows are automatically replicated to next RF-1 nodes
• Tunable replication strategy
  • "Ensure two replicas in different data centers, racks, etc."
[Diagram: ring with node 1 at token 0, node 2 at token 64, node 3 at token 128 and node 4 at token 192; two nodes are marked as holding replica 1 and replica 2 of row "foobar"]
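The "next RF-1 nodes" rule can be sketched as a walk around the ring. The code below covers only this simple placement, not the data-center-aware or rack-aware strategies; the four-node ring with a 0-255 token space is taken from the diagrams, and `replicas_for_token` is a hypothetical helper name.

```python
from bisect import bisect_left

NODES = ["node 1", "node 2", "node 3", "node 4"]
TOKENS = [0, 64, 128, 192]  # token of each node, in ring order


def replicas_for_token(token, replication_factor):
    """Return the node owning the token plus the next RF-1 nodes on the ring."""
    if not 1 <= replication_factor <= len(NODES):
        raise ValueError("replication factor must be between 1 and the node count")
    first = bisect_left(TOKENS, token) % len(NODES)
    return [NODES[(first + i) % len(NODES)] for i in range(replication_factor)]


# A row hashing to 48 with RF=2 lives on node 2 and its successor.
print(replicas_for_token(48, 2))
```

With RF equal to the number of nodes, every node holds a copy of every row.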
20. CLIENT ACCESS
• Clients can send read and write requests to any node
  • This node will act as coordinator
• Coordinator forwards request to nodes where data resides
Request:
insert(
    "foobar": { "email": "fb@example.com" }
)
[Diagram: a client sends the request to a node of the four-node ring (tokens 0, 64, 128, 192); the request is forwarded to the nodes holding replica 1 and replica 2 of row "foobar"]
21. CONSISTENCY LEVELS
• For all requests, clients can set a consistency level (CL)
• For writes:
  • CL defines how many replicas must be written before "success" is returned to client
• For reads:
  • CL defines how many replicas must respond before result is returned to client
• Consistency levels:
  • ONE
  • QUORUM
  • ALL
  • … (data center-aware levels)
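How many replicas each level waits for follows from the replication factor: ONE needs a single replica, QUORUM needs a majority (RF / 2 rounded down, plus 1), and ALL needs every replica. A small sketch of that arithmetic, with hypothetical function names and without the data center-aware levels:

```python
def required_replicas(consistency_level, replication_factor):
    """Number of replicas that must acknowledge a request at the given level."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    levels = {
        "ONE": 1,
        "QUORUM": replication_factor // 2 + 1,  # strict majority
        "ALL": replication_factor,
    }
    if consistency_level not in levels:
        raise ValueError(f"unsupported consistency level: {consistency_level}")
    return levels[consistency_level]


def request_succeeds(consistency_level, replication_factor, replicas_responding):
    """True if enough replicas responded to satisfy the consistency level."""
    return replicas_responding >= required_replicas(
        consistency_level, replication_factor
    )


for level in ("ONE", "QUORUM", "ALL"):
    print(level, required_replicas(level, 3))
```

Note that with RF = 2 a quorum is 2, so QUORUM and ALL behave the same and neither tolerates a replica being down.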
22. INCONSISTENT DATA
• Example scenario:
  • Replication factor 2
  • Two existing replicas for row "foobar"
  • Client overwrites existing columns in "foobar"
  • Replica 2 is down
• What happens:
  • Column is updated in replica 1, but not replica 2 (even with CL=ALL!)
• Timestamps to the rescue
  • Every column has a timestamp
  • Timestamps are supplied by clients
  • Upon read, column with latest timestamp wins
  • → Use NTP
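The "latest timestamp wins" rule can be sketched as a merge over the answers from several replicas. This is a simplified illustration, not Cassandra's read path: each replica's answer is modelled as a dictionary mapping column name to a `(value, timestamp)` pair, and `reconcile` is a hypothetical name.

```python
def reconcile(replica_responses):
    """Merge columns from several replicas; per column, the newest timestamp wins."""
    merged = {}
    for columns in replica_responses:
        for name, (value, timestamp) in columns.items():
            if name not in merged or timestamp > merged[name][1]:
                merged[name] = (value, timestamp)
    return merged


# Replica 2 was down during the overwrite and still holds the old value.
replica_1 = {"email": ("new@example.com", 200)}
replica_2 = {"email": ("old@example.com", 100), "phone": ("123-456-7890", 90)}

print(reconcile([replica_1, replica_2]))
```

Since the timestamps come from clients, a client with a skewed clock can make a stale write look newer than a fresh one, which is why synchronized clocks matter.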
24. RETRIEVING DATA (API)
• At a row level, you can…
  • Get all rows
  • Get a single row by specifying its key
  • Get a number of rows by specifying their keys
  • Get a range of rows
    • Only with OPP, strongly discouraged
• At a column level, you can…
  • Get all columns
  • Get a single column by specifying its name
  • Get a number of columns by specifying their names
  • Get a range of columns by specifying the name of the first and last column
• Again: no ranges of rows
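The key-based lookups can be pictured with an in-memory stand-in for a column family. The sketch below covers fetching several rows by key and several columns by name; `get_rows` and `get_columns` are hypothetical helpers and do not correspond to the calls of any particular client library.

```python
users = {
    "christof": {"email": "christof@scandit.com", "phone": "123-456-7890"},
    "moritz": {"email": "moritz@scandit.com", "web": "www.example.com"},
}


def get_rows(column_family, keys):
    """Fetch several rows by their keys; unknown keys are skipped."""
    return {key: column_family[key] for key in keys if key in column_family}


def get_columns(row, names):
    """Fetch several columns of one row by name; missing columns are skipped."""
    return {name: row[name] for name in names if name in row}


rows = get_rows(users, ["christof", "moritz", "nobody"])
for key, row in rows.items():
    # "phone" exists only in the first row, "web" only in the second.
    print(key, get_columns(row, ["email", "phone"]))
```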
28. SECONDARY INDICES
• Secondary indices can be defined for (single) columns
• Secondary indices only support equality predicate (=) in queries
• Each node maintains index for data it owns
  • When indexed column is queried, request must be forwarded to all nodes
• Sometimes better to manually maintain your own index
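One common way to maintain your own index is a second column family whose row key is the indexed value and whose column names are the keys of the matching rows. The sketch below shows the bookkeeping with plain dictionaries; the names `users_by_email`, `put_user` and `find_by_email` are hypothetical, and a real deployment would also have to cope with the two writes not being atomic.

```python
users = {}           # row key -> columns
users_by_email = {}  # index: email -> {user row key: ""}


def put_user(key, columns):
    """Write a user row and keep the email index in step with it."""
    old_email = users.get(key, {}).get("email")
    row = {**users.get(key, {}), **columns}
    users[key] = row
    new_email = row.get("email")
    if old_email == new_email:
        return
    if old_email is not None:
        # Remove the stale index entry, dropping the index row if it is empty.
        users_by_email[old_email].pop(key, None)
        if not users_by_email[old_email]:
            del users_by_email[old_email]
    if new_email is not None:
        # The column name carries the data; the value stays empty.
        users_by_email.setdefault(new_email, {})[key] = ""


def find_by_email(email):
    """Look up user keys by email with a single row read on the index."""
    return sorted(users_by_email.get(email, {}))


put_user("christof", {"email": "christof@scandit.com"})
put_user("moritz", {"email": "moritz@scandit.com"})
print(find_by_email("moritz@scandit.com"))
```

A lookup then reads one row of the index column family, so it goes only to the replicas of that row rather than to every node.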
29. PRODUCTION EXPERIENCE
• No stability issues
• Very fast
• Language bindings don't have the same quality
  • Out of sync, bugs
• Data model is a mental twist
• Design-time decisions sometimes hard to change
• Rudimentary access control
30. TRYING OUT CASSANDRA
• DataStax website
  • Company founded by Cassandra developers
  • Provides
    • Documentation
    • Amazon Machine Image
• Apache website
• Mailing lists
31. CLUSTER AT SCANDIT
• Several nodes in two data centers
• Linux machines
• Identical setup on every node
  • Allows for easy failover
32. NODE ARCHITECTURE
from mobile apps and web browsers
Phusion Passenger
mod_passenger
Website & REST API
Ruby on Rails, Rack
to other nodes