CHAPTER 06: Cassandra
cybersecurity notes for mca students for learning
History of Cassandra
- Apache Cassandra was born at Facebook to power inbox search. Facebook open-sourced the code in 2008.
- Cassandra became an Apache Incubator project in 2009 and subsequently a top-level Apache project in 2010.
- When these notes were prepared, the latest version of Apache Cassandra was 3.1.1; newer releases have appeared since.
- It is a column-oriented database designed around peer-to-peer, symmetric nodes instead of a master-slave architecture.
- It builds on Amazon's Dynamo and Google's Bigtable.
Cassandra ~= Bigtable + Dynamo
What is Cassandra?
- Apache Cassandra is a highly scalable, high-performance distributed database designed to handle large amounts of structured data across many commodity servers, with replication providing high availability and no single point of failure.
- In the ring architecture diagram, the circles are Cassandra nodes and the lines between the circles show the distributed architecture; the client sends its data to one of the nodes.
Notable points
- It is scalable and fault-tolerant, and its consistency is tunable.
- It is a column-oriented database.
- Its distribution design is based on Amazon's Dynamo and its data model on Google's Bigtable.
- Cassandra implements a Dynamo-style replication model with no single point of failure, but adds a more powerful column-family data model.
- Cassandra is used by some of the biggest companies, such as Facebook, Twitter, Cisco, Rackspace, eBay, Adobe, and Netflix.
Features of Cassandra
- Elastic scalability: Cassandra is highly scalable; you can add more hardware to accommodate more customers and more data as required.
- Massively scalable architecture: Cassandra has a masterless design in which all nodes are at the same level, which provides operational simplicity and easy scale-out.
- Always-on architecture (peer-to-peer network): Cassandra replicates data on different nodes, so there is no single point of failure and it stays continuously available for business-critical applications.
- Linear scale performance: throughput grows roughly in proportion to the number of nodes added, which helps maintain a quick response time.
- Flexible data storage: Cassandra accommodates structured, semi-structured, and unstructured data. It can dynamically accommodate changes to data structures as needed.
- Easy data distribution: Cassandra gives you the flexibility to distribute data where you need it by replicating data across multiple data centers.
- Transaction support: Cassandra does not offer full relational-style ACID transactions. Writes are atomic and isolated at the row level and durable through the commit log, while consistency is tunable per operation.
- Fast writes: Cassandra was designed to run on cheap commodity hardware. It performs very fast writes and can store hundreds of terabytes of data without sacrificing read efficiency.
- Fault detection and recovery: failed nodes can easily be restored and recovered.
- Flexible and dynamic data model: supports a range of data types with fast writes and reads.
- Data protection: data is protected by the commit log design, and backup and restore mechanisms are built in.
- Tunable data consistency: the consistency level can be chosen per operation, up to strong consistency across the distributed architecture.
- Multi-data-center replication: Cassandra can replicate data across multiple data centers.
- Data compression: Cassandra can compress data by up to 80% with little performance overhead.
- Cassandra Query Language (CQL): Cassandra provides a query language similar to SQL, which makes the move from a relational database to Cassandra much easier for relational database developers.
Cassandra Use Cases/Applications
- Messaging: Cassandra suits companies that provide mobile phone and messaging services, because they must store and serve huge amounts of data.
- Internet of Things applications: Cassandra suits applications where data arrives at very high speed from many different devices or sensors.
- Product catalogs and retail apps: many retailers use Cassandra for durable shopping cart protection and fast product catalog input and output.
- Social media analytics and recommendation engines: many online companies and social media providers use Cassandra for analysis and for making recommendations to their customers.
Cassandra Architecture
- The design goal of Cassandra is to handle big data workloads across multiple nodes without any single point of failure.
- Cassandra is a peer-to-peer distributed system: data is distributed among all the nodes in a cluster.
Components of Cassandra
- Node: the basic unit of Cassandra. Data is stored on nodes (a computer/server).
- Data center: a collection of related nodes.
- Rack: a unit that contains multiple servers stacked on top of one another. A node is a single server in a rack.
- Cluster: a component that contains one or more data centers.
- Commit log: the crash-recovery mechanism in Cassandra. Every write operation is written to the commit log.
- Memtable: a memory-resident data structure. After the commit log, the data is written to the memtable.
- SSTable: a disk file to which data is flushed from the memtable when its contents reach a threshold value.
A rack is a group of machines housed in the same physical box. Each machine in the rack has its own CPU, memory, and hard disk. However, the rack has no CPU, memory, or hard disk of its own.
- All machines in the rack are connected to the network switch of the rack.
- The rack's network switch is connected to the cluster.
- All machines on the rack have a common power supply. It is important to notice that a rack can fail for two reasons: a network switch failure or a power supply failure.
- If a rack fails, none of the machines on the rack can be accessed, so it would seem as though all the nodes on the rack are down.
Cassandra Architecture - Cassandra Cluster
- All the nodes in a cluster play the same role. Each node is independent and at the same time interconnected with the other nodes.
- Each node in a cluster can accept read and write requests, regardless of where the data is actually located in the cluster.
- When a node goes down, read/write requests can be served from other nodes in the network.
Data Replication in Cassandra
- In Cassandra, one or more of the nodes in a cluster act as replicas for a given piece of data.
- If some of the nodes respond to a read with an out-of-date value, Cassandra returns the most recent value to the client. After returning it, Cassandra performs a read repair in the background to update the stale (old) values.
- The replication factor (RF) lies between 1 and n, the number of nodes.
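The read-repair behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Versioned` record, the `read_with_repair` function, and the node names are hypothetical, and the repair runs inline here rather than in the background as it would in Cassandra.

```python
from dataclasses import dataclass


@dataclass
class Versioned:
    value: str
    timestamp: int  # write time; a larger number means a newer write


def read_with_repair(replicas, key):
    """Return the newest value for key and overwrite any stale replica."""
    responses = {name: store[key] for name, store in replicas.items() if key in store}
    newest = max(responses.values(), key=lambda v: v.timestamp)
    # Read repair: push the newest version to every replica holding an older one.
    for name, seen in responses.items():
        if seen.timestamp < newest.timestamp:
            replicas[name][key] = newest
    return newest.value


# Three hypothetical replicas; node_b missed the latest write.
replicas = {
    "node_a": {"user:1": Versioned("alice@new.example", 20)},
    "node_b": {"user:1": Versioned("alice@old.example", 10)},
    "node_c": {"user:1": Versioned("alice@new.example", 20)},
}
result = read_with_repair(replicas, "user:1")
```

After the call, `result` holds the newest value and `node_b` has been brought up to date.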
Gossip protocol
- Cassandra uses the gossip protocol in the background to allow the nodes to communicate with each other and detect any faulty nodes in the cluster.
- A gossip protocol is a style of computer-to-computer communication protocol inspired by the form of gossip seen in social networks.
- The term "epidemic protocol" is sometimes used as a synonym for a gossip protocol, because gossip spreads information in a manner similar to the spread of a virus in a biological community.
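To see why gossip spreads like an epidemic, the following Python sketch simulates one rumor travelling through a cluster. It is a toy model, not Cassandra's gossip implementation: the function name `gossip_rounds`, the `fanout` parameter, and the fixed random seed are assumptions made for illustration, and failure detection is left out.

```python
import random


def gossip_rounds(node_names, origin, fanout=1, seed=0):
    """Count the rounds until every node has heard a rumor started at origin."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    informed = {origin}
    rounds = 0
    while len(informed) < len(node_names):
        rounds += 1
        # Every node that already knows the rumor tells `fanout` random peers.
        for node in sorted(informed):
            peers = [n for n in node_names if n != node]
            informed.update(rng.sample(peers, fanout))
    return rounds


nodes = [f"node{i}" for i in range(16)]
rounds_needed = gossip_rounds(nodes, "node0")
```

Because the number of informed nodes can at most double each round, 16 nodes need at least 4 rounds; in practice the count stays close to that, which is why gossip scales well as a cluster grows.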
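Partitioner
- The partitioner distributes data across the various nodes in a cluster.
- It also determines the node on which to place the very first copy of the data.
- It is a hash function: it turns a row's partition key into a token that identifies a position on the ring.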
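A minimal Python sketch of how a hash-based partitioner could pick the node for the first copy of a row. The `TokenRing` class, the `token_for` helper, and the node names are hypothetical; MD5 is used only as a convenient stand-in hash, and each node is given a single token derived from its name, which is an assumption rather than how a real cluster assigns tokens.

```python
import bisect
import hashlib


def token_for(partition_key):
    """Hash a partition key to an integer token (MD5 here, for illustration)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class TokenRing:
    def __init__(self, node_names):
        # Each node owns one token, derived here by hashing its name.
        self.ring = sorted((token_for(name), name) for name in node_names)
        self.tokens = [token for token, _ in self.ring]

    def primary_node(self, partition_key):
        """First node clockwise whose token is >= the key's token."""
        index = bisect.bisect_left(self.tokens, token_for(partition_key))
        return self.ring[index % len(self.ring)][1]  # wrap around the ring


ring = TokenRing(["node_a", "node_b", "node_c", "node_d"])
owner = ring.primary_node("RollNo:42")
```

The same key always hashes to the same token, so every node in the cluster can work out the owner without consulting a master.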
Replication Factor
- The total number of replicas of a piece of data across the cluster is referred to as the replication factor (RF).
- The RF determines the number of copies of data (replicas) that will be stored across nodes in a cluster.
- A replication strategy determines the nodes where replicas are placed. There are two strategies:
  - Simple Strategy
  - Network Topology Strategy
Simple Strategy
- Use only for a single data center and one rack.
- Simple Strategy places the first replica on a node determined by the partitioner. Additional replicas are placed on the next nodes clockwise in the ring.
- Simple Strategy is a rack-unaware and data-center-unaware policy, i.e. it places replicas without considering topology (rack or data center location).
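The clockwise placement rule of Simple Strategy can be sketched as follows. The function name `simple_strategy_replicas` and the node names are hypothetical, and the ring is reduced to a plain list in clockwise order.

```python
def simple_strategy_replicas(ring_nodes, first_index, replication_factor):
    """Pick replication_factor nodes, walking clockwise from first_index."""
    if not 1 <= replication_factor <= len(ring_nodes):
        raise ValueError("replication factor must be between 1 and the node count")
    return [
        ring_nodes[(first_index + step) % len(ring_nodes)]
        for step in range(replication_factor)
    ]


# Nodes listed in clockwise ring order; index 2 is the node the partitioner chose.
ring_nodes = ["node_a", "node_b", "node_c", "node_d"]
replicas = simple_strategy_replicas(ring_nodes, first_index=2, replication_factor=3)
# replicas == ["node_c", "node_d", "node_a"]
```

The modulo makes the walk wrap past the end of the list, and no rack or data center information is consulted at any point.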
Network Topology Strategy
- Network Topology Strategy is used when the cluster spans more than one data center.
- As the name indicates, this strategy is aware of the network topology (location of nodes in racks, data centers, etc.) and is much more intelligent than Simple Strategy.
- This strategy lets you specify how many replicas you want in each data center.
- Replicas are set for each data center separately. Within a data center, replicas are placed clockwise around the ring on nodes in different racks; the walk continues until a node on another rack is found or it returns to the first node.
Anti-Entropy
- Anti-entropy is a process of comparing the data of all replicas and updating each replica to the newest version.
- Frequent data deletions and node failures are common causes of data inconsistency.
- Anti-entropy node repairs are important for every Cassandra cluster.
- Anti-entropy repair is used for routine maintenance and when a cluster needs fixing.
Write path in Cassandra
- Cassandra processes data at several stages on the write path, starting with the immediate logging of a write and ending in compaction:
  - Logging data in the commit log
  - Writing data to the memtable
  - Flushing data from the memtable
  - Storing data on disk in SSTables
  - Compaction
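The stages of the write path can be mimicked with an in-memory Python sketch. Everything here is a simplification under stated assumptions: the `WritePathSketch` class and its `flush_threshold` are hypothetical, "disk" is just a list of dictionaries, and compaction simply merges all SSTables with later tables winning.

```python
class WritePathSketch:
    def __init__(self, flush_threshold=3):
        self.commit_log = []      # append-only record of writes not yet flushed
        self.memtable = {}        # in-memory table, latest value per key
        self.sstables = []        # immutable "on-disk" tables, oldest first
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))   # 1. log the write
        self.memtable[key] = value             # 2. update the memtable
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # 3-4. store the memtable as a sorted SSTable and start a fresh one
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed writes no longer need replaying

    def compact(self):
        # 5. merge all SSTables; later tables win for duplicate keys
        merged = {}
        for table in self.sstables:
            merged.update(table)
        self.sstables = [dict(sorted(merged.items()))]


db = WritePathSketch(flush_threshold=2)
for key, value in [("a", 1), ("b", 2), ("a", 3), ("c", 4)]:
    db.write(key, value)
db.compact()
# db.sstables == [{"a": 3, "b": 2, "c": 4}]
```

The four writes trigger two flushes, and compaction then folds the two SSTables into one, keeping the newer value for key `a`.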
Hinted Handoffs
[Figure: depiction of hinted handoffs]
When a replica node (node C in the figure) is down at write time, the coordinator (node A) stores a hint for it. The hint table holds:
- The location of the node on which the replica is to be placed
- Version metadata
- The actual data
When node C recovers and is functional again, node A reacts to the hint by forwarding the data to node C.
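A small Python sketch of the hinted handoff idea, with node A acting as coordinator. The `Coordinator` class, its method names, and the tuple layout of a hint are assumptions chosen for illustration; they mirror the three items of the hint table (target node, version metadata, data) rather than any real storage format.

```python
class Coordinator:
    def __init__(self, replicas):
        self.replicas = replicas   # node name -> dict acting as that node's store
        self.down = set()          # names of replicas currently unreachable
        self.hints = []            # (target node, key, value, timestamp)

    def write(self, key, value, timestamp):
        for name, store in self.replicas.items():
            if name in self.down:
                # Keep a hint: where the replica belongs, its version, the data.
                self.hints.append((name, key, value, timestamp))
            else:
                store[key] = (value, timestamp)

    def node_recovered(self, name):
        self.down.discard(name)
        remaining = []
        for target, key, value, timestamp in self.hints:
            if target == name:
                self.replicas[name][key] = (value, timestamp)  # replay the hint
            else:
                remaining.append((target, key, value, timestamp))
        self.hints = remaining


node_a = Coordinator({"node_b": {}, "node_c": {}})
node_a.down.add("node_c")                      # node C is down during the write
node_a.write("RollNo:2", "David Sheen", timestamp=1)
hints_while_down = len(node_a.hints)           # one hint is held for node C
node_a.node_recovered("node_c")                # hint is forwarded and discarded
```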
Tunable Consistency (TC)
- Consistency refers to how up-to-date and synchronized a row of Cassandra data is on all of its replicas.
- Tunable consistency lets each operation choose between strong consistency and eventual consistency.
- Strong consistency:
  - Each update propagates to all replicas, and every server must have a copy of the data before it is served to the client.
  - It has an impact on performance.
Eventual Consistency
- The client is acknowledged with a success as soon as a part of the cluster acknowledges the write; the remaining replicas catch up later.
- It is used when application performance matters.
Read consistency
- The read consistency level specifies how many replicas must respond before the result is sent to the client application.
- Read consistency levels:

Level          Result returned
ONE            A response from the closest node (replica) holding the data.
QUORUM         A result from a quorum of servers, with the most recent timestamp for the data.
LOCAL_QUORUM   A result from a quorum of servers in the same data center as the coordinator node, with the most recent timestamp for the data.
EACH_QUORUM    A result from a quorum of servers in all data centers, with the most recent timestamp.
ALL            A response to the client only after all the replica nodes have responded. This provides the highest level of consistency of all levels.

Write consistency
- The write consistency level specifies on how many replicas a write must succeed before an ACK is sent to the client application.
- The same level names (for example ONE, QUORUM, LOCAL_QUORUM, EACH_QUORUM, and ALL) apply to writes, counting the replicas that acknowledge the write.
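The arithmetic behind these levels is short enough to sketch. A quorum is a majority of the replicas, and reads are guaranteed to see the latest write when the number of replicas read plus the number written exceeds the replication factor. The function names below are hypothetical.

```python
def quorum(replication_factor):
    """Smallest majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def is_strongly_consistent(read_replicas, write_replicas, replication_factor):
    """Reads overlap the latest write when R + W > RF."""
    return read_replicas + write_replicas > replication_factor


rf = 3
q = quorum(rf)                                    # 2 of 3 replicas
quorum_both = is_strongly_consistent(q, q, rf)    # QUORUM reads and writes: True
one_both = is_strongly_consistent(1, 1, rf)       # ONE reads and writes: False
```

With RF = 3, QUORUM on both reads and writes gives 2 + 2 > 3, so every read touches at least one replica that holds the latest write, whereas ONE on both sides only gives eventual consistency.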
CQL DATA TYPES
CQLSH
- Cassandra provides the Cassandra Query Language shell (cqlsh), which allows users to communicate with Cassandra.
- Using cqlsh, you can:
  - define a schema,
  - insert data, and
  - execute a query.
KEYSPACES (Database [Namespace])
- A keyspace is a container that holds application data, like a database in an RDBMS.
- It is used to group column families together.
- A cluster typically has one keyspace per application.
- A keyspace (or key space) in a NoSQL data store is an object that holds together all column families of a design.
- It is the outermost grouping of the data in the data store.
To create a keyspace
CREATE KEYSPACE <keyspace_name>
WITH replication = {'class': '<strategy_name>',
'replication_factor': <number_of_replicas>};
Details about existing keyspaces
DESCRIBE KEYSPACES;
SELECT * FROM system.schema_keyspaces;
The SELECT gives more details. In Cassandra 3.0 and later, this information is in system_schema.keyspaces instead.
To use an existing keyspace
USE <keyspace_name>;
USE students;
To create a column family (table) named student_info
CREATE TABLE student_info (RollNo int PRIMARY KEY,
StudName text, DateofJoining timestamp,
LastExamPercent double);
Other commands
DESCRIBE TABLES;
DESCRIBE TABLE student_info;
CRUD
SELECT
To view the data from the table student_info:
SELECT * FROM student_info;
SELECT * FROM student_info WHERE RollNo IN (1,2,3);
Index
To create an index on the StudName column of the student_info column family, use the following statement:
CREATE INDEX ON student_info(StudName);
SELECT * FROM student_info WHERE StudName='Aviral';
Update
To update the value held in the StudName column of the student_info column family to David Sheen for the record where the RollNo column has the value 2:
UPDATE student_info SET StudName = 'David Sheen' WHERE RollNo = 2;
Note: An UPDATE writes one or more column values for a given row of a Cassandra table. It does not return anything.
Delete
To delete the column LastExamPercent from the student_info table for the record where RollNo = 2:
DELETE LastExamPercent FROM student_info WHERE RollNo=2;
Note: A DELETE statement removes one or more columns from one or more rows of a Cassandra table, or removes entire rows if no columns are specified.
Collections
- Cassandra provides collection types, used to group and store data together in a column.
- For example, a user's multiple email addresses can be grouped in one collection column.
- The value of each item in a collection is limited to 64 KB.
- Collections suit small groups of values such as the phone numbers or email IDs of a user.
Collections Set
To alter the schema for the table student_info to add a column hobbies:
ALTER TABLE student_info ADD hobbies set<text>;
UPDATE student_info SET hobbies = hobbies + {'Chess', 'Table Tennis'} WHERE RollNo=4;
Collections List
To alter the schema of the table student_info to add a list column language:
ALTER TABLE student_info ADD language list<text>;
UPDATE student_info SET language = language + ['Hindi', 'English'] WHERE RollNo=1;
Collections Map
- A map relates one item to another with a key-value pair. Using the map type, you can store timestamp-related information in user profiles.
To alter the student_info table to add a map column todo:
ALTER TABLE student_info ADD todo map<timestamp, text>;
Example
UPDATE student_info SET todo = { '2014-09-24':
'Cassandra Session', '2014-10-02 12:00' :
'MongoDB Session' } WHERE RollNo = 1;
Time To Live (TTL)
- Data in a column, other than a counter column, can have an optional expiration period called TTL (time to live).
- The client request may specify a TTL value for the data. The TTL is specified in seconds.
CREATE TABLE userlogin (userid int PRIMARY KEY, password text);
INSERT INTO userlogin (userid, password) VALUES (1, 'infy') USING TTL 30;
SELECT * FROM userlogin;
Export to CSV
COPY student_info (RollNo, StudName,
DateofJoining, LastExamPercent) TO 'd:\student.csv';
Import data from a CSV file
CREATE TABLE student_data (id int PRIMARY KEY, fn text, ln text, phone text, city text);
COPY student_data (id, fn, ln, phone, city) FROM
'd:\cassandraData\student.csv';
Introduction to MapReduce Programming
(Revisit for details)
- In MapReduce programming, jobs (applications) are split into a set of map tasks and reduce tasks. These tasks are then executed in a distributed fashion on a Hadoop cluster.
- Each task processes the small subset of data that has been assigned to it. This way, Hadoop distributes the load across the cluster.
- A MapReduce job takes a set of files stored in HDFS (Hadoop Distributed File System) as input.
Mapper
- The map task takes care of loading, parsing, transforming, and filtering.
- A mapper maps the input key-value pairs into a set of intermediate key-value pairs.
- Maps are individual tasks that have the responsibility of transforming input records into intermediate key-value pairs. Each map task is broken into the following phases:
  - RecordReader
  - Mapper/Maps
  - Combiner
  - Partitioner
RecordReader
- The RecordReader reads the data from an input split (record) and converts it into key-value pairs that serve as input to the Mapper class.
Maps
- Map is a user-defined function that takes a series of key-value pairs and processes each one of them to generate zero or more key-value pairs.
- Map takes a set of data and converts it into another set of data. Input and output are key-value pairs.
Combiner
- A combiner is a type of local reducer that groups similar data from the map phase into a new set of key-value pairs.
- It is not a part of the main MapReduce algorithm; it is optional (it may run as part of the map task).
- The main function of a combiner is to summarize the map output records with the same key.
Difference between Combiner and Reducer
- The output generated by the combiner is intermediate data and is passed to the reducer.
- The output of the reducer is passed to the output file on disk.
Partitioner
- A partitioner partitions the key-value pairs of the intermediate map outputs.
- The partitioner in MapReduce controls the partitioning by the key of the intermediate mapper output.
- The partition phase takes place after the map phase and before the reduce phase.
- The number of partitions is equal to the number of reducers. That means the partitioner divides the data according to the number of reducers, so the data in a single partition is processed by a single reducer.
- A partitioner is used only when there are multiple reducers.
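A minimal Python sketch of hash partitioning, the usual way intermediate keys are assigned to reducers. The helper name `partition_for` and the sample data are hypothetical, and `zlib.crc32` stands in for the key's hash code; Hadoop's own default partitioner is written in Java and uses the key's `hashCode()` modulo the number of reduce tasks.

```python
import zlib


def partition_for(key, num_reducers):
    """Map an intermediate key to a reducer index in [0, num_reducers)."""
    # zlib.crc32 is stable across runs, unlike Python's built-in hash() for str.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


map_output = [("cassandra", 1), ("hadoop", 1), ("cassandra", 1), ("hdfs", 1)]
num_reducers = 2
partitions = {index: [] for index in range(num_reducers)}
for key, value in map_output:
    partitions[partition_for(key, num_reducers)].append((key, value))
```

Because the partition depends only on the key, both `("cassandra", 1)` pairs land in the same partition and are therefore seen by the same reducer.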
Shuffling and Sorting in Hadoop MapReduce
- The process by which the intermediate output from the mappers is transferred to the reducers is called shuffling.
- The intermediate key-value pairs generated by the mappers are sorted automatically by key.
Reduce
- The primary task of the reducer is to reduce a set of intermediate values (the ones that share a common key) to a smaller set of values.
- The reducer takes the grouped key-value data as input and runs a reducer function on each group.
- Here, the data can be aggregated, filtered, and combined in a number of ways, which may require a wide range of processing.
- The output of the reducer is the final output, which is stored in HDFS.
RecordWriter (Output format)
- The RecordWriter writes output key-value pairs from the reducer phase to output files.
- OutputFormat instances provided by Hadoop are used to write files in HDFS. Thus the final output of the reducer is written to HDFS by OutputFormat instances using a RecordWriter.
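The phases described in this part (map, combine, shuffle and sort, reduce) can be traced end to end with a word count, the customary MapReduce example. This is a single-process Python sketch, not Hadoop code: the function names and the two input splits are hypothetical, a single reducer is assumed so no partitioner is needed, and the result is kept in a dictionary instead of being written to HDFS.

```python
from collections import defaultdict
from itertools import groupby


def map_phase(line):
    """Mapper: emit (word, 1) for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]


def combine(pairs):
    """Combiner: sum counts per key locally, within one map task's output."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def shuffle_and_sort(all_pairs):
    """Sort intermediate pairs by key and group the values of each key."""
    ordered = sorted(all_pairs, key=lambda pair: pair[0])
    return [(key, [value for _, value in group])
            for key, group in groupby(ordered, key=lambda pair: pair[0])]


def reduce_phase(key, values):
    """Reducer: collapse all values for a key into one total."""
    return key, sum(values)


splits = ["Cassandra stores data", "Hadoop stores data in HDFS"]
intermediate = []
for split in splits:                      # one map task per input split
    intermediate.extend(combine(map_phase(split)))
word_counts = dict(reduce_phase(k, vs) for k, vs in shuffle_and_sort(intermediate))
# word_counts["stores"] == 2 and word_counts["data"] == 2
```

The combiner changes nothing about the final counts here; it only shrinks the intermediate data that would otherwise cross the network during the shuffle.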
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learning

More Related Content

Similar to cybersecurity notes for mca students for learning (20)

Big Data Storage Concepts from the "Big Data concepts Technology and Architec...
Big Data Storage Concepts from the "Big Data concepts Technology and Architec...Big Data Storage Concepts from the "Big Data concepts Technology and Architec...
Big Data Storage Concepts from the "Big Data concepts Technology and Architec...
raghdooosh
cassandra.pptx
cassandra.pptxcassandra.pptx
cassandra.pptx
BRINDHA256909
Why Cassandra?
Why Cassandra?Why Cassandra?
Why Cassandra?
Tayfun Sevimli
Cassandra presentation
Cassandra presentationCassandra presentation
Cassandra presentation
Sergey Enin
Cassandra
CassandraCassandra
Cassandra
Upaang Saxena
Highly available, scalable and secure data with Cassandra and DataStax Enterp...
Highly available, scalable and secure data with Cassandra and DataStax Enterp...Highly available, scalable and secure data with Cassandra and DataStax Enterp...
Highly available, scalable and secure data with Cassandra and DataStax Enterp...
Johnny Miller
5266732.ppt
5266732.ppt5266732.ppt
5266732.ppt
hothyfa
Cassandra Learning
Cassandra LearningCassandra Learning
Cassandra Learning
Ehsan Javanmard
Cassandra - A Distributed Database System
Cassandra - A Distributed Database System Cassandra - A Distributed Database System
Cassandra - A Distributed Database System
Md. Shohel Rana
Presentation of Apache Cassandra
Presentation of Apache Cassandra Presentation of Apache Cassandra
Presentation of Apache Cassandra
Nikiforos Botis
Data Lake and the rise of the microservices
Data Lake and the rise of the microservicesData Lake and the rise of the microservices
Data Lake and the rise of the microservices
Bigstep
System design fundamentals CAP.pdf
System design fundamentals CAP.pdfSystem design fundamentals CAP.pdf
System design fundamentals CAP.pdf
UsmanAhmed269749
Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014
Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014
Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014
Johnny Miller
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
Mohit Tare
Column db dol
Column db dolColumn db dol
Column db dol
poojabi
CASSANDRA - Next to RDBMS
CASSANDRA - Next to RDBMSCASSANDRA - Next to RDBMS
CASSANDRA - Next to RDBMS
Vipul Thakur
Cassandra
CassandraCassandra
Cassandra
exsuns
DataStax TechDay - Munich 2014
DataStax TechDay - Munich 2014DataStax TechDay - Munich 2014
DataStax TechDay - Munich 2014
Christian Johannsen
Cassandra - A decentralized storage system
Cassandra - A decentralized storage systemCassandra - A decentralized storage system
Cassandra - A decentralized storage system
Arunit Gupta
Aruman Cassandra database
Aruman Cassandra databaseAruman Cassandra database
Aruman Cassandra database
Umesh Dande
Big Data Storage Concepts from the "Big Data concepts Technology and Architec...
Big Data Storage Concepts from the "Big Data concepts Technology and Architec...Big Data Storage Concepts from the "Big Data concepts Technology and Architec...
Big Data Storage Concepts from the "Big Data concepts Technology and Architec...
raghdooosh
Cassandra presentation
Cassandra presentationCassandra presentation
Cassandra presentation
Sergey Enin
Highly available, scalable and secure data with Cassandra and DataStax Enterp...
Highly available, scalable and secure data with Cassandra and DataStax Enterp...Highly available, scalable and secure data with Cassandra and DataStax Enterp...
Highly available, scalable and secure data with Cassandra and DataStax Enterp...
Johnny Miller
5266732.ppt
5266732.ppt5266732.ppt
5266732.ppt
hothyfa
Cassandra - A Distributed Database System
Cassandra - A Distributed Database System Cassandra - A Distributed Database System
Cassandra - A Distributed Database System
Md. Shohel Rana
Presentation of Apache Cassandra
Presentation of Apache Cassandra Presentation of Apache Cassandra
Presentation of Apache Cassandra
Nikiforos Botis
Data Lake and the rise of the microservices
Data Lake and the rise of the microservicesData Lake and the rise of the microservices
Data Lake and the rise of the microservices
Bigstep
System design fundamentals CAP.pdf
System design fundamentals CAP.pdfSystem design fundamentals CAP.pdf
System design fundamentals CAP.pdf
UsmanAhmed269749
Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014
Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014
Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014
Johnny Miller
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
Mohit Tare
Column db dol
Column db dolColumn db dol
Column db dol
poojabi
CASSANDRA - Next to RDBMS
CASSANDRA - Next to RDBMSCASSANDRA - Next to RDBMS
CASSANDRA - Next to RDBMS
Vipul Thakur
Cassandra
CassandraCassandra
Cassandra
exsuns
DataStax TechDay - Munich 2014
DataStax TechDay - Munich 2014DataStax TechDay - Munich 2014
DataStax TechDay - Munich 2014
Christian Johannsen
Cassandra - A decentralized storage system
Cassandra - A decentralized storage systemCassandra - A decentralized storage system
Cassandra - A decentralized storage system
Arunit Gupta
Aruman Cassandra database
Aruman Cassandra databaseAruman Cassandra database
Aruman Cassandra database
Umesh Dande

Recently uploaded (20)

AI-Powered Chatbots for Employee Support
AI-Powered Chatbots for Employee SupportAI-Powered Chatbots for Employee Support
AI-Powered Chatbots for Employee Support
AutomationEdge Technologies
Enscape Latest 2025 Crack Free Download
Enscape Latest 2025  Crack Free DownloadEnscape Latest 2025  Crack Free Download
Enscape Latest 2025 Crack Free Download
rnzu5cxw0y
Wondershare Filmora Crack Free Download
Wondershare Filmora  Crack Free DownloadWondershare Filmora  Crack Free Download
Wondershare Filmora Crack Free Download
zqeevcqb3t
Computer Architecture Patterson chapter 1 .ppt
Computer Architecture Patterson chapter 1 .pptComputer Architecture Patterson chapter 1 .ppt
Computer Architecture Patterson chapter 1 .ppt
jaysen110
Elastic Search Engineer Certification - Virtual
Elastic Search Engineer Certification - VirtualElastic Search Engineer Certification - Virtual
Elastic Search Engineer Certification - Virtual
Gon巽alo Pereira
Carousel - Five Key FinTech Trends for 2025
Carousel - Five Key FinTech Trends for 2025Carousel - Five Key FinTech Trends for 2025
Carousel - Five Key FinTech Trends for 2025
Anadea
AVG Antivirus Crack With Free version Download 2025 [Latest]
AVG Antivirus Crack With Free version Download 2025 [Latest]AVG Antivirus Crack With Free version Download 2025 [Latest]
AVG Antivirus Crack With Free version Download 2025 [Latest]
haroonsaeed605
SketchUp Pro Crack [2025]-Free Download?
SketchUp Pro Crack [2025]-Free Download?SketchUp Pro Crack [2025]-Free Download?
SketchUp Pro Crack [2025]-Free Download?
kiran10101khan
Instagram Feed Snippet, Instagram posts display in odoo website
Instagram Feed Snippet, Instagram posts display in odoo websiteInstagram Feed Snippet, Instagram posts display in odoo website
Instagram Feed Snippet, Instagram posts display in odoo website
AxisTechnolabs
Minitool Partition Wizard Crack Free Download
Minitool Partition Wizard Crack Free DownloadMinitool Partition Wizard Crack Free Download
Minitool Partition Wizard Crack Free Download
v3r2eptd2q
Tenorshare 4uKey Crack Fre e Download
Tenorshare  4uKey  Crack  Fre e DownloadTenorshare  4uKey  Crack  Fre e Download
Tenorshare 4uKey Crack Fre e Download
oyv9tzurtx
SolidWorks 2025 Crack free Download updated
SolidWorks 2025 Crack  free Download updatedSolidWorks 2025 Crack  free Download updated
SolidWorks 2025 Crack free Download updated
sanasabaa73
AI/ML Infra Meetup | Building Production Platform for Large-Scale Recommendat...
AI/ML Infra Meetup | Building Production Platform for Large-Scale Recommendat...AI/ML Infra Meetup | Building Production Platform for Large-Scale Recommendat...
AI/ML Infra Meetup | Building Production Platform for Large-Scale Recommendat...
Alluxio, Inc.
Wondershare Filmora 14.3.2 Crack + License Key Free Download
Wondershare Filmora 14.3.2 Crack + License Key Free DownloadWondershare Filmora 14.3.2 Crack + License Key Free Download
Wondershare Filmora 14.3.2 Crack + License Key Free Download
arshadkhokher01
salesforce development services - Alt digital
salesforce development services - Alt digitalsalesforce development services - Alt digital
salesforce development services - Alt digital
Alt Digital Technologies
Why Hire Python Developers? Key Benefits for Your Business
Why Hire Python Developers? Key Benefits for Your BusinessWhy Hire Python Developers? Key Benefits for Your Business
Why Hire Python Developers? Key Benefits for Your Business
Mypcot Infotech
DevOpsDays LA - Platform Engineers are Product Managers.pdf
DevOpsDays LA - Platform Engineers are Product Managers.pdfDevOpsDays LA - Platform Engineers are Product Managers.pdf
DevOpsDays LA - Platform Engineers are Product Managers.pdf
Justin Reock
Douwan Preactivated Plus Crack 2025-Latest
Douwan Preactivated Plus Crack 2025-LatestDouwan Preactivated Plus Crack 2025-Latest
Douwan Preactivated Plus Crack 2025-Latest
mubeen010khan
Account Cash Flow Statement Report Generate in odoo
Account Cash Flow Statement Report Generate in odooAccount Cash Flow Statement Report Generate in odoo
Account Cash Flow Statement Report Generate in odoo
AxisTechnolabs
SE- Lecture 5 SE for easy understanding.ppt
SE- Lecture 5 SE for easy understanding.pptSE- Lecture 5 SE for easy understanding.ppt
SE- Lecture 5 SE for easy understanding.ppt
theworldimagine985
Enscape Latest 2025 Crack Free Download
Enscape Latest 2025  Crack Free DownloadEnscape Latest 2025  Crack Free Download
Enscape Latest 2025 Crack Free Download
rnzu5cxw0y
Wondershare Filmora Crack Free Download
Wondershare Filmora  Crack Free DownloadWondershare Filmora  Crack Free Download
Wondershare Filmora Crack Free Download
zqeevcqb3t
Computer Architecture Patterson chapter 1 .ppt
Computer Architecture Patterson chapter 1 .pptComputer Architecture Patterson chapter 1 .ppt
Computer Architecture Patterson chapter 1 .ppt
jaysen110
Elastic Search Engineer Certification - Virtual
Elastic Search Engineer Certification - VirtualElastic Search Engineer Certification - Virtual
Elastic Search Engineer Certification - Virtual
Gon巽alo Pereira

cybersecurity notes for mca students for learning

  • 3. History of Cassandra Apache Cassandra was born at Facebook for inbox search. Facebook open-sourced the code in 2008. Cassandra became an Apache Incubator project in 2009 and subsequently became a top-level Apache project in 2010. When these notes were written, the latest version of Apache Cassandra was 3.1.1. It is a column-oriented database designed around peer-to-peer symmetric nodes instead of a master-slave architecture. It is built on Amazon's Dynamo and Google's Bigtable: cassandra ~= bigtable + dynamo
  • 5. What is Cassandra? Apache Cassandra is a highly scalable, high-performance distributed database designed to handle large amounts of structured data across many commodity servers with replication, providing high availability and no single point of failure.
  • 6. Ring architecture diagram: the circles are Cassandra nodes and the lines between them show the distributed architecture; the client sends data to one of the nodes.
  • 7. Notable points It is scalable, fault-tolerant, and consistent. It is a column-oriented database. Its distribution design is based on Amazon's Dynamo and its data model on Google's Bigtable. Cassandra implements a Dynamo-style replication model with no single point of failure, but adds a more powerful column family data model. Cassandra is used by some of the biggest companies, such as Facebook, Twitter, Cisco, Rackspace, eBay, Adobe, Netflix, and more.
  • 8. Features of Cassandra Elastic scalability - Cassandra is highly scalable; it allows you to add more hardware to accommodate more customers and more data as required. Massively Scalable Architecture: Cassandra has a masterless design where all nodes are at the same level, which provides operational simplicity and easy scale-out. Always-on architecture (peer-to-peer network): Cassandra replicates data on different nodes, which ensures there is no single point of failure and keeps it continuously available for business-critical applications. Linear Scale Performance: As more nodes are added, the throughput of Cassandra increases, so it maintains a quick response time.
  • 9. Features of Cassandra Flexible data storage - Cassandra accommodates structured, semi-structured, and unstructured data. It can dynamically accommodate changes to data structures as needed. Easy data distribution - Cassandra provides the flexibility to distribute data where you need it by replicating data across multiple data centers. Transaction support - Cassandra does not provide full relational-style ACID transactions with rollback or locking; it offers atomic, isolated, and durable writes at the row (partition) level, together with tunable consistency. Fast writes - Cassandra was designed to run on cheap commodity hardware. It performs very fast writes and can store hundreds of terabytes of data without sacrificing read efficiency.
  • 10. Features of Cassandra Fault Detection and Recovery: Failed nodes can easily be restored and recovered. Flexible and Dynamic Data Model: Supports a range of data types with fast writes and reads. Data Protection: Data is protected by the commit log design and by built-in backup and restore mechanisms. Tunable Data Consistency: Support for strong data consistency across a distributed architecture. Multi Data Center Replication: Cassandra can replicate data across multiple data centers.
  • 11. Features of Cassandra Data Compression: Cassandra can compress data by up to 80% with little overhead. Cassandra Query Language (CQL): Cassandra provides a query language that is similar to SQL, which makes the move from a relational database to Cassandra much easier for relational database developers.
  • 12. Cassandra Use Cases/Application Messaging: Cassandra is a good fit for companies that provide mobile phone and messaging services, because they handle huge amounts of data. Internet of Things applications: Cassandra suits applications where data arrives at very high speed from different devices or sensors. Product catalogs and retail apps: Cassandra is used by many retailers for durable shopping cart protection and fast product catalog input and output.
  • 13. Cassandra Use Cases/Application Social Media Analytics and recommendation engine: Cassandra is a great database for many online companies and social media providers for analysis and recommendation to their customers.
  • 14. Cassandra Architecture The design goal of Cassandra is to handle big data workloads across multiple nodes without any single point of failure. Cassandra is a peer-to-peer distributed system, and data is distributed among all the nodes in a cluster.
  • 17. Components of Cassandra Node: the basic unit of Cassandra. Data is stored on these units (a computer or server). Data center: a collection of related nodes. Rack: a unit that holds multiple servers stacked on top of one another. A node is a single server in a rack. Cluster: a component that contains one or more data centers.
  • 18. Components of Cassandra Commit log The commit log is a crash-recovery mechanism in Cassandra. Every write operation is written to the commit log. Mem-table A mem-table is a memory-resident data structure. After commit log, the data will be written to the mem-table. SSTable It is a disk file to which the data is flushed from the mem-table when its contents reach a threshold value.
  • 19. A rack is a group of machines housed in the same physical box. Each machine in the rack has its own CPU, memory, and hard disk. However, the rack has no CPU, memory, or hard disk of its own. All machines in the rack are connected to the rack's network switch, and the rack's network switch is connected to the cluster. All machines on the rack share a common power supply. A rack can therefore fail for two reasons: a network switch failure or a power supply failure. If a rack fails, none of the machines on the rack can be accessed, so it appears as though all the nodes on the rack are down.
  • 21. Cassandra Architecture All the nodes in a cluster play the same role. Each node is independent and at the same time interconnected to other nodes. Each node in a cluster can accept read and write requests, regardless of where the data is actually located in the cluster. When a node goes down, read/write requests can be served from other nodes in the network.
  • 22. Data Replication in Cassandra In Cassandra, one or more of the nodes in a cluster act as replicas for a given piece of data. If some of the replicas respond to a read with an out-of-date value, Cassandra returns the most recent value to the client. After returning the most recent value, Cassandra performs a read repair in the background to update the stale (old) values. The replication factor (RF) lies between 1 and n, the number of nodes.
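The read repair behaviour described in slide 22 can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the `Cell` class, the `read_with_repair` function, the node names, and the sample values are all hypothetical, and real Cassandra performs the repair asynchronously across the network.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    value: str
    timestamp: int  # write time; a larger number means a newer write


def read_with_repair(replicas):
    """Return the newest value and overwrite stale replicas with it."""
    newest = max(replicas.values(), key=lambda cell: cell.timestamp)
    for node, cell in replicas.items():
        if cell.timestamp < newest.timestamp:
            # Read repair: bring the stale replica up to date.
            replicas[node] = Cell(newest.value, newest.timestamp)
    return newest.value


# Hypothetical replicas: node_b holds a newer write than the other two.
replicas = {
    "node_a": Cell("alice@old.example", 100),
    "node_b": Cell("alice@new.example", 200),
    "node_c": Cell("alice@old.example", 100),
}
result = read_with_repair(replicas)
print(result)
```

After the call, all three replicas hold the value with the newest timestamp.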
  • 23. Gossip protocol Cassandra uses the Gossip Protocol in the background to allow the nodes to communicate with each other and detect any faulty nodes in the cluster. A gossip protocol is a style of computer-to-computer communication protocol inspired by the form of gossip seen in social networks. The term epidemic protocol is sometimes used as a synonym for a gossip protocol, because gossip spreads information in a manner similar to the spread of a virus in a biological community.
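The epidemic spread described in slide 23 can be shown with a toy simulation. It is a sketch, not Cassandra's gossip implementation: the function name `gossip_rounds`, the one-peer-per-round rule, and the seed are assumptions chosen for illustration.

```python
import random


def gossip_rounds(num_nodes, seed=0, max_rounds=1000):
    """Count the rounds needed for one piece of news to reach every node.

    In each round, every node that already knows the news tells one
    randomly chosen peer (a simplifying assumption for this sketch).
    """
    rng = random.Random(seed)
    informed = {0}  # node 0 starts with the news
    rounds = 0
    while len(informed) < num_nodes and rounds < max_rounds:
        rounds += 1
        for _node in list(informed):  # snapshot: new nodes talk next round
            peer = rng.randrange(num_nodes)
            informed.add(peer)
    return rounds, informed


rounds, informed = gossip_rounds(8)
print(f"all {len(informed)} nodes informed after {rounds} rounds")
```

Because the informed set can at most double in each round, eight nodes need at least three rounds; random peer choice usually costs a few more.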
  • 24. Partitioner A partitioner is a hash function used for distributing data across the nodes in a cluster. It also determines the node on which to place the very first copy of the data.
  • 25. Replication Factor The total number of replicas across the cluster is referred to as the replication factor. The RF determines the number of copies of data (replicas) that will be stored across nodes in a cluster. A replication strategy determines the nodes where replicas are placed. There are two strategies: Simple Strategy and Network Topology Strategy.
  • 26. Simple Strategy Use only for a single data center and one rack. Simple Strategy places the first replica on a node determined by the partitioner. Additional replicas are placed on the next nodes clockwise in the ring. Simple Strategy is a rack-unaware and data-center-unaware policy, i.e. it places replicas without considering topology (rack or data center location).
  • 28. Network Topology Strategy Network Topology Strategy is used when you have more than one data center. As the name indicates, this strategy is aware of the network topology (location of nodes in racks, data centers, etc.) and is much more intelligent than Simple Strategy. This strategy specifies how many replicas you want in each data center; replicas are set for each data center separately. Within each data center, replicas are placed on different racks by walking the ring clockwise. This process continues until it reaches the first node.
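The partitioner and Simple Strategy slides can be tied together with a short sketch of ring placement. Everything here is an illustrative assumption: the `token`, `build_ring`, and `replicas_for` helpers are hypothetical, MD5 stands in for whatever hash a real partitioner uses, and node tokens are derived from node names only to keep the example short.

```python
import hashlib
from bisect import bisect_left


def token(value):
    """Map a string to an integer token (MD5 is used only for illustration)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def build_ring(nodes):
    """Return (token, node) pairs sorted by token, i.e. in ring order."""
    return sorted((token(node), node) for node in nodes)


def replicas_for(key, ring, replication_factor):
    """First replica: the node owning the key's token; the rest: next nodes clockwise."""
    tokens = [t for t, _ in ring]
    start = bisect_left(tokens, token(key)) % len(ring)  # wrap past the last token
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


ring = build_ring(["node1", "node2", "node3", "node4"])
owners = replicas_for("rollno:1", ring, replication_factor=3)
print(owners)
```

With a replication factor of 3 on a four-node ring, the key lands on three distinct, consecutive nodes, and no rack or data center information is consulted.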
  • 30. Anti-Entropy Anti-entropy is a process of comparing the data of all replicas and updating each replica to the newest version. Frequent data deletions and node failures are common causes of data inconsistency. Anti-entropy node repairs are important for every Cassandra cluster. Anti-entropy repair is used for routine maintenance and when a cluster needs fixing.
  • 32. Write path in Cassandra Cassandra processes data at several stages on the write path, starting with the immediate logging of a write and ending in compaction: (1) logging data in the commit log, (2) writing data to the memtable, (3) flushing data from the memtable, (4) storing data on disk in SSTables, (5) compaction.
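The stages of the write path in slide 32 can be sketched with plain Python containers. This is a simplified model under stated assumptions: the `TinyStore` class, its `memtable_limit` parameter, and the sample rows are hypothetical, and lists and dictionaries stand in for the on-disk commit log and SSTables.

```python
class TinyStore:
    """Toy model of the write path: commit log -> memtable -> SSTable -> compaction."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []      # append-only record used for crash recovery
        self.memtable = {}        # in-memory writes not yet flushed
        self.sstables = []        # immutable flushed tables, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. log the write first
        self.memtable[key] = value            # 2. then update the memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                      # 3. threshold reached

    def flush(self):
        # 4. Store the memtable contents as a new sorted, immutable table.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed writes no longer need the log

    def compact(self):
        # 5. Merge all tables; values from newer tables overwrite older ones.
        merged = {}
        for table in self.sstables:
            merged.update(table)
        self.sstables = [dict(sorted(merged.items()))]

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest table first
            if key in table:
                return table[key]
        return None


store = TinyStore(memtable_limit=2)
store.write(1, "Aviral")
store.write(2, "Sharad")   # triggers the first flush
store.write(2, "David")
store.write(3, "Meera")    # triggers the second flush
store.write(4, "Ravi")     # stays in the memtable
store.compact()
print(store.read(2), len(store.sstables))
```

After compaction the two flushed tables collapse into one, and the newer value for key 2 wins.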
  • 38. Hint table A hint stores: the location of the node on which the replica is to be placed, version metadata, and the actual data. When node C recovers and is functional again, node A reacts to the hint by forwarding the data to node C.
  • 39. Tunable Consistency (T C) Consistency refers to how up-to-date and synchronized a row of Cassandra data is on all of its replicas. Tunable consistency lets you choose between strong consistency and eventual consistency. Strong Consistency: Each update propagates to all locations, which ensures that every server has a copy of the data before it is served to the client. This has an impact on performance.
  • 40. Eventual Consistency It implies that the client is acknowledged with a success as soon as a part of the cluster acknowledges the write. It is used when application performance matters.
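The hint mechanism in slide 38 can be sketched as follows. The `Node` class and the `coordinate_write` and `replay_hints` functions are hypothetical names for this illustration; a real hint also carries version metadata, which is reduced here to a timestamp.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}   # key -> (value, timestamp)
        self.hints = []  # hints this node holds for replicas that were down


def coordinate_write(coordinator, replicas, key, value, timestamp):
    """Write to every live replica; store a hint for each replica that is down."""
    for replica in replicas:
        if replica.up:
            replica.data[key] = (value, timestamp)
        else:
            # The hint records the target node, the version, and the data itself.
            coordinator.hints.append((replica, key, value, timestamp))


def replay_hints(coordinator):
    """Forward stored hints to targets that have come back up."""
    remaining = []
    for target, key, value, timestamp in coordinator.hints:
        if target.up:
            target.data[key] = (value, timestamp)
        else:
            remaining.append((target, key, value, timestamp))
    coordinator.hints = remaining


node_a, node_b, node_c = Node("A"), Node("B"), Node("C")
node_c.up = False                          # node C is down during the write
coordinate_write(node_a, [node_a, node_b, node_c], "rollno:1", "Aviral", 1)
hints_while_down = len(node_a.hints)
node_c.up = True                           # node C recovers
replay_hints(node_a)
print(node_c.data)
```

Node A keeps one hint while node C is down and forwards it once C recovers, leaving no hints behind.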
  • 41. Read consistency It means how many replicas must respond before sending out the result to the client applications. Consistency levels : next slide
  • 42. Read consistency levels ONE: returns a response from the closest node (replica) holding the data. QUORUM: returns a result from a quorum of servers with the most recent timestamp for the data. LOCAL_QUORUM: returns a result from a quorum of servers with the most recent timestamp for the data in the same data center as the coordinator node. EACH_QUORUM: returns a result from a quorum of servers with the most recent timestamp in all data centers. ALL: provides the highest level of consistency of all levels; it responds to a read request from a client only after all the replica nodes have responded.
  • 43. Write consistency It means on how many replicas a write must succeed before an ACK is sent to the client application. Write consistency levels: next slide
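The arithmetic behind the read and write consistency levels can be worked through in a few lines. A quorum is a majority of the replicas, and a read is guaranteed to overlap the latest write when the number of replicas read plus the number of replicas written exceeds the replication factor. The function names below are hypothetical.

```python
def quorum(replication_factor):
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def is_strongly_consistent(read_replicas, write_replicas, replication_factor):
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


rf = 3
q = quorum(rf)
print(q, is_strongly_consistent(q, q, rf), is_strongly_consistent(1, 1, rf))
```

With RF = 3, QUORUM reads and QUORUM writes (2 + 2 > 3) always overlap on at least one replica, whereas ONE for both (1 + 1) does not, which gives eventual rather than strong consistency.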
  • 47. CQLSH Cassandra provides Cassandra query language shell (cqlsh) that allows users to communicate with Cassandra. Using cqlsh, you can define a schema, insert data, and execute a query.
  • 48. KEYSPACES (Database [Namespace]) A keyspace is a container that holds application data, like a database in an RDBMS. It is used to group column families together. Typically, a cluster has one keyspace per application. A keyspace (or key space) in a NoSQL data store is an object that holds together all column families of a design. It is the outermost grouping of the data in the data store.
  • 51. To create keyspace CREATE KEYSPACE <keyspace_name> WITH replication = {'class': '<strategy_name>', 'replication_factor': <number_of_replicas>};
  • 52. Details about existing Keyspaces DESCRIBE KEYSPACES; SELECT * FROM system.schema_keyspaces; The second statement gives more details (in Cassandra 3.0 and later, this table is system_schema.keyspaces).
  • 53. To use existing keyspace USE <keyspace_name>; For example: USE students;
  • 54. To create a column family or table by the name student_info. CREATE TABLE Student_Info ( RollNo int PRIMARY KEY, StudName text, DateofJoining timestamp, LastExamPercent double);
  • 56. CRUD
  • 57. SELECT To view the data from the table student_info. SELECT * FROM student_info; Select * from student_info where rollno in (1,2,3);
  • 58. Index To create an index on the studname column of the student_info column family, use the following statement: CREATE INDEX ON student_info(studname); SELECT * FROM student_info WHERE StudName='Aviral';
  • 59. Update To update the value held in the StudName column of the student_info column family to 'Sharad' for the record where the RollNo column has the value 3. Note: An update changes one or more column values for a given row of a Cassandra table. It does not return anything. UPDATE student_info SET StudName = 'Sharad' WHERE RollNo = 3;
  • 60. Delete To delete the column LastExamPercent from the student_info table for the record where RollNo = 2. Note: A delete statement removes one or more columns from one or more rows of a Cassandra table, or removes entire rows if no columns are specified. DELETE LastExamPercent FROM student_info WHERE RollNo=2;
  • 61. Collections Cassandra provides collection types, used to group and store data together in a column, e.g. a user's multiple email addresses. Each item in a collection is limited to 64 KB. Collections can be used when you need to store, for example, the phone numbers or email IDs of users.
  • 62. Collections Set To alter the schema for the table student_info to add a column hobbies. ALTER TABLE student_info ADD hobbies set<text>; UPDATE student_info SET hobbies = hobbies + {'Chess', 'Table Tennis'} WHERE RollNo=4;
  • 63. Collections List To alter the schema of the table student_info to add a list column language. ALTER TABLE student_info ADD language list<text>; UPDATE student_info SET language = language + ['Hindi', 'English'] WHERE RollNo=1;
  • 64. Collections Map A map relates one item to another with a key-value pair. Using the map type, you can store timestamp-related information in user profiles. To alter the Student_info table to add a map column todo. ALTER TABLE Student_info ADD todo map<timestamp, text>;
  • 65. Example UPDATE student_info SET todo = { '2014-09-24': 'Cassandra Session', '2014-10-02 12:00' : 'MongoDB Session' } WHERE rollno = 1;
  • 66. Time To Live(TTL) Data in a column, other than a counter column, can have an optional expiration period called TTL (time to live). The client request may specify a TTL value for the data. The TTL is specified in seconds.
  • 67. Time To Live(TTL) CREATE TABLE userlogin(userid int primary key, password text); INSERT INTO userlogin (userid, password) VALUES (1,'infy') USING TTL 30; select * from userlogin;
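The effect of `USING TTL 30` in slide 67 can be modelled without a cluster. This sketch is an assumption-laden illustration: the `ExpiringTable` class is hypothetical, time is passed in explicitly as `now` (in seconds) so the behaviour is deterministic, and real Cassandra tracks TTL per column rather than per row.

```python
class ExpiringTable:
    """Toy table where a row may carry an expiry time derived from a TTL."""

    def __init__(self):
        self.rows = {}  # key -> (value, expires_at or None)

    def insert(self, key, value, now, ttl=None):
        expires_at = None if ttl is None else now + ttl  # TTL is in seconds
        self.rows[key] = (value, expires_at)

    def select(self, key, now):
        entry = self.rows.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and now >= expires_at:
            return None  # the data has expired and is no longer visible
        return value


userlogin = ExpiringTable()
userlogin.insert(1, "infy", now=0, ttl=30)   # mirrors USING TTL 30
userlogin.insert(2, "permanent", now=0)      # no TTL: never expires
before = userlogin.select(1, now=10)
after = userlogin.select(1, now=30)
print(before, after)
```

Ten seconds after the insert the row is still visible; once 30 seconds have passed, the same query returns nothing, while the row written without a TTL remains.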
  • 68. Export to CSV COPY student_info (RollNo, StudName, DateofJoining, LastExamPercent) TO 'd:\student.csv';
  • 69. Import data from a CSV file CREATE TABLE student_data ( id int PRIMARY KEY, fn text, ln text, phone text, city text); COPY student_data (id, fn, ln, phone, city) FROM 'd:\cassandraData\student.csv';
  • 70. Introduction to MapReduce Programming (Revisit for details) In MapReduce programming, jobs (applications) are split into a set of map tasks and reduce tasks. These tasks are then executed in a distributed fashion on a Hadoop cluster. Each task processes the small subset of data that has been assigned to it. This way, Hadoop distributes the load across the cluster. A MapReduce job takes a set of files stored in HDFS (Hadoop Distributed File System) as input.
  • 71. Mapper The Map task takes care of loading, parsing, transforming, and filtering. A mapper maps the input key-value pairs into a set of intermediate key-value pairs. Maps are individual tasks that have the responsibility of transforming input records into intermediate key-value pairs. Each map task is broken into the following phases: RecordReader, Mapper/Maps, Combiner, Partitioner.
  • 72. RecordReader RecordReader reads the records from an InputSplit and converts them into key-value pairs for input to the Mapper class.
  • 74. Maps Map is a user-defined function, which takes a series of key-value pairs and processes each one of them to generate zero or more key-value pairs. Map takes a set of data and converts it into another set of data. Input and output are key-value pairs.
  • 75. Combiner A combiner is a type of local Reducer that groups similar data from the map phase into a new set of key-value pairs. It is not a part of the main MapReduce algorithm; it is optional (it may run as part of the mapper/map). The main function of a Combiner is to summarize the map output records with the same key.
  • 76. Difference between Combiner and Reducer Output generated by combiner is intermediate data and is passed to the reducer. Output of the reducer is passed to the output file on the disk.
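The map and combiner phases can be illustrated with a word-count sketch in plain Python. It does not use the Hadoop API; the `map_words` and `combine` functions and the sample line are hypothetical.

```python
from collections import Counter


def map_words(line):
    """Map phase: emit a (word, 1) pair for every word in the input record."""
    return [(word.lower(), 1) for word in line.split()]


def combine(pairs):
    """Combiner: sum the values that share a key within one mapper's output."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


pairs = map_words("big data big cluster")
combined = combine(pairs)
print(pairs)
print(combined)
```

The mapper emits four pairs, and the combiner shrinks them to three before anything is sent to a reducer, which is the reason for running it locally.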
  • 78. Partitioner A partitioner partitions the key-value pairs of the intermediate map outputs. The Partitioner in MapReduce controls the partitioning of the keys of the intermediate mapper output. The partition phase takes place after the Map phase and before the Reduce phase. The number of partitions is equal to the number of reducers; that is, the partitioner divides the data according to the number of reducers. Therefore, the data in a single partition is processed by a single Reducer.
  • 79. Partitioner A partitioner is created only when there are multiple reducers.
  • 80. Shuffling and Sorting in Hadoop MapReduce The process by which the intermediate output from mappers is transferred to the reducers is called shuffling. The intermediate key-value pairs generated by the mapper are sorted automatically by key.
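The idea that each key is routed to exactly one reducer can be sketched with hash partitioning, i.e. a hash of the key taken modulo the number of reducers. The functions `partition_for` and `partition_pairs` are hypothetical, and CRC32 is used only because Python's built-in string hash changes between runs.

```python
import zlib


def partition_for(key, num_reducers):
    """Pick a reducer index from a stable hash of the key."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def partition_pairs(pairs, num_reducers):
    """Split intermediate (key, value) pairs into one bucket per reducer."""
    partitions = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        partitions[partition_for(key, num_reducers)].append((key, value))
    return partitions


pairs = [("big", 2), ("data", 1), ("big", 1), ("cluster", 1)]
partitions = partition_pairs(pairs, num_reducers=3)
for index, bucket in enumerate(partitions):
    print(index, bucket)
```

Both `("big", ...)` pairs end up in the same bucket, so a single reducer sees every value for that key. With one reducer, every key maps to partition 0, which is why no real partitioning is needed in that case.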
  • 82. Reduce The primary task of the Reducer is to reduce a set of intermediate values (the ones that share a common key) to a smaller set of values. The Reducer takes the grouped key-value data as input and runs a Reducer function on each group. Here, the data can be aggregated, filtered, and combined in a number of ways. The output of the reducer is the final output, which is stored in HDFS.
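Shuffling, sorting, and reducing can be sketched together for the word-count case. This is plain Python rather than Hadoop code; `shuffle_and_sort`, `reduce_counts`, and the two sample mapper outputs are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def shuffle_and_sort(mapper_outputs):
    """Merge the outputs of all mappers, sort by key, and group values per key."""
    merged = [pair for output in mapper_outputs for pair in output]
    merged.sort(key=itemgetter(0))
    return [
        (key, [value for _, value in group])
        for key, group in groupby(merged, key=itemgetter(0))
    ]


def reduce_counts(grouped):
    """Reduce phase: collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in grouped}


# Hypothetical outputs from two mappers (already summarized by a combiner).
mapper_outputs = [
    [("big", 2), ("data", 1)],
    [("big", 1), ("cluster", 1)],
]
grouped = shuffle_and_sort(mapper_outputs)
counts = reduce_counts(grouped)
print(grouped)
print(counts)
```

The reducer receives each key once, together with all of its values, and emits one final count per word.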
  • 83. RecordWriter (Output format) RecordWriter writes output key-value pairs from the Reducer phase to output files. OutputFormat instances provided by the Hadoop are used to write files in HDFS. Thus the final output of reducer is written on HDFS by OutputFormat instances using RecordWriter.