Hadoop
Development
Series
By Sandeep Patil
4/11/2017 1Footer Text
Introduction to Big Data
and Hadoop
What is Big Data?
 Large amounts of data.
 A popular term for the exponential growth of data.
 Big data is difficult to collect, store, maintain, analyze, and visualize.
Big Data characteristics
 Volume:
Large amount of data.
 Velocity:
The rate at which data is generated.
 Variety:
Different types of data
- Structured data, e.g. MySQL tables
- Semi-structured data, e.g. XML, JSON
- Unstructured data, e.g. text, audio, video
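To make the three varieties concrete, here is a minimal Python sketch that holds the same customer in each form. The field names, values, and sample text are hypothetical and chosen only for illustration.

```python
import json
import xml.etree.ElementTree as ET

# Structured: fixed columns in a fixed order, like one row of a MySQL table.
# (Hypothetical schema: id, name, order_count.)
structured_row = ("C001", "Asha", 42)

# Semi-structured: self-describing records whose fields may vary per record.
json_record = json.loads('{"id": "C001", "name": "Asha", "orders": 42}')
xml_record = ET.fromstring("<customer id='C001'><name>Asha</name></customer>")

# Unstructured: free text with no schema; any meaning has to be extracted.
review_text = "Asha wrote: delivery was quick, will order again."

# The same fact (the customer's name) is reached differently in each form.
name_from_row = structured_row[1]            # by column position
name_from_json = json_record["name"]         # by key
name_from_xml = xml_record.find("name").text # by element path
name_in_text = "Asha" in review_text         # only by searching the text
```

The further down the list, the more work is needed before the data can be queried, which is why variety is treated as a characteristic in its own right.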
Big Data sources
 Social Media
 Banks
 Instruments
 Websites
 Stock Market
Use cases of Big Data
 Recommendation engines
 Analyzing Call Detail Records (CDRs)
 Fraud Detection
 Market Basket Analysis
 Sentiment Analysis
Hadoop Introduction
 Open-source framework that allows distributed processing of large datasets on clusters of commodity hardware.
 Hadoop is a data management tool that uses scale-out storage.
Defining Hadoop Cluster
 The size of the data is the most important factor when defining a Hadoop cluster.
5 servers with 10 TB storage capacity each
Total storage capacity: 50 TB
Defining Hadoop Cluster
7 servers with 10 TB storage capacity each
Total storage capacity: 70 TB
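The arithmetic behind the two cluster examples can be sketched in a few lines of Python. The raw-capacity function reproduces the figures on the slides. The replication-aware functions are an addition not found in the slides: they assume each HDFS block is stored three times (the default value of `dfs.replication`) and ignore operating-system and temporary-space overhead, so treat their results as rough estimates.

```python
import math

def raw_capacity_tb(servers, tb_per_server):
    """Raw capacity of a scale-out cluster: it grows linearly with servers."""
    return servers * tb_per_server

def usable_capacity_tb(servers, tb_per_server, replication=3):
    """Rough space left for unique data when every block is stored
    `replication` times (assumption: HDFS default of 3)."""
    return raw_capacity_tb(servers, tb_per_server) / replication

def servers_needed(data_tb, tb_per_server, replication=3):
    """Smallest server count whose raw capacity holds all replicas."""
    return math.ceil(data_tb * replication / tb_per_server)

five_node_raw = raw_capacity_tb(5, 10)    # 50, as on the first slide
seven_node_raw = raw_capacity_tb(7, 10)   # 70, as on the second slide
nodes_for_50tb = servers_needed(50, 10)   # 15 under the replication assumption
```

Adding two servers raises raw capacity from 50 TB to 70 TB without touching the existing nodes, which is the point of scale-out storage.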
Hadoop Components
 Hadoop 1 components
- HDFS (Hadoop Distributed File System)
- MapReduce
 Hadoop 2 components
- HDFS (Hadoop Distributed File System)
- YARN/MRv2
HDFS: storage (reads/writes)
MR/YARN: processing
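To give a feel for what the processing layer does, here is a minimal in-memory sketch of the MapReduce model using word count as the job. It is plain Python, not the Hadoop API: there is no cluster, no HDFS, and the function names are hypothetical. It only shows the map, shuffle, and reduce steps in order.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    """Run map over every line, shuffle the pairs, then reduce each group."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["big data hadoop", "hadoop stores big data", "hadoop"])
```

On a real cluster the map calls run in parallel on the nodes that hold the input blocks, and the shuffle moves data between nodes; here everything runs in one process.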
Hadoop Daemons
 Hadoop 1 daemons
NameNode
DataNode
Secondary NameNode
JobTracker
TaskTracker
HDFS: NameNode, DataNode
MapReduce: JobTracker, TaskTracker
Hadoop Daemons
 Hadoop 2 daemons
NameNode
DataNode
Secondary NameNode
ResourceManager
NodeManager
HDFS: NameNode, DataNode
MR/YARN: ResourceManager, NodeManager
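The two daemon lists can be compared mechanically. The sketch below records the Hadoop 1 and Hadoop 2 daemons as plain Python data and uses set operations to show what stayed and what changed. The grouping of the Secondary NameNode under HDFS is an assumption made for the example.

```python
HADOOP1 = {
    "HDFS": ["NameNode", "DataNode", "Secondary NameNode"],
    "MapReduce": ["JobTracker", "TaskTracker"],
}
HADOOP2 = {
    "HDFS": ["NameNode", "DataNode", "Secondary NameNode"],
    "YARN": ["ResourceManager", "NodeManager"],
}

def all_daemons(version):
    """Flatten a {layer: [daemons]} mapping into one set of daemon names."""
    return {daemon for daemons in version.values() for daemon in daemons}

unchanged = sorted(all_daemons(HADOOP1) & all_daemons(HADOOP2))
removed = sorted(all_daemons(HADOOP1) - all_daemons(HADOOP2))
added = sorted(all_daemons(HADOOP2) - all_daemons(HADOOP1))
```

The storage daemons are the same in both versions; only the processing daemons were replaced.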
Hadoop Master Slave
Architecture
HDFS: NameNode (master), DataNode (slave)
MR/YARN: ResourceManager (master), NodeManager (slave)
Hadoop Cluster
 Assume that we have a Hadoop cluster with 4 nodes
Master: NameNode, ResourceManager
Slave: DataNode, NodeManager
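One way to picture the 4-node cluster is as a mapping from host to the daemons it runs. The sketch below assumes one master and three slaves, which the slide does not state explicitly, and the hostnames `node1` to `node4` are hypothetical.

```python
MASTER_DAEMONS = ("NameNode", "ResourceManager")
SLAVE_DAEMONS = ("DataNode", "NodeManager")

def plan_cluster(hostnames, masters=1):
    """Assign master daemons to the first `masters` hosts and slave
    daemons to every remaining host."""
    if len(hostnames) <= masters:
        raise ValueError("need at least one slave node")
    return {
        host: MASTER_DAEMONS if index < masters else SLAVE_DAEMONS
        for index, host in enumerate(hostnames)
    }

# Assumption: 1 master + 3 slaves for the 4-node cluster.
layout = plan_cluster(["node1", "node2", "node3", "node4"])
slave_count = sum(1 for daemons in layout.values() if "DataNode" in daemons)
```

Each slave runs both a storage daemon and a processing daemon, so computation can be scheduled on the node that already holds the data.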
Secondary NameNode
 The Secondary NameNode is not a hot backup for the NameNode.
 It only takes periodic checkpoints (hourly by default) of the NameNode metadata.
 Its latest checkpoint can be used to restart a crashed Hadoop cluster.
 The Secondary NameNode is an important daemon in Hadoop 1. In Hadoop 2 it is less important, since a high-availability setup can use a standby NameNode instead.
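The checkpoint idea can be illustrated with a toy model. The NameNode keeps a snapshot of the file-system metadata (the fsimage) plus a log of changes made since that snapshot (the edits), and a checkpoint merges the two. The sketch below models only that merge: the data structures, operation names, and paths are hypothetical and bear no relation to HDFS's real on-disk formats.

```python
def apply_edit(namespace, edit):
    """Apply one logged operation to a mutable set of file paths."""
    operation, path = edit
    if operation == "create":
        namespace.add(path)
    elif operation == "delete":
        namespace.discard(path)
    else:
        raise ValueError(f"unknown operation: {operation}")

def checkpoint(fsimage, edit_log):
    """Merge the edit log into the last snapshot.

    Returns the new snapshot and an empty log, since every logged
    change is now part of the snapshot."""
    merged = set(fsimage)
    for edit in edit_log:
        apply_edit(merged, edit)
    return frozenset(merged), []

# Toy state: one file in the snapshot, two changes logged since then.
fsimage = frozenset({"/data/a.txt"})
edits = [("create", "/data/b.txt"), ("delete", "/data/a.txt")]

fsimage, edits = checkpoint(fsimage, edits)
```

Because the merged snapshot is only as fresh as the last checkpoint, changes made after it would be lost on recovery, which is why this is not a hot backup.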
Modes of Operation
 Standalone
 Pseudo-distributed
 Fully distributed
Next Video
 Comparison between Hadoop 1 and Hadoop 2
sdp117@gmail.com
