IT'S ALL ABOUT DATA & ANALYTICS
Session - 3
Topics covered in this Presentation
• Analytics Landscape
• Big Data
• Hadoop
• Big Data Landscape
Analytics Landscape
Big Data
Hadoop Architecture
• Hadoop Common: A set of common libraries and utilities used by other Hadoop modules.
• HDFS (Hadoop Distributed File System): The default storage layer for Hadoop.
• MapReduce: Executes a wide range of analytic functions by analysing datasets in parallel and then reducing the results. The Map step runs the same computation on different nodes, each over its own portion of the data, and the Reduce step gathers the intermediate results and combines them into the final output (for example, a single value per key).
• YARN: Present from version 2.0 onwards, YARN is the cluster management layer of Hadoop. Prior to 2.0, MapReduce was responsible for cluster management as well as processing. The inclusion of YARN means you can run multiple applications in Hadoop (so you're no longer limited to MapReduce), which all share common cluster management.
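The Map and Reduce roles described above can be sketched in plain Python. This is only an illustration under stated assumptions, not Hadoop's API: threads stand in for cluster nodes, and the names `split_into_partitions`, `map_partition`, `reduce_partials` and `run_job` are hypothetical helpers invented for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_partitions(records, num_partitions):
    """Deal records round-robin into partitions (stand-ins for data blocks on nodes)."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def map_partition(partition):
    """Map step: each 'node' computes a partial sum over its own partition."""
    return sum(partition)


def reduce_partials(left, right):
    """Reduce step: combine two partial results into one."""
    return left + right


def run_job(records, num_partitions=3):
    """Run the map step in parallel, then reduce the partial results to one value."""
    partitions = split_into_partitions(records, num_partitions)
    # Assumption: worker threads play the role of separate cluster nodes.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(map_partition, partitions))
    return reduce(reduce_partials, partials, 0)


if __name__ == "__main__":
    print(run_job(list(range(1, 101))))  # sum of 1..100
```

Because addition is associative, the partial sums can be combined in any order, which is the property that lets the reduce step work on results arriving from independent nodes.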
Hadoop Architecture
• Spark: Used on top of HDFS, Spark promises speeds up to 100 times faster than the two-step MapReduce function in certain applications. It allows data to be loaded in memory and queried repeatedly, making it particularly apt for machine learning algorithms.
• Hive: Originally developed by Facebook, Hive is a data warehouse infrastructure built on top of Hadoop. Hive provides a simple, SQL-like language called HiveQL, whilst maintaining full support for MapReduce. This means SQL programmers with little prior Hadoop experience can use the system more easily, and it integrates better with certain analytics packages such as Tableau. Hive also provides indexes, which can make querying faster (index support was removed in Hive 3.0).
• HBase: A NoSQL columnar database designed to run on top of HDFS. It is modelled after Google's BigTable and written in Java. It was designed to provide BigTable-like capabilities to Hadoop, such as the columnar data storage model and storage for sparse data.
• Flume: Flume collects data (typically logs) from agents, then aggregates it and moves it into Hadoop. In essence, Flume is what takes the data from the source (say, a server or mobile device) and delivers it to Hadoop.
• Mahout: Mahout is a machine learning library. It collects key algorithms for clustering, classification and collaborative filtering and implements them on top of distributed data systems, like MapReduce. Mahout primarily set out to collect algorithms for implementation on the MapReduce model, but has begun implementing on other systems that are more efficient for data mining, such as Spark.
• Sqoop: Sqoop is a tool which aids in transferring data from other database systems (such as relational databases) into Hadoop.
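The "columnar storage for sparse data" idea mentioned for HBase can be sketched with a toy in-memory table. This is not the HBase client API; the class `SparseTable`, its methods, and the `family:qualifier` column names are assumptions made up for illustration. The point is that only cells that actually hold a value take up space, so rows are free to have entirely different columns.

```python
from collections import defaultdict


class SparseTable:
    """Toy wide-column table: row key -> {"family:qualifier": value}.

    Hypothetical sketch only. Missing cells are simply absent, not stored as NULLs.
    """

    def __init__(self):
        self._rows = defaultdict(dict)

    def put(self, row_key, column, value):
        """Store a single cell."""
        self._rows[row_key][column] = value

    def get(self, row_key, column, default=None):
        """Read a single cell, returning `default` if it was never written."""
        return self._rows.get(row_key, {}).get(column, default)

    def scan_column(self, column):
        """Yield (row_key, value) in sorted row-key order for rows that have the column."""
        for row_key in sorted(self._rows):
            if column in self._rows[row_key]:
                yield row_key, self._rows[row_key][column]

    def cell_count(self):
        """Number of cells actually stored across all rows."""
        return sum(len(columns) for columns in self._rows.values())


if __name__ == "__main__":
    table = SparseTable()
    table.put("user1", "info:name", "Asha")
    table.put("user1", "info:email", "asha@example.com")
    table.put("user2", "info:name", "Ravi")
    table.put("user3", "stats:logins", 7)
    print(list(table.scan_column("info:name")))
    print(table.cell_count())
```

In a fixed-schema relational table, three rows with three columns would reserve nine cells; here only the four written cells exist.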
Map Reduce Example
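The slide's original example was an image and is not preserved in this transcript. As a stand-in, here is the classic word-count job written as a minimal Python sketch. It mimics the map, shuffle and reduce phases in a single process; the function names are hypothetical and nothing here uses Hadoop itself.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the list of counts for one word into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop data"]))
```

On a real cluster each call to `map_phase` could run on a different node, and each key's `reduce_phase` could run independently of the others; the sketch keeps them sequential for readability.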
In God, We Trust... All Others must bring the "Data"
Srikanth Ayithy
about.me/srikanthayithy
