HDInsight Essentials
ISBN: 1849695369 / ISBN 13: 9781849695367
Rajesh Nadipalli
05/01/2014

Goals of this Book
• Focus on Microsoft's new Hadoop distribution
• Serve as a quick reference
• Provide an overview of Hadoop
• Address both cloud and on-premise setup for HDInsight
• Highlight the HDInsight differentiator
• Provide practical, real-world examples

Book Table of Contents
• Chapter 1: HDInsight in a Heartbeat
• Chapter 2: Deploying HDInsight on premise
• Chapter 3: HDInsight Azure cloud service
• Chapter 4: Administer your cluster
• Chapter 5: Ingest data to your cluster
• Chapter 6: Transform data in your cluster
• Chapter 7: Analyze & Report data from cluster
• Chapter 8: Project Planning & Architectural Considerations

CHAPTER 1 HIGHLIGHTS: HDINSIGHT IN A HEARTBEAT

Big Data Problem Characteristics

Hadoop Overview
Self-Healing Distributed Storage, Fault-Tolerant Distributed Computing + Abstraction for Parallel Processing
CORE HADOOP COMPONENTS
• HDFS: Distributed storage; replicated, self-healing and scalable
• MapReduce: Parallel processing; processes local data for efficiency

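As a rough illustration of what "replicated" and "self-healing" mean for HDFS, the sketch below models block placement in plain Python. It is a toy model, not HDFS's real placement policy: the node names, the random placement, and the helper names `place_blocks` and `heal` are all assumptions made for this example. The replication factor of 3 mirrors the usual HDFS default.

```python
import random


def place_blocks(num_blocks, datanodes, replication=3, seed=0):
    """Assign each block to `replication` distinct DataNodes (toy placement)."""
    rng = random.Random(seed)
    return {b: set(rng.sample(datanodes, replication)) for b in range(num_blocks)}


def heal(placement, failed, datanodes, replication=3, seed=1):
    """Drop replicas held by a failed node, then re-replicate to surviving nodes."""
    rng = random.Random(seed)
    alive = [n for n in datanodes if n != failed]
    for holders in placement.values():
        holders.discard(failed)
        # Copy the block to new nodes until the target replica count is restored.
        while len(holders) < replication:
            candidates = [n for n in alive if n not in holders]
            holders.add(rng.choice(candidates))
    return placement


# Hypothetical five-node cluster holding a file split into four blocks.
datanodes = ["dn1", "dn2", "dn3", "dn4", "dn5"]
placement = place_blocks(num_blocks=4, datanodes=datanodes)
placement = heal(placement, failed="dn2", datanodes=datanodes)

for block, holders in sorted(placement.items()):
    print(block, sorted(holders))
```

After the simulated failure of `dn2`, every block is back to three replicas on the remaining nodes, which is the behaviour the slide summarises as self-healing.
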
Hadoop Nodes Layout
• Master Node: JobTracker (MapReduce layer); NameNode and Secondary NameNode (distributed file system layer)
• Slave Nodes: each runs a TaskTracker (MapReduce layer) and a DataNode (distributed file system layer)

Hadoop Eco System
• Data Sources: RDBMS databases; audio, images; log files; sensors, RFID; social media, feeds
• Hadoop Data Store: HDFS; HBase (NoSQL DB)
• Data Processing: MapReduce
• Data Access: Hive; Pig; Mahout (machine learning); Flume, Sqoop; Excel; business data feeds
• ZooKeeper (distributed process management)
• HCatalog (metadata on Pig, Hive, MapReduce)
• Oozie (workflow, scheduler)
• Infrastructure, Operations (monitoring, configuration)

End to End Solution on Hadoop
Collect & Import to HDFS -> Process (MapReduce) -> Analyze (BI Tools) -> Report & Publish

Popular Hadoop Distributions
• Amazon Elastic MapReduce (cloud, http://aws.amazon.com/elasticmapreduce/)
• Cloudera (http://www.cloudera.com/content/cloudera/en/home.html)
• EMC Pivotal HD (http://gopivotal.com/)
• Hortonworks HDP (http://hortonworks.com/)
• MapR (http://mapr.com/)
• Microsoft HDInsight (cloud, http://www.windowsazure.com/)

HDInsight Differentiator
• Enterprise-ready Hadoop backed by Microsoft
• Analytics using Excel
• Integration with Active Directory
• Integration with .NET and JavaScript
• Connectors to RDBMS
• Scale using cloud offering: the Azure HDInsight service enables customers to scale quickly and has a seamless interface between HDFS and Azure Storage Vault
• JavaScript Console

WordCount in HDInsight

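The WordCount slide itself is a screenshot, so its code is not recoverable here. As a stand-in, the sketch below shows the WordCount idea as map, shuffle and reduce steps in plain Python running in one process. It is not the book's sample and does not use the HDInsight JavaScript console; the function names and the two input lines are assumptions for illustration.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word, counts):
    """Reduce step: sum the ones emitted for a single word."""
    return word, sum(counts)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_words(line)]
    return dict(reduce_counts(word, counts) for word, counts in shuffle(pairs).items())


# Invented sample input.
counts = word_count(["Hadoop stores data", "Hadoop processes data"])
print(counts)
```

On a real cluster the map calls would run in parallel on the nodes that hold the input blocks, and the framework would perform the shuffle between the map and reduce phases.
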
CHAPTER 2 HIGHLIGHTS: HDINSIGHT INSTALL ON PREMISE

HDInsight Distribution
Apache Hadoop
• Open source software
• Community development
Hortonworks Data Platform
• Enterprise Hadoop Platform (HDP)
• Leaders in Hadoop
• Code committers to Hadoop
Microsoft HDInsight
• Built on top of HDP
• Integration with ASV, Excel, Power View, SQL Server, Active Directory

Physical Install Options
Node roles: NN (NameNode), SNN (Secondary NameNode), JT (JobTracker), DN / TT (DataNode / TaskTracker)
• Single node for development/test
• Multi node for production

Multi Node Install Steps
• Pre-requisites
• Networking setup
• Remote scripting
• Firewall setup
• Software install (each node)
• Hadoop configuration
• Verification

CHAPTER 3 HIGHLIGHTS: HDINSIGHT AZURE SERVICE

Azure Cloud Service
• Create Storage
• Create HDInsight cluster

CHAPTER 4 HIGHLIGHTS: ADMINISTER YOUR CLUSTER

HDInsight Cluster Management
HDInsight Dashboard
NameNode Status
JobTracker Status

CHAPTER 5 HIGHLIGHTS: INGEST DATA INTO YOUR CLUSTER

Loading Data into your Cluster
You have the following options:
• Loading data using Hadoop commands
• Loading data using Azure Storage Vault
• Loading data using the interactive JavaScript console
• Shipping data to your cluster
• Loading data from an RDBMS via Sqoop

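Two of the options above are command-line driven, so a small sketch may help. The Python below only builds and prints the command lines for `hadoop fs -put` and `sqoop import`; it does not run them. Every path, host, database, table and user name is a hypothetical placeholder, and the helper function names are assumptions for this example rather than anything taken from the book.

```python
import shlex


def hdfs_put_command(local_path, hdfs_dir):
    """Build a `hadoop fs -put` command that copies a local file into HDFS."""
    return ["hadoop", "fs", "-put", local_path, hdfs_dir]


def sqoop_import_command(jdbc_url, username, table, target_dir):
    """Build a `sqoop import` command that copies one RDBMS table into HDFS."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--table", table,
        "--target-dir", target_dir,
    ]


# All values below are hypothetical placeholders.
put_cmd = hdfs_put_command("weblogs/2014-01.log", "/data/weblogs")
sqoop_cmd = sqoop_import_command(
    "jdbc:sqlserver://dbhost:1433;databaseName=sales",
    "loader",
    "orders",
    "/data/orders",
)

# Print shell-safe command lines; on a cluster node with the clients installed,
# these lists could be passed to subprocess.run(cmd, check=True) instead.
print(shlex.join(put_cmd))
print(shlex.join(sqoop_cmd))
```

Keeping each command as a list of arguments avoids shell-quoting problems, which matters for values such as the JDBC URL that contain semicolons.
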
Loading via Azure Storage Explorer

CHAPTER 6 HIGHLIGHTS: TRANSFORM YOUR DATA

Transforming Data
You have the following options:
• MapReduce
• Hive
• Pig
• Others

Processing Data in Cluster
Map for Jan2012, Map for Feb2012, Map for Apr2013 -> One Reducer

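The layout on this slide, one map task per monthly partition feeding a single reducer, can be sketched as follows. The slide does not say what is being computed, so the order amounts and the "total per month plus grand total" aggregation are invented for illustration, as are the function names.

```python
def map_month(month, records):
    """Map task for one monthly partition: emit (month, partial total)."""
    return month, sum(amount for _, amount in records)


def reduce_all(mapped):
    """Single reducer: merge every map output into one result."""
    totals = {}
    for month, subtotal in mapped:
        totals[month] = totals.get(month, 0) + subtotal
    return totals, sum(totals.values())


# Invented sample data, one partition per month named on the slide.
partitions = {
    "Jan2012": [("order-1", 120), ("order-2", 80)],
    "Feb2012": [("order-3", 200)],
    "Apr2013": [("order-4", 50), ("order-5", 25)],
}

# Each map call is independent, so a cluster could run them in parallel.
mapped = [map_month(month, records) for month, records in partitions.items()]
per_month, grand_total = reduce_all(mapped)
print(per_month, grand_total)
```

Because each map task reduces its partition to a single small pair, one reducer is enough to combine the results.
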
Hive
• Client interfaces: Command Line, Web GUI, JDBC/ODBC (via Thrift Server)
• Driver (Parser, Planner, Executor)
• Metastore
• Execution and storage: MapReduce, HDFS

Hive or Pig?
Raw Data in HDFS
• Distributed storage
• Reliable
Data Processing via Pig
• Pipelines
• Iterative processing
• Research
Data Warehouse (HDFS)
Data Warehouse via Hive
• BI tools
• Analysis

CHAPTER 7 HIGHLIGHTS: ANALYZE & REPORT

Analyze using Excel

CHAPTER 8: PROJECT PLANNING & ARCHITECTURAL CONSIDERATIONS

Executive & Stakeholder Buy-in -> Discovery & Analysis -> Design -> Implementation -> User Acceptance -> Production Operations -> Feedback, New Requirements