
Big Data
HDInsight and Power BI
Prasad Prabhu

INFO IN TABULAR FORMAT
ROWS & COLUMNS
DEFINED SCHEMA
PRIMARY KEY
RELATIONSHIPS
FOREIGN KEY
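To make the relational vocabulary above concrete, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for any relational database. The table names, columns and the customer "Contoso" are hypothetical. It shows a defined schema, a primary key, a foreign key relationship, a join across that relationship, and the database rejecting a row that breaks the relationship (structure enforced on write).

```python
import sqlite3

# In-memory SQLite database as a stand-in for any relational store.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Defined schema: every row in a table has the same columns and types.
conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,  -- primary key: uniquely identifies a row
        name        TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),  -- foreign key
        amount      REAL NOT NULL
    )""")

conn.execute("INSERT INTO customers VALUES (1, 'Contoso')")
conn.execute("INSERT INTO orders VALUES (100, 1, 250.0)")

# The relationship lets us join rows across the two tables.
rows = conn.execute("""
    SELECT c.name, o.amount
    FROM orders o JOIN customers c ON o.customer_id = c.customer_id
""").fetchall()
print(rows)

# Structure is enforced on write: an order for a non-existent customer is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (101, 99, 10.0)")
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```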

TRADITIONAL DW/BI ENVIRONMENT
ERP / CRM → ETL → Data Warehouse
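The traditional flow (operational ERP / CRM systems feeding a data warehouse through ETL) can be sketched in a few lines. This is a toy illustration only: the source rows, the `sales_by_region` table and the cleaning rules are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical ERP/CRM export: raw rows as an operational system might produce them.
source_rows = [
    ("2014-01-05", " north ", "120.50"),
    ("2014-01-05", "South", "80"),
    ("2014-01-06", "north", "99.50"),
]

def extract():
    """Extract: read rows from the source system (here, an in-memory list)."""
    return list(source_rows)

def transform(rows):
    """Transform: clean text, cast types and aggregate sales per region."""
    totals = {}
    for _day, region, amount in rows:
        key = region.strip().title()          # normalise ' north ' -> 'North'
        totals[key] = totals.get(key, 0.0) + float(amount)
    return sorted(totals.items())

def load(conn, facts):
    """Load: write the cleaned, aggregated facts into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_by_region (region TEXT PRIMARY KEY, total REAL)")
    conn.executemany("INSERT INTO sales_by_region VALUES (?, ?)", facts)
    conn.commit()

warehouse = sqlite3.connect(":memory:")       # stand-in for the data warehouse
load(warehouse, transform(extract()))
result = warehouse.execute("SELECT * FROM sales_by_region ORDER BY region").fetchall()
print(result)
```

Note that the schema and the cleaning rules are decided before loading; data that does not fit must be fixed or discarded up front.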

EVOLUTION OF DATA
(Chart: data volume grows as velocity and variety increase)
ERP / CRM, Gigabytes (10^9 bytes): Sales Pipeline, Payables, Payroll, Inventory, Contacts, Deal Tracking
WEB 2.0, Terabytes (10^12 bytes): Advertising, eCommerce, Collaboration, Digital Marketing, Search Marketing, Web Logs, Recommendations, Mobile
Internet of things, Petabytes (10^15 bytes) to Exabytes (10^18 bytes): Wikis / Blogs, Audio / Video, Log Files, Text/Image, Social Sentiment, Data Market Feeds, eGov Feeds, Weather, Click Stream, Sensors / RFID / Devices, Spatial & GPS Coordinates
Storage cost per GB: 1980 $190,000; 1990 $9,000; 2000 $15; 2010 $0.07
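The storage-cost figures on this slide ($190,000 per GB in 1980 down to $0.07 per GB in 2010) are easier to appreciate as a worked calculation. The sketch below assumes decimal units (1 TB = 1,000 GB) and simply rescales the slide's numbers; it adds no new data.

```python
# Storage cost per GB, as given on the slide (US dollars).
cost_per_gb = {1980: 190_000, 1990: 9_000, 2000: 15, 2010: 0.07}

# Cost of one terabyte in each year, assuming 1 TB = 1,000 GB.
cost_per_tb = {year: dollars * 1_000 for year, dollars in cost_per_gb.items()}
for year, dollars in cost_per_tb.items():
    print(f"{year}: about ${dollars:,.2f} per TB")

# How many times cheaper a GB became between 1980 and 2010.
drop = cost_per_gb[1980] / cost_per_gb[2010]
print(f"1980 -> 2010: roughly {drop:,.0f}x cheaper per GB")
```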

90% of the world’s data has been created in the last 2 years
Source: SINTEF

3 ‘V’s OF BIG DATA
VOLUME (Size)
VARIETY (Structure)
VELOCITY (Speed)

How do we handle this massive amount of data,
which arrives in different forms and at high speed?

TOMORROW’S DW/BI ENVIRONMENT
Business Critical
Data Warehouse
ETL
New data sources

WHAT IS HADOOP?
Apache Hadoop is an open source system for reliably storing and processing very large
amounts of data across many commodity computers.
It began life as an open source implementation of Google’s MapReduce and Google File
System (GFS) papers. It is now used at many major web companies at massive scale
(thousands of nodes, petabytes of storage).
Key attributes:
• Open source
• Highly scalable
• Runs on commodity hardware
• Redundant and reliable (data is replicated to guard against loss)
• Batch-processing centric, using the
MapReduce processing paradigm

2 CORE COMPONENTS OF HADOOP
Distributed Processing
(MapReduce)
Distributed Storage
(HDFS)

HADOOP IS JUST A FILE SYSTEM
(Diagram: a Head Node coordinates five Data Nodes; a file is stored across the Data Nodes and replicated 3 times)
Read optimised & failure tolerant
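The replication idea on this slide can be simulated in a few lines. This is a toy sketch, not HDFS itself: the node names are hypothetical, the 128 MB block size is an assumption (it has varied across Hadoop versions), and the round-robin placement is a simplification of HDFS's real rack-aware placement policy. Only the replication factor of 3 comes from the slide.

```python
DATA_NODES = ["dn1", "dn2", "dn3", "dn4", "dn5"]  # five data nodes, as in the diagram
REPLICATION = 3                                    # each block stored 3 times
BLOCK_SIZE = 128 * 1024 * 1024                     # assumed block size: 128 MB

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes of the blocks a file of `file_size` bytes is split into."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])

def place_replicas(num_blocks, nodes=DATA_NODES, replication=REPLICATION):
    """Toy placement: put each block's replicas on `replication` distinct nodes,
    rotating the starting node round-robin (real HDFS is rack-aware)."""
    placement = {}
    for block in range(num_blocks):
        start = block % len(nodes)
        placement[block] = [nodes[(start + i) % len(nodes)] for i in range(replication)]
    return placement

blocks = split_into_blocks(300 * 1024 * 1024)   # a hypothetical 300 MB file
placement = place_replicas(len(blocks))
for block, replicas in placement.items():
    print(block, replicas)

# Failure tolerance: if any single node dies, every block still has a live replica.
survives_any_single_failure = all(
    any(node != dead for node in replicas)
    for dead in DATA_NODES
    for replicas in placement.values()
)
print("survives any single node failure:", survives_any_single_failure)
```

Because each block lives on three different nodes, reads can be served by whichever copy is closest or least busy, which is where the "read optimised" property comes from.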

MAP + REDUCE = EXTRACT, LOAD + TRANSFORM
MAP: Raw Data → Mapper → Data (four mappers work in parallel, each on its own raw data)
REDUCE: Data from all mappers → Reducer → Output
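The map, shuffle and reduce steps in this diagram can be sketched in plain Python with the classic word-count example. This runs on a single machine and only imitates the shape of the paradigm: the input chunks are hypothetical, and the `shuffle` step stands in for the grouping that the Hadoop framework performs between the map and reduce phases.

```python
from collections import defaultdict

def map_phase(chunk):
    """Mapper: turn one chunk of raw text into (key, value) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(mapped):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)

raw_chunks = ["big data big ideas", "data moves fast", "Big wins"]  # hypothetical input splits
mapped = [map_phase(chunk) for chunk in raw_chunks]  # each mapper could run on a different node
output = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(output)
```

Each mapper only ever sees its own chunk, so the map phase parallelises freely; only the much smaller intermediate data has to be brought together for the reducer.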

MAP REDUCE ANALOGY – BLOGGER ANALYSIS
Hi John,
As you know, we are building the blogging platform blogger2.com, and I need some statistics. Across all blogs
ever written on blogger.com, I need to find out how many times one-character words occur (like 'a', 'I'),
how many times two-character words occur (like 'be', 'is'), and so on, up to how many times ten-character words occur.
• Occurrences of one-character words: around 937688399933
• Occurrences of two-character words: around 23388383830753434
• ... and so on, up to ten-character words
I know it's a really big job, so I will assign all 50,000 employees working in our company
to work with you on this for a week. I am going on vacation for a week, and it's really
important that I have this when I return.
Good luck.
Regards,
CEO
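The CEO's request maps directly onto MapReduce: each "employee" (mapper) counts word lengths in their own share of the blogs, and the partial counts are then summed (reduce). The sketch below imitates this on one machine with a thread pool; the sample blogs, the number of workers, and the rule that a word is a run of ASCII letters are all assumptions made for illustration.

```python
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

WORD = re.compile(r"[A-Za-z]+")   # assumption: a "word" is a run of ASCII letters

def count_lengths(blog_chunk):
    """Map: one 'employee' counts words of length 1..10 in their share of blogs."""
    counts = Counter()
    for blog in blog_chunk:
        for word in WORD.findall(blog):
            if 1 <= len(word) <= 10:
                counts[len(word)] += 1
    return counts

def word_length_stats(blogs, workers=4):
    """Split the blogs among workers, map in parallel, then reduce by summing."""
    chunks = [blogs[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_lengths, chunks))
    total = Counter()
    for partial in partials:      # reduce: merge every worker's partial counts
        total.update(partial)
    return total

blogs = ["I am a blogger", "It is a big day", "Hadoop is fun"]  # hypothetical sample
stats = sorted(word_length_stats(blogs).items())
print(stats)  # (word length, occurrences) pairs
```

Adding more workers changes how the blogs are divided but not the final totals, which is what lets the job scale out across many machines.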

THE ECOSYSTEM
(Diagram: layered stack)
Query (Hive)
Distributed Processing (MapReduce)
Distributed Storage (HDFS)
ODBC
Legend: Red = Core Hadoop; Blue = Data processing; Purple = Microsoft integration points and value adds; Orange = Data movement

INTRODUCING HDINSIGHT
• HDInsight is Microsoft’s 100% Apache-compatible Hadoop distribution
• Available as a Microsoft Azure service
• Develop in .NET and Java
• Built on the Hortonworks Data Platform (HDP)
• Can be automated with PowerShell and the command line
• Empowers organizations with new insights from previously untouched
unstructured data, while connecting to the most widely used BI tools on
the planet

RUN SQL-LIKE COMMANDS USING HIVEQL
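HiveQL lets you answer the blogger question with a familiar `SELECT ... GROUP BY` instead of hand-written MapReduce code. The sketch below shows a hypothetical HiveQL version in comments (the table name and HDFS path are placeholders, and it is not executed here), then runs the same query shape locally against SQLite as a stand-in, since a Hive cluster is not assumed to be available.

```python
import sqlite3

# Hypothetical HiveQL (for comparison only; not run here):
#   CREATE EXTERNAL TABLE blog_words (word STRING)
#   LOCATION '/example/blog_words';        -- placeholder path
#   SELECT length(word) AS len, COUNT(*) FROM blog_words
#   WHERE length(word) BETWEEN 1 AND 10 GROUP BY length(word);

conn = sqlite3.connect(":memory:")   # local stand-in for a Hive table
conn.execute("CREATE TABLE blog_words (word TEXT)")
conn.executemany("INSERT INTO blog_words VALUES (?)",
                 [(w,) for w in "I am a blogger it is a big day".split()])

rows = conn.execute("""
    SELECT length(word) AS len, COUNT(*) AS occurrences
    FROM blog_words
    WHERE length(word) BETWEEN 1 AND 10
    GROUP BY length(word)
    ORDER BY len
""").fetchall()
print(rows)  # (word length, occurrences) pairs
```

On HDInsight, Hive would translate the query into distributed jobs over files in cluster storage; the SQLite version only demonstrates the query logic.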

USING EXCEL TO CONNECT TO HDINSIGHT

POWER BI = POWER PIVOT + POWER QUERY + POWER MAP

NATURAL LANGUAGE USING POWER BI

SUMMARY
• Data is growing, and it is not necessarily structured
• Storage is really cheap
• We need systems that apply structure on read rather than enforcing it on write
• Don’t just validate: analyze data to find patterns, perform exploratory analysis and
predict outcomes
• Find ways to make big data simpler for business users, empowering them so that
the business can make more informed decisions
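"Structure on read" can be illustrated with a small sketch: raw lines are kept exactly as they arrived, and a schema is applied only when the data is read for analysis, so malformed rows are skipped at query time instead of blocking the load. The tab-separated log layout and the `LogRecord` fields are hypothetical.

```python
from typing import NamedTuple, Optional

class LogRecord(NamedTuple):
    """Schema applied at read time (hypothetical web-log layout)."""
    timestamp: str
    url: str
    status: int

# Raw data is stored as-is: no structure is enforced when it is written.
raw_lines = [
    "2014-06-01T10:00:00\t/home\t200",
    "2014-06-01T10:00:05\t/buy\t500",
    "garbled line without tabs",
    "2014-06-01T10:00:09\t/home\t200",
]

def parse(line: str) -> Optional[LogRecord]:
    """Apply the schema while reading; return None for rows that do not fit."""
    parts = line.split("\t")
    if len(parts) != 3 or not parts[2].isdigit():
        return None
    return LogRecord(parts[0], parts[1], int(parts[2]))

records = [r for r in map(parse, raw_lines) if r is not None]
server_errors = sum(1 for r in records if r.status >= 500)
print(len(records), "records parsed;", server_errors, "server errors")
```

A different analysis could apply a different schema to the same raw lines later, which is the flexibility the summary is pointing at.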

REFERENCE LINKS
http://azure.microsoft.com/bigdata
http://www.microsoft.com/powerbi
Sign up for a 30-day free trial
