The document motivates Hadoop: while CPU speed, RAM, and disk capacity have grown by orders of magnitude since 1990, disk read/write speeds have improved far less, so large datasets must be read and written in parallel across many disks. It introduces the Hadoop Distributed File System (HDFS) and MapReduce as the solution for parallel processing of large datasets across clusters of machines: HDFS provides one big virtual file system, while MapReduce abstracts away disk reads and writes, letting the programmer express computation over sets of keys and values.
2. Hadoop Motivation
• Hardware improvements through the years …
– CPU speed: 40 MIPS (1990) -> 50 GIPS (2010) => 1,250x
– RAM: 640 kB (1990) -> 8 GB (2010) => 12,500x
– Disk capacity & cost: 40 MB for $400 (1990) -> 1 TB for $100 (2010) => 25,000x
• What about disk read speed / disk latency?
– 4.4 MB/s in 1990
– 100 MB/s in 2010 => only ~23x faster
– at 100 MB/s, scanning a full 1 TB disk takes almost 3 hours; spread over 100 disks read in parallel, it takes under 2 minutes
– => read from multiple disks in parallel
– and it's not just about parallel reads, but parallel writes as well
3. Hadoop Motivation – Issues
• Parallel reads and writes bring challenges …
– Hardware failure
  • disk failures => replication? => RAID?
– Data combination
  • combining data read from multiple disks
• Solution … HADOOP
– Hadoop Distributed File System (HDFS)
– MapReduce programming model – the analysis system
  • abstracts from disk R/W to computation over sets of keys and values (see the word-count sketch below)
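To make the key/value model concrete, here is a minimal sketch of the classic word-count job using the standard Hadoop Java API (org.apache.hadoop.mapreduce); the class names TokenizerMapper and IntSumReducer are illustrative, not prescribed by the slides:

  import java.io.IOException;
  import java.util.StringTokenizer;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.Mapper;
  import org.apache.hadoop.mapreduce.Reducer;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  public class WordCount {

    // Mapper: turns each input line into (word, 1) pairs.
    public static class TokenizerMapper
        extends Mapper<Object, Text, Text, IntWritable> {
      private final static IntWritable one = new IntWritable(1);
      private Text word = new Text();

      public void map(Object key, Text value, Context context)
          throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
          word.set(itr.nextToken());
          context.write(word, one);
        }
      }
    }

    // Reducer: sums the counts emitted for each word.
    public static class IntSumReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {
      private IntWritable result = new IntWritable();

      public void reduce(Text key, Iterable<IntWritable> values, Context context)
          throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
          sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
      }
    }

    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      Job job = Job.getInstance(conf, "word count");
      job.setJarByClass(WordCount.class);
      job.setMapperClass(TokenizerMapper.class);
      job.setCombinerClass(IntSumReducer.class);
      job.setReducerClass(IntSumReducer.class);
      job.setOutputKeyClass(Text.class);
      job.setOutputValueClass(IntWritable.class);
      FileInputFormat.addInputPath(job, new Path(args[0]));
      FileOutputFormat.setOutputPath(job, new Path(args[1]));
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }

The programmer never touches disk I/O directly: the framework reads input splits from HDFS, shuffles the (word, 1) pairs to reducers, and writes the results back to HDFS.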
6. Hadoop vs. Relational DB
            | Relational DB       | MapReduce
Data size   | GBs                 | TBs / PBs
Access      | Interactive / Batch | Batch
Updates     | Read & Write        | Write once / multiple reads
Structure   | Static schema       | Dynamic schema (analyst chooses it)
Integrity   | High                | Low
Scaling     | Non-linear          | Linear
7. Hadoop 1 vs Hadoop 2
• Hadoop 1 issues
– the NameNode is a single point of failure (SPOF)
– security
• Hadoop 2
– promotes the cluster to a "universal computational cluster" (via YARN)
– removes bottlenecks in MapReduce
11. HBase features
• NoSQL database
• Column-oriented DB
• Google's BigTable implementation
• Linear and modular scalability
• Strictly consistent reads and writes
• Automatic and configurable sharding of tables
• Automatic failover support between RegionServers
• Convenient base classes for backing Hadoop MapReduce jobs with Apache HBase tables
• Easy-to-use Java API for client access (see the sketch below)
• Block cache and Bloom filters for real-time queries
• Query predicate push-down via server-side Filters
• Thrift gateway and a RESTful Web service that supports XML, Protobuf, and binary data encoding options
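To make the Java client API concrete, here is a minimal sketch of a write followed by a strictly consistent read, using the standard org.apache.hadoop.hbase.client classes; the table name "test", column family "cf", and cell coordinates are hypothetical, chosen only for illustration:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.Connection;
  import org.apache.hadoop.hbase.client.ConnectionFactory;
  import org.apache.hadoop.hbase.client.Get;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.Table;
  import org.apache.hadoop.hbase.util.Bytes;

  public class HBaseClientSketch {
    public static void main(String[] args) throws Exception {
      // Picks up hbase-site.xml from the classpath.
      Configuration conf = HBaseConfiguration.create();
      try (Connection connection = ConnectionFactory.createConnection(conf);
           Table table = connection.getTable(TableName.valueOf("test"))) {

        // Write one cell: row "row1", column family "cf", qualifier "a".
        // (Table and column family names are assumed to exist already.)
        Put put = new Put(Bytes.toBytes("row1"));
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("a"),
                      Bytes.toBytes("value1"));
        table.put(put);

        // Read it back; a read after a completed write sees that write.
        Get get = new Get(Bytes.toBytes("row1"));
        Result result = table.get(get);
        byte[] value = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("a"));
        System.out.println("cf:a = " + Bytes.toString(value));
      }
    }
  }

The same table could be scanned from a MapReduce job via the base classes mentioned above, or accessed from non-Java clients through the Thrift and REST gateways.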