Introduction
Hadoop Motivation
• Hardware improvements through the years …
  – CPU speed: 40 MIPS (1990) -> 50 GIPS (2010) => 1,250x
  – RAM: 640 kB (1990) -> 8 GB (2010) => 12,500x
  – Disk capacity & cost: 40 MB (1990, for $400) -> 1 TB (2010, for $100) => 25,000x
• What about disk read speed / disk latency?
  – 4.4 MB/s in 1990
  – 100 MB/s in 2010 => JUST ~23x faster
  – => read in parallel from multiple disks
  – it's not just about reads, but parallel writes as well
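The gap between capacity growth and read-speed growth can be checked with a few lines of arithmetic. The sketch below reuses the slide's figures; the 1 TB dataset and the 100-disk cluster are illustrative assumptions, and the model ignores seek time, network and coordination overhead.

```python
# Figures taken from the slide (decimal units).
MB = 10**6
TB = 10**12


def growth(old, new):
    """Improvement factor between two measurements."""
    return new / old


def scan_seconds(dataset_bytes, bytes_per_second, disks=1):
    """Time to read a dataset split evenly over `disks` drives read in parallel."""
    return dataset_bytes / (bytes_per_second * disks)


capacity_growth = growth(40 * MB, 1 * TB)   # disk capacity, 1990 -> 2010
throughput_growth = growth(4.4, 100)        # disk read speed in MB/s, 1990 -> 2010

# Assumed workload: scan 1 TB at 100 MB/s per disk.
one_disk = scan_seconds(1 * TB, 100 * MB)
hundred_disks = scan_seconds(1 * TB, 100 * MB, disks=100)

print(f"capacity grew {capacity_growth:,.0f}x, read speed only {throughput_growth:.1f}x")
print(f"1 TB scan: {one_disk / 3600:.1f} h on one disk, {hundred_disks:.0f} s on 100 disks")
```

Under these assumptions a full scan drops from almost three hours to under two minutes, which is the case for spreading data over many disks.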
Hadoop motivation - issues
• Parallel reads and writes bring challenges …
  – Hardware failure
    • Disk failure => replication? => RAID?
  – Data combination
    • Combining data from disks
• Solution … HADOOP
  – Hadoop Distributed File System (HDFS)
  – MapReduce programming model – analysis system
    • Abstracts away disk reads/writes into computation over sets of keys and values
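To make "computation over sets of keys and values" concrete, here is a minimal in-memory sketch of the map, shuffle and reduce steps, using word count as the job. It is not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are hypothetical, and everything runs in one process rather than across a cluster.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every input record and emit (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group all emitted values by key (the framework does this between map and reduce)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the reducer once per key to the list of values for that key."""
    return {key: reducer(key, values) for key, values in groups.items()}


def run_job(records, mapper, reducer):
    return reduce_phase(shuffle(map_phase(records, mapper)), reducer)


# User-supplied logic for a word-count job.
def word_mapper(line):
    for word in line.lower().split():
        yield word, 1


def sum_reducer(key, values):
    return sum(values)


lines = ["hadoop stores data", "hadoop processes data"]
counts = run_job(lines, word_mapper, sum_reducer)
print(counts)
```

The job author writes only `word_mapper` and `sum_reducer`; where the input lives on disk and how pairs are grouped is left to the framework.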
Hadoop HDFS
• Big virtual file system
• Master–slave architecture
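The master–slave split can be illustrated with a toy model: a master (the NameNode) that holds only metadata about which blocks make up a file and where their replicas live, while the slaves (DataNodes) would hold the bytes. This is a sketch under loud assumptions: `ToyNameNode` is a hypothetical class, the 4-byte block size is chosen only to keep the example small, and round-robin placement stands in for HDFS's real placement policy.

```python
import itertools


class ToyNameNode:
    """Master: tracks block-to-DataNode metadata, never the file contents."""

    def __init__(self, datanodes, block_size=4, replication=3):
        if replication > len(datanodes):
            raise ValueError("need at least as many DataNodes as replicas")
        self.datanodes = list(datanodes)
        self.block_size = block_size
        self.replication = replication
        self.block_map = {}  # path -> [(block_id, [datanode, ...]), ...]
        self._next_start = itertools.cycle(range(len(self.datanodes)))

    def add_file(self, path, data):
        """Split data into fixed-size blocks and assign each block to distinct DataNodes."""
        blocks = []
        for offset in range(0, len(data), self.block_size):
            start = next(self._next_start)
            replicas = [
                self.datanodes[(start + i) % len(self.datanodes)]
                for i in range(self.replication)
            ]
            blocks.append((f"{path}#blk{offset // self.block_size}", replicas))
        self.block_map[path] = blocks
        return blocks

    def locate(self, path, failed=()):
        """Return, per block, the replicas that sit on DataNodes not listed in `failed`."""
        return [
            (block_id, [node for node in replicas if node not in failed])
            for block_id, replicas in self.block_map[path]
        ]


namenode = ToyNameNode(["dn1", "dn2", "dn3", "dn4"])
blocks = namenode.add_file("/logs/a.txt", b"0123456789")
surviving = namenode.locate("/logs/a.txt", failed={"dn3"})
print(surviving)
```

Because every block has replicas on several nodes, losing one DataNode still leaves each block readable, which is how replication addresses the disk-failure concern without relying on RAID.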
Map-Reduce
Hadoop vs. Relational DB
             Relational DB         MapReduce
Data size    GBs                   TBs / PBs
Access       Interactive / Batch   Batch
Updates      Read & Write          Write once / multiple reads
Structure    Static schema         Dynamic schema (analyst chooses it)
Integrity    High                  Low
Scaling      Non-linear            Linear
Hadoop 1 vs Hadoop 2
• Hadoop 1: the NameNode is a single point of failure (SPOF)
• Security
• Hadoop 2 promotes the cluster to a "universal computational cluster"
• Removes bottlenecks in Map-Reduce
NameNode
• http://bd-prg-c03-nn01:50070/dfshealth.jsp
• http://bd-prg-c03-nn02:50070/dfshealth.jsp
YARN - Resource Manager
• http://bd-prg-c03-rm01:8088/cluster
• http://aimc2rm1:8088/cluster
History server
• http://bd-prg-c03-rm01:19888/jobhistory
HBase features
• NoSQL database
• Column-oriented DB
• Open-source implementation of Google's Bigtable design
• Linear and modular scalability
• Strictly consistent reads and writes
• Automatic and configurable sharding of tables
• Automatic failover support between RegionServers
• Convenient base classes for backing Hadoop MapReduce jobs with Apache HBase tables
• Easy-to-use Java API for client access
• Block cache and Bloom filters for real-time queries
• Query predicate push-down via server-side Filters
• Thrift gateway and a RESTful web service that supports XML, Protobuf, and binary data encoding options
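The sharding bullet can be sketched as a lookup over sorted row-key ranges: a table is cut into regions at split keys, and each region, covering start key (inclusive) to end key (exclusive), is served by one RegionServer. The code below is a toy model rather than the HBase client API; `ToyRegionMap`, the split keys and the server names `rs1` to `rs3` are all hypothetical.

```python
import bisect


class ToyRegionMap:
    """Maps a row key to the region (and server) whose key range contains it."""

    def __init__(self, split_keys, servers):
        # N split keys cut the sorted key space into N + 1 regions.
        self.split_keys = sorted(split_keys)
        if len(servers) != len(self.split_keys) + 1:
            raise ValueError("need exactly one server per region")
        self.servers = list(servers)

    def region_index(self, row_key):
        # bisect_right puts a key equal to a split key into the region that starts there,
        # matching the [start, end) convention.
        return bisect.bisect_right(self.split_keys, row_key)

    def server_for(self, row_key):
        return self.servers[self.region_index(row_key)]


# Row keys are byte strings and compare lexicographically.
regions = ToyRegionMap(split_keys=[b"g", b"p"], servers=["rs1", "rs2", "rs3"])
for key in (b"apple", b"g", b"hbase", b"zebra"):
    print(key, "->", regions.server_for(key))
```

Because lookups depend only on the sorted split keys, adding a split (and a server for the new region) redistributes part of the key space without touching the other regions, which is the intuition behind the linear-scalability claim.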
HBase Architecture
HBase
• http://bd-prg-c03-nn02:60010/master-status

Hadoop hbase introduction
