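Basic flow for reading one block of data

1. readBlock(offset of the compressed block in the file, size of the compressed block on disk, size of the data after decompression). Block-compressed files generally record the decompressed size.
2. compressAlgo.getDecompressor(): obtain a Decompressor for the compression algorithm the user selected. It may come from the CodecPool or be newly created.
3. For the block located in step 1, create a new BoundedRangeFileInputStream, which reads just that one compressed block from the file.
4. compressAlgo.createDecompressionStream(): obtain the decompression stream.
5. Read from this stream. The data read is already decompressed.
6. Close the decompression stream.

Object relationships in a decompression stream (LZO as the example)

Each stream wraps the one listed below it as its underlying stream:

1. BufferedInputStream (extends FilterInputStream): the read buffer is 1 KB.
2. BlockDecompressorStream (extends DecompressorStream, which extends CompressionInputStream): the decompression buffer is 64 KB, and the stream is tied to one Decompressor.
3. BoundedRangeFileInputStream: covers one start-end range of the underlying stream. Several can exist on the same underlying stream, and closing one does not close the underlying stream.
4. FSDataInputStream (extends DataInputStream; implements Seekable and PositionedReadable): corresponds to the file on HDFS.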
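The read flow above can be sketched with JDK classes only. This is a minimal illustration under stated assumptions, not the Hadoop implementation: `java.util.zip.Inflater` stands in for the LZO Decompressor, a byte array stands in for the HDFS file, and `BoundedRangeSketch` and `ReadBlockSketch` are hypothetical names invented for this example.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.Inflater;
import java.util.zip.InflaterInputStream;

/** Hypothetical stand-in for BoundedRangeFileInputStream: exposes one range of a shared "file". */
final class BoundedRangeSketch extends InputStream {
    private final byte[] file; // assumption: a byte array plays the role of the shared file stream
    private int pos;
    private final int end;

    BoundedRangeSketch(byte[] file, int start, int length) {
        this.file = file;
        this.pos = start;
        this.end = start + length;
    }

    @Override public int read() {
        return pos < end ? (file[pos++] & 0xff) : -1;
    }

    @Override public int read(byte[] b, int off, int len) {
        if (len == 0) return 0;
        if (pos >= end) return -1;
        int n = Math.min(len, end - pos);
        System.arraycopy(file, pos, b, off, n);
        pos += n;
        return n;
    }

    @Override public void close() {
        // Deliberately a no-op: closing the range must not close the shared file.
    }
}

final class ReadBlockSketch {
    /** Compresses one block with zlib (a stand-in for LZO) so the demo has data to read. */
    static byte[] compress(byte[] raw) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(bos)) {
            out.write(raw);
        }
        return bos.toByteArray();
    }

    /** Steps 1-6 of the read flow: bound the range, wrap it, read, close. */
    static byte[] readBlock(byte[] file, int offset, int onDiskSize, int uncompressedSize)
            throws IOException {
        Inflater decompressor = new Inflater(); // Hadoop would borrow this from the CodecPool
        try (InputStream in = new BufferedInputStream(
                new InflaterInputStream(
                        new BoundedRangeSketch(file, offset, onDiskSize),
                        decompressor, 64 * 1024),
                1024)) {
            byte[] out = new byte[uncompressedSize];
            int filled = 0;
            while (filled < out.length) {
                int n = in.read(out, filled, out.length - filled);
                if (n < 0) throw new EOFException("block shorter than recorded size");
                filled += n;
            }
            return out;
        } finally {
            decompressor.end(); // a pooled decompressor would be returned to the pool instead
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] first = "first block".getBytes(StandardCharsets.UTF_8);
        byte[] second = "second block".getBytes(StandardCharsets.UTF_8);
        byte[] c1 = compress(first);
        byte[] c2 = compress(second);
        byte[] file = new byte[c1.length + c2.length];
        System.arraycopy(c1, 0, file, 0, c1.length);
        System.arraycopy(c2, 0, file, c1.length, c2.length);
        byte[] block = readBlock(file, c1.length, c2.length, second.length);
        System.out.println(new String(block, StandardCharsets.UTF_8));
    }
}
```

Because the bounded stream's `close()` is a no-op, the try-with-resources block can close the whole stack for one block while the shared file stays usable for the next block.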
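Basic flow for writing one block of data

1. Start writeBlock.
2. compressAlgo.getCompressor(): obtain a Compressor for the compression algorithm the user selected. It may come from the CodecPool or be newly created.
3. compressAlgo.createCompressionStream(): obtain the compression stream.
4. new DataOutputStream: this is the interface the caller writes to directly.
5. Write data of any type to this stream.
6. When a block is complete, flush the stream but do not close it. Closing it would close every underlying stream, including the underlying file. The underlying file stream must be closed separately, after all blocks have been written.

Object relationships in a compression stream (LZO as the example)

Each stream wraps the one listed below it as its underlying stream:

1. DataOutputStream (extends FilterOutputStream; implements DataOutput): the top layer, so that data of various types can be written.
2. BufferedOutputStream (extends FilterOutputStream): the write buffer is 4 KB.
3. FinishOnFlushCompressionStream (extends FilterOutputStream): on flush, it first calls finish on the underlying compression stream, then flush, and then resetState on the underlying stream.
4. BlockCompressorStream (extends CompressorStream, which extends CompressionOutputStream): the compression buffer is 64 KB.
5. FSDataOutputStream (extends DataOutputStream; implements Syncable): the underlying file stream.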
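The flush-per-block behaviour can be sketched with JDK classes only. This is an illustration under stated assumptions rather than the Hadoop code: `java.util.zip.Deflater` stands in for the LZO Compressor, a `ByteArrayOutputStream` stands in for the FSDataOutputStream, and `FinishOnFlushSketch` and `WriteBlocksSketch` are hypothetical names invented for this example.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

/** Hypothetical analogue of FinishOnFlushCompressionStream, built on java.util.zip. */
final class FinishOnFlushSketch extends FilterOutputStream {
    private final DeflaterOutputStream compressed;
    private final Deflater compressor;

    FinishOnFlushSketch(DeflaterOutputStream compressed, Deflater compressor) {
        super(compressed);
        this.compressed = compressed;
        this.compressor = compressor;
    }

    @Override public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len); // avoid FilterOutputStream's byte-at-a-time default
    }

    @Override public void flush() throws IOException {
        compressed.finish(); // end the current compressed block
        compressed.flush();  // push its bytes to the file stream
        compressor.reset();  // plays the role of resetState: ready for the next block
    }
}

final class WriteBlocksSketch {
    /** Writes each string as its own compressed block and returns the block boundaries. */
    static int[] writeBlocks(ByteArrayOutputStream file, String[] blocks) throws IOException {
        Deflater compressor = new Deflater(); // Hadoop would borrow this from the CodecPool
        DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(
                        new FinishOnFlushSketch(
                                new DeflaterOutputStream(file, compressor, 64 * 1024),
                                compressor),
                        4 * 1024));
        int[] offsets = new int[blocks.length + 1];
        try {
            for (int i = 0; i < blocks.length; i++) {
                offsets[i] = file.size();
                out.writeUTF(blocks[i]);
                out.flush(); // finish the block, but do not close the stack
            }
            offsets[blocks.length] = file.size();
        } finally {
            compressor.end(); // a pooled compressor would be returned to the pool instead
        }
        return offsets;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        int[] offsets = writeBlocks(file, new String[] {"alpha", "beta", "gamma"});
        for (int i = 0; i + 1 < offsets.length; i++) {
            System.out.println("block " + i + ": bytes " + offsets[i] + " to " + offsets[i + 1]);
        }
        file.close(); // the file stream is closed separately, after all blocks are written
    }
}
```

The design point is that `flush()` marks a block boundary without tearing down the stack, so one compressor and one file stream serve every block, and only the caller decides when the file itself is closed.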


Hadoop compress-stream
