MapReduce: Simplified Data
Processing on Large Clusters
          Rob Keisler
           CSCI 638
         Summer 2011
Outline

 Background

 Model

 Examples

 Execution

 Conclusions
Background

 Transformation operations are conceptually straightforward
    Until data is large and the computation must be
     distributed over hundreds or thousands of machines

 So, Google created MapReduce

 MapReduce is a programming abstraction
    Expresses simple computations
    Hides the details of parallelization, fault tolerance,
     data distribution, and load balancing
Model

 Uses two higher-order functions, Map and Reduce, to
  take a set of input key/value pairs and produce a set of
  output key/value pairs

 Map
    Takes an input key/value pair and produces a set of
     intermediate key/value pairs

 Reduce
    Accepts an intermediate key I and a set of values for
     that key, and merges those values to form a possibly
     smaller set of values
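
 The model above can be sketched in a few lines of Python. This is a
 single-process illustration only, not Google's implementation: the
 driver name map_reduce, the word-count mapper and reducer, and the
 sample documents are all assumptions made for the example. It shows
 Map emitting intermediate key/value pairs, the grouping of values by
 intermediate key, and Reduce merging each group.

```python
from collections import defaultdict

def map_reduce(inputs, mapper, reducer):
    """Hypothetical in-memory driver: run mapper, group by key, run reducer."""
    # Map phase: each input pair yields zero or more intermediate pairs.
    intermediate = defaultdict(list)
    for key, value in inputs:
        for ikey, ivalue in mapper(key, value):
            intermediate[ikey].append(ivalue)

    # Reduce phase: each intermediate key is merged with all of its values.
    output = []
    for ikey in sorted(intermediate):
        for merged in reducer(ikey, intermediate[ikey]):
            output.append((ikey, merged))
    return output

def word_count_map(doc_name, text):
    # Emit (word, 1) for every word occurrence in the document.
    for word in text.split():
        yield word, 1

def word_count_reduce(word, counts):
    # Merge all counts for one word into a single total.
    yield sum(counts)

# Illustrative input: (document name, document contents) pairs.
docs = [("a.txt", "the quick fox"), ("b.txt", "the lazy dog the end")]
result = map_reduce(docs, word_count_map, word_count_reduce)
print(result)
```

 In a real deployment the map and reduce calls would run on many
 machines; only the two user-written functions would stay the same.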
Examples

 Distributed Grep

 Count of URL Access Frequency

 Reverse Web-Link Graph

 Term-Vector per Host

 Inverted Index

 Distributed Sort
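
 Three of the examples above (Distributed Grep, Reverse Web-Link
 Graph, Inverted Index) can be sketched as map/reduce pairs. The
 helper run, the PATTERN constant, and the sample inputs are
 hypothetical and exist only for this illustration; everything
 executes in one process, so it shows the shape of each computation
 rather than a distributed implementation.

```python
from collections import defaultdict

def run(inputs, mapper, reducer):
    """Hypothetical single-process driver: map, group by key, reduce."""
    groups = defaultdict(list)
    for key, value in inputs:
        for k, v in mapper(key, value):
            groups[k].append(v)
    return {k: reducer(k, vs) for k, vs in sorted(groups.items())}

# Distributed grep: map emits a line if it matches; reduce is the identity.
PATTERN = "error"  # assumed search pattern for the example

def grep_map(filename, line):
    if PATTERN in line:
        yield filename, line

def grep_reduce(filename, lines):
    return lines

# Reverse web-link graph: map emits (target, source) for each link;
# reduce collects all sources that point at one target.
def links_map(source, targets):
    for target in targets:
        yield target, source

def links_reduce(target, sources):
    return sorted(sources)

# Inverted index: map emits (word, document ID); reduce sorts the IDs.
def index_map(doc_id, text):
    for word in set(text.split()):
        yield word, doc_id

def index_reduce(word, doc_ids):
    return sorted(doc_ids)

# Illustrative inputs.
log_lines = [("app.log", "error: disk full"), ("app.log", "ok"),
             ("db.log", "error: timeout")]
pages = [("a.html", ["b.html", "c.html"]), ("b.html", ["c.html"])]
docs = [("d1", "map reduce"), ("d2", "reduce sort")]

grep_out = run(log_lines, grep_map, grep_reduce)
links_out = run(pages, links_map, links_reduce)
index_out = run(docs, index_map, index_reduce)
print(grep_out, links_out, index_out, sep="\n")
```

 Only the mapper and reducer change from one example to the next;
 the grouping step between them is identical.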
Execution Overview
Conclusions

 The MapReduce programming model proved to be a useful
  abstraction for many different purposes
    Easy to use
        even for programmers without experience with
         parallel and distributed systems
    A large variety of problems are easily expressible as
     MapReduce computations
    The implementation scales to large clusters of machines

 Greatly simplifies large-scale computations at Google
Questions?

http://labs.google.com/papers/mapreduce.html
