Pictures at an Exhibition
                                    Ruby, Rails, NoSQL and Big Data
                                                        John Repko

John Repko -- Pikasoft LLC
Agenda

   The Goal: Exploring Big Data with NoSQL and Ruby on Rails
   Just Two Solutions -- Here's How We Get There
         Key-Value Data Stores
               Redis
               Riak


         Document and Column-Family Data Stores
               MongoDB
               Cassandra


         Graph Data Stores
               Neo4J


         MapReduce
               Through Hadoop
               Through Riak / MongoDB
               Through Elastic MapReduce




So How Did We Get to Big Data Anyway?




  Source: https://thedailyload.files.wordpress.com/2010/12/william_perry.jpg   Source: http://www.startribune.com/sports/164830346.html




                      Big Data Is Not Just About Big Data -- It's About FAST Data!
                                       (http://www.pikasoft.com/journal/2011/5/13/not-big-data-fast-data.html)



Why is Everyone Diving into Big Data?


       There Are Big Data Breakthroughs Everywhere




     Watson Wins on Jeopardy
          Beat the best Jeopardy players of all time
          Source: https://newshour.s3.amazonaws.com/photos/2011/02/16/kayjay_1_blog_main_horizontal.jpg

     Google Wins the Search Market
          Massively parallel web searches with results back in a tenth of a second

     Progressive's Instant Overnight Rate Quotes
          Progressive creates an insurance quote for every car and truck in the US -- every night



Exploring Big Data


           Big Data frequently provides solutions to a common set of problems




                     Source: http://www.slideshare.net/cloudera/20100806-cloudera-10-hadoopable-problems-webinar-4931616




                These appear to be 10 Problems but are really only 2 Problems
Exploring Big Data


   The many Big Data wins in the press fall into just two solution patterns


     Foresight
             We are presented with a pattern -- what has the outcome been when we've seen similar patterns in the past?


     Hindsight
             We are presented with an outcome -- what pattern of events anticipated that outcome in the past?




    You Don't Need Dozens Of Solution Approaches For Big Data -- Just Two
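To make the two patterns concrete, here is a toy Ruby sketch over a hypothetical history of past cases. The event names, outcomes and the crude "count the shared events" similarity measure are all made up for illustration; foresight looks up the outcome of the most similar past pattern, and hindsight counts which events most often preceded a given outcome.

```ruby
# Hypothetical history: each past case is a pattern of observed events plus its outcome.
HISTORY = [
  { :events => [:late_payment, :new_card],   :outcome => :default },
  { :events => [:late_payment, :job_change], :outcome => :default },
  { :events => [:on_time, :new_card],        :outcome => :paid },
  { :events => [:on_time, :job_change],      :outcome => :paid }
]

# Foresight: given a new pattern, return the outcome of the most similar past
# pattern. Similarity = number of shared events (a deliberately crude metric).
def foresight(history, events)
  best = history.max_by { |past| (past[:events] & events).size }
  best[:outcome]
end

# Hindsight: given an outcome, rank the events that preceded it most often
# (ties broken alphabetically so the result is deterministic).
def hindsight(history, outcome)
  counts = Hash.new(0)
  history.each do |past|
    next unless past[:outcome] == outcome
    past[:events].each { |event| counts[event] += 1 }
  end
  counts.sort_by { |event, n| [-n, event.to_s] }.map(&:first)
end

p foresight(HISTORY, [:on_time, :new_card, :vacation])  # → :paid
p hindsight(HISTORY, :default)  # → [:late_payment, :job_change, :new_card]
```

Real systems would replace the shared-event count with the predictive models and filtering techniques the deck lists, but the two question shapes stay the same.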
Exploring Big Data

        In this light, let's take a look at the 10 Hadoop-able Problems of Big Data

                        Summary: 10 Common Hadoop-able Problems*

               1. Modeling True Risk
                            What past patterns led to success or default?

               2. Customer Churn Analysis
                            What do customer churn patterns predict about our products and markets?

               3. Recommendation Engine
                            We have search terms -- what have the results been from similar searches in the past?


               4. Ad Targeting
                            We have profile information -- what offers have led to sales for similar profiles in the past?


               5. PoS Transaction Analysis
                            We have your purchase history -- what deals might we offer in the future?


                                          Foresight                                   Hindsight
Exploring Big Data

       These two solution types apply generally to the Hadoop-able problems

                         Summary: 10 Common Hadoop-able Problems

                6. Analyzing Data Logs to Forecast Events
                            We have your logs -- what pattern of events has anticipated failures before?

                7. Threat Analysis
                            We have a specific event -- what results have we seen from similar threats in the past?

                8. Trade Surveillance
                            Does this parcel raise any alarms, based on our history of past parcel-tracking?

                9. Search Quality
                            We have a set of search terms -- what have similar searches succeeded in finding in the past?

                10. Data Sandbox
                            We have your data, possibly unstructured data. What patterns in that data might we bring to your attention now?


                                         Foresight                                 Hindsight
The Big Data Platform Provides Rich Analytics Tools

                             Key Big Data Analytics Solution Patterns

     1.    Predictive Modeling
     2.    Data Visualization
     3.    Cluster Partitioning
     4.    Collaborative Filtering
     5.    Outlier Analysis
     6.    A/B Testing
     7.    Markov Chains
     8.    Bloom Filters
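Bloom filters come up again in the Riak MapReduce example later in the deck. As a hedged sketch of the data structure itself, here is a minimal Ruby Bloom filter: a bit array plus k hashed positions per item, so membership tests can give false positives but never false negatives. The sizes and the MD5-based double hashing are illustrative choices, not taken from the linked articles.

```ruby
require 'digest/md5'

# Minimal Bloom filter sketch. Bit-array size, hash count and the way the k
# positions are derived (double hashing from one MD5 digest) are illustrative.
class BloomFilter
  def initialize(bits = 1024, hashes = 3)
    @bits = bits
    @hashes = hashes
    @array = Array.new(bits, false)
  end

  # Set every position the item hashes to.
  def add(item)
    positions(item).each { |i| @array[i] = true }
    self
  end

  # True if all positions are set: "probably present". False means "definitely absent".
  def include?(item)
    positions(item).all? { |i| @array[i] }
  end

  private

  # Double hashing: position_i = (h1 + i * h2) mod m, using two halves of an MD5 digest.
  def positions(item)
    digest = Digest::MD5.hexdigest(item.to_s)
    h1 = digest[0, 16].to_i(16)
    h2 = digest[16, 16].to_i(16)
    (0...@hashes).map { |i| (h1 + i * h2) % @bits }
  end
end

filter = BloomFilter.new
%w[redis riak mongodb].each { |word| filter.add(word) }
p filter.include?("riak")       # → true
p filter.include?("cassandra")  # almost certainly false, but false positives are possible
```

The trade-off is space for certainty: the filter never stores the items themselves, which is why it suits a MapReduce job that only needs a fast "have we seen this?" check.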




Exploring Big Data




                    With Just Two Standard Solution Models We Can
                             Solve Most Big Data Problems

                       The Key Is To Shape Big Data Into A Standard
                         Platform Onto Which We Can Apply These
                                     Analytics Tools


                                              "It is not the technology that creates a competitive edge, but the
                                              management process that exploits technology."
                                              ~ Shaping the Future, Peter Keen (1991)




Agenda

   The Goal: Exploring Big Data
   Just Two Solutions -- Here's How We Get There
         Key-Value Data Stores
               Redis
               Riak


         Document and Column-Family Data Stores
               MongoDB
               Cassandra


         Graph Data Stores
               Neo4J


         MapReduce
               Through Hadoop
               Through Riak / MongoDB
               Through Elastic MapReduce




The Core Development Platform


           Core Platform: Ubuntu 12.04 + AWS

           Clean install of Ubuntu 12.04 and all the latest updates:

               sudo apt-get update
               sudo apt-get upgrade
               sudo apt-get dist-upgrade

           Build tools and libraries needed to compile Ruby:

               sudo apt-get install build-essential openssl libreadline6 libreadline6-dev curl git-core \
                 zlib1g zlib1g-dev libyaml-dev libsqlite3-0 libsqlite3-dev sqlite3 libxml2-dev libxslt-dev \
                 autoconf libc6-dev ncurses-dev automake libtool bison subversion
               sudo apt-get install libcurl3 libcurl3-gnutls libcurl4-openssl-dev

           RVM, Ruby 1.9.2 and a Rails 3.1 gemset (the Ruby must be installed before RVM can switch to it):

               bash -s stable < <(curl -s https://raw.github.com/wayneeseguin/rvm/master/binscripts/rvm-installer)
               source ~/.bashrc
               rvm install ruby-1.9.2-p290
               rvm use ruby-1.9.2-p290@rails31 --create --default
               gem update --system    # reported that the latest version was already installed

           A JavaScript runtime for the Rails 3.1 asset pipeline, then Rake and Rails:

               sudo apt-get install nodejs
               gem install rake
               gem install rails -v 3.1.3


Agenda

   The Goal: Exploring Big Data

   Just Two Solutions -- Here's How We Get There

         Key-Value Data Stores
               Redis
               Riak

         Document and Column-Family Data Stores
               MongoDB
               Cassandra

         Graph Data Stores
               Neo4J


         MapReduce
               Through Hadoop
               Through Riak
               Through Elastic MapReduce




Redis
                                                                                                             Source: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis

          Example:
                http://www.pikasoft.com/journal/2011/1/2/a-quick-redis-key-value-example-for-the-holidays.html

          Backing Articles:
                http://purevirtual.de/2010/04/url-shortener-with-redis-and-rails3/

          Code:
                http://www.pikasoft.com/journal/2011/1/2/a-quick-redis-key-value-example-for-the-holidays.html


        The good news is that we already have our base image, so adding a new Redis data store and example
        app to it took only about an hour. As before, you can play with the URL shortener at Redis URL
        Shortener, and you can download and play with the application's code at Redis URL Shortener
        Source Code.


                             Play with this online at:
                       http://jkr-blog.dyndns.org:3001/mini_urls
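A Redis URL shortener can be built on three commands: INCR to allocate a new id atomically (so two app servers never hand out the same code), SET to store the long URL under its short code, and GET to look it up. The sketch below assumes hypothetical key names (`mini_url:next_id`, `mini_url:<code>`) and base-36 codes; it is not the code from the linked post. `FakeRedis` is a hypothetical in-memory stand-in that answers `incr`, `set` and `get` the way the redis-rb gem does, so a real `Redis.new` could be passed in its place.

```ruby
# Hypothetical in-memory stand-in for a redis-rb connection. Like Redis, it
# stores strings; incr returns the new integer value, set returns "OK".
class FakeRedis
  def initialize
    @data = {}
  end

  def incr(key)
    value = @data.fetch(key, "0").to_i + 1
    @data[key] = value.to_s
    value
  end

  def set(key, value)
    @data[key] = value.to_s
    "OK"
  end

  def get(key)
    @data[key]
  end
end

class UrlShortener
  COUNTER_KEY = "mini_url:next_id" # hypothetical key name

  def initialize(redis)
    @redis = redis
  end

  # INCR hands out a unique id; base 36 keeps the code short; SET stores the mapping.
  def shorten(long_url)
    code = @redis.incr(COUNTER_KEY).to_s(36)
    @redis.set("mini_url:#{code}", long_url)
    code
  end

  # GET returns the original URL, or nil for an unknown code.
  def expand(code)
    @redis.get("mini_url:#{code}")
  end
end

shortener = UrlShortener.new(FakeRedis.new)
code = shortener.shorten("http://www.pikasoft.com/journal/")
puts code                    # prints "1"
puts shortener.expand(code)  # prints "http://www.pikasoft.com/journal/"
```

Keeping the store behind a duck-typed object is a design choice for the sketch: the shortener logic can be exercised without a running Redis server.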




Riak
                                                                                Source: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis

          Example:
                http://www.pikasoft.com/journal/2012/1/15/you-only-live-twice-basho-and-riak.html

          Backing Articles:
                http://jit.nuance9.com/2010/07/ruby-192-rails-3-riak-and-ripple.html
                http://jbbarth.com/archives/2011/4/23/basic_usage_of_riak_in/

          Code:
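What sets Riak apart from a single-server key-value store like Redis is how it spreads keys across a cluster: bucket/key pairs are hashed with SHA-1 onto a 160-bit ring split into partitions (64 by default), and each value is replicated to N partitions (3 by default). The Ruby below is a simplified model of that placement, not Riak's code or the riak-client API. Assumptions: real Riak hashes the Erlang term {Bucket, Key} rather than a string, its partition-selection details differ, and its node-claim algorithm is more involved than the round-robin used here.

```ruby
require 'digest/sha1'

# Simplified model of Riak's consistent-hashing ring (see assumptions above).
class HashRing
  RING_TOP = 2**160 # SHA-1 hashes are 160 bits, so the ring spans 0...2**160

  def initialize(nodes, ring_size = 64, n_val = 3)
    @ring_size = ring_size
    @n_val = n_val
    # Partitions are claimed by physical nodes round-robin in this sketch.
    @owners = Array.new(ring_size) { |i| nodes[i % nodes.size] }
  end

  # Index of the partition whose slice of the ring contains the key's hash.
  def partition_for(bucket, key)
    hash = Digest::SHA1.hexdigest("#{bucket}/#{key}").to_i(16)
    hash / (RING_TOP / @ring_size)
  end

  # Preference list: the key's partition plus the next n_val - 1 around the
  # ring, each paired with its owning node. Replicas are written to all of them.
  def preference_list(bucket, key)
    first = partition_for(bucket, key)
    (0...@n_val).map do |offset|
      index = (first + offset) % @ring_size
      [index, @owners[index]]
    end
  end
end

ring = HashRing.new(%w[riak1 riak2 riak3 riak4 riak5])
p ring.preference_list("mini_urls", "abc123")
```

Because placement depends only on the hash, any node can compute where a key lives without asking a master, and adding nodes moves only some partitions.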




Agenda

   The Goal: Exploring Big Data
   Just Two Solutions -- Here's How We Get There
         Key-Value Data Stores
               Redis
               Riak


         Document and Column-Family Data Stores
               MongoDB
               Cassandra


         Graph Data Stores
               Neo4J


         MapReduce
               Through Hadoop
               Through Riak / MongoDB
               Through Elastic MapReduce




MongoDB
                                                                                                                 Source: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
          Example:
                http://www.pikasoft.com/journal/2010/7/31/nosql-on-the-cloud-our-first-application.html

          Backing Articles:
                http://www.mongodb.org/display/DOCS/Building+for+Linux

          Code:
                http://www.pikasoft.com/journal/2010/8/16/why-our-little-nosql-app-matters.html
   So let's sum up -- after a handful of posts and a small but still sorrowful amount of command-line and Rails code,
   we've managed to accomplish the following "Hello World" tasks in NoSQL on the cloud:

   Created a cloud account
   Got our first app created, and saw it in a browser on the web
   Loaded up real development environments (Ruby/Rails we added, Java we got for free)
   Added a stronger app server (thin >> webrick) and a stronger web server (nginx >> almost anything)
   Added our first NoSQL data store (MongoDB) and mapping software to simulate ActiveRecord in NoSQL
   Created a little NoSQL app to show all this, and made it visible through a dynamic DNS address:
   Rails Mongo Notes Example

   Just to wrap the little app up: I updated John Nunemaker's MongoMapper demo app to work with Rails 3 and the
   cloud, and if you like you can take a look at the code for it here: Rails Mongo Code.



                                     Play with this online at:
                                 http://jkr-code.dyndns.org:3000/notes
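The notes app stores each note as a schemaless document rather than a table row. As a hedged, driver-free illustration (this is neither MongoMapper's nor the mongo gem's API), here is a tiny query-by-example matcher over plain Ruby hashes. It mimics one real MongoDB rule: an equality condition on an array field matches when any element of the array equals the value. The note documents are hypothetical.

```ruby
# Hypothetical note documents. They need not share a schema: the third has no tags.
NOTES = [
  { "title" => "NoSQL on the cloud",  "tags" => ["nosql", "mongodb"], "body" => "Our first app" },
  { "title" => "Redis URL shortener", "tags" => ["nosql", "redis"] },
  { "title" => "Rails 3 upgrade" }
]

# Query-by-example: every field in the query must match the document.
# For array fields, a scalar value matches if any element equals it.
def matches?(doc, query)
  query.all? do |field, expected|
    actual = doc[field]
    if actual.is_a?(Array)
      actual.include?(expected) || actual == expected
    else
      actual == expected
    end
  end
end

p NOTES.select { |note| matches?(note, "tags" => "mongodb") }.map { |note| note["title"] }
# → ["NoSQL on the cloud"]
```

A missing field simply fails to match, which is what lets documents with different shapes live side by side in one collection.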

Cassandra
                                                                                                         Source: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
          Example:
                http://www.pikasoft.com/journal/2011/2/14/casi-casi-cassandra.html
          Backing Articles:
                http://www.25hoursaday.com/weblog/2008/05/23/SomeThoughtsOnTwittersAvailabilityProblems.aspx
          Code:

     Here's what the code for that broadcast might look like:

     # Tweeter
     class Tweeter < ActiveRecord::Base
       has_many :followers
     end

     class Follower < ActiveRecord::Base
       belongs_to :tweeter
     end

     All fine so far -- that's the twittery world we all live in. I can send out my breathless message about
     what I had for breakfast, and Twitter picks it up and broadcasts it from me (along with all the messages
     from the other tweeters):

     @tweeters = Tweeter.all
     @tweeters.each do |tweeter|
       @followers = tweeter.followers   # a separate query for every tweeter
       @followers.each do |follower|
         # broadcast_to is the example's hypothetical delivery method
         tweeter.broadcast_to :recipient => follower
       end
     end

     So here we run one query for all X tweeters, then one more query per tweeter for its followers, then a
     broadcast to each of their Y followers: X + 1 queries and X * Y deliveries every time we broadcast.

     Code smell! Fail Whale!!!
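One common way out with a wide-row store like Cassandra is to denormalize: do the fan-out once at write time, copying each tweet into every follower's timeline row, so reading a timeline becomes a single row lookup. The sketch below is plain Ruby with hashes standing in for a column family (row key = follower, columns = timeline entries); it is not a Cassandra client API, and `TimelineStore` and its methods are hypothetical.

```ruby
# Hypothetical fan-out-on-write sketch; hashes stand in for Cassandra rows.
class TimelineStore
  def initialize
    @followers = Hash.new { |h, k| h[k] = [] } # tweeter  => [follower, ...]
    @timelines = Hash.new { |h, k| h[k] = [] } # follower => [[time, tweeter, text], ...]
  end

  def follow(follower, tweeter)
    @followers[tweeter] << follower
  end

  # Write path: copy the tweet into each follower's timeline row, once, at post time.
  def post(tweeter, text, time)
    @followers.fetch(tweeter, []).each do |follower|
      @timelines[follower] << [time, tweeter, text]
    end
  end

  # Read path: one row lookup, newest first, with no per-tweeter queries.
  def timeline(follower, limit = 20)
    @timelines.fetch(follower, []).sort_by { |entry| -entry.first }.first(limit)
  end
end

store = TimelineStore.new
store.follow("alice", "john")
store.follow("bob", "john")
store.post("john", "What I had for breakfast", 1)
store.post("john", "Lunch, too", 2)
p store.timeline("alice")
# → [[2, "john", "Lunch, too"], [1, "john", "What I had for breakfast"]]
```

The trade-off is more writes and duplicated data in exchange for cheap, predictable reads, which suits a store built for fast writes.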




Agenda

   Exploring Big Data

   Just Two Solutions -- Here's How We Get There

         Key-Value Data Stores
               Redis
               Riak


         Document and Column-Family Data Stores
               MongoDB
               Cassandra


         Graph Data Stores
               Neo4J


         MapReduce
               Through Hadoop
               Through Riak / MongoDB
               Through Elastic MapReduce



Neo4J
                                                                            Source: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis

          Example:
                http://www.pikasoft.com/journal/2011/1/21/graph-databases-and-star-wars.html
          Backing Articles:
                http://purevirtual.de/2010/04/url-shortener-with-redis-and-rails3/
          Code:


                             Play with Six Degrees of Kevin Bacon online at:
                                 http://jkr-blog.dyndns.org:9292/
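Six Degrees of Kevin Bacon is a shortest-path question, which is exactly what a graph database traverses natively. To show the idea without a Neo4j server, here is a breadth-first search in plain Ruby; the co-star graph is made up for illustration (the actor names are placeholders), and Neo4j would store these as nodes and relationships rather than a Hash.

```ruby
# Hypothetical co-star graph: actor => actors they appeared with. Edges are made up.
COSTARS = {
  "Kevin Bacon" => ["Actor A", "Actor B"],
  "Actor A"     => ["Kevin Bacon", "Actor C"],
  "Actor B"     => ["Kevin Bacon"],
  "Actor C"     => ["Actor A", "Actor D"],
  "Actor D"     => ["Actor C"],
  "Actor E"     => []
}

# Breadth-first search: the first time we reach the target, the path is a
# shortest one. Returns the path as an array of names, or nil if unconnected.
def bacon_path(graph, from, to = "Kevin Bacon")
  previous = { from => nil } # node => node we reached it from
  queue = [from]
  until queue.empty?
    current = queue.shift
    if current == to
      path = []
      while current # walk back to the start, whose predecessor is nil
        path.unshift(current)
        current = previous[current]
      end
      return path
    end
    graph.fetch(current, []).each do |neighbor|
      next if previous.key?(neighbor) # already visited
      previous[neighbor] = current
      queue << neighbor
    end
  end
  nil
end

path = bacon_path(COSTARS, "Actor D")
puts "#{path.join(' -> ')} (Bacon number #{path.size - 1})"
# prints "Actor D -> Actor C -> Actor A -> Kevin Bacon (Bacon number 3)"
```

In a relational schema the same query becomes one self-join per degree of separation; in a graph store each hop is a pointer chase.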




Agenda

   Exploring Big Data

   Just Two Solutions -- Here's How We Get There

         Key-Value Data Stores
               Redis
               Riak

         Document and Column-Family Data Stores
               MongoDB
               Cassandra

         Graph Data Stores
               Neo4J


         MapReduce
               Through Hadoop
               Through Riak
               Through Elastic MapReduce




MapReduce via Hadoop, Thrift and AWS

          Example:
                http://www.pikasoft.com/journal/2011/1/9/nosql-next-up-hadoop-and-cloudera.html

          Backing Articles:
                http://www.joelonsoftware.com/items/2006/08/01.html

          [Diagram: the Map and Reduce phases]

          Code:
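The Map/Reduce diagram boils down to three phases. Here is a minimal in-process Ruby sketch of them using word count, the classic example; it models the programming model, not Hadoop's API. Hadoop runs the same phases across a cluster, and with Hadoop Streaming the map and reduce steps can be any executables (Ruby scripts included) that read lines from STDIN and write tab-separated key/value lines to STDOUT.

```ruby
# Map: each input line becomes a list of [key, value] pairs.
def map_phase(line)
  line.downcase.scan(/\w+/).map { |word| [word, 1] }
end

# Shuffle/sort: group every emitted value by key. The framework does this
# between the phases, handing keys to the reducers in sorted order.
def shuffle_phase(pairs)
  groups = Hash.new { |h, k| h[k] = [] }
  pairs.each { |key, value| groups[key] << value }
  groups.sort
end

# Reduce: collapse each key's list of values into a single result.
def reduce_phase(key, values)
  [key, values.inject(0) { |sum, v| sum + v }]
end

def word_count(lines)
  pairs = lines.flat_map { |line| map_phase(line) }
  shuffle_phase(pairs).map { |key, values| reduce_phase(key, values) }
end

p word_count(["Big data", "fast data"])
# → [["big", 1], ["data", 2], ["fast", 1]]
```

Because map looks at one line at a time and reduce at one key at a time, both phases can be split across as many machines as the data needs.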




MapReduce via Riak / MongoDB

       Example:
              http://www.control-alt-del.org/2011/09/14/fun-with-bloom-filters-using-riak-mapreduce/
              http://verboselogging.com/2010/03/22/super-mongodb-mapreduce-max-out
       Backing Articles:
          MapReduce on Riak
                     http://wiki.basho.com/MapReduce.html
                     http://stackoverflow.com/questions/2123004/mapreduce-with-riak
                     http://www.readwriteweb.com/hack/2011/06/riak-pipe-rethinks-its-mapreduce.php
                     http://www.quora.com/What-are-the-advantages-and-limitations-of-MapReduce-backed-by-distributed-key-value-store
                      Riak
          MapReduce on MongoDB
                     http://dllhell.net/2010/07/17/on-mapreduce-in-mongodb/
                     http://www.mongodb.org/display/DOCS/MapReduce
                     http://jonathanhui.com/mongodb-mapreduce
                     http://blog.boxedice.com/2010/06/21/map-reduce-and-mongodb/
                                                                                Source: http://blog.boxedice.com/2010/06/21/map-reduce-and-mongodb/
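The main difference from the Hadoop picture is "re-reduce": MongoDB may call a reduce function on a key's values in batches and then reduce the partial results again, so reduce must return values shaped exactly like the ones map emits, and the operation must be associative (Riak's reduce phases carry a similar requirement). Here is a hedged Ruby sketch of that contract, computing a per-page maximum in the spirit of the "max out" example; the page-view data and function names are hypothetical, and real MongoDB map/reduce functions are written in JavaScript.

```ruby
# Hypothetical page views with response times in milliseconds.
VIEWS = [
  { :page => "/home",  :ms => 120 },
  { :page => "/home",  :ms => 340 },
  { :page => "/about", :ms => 80 },
  { :page => "/home",  :ms => 95 }
]

# Map: emit [key, value] pairs. The value is wrapped in a hash so that map and
# reduce produce the same shape.
def map_views(views)
  views.map { |view| [view[:page], { "max" => view[:ms] }] }
end

# Reduce: takes values shaped like map's output and returns the same shape,
# so its result can be fed back into itself (re-reduce).
def reduce_max(_key, values)
  { "max" => values.map { |value| value["max"] }.max }
end

home = map_views(VIEWS).select { |page, _| page == "/home" }.map { |_, value| value }
all_at_once = reduce_max("/home", home)
batched = reduce_max("/home", [reduce_max("/home", home[0, 2]),
                               reduce_max("/home", home[2..-1])])
p all_at_once == batched # → true
```

A reduce that returned a bare number instead of `{ "max" => ... }` would break the moment the server fed its output back in, which is the classic map-reduce bug the backing articles warn about.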




Elastic MapReduce

          Example:
                http://www.commoncrawl.org/mapreduce-for-the-masses/
          Backing Articles:
                http://www.commoncrawl.org/mapreduce-for-the-masses/
          Code:




Summary




                    This Is Only The Beginning. With A
                Standard Platform We'll See Richer Big Data
                       Discoveries Become Routine

                             The Solution Tools (Key Big Data Analytics Solution Patterns) Become
                             Straightforward if We Run Them on a
                                    Standard Architecture
                                                        One man's noise is another man's data.
                                                        ~ Bill Stensrud - InstantEncore




Contacts



           John Repko:               john.repko@pikasoft.com




                    http://pikasoft.s3.amazonaws.com/Pictures_at_an_Exhibition.pptx



