This document provides an agenda for a presentation titled "Pictures at an Exhibition: Ruby, Rails, NoSQL and Big Data". The presentation explores solving big data problems using NoSQL databases and Ruby on Rails. It discusses key-value, document, and graph databases as well as MapReduce. Examples and code snippets are provided for Redis, Riak, MongoDB, Cassandra, Neo4J, and using MapReduce with Hadoop, Riak/MongoDB, and Elastic MapReduce. The goal is to show how big data problems typically have one of two solution patterns: using past patterns to predict the future (foresight) or using past events to explain current outcomes (hindsight).
1. Pictures at an Exhibition
Ruby, Rails, NoSQL and Big Data
John Repko
John Repko -- Pikasoft LLC
2. Agenda
The Goal: Exploring Big Data with NoSQL and Ruby on Rails
Just Two Solutions: Here's How We Get There
Key-Value Data Stores
Redis
Riak
Document Data Stores
MongoDB
Cassandra
Graph Data Stores
Neo4J
MapReduce
Through Hadoop
Through Riak / MongoDB
Through Elastic MapReduce
3. So How Did We Get to Big Data Anyway?
Source: https://thedailyload.files.wordpress.com/2010/12/william_perry.jpg Source: http://www.startribune.com/sports/164830346.html
Big Data Is Not Just About Big Data: It's About FAST Data!
(http://www.pikasoft.com/journal/2011/5/13/not-big-data-fast-data.html)
4. Why is Everyone Diving into Big Data?
There Are Big Data Breakthroughs Everywhere
Google Wins the Search Market
Massively parallel web searches with results back in a tenth of a second
Progressive's Instant Overnight Rate Quotes
Progressive creates an insurance quote for every car and truck in the US every night
Watson Wins on Jeopardy
Beat the best Jeopardy players of all time
Source: https://newshour.s3.amazonaws.com/photos/2011/02/16/kayjay_1_blog_main_horizontal.jpg
5. Exploring Big Data
Big Data frequently provides solutions to a common set of problems
Source: http://www.slideshare.net/cloudera/20100806-cloudera-10-hadoopable-problems-webinar-4931616
These appear to be 10 Problems but are really only 2 Problems
6. Exploring Big Data
The variety of Big Data wins in the press falls into just two solution patterns
Foresight
We are presented with a pattern: what has the outcome been when we've seen similar patterns in the past?
Hindsight
We are presented with an outcome: what pattern of events anticipated that outcome in the past?
You Don't Need Dozens Of Solution Approaches For Big Data: Just Two
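The two patterns can be sketched in a few lines of plain Ruby. This is only an illustration: the Event struct, the HISTORY records, and the pattern and outcome names are invented for the example and do not come from the talk.

```ruby
# Hypothetical history: each record pairs an observed pattern with its outcome.
Event = Struct.new(:pattern, :outcome)

HISTORY = [
  Event.new(%w[late_payment high_balance], :default),
  Event.new(%w[late_payment high_balance], :default),
  Event.new(%w[late_payment high_balance], :repaid),
  Event.new(%w[on_time low_balance],       :repaid)
].freeze

# Count how often each item occurs in a list.
def count_items(items)
  items.each_with_object(Hash.new(0)) { |item, counts| counts[item] += 1 }
end

# Foresight: given a pattern, tally the outcomes that followed it in the past.
def foresight(history, pattern)
  count_items(history.select { |e| e.pattern == pattern }.map(&:outcome))
end

# Hindsight: given an outcome, tally the patterns that preceded it in the past.
def hindsight(history, outcome)
  count_items(history.select { |e| e.outcome == outcome }.map(&:pattern))
end

p foresight(HISTORY, %w[late_payment high_balance])
p hindsight(HISTORY, :repaid)
```

Both functions scan the same history; they differ only in which field is the question and which is the answer.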
7. Exploring Big Data
In this light, let's take a look at the 10 Hadoop-able Problems of Big Data
Summary: 10 Common Hadoop-able Problems*
1. Modeling True Risk
What past patterns led to success or default?
2. Customer Churn Analysis
What do customer churn patterns predict about our products and markets?
3. Recommendation Engine
We have search terms: what have the results been from similar searches in the past?
4. Ad Targeting
We have profile information: what offers have led to sales for similar profiles in the past?
5. PoS Transaction Analysis
We have your purchase history: what deals might we offer in the future?
Foresight Hindsight
8. Exploring Big Data
These two solution types apply generally to the Hadoop-able problems
Summary: 10 Common Hadoop-able Problems
6. Analyzing Data Logs to Forecast Events
We have your logs: what patterns of events have anticipated failures before?
7. Threat Analysis
We have a specific event: what results have we seen from similar threats in the past?
8. Trade Surveillance
Does this parcel raise any alarms, based on our history of past parcel-tracking?
9. Search Quality
We have a set of search terms: what have similar searches succeeded in finding in the past?
10. Data Sandbox
We have your data, possibly unstructured data. What patterns in that data might we bring to your attention now?
Foresight Hindsight
9. The Big Data Platform Provides Rich Analytics Tools
Key Big Data Analytics Solution Patterns
1. Predictive Modeling
2. Data Visualization
3. Cluster Partitioning
4. Collaborative Filtering
5. Outlier Analysis
6. A/B Testing
7. Markov Chains
8. Bloom Filters
10. Exploring Big Data
With Just Two Standard Solution Models We Can Solve Most Big Data Problems
The Key Is To Shape Big Data Into A Standard Platform Onto Which We Can Apply These Analytics Tools
"It is not the technology that creates a competitive edge, but the management process that exploits technology."
~ Shaping the Future, Peter Keen (1991)
14. Redis
Source: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
Example:
http://www.pikasoft.com/journal/2011/1/2/a-quick-redis-key-value-example-for-the-holidays.html
Backing Articles:
http://purevirtual.de/2010/04/url-shortener-with-redis-and-rails3/
Code:
http://www.pikasoft.com/journal/2011/1/2/a-quick-redis-key-value-example-for-the-holidays.html
The good news is, we've already got our base image, and adding a new Redis data store and
example app to it only took about an hour. As before, you can play with the URL shortener at "Redis
URL Shortener", and you can download and play with the code for the application at "Redis URL
Shortener Source Code".
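As a rough illustration of why a key-value store suits a URL shortener, here is a minimal sketch. It is not the code from the linked article: the MiniUrl class, the key names, and the base-36 short codes are assumptions made for this example. MemoryStore is a hypothetical in-memory stand-in so the sketch runs without a server; a client from the redis gem (Redis.new) responds to the same three calls (incr, set, get) and could be passed in its place.

```ruby
# In-memory stand-in exposing the three calls used below (incr, set, get),
# so the sketch runs without a Redis server. Hypothetical helper.
class MemoryStore
  def initialize
    @data = {}
  end

  # Increment the integer stored at key (starting from 0) and return it.
  def incr(key)
    @data[key] = @data.fetch(key, "0").to_i.succ.to_s
    @data[key].to_i
  end

  def set(key, value)
    @data[key] = value.to_s
    "OK"
  end

  def get(key)
    @data[key]
  end
end

class MiniUrl
  def initialize(store)
    @store = store
  end

  # Allocate the next counter value and use its base-36 form as the short code.
  def shorten(long_url)
    code = @store.incr("mini_url:next_id").to_s(36)
    @store.set("mini_url:#{code}", long_url)
    code
  end

  # Return the original URL, or nil if the code is unknown.
  def expand(code)
    @store.get("mini_url:#{code}")
  end
end

shortener = MiniUrl.new(MemoryStore.new)
code = shortener.shorten("http://www.pikasoft.com/journal/")
puts "#{code} -> #{shortener.expand(code)}"
```

The whole application reduces to one counter and one key per short code, which is why the lookup stays a single read however many URLs are stored.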
Play with this online at:
http://jkr-blog.dyndns.org:3001/mini_urls
17. MongoDB
Source: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
Example:
http://www.pikasoft.com/journal/2010/7/31/nosql-on-the-cloud-our-first-application.html
Backing Articles:
http://www.mongodb.org/display/DOCS/Building+for+Linux
Code:
http://www.pikasoft.com/journal/2010/8/16/why-our-little-nosql-app-matters.html
So let's sum up -- after a handful of posts and a small but still sorrowful amount of command-line and Rails code,
we've managed to accomplish the following "Hello World" tasks in NoSQL on the cloud:
Created a cloud account
Got our first app created, and saw it in a browser on the web
Loaded up real development environments (Ruby/Rails we added, Java we got for free)
Added a stronger app server (thin >> webrick) and a stronger web server (nginx >> almost anything)
Added our first NoSQL data store (MongoDB) and mapping software to simulate ActiveRecord in NoSQL
Created a little NoSQL app to show all this, and made it visible through a dynamic DNS address:
Rails Mongo Notes Example
Just to wrap the little app up: I updated John Nunemaker's MongoMapper demo app to work with Rails 3 and the
cloud, and if you like you can take a look at the code for it here: Rails Mongo Code.
Play with this online at:
http://jkr-code.dyndns.org:3000/notes
18. Cassandra
Source: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
Example:
http://www.pikasoft.com/journal/2011/2/14/casi-casi-cassandra.html
Backing Articles:
http://www.25hoursaday.com/weblog/2008/05/23/SomeThoughtsOnTwittersAvailabilityProblems.aspx
Code:
Here's what the code for that broadcast might look like:
    # Tweeter
    class Tweeter < ActiveRecord::Base
      has_many :followers
    end

    class Follower < ActiveRecord::Base
      belongs_to :tweeter
    end
All fine so far -- that's the twittery world we all live in. I can send out my breathless message of what
I had for breakfast, and then Twitter picks it up and broadcasts the message from me (and all the
messages from the other tweeters):
    # One followers lookup per tweeter, then one broadcast per follower.
    # broadcast_to is an illustrative method, not part of Rails.
    @tweeters = Tweeter.all
    @tweeters.each do |tweeter|
      @followers = tweeter.followers
      @followers.each do |follower|
        tweeter.broadcast_to :recipient => follower
      end
    end
So here we're going to do a query for each of the X tweeters, and for each of them we'll do another query for
each of their Y followers.
Code smell! Fail Whale!!!
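To make the cost of the nested loop concrete, the sketch below replays it with plain Ruby objects and counts the work done. No database is involved: FakeTweeter, run_broadcast, and the counter names are hypothetical stand-ins for the models on this slide, and the counts only approximate what a real ORM would issue.

```ruby
# Plain-Ruby stand-in for the Tweeter model; counts work instead of doing it.
# FakeTweeter and the stats keys are hypothetical names for illustration.
class FakeTweeter
  def initialize(followers, stats)
    @followers = followers
    @stats = stats
  end

  # Each call stands in for one followers lookup against the data store.
  def followers
    @stats[:follower_queries] += 1
    @followers
  end

  # Each call stands in for delivering one message to one follower.
  def broadcast_to(recipient:)
    @stats[:broadcasts] += 1
  end
end

# Run the nested loop for X tweeters with Y followers each; return the counts.
def run_broadcast(tweeter_count, followers_each)
  stats = Hash.new(0)
  tweeters = Array.new(tweeter_count) do
    FakeTweeter.new(Array.new(followers_each) { |j| "follower#{j}" }, stats)
  end

  tweeters.each do |tweeter|
    tweeter.followers.each do |follower|
      tweeter.broadcast_to(recipient: follower)
    end
  end
  stats
end

p run_broadcast(10, 100)
```

With 10 tweeters and 100 followers each, the loop performs 10 follower lookups and 1,000 broadcasts. The total grows as X times Y, which is the smell the slide is pointing at.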
20. Neo4J
Source: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
Example:
http://www.pikasoft.com/journal/2011/1/21/graph-databases-and-star-wars.html
Backing Articles:
http://purevirtual.de/2010/04/url-shortener-with-redis-and-rails3/
Code
Play with this online at:
Six Degrees of Kevin Bacon = http://jkr-blog.dyndns.org:9292/
22. MapReduce via Hadoop, Thrift and AWS
Example:
http://www.pikasoft.com/journal/2011/1/9/nosql-next-up-hadoop-and-cloudera.html
Backing Articles:
http://www.joelonsoftware.com/items/2006/08/01.html
Code:
[Figure: Map and Reduce steps]
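The Map and Reduce steps can be sketched in plain Ruby with the classic word count. This runs in a single process and is not Hadoop code: the function names are invented for the example, and shuffle imitates the grouping that the framework performs between the two phases.

```ruby
# Map: emit a [word, 1] pair for every word in one input line.
def map_line(line)
  line.downcase.scan(/[a-z']+/).map { |word| [word, 1] }
end

# Shuffle: group emitted values by key, as the framework would between phases.
def shuffle(pairs)
  pairs.each_with_object(Hash.new { |h, k| h[k] = [] }) do |(key, value), groups|
    groups[key] << value
  end
end

# Reduce: collapse all values for one key into a single count.
def reduce_counts(word, counts)
  [word, counts.sum]
end

# Wire the three phases together for a list of input lines.
def word_count(lines)
  pairs = lines.flat_map { |line| map_line(line) }
  shuffle(pairs).map { |word, counts| reduce_counts(word, counts) }.to_h
end

lines = ["Big data is fast data", "fast data wins"]
p word_count(lines)
```

Because map_line looks at one line at a time and reduce_counts at one key at a time, each phase can be spread across many machines, which is what a MapReduce cluster does at scale.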
23. MapReduce via Riak / MongoDB
Example:
http://www.control-alt-del.org/2011/09/14/fun-with-bloom-filters-using-riak-mapreduce/
http://verboselogging.com/2010/03/22/super-mongodb-mapreduce-max-out
Backing Articles:
MapReduce on Riak
http://wiki.basho.com/MapReduce.html
http://stackoverflow.com/questions/2123004/mapreduce-with-riak
http://www.readwriteweb.com/hack/2011/06/riak-pipe-rethinks-its-mapreduce.php
http://www.quora.com/What-are-the-advantages-and-limitations-of-MapReduce-backed-by-distributed-key-value-store
Riak
MapReduce on MongoDB
http://dllhell.net/2010/07/17/on-mapreduce-in-mongodb/
http://www.mongodb.org/display/DOCS/MapReduce
http://jonathanhui.com/mongodb-mapreduce
http://blog.boxedice.com/2010/06/21/map-reduce-and-mongodb/
Source: http://blog.boxedice.com/2010/06/21/map-reduce-and-mongodb/
25. Summary
This Is Only The Beginning. With A Standard Platform We'll See Richer Big Data Discoveries Become Routine
The Solution Tools (Slide 9) Become Straightforward if We Run Them on a Standard Architecture
"One man's noise is another man's data."
~ Bill Stensrud - InstantEncore
26. Contacts
John Repko: john.repko@pikasoft.com
http://pikasoft.s3.amazonaws.com/Pictures_at_an_Exhibition.pptx