This document discusses data locality and scaling algorithms to large datasets. It notes that as more processors are added, the problem size should also scale up, per Gustafson's Law. Map-Reduce platforms provide fault tolerance, orchestration, and redundant storage, which let terabytes and petabytes of data be processed smoothly by adding new nodes. The conclusion recommends adopting an appropriate architecture now so that big data can be processed later simply by adding cores or computers.
1. If the Data Cannot Come to
the Algorithm...
Many Cores with Java
Session Four: Data Locality
Copyright 2013 Robert Burrell Donkin, robertburrelldonkin.name
This work is licensed under a Creative Commons Attribution 3.0 Unported License.
5. Gustafson's Law:
S(p) = p - a(p - 1)
where S(p) is the scaled speedup on p processors and a is the serial (non-parallelizable) fraction of the workload.
"... in practice, the problem size scales with the number of processors" (Gustafson, 1988)
11. Commodity hardware scales up to terabyte and petabyte datasets
smoothly by adding new nodes.
Map-Reduce platforms typically provide:
fault tolerance (e.g. retry)
orchestration
redundant data storage
statistical resilience
(see the word-count sketch below)
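To make this concrete, here is a minimal word-count sketch assuming Apache Hadoop as the Map-Reduce platform (the slides do not name a specific one). It uses the standard org.apache.hadoop.mapreduce API; the scheduler tries to run each map task on a node that already holds a replica of its input split, so the algorithm goes to the data.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map task: typically scheduled on a node storing the input split
    // (data locality); emits (word, 1) for each token it reads.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce task: sums the per-word counts after the shuffle.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class); // runs where the data lives
        job.setCombinerClass(IntSumReducer.class); // local pre-aggregation
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

A typical invocation (paths are placeholders): hadoop jar wordcount.jar WordCount /input /output. The platform supplies the fault tolerance listed above: a failed map or reduce task is simply retried on another node holding a redundant copy of the data.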
12. When you want to be able to process big data
tomorrow by adding cores or computers, adopt
an appropriate architecture today.