2. What is big data
Tomaj's definition
Data that either does not fit on a single machine, or cannot be
processed in real time on a single machine
Monday, July 25, 11
3. Why does it matter?
There is more and more data all the time
Web 2.0 - the social aspect of the web drives the creation of a huge
amount of usable data
A simple example: Facebook
135 billion messages per month
20 billion events per day - roughly 200,000 per second
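As a quick sanity check of the Facebook figures, the per-second rate follows from dividing the daily total by the number of seconds in a day. The sketch below assumes that "billion" means 10^9 and that events are spread evenly over the day; the variable names are illustrative only.

```python
# Assumption: "billion" = 10**9 and events arrive at a uniform rate.
EVENTS_PER_DAY = 20 * 10**9
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

events_per_second = EVENTS_PER_DAY / SECONDS_PER_DAY
print(f"{events_per_second:,.0f} events per second")
```

The result is a little over 230,000 events per second, the same order of magnitude as the rounded 200,000 quoted on the slide.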
4. Facebook's growth
[Chart: new data per day (GB); y-axis from 0 to 4,000; x-axis points March 2008, April 2009, October 2009]
5. How to store big data
SQL databases suffer from a fundamental scalability problem
NoSQL - easy to scale - well suited to big data
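One common reason key-value style stores scale out easily is that records can be partitioned by key across machines. The following is a minimal sketch of hash-based partitioning, not the mechanism of any particular NoSQL product; `ShardedStore`, its methods, and the node count are hypothetical names chosen for illustration, and plain dictionaries stand in for separate machines.

```python
import hashlib
from typing import Optional


class ShardedStore:
    """Toy key-value store that spreads keys over several in-memory 'nodes'."""

    def __init__(self, node_count: int) -> None:
        # Each dict stands in for a separate machine (an assumption of this sketch).
        self.nodes = [dict() for _ in range(node_count)]

    def _node_for(self, key: str) -> dict:
        # A stable hash keeps the key-to-node mapping identical across runs.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def put(self, key: str, value: str) -> None:
        self._node_for(key)[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._node_for(key).get(key)


store = ShardedStore(node_count=4)
for user_id in range(1000):
    store.put(f"user:{user_id}", f"profile {user_id}")

print([len(node) for node in store.nodes])  # keys spread over the four nodes
print(store.get("user:42"))
```

With plain modulo placement, changing the number of nodes moves most keys to a different node; real systems typically use consistent hashing or a similar scheme to limit that movement.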
7. NoSQL
Several types
document-oriented, column-oriented, graph-oriented, key-value
High performance
Limited capabilities compared with SQL databases
No standard for working with the data
In practice, combining NoSQL with SQL has proven effective
8. Google
MapReduce
In 2004 Google published the paper:
MapReduce: Simplified Data
Processing on Large Clusters
9. Goals
MapReduce
Distribute the computation across
multiple machines (nodes)
A simple framework that makes
such code easy
to write
Horizontal scalability
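The idea of spreading one computation over several nodes can be sketched on a single machine. In the example below, worker threads stand in for nodes (an assumption made purely for illustration; a real cluster would ship each chunk to a different machine), and `split` and `node_work` are hypothetical helper names.

```python
from concurrent.futures import ThreadPoolExecutor


def split(items: list, parts: int) -> list:
    """Cut items into `parts` nearly equal contiguous chunks."""
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def node_work(chunk: list) -> int:
    # The work a single node would do on its own share of the data.
    return sum(x * x for x in chunk)


numbers = list(range(1, 10_001))
with ThreadPoolExecutor(max_workers=4) as pool:  # 4 simulated nodes
    partials = list(pool.map(node_work, split(numbers, 4)))

total = sum(partials)  # combine the per-node partial results
print(partials, total)
```

Adding capacity in this model means adding more workers and cutting the input into more chunks, which is the essence of horizontal scaling.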
11. There are multiple nodes, each of which can perform multiple tasks
Two basic jobs
Map job
input: a pair <key1, value1>
output: a list of pairs <key2, value2>
Reduce job
input: a pair <key2, <list of values emitted by the mappers under key2>>
output: a list of pairs <key3, value3>
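The two job signatures above can be sketched as a tiny single-process driver. This is only an illustration of the data flow (map, group by key2, reduce), not how any real framework is implemented; `run_mapreduce`, `index_map`, and `index_reduce` are hypothetical names, and the inverted-index task is chosen just to show that key3/value3 need not have the same type as key2/value2.

```python
from collections import defaultdict


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Apply map_fn to every <key1, value1>, group by key2, then reduce."""
    groups = defaultdict(list)
    for key1, value1 in inputs:
        # Map job: one input pair yields a list of <key2, value2> pairs.
        for key2, value2 in map_fn(key1, value1):
            groups[key2].append(value2)

    results = []
    for key2 in sorted(groups):
        # Reduce job: <key2, [value2, ...]> yields a list of <key3, value3>.
        results.extend(reduce_fn(key2, groups[key2]))
    return results


def index_map(doc_name, text):
    return [(word, doc_name) for word in text.lower().split()]


def index_reduce(word, doc_names):
    return [(word, sorted(set(doc_names)))]


docs = [("a.txt", "big data"), ("b.txt", "big clusters")]
index = run_mapreduce(docs, index_map, index_reduce)
print(index)
```

The grouping step in the middle is what a framework does between the two jobs: every value emitted under the same key2 ends up in the input of a single reduce call.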
12. A simple example - counting
words
void map(String name, String document):
  // name: document name
  // document: document contents
  for each word w in document:
    EmitIntermediate(w, "1");

void reduce(String word, Iterator partialCounts):
  // word: a word
  // partialCounts: a list of aggregated partial counts
  int sum = 0;
  for each pc in partialCounts:
    sum += ParseInt(pc);
  Emit(word, AsString(sum));
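The pseudocode above can be turned into a runnable single-process sketch. The Python version below keeps the same shape (string counts, one emit per word) and adds a small in-memory grouping step that a real framework would perform across machines; `map_words`, `reduce_counts`, and `word_count` are names made up for this example.

```python
from collections import defaultdict


def map_words(name, document):
    # name: document name (unused here), document: document contents
    for word in document.split():
        yield word, "1"


def reduce_counts(word, partial_counts):
    # partial_counts: all counts emitted for this word, as strings
    total = sum(int(pc) for pc in partial_counts)
    yield word, str(total)


def word_count(documents: dict) -> dict:
    # Group intermediate pairs by word; a framework does this between phases.
    intermediate = defaultdict(list)
    for name, document in documents.items():
        for word, count in map_words(name, document):
            intermediate[word].append(count)

    result = {}
    for word, counts in intermediate.items():
        for key, value in reduce_counts(word, counts):
            result[key] = value
    return result


counts = word_count({"doc1": "to be or not to be", "doc2": "to do"})
print(counts)
```

Counts are kept as strings only to mirror the pseudocode; in practice integers would be the natural choice.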
16. Master's thesis
Working with a Twitter dataset
a text file of almost 30 GB
plus several CSV files of a few hundred MB each
implementation of several Mappers and Reducers that compute
page ratings from microblog tweets
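The slides do not say how the page ratings are actually computed. Purely as an illustration of what one Mapper/Reducer pair over tweets might look like, the sketch below assumes the simplest possible rating: the number of tweets that link to a page. The sample tweets, the URL pattern, and the function names are all invented for this example.

```python
import re
from collections import defaultdict

# Assumption: a page is identified by any http(s) URL appearing in a tweet.
URL_PATTERN = re.compile(r"https?://\S+")


def map_tweet(tweet_id, text):
    # Emit <url, 1> for every link found in the tweet text.
    for url in URL_PATTERN.findall(text):
        yield url.rstrip(".,;!?)"), 1


def reduce_mentions(url, ones):
    # Hypothetical rating: total number of mentions of the page.
    yield url, sum(ones)


tweets = {
    "t1": "Great read: http://example.com/a",
    "t2": "Also see http://example.com/a and http://example.com/b.",
    "t3": "No links here",
}

grouped = defaultdict(list)
for tweet_id, text in tweets.items():
    for url, one in map_tweet(tweet_id, text):
        grouped[url].append(one)

scores = {}
for url, ones in grouped.items():
    for page, score in reduce_mentions(url, ones):
        scores[page] = score

print(scores)
```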
18. Open source MapReduce framework
Written in Java
Also supports other languages
Today it is used by almost all of the
big IT players apart from Google:
Facebook, Twitter, LinkedIn, Adobe, Amazon, Apple,
eBay, Hulu, IBM, Last.fm, Yahoo, and a great many
others