3. What is ZooKeeper
• A high-performance coordination service for distributed applications
• Common services
– naming
– configuration management
– synchronization
– group services
• Used by HBase, Yahoo! Message Broker, and the Fetch Service of the Yahoo! crawler (comparable to Google's Chubby, which is based on Paxos)
18. Motivation
• MapReduce is very powerful, but:
– It requires a Java programmer.
– Users re-invent the wheel (join, filter, etc.).
19. Pig Latin
• Pig provides a higher-level language, Pig Latin, that:
– Increases productivity. In one test:
  • 10 lines of Pig Latin ≈ 200 lines of Java.
  • What took 4 hours to write in Java took 15 minutes in Pig Latin.
– Opens the system to non-Java programmers.
– Provides common operations like join, group, filter, sort.
20. Why a New Language?
• Pig Latin is a data flow language.
• User code and existing binaries can be included almost anywhere.
• Metadata is not required, but is used when available.
• Nested types (map, list, collection, ...) are supported as first-class types.
• Operates on files in HDFS.
21. Background
• Yahoo! was the first big adopter of Hadoop.
• Hadoop gained popularity in the company quickly.
• Yahoo! Research developed Pig to address the need for a higher-level language.
• Roughly 30% of Hadoop jobs run at Yahoo! are Pig jobs.
22. How Pig is Being Used
• Web log processing
• Data processing for web search platforms
• Ad hoc queries across large data sets
• Rapid prototyping of algorithms for processing large data sets
23. Accessing Pig
• Submit a script directly.
• Grunt, the Pig shell.
• PigServer Java class, a JDBC-like interface.
• PigPen, an Eclipse plugin
– Allows textual and graphical scripting.
– Samples data and shows example data flow.
25. Data Types
• Scalar types: int, long, double, chararray, bytearray.
• Complex types:
– map: associative array.
– tuple: ordered list of data; elements may be of any scalar or complex type.
– bag: unordered collection of tuples.
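The types above can be declared in a LOAD schema. The following is a minimal sketch, not taken from the original slides: the file name 'visits.txt', the field names, and the map key 'city' are all hypothetical, and the file is assumed to be tab-delimited text in Pig's default storage format.

```pig
-- Hypothetical input: one record per user, tab-delimited.
Visits = LOAD 'visits.txt' AS (
    name:chararray,                               -- scalar
    age:int,                                      -- scalar
    props:map[],                                  -- map: associative array
    pages:bag{t:tuple(url:chararray, hits:long)}  -- bag of tuples
);

-- props#'city' looks up a map key; COUNT(pages) counts the tuples in the bag.
Info = FOREACH Visits GENERATE name, props#'city' AS city, COUNT(pages) AS n_pages;
DESCRIBE Info;
```

When no types are declared, as in the later slides, fields default to bytearray and Pig casts them as needed.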
26. How to use
• No need to install anything extra on your Hadoop cluster
• Start a terminal and run
$ cd /usr/share/cloudera/pig/
$ bin/pig -x local
You should see a prompt like:
grunt>
27. Load Data
Users = LOAD 'users.txt'
USING PigStorage(',') AS (name, age);
• LOAD … AS …
• PigStorage(',') to specify separator

users.txt    Users
             name  age
John,18      John  18
Mary,20      Mary  20
Bob,30       Bob   30
28. Filter
Fltrd = FILTER Users
BY age >= 18 AND age <= 25;
• FILTER … BY …
• Constraints can be composite

Users          Fltrd
name  age      name  age
John  18       John  18
Mary  20       Mary  20
Bob   30
29. Generate / Project
Names = FOREACH Fltrd GENERATE name;
• FOREACH … GENERATE

Fltrd          Names
name  age      name
John  18       John
Mary  20       Mary
30. Store Data
STORE Names INTO 'names.out';
• STORE … INTO …
• PigStorage(',') to specify separator if multiple fields
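The four steps (load, filter, project, store) can be put together in one script. This is a sketch under assumptions: the explicit field types, the relation name Out, and the output path 'adults.out' are illustrative additions, not part of the original slides.

```pig
-- Load comma-separated records; types are declared so that age compares as a number.
Users = LOAD 'users.txt' USING PigStorage(',') AS (name:chararray, age:int);

-- Keep users aged 18 to 25 inclusive.
Fltrd = FILTER Users BY age >= 18 AND age <= 25;

-- Project two fields, so the output needs a separator.
Out = FOREACH Fltrd GENERATE name, age;

-- Write comma-separated output; 'adults.out' is a hypothetical path.
STORE Out INTO 'adults.out' USING PigStorage(',');
```

Saved to a file, a script like this could be submitted directly, or the statements could be typed one at a time at the grunt> prompt.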
31. Command - JOIN
Users = LOAD 'users' AS (name, age);
Pages = LOAD 'pages' AS (user, url);
Jnd = JOIN Users BY name, Pages BY user;

Users          Pages
name  age      user  url
John  18       John  yaho
Mary  20       Mary  goog
Bob   30       Bob   bing

Jnd
name  age  user  url
John  18   John  yaho
Mary  20   Mary  goog
Bob   30   Bob   bing
32. Command - GROUP
Grpd = GROUP Jnd BY url;
DESCRIBE Grpd;

Jnd
name  age  url
John  18   yhoo
Mary  20   goog
Dee   25   yhoo
Kim   40   bing
Bob   30   bing

Grpd
group  Jnd
yhoo   {(John, 18, yhoo), (Dee, 25, yhoo)}
goog   {(Mary, 20, goog)}
bing   {(Kim, 40, bing), (Bob, 30, bing)}
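Each row of a grouped relation holds the key in a field named group and a bag named after the input relation. A grouped relation is usually consumed with FOREACH and aggregate functions over that bag. The sketch below assumes the three-column data shown on this slide is stored in a hypothetical tab-delimited file 'jnd.txt'; the output field names are illustrative.

```pig
-- Hypothetical file holding the (name, age, url) rows shown above.
Jnd = LOAD 'jnd.txt' AS (name:chararray, age:int, url:chararray);
Grpd = GROUP Jnd BY url;

-- Jnd.age projects the age field out of each group's bag.
Ages = FOREACH Grpd GENERATE group AS url, MAX(Jnd.age) AS max_age, AVG(Jnd.age) AS avg_age;
DUMP Ages;
```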
33. Other Commands
• ORDER – sort by a field
• COUNT – eval function: counts the number of elements
• COGROUP – structured JOIN
• More at http://hadoop.apache.org/pig/
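A short sketch of the three commands listed above, reusing the users and pages inputs from the JOIN slide. The relation names Cnts, Srtd and CoGrpd and the field name hits are illustrative, not from the original slides.

```pig
Users = LOAD 'users' AS (name, age);
Pages = LOAD 'pages' AS (user, url);

-- COUNT: number of tuples in each group's bag.
Grpd = GROUP Pages BY url;
Cnts = FOREACH Grpd GENERATE group AS url, COUNT(Pages) AS hits;

-- ORDER: sort by a field, largest count first.
Srtd = ORDER Cnts BY hits DESC;
DUMP Srtd;

-- COGROUP: group both relations by key, keeping one bag per input.
CoGrpd = COGROUP Users BY name, Pages BY user;
DESCRIBE CoGrpd;
```

Unlike JOIN, which produces flat combined tuples, COGROUP keeps the matching Users tuples and Pages tuples in separate bags per key, which is why the slide calls it a structured JOIN.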