Fluentd: 
Towards Unified 
Logging Layer 
Kiyoto Tamura 
@kiyototamura 
Treasure Data, Inc.
Pivotal Open Source:  Using Fluentd to gain insights into your logs
• Fluentd maintainer & community manager
• data nerd
• math nerd
• nerd
whoami
this talk isn't about
Big Data
it's about
Log Data
a motivating anecdote
The life of 
a data scientist 
(me in 2009)
http://cacm.acm.org/blogs/blog-cacm/169199-data-science-workflow-overview- 
and-challenges/fulltext
Acquire Data (or 
so you think) 
WUT!? Invalid 
UTF8? 
Fix the encoding 
issue…
Yell at the 
engineers 
Some columns 
are missing!? 
Run the 
script… DIVISION
BY ZERO!!!
Hmm…
Logging.priority 
=> :not_super_high
analytics.priority 
=> :very_high
analytics.needs? :logs 
=> true
outage.priority 
=> :super_high
outage.needs? :logs 
=> ["no", "shit"]
Metrics and Monitoring 
(hint: you need logs)
Ops 
VPs 
Engineers 
Managers 
PMs 
More PMs
How can we do better?
How to Unify Logging (1) 
Common Interface + Decoupling 
Mobile Web IoT 
Message 
Queue 
Search 
Backend 
Analytic DB 
Archival 
Storage 
Unified 
Logging Layer 
Parse into a 
common data format 
Decouple from 
data sources
How to Unify Logging (2) 
Reliability & Scalability 
Mobile Web IoT 
Message 
Queue 
Search 
Backend 
Analytic DB 
Archival 
Storage 
Unified 
Logging Layer 
Need
persistence/buffering
Robust retries 
and recovery
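A minimal sketch of the buffering-plus-retry idea on this slide, assuming an in-memory queue and exponential backoff. `BufferedOutput` and its `send` callable are hypothetical names for illustration, not Fluentd's API; a real logging layer would also persist the buffer to disk.

```python
import time
from collections import deque


class BufferedOutput:
    """Toy buffered sender: queue chunks, retry failed sends with backoff."""

    def __init__(self, send, max_retries=5, base_delay=0.5, sleep=time.sleep):
        self.send = send                # hypothetical callable that ships one chunk
        self.max_retries = max_retries
        self.base_delay = base_delay
        self.sleep = sleep              # injectable so the sketch is easy to test
        self.queue = deque()            # in-memory only (assumption of this sketch)

    def emit(self, chunk):
        """Buffer a chunk instead of sending it right away."""
        self.queue.append(chunk)

    def flush(self):
        """Send queued chunks in order; return how many were delivered."""
        delivered = 0
        while self.queue:
            chunk = self.queue[0]
            for attempt in range(self.max_retries):
                try:
                    self.send(chunk)
                except Exception:
                    # Wait 1x, 2x, 4x, ... the base delay before the next try.
                    self.sleep(self.base_delay * 2 ** attempt)
                else:
                    self.queue.popleft()
                    delivered += 1
                    break
            else:
                # Retries exhausted: keep the chunk queued for a later flush.
                break
        return delivered
```

The chunk leaves the queue only after a successful send, so a backend outage delays data rather than dropping it.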
How to Unify Logging (3) 
Extensibility 
? Web IoT 
? Search 
Backend 
Analytic DB 
Archival 
Storage 
Unified 
Logging Layer 
Adding a new 
in/output must be 
easy 
Same for filters
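One way to picture "adding a new output must be easy" is a small plugin registry. This is only an illustrative sketch: `register_output`, `OUTPUTS`, and the two output classes are hypothetical names, not Fluentd's plugin API.

```python
OUTPUTS = {}  # plugin name -> output class


def register_output(name):
    """Class decorator that makes an output available under `name`."""
    def decorator(cls):
        OUTPUTS[name] = cls
        return cls
    return decorator


@register_output("stdout")
class StdoutOutput:
    def write(self, tag, time, record):
        print(tag, time, record)


@register_output("memory")
class MemoryOutput:
    def __init__(self):
        self.events = []

    def write(self, tag, time, record):
        self.events.append((tag, time, record))


def build_output(name):
    """Instantiate a registered output by name."""
    return OUTPUTS[name]()
```

Adding a backend then means writing one class with a `write` method and one decorator line; nothing else in the pipeline changes.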
Fluentd can help us unify logging
how it works
127.0.0.1 - - [05/Feb/2012:17:11:55 
+0000] "GET / HTTP/1.1" 200 140 "-" 
"Mozilla/5.0 (Windows NT 6.1; WOW64) 
AppleWebKit/535.19 (KHTML, like Gecko) 
Chrome/18.0.1025.5 Safari/535.19"
{ 
"host": "127.0.0.1", 
"user": "-", 
"method": "GET", 
"path": "/", 
"code": "200", 
"size": "140", 
"referer": "-", 
"agent": "Mozilla/5.0 (Windows…"
}
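The step from the raw access-log line to the structured record above can be sketched with a regular expression. This is a rough illustration, assuming the Apache combined log format; `COMBINED` and `parse_access_line` are names made up for this sketch, not Fluentd's parser.

```python
import re

# Assumed layout: Apache "combined" log format, as in the sample line.
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<code>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_access_line(line):
    """Return (time_text, record) for one log line, or None if it does not match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    record = m.groupdict()
    time_text = record.pop("time")  # the timestamp travels separately from the record
    return time_text, record
```

All field values stay strings, as they do on the slide; type conversion would be a separate step.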
Parse as JSON!
?
["05/Feb/2012:17:11:55","web.access",{
"host": "127.0.0.1", 
"user": "-", 
"method": "GET", 
"path": "/", 
"code": "200", 
"size": "140", 
"referer": "-", 
"agent": "Mozilla/5.0 (Windows…"
}] 
timestamp tag 
record
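The timestamp / tag / record triple can be built in a few lines. In this sketch the timestamp is converted to Unix epoch seconds, which is an assumption of the example (the slide shows it as text), and `make_event` is a hypothetical helper name.

```python
import json
from datetime import datetime


def make_event(tag, time_text, record):
    """Bundle a parsed log line into a [timestamp, tag, record] triple."""
    # Assumes the access-log time layout, e.g. "05/Feb/2012:17:11:55 +0000".
    ts = datetime.strptime(time_text, "%d/%b/%Y:%H:%M:%S %z")
    return [int(ts.timestamp()), tag, record]


event = make_event(
    "web.access",
    "05/Feb/2012:17:11:55 +0000",
    {"host": "127.0.0.1", "method": "GET", "path": "/", "code": "200"},
)
print(json.dumps(event))
```

The tag says where the event came from and the record carries the payload, so every downstream consumer sees the same three-part shape regardless of the original log format.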
?
web.mongodb 
web.file 
web.hdfs 
web.s3 
web.mysql
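Tags like the ones above are what decide where an event goes. Here is a simplified sketch of tag-based routing, assuming shell-style globbing via Python's `fnmatch` as a stand-in; Fluentd's own match patterns have their own rules, and `Router` is a hypothetical class.

```python
from fnmatch import fnmatchcase


class Router:
    """Send each event to the first handler whose pattern matches its tag."""

    def __init__(self):
        self.rules = []  # (pattern, handler), checked in registration order

    def add(self, pattern, handler):
        self.rules.append((pattern, handler))

    def route(self, tag, time, record):
        """Return True if some handler accepted the event, else False."""
        for pattern, handler in self.rules:
            if fnmatchcase(tag, pattern):
                handler(tag, time, record)
                return True
        return False


router = Router()
archive = []
everything_else = []
router.add("web.s3", lambda tag, time, record: archive.append(record))
router.add("web.*", lambda tag, time, record: everything_else.append(record))

router.route("web.s3", 0, {"path": "/"})       # handled by the specific rule
router.route("web.mysql", 0, {"path": "/a"})   # falls through to the catch-all
```

Because rules are checked in order, specific patterns should be registered before catch-all ones.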
Demo: Bring Your Own A/B Testing
How A/B Testing Starts 
website 
<script>…</script>
A/B Testing 
SaaS
How A/B Testing Evolves 
Android iOS 
<script>…</script>
A/B Testing 
SaaS 1 
website 
A/B Testing 
SaaS 1 
A/B Testing 
SaaS 1 
<script>…</script>
A/B Testing
SaaS 1
event.post()…
<script>…</script>
event.post()…
Looks Familiar?
Bring Your Own A/B Testing! 
Android website iOS 
A/B Testing 
SaaS 1 
A/B Testing 
SaaS 2 
Analytic DB 
Archival 
Storage
bit.ly/cf-fluentd
{ 
"install": "gem install fluentd",
"website": "www.fluentd.org",
"github" : "fluent/fluentd",
"twitter": "@fluentd"
}

Editor's Notes

  1. Thank the organizers, Pivotal, and the audience.
  2. So, I am a big fan of spoilers when it comes to tech talks. I think spoilers give the audience a much better idea of what to expect. So here it is.
  3. Phew. I just said that. No, this talk is definitely not about "big data", besides poking fun at the buzzword… People can't seem to agree on what it is. I want to talk about something far more concrete.
  4. I worked as a quantitative analyst for three years.
  5. Definitely not just the data engineers' problem. I started to think more deeply about why logging becomes haphazard. Talked to hundreds of people at Treasure Data. Eventually, I had a couple of observations.
  6. And here is another observation.
  7. The first requirement is a common interface between data inputs and outputs. Why? Common interface -> one piece of data can be stored in multiple places with the same semantics. You don't know if you will stick to the same backend system. You will probably need to piece together information from multiple data sources.
  8. Data pipelines fail: format changes, volume spikes, hardware/IaaS hiccups. Scalability matters: you need to be able to scale out the logging layer.
  9. New data sources/outputs come up. You need to be able to extend your system.
  10. So, here is a rather self-aggrandizing claim: Fluentd can be that unified logging layer. In the rest of the talk, I will show you how.
  11. Yes, it is about log data!