chenxiaoming@
LOG

Outline: LOG, LSP, DISQL
46.70.93.94 - - [11/Nov/2011:11:11:11 -1100] "GET /book/1984.html HTTP/1.1" 404 2326 "http://www./s?wd=1984&rsv_bp=0&rsv_spt=3&inputT=947" "Mozilla/5.0(iPad; U; CPU iPhone OS 3_2 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Version/4.0.4 Mobile/7B314 Safari/531.21.10"

• 46.70.93.94 - -
• [11/Nov/2011:11:11:11 -1100]
• "GET /book/1984.html HTTP/1.1"
• 404
• 2326
• "http://www./s?wd=1984&rsv_bp=0&rsv_spt=3&inputT=947"
• "Mozilla/5.0(iPad; U; CPU iPhone OS 3_2 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Version/4.0.4 Mobile/7B314 Safari/531.21.10"
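
The field breakdown above matches the common "combined" access log layout (host, identity, user, timestamp, request, status, size, referer, user agent). As a minimal sketch, assuming that layout, a regular expression can split the sample line into named fields. The pattern, the field names, and the `parse_line` helper are illustrative assumptions, not the parser used in the talk.

```python
import re

# Assumed layout: host ident user [time] "request" status size "referer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" '
    r'"(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # A "-" size means no body was sent; treat it as zero bytes.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields


# The sample line from the slide (the referer host is elided in the source).
SAMPLE = (
    '46.70.93.94 - - [11/Nov/2011:11:11:11 -1100] '
    '"GET /book/1984.html HTTP/1.1" 404 2326 '
    '"http://www./s?wd=1984&rsv_bp=0&rsv_spt=3&inputT=947" '
    '"Mozilla/5.0(iPad; U; CPU iPhone OS 3_2 like Mac OS X; en-us) '
    'AppleWebKit/531.21.10 (KHTML, like Gecko) Version/4.0.4 '
    'Mobile/7B314 Safari/531.21.10"'
)

if __name__ == "__main__":
    for name, value in parse_line(SAMPLE).items():
        print(f"{name}: {value}")
```

Lines that do not fit the pattern return None rather than raising, so a caller can count or sample malformed lines separately.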
[Diagram; text lost in extraction. Only the label "Ad hoc" survives.]
[Two slides of bullet points; most text lost in extraction. Surviving terms: B*S, C++, SQL, PHP + C, Schema, PHP .so]
LSP


20th. Chen Xiaoming (陈晓鸣): Baidu's massive-scale log analysis architecture and processing experience
UI
DQuery
DISQL


DQuery
[Bullet list; most text lost in extraction. Surviving terms: _Url, _Res( ), _Site, JSON]
PHP-Callback
C-callback
• PHP […] SQL ([…])
    • […] SQL […] M/R
• […] DAG
    • […] MapReduce
• […] SQL
• […] PHP
• […] C++ […] + […] C-Runtime (NEW!)
    • […] RAII + […]
    • […] Copy On Write
    • […] schema
    • C++ […] PHP
• […] parser
• […] JSON
[
    {
        "cmd": "load",
        "path": null,
        "using": "SchemaReader",
        "from": 17,
        "options": {"max_item_in_mem": 100000},
        "include": [25]
    },
    {"cmd": "filter" ……}, {"cmd": "join" ……}, ……
]
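
The JSON on this slide reads as an ordered list of steps, each identified by its "cmd" key. As a rough sketch under that assumption, the steps can be loaded and dispatched through a handler table. The `HANDLERS` table, `describe_load`, and `walk_plan` are hypothetical names for illustration, and the "filter" and "join" steps carry only their "cmd" key here because the slide elides the rest of their fields.

```python
import json

# Hypothetical plan text: keys and values of the "load" step follow the slide;
# the other steps are reduced to "cmd" because the slide elides their fields.
PLAN_JSON = """
[
    {
        "cmd": "load",
        "path": null,
        "using": "SchemaReader",
        "from": 17,
        "options": {"max_item_in_mem": 100000},
        "include": [25]
    },
    {"cmd": "filter"},
    {"cmd": "join"}
]
"""


def describe_load(step):
    """Summarize a load step; a real engine would build a reader here."""
    limit = step.get("options", {}).get("max_item_in_mem")
    return f"load via {step['using']} (max_item_in_mem={limit})"


# Hypothetical handler table mapping each "cmd" to a function of the step.
HANDLERS = {
    "load": describe_load,
    "filter": lambda step: "filter",
    "join": lambda step: "join",
}


def walk_plan(text):
    """Parse the plan and return one description per step, in order."""
    descriptions = []
    for step in json.loads(text):
        handler = HANDLERS.get(step["cmd"])
        if handler is None:
            raise ValueError(f"unknown cmd: {step['cmd']}")
        descriptions.append(handler(step))
    return descriptions


if __name__ == "__main__":
    for line in walk_plan(PLAN_JSON):
        print(line)
```

Keeping the plan as plain data means the front end that produces it and the runtime that executes it only need to agree on the step vocabulary.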
SQL
[Bullet list; most text lost in extraction. Surviving terms: MapReduce, Schema, schema, C++, PHP, DOT]
[Diagram: an operator plan mapped onto a Map phase and a Reduce phase. Node labels: Group, Unique, Limit 1, Combine, Count, Shuffle, Reduce, Sum]
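
One way to read the diagram is that a grouped count is split into a local pre-count on the map side (Combine), a Shuffle by key, and a Sum on the reduce side. The sketch below simulates that split in a single process. The function names and the sample records are assumptions for illustration and do not reflect the talk's actual operators.

```python
from collections import defaultdict


def map_with_combine(records, key_of):
    """Map phase: group by key and pre-count locally (the combine step)."""
    partial = defaultdict(int)
    for record in records:
        partial[key_of(record)] += 1
    return dict(partial)


def shuffle(partials):
    """Shuffle: collect every partial count for the same key in one list."""
    grouped = defaultdict(list)
    for partial in partials:
        for key, count in partial.items():
            grouped[key].append(count)
    return grouped


def reduce_sum(grouped):
    """Reduce phase: sum the partial counts per key."""
    return {key: sum(counts) for key, counts in grouped.items()}


def group_count(splits, key_of):
    """Run the three stages over a list of input splits."""
    partials = [map_with_combine(split, key_of) for split in splits]
    return reduce_sum(shuffle(partials))


if __name__ == "__main__":
    # Two hypothetical input splits of (status, url) records.
    splits = [
        [(404, "/a"), (200, "/b"), (404, "/c")],
        [(200, "/d"), (404, "/e")],
    ]
    print(group_count(splits, key_of=lambda record: record[0]))
```

Because counting is associative, summing per-split partial counts gives the same totals as counting all records at once, while each split sends at most one pair per key through the shuffle.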
Schema

field   ID      name    age           field   ID      score
type    uint64  string  int32         type    uint64  double
index   2       5       9             index   0       1

                        join

        field   ID      name    age     score
        type    uint64  string  int32   double
        index   2       5       9       10
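
In the example above, the joined schema keeps the left side's fields and indexes (2, 5, 9) and gives the right side's non-key field the next free index (10). The sketch below encodes that reading. The append-after-highest-index rule is an assumption inferred from this single example, and `Field` and `join_schemas` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    type: str
    index: int


def join_schemas(left, right, key):
    """Keep left fields as-is; append right non-key fields after the
    highest left index (an assumed rule that matches the slide's example)."""
    names = {field.name for field in left}
    next_index = max(field.index for field in left) + 1
    merged = list(left)
    for field in right:
        if field.name == key:
            continue  # the join key is already present from the left side
        if field.name in names:
            raise ValueError(f"duplicate field: {field.name}")
        merged.append(Field(field.name, field.type, next_index))
        next_index += 1
    return merged


# The two input schemas from the slide.
LEFT = [Field("ID", "uint64", 2), Field("name", "string", 5), Field("age", "int32", 9)]
RIGHT = [Field("ID", "uint64", 0), Field("score", "double", 1)]

if __name__ == "__main__":
    for field in join_schemas(LEFT, RIGHT, key="ID"):
        print(field.name, field.type, field.index)
```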
[Bullet list; partly lost in extraction. Surviving items: Combiner, Cached Combiner, key Join, I/O]
• PHP
• C++
• DOT
• […] / MapReduce
Processor: Pipes & Filter

class Processor
    init()
    process()
    fini()

Subclasses (each with init(), process(), fini()):
    class Selector
    class Filter
    class Counter
    class UserProcessor
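
The class diagram suggests a pipes-and-filters design in which every stage exposes the same init/process/fini hooks. The sketch below shows one way such stages could be chained. The behavior of each stage, the convention that returning None drops a record, and the `run_pipeline` driver are assumptions for illustration; only the class and method names come from the slide.

```python
class Processor:
    """Base stage with the three hooks named on the slide."""

    def init(self):
        pass

    def process(self, record):
        """Return the record to pass downstream, or None to drop it."""
        return record

    def fini(self):
        pass


class Selector(Processor):
    """Keep only the chosen fields of a dict record (assumed behavior)."""

    def __init__(self, fields):
        self.fields = fields

    def process(self, record):
        return {name: record[name] for name in self.fields}


class Filter(Processor):
    """Drop records that fail the predicate (assumed behavior)."""

    def __init__(self, predicate):
        self.predicate = predicate

    def process(self, record):
        return record if self.predicate(record) else None


class Counter(Processor):
    """Count the records that reach this stage (assumed behavior)."""

    def init(self):
        self.count = 0

    def process(self, record):
        self.count += 1
        return record


def run_pipeline(stages, records):
    """Call init on every stage, push each record through, then call fini."""
    for stage in stages:
        stage.init()
    output = []
    for record in records:
        for stage in stages:
            record = stage.process(record)
            if record is None:
                break  # a stage dropped the record
        else:
            output.append(record)
    for stage in stages:
        stage.fini()
    return output


if __name__ == "__main__":
    counter = Counter()
    stages = [Filter(lambda r: r["status"] == 404), Selector(["url"]), counter]
    records = [
        {"url": "/book/1984.html", "status": 404},
        {"url": "/", "status": 200},
    ]
    print(run_pipeline(stages, records), counter.count)
```

A user-defined stage would subclass Processor in the same way, which is presumably the role of UserProcessor on the slide.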
            4 1     10 27
[…]         3540    4761    1221    +34.5%
DQuery      1153    3359    2206    +191%
[…]         1569    2963    1394    +88.9%

LSP
[…]         24%  }
DQuery      43%  } 67%
[…]         33%

PM          1352    47.4%
RD          1174    41.2%
OP          190     6.66%
[…]         136     4.77%
[total]     2852    100%
[Summary slide; most text lost in extraction. Surviving items:]
• […] LSP
    • […] UI […]
• […] DISQL
• […]
    • […] (chenxiaoming@)
    • […] Hadoop in China […] 12 2 2 20 […] DISQL2.0
chenxiaoming@
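Follow us: t.baidu-tech.com

Slides and details: infoq.com/cn/zones/baidu-salon

"Imagine, exchange, debate, gather" is the motto of the Baidu Technology Salon. The salon is an offline technical exchange event organized regularly by Baidu and InfoQ China. Its aim is to give mid-level and senior engineers a relatively open platform for exchanging ideas and making contacts. Each session has two key parts, speaker talks and an OpenSpace, and focuses on a single topic.

The speaker talks and live Q&A introduce advanced practices from Baidu and other well-known websites. The OpenSpace session extends the salon's theme and offers a platform for free discussion: any participant can propose a topic related to the current theme and open a discussion.

InfoQ: planning, organization, execution
Follow us: weibo.com/infoqchina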
