@
chenxiaoming@
LOG


LOG
  LSP
  DISQL
• 46.70.93.94 - - [11/Nov/2011:11:11:11 -1100] "GET /book/1984.html HTTP/1.1" 404 2326 "http://www./s?wd=1984&rsv_bp=0&rsv_spt=3&inputT=947" "Mozilla/5.0(iPad; U; CPU iPhone OS 3_2 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Version/4.0.4 Mobile/7B314 Safari/531.21.10"
• Client IP, identity, user: 46.70.93.94 - -
• Time: [11/Nov/2011:11:11:11 -1100]
• Request: "GET /book/1984.html HTTP/1.1"
• Status: 404
• Bytes sent: 2326
• Referer: "http://www./s?wd=1984&rsv_bp=0&rsv_spt=3&inputT=947"
• User agent: "Mozilla/5.0(iPad; U; CPU iPhone OS 3_2 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Version/4.0.4 Mobile/7B314 Safari/531.21.10"
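These fields follow the Apache/NCSA "combined" log format. Below is a minimal Python sketch of how one such line can be split into named fields with a regular expression. It is not the parser used in the talk. The regex, the field names, and the choice to treat the referer's `wd` parameter as the search keyword are all assumptions made for illustration.

```python
import re
from datetime import datetime
from urllib.parse import parse_qs, urlsplit

# NCSA/Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Split one access-log line into named fields; return None if it does not match."""
    m = COMBINED.match(line)
    if m is None:
        return None  # malformed lines are skipped rather than failing the whole job
    rec = m.groupdict()
    rec["status"] = int(rec["status"])
    rec["bytes"] = 0 if rec["bytes"] == "-" else int(rec["bytes"])
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    method, _, rest = rec["request"].partition(" ")
    rec["method"], rec["path"] = method, rest.rsplit(" ", 1)[0]
    # Assumption: "wd" in the referer's query string carries the search keyword.
    rec["keyword"] = parse_qs(urlsplit(rec["referer"]).query).get("wd", [None])[0]
    return rec


LINE = ('46.70.93.94 - - [11/Nov/2011:11:11:11 -1100] '
        '"GET /book/1984.html HTTP/1.1" 404 2326 '
        '"http://www./s?wd=1984&rsv_bp=0&rsv_spt=3&inputT=947" '
        '"Mozilla/5.0(iPad; U; CPU iPhone OS 3_2 like Mac OS X; en-us) '
        'AppleWebKit/531.21.10 (KHTML, like Gecko) Version/4.0.4 '
        'Mobile/7B314 Safari/531.21.10"')

rec = parse_line(LINE)
print(rec["host"], rec["status"], rec["path"], rec["keyword"])  # 46.70.93.94 404 /book/1984.html 1984
```

The function returns `None` for a line it cannot parse instead of raising an exception. At log volume, a few corrupt lines should not abort the whole batch.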
LOG
  LSP
  DISQL
Ad hoc
……
LOG
  LSP
  DISQL
• …
• B*S

• C++
• SQL
• PHP + C
• PHP .so

• C++
• Schema
LSP


LOG
  LSP
  DISQL
20th. Chen Xiaoming (陈晓鸣): Baidu's Massive Log Analysis Architecture and Processing Experience
UI
DQuery
DISQL


LOG
  LSP
  DISQL
• _Url  _Res( )
• _Url  _Site
• JSON
DQuery
PHP-Callback
C-callback
• PHP  SQL
  • SQL  M/R
• DAG
  • MapReduce
• SQL
• PHP
• C++ + C-Runtime NEW!
  • RAII +
  • Copy On Write
  • schema
  • C++  PHP
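The slide outlines a pipeline in which SQL is compiled into a DAG of operators and the DAG is then turned into MapReduce jobs. One common way to do the last step is to cut the DAG at every operator that needs a shuffle, so that each cut adds one MapReduce job. That rule is an assumption here; the talk does not document DISQL's planner this way. The Python sketch below applies the rule and uses `graphlib` (Python 3.9+) to get a topological order. The operator names and the set of shuffle operators are hypothetical.

```python
from graphlib import TopologicalSorter

# Assumption for this sketch: operators that redistribute rows by key
# (group, join) need a shuffle; everything else can run map-side.
NEEDS_SHUFFLE = {"group", "join"}


def assign_jobs(dag, kind):
    """Map each operator to a level = number of shuffles on its longest upstream path.

    dag:  {operator: set of upstream operators}
    kind: {operator: operator type}
    Level 0 operators run in the map phase of the first job; an operator at
    level k >= 1 runs on the reduce side of job k.
    """
    level = {}
    for op in TopologicalSorter(dag).static_order():
        upstream = max((level[p] for p in dag.get(op, ())), default=0)
        level[op] = upstream + (1 if kind[op] in NEEDS_SHUFFLE else 0)
    return level


# Hypothetical plan: filter one input, join it with another, then group.
dag = {
    "filter_a": {"load_a"},
    "join": {"filter_a", "load_b"},
    "group": {"join"},
    "store": {"group"},
}
kind = {"load_a": "load", "load_b": "load", "filter_a": "filter",
        "join": "join", "group": "group", "store": "store"}

jobs = assign_jobs(dag, kind)
print(max(jobs.values()), "MapReduce jobs")  # 2 MapReduce jobs
```

In this plan, both the join and the group need a shuffle, so two jobs are required. The second job's map phase only reads the first job's output. A plan with no shuffling operator stays at level 0 throughout, which corresponds to a single map-only job.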
• parser
• JSON
[
    {
        "cmd": "load",
        "path": null,
        "using": "SchemaReader",
        "from": 17,
        "options": {"max_item_in_mem": 100000},
        "include": [25]
    },
    {"cmd": "filter", ……},
    {"cmd": "join", ……},
    ……
]
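The plan above is a list of steps, and each step names its operator in `"cmd"`. One plausible way to execute such a list is a dispatch table from `cmd` to an operator function, where each step consumes the record stream produced by the previous step. The Python sketch below does this. The operators and their parameters (`rows`, `field`, `equals`, `key`) are hypothetical, because the slide elides the real `filter` and `join` arguments. The `load` step here also reads inline rows rather than using SchemaReader.

```python
import json
from collections import Counter


# Hypothetical operators keyed by the plan's "cmd" field. Only the idea of
# dispatching on "cmd" comes from the slide; the parameters are invented.
def op_load(_stream, step):
    return iter(step["rows"])  # stand-in for a real reader such as SchemaReader


def op_filter(stream, step):
    return (r for r in stream if r[step["field"]] == step["equals"])


def op_count(stream, step):
    return iter(Counter(r[step["key"]] for r in stream).items())


OPS = {"load": op_load, "filter": op_filter, "count": op_count}


def run_plan(plan_text):
    """Execute a JSON list of steps, feeding each step the previous step's output."""
    stream = iter(())
    for step in json.loads(plan_text):
        op = OPS.get(step.get("cmd"))
        if op is None:
            raise ValueError(f"unknown cmd: {step.get('cmd')!r}")
        stream = op(stream, step)
    return list(stream)


PLAN = '''
[
  {"cmd": "load", "rows": [
      {"status": 404, "path": "/book/1984.html"},
      {"status": 200, "path": "/"},
      {"status": 404, "path": "/book/1984.html"}]},
  {"cmd": "filter", "field": "status", "equals": 404},
  {"cmd": "count", "key": "path"}
]
'''
print(run_plan(PLAN))  # [('/book/1984.html', 2)]
```

Keeping the plan as data rather than code lets the same plan be interpreted locally or handed to a generator for another target.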
SQL
• MapReduce
• Schema
  • schema
• C++  PHP  DOT
Group

[Diagram: mapping operators onto MapReduce. Unique: Shuffle → Reduce → Limit 1. Group + Count: Group → Shuffle → Reduce → Count. With a combiner: Map Phase (Group → Combine → Count) → Shuffle → Reduce Phase (Reduce → Sum).]
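The combiner path in the diagram is the classic combiner optimization. Each mapper groups and counts its own records, and the reducers then sum those partial counts instead of counting raw rows. The Python sketch below simulates that flow in memory. It also shows Unique as shuffle, reduce, then keep the first value per key. The function names and the sample splits are hypothetical; a real job would run these phases on a MapReduce cluster.

```python
from collections import Counter, defaultdict


def map_group_count(split):
    """Map phase: Group + Combine + Count, emitting one (key, partial count) per distinct key."""
    return list(Counter(split).items())


def shuffle(pairs):
    """Shuffle: bring all values with the same key together."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_sum(grouped):
    """Reduce phase: Sum the partial counts produced by the combiners."""
    return {key: sum(values) for key, values in grouped.items()}


def reduce_limit1(grouped):
    """Unique: after the shuffle, keep only the first value per key (Limit 1)."""
    return {key: values[0] for key, values in grouped.items()}


# Two hypothetical input splits of status codes, one per mapper.
splits = [["404", "200", "404"], ["404", "500"]]

partials = [pair for split in splits for pair in map_group_count(split)]
counts = reduce_sum(shuffle(partials))
print(counts)  # {'404': 3, '200': 1, '500': 1}

distinct = reduce_limit1(shuffle((code, code) for split in splits for code in split))
print(sorted(distinct))  # ['200', '404', '500']
```

With the combiner, the two mappers send 4 (key, count) pairs through the shuffle instead of 5 raw records. The saving grows with the number of repeated keys each mapper sees.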
Schema

Left input:
  Field   ID       name     age
  Type    uint64   string   int32
  Index   2        5        9

Right input:
  Field   ID       score
  Type    uint64   double
  Index   0        1

After join:
  Field   ID       name     age      score
  Type    uint64   string   int32    double
  Index   2        5        9        10
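In this example, the joined schema keeps the left input's field indices (2, 5, 9). It appends the right input's non-key field (`score`) at index 10, one past the largest existing index. The Python sketch below reproduces that rule. The rule is inferred from this single example, so treat it as an assumption about how indices are assigned. The `Field` class and `join_schema` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    dtype: str
    index: int  # position of the field in the record


def join_schema(left, right, key):
    """Merge two schemas for a join on `key` (indexing rule inferred from one example)."""
    merged = list(left)
    next_index = max(f.index for f in left) + 1
    for fld in sorted(right, key=lambda x: x.index):
        if fld.name == key:
            continue  # the join key is already present in the left schema
        merged.append(Field(fld.name, fld.dtype, next_index))
        next_index += 1
    return merged


left = [Field("ID", "uint64", 2), Field("name", "string", 5), Field("age", "int32", 9)]
right = [Field("ID", "uint64", 0), Field("score", "double", 1)]
for f in join_schema(left, right, key="ID"):
    print(f.name, f.dtype, f.index)
```

Because the left indices are kept unchanged, operators that were compiled against the left schema can still address its fields after the join.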
• Combiner
• Cached Combiner
• key Join
  • I/O
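A cached combiner is commonly understood as in-mapper combining. Partial aggregates are kept in an in-memory table and spilled downstream when the table fills, instead of running a separate combine pass over sorted map output. The Python sketch below assumes that reading. Its `max_items` cap plays a role similar to the `max_item_in_mem` option in the JSON plan, but that connection is also an assumption.

```python
from collections import Counter


class CachedCombiner:
    """In-mapper combining with a bounded cache (the flush policy is an assumption)."""

    def __init__(self, emit, max_items):
        self.emit = emit            # downstream callback taking (key, partial_count)
        self.max_items = max_items  # cap on distinct keys held in memory
        self.cache = Counter()

    def add(self, key, count=1):
        if key not in self.cache and len(self.cache) >= self.max_items:
            self.flush()            # cache full: spill partial counts downstream
        self.cache[key] += count

    def flush(self):
        for key, count in self.cache.items():
            self.emit(key, count)
        self.cache.clear()


out = []
comb = CachedCombiner(lambda k, v: out.append((k, v)), max_items=2)
for k in ["a", "b", "a", "c", "a"]:
    comb.add(k)
comb.flush()  # end of input: emit whatever is still cached
print(out)  # [('a', 2), ('b', 1), ('c', 1), ('a', 1)]
```

The same key can be emitted more than once (here `a` is emitted twice), so the reducer must still sum. The cache only bounds memory while cutting most of the duplicate records.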
• PHP
• C++
• DOT
• … / MapReduce
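The slide lists DOT among the outputs, alongside PHP and C++. DOT is Graphviz's graph description language, so a likely use is rendering the compiled operator DAG for inspection. The minimal Python sketch below assumes the plan is held as a dict from each operator name to its upstream operators, which is a hypothetical representation. The output text can be passed to `dot -Tpng` to draw the graph.

```python
def to_dot(dag, name="plan"):
    """Render {operator: set of upstream operators} as Graphviz DOT text.

    Operator names must not contain double quotes (they are not escaped here).
    """
    nodes = set(dag) | {up for ups in dag.values() for up in ups}
    lines = [f"digraph {name} {{"]
    lines += [f'  "{n}";' for n in sorted(nodes)]  # declare every operator, even isolated ones
    for op in sorted(dag):
        for up in sorted(dag[op]):
            lines.append(f'  "{up}" -> "{op}";')  # edges point in data-flow direction
    lines.append("}")
    return "\n".join(lines)


# Hypothetical linear plan: load -> filter -> group -> store
dag = {"filter": {"load"}, "group": {"filter"}, "store": {"group"}}
print(to_dot(dag))
```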
Processor: Pipes & Filter

  class Processor
      init()
      process()
      fini()

  class Selector     class Filter     class Counter     class UserProcessor
      init()             init()           init()            init()
      process()          process()        process()         process()
      fini()             fini()           fini()            fini()
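The diagram describes a Pipes & Filter design. Every stage is a Processor with `init()`, `process()` and `fini()`, and Selector, Filter, Counter and user-defined processors plug into a chain. The Python sketch below implements that lifecycle. The `pipe()` chaining method, the `emit()` helper and the `Collect` sink are hypothetical additions; `Collect` stands in for a UserProcessor.

```python
class Processor:
    """Pipes & Filter stage with an init()/process()/fini() lifecycle."""

    def __init__(self):
        self.downstream = None

    def pipe(self, downstream):
        # Chain stages; returns the downstream stage so calls can be chained.
        self.downstream = downstream
        return downstream

    def init(self):
        if self.downstream:
            self.downstream.init()

    def process(self, record):
        self.emit(record)  # default: pass records through unchanged

    def emit(self, record):
        if self.downstream:
            self.downstream.process(record)

    def fini(self):
        if self.downstream:
            self.downstream.fini()


class Selector(Processor):
    """Keep only the named fields of each record."""

    def __init__(self, *fields):
        super().__init__()
        self.fields = fields

    def process(self, record):
        self.emit({f: record[f] for f in self.fields})


class Filter(Processor):
    """Drop records for which the predicate is false."""

    def __init__(self, predicate):
        super().__init__()
        self.predicate = predicate

    def process(self, record):
        if self.predicate(record):
            self.emit(record)


class Counter(Processor):
    """Count records and emit a single total at end of input."""

    def init(self):
        self.count = 0
        super().init()

    def process(self, record):
        self.count += 1

    def fini(self):
        self.emit({"count": self.count})
        super().fini()


class Collect(Processor):
    """Hypothetical user-defined sink: stores whatever reaches it."""

    def init(self):
        self.records = []
        super().init()

    def process(self, record):
        self.records.append(record)


def run(head, records):
    head.init()
    for r in records:
        head.process(r)
    head.fini()


logs = [{"status": 404, "path": "/book/1984.html"}, {"status": 200, "path": "/"}]
head = Filter(lambda r: r["status"] == 404)
sink = head.pipe(Selector("path")).pipe(Counter()).pipe(Collect())
run(head, logs)
print(sink.records)  # [{'count': 1}]
```

`Counter` emits only in `fini()`. This is why the lifecycle needs an explicit end-of-input hook and not just `process()`.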
• …
            4/1      10/27    Change    Growth
  …         3540     4761     +1221     +34.5%
  DQuery    1153     3359     +2206     +191%
  …         1569     2963     +1394     +88.9%

• …
  …         24%    (first two rows together: 67%)
  DQuery    43%
  …         33%

• LSP
  PM        1352    47.4%
  RD        1174    41.2%
  OP         190     6.66%
  …          136     4.77%
  Total     2852   100%
LOG
  LSP
  DISQL
• LSP
  ● UI
• DISQL
• (@ ) (chenxiaoming@)
• Hadoop in China 12 2 2 20 DISQL2.0
……
                  ……
chenxiaoming@
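Follow us: t.baidu-tech.com

          Downloads and details: infoq.com/cn/zones/baidu-salon
"Envision · Exchange · Debate · Meet" is the motto of the Baidu Technology Salon. The Baidu Technology Salon is an offline technical exchange event held regularly by Baidu and InfoQ China. Its aim is to give mid- and senior-level engineers a relatively free platform for exchanging ideas and making connections. Each session has two key parts, speaker talks and OpenSpace, and focuses on a single topic.

The speaker talks and live Q&A present advanced practices from Baidu and other well-known websites. The OpenSpace session builds on and broadens the salon's theme, offering a platform for open discussion: any participant can propose a topic related to the session's theme and start a discussion.



                  Planned, organized and run by InfoQ
                  Follow us: weibo.com/infoqchina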
