#bigdataMY

Volume
Velocity
Variety

Feeds and notifications
Insights
Recommendation & Matching
Security
Monitoring & Reporting
Event logging

Change detection
Change reaction
Audit

Get ahead of the curve

[Figure: a data stream with points labelled Normal and Noise; the second build adds the labels New concept and Concept drift. Source: J Gama, University of Porto]

"Big Data is much more likely to catch the black swan as it swoops in"
- Norman Nie, Revolution Analytics

Acunu Analytics

UserID
EMEA
 UK
  London
    N1
     Female
       16-21 year old
     16-21 year old
       Female
  16-21 year old
    Female
     London

Under the hood

      21:00   all = 1345    :00 = 45      :01 = 62      ...
      22:00   all = 3221    :00 = 22      :01 = 19      ...
      ...                                               ...
      UK      all = 228     user01 = 1    user14 = 12   ...
      US      all = 354     user01 = 15   user14 = 0    ...
      MY      all = 28      user01 = 0    user02 = 0    ...
      ...

Under the hood

Incoming event:

      {
          cust_id:      user01,
          session_id:   102,
          geography:    UK,
          browser:      IE,
          time:         22:01,
      }

Counters touched by the event:

      21:00   all = 1345       :00 = 45         :01 = 62       ...
      22:00   all = 3221 +1    :00 = 22         :01 = 19 +1    ...
      ...                                                      ...
      UK      all = 228 +1     user01 = 1 +1    user14 = 12    ...
      US      all = 354        user01 = 15      user14 = 0     ...
      MY      all = 28         user01 = 0       user02 = 0     ...
      ...

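The "+1" updates on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the nested dictionary `counters` and the function `ingest` are hypothetical names, and the in-memory layout stands in for whatever storage the real system uses. The starting values are the ones shown on the slide.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for the counter rows on the slide:
# one row per hour and one row per geography, each with an "all" total.
counters = defaultdict(lambda: defaultdict(int))
counters["22:00"].update({"all": 3221, ":00": 22, ":01": 19})
counters["UK"].update({"all": 228, "user01": 1, "user14": 12})


def ingest(event):
    """Increment every pre-aggregated counter that the event falls into."""
    hour, minute = event["time"].split(":")

    # Time row: bump the hourly total and the per-minute column.
    time_row = counters[hour + ":00"]
    time_row["all"] += 1
    time_row[":" + minute] += 1

    # Geography row: bump the regional total and the per-user column.
    geo_row = counters[event["geography"]]
    geo_row["all"] += 1
    geo_row[event["cust_id"]] += 1


event = {
    "cust_id": "user01",
    "session_id": 102,
    "geography": "UK",
    "browser": "IE",
    "time": "22:01",
}
ingest(event)
```

Each event costs a handful of increments at write time, which is what lets later queries read totals instead of scanning raw events.
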
Under the hood

where time 21:00 - 22:00
  count(*)

      21:00   all = 1345    :00 = 45      :01 = 62      ...
      22:00   all = 3221    :00 = 22      :01 = 19      ...
      ...                                               ...
      UK      all = 228     user01 = 1    user14 = 12   ...
      US      all = 354     user01 = 15   user14 = 0    ...
      MY      all = 28      user01 = 0    user02 = 0    ...
      ...

Under the hood

where time 21:00 - 23:00
  count(*)

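A rough Python sketch of how the two count(*) queries could be answered from the hourly "all" counters alone. It assumes each hour row covers the half-open interval from that hour to the next (so the 21:00 row holds events from 21:00 up to, but not including, 22:00); the slides do not state this explicitly. The names `hourly` and `count_between` are hypothetical.

```python
# Hourly "all" totals copied from the slide.
hourly = {"21:00": 1345, "22:00": 3221}


def count_between(start_hour, end_hour):
    """count(*) for [start_hour, end_hour), summing one counter per hour."""
    return sum(
        hourly.get(f"{h:02d}:00", 0)  # hours with no row contribute zero
        for h in range(start_hour, end_hour)
    )


one_hour = count_between(21, 22)   # reads only the 21:00 row
two_hours = count_between(21, 23)  # reads the 21:00 and 22:00 rows
```

Under that assumption the first query reads a single counter and the second reads two, regardless of how many raw events were ingested.
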
Little Trouble with Big Disks

COTS Journal, 2008

Streaming algorithms

        A = [a1, a2, a3, a4, a5]
mean(A) = sum it up / number of things

    Now add another item, a6:
          sum = sum + a6
          number of things = number of things + 1

    Try this with the median?

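The point of this slide can be made concrete with a small Python sketch. The class name `RunningMean` is hypothetical and chosen only for illustration. The mean needs just two numbers of state (a sum and a count), so each new item is a constant-time update. An exact median has no comparable constant-size summary; the straightforward approach shown here keeps every value and recomputes.

```python
import statistics


class RunningMean:
    """Streaming mean: keeps only a running sum and a count."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def add(self, x):
        # Constant-time update, no need to revisit earlier items.
        self.total += x
        self.count += 1

    @property
    def mean(self):
        if self.count == 0:
            raise ValueError("mean of an empty stream is undefined")
        return self.total / self.count


stream = [1, 2, 3, 4, 5]
rm = RunningMean()
for a in stream:
    rm.add(a)
mean_before = rm.mean

rm.add(6)  # "now add another item a6"
mean_after = rm.mean

# The exact median, by contrast, is computed here from the full list
# of values seen so far rather than from a fixed-size summary.
seen = stream + [6]
median_after = statistics.median(seen)
```

Streaming systems that need medians or other quantiles typically fall back on approximate summaries, which trade a small error for bounded memory.
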
Realtime tradeoffs

[Figure: diagram with three labels: High-velocity, Ad-hoc, High-volume]

Conclusion

    Big Data is also about the Little Things, done fast.

    The devil is in the details.

    Make it accessible.

Q?
