Every Second in over 50,000 Categories
eBay Analytics
  >50 TB/day new data
  >150 PB/day processed
  >100k data elements
  >100 trillion pairs of information
  >50k chains of logic
  >7,500 business users & analysts
  Structured/Unstructured
  Turning over a TB every second
  24x7x365, always online
  Millions of queries/day
  99.98+% availability
  Near-real-time
Big
Detail
Designing for the Unknown
>85% of analytical workload is NEW & Unknown

The metrics you know are cheap

The metrics you don't know are expensive, but high in potential ROI

Exploration & Testing are core pillars of an analytics-driven organization
DATA
  Volume: incremental storage
  Velocity: processing, change
  Variety: structured, semi-structured, un-structured
Value > Cost
$s per year in incremental revenue
Data Growing Faster
2011 x.commerce Innovate Data Alchemy
Impact
Data
  questions later
  structure later
  ($0.04/GB, $80/2TB)
  single HDFS instances >50PB
Value > Cost
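
The price point quoted on this slide can be checked with simple arithmetic. The sketch below assumes decimal units (1 TB = 1,000 GB) and counts raw disk only; replication, servers, power, and operations are deliberately left out, and the function and constant names are illustrative rather than anything from the talk.

```python
# Raw-disk cost check for the figures on the slide ($0.04/GB, $80/2TB).
# Assumption: decimal units, disk price only, no replication or overhead.
PRICE_PER_GB = 0.04  # dollars per GB, as quoted on the slide
GB_PER_TB = 1_000
GB_PER_PB = 1_000_000


def raw_disk_cost(gigabytes: float, price_per_gb: float = PRICE_PER_GB) -> float:
    """Return the raw disk cost in dollars for the given capacity."""
    return gigabytes * price_per_gb


two_tb_cost = raw_disk_cost(2 * GB_PER_TB)     # the slide's 2 TB example
fifty_pb_cost = raw_disk_cost(50 * GB_PER_PB)  # the slide's >50 PB scale

print(f"2 TB:  ${two_tb_cost:,.0f}")
print(f"50 PB: ${fifty_pb_cost:,.0f}")
```

At this price the 2 TB figure matches the slide's $80, and 50 PB of raw disk comes to roughly $2 million before any redundancy is counted.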
Synonyms derived from top queries in item query clusters
  texas instruments ba ii plus       / ba ii plus
  brighton handbag                   brighton purse
  lenovo x200                        thinkpad x200
  king bedspread                     king coverlet
  rockabilly dress                   swing dress
  1963 ford falcon                   63 falcon
  jessica simpson hair extensions    jessica simpson hairdo

Abbreviations/acronym derived from query transitions
  stanford ky                        stanford kentucky
  dc sub                             dc subwoofer
  snowboard helmet l                 snowboard helmet large
  motorcycle cam                     motorcycle camera
  diamond amp                        diamond amplifier
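
The slide does not say how abbreviations are mined from query transitions, so the following is only a minimal sketch of one plausible approach: count consecutive query pairs within a session where exactly one token changes and the shorter token looks like an abbreviation of the longer one. The session data, the `min_count` threshold, and both function names are assumptions made for illustration.

```python
from collections import Counter


def is_abbreviation(short: str, long: str) -> bool:
    """True if `short` starts with the same letter as `long` and its
    letters appear in order inside `long` (e.g. ky -> kentucky)."""
    if not short or len(short) >= len(long) or short[0] != long[0]:
        return False
    remaining = iter(long)
    # `ch in remaining` consumes the iterator, enforcing letter order.
    return all(ch in remaining for ch in short)


def abbreviation_candidates(sessions, min_count=2):
    """Count query rewrites that differ by one abbreviation-like token."""
    counts = Counter()
    for queries in sessions:
        for before, after in zip(queries, queries[1:]):
            a, b = before.split(), after.split()
            if len(a) != len(b):
                continue
            diffs = [(x, y) for x, y in zip(a, b) if x != y]
            if len(diffs) == 1 and is_abbreviation(*diffs[0]):
                counts[(before, after)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Hypothetical sessions: each list is one user's consecutive queries.
sessions = [
    ["dc sub", "dc subwoofer"],
    ["dc sub", "dc subwoofer"],
    ["stanford ky", "stanford kentucky"],
    ["king bedspread", "king coverlet"],
]
print(abbreviation_candidates(sessions))
```

True synonym pairs such as "king bedspread" and "king coverlet" are not caught by this character-level test; per the slide those come from item query clusters, which this sketch does not attempt to model.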
Toys and Hobbies
ATC > Artist trading card in ART
ATC > Automatic Tool Change in Business and Industrial
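
The point of this slide, that the same acronym expands differently depending on category, can be sketched as a lookup keyed by category and token. This is an illustrative assumption about the data structure, not eBay's implementation; the table holds only the two expansions shown on the slide, and the category strings are used as plain dictionary keys.

```python
# Category-scoped acronym expansion: the key is (category, token).
# Only the two expansions given on the slide are included.
EXPANSIONS = {
    ("Art", "atc"): "artist trading card",
    ("Business and Industrial", "atc"): "automatic tool change",
}


def expand_query(query: str, category: str) -> str:
    """Replace each token that has an expansion in this category."""
    tokens = query.lower().split()
    return " ".join(EXPANSIONS.get((category, tok), tok) for tok in tokens)


print(expand_query("ATC lot", "Art"))
print(expand_query("atc spindle", "Business and Industrial"))
print(expand_query("atc", "Toys and Hobbies"))  # no entry: left unchanged
```

Tokens without an entry for the given category pass through untouched, so an unknown category never produces a wrong expansion.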
[Architecture diagram]
  Offline: Editorial, Behavioral Logs, Document Data, Human Judgment
  Online: Service, Code, Small Data, Big Data Store (NoSQL)
  Clients: Search, Selling, Others

  <3 milliseconds per query
  1.2 billion queries per day
  1,000s of queries per second per machine
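
To put the three throughput figures in relation to each other, here is a back-of-the-envelope calculation. It assumes traffic spread evenly over the day and treats the 3 ms figure as a fixed per-query latency; both are simplifications, and the variable names are illustrative.

```python
# Back-of-the-envelope load figures from the slide's numbers.
# Assumptions: uniform traffic over 24 hours; 3 ms taken as a flat latency.
QUERIES_PER_DAY = 1_200_000_000
SECONDS_PER_DAY = 24 * 60 * 60
LATENCY_SECONDS = 0.003

# Average fleet-wide query rate.
average_qps = QUERIES_PER_DAY / SECONDS_PER_DAY

# Queries one strictly serial worker could answer per second at 3 ms each.
serial_qps_per_worker = 1 / LATENCY_SECONDS

print(f"average load: {average_qps:,.0f} queries/second")
print(f"one serial worker at 3 ms: {serial_qps_per_worker:,.0f} queries/second")
```

Under these assumptions the average load is about 13,900 queries per second, while a single serial worker at 3 ms manages only about 333. Reaching thousands of queries per second per machine therefore implies several workers running in parallel, typical latencies well under the 3 ms bound, or both.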
German Compound Words
  • German compound words can be arbitrarily created and extremely long
          Adidastrainingsanzug (Adidas track suit)
          Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz
                   (beef labeling regulation & delegation of supervision law)
  • Syntactically, words can be combined and split in many ways.
  • Some words shouldn't be de-compounded.
          beiden (both), not bei (at) + den (the)
  • Too many candidates for
          Granitpflastersteine (granite paving stones)
          Granit (granite) + pflastersteine (cobblestones)
          Granit (granite) + pflaster (paving/band-aid) + steine (stones)
  • Binding characters
          Hochzeitsschuhe (grammatically correct, 593 hits on ebay.de)
          Hochzeitschuhe (129 hits on ebay.de)
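
The problems listed above (many candidate splits, words that must stay whole, optional binding characters) can be illustrated with a small dictionary-based splitter. This is a minimal sketch under stated assumptions, not the method used at eBay: the vocabulary, the do-not-split list, and the choice of "s" as the only binding character are all hypothetical inputs.

```python
def split_compound(word, vocab, no_split=frozenset(), linkers=("", "s")):
    """Return every way to split `word` into vocabulary words, allowing
    an optional binding character between parts."""
    word = word.lower()
    if word in no_split:
        return [[word]]  # protected words are never de-compounded
    results = []

    def walk(rest, parts):
        if not rest:
            results.append(parts)
            return
        for end in range(1, len(rest) + 1):
            head = rest[:end]
            if head not in vocab:
                continue
            tail = rest[end:]
            for linker in linkers:
                if not tail.startswith(linker):
                    continue
                remainder = tail[len(linker):]
                if linker and not remainder:
                    continue  # a binding character must join two parts
                walk(remainder, parts + [head])

    walk(word, [])
    return results


# Hypothetical vocabulary built from the slide's examples.
vocab = {"granit", "pflaster", "steine", "pflastersteine",
         "hochzeit", "schuhe", "bei", "den"}

print(split_compound("Granitpflastersteine", vocab))
print(split_compound("Hochzeitsschuhe", vocab))
print(split_compound("Hochzeitschuhe", vocab))
print(split_compound("beiden", vocab, no_split={"beiden"}))
```

Both spellings of the wedding-shoes query reduce to the same parts, which is one way to make the less common variant match the same listings. The splitter only enumerates candidates; choosing among them, as the granite example shows, needs additional evidence that this sketch does not include.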
Analyze & Report                                                      Discover & Explore

Structured                     Semi-Structured                        Unstructured
SQL                            SQL++                                  Java/C++/Pig/Hive
Production Data Warehousing    Contextual-Complex Analytics           Structure the Unstructured
Large Concurrent User-base     Deep, Seasonal, Consumable Data Sets   Detect Patterns
Data Warehouse                 Data Warehouse + Behavioral            Hadoop
Enterprise-class System        Low End Enterprise-class System        Commodity Hardware System
8+PB                           60+PB                                  40+PB
Brian knows the satisfaction and importance of good search results,
and his team is responsible for ensuring that the millions of queries
entered on the eBay website provide just that. The words "Did you
mean?" are incredibly meaningful to Brian as he combs through a
universe of queries altered by synonyms, acronyms, attributes, and
expansions. He's been doing this sort of work since he joined eBay
nine years ago. Brian has loved technology ever since junior high
school, when he played the game Lunar Lander on a paper
teletype before video games existed, and pulled pranks in the local
Radio Shack. When Brian gets outside, he goes backpacking on
Mount Whitney, enters triathlons, and walks on water (barefoot water
skiing).