Big Data is frustrating

Processing
Storing
Indexing
Searching
             Photo by JavierPsilocybin flickr.com/santoposmoderno/4116782554
Parsing Big XML Docs?
use a stream reader*

  *in PHP we used XMLReader
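
The stream-reader idea can be sketched as follows. The talk used PHP's XMLReader; this sketch swaps in Python's `xml.etree.ElementTree.iterparse` as an analogous pull-style parser, so it is an illustration of the technique rather than the code LessLettuce ran. The `<product>` tag, the feed layout (products as direct children of the root), and the function name `iter_products` are all assumptions made for the example.

```python
import io
import xml.etree.ElementTree as ET


def iter_products(source, tag="product"):
    """Yield one dict per <product> element without loading the whole feed.

    Assumes (hypothetically) that each product is a direct child of the
    root element and that its fields are simple child elements.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            # The element is complete at its "end" event, so its children are safe to read.
            yield {child.tag: (child.text or "").strip() for child in elem}
            # Drop everything parsed so far so memory use stays roughly constant.
            root.clear()
```

Because `iter_products` is a generator, a caller can loop over a feed of hundreds of MBs (for example an open file object) and handle one record at a time, instead of building the full tree in memory the way a SimpleXML-style loader would.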
Think About Storage




           Photo by itonys flickr.com/adstone/4549679025
Remember:
DB size = Data + Indexes

Indexes slow INSERTs

Optimise your queries!
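
A rough way to see "DB size = Data + Indexes" in practice is to measure a database before and after adding an index. The talk used MySQL; the sketch below uses Python's built-in `sqlite3` module purely as a convenient stand-in, and the `products` table, its columns, and the row count are hypothetical. Exact numbers will differ between engines, so treat it as a demonstration of the principle only.

```python
import sqlite3


def db_pages(conn):
    """Return the number of pages the database currently occupies."""
    return conn.execute("PRAGMA page_count").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")

# Hypothetical sample data: 50,000 small rows.
rows = [(i, f"product-{i}", i * 0.01) for i in range(50_000)]
conn.executemany("INSERT INTO products VALUES (?, ?, ?)", rows)
conn.commit()
pages_data_only = db_pages(conn)

# Adding an index stores a second, sorted copy of the indexed column.
conn.execute("CREATE INDEX idx_products_name ON products (name)")
conn.commit()
pages_with_index = db_pages(conn)

page_size = conn.execute("PRAGMA page_size").fetchone()[0]
index_bytes = (pages_with_index - pages_data_only) * page_size
print(f"data only:  {pages_data_only * page_size} bytes")
print(f"with index: {pages_with_index * page_size} bytes (+{index_bytes})")
```

Every extra index adds to the stored size and has to be updated on each INSERT, which is why it pays to index only the columns your queries actually filter or sort on.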
Use a dedicated search application
Thanks
Simon Hamp @simonhamp
Founder, Flipstorm

flipstorm.co.uk
lesslettuce.co.uk

Editor's Notes

  • #2: Every part of dealing with large quantities of data is annoying:
      - Processing must be fast and accurate
      - Storing must be flexible and safe
      - Indexing must be fast and beneficial
      - Searching still needs to be lightning quick
  • #3: LessLettuce relies on XML feeds, some of them hundreds of MBs in size
      - Couldn’t use SimpleXML!
      - XMLReader: look out for my article in .net Mag soon!
  • #4: We use MySQL; it’s still good for big data!
      - Choose the right storage engine
      - Get your data structure right first
      - Tweak your server to optimise operations
      - How will you recover GBs of data in a crash situation?
      - Test, test, test!
  • #5: The LessLettuce live DB currently has ~20 million records
      - This takes up ~7GB of space
      - Optimising queries is crucial
  • #6: We used Sphinx: easy to deploy (set up in an afternoon), and it talked directly to the database
      - Takes about 20 minutes to do a fairly complex full index
      - All searches return in hundredths of a second
      - Sphinx Rocks!!!