These are the slides from my talk at the February 2011 Multipack Presents Show & Tell event. In this presentation I talked about why Big Data is a challenge to deal with, considered some of the problems and discussed how to deal with them.
Big Data is frustrating
1. Big Data is frustrating
Processing
Storing
Indexing
Searching
Photo by JavierPsilocybin flickr.com/santoposmoderno/4116782554
2. Parsing Big XML Docs?
use a stream reader*
*in PHP we used XMLReader
#2: Every part of dealing with large quantities of data is annoying
- Processing must be fast and accurate
- Storing must be flexible and safe
- Indexing must be fast and beneficial
- Searching still needs to be lightning quick
#3: LessLettuce relies on XML feeds - some hundreds of MBs in size
Couldn’t use SimpleXML!
XMLReader - Look out for my article in .net Mag soon!
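As a rough illustration of the stream-reader idea: SimpleXML loads the whole document into memory, whereas a stream reader walks the feed one element at a time. The talk used PHP's XMLReader; the sketch below is a Python analogue using the standard library's `xml.etree.ElementTree.iterparse`, not the original code. The `<feed>`/`<product>` layout, the field names and the `iter_products` helper are all made up for the example.

```python
import io
import xml.etree.ElementTree as ET


def iter_products(source, tag="product"):
    """Yield one dict per <product> element without loading the whole feed.

    The element and field names are hypothetical; adapt them to the real feed.
    """
    # iterparse reads the source incrementally and reports each element
    # as its opening ("start") and closing ("end") tags are seen.
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield {child.tag: (child.text or "").strip() for child in elem}
            # Drop already-processed children so memory use stays flat.
            root.clear()


# Tiny in-memory stand-in for a feed that would really be hundreds of MBs.
SAMPLE_FEED = b"""<?xml version="1.0"?>
<feed>
  <product><name>Kettle</name><price>19.99</price></product>
  <product><name>Toaster</name><price>24.50</price></product>
</feed>"""

products = list(iter_products(io.BytesIO(SAMPLE_FEED)))
for product in products:
    print(product["name"], product["price"])
```

With a real feed you would pass a file path or an open file object instead of `io.BytesIO`, and handle each record as it is yielded rather than collecting them into a list.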
#4: We use MySQL - it’s still good for big data!
Choose the right storage engine
Get your data structure right first
Tweak your server to optimise operations
How will you recover GBs of data in a crash situation?
Test, test, test!
#5: LessLettuce live DB currently has ~20 million records
This takes up ~7GB of space
Optimising queries is crucial
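One common way to check whether a query is optimised is to look at its execution plan before and after adding an index. The sketch below uses Python's built-in `sqlite3` module as a stand-in so that it runs anywhere; it is not the LessLettuce setup. In MySQL the equivalent check would be putting `EXPLAIN` in front of the `SELECT`. The `products` table, its columns and the index name are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the real MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, merchant TEXT, name TEXT)"
)
conn.executemany(
    "INSERT INTO products (merchant, name) VALUES (?, ?)",
    [(f"merchant-{i % 50}", f"item-{i}") for i in range(5000)],
)

QUERY = "SELECT * FROM products WHERE merchant = ?"


def query_plan(connection, sql, params):
    """Return SQLite's query plan for a statement as one readable string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row is the human-readable plan step.
    return " | ".join(row[-1] for row in rows)


# Without an index the engine has to scan every row in the table.
plan_before = query_plan(conn, QUERY, ("merchant-7",))

# With an index on the filtered column it can jump straight to the matches.
conn.execute("CREATE INDEX idx_products_merchant ON products (merchant)")
plan_after = query_plan(conn, QUERY, ("merchant-7",))

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan differs between database engines and versions, so read it for the shape of the change (full scan versus index lookup) rather than the literal text.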
#6: We used Sphinx - easy to deploy (set up in an afternoon), talked directly to the database
Takes about 20 minutes to do a fairly complex full index
All searches return in hundredths of a second
Sphinx Rocks!!!