Modern Screen Scraping with Node.js, jsdom and jQuery
ME
SCREEN SCRAPING
NODE.JS
JSDOM
JQUERY
QUESTIONS

-   @stockholmux
-   stockholmux.com
-   Work with me! T-Mark is hiring!


Editor's Notes

  • #3: Kyle Davis, Senior JavaScript Programmer at T-Mark here in Buffalo. I code JavaScript every day, 90% of the day. I love JavaScript.
  • #4: Screen scraping - taking data from what was intended for the screen. A common task many people try. It's usually awful: regular expressions against HTML, brittle, constantly breaking, ick.
  • #5: Node.js is server-side JavaScript, based on the V8 JavaScript engine used in Chrome. People love Node - it is JavaScript without the parts that annoy most people.
  • #6: DOM - Document Object Model, what web pages ride on. "The Document Object Model (DOM) is quite awful, and JavaScript is unfairly blamed. The DOM would be painful to work with in any language. The DOM is poorly specified and inconsistently implemented." - Douglas Crockford. Let's add the DOM back into Node.js!
  • #7: jQuery is a multi-browser JavaScript library designed to simplify the client-side scripting of HTML. Widely used, widely understood. It provides a very expressive way of handling the DOM.
  • #8: Chrome is your visual interface. The JavaScript console can execute code and gives back very useful debugging information.
  • #9: The website we'll be looking at. Let's get everyone's Twitter handle and name. Darn - no API. No problem!
  • #10: Web inspector - this website is nicely put together. Each attendee is in its own element. Handy!
  • #11: Use jQuery to select all the attendee elements; Chrome provides a nice little output. Now, let's write a script to get at the contents of those elements (see the console sketch after these notes).
  • #12: Make an array. Go over each attendee, pull out the element with the Twitter handle and the name, then push an object with those two values into the array. Copy-paste the code into the console and see what the value of attendees is. Darn - extra characters.
  • #13: Small modification: let's use jQuery's trim function. You could use regular expressions, but why not use jQuery? Performance isn't as dire in this situation as it would be in a browser - you're dealing with a known environment, unlike the full range of devices and browsers. Cool, it worked. (The extraction sketch after these notes shows the trimmed version.)
  • #14: Enter Node.js. Let's take this script and move it to the server. Require the jsdom module, then use env - creating a closed environment that only executes the scripts you specify (no script tags, etc.). Anything inside the callback effectively runs on "document ready", so we can put the script that ran in Chrome there. Then we'll convert the results to JSON with stringify and output them to the console, which is standard output. (A server-side sketch follows these notes.)
  • #15: That worked, but let's write it to a file. First, let's make the JSON nicer. All we have to do to write to a file is require the fs module and replace the console call with fs.writeFile, specifying the filename, and we'll print a message when we're done with the file. Now you've scraped a screen. The great thing about all of this is that it is very flexible - if the page changes, we just need to change the selector. (See the file-writing sketch after these notes.)
  • #16: T-Mark is hiring - we are looking for a Linux systems administrator / Django programmer.
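
The note for slide #11 describes selecting every attendee element in the Chrome console. A minimal sketch, assuming the page marks each attendee up with a hypothetical .attendee class (the real class name is not shown in these notes):

    // Run in the Chrome JavaScript console on the attendee page.
    // '.attendee' is a placeholder selector - use the web inspector to find the real one.
    var $attendees = $('.attendee');
    $attendees.length;   // Chrome prints the matched elements and their count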
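
Slides #12 and #13 build the extraction script and then add jQuery's trim to strip the extra characters. A sketch under the same assumed markup, with hypothetical .twitter and .name child elements:

    // Build an array of { twitter, name } objects, one per attendee.
    var attendees = [];
    $('.attendee').each(function () {
      attendees.push({
        // $.trim strips the stray whitespace the first attempt picked up
        twitter: $.trim($(this).find('.twitter').text()),
        name:    $.trim($(this).find('.name').text())
      });
    });
    attendees;   // inspect the result in the console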
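
Slide #14 moves the console script to the server with jsdom's env function (the legacy jsdom.env API from the era of this talk, removed in later jsdom releases). The URL, the jQuery location, and the selectors are placeholders:

    var jsdom = require('jsdom');

    jsdom.env(
      'http://example.com/attendees',            // page to scrape (placeholder URL)
      ['http://code.jquery.com/jquery.js'],      // only the scripts listed here are executed
      function (errors, window) {                // runs once the DOM is ready
        var $ = window.$;
        var attendees = [];
        $('.attendee').each(function () {
          attendees.push({
            twitter: $.trim($(this).find('.twitter').text()),
            name:    $.trim($(this).find('.name').text())
          });
        });
        console.log(JSON.stringify(attendees));  // results go to standard output
      }
    );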
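
Slide #15 swaps the console output for a file and prettifies the JSON. A sketch of the change inside the jsdom.env callback above; the filename is an assumption:

    var fs = require('fs');

    // Replace the console.log call with a write to disk.
    // JSON.stringify's third argument pretty-prints with two-space indentation.
    fs.writeFile('attendees.json', JSON.stringify(attendees, null, 2), function (err) {
      if (err) { throw err; }
      console.log('Wrote attendees.json');
    });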