slideshare.net/Tom-Pool
How To Use Chrome Puppeteer To
Fake Googlebot And Monitor Your Site
Tom Pool // BlueArray //
@cptntommy
Who Am I?
@cptntommy #BrightonSEO
Look After Technical Output Of The Agency
Always Trying To Find Ways To Make My Team’s Job Easier
So I Was Watching Google I/O 18 (Which Is Awesome BTW)
And I Saw A Really Really Really Cool Talk
Eric Bidelman
This Got Me Thinking
I Can Use This To Help Me With My Job!
So I Went Away & Did A Shit Ton Of Research
That Included
Headless Chrome
Chrome
And A Little Bit Of Coding
(Not Much!)
I Want All Of You To At Least Take
A Small Piece Of Knowledge From This
I’ll Also Tweet Out This Deck
So...
What Is Headless Chrome?
Headless Chrome = None Of That Shit
Google Chrome Is Running, But With No User Interface
So It Is ‘Headless’
Why Should You Even Care?
You Can:
Scrape The Shit Out Of (JS) Websites
Copy The DOM, & Paste To A Text File
Compare Source Code With DOM & Export Differences
Generate Screenshots Of Pages
Crawl Single Page Applications
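To make the "compare source code with DOM" idea concrete, here is a minimal sketch. It assumes Node.js with Puppeteer installed; the names `linesOnlyInRendered` and `compareSourceWithDom` and the output file name are made up for illustration, and the line-by-line comparison is a deliberately crude stand-in for a proper diff.

```javascript
// Sketch: compare the raw HTML a server returns with the DOM after JavaScript has run.
// Assumes puppeteer is installed (npm i puppeteer); helper names are illustrative.
const fs = require('fs');

// Pure helper: lines present in the rendered DOM but absent from the raw source.
function linesOnlyInRendered(sourceHtml, renderedHtml) {
  const normalise = (html) =>
    html.split('\n').map((line) => line.trim()).filter((line) => line.length > 0);
  const sourceLines = new Set(normalise(sourceHtml));
  return normalise(renderedHtml).filter((line) => !sourceLines.has(line));
}

async function compareSourceWithDom(url, outFile) {
  const puppeteer = require('puppeteer'); // loaded lazily, only when a browser is needed
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    const response = await page.goto(url, { waitUntil: 'networkidle0' });
    const sourceHtml = await response.text(); // HTML as delivered by the server
    const renderedHtml = await page.content(); // serialised DOM after scripts ran
    const differences = linesOnlyInRendered(sourceHtml, renderedHtml);
    fs.writeFileSync(outFile, differences.join('\n'), 'utf8');
    return differences;
  } finally {
    await browser.close();
  }
}

// Usage (hypothetical output file name):
// compareSourceWithDom('https://www.bluearray.co.uk/', './dom-differences.txt');
```

A long list of differences suggests that much of the page only exists after JavaScript has run, which is the kind of rendering issue the talk is concerned with.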
I Know, JS Is Evil, But It Ain’t Going Away!
Screaming Frog Does Have JS Rendering Features. Utilises (Something Like) Headless Chrome
Google Can Render JS, But It Is In No Way Perfect, Or Even That Effective
Countless Case Studies
Crawl Single Page Applications
Automate WebPage Checks
Used For Webpage Testing (Clicking On Buttons, Filling In Forms, General Fuckery)
Great For Emulating User Behaviour!
Great For Seeing How Much Shit A Website Can Take Before It Breaks!
The Problem Is...
You Have To Run Basic Headless Chrome From Command Line
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --headless
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --headless --remote-debugging-port=9222
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --headless --remote-debugging-port=9222 --disable-gpu
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --headless --remote-debugging-port=9222 --disable-gpu https://www.bluearray.co.uk
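If typing that command by hand gets old, the same launch can be wrapped in a few lines of Node. This is only a sketch: the Chrome path is the macOS location shown on the slides, and `headlessArgs` and `launchHeadlessChrome` are hypothetical helper names, not part of any library.

```javascript
// Sketch: build the same headless Chrome command from Node instead of typing it by hand.
const { spawn } = require('child_process');

// macOS location used on the slides; other systems will differ.
const CHROME_PATH = '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome';

// Pure helper: the flags from the slides, in the same order.
function headlessArgs(url, port = 9222) {
  return ['--headless', `--remote-debugging-port=${port}`, '--disable-gpu', url];
}

// Starts Chrome; arguments are passed as an array, so the spaces in the path need no escaping.
function launchHeadlessChrome(url, port) {
  return spawn(CHROME_PATH, headlessArgs(url, port), { stdio: 'inherit' });
}

// Usage:
// const chrome = launchHeadlessChrome('https://www.bluearray.co.uk');
// chrome.kill(); // when finished
```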
Now
I Really Really Love Using Command Line
But This Really Really Made Me Cry
So How Do I Make It Easy?
Like I Said - I’m Always Trying To Make My Job Easier
And This Was Not Easy!
So I Went Away & Did A Bigger Shit Ton Of Research
Eric Bidelman
What Is Chrome Puppeteer?
BlahBlahBlahBlahBlahBlahBlah
BlahBlahBlahBlahBlahBlahBlahBlahBlah
BlahBlahBlahBlahBlahBlahBlahBlahBlahBlahBlahBlahBlahBlahBlahBlah
BlahBlahBlahBlahBlahBlahBlahBlahBlahBlah
BlahBlahBlahBlahBlahBlahBlah
OOOOOOOO API
Node Can Be Used For Making Applications
And It Can Also Be Used To Help Control Headless Chrome
And Trust Me It’s Easy!
So How Can I Get Chrome Puppeteer?
If You Want To Run Tests On Your Local Machine
You Have To Install NPM & Node.js
Someone’s Made This Easy!
So If You Are On PC
It’s Pretty Straightforward
Just Install From The Node.js Website
bit.ly/pc-pup-brighton19
If You Are On Mac
(Like Me)
It’s Not That Easy
bit.ly/pupbrighton19
You Wanna Open Up Terminal
ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
This Installs Homebrew, That Makes Everything E-Z
When This Has Done Its Thing
You Have To Install 2 More Things, And We’ll Be Ready To Rock
brew install node
And Then
npm i puppeteer
Now You Are All Good!
You Can Now Run Chrome Puppeteer On Your Machine!
For Example
If I Wanted To Take A Screenshot Of A Single Webpage
There Is A Bunch Of Code Coming Up
That Can All Be Seen In The Following Link (I’ll Also Tweet It)
https://bit.ly/BrightonSEO19
let browser = await puppeteer.launch({headless: true});
let page = await browser.newPage();
await page.goto('https://www.bluearray.co.uk/');
await page.screenshot({ path: './testimg.jpg', type: 'jpeg'});
await page.close();
await browser.close();
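The fragments on the slides need two things before Node will run them: `puppeteer` has to be loaded with `require`, and every `await` has to sit inside an `async` function. A minimal assembled version might look like the sketch below. The function names and the extension-based `screenshotOptions` helper are illustrative additions, not taken from the talk.

```javascript
// Screenshot.js - the slide fragments assembled into one script (a sketch).
// Assumes puppeteer was installed with `npm i puppeteer`.

// Pure helper (illustrative): derive screenshot options from the file extension.
function screenshotOptions(path) {
  const type = /\.jpe?g$/i.test(path) ? 'jpeg' : 'png';
  return { path, type };
}

async function takeScreenshot(url, path) {
  const puppeteer = require('puppeteer'); // required here so the helper above works without it
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    await page.goto(url);
    await page.screenshot(screenshotOptions(path));
    await page.close();
  } finally {
    await browser.close(); // always shut the browser down, even after an error
  }
}

// Usage:
// takeScreenshot('https://www.bluearray.co.uk/', './testimg.jpg');
```

Remove the comment marker from the last line to actually take the screenshot when the file is run.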
File Is Saved As Screenshot.js
So To Run This Small Piece Of Code
Go To Terminal (In Same Folder As Code), And Type In
node Screenshot.js
And Then, 5 Seconds Later,
If You Wanted To See The Browser Do These Steps
let browser = await puppeteer.launch({headless: true});
let browser = await puppeteer.launch({headless: false});
You Can Also Provide A List Of URLs
And Get A Shit Ton Of Screenshots!
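One way the list version could look is sketched below. The deck does not show how the screenshot files are named, so `fileNameForUrl` and the `folder` parameter are assumptions made for this example.

```javascript
// Sketch: screenshot every URL in a list. File naming is an assumption, not from the talk.

// Pure helper: turn a URL into a safe file name for its screenshot.
function fileNameForUrl(url) {
  const { hostname, pathname } = new URL(url);
  const slug = (hostname + pathname).replace(/[^a-z0-9]+/gi, '-').replace(/^-+|-+$/g, '');
  return `${slug}.jpg`;
}

async function screenshotAll(urls, folder = '.') {
  const puppeteer = require('puppeteer'); // npm i puppeteer
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage(); // one tab, reused for every URL
    for (const url of urls) {
      await page.goto(url);
      await page.screenshot({ path: `${folder}/${fileNameForUrl(url)}`, type: 'jpeg' });
    }
  } finally {
    await browser.close();
  }
}

// Usage:
// screenshotAll(['https://www.bluearray.co.uk/']);
```

Reusing a single tab keeps the sketch simple; pages are visited one after another rather than in parallel.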
Now I’m Sure You Can See Where This Is Headed
Faking Googlebot!
With A Few Tweaks To The Code
await page.setUserAgent('Googlebot');
Googlebot’s User Agent Is Not Just ‘Googlebot’
It’s Fuck*** Huge
Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.ht
And Then You Gotta Set Googlebot’s Viewport
await page.setViewport({width: 1024, height: 1024});
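Put together, the two tweaks could be wrapped in one small helper, sketched below. The user agent on the slide is cut off; the ending used here (`bot.html)`) comes from Google's published smartphone crawler string of that period, not from the deck. `emulateGooglebot` and the constant names are illustrative.

```javascript
// Sketch: apply the Googlebot-style settings from the slides to a Puppeteer page.
// The slide cuts the user agent off; the ending "bot.html)" is taken from Google's
// published smartphone crawler string of the time.
const GOOGLEBOT_UA =
  'Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) ' +
  'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 ' +
  '(compatible; Googlebot/2.1; +http://www.google.com/bot.html)';

const GOOGLEBOT_VIEWPORT = { width: 1024, height: 1024 }; // values from the slide

// `page` is a Puppeteer Page, e.g. the result of browser.newPage().
async function emulateGooglebot(page) {
  await page.setUserAgent(GOOGLEBOT_UA);
  await page.setViewport(GOOGLEBOT_VIEWPORT);
  return page;
}

// Usage, inside an async function and before page.goto():
// await emulateGooglebot(page);
```

Setting both before navigating means the very first request already carries the Googlebot user agent.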
FYI This Is Not Really Googlebot
As Unfortunately
Can’t Change Chrome Version That Puppeteer Uses To 41 :(
As Chrome Puppeteer Was Released After Chrome 41 (*Not Backwards Compatible)
However!
Can Be Persuasive In Getting A Client To Ensure Their Content Is SSR’d (If Needed)
Chrome Puppeteer Can Be Installed On The Server
We Can Then Provide Puppeteer With A List Of URLs, And It Can Work Through Them All
And Show How They Would Appear To Google, Instead Of
In The Case Of Some JS Sites
A Blank Page
Which Is Cool & A Nice Trick
But The Really Cool Stuff Is Yet To Come
So Who Here Has Heard Of (Or Used) ContentKing?
It’s Fairly Awesome
Allows You To Monitor A Site In Real-Time
With It Letting You Know Of Any Issues
Meta Changes, New 404 Errors, Updated Links….
BUT
Like Most Good Tools, It Costs Money
Maybe You Don’t Wanna Eat Into Your Budget
This Next Example Shows How We Can Use Puppeteer
Monitor Your Site When You Want & Report Any Changes To Key Areas
Including
Title Changes
Description Changes
Word Count Increases/Decreases
Robots Directives
Canonicals
So Basically The REALLY Important Shit In The HTML
So I Wrote Some Code
As With All Code, Required A Bit Of Research
And With A Bit Of Luck,
We Now Have A Way To Monitor Basic Areas Of Sites!
So.
There Are About 200 Lines Of Code
And I Don’t Have Time To Go Through The Full Thing
But
There Are A Few Interesting Snippets I’d Like To Share
We Launch Headless Chrome & Puppeteer As Highlighted A Minute Ago
const browser = await puppeteer.launch();
const page = await browser.newPage();
Provide A List Of URLs For Puppeteer To Go And Play With
try {data = fs.readFileSync('/Users/tompool/Desktop/PuppeteerRendering/PageMonitor/urls.txt','utf8');}
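The slide only shows the read itself; the lines that turn the file contents into a list are not shown. A minimal guess at that step is sketched below, assuming one URL per line. `parseUrlList` and `loadUrls` are hypothetical names.

```javascript
// Sketch: read a text file of URLs (assumed format: one URL per line).
const fs = require('fs');

// Pure helper: blank lines and surrounding whitespace are ignored.
function parseUrlList(text) {
  return text.split(/\r?\n/).map((line) => line.trim()).filter((line) => line !== '');
}

// Reads the list from disk; returns an empty list if the file cannot be read.
function loadUrls(filePath) {
  try {
    return parseUrlList(fs.readFileSync(filePath, 'utf8'));
  } catch (err) {
    console.error(`Could not read ${filePath}: ${err.message}`);
    return [];
  }
}

// Usage:
// const urls = loadUrls('./urls.txt');
```

Splitting on `\r?\n` lets the same file work whether it was saved on a Mac or on Windows.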
And Then Pull Relevant Meta Data
For Example
Meta Title
try {title = await page.title();} catch (e1) {title = 'n/a';}
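The other fields can be pulled with the same try-and-fall-back pattern. The sketch below is one possible way to do it, not the script from the talk: the CSS selectors are the standard ones for these tags, while `orNa`, `countWords` and `extractMeta` are names invented here.

```javascript
// Sketch: collect title, description, canonical, robots and word count from a page.

// Pure helper: count whitespace-separated words.
function countWords(text) {
  const trimmed = text.trim();
  return trimmed === '' ? 0 : trimmed.split(/\s+/).length;
}

// Runs one lookup and falls back to 'n/a' if it throws (e.g. the element is missing).
async function orNa(lookup) {
  try {
    return await lookup();
  } catch (err) {
    return 'n/a';
  }
}

// `page` is a Puppeteer Page that has already navigated to the URL.
async function extractMeta(page) {
  const title = await orNa(() => page.title());
  const description = await orNa(() =>
    page.$eval('meta[name="description"]', (el) => el.content));
  const canonical = await orNa(() =>
    page.$eval('link[rel="canonical"]', (el) => el.href));
  const robots = await orNa(() =>
    page.$eval('meta[name="robots"]', (el) => el.content));
  const bodyText = await orNa(() => page.evaluate(() => document.body.innerText));
  const wordCount = bodyText === 'n/a' ? 'n/a' : countWords(bodyText);
  return { title, description, canonical, robots, wordCount };
}
```

`page.$eval` throws when nothing matches the selector, which is why each lookup is wrapped rather than checked afterwards.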
Then Create An Array Of All The Meta Data
let retArray = [date,url,title,description,canonical,robots,wordCount];
And Pushed This To A txt File
The Script Then Loops Through All Provided URLs
And Checks For Differences In The Returned Data
If There Are Any Differences, These Get Saved In Another txt File
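The comparison step is not shown on the slides, so the following is a sketch of one way it could work, under a different storage choice: each run is kept as a JSON object keyed by URL instead of the array-per-line text file described above. The file names and the `findChanges` and `saveRunAndReport` helpers are assumptions.

```javascript
// Sketch: compare this run with the previous one and log what changed.
// Storage format (JSON keyed by URL) is an assumption, not the talk's txt layout.
const fs = require('fs');

const FIELDS = ['title', 'description', 'canonical', 'robots', 'wordCount'];

// Pure helper: compare two runs, each an object keyed by URL, and describe what changed.
function findChanges(previousRun, currentRun) {
  const changes = [];
  for (const [url, current] of Object.entries(currentRun)) {
    const previous = previousRun[url];
    if (!previous) continue; // first time this URL is seen, nothing to compare against
    for (const field of FIELDS) {
      if (previous[field] !== current[field]) {
        changes.push(`${url} ${field}: "${previous[field]}" -> "${current[field]}"`);
      }
    }
  }
  return changes;
}

// Loads the last run (if any), stores the new one, and appends differences to a changes file.
function saveRunAndReport(currentRun, dataFile, changesFile) {
  const previousRun = fs.existsSync(dataFile)
    ? JSON.parse(fs.readFileSync(dataFile, 'utf8'))
    : {};
  const changes = findChanges(previousRun, currentRun);
  fs.writeFileSync(dataFile, JSON.stringify(currentRun, null, 2), 'utf8');
  if (changes.length > 0) {
    const stamp = new Date().toISOString();
    fs.appendFileSync(changesFile, changes.map((c) => `${stamp} ${c}`).join('\n') + '\n', 'utf8');
  }
  return changes;
}

// Usage (hypothetical file names):
// saveRunAndReport(currentRun, './last-run.json', './changes.txt');
```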
That I Can Check Whenever
So I Can See What Has Changed From Yesterday/When I Last Ran The Code.
This Required Me To Run The Code Each Day
(That I Forgot To Do)
So I Went One Step Further
Chucked It On A Raspberry Pi
And Set Up A CronJob To Automatically Run The Script At The Same Time
Every Day
And Then
(This Was The Longest Bit)
Email Me If Anything Changed
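The talk does not say how the email was sent. One common option in Node is the `nodemailer` package, and the sketch below assumes it, together with SMTP settings supplied through environment variables (`SMTP_HOST`, `SMTP_PORT`, `SMTP_USER`, `SMTP_PASS` are placeholder names). `buildChangeEmail` and `emailChanges` are hypothetical helpers.

```javascript
// Sketch: email the list of changes, but only when there is something to report.

// Pure helper: build the message, or return null when there is nothing to report.
function buildChangeEmail(changes, recipient) {
  if (changes.length === 0) return null;
  return {
    to: recipient,
    subject: `Page monitor: ${changes.length} change(s) detected`,
    text: changes.join('\n'),
  };
}

// Sends the message through an SMTP server. Host, port and credentials are placeholders
// read from environment variables; nodemailer is an assumption, the talk names no library.
async function emailChanges(changes, recipient) {
  const message = buildChangeEmail(changes, recipient);
  if (!message) return false;
  const nodemailer = require('nodemailer'); // npm i nodemailer
  const transporter = nodemailer.createTransport({
    host: process.env.SMTP_HOST,
    port: Number(process.env.SMTP_PORT) || 587,
    secure: false,
    auth: { user: process.env.SMTP_USER, pass: process.env.SMTP_PASS },
  });
  await transporter.sendMail({ from: process.env.SMTP_USER, ...message });
  return true;
}

// Usage, inside an async function:
// await emailChanges(changes, 'me@example.com');
```

Returning early when the list is empty keeps the daily run quiet unless something actually changed.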
This Is By No Means A Finished Product, And Is Still An Ongoing Project
These Usages Of Chrome Puppeteer
Barely Scratch The Surface Of What Is Possible
So, To Recap
Today We Have Covered
Headless Chrome
Puppeteer
Basic Scripts Using Node.js
And Automation Of All Of These To Save You Valuable Time
And Hopefully, Allow You To
THANKS!

BrightonSEO April 2019 - Tom Pool - Chrome Puppeteer, Fake Googlebot & Monitor Your Site!

Editor's Notes

  • #6: Like many of us, I’m constantly trying to find any new ways to make my (and my team’s) jobs easier
  • #9: So this awesome guy - Eric Bidelman - is a software engineer at Google, and works on Headless Chrome, Lighthouse & DevTools.
  • #11: I can use chrome puppeteer to help me with my job
  • #12: So I went away and did a literal shit ton of research, that is worth sharing.
  • #22: So, the first thing I was looking for was a basic definition.
  • #23: Contrary to what I wanted to believe, it did not involve any decapitation
  • #24: So when you open up Google Chrome normally, you get a wonderful User Interface with bookmarks
  • #25: And a search bar, plugins, buttons, tabs
  • #26: And usable functionality.
  • #27: With headless chrome, you get none of that shit.
  • #28: So here I am running headless chrome
  • #29: And we can see that it is in the background, but I have no Chrome windows open.
  • #30: So Google Chrome is Running, but with NO User Interface.
  • #31: So it is running without the UX/UI head
  • #32: Why should you even care about this sort of stuff though?
  • #33: Through this research journey, I found out that you can do a bunch of stuff with it!
  • #34: Scrape the literal shit out of Javascript websites (as well as basic HTML scraping)
  • #35: You can copy the DOM, and then paste it into a text file, with which you can
  • #36: Compare the source code of the site with the DOM, and then export differences. This can allow you to identify any potential rendering issues.
  • #37: Can use it to generate screenshots of pages
  • #38: And effectively crawl single page applications
  • #39: JS Can be a bit of a pain to work with, but unfortunately, it is not going away!
  • #40: So Screaming Frog (and a majority of crawling software) utilise something like headless chrome to emulate a browser, and provide JS rendering features.
  • #41: And we all know about issues that Google can have with crawling JS, ranging from having slight issues with rendering, to completely drawing a blank.
  • #42: So there have been a bunch of JS indexing and rendering case studies over the past couple of years.
  • #43: So it can help you crawl these guys.
  • #44: We can also use Headless Chrome to automate web page checks, and I provide an in depth investigation to this later on in this deck.
  • #45: AND it can be used for general webpage testing. Including clicking on stuff, filling in forms, general fuckery with the mouse and keyboard.
  • #46: It is really good for emulating user behaviour. So great for pretending to be a user, and browsing around a site.
  • #47: So it is basically really great for seeing exactly how much shit a website can take before it breaks!
  • #48: However, the problem with running all of these tasks is
  • #49: You have to run basic headless chrome through the command line interface
  • #50: So first you gotta install some dependencies, and have a shit ton of errors hit you in the face, and you gotta know where chrome is stored on your local machine...
  • #51: Then you gotta run directly from that location
  • #52: Then specify headless chrome to launch
  • #53: Then open a port to use
  • #54: Then you gotta disable GPU
  • #55: Then you can add a single URL, or a URL list into the command line
  • #56: Now then
  • #57: I really really really love using command line
  • #58: In fact so much so that I spoke about it at Brighton last year
  • #59: But doing all of this shit really really really really made me wanna cry
  • #60: So how do I make utilising headless chrome, which is freaking awesome - easy?
  • #61: Like I said a few minutes ago, I’m always trying to find ways to make my job easier
  • #62: And doing all of these boring ass steps was really really not easy. At All.
  • #63: So I went away and did a bigger shit ton of research.
  • #64: So, in this talk at Google I/O, Eric mentions something called Google Puppeteer (shoutout Eric)
  • #65: So what is Chrome Puppeteer?
  • #66: Doing a simple Google Search for Chrome Puppeteer reveals all.
  • #67: But the stuff I’m interested in is this. A Node Library, and
  • #68: Oooooooooo an API
  • #69: So Node - for those that do not have dev experience, can be used for making some pretty kick-ass applications
  • #70: It can also be used to help control headless chrome in an easy to digest and utilise package
  • #71: So Node - for those that do not have dev experience, can be used for making some pretty kick-ass applications
  • #72: So how can you actually get chrome puppeteer?
  • #73: If you want to run tests on your local machine, you have to install a few things first.
  • #74: Node.js - which is a runtime environment, and NPM which is a package manager for node.
  • #75: Chill out though, it’s fairly straightforward
  • #76: Someone a while ago has made this easy
  • #77: So If you are on PC it’s fairly simple to get and install,
  • #79: You’ve just gotta install these things from the Node JS website
  • #80: I’ve linked to a guide here - that takes you through step by step.
  • #81: If, like me, you are on a Mac
  • #83: It’s not that easy.
  • #84: There’s a wicked awesome guide here that takes you through step by step what you need to do.
  • #85: So you wanna start off by opening up terminal
  • #87: And then typing in a few lines of shit
  • #88: This installs homebrew, that makes everything even ez-er
  • #91: So when homebrew is downloaded - it shouldn’t take too long - a max of 5 mins
  • #92: So You Have To Install 2 More Things, And We’ll Be Ready To Rock. These are npm and node.
  • #93: So just type in this. It installs node through homebrew, directly onto your machine with no fuckery.
  • #94: So this installs node and npm, you’ll get a nice progress bar telling you how far along it is
  • #96: Then you wanna use npm to install the latest version of puppeteer.
  • #97: Now that’s it, you are all good and groovy!
  • #98: You can
  • #99: So for example.
  • #100: If I wanted to take a screenshot of a single page
  • #101: So just type in this, and you should be good to go.
  • #104: You’ll need to code some stuff up - but I’ve put everything together into a single Google Doc, that makes it simple & easy to understand what each bit does. Explain that you are going to go through it.
  • #105: So we are starting up a headless browser, in true headless mode, so you won’t see what goes on (running in the background)
  • #106: And then we are opening up a new tab/page
  • #107: And then we specify exactly what URL we want to go to. So in this instance, we are testing the BlueArray Homepage
  • #108: Then we are taking a screenshot. We have to specify 2 things to allow the code to work correctly
  • #109: So the path, so where and what we want the file to be saved as
  • #110: And then saving as a specific filetype. Can fuck around with this, and get the ideal filetype that is good for you.
  • #111: And then we close the page, and then close the browser.
  • #113: Go to terminal, make sure you are in the same folder as your code, and type in
  • #115: Node screenshot.js.
  • #116: And then a couple of seconds later, you’ll see
  • #117: A nice screenshot get added to your folder with your code in
  • #118: If you wanted to see the browser test this exactly for you,
  • #119: Just change the headless mode to false. This is great for seeing exactly what the browser sees, and looks pretty cool, having a chrome window doing all sorts of shit in front of you!
  • #121: You can also modify the script slightly to run through a list of provided URLs
  • #122: And then get a bunch of screenshots!
  • #123: Now I’m sure that you guys can see where this is headed
  • #124: Faking Googlebot and seeing what they would see
  • #125: So with a few little tweaks to the code that we have for the first example
  • #126: Adding in a user agent string, and setting it to what Googlebot use
  • #127: FYI Googlebot user agent string is not ‘Googlebot’ it is fucking massive
  • #129: And wouldn’t fit on the slide
  • #130: Node screenshot.js. Screenshot.js is the name of the file.
  • #131: Using the await page set viewport option
  • #132: So we have to specify the width and the height of the viewport that we want to use
  • #133: This isn’t really Googlebot, just a decent attempt at emulation
  • #134: As unfortunately
  • #135: As puppeteer was launched way after Chrome 41, we cannot specify it to use this version of Chrome :*(
  • #137: However
  • #138: This can be persuasive in getting a client to ensure that their content is Rendered Server Side, as opposed to client side, if needed
  • #140: We can then provide a list of URLs that we want to get screenshotted
  • #141: And show how they would appear to Google through puppeteer rendering, instead of
  • #142: In the case of some rather shit JS sites
  • #143: Absolutely fuck all
  • #144: Nothing - a blank page
  • #145: Which is pretty cool, and allows for bulk page testing
  • #146: But the really cool stuff is yet to come!
  • #147: So who here has heard of, or even used Content King?
  • #148: It’s a fairly awesome piece of software
  • #149: That allows you to monitor a site in -real time ish,
  • #150: With it alerting you of any issues such as
  • #151: Meta data changes, New pages that 404, Updated links, redirects, indexable and non-indexable pages….
  • #152: However!
  • #153: Like most really good tools, it costs money
  • #154: Maybe You Don’t Wanna Eat Into Your Budget For Content King for a personal project site, or you don’t need the level of detail that those guys provide for a smaller, shitter site?
  • #155: This Next Example Shows How We Can Use puppeteer to
  • #156: Monitor a chosen site when you want, and report of any changes to key areas
  • #157: Including some key areas, such as
  • #158: Meta title changes
  • #159: Meta description updates
  • #160: Any increase or decrease in the word count of the page.
  • #161: Pull out any robots directives, and highlights any differences between them
  • #162: Any differences in canonical elements
  • #163: So basically the really important shit from a HTML webpage
  • #164: So I wrote some code. I’ll be tweeting this out after for those who are interested.
  • #165: As with all coding, this required a bit of research
  • #166: Ahem stackoverflow ahem
  • #167: And with a little bit of luck
  • #168: We now have a way to monitor these basic areas for web pages
  • #169: This is how it works
  • #170: There is about 200 lines of code in total
  • #171: Here’s a small snapshot
  • #172: And I don’t have time to go through the full thing today,
  • #173: but
  • #174: There are a few really interesting snippets that I’d really like to share, that can come in handy
  • #175: So we launch headless chrome as highlighted a few minutes ago
  • #176: Like so. So we launch the browser, and then create a new page within the browser, awaiting for further instruction...
  • #177: And then we provide a list of URLs for Puppeteer to go and fuck around with
  • #178: So here we are quoting the file that we will use for this program, we parse (or read it) using a couple more lines, that don’t really look that exciting!
  • #179: And then we pull in the relevant meta data that I mentioned
  • #180: So, for example
  • #181: Gonna show you guys how we pull in meta titles
  • #182: So we are just pulling the title from the page. If there isn’t one - we get an error, so add in this - n/a
  • #183: And then create an array of all the meta data - so a nice, formatted list of data that we can use later on within the script
  • #184: So this just tells the script to treat all this data as one line, that we can then refer back to later
  • #185: And we then pushed all this data to a text file
  • #186: The script then loops through every URL that is provided, pulling out all data for each
  • #187: It then checks for differences in the data - so compares this run with the previous one.
  • #188: If there are any differences between the two sets of data, these get saved within a changes.txt file
  • #189: That I can then check whenever
  • #190: So I can see what has changed from yesterday, or whenever I last ran the code
  • #191: This required me to run the code each day manually
  • #192: That I completely forgot to do
  • #193: So, I went one step further, to make my life even easier
  • #194: Chucked the code on a Raspberry Pi
  • #195: And set up a cron job within my local machine to automatically run the script at the same time
  • #196: Every day
  • #197: And then
  • #198: This was the bit that took the most amount of time by faarrr
  • #199: Send an email to me if there were any changes.