BLENDING SEO, DISCOVER, &
ENTITY EXTRACTION TO
ANALYZE DATA AT SCALE
LILY RAY
SEOKTOBERFEST 2022
Sr. Director of SEO
& Head of Organic Research,
Amsive Digital
Google talks a lot about signals
Helpful Content Update
How Google Delivers Reliable Information in Search
I've been on a multi-year quest to identify
and analyze the signals that align with
How can a lowly SEO
compete with the vast amount of
data Google uses to
understand these signals?
Big data analysis.
(using as many sites as possible.)
Questions we can answer with big data analysis:
• What are the elements found on top-performing pages?
• Which topics drive the most traffic?
• Are certain topics off-limits?
• What are the qualities of top-performing headlines?
• Which authors drive the best performance?
• Do social signals correlate with SEO/Discover performance?
My process:
(or any tool that connects to the GSC API)
(any data visualization tool is fine)
(or the crawler of your choice)
for measuring social signals
for analyzing readability
the shining star of today's talk: entity extraction & Knowledge Graph
Identifying potential signals:
on-page factors
What elements could count as signals?
What are all the possible shared aspects of
a page with good E-A-T?
Identifying Signals:
Step 1:
Crawl & custom extract all the things
What could possibly count as an on-page signal that
counts towards E-A-T?
1. Titles
2. Headlines
3. Breadcrumbs
4. Star ratings
5. Author name
6. Expert name
7. Date published/modified
8. Internal / external links
9. Bibliographies
(*not an exhaustive list!
Use your imagination.)
Example Custom Extract
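A custom extraction like the one above can be approximated with Python's standard library. The sketch below is an illustration only, not the crawler configuration used in the talk: the class name, the meta tags it looks for (`author`, `article:published_time`), and the sample HTML are assumptions, and real sites often mark up authors and dates differently.

```python
from html.parser import HTMLParser


class OnPageSignalExtractor(HTMLParser):
    """Collects a few on-page elements: title, first H1, author, publish date."""

    def __init__(self):
        super().__init__()
        self.signals = {"title": None, "headline": None,
                        "author": None, "date_published": None}
        self._capture = None  # name of the text field currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title" and self.signals["title"] is None:
            self._capture = "title"
        elif tag == "h1" and self.signals["headline"] is None:
            self._capture = "headline"
        elif tag == "meta":
            # Assumed markup: <meta name="author"> and
            # <meta property="article:published_time">.
            key = (attrs.get("name") or attrs.get("property") or "").lower()
            content = attrs.get("content")
            if key == "author":
                self.signals["author"] = content
            elif key == "article:published_time":
                self.signals["date_published"] = content

    def handle_data(self, data):
        # Append text while inside <title> or the first <h1>, including
        # text nested in inline tags such as <em>.
        if self._capture and data.strip():
            previous = self.signals[self._capture] or ""
            self.signals[self._capture] = (previous + " " + data.strip()).strip()

    def handle_endtag(self, tag):
        if tag in ("title", "h1"):
            self._capture = None


def extract_signals(html):
    parser = OnPageSignalExtractor()
    parser.feed(html)
    return parser.signals


# Made-up sample page, for illustration only.
sample = """<html><head><title>Weight Loss Plateau</title>
<meta name="author" content="Jane Doe">
<meta property="article:published_time" content="2022-10-01">
</head><body><h1>Getting Past a <em>Weight Loss</em> Plateau</h1></body></html>"""
print(extract_signals(sample))
```

Each additional signal from the list (breadcrumbs, star ratings, bibliographies) would need its own rule, which is why per-site custom extraction becomes expensive across many sites.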
Step 2:
Cross-reference with real performance data
(if available) or 3rd-party traffic/visibility data
2. My preferred technique for collecting search & Discover data
Analytics Edge (inexpensive & easy to use!)
+ can use a macro to collect data for various GSC properties
Read Glenn Gabe's article about how to use Analytics
Edge for an easy tutorial
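For readers who prefer to script the collection instead of using Analytics Edge, the request bodies for the Search Console API's `searchanalytics.query` method might look like the sketch below. It only builds the payloads and makes no network call. The function name, the 90-day window, and the three-day lag are assumptions for illustration; authentication and paging are left out.

```python
from datetime import date, timedelta


def build_query_bodies(days=90, row_limit=25000, today=None):
    """Return one searchanalytics.query request body per traffic type."""
    # Assumption: skip the most recent three days, which are often incomplete.
    end = (today or date.today()) - timedelta(days=3)
    start = end - timedelta(days=days)
    bodies = {}
    for traffic_type in ("web", "discover"):
        bodies[traffic_type] = {
            "startDate": start.isoformat(),
            "endDate": end.isoformat(),
            "dimensions": ["page"],   # one row per URL
            "type": traffic_type,     # "web" = search, "discover" = Discover
            "rowLimit": row_limit,
        }
    return bodies


bodies = build_query_bodies(days=30, today=date(2022, 10, 31))
print(bodies["discover"])
```

Requesting only the `page` dimension keeps both exports at the URL level, so they can be matched against crawl data later.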
Step 3:
Collect relevant data from other tools (on the
URL level)
URL | Readability Rating | Flesch Reading Ease | Flesch-Kincaid Grade Level | Sentiment | Sentiment Number | Tone | Tone Number | Personalism
All Items (weight_loss_plateau_-_1st_SERP_urls.csv) | A | 60.5 | 8 | Negative | 40 | Formal | 18 | Personal
https://www.mayoclinic.org/healthy-lifestyle/weight-loss/in-depth/weight-loss-plateau/art-20044615 | A | 64.83 | 6.28 | Negative | 17 | Formal | 29 | Personal
https://en.wikipedia.org/wiki/Mayo_Clinic | B | 43.16 | 8.91 | Positive | 83 | Formal | 11 | Impersonal
https://www.healthline.com/nutrition/weight-loss-plateau | B | 55.96 | 9.26 | Negative | 17 | Formal | 10 | Personal
https://en.wikipedia.org/wiki/Healthline | B | 42.75 | 8.84 | Positive | 83 | Formal | 13 |
https://www.secondnature.io/us/guides/mind/motivation/weight-loss-plateaus-explained | B | 65.31 | 8.61 | Negative | 17 | Formal | 27 | Impersonal
https://www.medicalnewstoday.com/articles/326415 | B | 50.47 | 9.6 | Negative | 17 | Formal | 7 |
https://en.wikipedia.org/wiki/Medical_News_Today | B | 44.03 | 8.11 | Positive | 60 | Formal | 16 | Neutral
https://www.healthline.com/nutrition/20-reasons-you-are-not-losing-weight | A | 68.07 | 6.53 | Negative | 17 | Formal | 20 | Personal
https://www.verywellfit.com/understanding-weight-loss-plateaus-1229951 | B | 64.72 | 8.09 | Positive | 83 | Formal | 27 | Personal
Readability Metrics from
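The two Flesch columns in the table come from standard formulas based on words per sentence and syllables per word. The sketch below applies those formulas with a deliberately rough syllable counter, which is an assumption of this example. Commercial readability tools count syllables and sentences more carefully, so their scores will not match these exactly.

```python
import re


def count_syllables(word):
    """Rough syllable estimate: count vowel groups, drop a silent trailing 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text):
    """Return Flesch Reading Ease and Flesch-Kincaid Grade Level for a text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    reading_ease = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    grade_level = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return {"flesch_reading_ease": round(reading_ease, 2),
            "flesch_kincaid_grade": round(grade_level, 2)}


simple = readability("The cat sat. The dog ran.")
dense = readability("Physiological adaptation complicates sustained weight reduction considerably.")
print(simple, dense)
```

Higher Reading Ease means easier text, while a higher Grade Level means harder text, so the two columns move in opposite directions.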
Social & Engagement Metrics from
Backlink Analysis from
And introducing the secret weapon
SEOktoberfest 2022 - Blending SEO, Discover, & Entity Extraction to Analyze Data at Scale, from Lily Ray
Evaluating Entities:
Diffbot Natural Language API
demo.nl.diffbot.com
h/t @carlos_darko
has its own Knowledge Graph
Evaluating Entities (Organizations):
Diffbot Natural Language API
Evaluating Entities (People):
Diffbot Natural Language API
You can bulk export this data
Pick which attributes to export
Client Case Study:
Problem:
client has a directory full of thousands of company pages, but
many were thin, unoriginal and not receiving organic traffic
Client Case Study:
Solution with Diffbot:
Identified underperforming company page URLs. Took
underperforming company names, ran through Diffbot.
Exported in a CSV:
• Employee count
• Company description
• Social media links
• Quora topic ID
• Wikipedia URL
• Crunchbase URL
• Region
• City
• Address
• Phone Number
• Stock symbol
• Investment amounts
• Industries
• Category
• Subsidiary
• Parent Company
• Names of board members
Diffbot's Knowledge Graph enabled our SEO
team to provide the client with relevant content
at scale for thousands of thin pages
And the best part:
Entity extraction using NLP!
Options for Extracting Entity Data from URLs:
Author Author Link Categories
Breadcrumbs Date Published
Icon Images Location Address Publisher Country Publisher Region
Site Name
Sentiment Tags Timestamp Type
Blending the Data
Matching Data Sources by the URL in PowerBI
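The talk does this matching in PowerBI. The same idea, joining every export on the URL, can be sketched in Python as below. The normalization rules (dropping the scheme, `www.`, query strings, and trailing slashes) and the field names are assumptions for illustration, and the sample rows are made up.

```python
from urllib.parse import urlsplit


def normalize_url(url):
    """Reduce a URL to a join key: lowercase host without 'www.', path without trailing slash."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def blend_by_url(*sources):
    """Merge several lists of row dicts (each with a 'url' field) into one row per URL."""
    blended = {}
    for rows in sources:
        for row in rows:
            key = normalize_url(row["url"])
            merged = blended.setdefault(key, {"url_key": key})
            # Later sources overwrite earlier ones if field names collide.
            merged.update({k: v for k, v in row.items() if k != "url"})
    return list(blended.values())


# Made-up rows standing in for a Search Console export and a readability export.
performance = [{"url": "https://www.example.com/a/", "discover_clicks": 10}]
readability_rows = [{"url": "http://example.com/a", "flesch_reading_ease": 60.5}]
print(blend_by_url(performance, readability_rows))
```

Without some normalization, small differences between tools (a trailing slash, `http` versus `https`) leave rows unmatched and quietly shrink the blended dataset.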
What are the results?
Identifying patterns across clients at scale
Highest Traffic Discover Content (Across 50+ Sites, past 12 months)
Highest Traffic Content in Discover (Examples):
CURES TO ILLNESSES
likely cause of Alzheimer's
discovered in a new study (2.3 MM
clicks!)
HEALTH HACKS
too much melatonin? Are PB&Js healthy?
Lemon-ginger tea before bed? Best time of
day to take vitamin D?
MONEY TIPS
best cities to retire on $2,000 a
month, things to never buy at a
dollar store, student loan forgiveness
SHOCKING STORIES
Airbnb host fed guests dog food,
harmful parenting techniques,
cybersecurity risks
SPORTS DRAMA
Kobe Bryant, Larry
Bird's personal lives
MEDICAL SCARES
Salmonella recalls, signs of
cognitive decline, side effects
of lemon water, sociopath
tests
A High-Level Summary of What Drives
Traffic in Discover
(500K+ clicks in 16 months):
Emotional,
shocking headlines
Highly
relatable topics Drama Enticing questions
Things that might
impact *me*
Nostalgia
and memorabilia
Potential danger
and destruction
Bringing in the power of
Crawling every site with a unique custom
extraction setup is too much work
Diffbot enables you to extract relevant entity
information from URL batches at scale.
What are the results?
Client
names
redacted
Analyze SEO & Discover Performance for Various Clients
Compare SEO Traffic x Discover Traffic by Diffbot Tags Across 50+ Sites
Bubble size = number of articles
Color = sentiment (green is positive)
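The numbers behind a bubble chart like this one are a per-tag aggregation of article-level rows. A minimal sketch follows, assuming each blended row carries a list of tags, SEO clicks, Discover clicks, and a sentiment score. The field names and sample values are hypothetical.

```python
from collections import defaultdict


def summarize_by_tag(articles):
    """Aggregate article rows into one summary per tag."""
    totals = defaultdict(lambda: {"articles": 0, "seo_clicks": 0,
                                  "discover_clicks": 0, "sentiment_sum": 0.0})
    for article in articles:
        # An article with several tags counts toward each of them.
        for tag in article["tags"]:
            t = totals[tag]
            t["articles"] += 1
            t["seo_clicks"] += article["seo_clicks"]
            t["discover_clicks"] += article["discover_clicks"]
            t["sentiment_sum"] += article["sentiment"]
    return {
        tag: {
            "articles": t["articles"],                            # bubble size
            "seo_clicks": t["seo_clicks"],                        # one axis
            "discover_clicks": t["discover_clicks"],              # other axis
            "avg_sentiment": t["sentiment_sum"] / t["articles"],  # bubble color
        }
        for tag, t in totals.items()
    }


# Made-up rows for illustration.
articles = [
    {"tags": ["Netflix", "Entertainment"], "seo_clicks": 100,
     "discover_clicks": 900, "sentiment": 0.5},
    {"tags": ["Netflix"], "seo_clicks": 300,
     "discover_clicks": 100, "sentiment": -0.5},
]
print(summarize_by_tag(articles))
```

Swapping the `tags` field for an author field gives the per-author view of the same data.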
Drilling Deeper: Analyzing the Data
• Consumer tech drives 15+ million clicks in Discover: iPhone
release dates, "delete this app immediately"
• Beloved celebrities and athletes from 20+ years ago
(Michael Jordan, Larry Bird, Scottie Pippen) drive
significant Discover traffic
• Specific interests and hobbies perform well in Discover
(NASCAR, Netflix shows, crypto, BTS)
Drilling Deeper: Filter by Tag = Netflix
Notice a pattern?
Drilling Deeper:
Compare SEO Traffic x Discover Traffic by Diffbot Tags Across 50+ Sites 
Filter by Entertainment
Bubble size = number of articles
Color = sentiment (green is positive)
Drilling Deeper: Analyzing the Data
• Topics related to old/dead celebrities and nostalgia
over-index in Discover (Michael Jordan, Larry Bird, Dolly Parton,
Elvis Presley, George Harrison)
• Current Netflix shows get consistent traffic from both
search & Discover
Drilling Deeper:
Compare SEO Traffic x Discover Traffic by Diffbot Tags Across 50+ Sites 
Filter by Video Games
Bubble size = number of articles
Color = sentiment (green is positive)
Fortnite has much higher
search volume, but Destiny 2
drives more traffic in Discover
Compare SEO Traffic x Discover Traffic by Diffbot Authors & Average Sentiment
Across 50+ Sites
Bubble size = number of articles
Color = sentiment (green is positive)
Kyle & Zack drive significant
Discover traffic with relatively
few articles
Same for Rachael with SEO
traffic
Compare SEO Traffic x Discover Traffic by Diffbot Authors & Average Sentiment
Across 50+ Sites, Filtered by Health
The Discover/SEO and SEO/Discover Ratio:
Discover / SEO SEO / Discover
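Both ratios are simple divisions of click totals for the same tag, author, or site. The sketch below shows one way to compute them; the function name and the choice to return `None` when the denominator is zero are assumptions of this example.

```python
def traffic_ratios(seo_clicks, discover_clicks):
    """Return the Discover/SEO and SEO/Discover click ratios."""
    def ratio(numerator, denominator):
        # A missing channel gives no meaningful ratio, so return None.
        return numerator / denominator if denominator else None

    return {
        "discover_to_seo": ratio(discover_clicks, seo_clicks),
        "seo_to_discover": ratio(seo_clicks, discover_clicks),
    }


# Hypothetical totals: 1,000 search clicks and 4,000 Discover clicks.
print(traffic_ratios(1000, 4000))
```

A Discover/SEO ratio well above 1 marks a topic that leans on Discover, and a ratio well below 1 marks one that leans on search.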
Publishing Client: Stay in Your Lane
Potential E-A-T issue
Let's play a game: DISCOVER MAD LIBS
Please provide:
1. adjective, plural noun
2. App name
3. Drug name
4. Number
5. Drug/supplement name
6. Number
7. App name
8. Netflix show, Netflix show
9. Type of alcohol
How to Write Headlines for Discover (Mad Libs style):
1. These are the most [adjective] [plural noun] this season
2. [Number] reasons why you should delete [app name] off your phone
3. [Drug name]: How much is too much?
4. Best cities to retire on a budget of [number] per month
5. Best time of day to take [supplement]
6. Google confirms Google Chrome has these [number] new security
threats
7. Why you should switch [app name] for this alternative
8. [Netflix show] dethroned by [Netflix show]
9. Scientists have finally figured out the risks of consuming [alcohol]
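The Mad Libs patterns above behave like string templates. As a playful sketch, the first five can be filled in with Python's `str.format`. The placeholder names and sample values are made up for this example, and a single `number` is shared across templates for brevity.

```python
HEADLINE_TEMPLATES = [
    "These are the most {adjective} {plural_noun} this season",
    "{number} reasons why you should delete {app_name} off your phone",
    "{drug_name}: How much is too much?",
    "Best cities to retire on a budget of {number} per month",
    "Best time of day to take {supplement}",
]


def fill_headlines(values, templates=HEADLINE_TEMPLATES):
    """Fill every template with the supplied words; unused keys are ignored."""
    return [template.format(**values) for template in templates]


# Hypothetical inputs for illustration.
values = {
    "adjective": "popular",
    "plural_noun": "shows",
    "number": 5,
    "app_name": "ExampleApp",
    "drug_name": "Melatonin",
    "supplement": "vitamin D",
}
for headline in fill_headlines(values):
    print(headline)
```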
You, too, can get 1+ million clicks from Discover!