Search Ranking Across Heterogeneous Information Sources
Viet Ha-Thuc and Dhruv Arya
Search Quality - LinkedIn
Heterogeneous Information Access at SIGIR 2016
200+ countries and territories
2+ new members per second
Dual Roles of Search
  Enable talent to discover opportunities
  Help companies search for the right talent
FLAGSHIP SEARCH
RECRUITER SEARCH
SALES NAVIGATOR
Unique Nature of LinkedIn Search
Heterogeneous sources
  Different entity types: people, jobs, companies, slideshares
  Many use cases: hiring, sales, connecting, job seeking, content discovery
  Requires different features, training data and objectives
Scale
  400+MM members, 6+MM jobs, 18+MM slideshows
Federation across the sources
Overview
[Figure: search pipeline diagram. Components: Query; Spell Correction; Query Tagging (Name, Title, Skill); Intent Prediction; Federated Search over People, Companies and Jobs; Page Construction]
Agenda
Introduction
Vertical Ranking
  Job Search [KDD16]
  People Search by Skills [BigData15, SIGIR16]
Federation [CIKM15]
Lessons
Challenges of Job Search
Hidden structures
Query only represents a small fraction of the information need
  Example query: San Francisco, software engineer, java
Job attractiveness varies on many aspects
  Hot titles: data scientist
  Top companies: Google, Facebook, etc.
  Trending skills: machine learning, big data, etc.
  Location
Entity-Aware Matching
Expertise Homophily
Classic homophily in social networks
  People tend to interact with similar people
Expertise homophily in job search
  Searchers tend to apply for jobs that match their own expertise
  Apply rate of job results with overlapping skills is 2x higher
Expertise
  Jobs: skills extracted from the job description
  Searcher: explicit and implicit skills
  Jaccard similarity between the two skill sets
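
The expertise-homophily signal can be illustrated with a minimal sketch of Jaccard similarity over skill sets. The skill names and the function name `jaccard_similarity` are hypothetical and chosen only for illustration. The talk does not say how empty skill sets are handled, so returning 0 in that case is an assumption.

```python
def jaccard_similarity(skills_a, skills_b):
    """Jaccard similarity |A & B| / |A | B| between two skill collections."""
    a, b = set(skills_a), set(skills_b)
    union = a | b
    if not union:
        return 0.0  # assumption: two empty skill sets count as dissimilar
    return len(a & b) / len(union)


# Hypothetical example data, not taken from the talk.
searcher_skills = {"java", "python", "machine learning"}
job_skills = {"java", "machine learning", "hadoop", "sql"}

# 2 shared skills out of 5 distinct skills overall.
score = jaccard_similarity(searcher_skills, job_skills)
print(round(score, 2))
```

A score like this could serve as one ranking feature per (searcher, job) pair, alongside the text-match features.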
Entity-faceted CTRs
Job attractiveness
  Historical CTRs for individual jobs
  Challenge: job lifetime is short -> unreliable estimation
Entity-faceted historical CTRs
  CTRs of jobs with standardized title "data scientist"
  CTRs of jobs from company IBM
  CTRs of jobs requiring trending skills: machine learning, big data, etc.
Advantages
  Alleviate data sparseness by grouping jobs by facets
  Resolve the cold-start problem
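
As a rough sketch of the idea, historical CTRs can be aggregated per facet value (title, company) rather than per job, so that a brand-new job inherits an estimate from its facets. The log format, field names and the function `faceted_ctrs` are assumptions made for this example. The talk does not describe the actual data pipeline or any smoothing.

```python
from collections import defaultdict


def faceted_ctrs(impressions, facet):
    """Aggregate clicks and views per facet value and return CTR per value."""
    clicks = defaultdict(int)
    views = defaultdict(int)
    for row in impressions:
        key = row[facet]
        views[key] += 1
        clicks[key] += row["clicked"]
    return {key: clicks[key] / views[key] for key in views}


# Hypothetical impression log: one row per job result shown to a searcher.
log = [
    {"job_id": 1, "title": "data scientist", "company": "IBM", "clicked": 1},
    {"job_id": 2, "title": "data scientist", "company": "Acme", "clicked": 1},
    {"job_id": 3, "title": "data scientist", "company": "IBM", "clicked": 0},
    {"job_id": 4, "title": "accountant", "company": "Acme", "clicked": 0},
]

title_ctr = faceted_ctrs(log, "title")
company_ctr = faceted_ctrs(log, "company")

# A new job with no click history of its own still gets facet-level features.
new_job = {"job_id": 5, "title": "data scientist", "company": "IBM"}
features = {
    "title_ctr": title_ctr.get(new_job["title"], 0.0),
    "company_ctr": company_ctr.get(new_job["company"], 0.0),
}
print(features)
```

Grouping by facet pools evidence from many short-lived jobs, which is the sparseness and cold-start argument made on the slide.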
Other features
Labeling Strategy
Job applies, views and skips are considered
  Applied: highest, label = 4
  Click: good, label = 1
  Skipped: label = 0
  Uncertain: removed
Learning to Rank
Listwise
  Treats relevance as relative within each query
  Allows optimizing the quality metric directly
Objective function
  Normalized Discounted Cumulative Gain (NDCG@K)
  Graded relevance labels
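
To make the objective concrete, here is a minimal sketch of NDCG@K over graded labels such as 0 (skipped), 1 (click) and 4 (applied). It assumes the common formulation with gain 2^label - 1 and a log2 position discount. The talk does not state which gain or discount variant was used.

```python
import math


def dcg_at_k(labels, k):
    """DCG@K with gain 2^label - 1 and a log2 position discount."""
    return sum((2 ** rel - 1) / math.log2(pos + 2)
               for pos, rel in enumerate(labels[:k]))


def ndcg_at_k(labels, k):
    """NDCG@K: DCG of the given order divided by DCG of the ideal order."""
    ideal = dcg_at_k(sorted(labels, reverse=True), k)
    return dcg_at_k(labels, k) / ideal if ideal > 0 else 0.0


# Labels in ranked order for one hypothetical query:
# skipped (0), click (1), applied (4), skipped (0).
ranked_labels = [0, 1, 4, 0]
print(ndcg_at_k(ranked_labels, k=3))

# The same results in ideal order score the maximum.
print(ndcg_at_k([4, 1, 0, 0], k=3))
```

Because the applied job sits at position 3 rather than position 1, the first ranking scores well below the ideal one. That gap is what a listwise learner tries to close.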
Experiment Results
Baseline
  All of the existing features except entity-aware ones
  Machine learned
  Optimized for the same objective function
Results
               CTR      Apply Rate
  Improvement  +11.3%   +5.3%
Introduction
Skills
  Represent professional expertise
  35K+ standardized skills
  Members get endorsed on skills
Skill queries
  Contain skills and no personal name
Introduction
Unique challenges of LinkedIn expertise search
  Scale: 400M members x 35K standardized skills
  Sparsity of skills in profiles
  Personalization
Reputation
Information a decision maker uses to make a judgment on an entity with a record (*)
(*) Building Web Reputation Systems, Glass and Farmer, 2010
Skill Reputation Scores [Ha-Thuc et al. BigData15]
Decision maker: searcher
Record: professional career
Skill reputation: member expertise on a skill
Judgment: hire?
Estimating Skill Reputation
Signals: endorsements, profile, browsemap
Supervised learning algorithm estimates P(expert | member, skill)
Resulting matrix (rows: members, columns: skills, ? = unknown):
  ?    .85  .45
  ?    ?    .35
  ?    .42  ?
  ?    ?    .05
Estimating Skill Reputation
Matrix Factorization
Original matrix (rows: members, columns: skills, ? = unknown):
  ?    .85  .45
  ?    ?    .35
  ?    .42  ?
  ?    ?    .05
Member factors (each row is a representation of a member in latent space):
  0.5  1
  0.7  0
  0    0.6
  0.1  0
Skill factors (each column represents a skill in latent space):
  0.2  0.3  0.5
  0.5  0.7  0.2
Estimating Skill Reputation
Original matrix (rows: members, columns: skills, ? = unknown):
  ?    .85  .45
  ?    ?    .35
  ?    .42  ?
  .02  ?    ?
Member factors:
  0.5  1
  0.7  0
  0    0.6
  0.1  0
Skill factors:
  0.2  0.3  0.5
  0.5  0.7  0.2
Product of the two factor matrices (rows: members, columns: skills):
  .6   .85  .45
  .14  .21  .35
  .3   .42  .12
  .02  .03  .05
Fill in unknown cells in the original matrix
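
The reconstruction step can be checked with a short sketch that multiplies the member and skill factor matrices printed on the slide. Only the multiplication is shown. How the factors are learned from the known cells is not covered here, and the helper `matmul` is written for this example rather than taken from the talk.

```python
# Member factors: one row per member, two latent dimensions (values from the slide).
member_factors = [
    [0.5, 1.0],
    [0.7, 0.0],
    [0.0, 0.6],
    [0.1, 0.0],
]

# Skill factors: one column per skill, two latent dimensions (values from the slide).
skill_factors = [
    [0.2, 0.3, 0.5],
    [0.5, 0.7, 0.2],
]


def matmul(a, b):
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [
        [sum(a[i][d] * b[d][j] for d in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# Every (member, skill) cell gets a score, including the previously unknown ones.
completed = matmul(member_factors, skill_factors)
for row in completed:
    print([round(score, 2) for score in row])
```

Rounded to two decimals, the product matches the completed matrix on the slide. For example, the first member's score on the first skill is 0.5 x 0.2 + 1 x 0.5 = 0.6.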
Features
Reputation feature
Social connection
Homophily
Geo
Industry
Textual features
Experiments
Results
             CTR@10   # Messages per Search
  Flagship   +11%     +20%
  Premium    +18%     +37%
Query Tagging
  Target segment: skill and no-name queries
Baseline
  No skill reputation feature
  Hand-tuned
Personalized Federated Search
Personalized Federated Search - Motivation
  Why do we need this?
Personalized Federated Search - Overall
Personalized Federated Model [Arya, Ha-Thuc et al. CIKM15]
Relevance scores from base rankers
Query intent: P(vertical | query)
Searcher intent
  Mine searcher profiles and past behavior to infer intent
    Title "recruiter" -> recruiting intent
    Searches for jobs -> job-seeking intent
  Machine-learned models predict member intents:
    Job seeking
    Recruiting
    Content consuming
Calibrate Signals across Verticals
Verticals associate with different intents
[Figure: result types (People Result, Job Result, Group Result) linked to searcher intents (Recruiting Intent, Job Seeking Intent, Content Consuming Intent)]
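
One way to picture how these signals might be combined is a sketch that assembles a feature vector for a single candidate result. It includes the base ranker score, the query intent for the result's vertical, and searcher intents crossed with the vertical. The feature layout, all names and all numeric values are hypothetical. The talk lists the signals but not how they are encoded.

```python
VERTICALS = ("people", "jobs", "groups")
INTENTS = ("recruiting", "job_seeking", "content_consuming")


def federation_features(result, query_intent, searcher_intent):
    """Build one feature dict for a candidate result from any vertical."""
    vertical = result["vertical"]
    features = {
        "base_score": result["base_score"],
        "query_intent": query_intent[vertical],
    }
    # Cross every searcher intent with the result's vertical so a model can
    # learn a separate weight for each (vertical, intent) pair.
    for v in VERTICALS:
        for intent in INTENTS:
            value = searcher_intent[intent] if v == vertical else 0.0
            features[f"{v}_x_{intent}"] = value
    return features


# Hypothetical inputs for one query issued by one searcher.
query_intent = {"people": 0.6, "jobs": 0.3, "groups": 0.1}
searcher_intent = {"recruiting": 0.8, "job_seeking": 0.1, "content_consuming": 0.1}
candidate = {"vertical": "people", "base_score": 0.72}

features = federation_features(candidate, query_intent, searcher_intent)
print(features["people_x_recruiting"], features["jobs_x_job_seeking"])
```

With crossed features, a learned model can weight recruiting intent heavily for people results while ignoring it for job results. This is one possible reading of calibrating signals across verticals.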
Take-Aways
Text match is still important but not enough
Advanced features based on semi-structured data
  People search: skill reputation scores
  Job search: expertise homophily
Personalized learning-to-rank is crucial
References
[BigData15] Personalized Expertise Search at LinkedIn, Ha-Thuc, Venkataraman, Rodriguez, Sinha, Sundaram and Guo, BigData, 2015
[CIKM15] Personalized Federated Search at LinkedIn, Arya, Ha-Thuc and Sinha, CIKM, 2015
[SIGIR16] Learning to Rank Personalized Search Results in Professional Networks, Ha-Thuc and Sinha, SIGIR, 2016
[KDD16] How to Get Them a Dream Job?, Li, Arya, Ha-Thuc and Sinha, KDD, 2016
