This document discusses techniques for ranking search results across heterogeneous information sources on LinkedIn. It describes how LinkedIn search handles different entity types at a large scale and how it predicts user intent to federate search across sources. It also summarizes methods for skill-based people search using skill reputation scores and job search ranking using expertise homophily between job postings and user profiles.
Search Ranking Across Heterogeneous Information Sources
1. Search Ranking Across Heterogeneous Information Sources
Viet Ha-Thuc and Dhruv Arya
Search Quality - LinkedIn
Heterogeneous Information Access at SIGIR 2016
5. Unique Nature of LinkedIn Search
Heterogeneous sources
Different entity types: people, jobs, companies, slideshares
Many use cases: hiring, sales, connecting, job seeking, content discovery
Requires different features, training data, and objectives
Scale
400+MM members, 6+MM jobs, 18+MM slideshows
Federation across the sources
8. Agenda
Introduction
Vertical Ranking
Job Search [KDD16]
People Search by Skills [BigData15, SIGIR16]
Federation [CIKM15]
Lessons
9. Challenges of Job Search
Hidden structures
Query only represents a small fraction of information need
San Francisco, software engineer, java
Job attractiveness varies along many dimensions
Hot titles: data scientist
Top companies: Google, Facebook, etc.
Trending skills: machine learning, big data, etc.
Location
11. Expertise Homophily
Classic homophily in social networks
People tend to interact with similar ones
Expertise homophily in job search
Searcher tends to apply for jobs with similar expertise
Apply rate of job results with overlapping skills is 2x higher
Expertise:
Jobs: extract skills from job description
Searcher: explicit and implicit skills
Jaccard similarity
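The skill-overlap measure named above can be sketched as follows. This is a minimal illustration, assuming skills are already extracted and standardized into plain strings; the function name and the sample skill sets are hypothetical and not taken from the talk.

```python
def jaccard_similarity(job_skills, searcher_skills):
    """Jaccard similarity between two skill collections: |A & B| / |A | B|."""
    a, b = set(job_skills), set(searcher_skills)
    if not a and not b:
        # Convention chosen for this sketch: two empty skill sets share no expertise.
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical example data.
job = {"java", "machine learning", "hadoop"}
searcher = {"java", "machine learning", "python", "sql"}

# 2 shared skills out of 5 distinct skills overall.
score = jaccard_similarity(job, searcher)
```

A score like this could serve as one expertise-homophily feature for a (searcher, job) pair, alongside the other ranking features.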
12. Entity-faceted CTRs
Job attractiveness
Historical CTRs for individual jobs
Challenge: job lifetime is short -> unreliable estimation
Entity-faceted historical CTRs
CTRs of jobs with standardized title: data scientist
CTRs of jobs from company IBM
CTRs of jobs requiring trending skill: machine learning, big data, etc.
Advantages
Alleviate data sparseness by grouping jobs by facets
Resolve cold start problem
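A minimal sketch of the grouping idea: clicks and impressions are pooled per facet value (title, company, skill) rather than per job, so a brand-new posting can inherit the CTRs of its facets. The log format, field names, and the company "Acme" are assumptions made for illustration only.

```python
from collections import defaultdict


def faceted_ctrs(impressions):
    """Compute historical CTR per (facet, value) from (job, clicked) impression pairs."""
    views = defaultdict(int)
    clicks = defaultdict(int)
    for job, clicked in impressions:
        keys = [("title", job["title"]), ("company", job["company"])]
        keys += [("skill", skill) for skill in job["skills"]]
        for key in keys:
            views[key] += 1
            clicks[key] += int(clicked)
    return {key: clicks[key] / views[key] for key in views}


# Hypothetical impression log.
job_a = {"title": "data scientist", "company": "IBM", "skills": ["machine learning"]}
job_b = {"title": "data scientist", "company": "Acme", "skills": ["big data"]}
log = [(job_a, True), (job_a, False), (job_b, True), (job_b, True)]

ctrs = faceted_ctrs(log)

# A new job with no history of its own still gets features from its facets.
new_job = {"title": "data scientist", "company": "IBM", "skills": ["big data"]}
new_job_features = [
    ctrs.get(("title", new_job["title"]), 0.0),
    ctrs.get(("company", new_job["company"]), 0.0),
]
```

In practice the per-facet estimates would likely need smoothing for rare facet values; that is left out here to keep the sketch short.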
14. Labeling Strategy
Job applies, views, and skips are considered
Applied: highest, label = 4
Click: good, label = 1
Skipped: label = 0
Uncertain: removed
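The labeling scheme can be sketched as a small mapping, reading the slide as: applied results get the highest label (4), clicked results get label 1, skipped results get label 0, and uncertain results are dropped from training. How a result is classified as "skipped" versus "uncertain" is not stated on the slide, so the action strings below are assumed to be given.

```python
# Graded labels as read from the slide; the action names are assumptions.
LABELS = {"applied": 4, "clicked": 1, "skipped": 0}


def label_results(results):
    """Turn (job_id, action) pairs into (job_id, label), dropping uncertain results."""
    return [
        (job_id, LABELS[action])
        for job_id, action in results
        if action != "uncertain"
    ]


# Hypothetical search session.
session = [
    ("job1", "skipped"),
    ("job2", "clicked"),
    ("job3", "applied"),
    ("job4", "uncertain"),
]
training_rows = label_results(session)
```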
15. Learning to Rank
Listwise
Treats relevance as relative within each query
Allows optimizing the quality metric directly
Objective function
Normalized Discounted Cumulative Gain (NDCG@K)
Graded relevance labels
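For reference, NDCG@K over graded labels can be computed as below. This sketch uses the common exponential-gain form (2^label - 1) with a log2 position discount; the slides do not say which gain variant was used, so that choice is an assumption.

```python
import math


def dcg_at_k(labels, k):
    """Discounted cumulative gain of the first k labels, in ranked order."""
    return sum(
        (2 ** rel - 1) / math.log2(pos + 2)
        for pos, rel in enumerate(labels[:k])
    )


def ndcg_at_k(labels, k):
    """DCG@k normalized by the DCG@k of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(labels, reverse=True), k)
    return dcg_at_k(labels, k) / ideal if ideal > 0 else 0.0


# Graded labels of one result list, in the order the ranker returned them.
perfect = ndcg_at_k([4, 1, 0], k=3)
reversed_order = ndcg_at_k([0, 1, 4], k=3)
```

A listwise learner optimizes this kind of per-query metric directly, which is why the ordering of the applied result (label 4) matters far more than the ordering of skipped ones.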
16. Experiment Results
Baseline
All of the existing features except entity-aware ones
Machine learned
Optimized for the same objective function
Improvement over baseline: CTR +11.3%, Apply Rate +5.3%
18. Introduction
Skills
Represent professional expertise
35K+ standardized skills
Members get endorsed on skills
Skill queries
Contain skills and no personal name
19. Introduction
Unique challenges of LinkedIn expertise search
Scale: 400M members x 35K standardized skills
Sparsity of skills in profiles
Personalization
20. Reputation
Information a decision maker uses to make a judgment on an entity with a record (*)
(*) Building Web Reputation Systems, Glass and Farmer, 2010
21. Skill Reputation Scores [Ha-Thuc et al. BigData15]
Decision maker: searcher
Record: professional career
Skill reputation: member expertise on a skill
Judgment: Hire?
23. Estimating Skill Reputation
Inputs: endorsements, profile, browsemap
Observed member-skill matrix (rows: members, columns: skills; ? = unknown):
? .85 .45
? ? .35
? .42 ?
? ? .05
Matrix factorization
Member factors (each row is a representation of a member in latent space):
0.5 1
0.7 0
0 0.6
0.1 0
Skill factors (each column represents a skill in latent space):
0.2 0.3 0.5
0.5 0.7 0.2
24. Estimating Skill Reputation
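Inputs: endorsements, profile, browsemap
Observed member-skill matrix (rows: members, columns: skills; ? = unknown):
? .85 .45
? ? .35
? .42 ?
.02 ? ?
Member factors (rows: members):
0.5 1
0.7 0
0 0.6
0.1 0
Skill factors (columns: skills):
0.2 0.3 0.5
0.5 0.7 0.2
Fill in unknown cells in the original matrix (product of the two factor matrices):
.6 .85 .45
.14 .21 .35
.3 .42 .12
.02 .03 .05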
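The fill-in step can be checked by multiplying the two factor matrices shown on the slide: each member-skill score is the dot product of the member's latent row and the skill's latent column. The sketch below covers only this reconstruction step; how the factors are learned from the observed cells is not shown on the slide and is not attempted here.

```python
# Factor values copied from the slide.
member_factors = [
    [0.5, 1.0],
    [0.7, 0.0],
    [0.0, 0.6],
    [0.1, 0.0],
]
skill_factors = [
    [0.2, 0.3, 0.5],
    [0.5, 0.7, 0.2],
]


def reconstruct(members, skills):
    """Multiply (members x latent) by (latent x skills) to score every cell."""
    latent_dims = len(skills)
    n_skills = len(skills[0])
    return [
        [sum(row[d] * skills[d][j] for d in range(latent_dims)) for j in range(n_skills)]
        for row in members
    ]


# Round to two decimals to match the precision used on the slide.
scores = [[round(v, 2) for v in row] for row in reconstruct(member_factors, skill_factors)]
```

The rounded product matches the filled-in matrix on the slide, including the cells that were unknown in the observed matrix.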
31. Personalized Federated Model [Arya, Ha-Thuc et al. CIKM15]
Relevance scores from base rankers
Query intent: P(vertical| query)
Searcher intent
Mine searcher profiles and past behavior to infer intent
Title recruiter -> recruiting intent
Search for jobs -> job seeking intent
Machine-learned models predict member intents:
Job seeking
Recruiting
Content consuming
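One way to picture how these signals might be combined is sketched below. This is an illustrative assumption, not the model from the talk: the vertical-to-intent mapping, the linear scoring form, the weights, and all sample values are hypothetical, and a production system would learn the combination from data.

```python
# Hypothetical association between verticals and searcher intents.
VERTICAL_INTENT = {
    "people": "recruiting",
    "jobs": "job_seeking",
    "groups": "content_consuming",
}

# Hypothetical weights; in practice these would be learned.
WEIGHTS = {"relevance": 1.0, "query_intent": 1.0, "searcher_intent": 1.0}


def federated_score(result, query_intent, searcher_intent):
    """Combine base-ranker relevance, P(vertical | query), and searcher intent."""
    vertical = result["vertical"]
    return (
        WEIGHTS["relevance"] * result["relevance"]
        + WEIGHTS["query_intent"] * query_intent.get(vertical, 0.0)
        + WEIGHTS["searcher_intent"] * searcher_intent.get(VERTICAL_INTENT[vertical], 0.0)
    )


def federate(results, query_intent, searcher_intent):
    """Merge results from all verticals into one list, best score first."""
    return sorted(
        results,
        key=lambda r: federated_score(r, query_intent, searcher_intent),
        reverse=True,
    )


# Hypothetical inputs: an ambiguous query issued by a likely job seeker.
results = [
    {"id": "p1", "vertical": "people", "relevance": 0.8},
    {"id": "j1", "vertical": "jobs", "relevance": 0.7},
]
query_intent = {"people": 0.5, "jobs": 0.5}
searcher_intent = {"job_seeking": 0.9, "recruiting": 0.1, "content_consuming": 0.0}

ranked = federate(results, query_intent, searcher_intent)
```

In this example the job result outranks the people result despite its lower base relevance, because the searcher's inferred job-seeking intent tips the balance.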
32. Calibrate Signals across Verticals
Verticals associate with different intents
Verticals: People Result, Job Result, Group Result
Intents: Recruiting Intent, Job Seeking Intent, Content Consuming Intent
35. Take-Aways
Text match is still important but not enough
Advanced features based on semi-structured data
People search: skill reputation scores
Job search: expertise homophily
Personalized learning-to-rank is crucial
36. References
Personalized Expertise Search at LinkedIn, Ha-Thuc, Venkataraman, Rodriguez, Sinha, Sundaram and Guo, BigData, 2015
Personalized Federated Search at LinkedIn, Arya, Ha-Thuc and Sinha, CIKM, 2015
Learning to Rank Personalized Search Results in Professional Networks, Ha-Thuc and Sinha, SIGIR, 2016
How to Get Them a Dream Job?, Li, Arya, Ha-Thuc and Sinha, KDD, 2016