Search Ranking Across Heterogeneous
Information Sources
Viet Ha-Thuc and Dhruv Arya
Search Quality - LinkedIn
Heterogeneous Information Access at SIGIR 2016

• 200+ countries and territories
• 2+ new members per second

● Dual Roles of Search
○ Enable talent to discover opportunities
○ Help companies search for the right talent

FLAGSHIP SEARCH
RECRUITER SEARCH
SALES NAVIGATOR

Unique Nature of LinkedIn Search
▪ Heterogeneous sources
– Different entity types: People, jobs, companies, slideshares
– Many use cases: Hiring, sales, connecting, job seeking, content discovery
– Requires different features, training data and objectives
▪ Scale
– 400+MM members, 6+MM jobs, 18+MM slideshows
▪ Federation across the sources

Overview
[Figure: search pipeline diagram. Components: Query; Spell Correction; Query Tagging (Name, Title, Skill); Intent Prediction; Federated Search; verticals (People, Companies, Jobs); Page Construction]

Agenda
▪ Introduction
▪ Vertical Ranking
–Job Search [KDD’16]
–People Search by Skills [BigData’15, SIGIR’16]
▪ Federation [CIKM’15]
▪ Lessons

Challenges of Job Search
▪ “Hidden” structures
▪ Query only represents a small fraction of information need
–“San Francisco”, “software engineer”, “java”
▪ Job attractiveness varies along many aspects
–“Hot” titles: “data scientist”
–Top companies: Google, Facebook, etc.
–Trending skills: machine learning, big data, etc.
–Location

Expertise Homophily
▪ “Classic” homophily in social networks
–People tend to interact with similar ones
▪Expertise homophily in job search
–Searcher tends to apply for jobs with similar expertise
–Apply rate of job results with overlapping skills is 2x higher
▪Expertise:
–Jobs: extract skills from job description
–Searcher: explicit and implicit skills
–Jaccard similarity
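
A minimal sketch of the Jaccard similarity named above, applied to a searcher's skill set and a job's skill set. The skill values and the function name `jaccard` are illustrative assumptions; the slides do not say how skills are normalized or weighted before comparison.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two collections of skills."""
    a, b = set(a), set(b)
    if not a and not b:
        # Convention chosen here: two empty skill sets have similarity 0.
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical skill sets, for illustration only.
searcher_skills = {"java", "machine learning", "hadoop"}
job_skills = {"java", "machine learning", "spark", "sql"}

# 2 shared skills out of 5 distinct skills overall.
score = jaccard(searcher_skills, job_skills)
```

The resulting score can be used as one ranking feature per (searcher, job) pair.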

Entity-faceted CTRs
▪ Job attractiveness
–Historical CTRs for individual jobs
–Challenge: job lifetime is short -> unreliable estimation
▪Entity-faceted historical CTRs
–CTRs of jobs with standardized title “data scientist”
–CTRs of jobs from company IBM
–CTRs of jobs requiring trending skill: machine learning, big data, etc.
▪Advantages
–Alleviate data sparseness by grouping jobs by facets
–Resolve cold start problem
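
The grouping idea above can be sketched as follows: instead of one CTR per job, impressions are pooled by a facet (title, company) so that a brand-new job inherits the history of its facet values. The log layout, the field names, and the company "ExampleCo" are assumptions made for this sketch, not the production schema.

```python
from collections import defaultdict


def faceted_ctr(impressions, facet):
    """Historical CTR per facet value, pooling all jobs that share that value."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for imp in impressions:
        key = imp[facet]
        shown[key] += 1
        clicked[key] += imp["clicked"]
    return {key: clicked[key] / shown[key] for key in shown}


# Hypothetical impression log: one row per job result shown to a searcher.
impressions = [
    {"job": 1, "title": "data scientist", "company": "IBM", "clicked": 1},
    {"job": 1, "title": "data scientist", "company": "IBM", "clicked": 0},
    {"job": 2, "title": "data scientist", "company": "ExampleCo", "clicked": 1},
    {"job": 3, "title": "software engineer", "company": "IBM", "clicked": 0},
]

title_ctr = faceted_ctr(impressions, "title")
company_ctr = faceted_ctr(impressions, "company")

# A new "data scientist" job with no history of its own can still use
# the pooled title-level CTR as a feature (cold start).
new_job_feature = title_ctr.get("data scientist", 0.0)
```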

Labeling Strategy
▪ Job Applies, Views and Skips are considered
– Uncertain: removed
– Skipped: label = 0
– Click: Good, label = 1
– Applied: Highest, label = 4
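
A rough sketch of how the label values above could be assigned to one result page. The label values (4, 1, 0) come from the slide; the rule used here for "uncertain" (results ranked below the last clicked or applied result) is an assumption of this sketch, since the slide does not spell out how uncertain results are identified.

```python
LABELS = {"apply": 4, "click": 1, "skip": 0}


def label_results(actions):
    """Turn per-rank actions into (rank, label) pairs.

    actions: list ordered by rank; each entry is "apply", "click",
    or None when the searcher did not interact with that result.
    """
    interacted = [i for i, a in enumerate(actions) if a in ("apply", "click")]
    last = max(interacted, default=-1)
    labeled = []
    for rank, action in enumerate(actions):
        if action in ("apply", "click"):
            labeled.append((rank, LABELS[action]))
        elif rank < last:
            # Assumed rule: seen but passed over, so treated as skipped.
            labeled.append((rank, LABELS["skip"]))
        # Otherwise the result is uncertain and is dropped from training.
    return labeled


page = [None, "click", None, "apply", None, None]
training_rows = label_results(page)
```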

Learning to Rank
▪ Listwise
– Treats relevance as relative within each query
– Allows optimizing the quality metric directly
▪ Objective function
– Normalized Discounted Cumulative Gain (NDCG@K)
– Graded relevance labels
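
For reference, a small sketch of computing NDCG@K over graded labels for one query. It uses the common exponential gain 2^label - 1 with a log2 position discount; the slides do not state which gain variant was used, so that choice is an assumption. This only evaluates a ranking and does not show the listwise training procedure itself.

```python
import math


def dcg_at_k(labels, k):
    """Discounted cumulative gain of the top-k labels, in ranked order."""
    return sum(
        (2 ** rel - 1) / math.log2(rank + 2)
        for rank, rel in enumerate(labels[:k])
    )


def ndcg_at_k(labels, k):
    """DCG@k divided by the DCG@k of the ideal (label-sorted) ordering."""
    ideal = dcg_at_k(sorted(labels, reverse=True), k)
    return dcg_at_k(labels, k) / ideal if ideal > 0 else 0.0


# Graded labels of results in the order the ranker returned them.
perfect = ndcg_at_k([4, 1, 0], k=3)
reversed_order = ndcg_at_k([0, 1, 4], k=3)
```

Because the metric is normalized per query, a ranking is rewarded for ordering that query's own results well, regardless of how many good results other queries have.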

Experiment Results
▪ Baseline
–All of the existing features except entity-aware ones
–Machine learned
–Optimized for the same objective function
              CTR       Apply Rate
Improvement   +11.3%    +5.3%

Introduction
▪ Skills
– Represent professional expertise
– 35K+ standardized skills
– Members get endorsed on skills
▪ Skill queries
– Contain skills and no personal name

Introduction
▪ Challenges unique to LinkedIn expertise search
– Scale: 400M members x 35K standardized skills
– Sparsity of skills in profiles
– Personalization
…

Reputation
Information a decision maker uses to make a
judgment on an entity with a record (*)
(*) “Building web reputation systems”, Glass and Farmer, 2010

Skill Reputation Scores [Ha-Thuc et al. BigData’15]
▪ Decision Maker: searcher
▪ Record: Professional career
▪ Skill reputation: member expertise on a skill
▪ Judgment: Hire?

Estimating Skill Reputation
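[Figure: endorsement, profile and browsemap signals feed a supervised learning algorithm that estimates P(expert | member, skill) for some cells of the Members x Skills matrix; "?" marks an unknown cell]
?    .85  .45
?    ?    .35
?    .42  ?
?    ?    .05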

Matrix Factorization
[Figure: the sparse Members x Skills matrix is factorized into a member matrix and a skill matrix]
Members (each row is a representation of a member in latent space):
0.5  1
0.7  0
0    0.6
0.1  0
Skills (each column represents a skill in latent space):
0.2  0.3  0.5
0.5  0.7  0.2

[Figure: the product of the member and skill latent matrices fills in the unknown cells in the original matrix]
Members x Skills:
.6   .85  .45
.14  .21  .35
.3   .42  .12
.02  .03  .05
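
The numbers on these slides can be checked directly: multiplying the member latent matrix by the skill latent matrix reproduces the filled-in Members x Skills matrix. The sketch below uses the slide values verbatim; how the latent factors are learned is not described on the slides and is not shown here.

```python
# Latent factors copied from the slide: 4 members x 2 dims, 2 dims x 3 skills.
members = [
    [0.5, 1.0],
    [0.7, 0.0],
    [0.0, 0.6],
    [0.1, 0.0],
]
skills = [
    [0.2, 0.3, 0.5],
    [0.5, 0.7, 0.2],
]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    inner = len(b)
    cols = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


# Every cell, including the previously unknown ones, now has a score.
filled = [[round(x, 2) for x in row] for row in matmul(members, skills)]
```

Each score is the dot product of a member's latent vector with a skill's latent vector, so a member with no direct evidence for a skill still receives an estimate.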

Features
▪ Reputation feature
▪ Social Connection
▪ Homophily
– Geo
– Industry
▪ Textual Features

Experiments
            CTR@10    # Messages per Search
Flagship    +11%      +20%
Premium     +18%      +37%
▪ Query Tagging
▪ Target Segment: skill and no-name
▪ Baseline
– No skill reputation feature
– Hand-tuned

Personalized Federated Search

Personalized Federated Search - Motivation
▪ Why do we need this?

Personalized Federated Search - Overall

Personalized Federated Model [Arya, Ha-Thuc et al. CIKM’15]
▪ Relevance scores from base rankers
▪ Query intent: P(vertical| query)
▪ Searcher intent
– Mine searcher profiles and past behavior to infer intent
   ▪ Title “recruiter” -> recruiting intent
   ▪ Searches for jobs -> job seeking intent
– Machine-learned models predict member intents:
   ▪ Job seeking
   ▪ Recruiting
   ▪ Content consuming
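
A toy sketch of how the three signal families above (base ranker score, query intent, searcher intent) might be combined to order results from different verticals. Everything specific here is an assumption for illustration: the linear form, the `WEIGHTS` values, the field names, and the one-to-one `VERTICAL_TO_INTENT` mapping. The slides state only that these signals feed a machine-learned model.

```python
# Hypothetical weights; in practice these would be learned from data.
WEIGHTS = {"base_score": 1.0, "query_intent": 1.0, "searcher_intent": 1.0}

# Assumed pairing of each vertical with the searcher intent it serves best.
VERTICAL_TO_INTENT = {
    "people": "recruiting",
    "jobs": "job_seeking",
    "groups": "content_consuming",
}


def features(result, query_intent, searcher_intent):
    """Feature values for one result from one vertical."""
    vertical = result["vertical"]
    return {
        "base_score": result["base_score"],
        "query_intent": query_intent.get(vertical, 0.0),
        "searcher_intent": searcher_intent.get(VERTICAL_TO_INTENT[vertical], 0.0),
    }


def blend(results, query_intent, searcher_intent):
    """Order results from all verticals by a weighted sum of their features."""
    def score(result):
        feats = features(result, query_intent, searcher_intent)
        return sum(WEIGHTS[name] * value for name, value in feats.items())
    return sorted(results, key=score, reverse=True)


results = [
    {"vertical": "people", "base_score": 0.8},
    {"vertical": "jobs", "base_score": 0.7},
]
query_intent = {"people": 0.5, "jobs": 0.5}  # stands in for P(vertical | query)

job_seeker = {"job_seeking": 0.9, "recruiting": 0.1}
recruiter = {"job_seeking": 0.1, "recruiting": 0.9}

for_job_seeker = blend(results, query_intent, job_seeker)
for_recruiter = blend(results, query_intent, recruiter)
```

With an ambiguous query, the same two results come back in opposite orders for the two searchers, which is the personalization effect the model is after.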

Calibrate Signals across Verticals
▪ Verticals are associated with different intents
[Figure: result types (People Result, Job Result, Group Result) linked to searcher intents (Recruiting Intent, Job Seeking Intent, Content Consuming Intent)]

Take-Aways
▪ Text match is still important but not enough
▪ Advanced features based on semi-structured data
– People search: skill reputation scores
– Job search: expertise homophily
▪ Personalized Learning-to-Rank is crucial

References
▪ “Personalized Expertise Search at LinkedIn”, Ha-Thuc, Venkataraman, Rodriguez, Sinha, Sundaram and Guo, BigData, 2015
▪ “Personalized Federated Search at LinkedIn”, Arya, Ha-Thuc and Sinha, CIKM, 2015
▪ “Learning to Rank Personalized Search Results in Professional Networks”, Ha-Thuc and Sinha, SIGIR, 2016
▪ “How to Get Them a Dream Job?”, Li, Arya, Ha-Thuc and Sinha, KDD, 2016
