Model Driven Candidate Sorting
Based On Video Interview Cues
Benjamin Taylor
Chief Data Scientist
Outline
• Introduction
• Case study objective
• Big data landscape
• Problem setup
• Results/Conclusion
• Future work
@bentaylordata
Introduction
• Chemical Engineering (BS/MS/PhD Candidate)
• 5 years Intel/Micron
– Photolithography, process control, yield modeling
• AIQ Hedge fund
– 600 GPU chip cluster, algorithmic stock modeling, distributed metaheuristic
algorithms
• HireVue, Chief Data Scientist
– HR analytics, interview modeling
Case Study Objective
• Given 400 recorded video interviews for sales positions
and post-hire performance data, can improved sorting
efficiency be demonstrated out-of-sample?
Input Data Set: V=400
Target Data Set: n=400
Personal Email           Perf
rich.taylor@gmail.com    Exceeds
wasatch@aol.com          Meets
tradmonkey@mx.com        Below
hsommer@gmail.com        Meets
[Image: big data, Hadoop]
Big data landscape
• Big data platforms have motivated innovations around
unstructured data handling. These innovations have
involved new algorithms and better methods for wrangling
unstructured data.
Big data landscape
• Unstructured data
– Data that does not have a predefined data model or schema, e.g.
tool logs, resumes, cover letters, images, audio, video, Twitter,
LinkedIn
• Structured data
– Data that fits within a predefined data model. Most common
structured data formats involve a column/row architecture.
Most familiar examples include spreadsheet software such as
Excel.
Problem setup
• Unstructured data challenge
– How do we convert the video into a manageable, machine-ready
format? In other words, unstructured > structured data.
0.23,0.15,0.98,0.63,0.45,0.36…
1D Vector representation
Method?
Problem Setup
• What is done for text modeling?
[Diagram labels: UNSTRUCTURED, STRUCTURED, TOKENIZED]
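As one possible illustration of tokenizing text into a structured form, the sketch below turns free-text answers into fixed-length count vectors (a bag-of-words). The slides do not say which tokenizer or vectorizer was used, so the function names, the regular expression, and the sample answers are all assumptions made for this example.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def bag_of_words(documents):
    """Return (vocabulary, rows): one fixed-length count vector per document."""
    tokenized = [tokenize(doc) for doc in documents]
    # The sorted vocabulary defines the column order of the structured output.
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[word] for word in vocabulary])
    return vocabulary, rows

# Hypothetical interview answers, invented for illustration only.
answers = [
    "I closed the deal",
    "The deal closed late",
]
vocabulary, rows = bag_of_words(answers)
```

Each row now has one column per vocabulary word, which is the column/row layout described for structured data.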
Problem Setup
• Piecemeal the structuring: final outputs are scalars
[Diagram: Audio, Video, and Text inputs pass through Signal Processing,
Personality, and Expression Signal Processing blocks]
– us = unstructured data
– ts = time series data
– s = scalar data
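A minimal sketch of the last step in that chain, going from a time series (ts) to scalars (s), is shown below. The choice of summary statistics and the sample signal are assumptions for illustration; the slides do not list which scalar features were actually extracted.

```python
from statistics import mean, pstdev

def summarize_series(series):
    """Collapse one time series into a few scalar features."""
    lowest, highest = min(series), max(series)
    return {
        "mean": mean(series),
        "std": pstdev(series),  # population standard deviation
        "min": lowest,
        "max": highest,
        "range": highest - lowest,
    }

# Hypothetical per-frame signal (for example, audio energy per window).
signal = [1.0, 2.0, 2.0, 3.0]
features = summarize_series(signal)
```

Every interview, whatever its length, ends up with the same set of scalar columns, which is what lets the features be stacked into one matrix.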
Raw Audio Indicators
[Diagram: Feature Gen]
Personality Models
• Engagement
• Motivation
• Distress
• Aggression
[Diagram: Model]
Video Indicators
[Diagram: Feature Gen; Signal Processing; features F989, F990, F991; scalar]
Combining All Features
• Feature mapping: as the features are produced, they are stored in a
matrix X where each column represents a feature and each row
represents an interview.
[Figure: feature matrix X; example rows as extracted, with NA marking
missing values]
56.341 -200.45 0 1
2 4 60.71 12 52.15 -350.12 1 1
2 4 60.71 12 52.15 -350.12 1 0
2 3 16.16 21 25.51 -105.21 0 0
NA NA NA NA NA
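The feature mapping can be sketched as follows, assuming each interview arrives as a dictionary of whatever features could be computed for it. The feature names and values are hypothetical placeholders, and `None` stands in for NA.

```python
# Hypothetical feature names; the real feature set is not listed in the slides.
FEATURE_NAMES = ["audio_mean", "audio_std", "smile_rate"]

def build_feature_matrix(interviews, feature_names):
    """One row per interview, one column per feature; None marks NA."""
    return [[record.get(name) for name in feature_names] for record in interviews]

def missing_per_column(matrix):
    """Count NA entries in each feature column."""
    n_columns = len(matrix[0])
    return [sum(1 for row in matrix if row[j] is None) for j in range(n_columns)]

interviews = [
    {"audio_mean": 60.71, "audio_std": 12.0, "smile_rate": 0.25},
    {"audio_mean": 16.16, "audio_std": 21.0},  # video feature unavailable
    {"smile_rate": 0.5},                       # audio features unavailable
]
X = build_feature_matrix(interviews, FEATURE_NAMES)
na_counts = missing_per_column(X)
```

Counting NA values per column is one simple way to see which features are too sparse to model directly.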
How To Build A Model
Model
Best
Fitness?
A Lesson On K-folding
Folds = 9
Cut your data up
into fixed folds
A Lesson On K-folding
Folds = 9 Fold = 1 Fold = 2… Y_pred
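The K-folding idea can be sketched as below: every row is assigned to one fixed fold, and each fold is predicted by a model trained only on the other folds, so the assembled Y_pred is entirely out-of-sample. The fold assignment rule (row index modulo the fold count) and the placeholder mean-predicting model are assumptions for this sketch, not the model used in the study.

```python
def fold_indices(n_rows, n_folds):
    """Assign each row to one fixed fold: row i goes to fold i % n_folds."""
    return [i % n_folds for i in range(n_rows)]

def out_of_sample_predictions(X, y, n_folds, fit, predict):
    """Predict every row with a model that never saw that row in training."""
    folds = fold_indices(len(X), n_folds)
    y_pred = [None] * len(X)
    for held_out in range(n_folds):
        train = [i for i, f in enumerate(folds) if f != held_out]
        test = [i for i, f in enumerate(folds) if f == held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in test:
            y_pred[i] = predict(model, X[i])
    return y_pred

# Placeholder model (an assumption): always predicts the training-set mean.
def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)

def predict_mean(model, row):
    return model

# Synthetic data: 18 rows, alternating 0/1 target.
X = [[float(i)] for i in range(18)]
y = [i % 2 for i in range(18)]
y_pred = out_of_sample_predictions(X, y, 9, fit_mean, predict_mean)
```

Any model with a fit step and a predict step can be swapped in for the placeholder without changing the folding logic.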
Fitness Metric?
Top Performer Accuracy AUC
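For the AUC metric, a minimal sketch is given below. It uses the pairwise definition: the fraction of (top performer, other) pairs in which the top performer receives the higher score, with ties counted as half. Treating the label 1 as "top performer" and the sample scores are assumptions for illustration.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(positives) * len(negatives))

# Hypothetical out-of-sample scores; 1 marks a top performer.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
result = auc(labels, scores)
```

An AUC of 0.5 corresponds to random sorting and 1.0 to a perfect ranking of top performers above everyone else.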
Results:
Model           AUC score
Bernoulli NB    0.75
Other           0.79
• 67.50% reduction in interview evaluation
• >300% increase in concentration
Conclusion:
Using structured features from audio and video, we are able to show
predictive sorting value in our out-of-sample interviews.
Future Work:
Future work involves offloading the feature engineering tasks to a more automated
process, such as deep learning or more advanced ensemble modeling methods.
[Diagram labels: Feature Engineering, Auto Feature Engineering]
My Contact Info:
Twitter: @bentaylordata
Email: btaylor@hirevue.com
LinkedIn: bentaylordata

#SIOP15 Presentation on

Editor's Notes

  • #6: Hadoop story: Why is it called Hadoop? Google paper?
  • #7: Hadoop story: Why is it called Hadoop? Google paper?
  • #8: Hadoop story: Why is it called Hadoop? Google paper?
  • #9: <expand… categorical > tokenizing [assume dependent or independent] Discuss <> Gender: [Name modification >> ]
  • #10: <expand… categorical > tokenizing [assume dependent or independent] Discuss <> Gender: [Name modification >> ]