This document outlines a case study that used video interviews and performance data from 400 sales candidates to build a model for sorting candidates. It describes converting unstructured video data into structured feature vectors for modeling. Audio and video signals were processed to extract features related to engagement, motivation, distress, and aggression. Models were fit and evaluated using k-fold validation. The best model achieved an AUC of 0.79, representing improvements in interview evaluation and hiring efficiency. Future work involves automating the feature engineering process.
#SIOP15 Presentation on
1. Model Driven Candidate Sorting
Based On Video Interview Cues
Benjamin Taylor
Chief Data Scientist
2. Outline
• Introduction
• Case study objective
• Big data landscape
• Problem setup
• Results/Conclusion
• Future work
@bentaylordata
3. Introduction
• Chemical Engineering (BS/MS/PhD Candidate)
• 5 years Intel/Micron
– Photolithography, process control, yield modeling
• AIQ Hedge fund
– 600 GPU chip cluster, algorithmic stock modeling, distributed metaheuristic
algorithms
• HireVue, Chief Data Scientist
– HR analytics, interview modeling
4. Case Study Objective
• Given 400 recorded video interviews for sales positions
and post-hire performance data, can improved sorting
efficiency be demonstrated out-of-sample?
Input data set: V=400 video interviews
Target data set, n=400:
Personal Email          Perf
rich.taylor@gmail.com   Exceeds
wasatch@aol.com         Meets
tradmonkey@mx.com       Below
hsommer@gmail.com       Meets
5. Big data landscape
• Big data platforms have motivated innovations in
unstructured data handling. These innovations include
new algorithms and better methods for wrangling
unstructured data.
6. Big data landscape
• Unstructured data
– Data that does not have a predefined data model or schema, e.g.
tool logs, resumes, cover letters, images, audio, video, Twitter,
LinkedIn
• Structured data
– Data that fits within a predefined data model. Most common
structured data formats involve a column/row architecture.
Most familiar examples include spreadsheet software such as
Excel.
7. Problem setup
• Unstructured data challenge
– How do we convert the video into a manageable, machine-ready
format, i.e. go from unstructured to structured data?
[Diagram: Video → Method? → 1D vector representation]
0.23,0.15,0.98,0.63,0.45,0.36…
9. Problem Setup
• Structure the data piecemeal: the final outputs are scalars
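[Diagram: unstructured (us) audio, video, and text inputs pass through stages labeled Signal Processing, Personality, and Expression, some via intermediate time series (ts), and end as scalar (s) features]
us = unstructured data
ts = time series data
s = scalar data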
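The ts → s step above can be illustrated with a minimal sketch. It assumes each intermediate time series is collapsed into a few summary statistics; the slides do not say which statistics were actually used, and `summarize_series` and `audio_energy` are hypothetical names chosen for illustration.

```python
from statistics import mean, pstdev


def summarize_series(series):
    """Collapse one time series (ts) into a dict of scalar (s) features."""
    if not series:
        raise ValueError("series must not be empty")
    return {
        "mean": mean(series),
        "std": pstdev(series),  # population standard deviation
        "min": min(series),
        "max": max(series),
    }


# Hypothetical per-frame signal, e.g. audio energy sampled over one interview.
audio_energy = [0.2, 0.4, 0.4, 0.8, 0.2]
features = summarize_series(audio_energy)
```

Each interview then contributes a fixed number of scalars regardless of how long the recording is, which is what makes the result usable as one row of a feature matrix.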
13. Combining All Features
Feature Mapping:
As the features are produced they are stored in a matrix X where each column represents a feature and each row represents an interview.
[Figure: feature matrix X; sample rows below, with NA marking entries that are not available]
56.341 -200.45 0 1
2 4 60.71 12 52.15 -350.12 1 1
2 4 60.71 12 52.15 -350.12 1 0
2 3 16.16 21 25.51 -105.21 0 0
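The feature mapping can be sketched as follows, assuming each interview's features arrive as a dictionary and that a feature which was never produced is stored as `None` (standing in for NA). The column names `f1` to `f3` and the helper `build_feature_matrix` are hypothetical; the slide does not label its columns.

```python
# Hypothetical feature names; the slide does not label its columns.
FEATURE_NAMES = ["f1", "f2", "f3"]


def build_feature_matrix(interviews, feature_names):
    """Return one row per interview and one column per feature; missing -> None."""
    return [[feats.get(name) for name in feature_names] for feats in interviews]


interviews = [
    {"f1": 2, "f2": 60.71, "f3": 1},
    {"f1": 2, "f3": 0},  # f2 was never produced for this interview
]
X = build_feature_matrix(interviews, FEATURE_NAMES)

# Count the NA entries per column to see which features are incomplete.
na_counts = [sum(row[j] is None for row in X) for j in range(len(FEATURE_NAMES))]
```

Fixing the column order up front keeps every row aligned even when features are produced at different times or fail for some interviews.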
14. How To Build A Model
[Diagram labels: Model, Best Fitness?]
15. A Lesson On K-folding
• Cut your data up into fixed folds (Folds = 9)
16. A Lesson On K-folding
• With Folds = 9, hold out fold 1, then fold 2, and so on; the predictions for each held-out fold are collected into Y_pred
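The k-folding procedure can be sketched in plain Python. This is an illustration under stated assumptions, not the model used in the case study: `fit_mean_model` is a hypothetical one-feature stand-in classifier, and the toy data is invented. What matters is that every entry of `y_pred` comes from a model that never saw that sample.

```python
import random


def make_folds(n_samples, n_folds, seed=0):
    """Shuffle sample indices once and deal them into fixed folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def fit_mean_model(xs, ys):
    """Stand-in model (an assumption): score by closeness to the class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    pos_mean = sum(pos) / len(pos)
    neg_mean = sum(neg) / len(neg)
    # A higher score means the sample looks more like the positive class.
    return lambda x: abs(x - neg_mean) - abs(x - pos_mean)


def kfold_predict(xs, ys, n_folds=9, seed=0):
    """Return out-of-sample scores: each fold is scored by a model fit on the rest."""
    y_pred = [None] * len(xs)
    for held_out in make_folds(len(xs), n_folds, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit_mean_model([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            y_pred[i] = model(xs[i])
    return y_pred


# Toy data: 18 samples, one feature, binary target.
xs = [0.1 * i for i in range(18)]
ys = [0] * 9 + [1] * 9
y_pred = kfold_predict(xs, ys, n_folds=9)
```

Because the folds are fixed once, different candidate models can be compared on exactly the same held-out splits.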
18. Results
Model          AUC score
Bernoulli NB   0.75
Other          0.79
Conclusion:
Using structured features from audio and video, we are able to show predictive sorting value in our out-of-sample interviews.
67.50% reduction in interview evaluation
>300% increase in concentration
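For readers unfamiliar with the AUC score reported above, here is a minimal sketch of how it can be computed from out-of-sample scores. It assumes a binary target; the slides do not state how the Exceeds/Meets/Below ratings were mapped to one, and the example labels and scores are invented.

```python
def auc_score(y_true, y_score):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented example: 1 = strong performer, 0 = otherwise.
y_true = [1, 1, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
auc = auc_score(y_true, y_score)
```

An AUC of 0.5 corresponds to random ordering and 1.0 to a perfect ordering, so a score of 0.79 means a randomly chosen positive candidate is ranked above a randomly chosen negative one about 79% of the time.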
19. Future Work
[Diagram: Feature Engineering → Auto Feature Engineering]
Future work involves offloading the feature engineering tasks to a more automated
process, such as deep learning or more advanced ensemble modeling methods.
My Contact Info:
Twitter: @bentaylordata
Email: btaylor@hirevue.com
LinkedIn: bentaylordata
Editor's Notes
#6: Hadoop story:
Why is it called Hadoop?
Google paper?