This is a presentation I gave at SIOP 2015 in Philadelphia. It shows how you can predict performance from a video interview using feature extraction from unstructured data and supervised learning. It also discusses k-fold cross-validation, which is less commonly known within the IO community but preferred within the data science community.
#SIOP15 Presentation On Performance Sorting Using Video Interviews
1. Model Driven Candidate Sorting Based On Video Interview Cues
Benjamin Taylor
Chief Data Scientist
2. Outline
• Introduction
• Case study objective
• Big data landscape
• Problem setup
• Results/Conclusion
• Future work
@bentaylordata
3. Introduction
• Chemical Engineering (BS/MS/PhD Candidate)
• 5 years Intel/Micron: photolithography, process control, yield modeling
• AIQ hedge fund: 600 GPU chip cluster, algorithmic stock modeling, distributed metaheuristic algorithms
• HireVue, Chief Data Scientist: HR analytics, interview modeling
4. Case Study Objective
• Given 400 recorded video interviews for sales positions and post-hire performance data, can improved sorting efficiency be demonstrated out-of-sample?
Input Data Set, V=400
Target Data Set, n=400
Personal Email          Perf
rich.taylor@gmail.com   Exceeds
wasatch@aol.com         Meets
tradmonkey@mx.com       Below
hsommer@gmail.com       Meets
5. Big data landscape
[Image labels: bigdata, hadoop]
• Big data platforms have motivated innovations around unstructured data handling. These innovations have involved new algorithms and better unstructured wrangling methods.
6. Big data landscape
• Unstructured data
Data that does not have a predefined data model or schema, e.g. tool logs, resumes, cover letters, images, audio, video, Twitter, LinkedIn
• Structured data
Data that fits within a predefined data model. Most common structured data formats involve a column/row architecture. Most familiar examples include spreadsheet software such as Excel.
7. Problem setup
• Unstructured data challenge
How do we convert the video into a manageable, machine-ready format? AKA unstructured > structured data.
1D vector representation: 0.23, 0.15, 0.98, 0.63, 0.45, 0.36
Method?
8. Problem Setup
• What is done for text modeling?
UNSTRUCTURED
Name: Sally Taylor
GPA: 3.95
Previous Job: Data Scientist
School: Yale
Hobbies: Sky diving
STRUCTURED
F  3.95  Data Scientist  Yale     Sky diving
M  2.93  HR Analyst      SLCC     Poetry
F  3.41  Data Munger     Harvard  Cycling
TOKENIZED
1  3.95  5  310  56
0  2.93  7  520  91
1  3.41  6  240  56
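The tokenizing step on this slide can be sketched as follows. This is a minimal illustration, assuming each categorical text field is replaced by an integer code assigned in order of first appearance. The helper name `tokenize_column` is hypothetical, and the codes it produces are arbitrary, so they will not match the numbers shown on the slide.

```python
def tokenize_column(values):
    """Map each distinct string to an integer code (order of first appearance)."""
    codes = {}
    encoded = []
    for value in values:
        if value not in codes:
            codes[value] = len(codes)
        encoded.append(codes[value])
    return encoded, codes


# Structured rows from the slide: gender, GPA, previous job, school, hobby
rows = [
    ("F", 3.95, "Data Scientist", "Yale", "Sky diving"),
    ("M", 2.93, "HR Analyst", "SLCC", "Poetry"),
    ("F", 3.41, "Data Munger", "Harvard", "Cycling"),
]

# Work column by column: encode text columns, pass numeric columns through
tokenized_columns = []
for column in zip(*rows):
    if isinstance(column[0], str):
        encoded, _ = tokenize_column(column)
        tokenized_columns.append(encoded)
    else:
        tokenized_columns.append(list(column))

# Transpose back so each row is one candidate again
tokenized = [list(row) for row in zip(*tokenized_columns)]
```

After this step every row is purely numeric, which is the form a supervised learner expects.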
9. Problem Setup
• Piecemeal the structuring: final outputs are scalars
[Diagram labels: Audio, Video, Text; Signal Processing, Personality, Expression, Signal Processing]
us = unstructured data
ts = time series data
s = scalar data
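One way to reduce a time series (ts) to scalars (s) is to compute summary statistics over it. The sketch below is an assumption for illustration only: the slides do not say which statistics were used, and the function name `summarize_series` and the sample values are hypothetical.

```python
import statistics


def summarize_series(ts):
    """Collapse a time series into a few scalar features."""
    return {
        "mean": statistics.fmean(ts),
        "std": statistics.pstdev(ts),
        "min": min(ts),
        "max": max(ts),
        "range": max(ts) - min(ts),
    }


# Hypothetical per-frame signal, e.g. audio energy sampled over an interview
signal = [0.2, 0.4, 0.4, 0.6, 0.9]
features = summarize_series(signal)
```

Each dictionary value is a single number, so it can be stored as one feature for that interview.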
12. Feature Gen
Video Indicators
[Diagram labels: Signal Processing; scalar features F989, F990, F991]
13. Combining All Features
Feature Mapping: As the features are produced they are stored in a matrix where each column represents a feature and each row represents an interview.
X (matrix values as shown on the slide):
56.341  -200.45  0  1
2  4  60.71  12  52.15  -350.12  1  1
2  4  60.71  12  52.15  -350.12  1  0
2  3  16.16  21  25.51  -105.21  0  0
NA  NA  NA  NA  NA
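The feature mapping described on this slide can be sketched as below: one row per interview, one column per feature, with a placeholder where a feature is missing. This is a minimal illustration, not the pipeline from the talk. The function name `build_feature_matrix`, the feature names, and the example values are hypothetical, and `None` stands in for the NA entries.

```python
def build_feature_matrix(interviews, feature_names):
    """Return one row per interview and one column per feature.

    Features that are missing for an interview are stored as None.
    """
    return [[feats.get(name) for name in feature_names] for feats in interviews]


# Hypothetical feature dictionaries, one per interview
interviews = [
    {"F1": 2, "F2": 4, "F3": 60.71},
    {"F1": 2, "F2": 3, "F3": 16.16},
    {"F1": 1},  # F2 and F3 could not be computed for this interview
]
feature_names = ["F1", "F2", "F3"]

X = build_feature_matrix(interviews, feature_names)
```

Keeping the column order fixed by `feature_names` means every interview lines up on the same features, whichever order they were produced in.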
14. How To Build A Model
Model
Best Fitness?
15. A Lesson On K-folding
Folds = 9
Cut your data up into fixed folds
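Cutting the data into fixed folds can be sketched as follows. This is a minimal illustration, assuming the 400 interviews are shuffled once with a fixed seed and then dealt into 9 folds. The function name `make_folds` and the seed are hypothetical choices, not details from the talk.

```python
import random


def make_folds(n_samples, n_folds=9, seed=0):
    """Shuffle sample indices once and deal them into fixed folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Every n_folds-th index goes to the same fold, so sizes differ by at most 1
    return [indices[i::n_folds] for i in range(n_folds)]


folds = make_folds(400, n_folds=9)
```

Because the seed is fixed, the same interview always lands in the same fold, which keeps comparisons between models fair.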
16. A Lesson On K-folding
Folds = 9
Fold = 1
Fold = 2
Y_pred
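The fold-by-fold loop on this slide can be sketched as below: each fold is held out in turn, a model is fit on the remaining folds, and its predictions for the held-out fold are written into `Y_pred`, so every prediction is out-of-sample. The stand-in model (predict the training-set mean of the target), the strided fold assignment, and the toy labels are assumptions for illustration, not the model or data from the talk.

```python
def cross_val_predict_mean(y, n_folds=9):
    """Fill y_pred so each entry comes from a model that never saw that sample."""
    n = len(y)
    y_pred = [None] * n
    for k in range(n_folds):
        # Held-out fold: every n_folds-th sample starting at k
        test_idx = list(range(k, n, n_folds))
        held_out = set(test_idx)
        train_y = [y[i] for i in range(n) if i not in held_out]
        # Stand-in "model": the mean of the training targets
        prediction = sum(train_y) / len(train_y)
        for i in test_idx:
            y_pred[i] = prediction
    return y_pred


# Toy binary targets (hypothetical), 18 samples so each of 9 folds holds 2
y = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1]
Y_pred = cross_val_predict_mean(y, n_folds=9)
```

Scoring `Y_pred` against `y` then gives a single out-of-sample estimate that uses every sample exactly once as test data.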
18. Results
Conclusion: Using structured features from audio and video we are able to show predictive sorting value in our out-of-sample interviews.
Model          AUC score
Bernoulli NB   0.75
Other          0.79
67.50% reduction in interview evaluation
>300% increase in concentration
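For readers less familiar with the AUC score reported here, it can be read as the probability that a randomly chosen positive example is scored above a randomly chosen negative one. The sketch below computes it directly from that definition. The function name `auc_score` and the toy labels and scores are hypothetical and are not the data behind the results above.

```python
def auc_score(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 1 = high performer, 0 = not; scores are model outputs
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
auc = auc_score(labels, scores)
```

A score of 0.5 corresponds to random sorting and 1.0 to a perfect ordering.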
19. Future Work
[Diagram labels: Feature Engineering, Auto Feature Engineering]
Future work involves offloading the feature engineering tasks to a more automated process such as deep learning or more advanced ensemble modeling methods.
My Contact Info:
Twitter: @bentaylordata
Email: btaylor@hirevue.com
LinkedIn: bentaylordata