Model Driven Candidate Sorting Based On Video Interview Cues

Benjamin Taylor
Chief Data Scientist

Outline
• Introduction
• Case study objective
• Big data landscape
• Problem setup
• Results/Conclusion
• Future work
@bentaylordata

Introduction
• Chemical Engineering (BS/MS/PhD Candidate)
• 5 years at Intel/Micron
  - Photolithography, process control, yield modeling
• AIQ hedge fund
  - 600 GPU chip cluster, algorithmic stock modeling, distributed metaheuristic algorithms
• HireVue, Chief Data Scientist
  - HR analytics, interview modeling

Case Study Objective
• Given 400 recorded video interviews for sales positions and post-hire performance data, can improved sorting efficiency be demonstrated out-of-sample?
 	
 
Input Data Set: V = 400 video interviews
Target Data Set (n = 400), example rows:

Personal Email           Perf
rich.taylor@gmail.com    Exceeds
wasatch@aol.com          Meets
tradmonkey@mx.com        Below
hsommer@gmail.com        Meets
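A minimal sketch of how the two data sets line up, assuming the personal email is the join key (as the slide's columns suggest) and assuming that "top performer" means a rating of Exceeds. Both are assumptions, and the extra unrated candidate in the example is hypothetical.

```python
# Post-hire ratings from the target data set (the rows shown on the slide).
ratings = {
    "rich.taylor@gmail.com": "Exceeds",
    "wasatch@aol.com": "Meets",
    "tradmonkey@mx.com": "Below",
    "hsommer@gmail.com": "Meets",
}

# ASSUMPTION: a "top performer" is anyone rated "Exceeds".
TOP_RATINGS = {"Exceeds"}


def build_labels(interview_ids, ratings):
    """Pair each interviewed candidate with a 0/1 top-performer label.

    Candidates without a post-hire rating are skipped, because they
    cannot be used to check out-of-sample sorting.
    """
    labeled = []
    for email in interview_ids:
        rating = ratings.get(email)
        if rating is None:
            continue
        labeled.append((email, 1 if rating in TOP_RATINGS else 0))
    return labeled


# Hypothetical interview keys; the last candidate has no rating yet.
interview_ids = [
    "rich.taylor@gmail.com",
    "wasatch@aol.com",
    "tradmonkey@mx.com",
    "hsommer@gmail.com",
    "new.hire@example.com",
]
print(build_labels(interview_ids, ratings))
```

Collapsing the three rating levels into a binary target is one design choice among several; an ordinal target would keep the Meets/Below distinction.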
 

Big data landscape
• Big data platforms have motivated innovations in unstructured data handling. These innovations have involved new algorithms and better methods for wrangling unstructured data.

Big data landscape
• Unstructured data
  - Data that does not have a predefined data model or schema, e.g. tool logs, resumes, cover letters, images, audio, video, Twitter, LinkedIn
• Structured data
  - Data that fits within a predefined data model. The most common structured data formats use a column/row architecture; the most familiar examples are spreadsheet programs such as Excel.

Problem setup
• Unstructured data challenge
  - How do we convert the video into a manageable, machine-ready format? In other words, how do we turn unstructured data into structured data?

[Figure: video → Method? → 1D vector representation, e.g. 0.23, 0.15, 0.98, 0.63, 0.45, 0.36]

Problem Setup
• What is done for text modeling?

UNSTRUCTURED
Name: Sally Taylor
GPA: 3.95
Previous Job: Data Scientist
School: Yale
Hobbies: Sky diving

STRUCTURED
Gender  GPA   Previous Job    School   Hobbies
F       3.95  Data Scientist  Yale     Sky diving
M       2.93  HR Analyst      SLCC     Poetry
F       3.41  Data Munger     Harvard  Cycling

TOKENIZED
Gender  GPA   Previous Job  School  Hobbies
1       3.95  5             310     56
0       2.93  7             520     91
1       3.41  6             240     56
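The step from STRUCTURED to TOKENIZED can be sketched as label encoding: each categorical column gets its own vocabulary and every string is replaced by an integer code. This is an illustrative assumption about the technique, not the deck's actual tokenizer; the function name `tokenize` is hypothetical, and the codes it assigns (in order of first appearance) differ from the slide's codes such as F = 1, M = 0.

```python
def tokenize(rows, categorical_columns):
    """Replace categorical strings with integer codes, column by column.

    Numeric columns pass through unchanged. Codes are assigned in order
    of first appearance, so they carry no ordinal meaning.
    """
    vocab = {col: {} for col in categorical_columns}
    tokenized = []
    for row in rows:
        out = []
        for col, value in enumerate(row):
            if col in vocab:
                codes = vocab[col]
                if value not in codes:
                    codes[value] = len(codes)  # next unused code
                out.append(codes[value])
            else:
                out.append(value)
        tokenized.append(out)
    return tokenized, vocab


rows = [
    ["F", 3.95, "Data Scientist", "Yale", "Sky diving"],
    ["M", 2.93, "HR Analyst", "SLCC", "Poetry"],
    ["F", 3.41, "Data Munger", "Harvard", "Cycling"],
]
tokens, vocab = tokenize(rows, categorical_columns={0, 2, 3, 4})
for t in tokens:
    print(t)
```

Because the codes are arbitrary, models that treat them as magnitudes (e.g. linear models) can be misled; one-hot encoding is the usual alternative when that matters.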
 

Problem Setup
• Structure the data piecemeal: the final outputs are scalars.

[Figure: Audio, Video, and Text inputs (us) pass through Signal Processing, Personality, and Expression stages; intermediate outputs are time series (ts) and final outputs are scalars (s).]

us = unstructured data
ts = time-series data
s = scalar data
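The ts → s step is what makes interviews of different lengths comparable. A minimal sketch, assuming simple summary statistics as the reduction (the deck does not say which statistics were used) and hypothetical channel names `pitch` and `smile`:

```python
import statistics


def summarize(series):
    """Reduce one time series (any length) to four scalars."""
    return [
        statistics.fmean(series),   # average level
        statistics.pstdev(series),  # variability over the interview
        min(series),
        max(series),
    ]


def to_vector(channels):
    """Concatenate per-channel summaries in a fixed (sorted) channel order,
    so every interview maps to a vector of the same length."""
    vector = []
    for name in sorted(channels):
        vector.extend(summarize(channels[name]))
    return vector


# Hypothetical channels for a short interview and a longer one.
short_iv = {"pitch": [180.0, 190.0, 185.0], "smile": [0.1, 0.4]}
long_iv = {"pitch": [200.0, 210.0, 190.0, 205.0, 195.0], "smile": [0.2, 0.3, 0.5, 0.1]}
print(len(to_vector(short_iv)), len(to_vector(long_iv)))  # same length for both
```

Sorting the channel names fixes the column order, so position i in the vector always means the same feature across interviews.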
 

Feature Generation
Raw Audio Indicators
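The deck does not list its raw audio indicators. As an illustration only, here is a sketch of two common ones: frame-level RMS energy (a time series) reduced to scalars such as mean energy and pause fraction, plus the zero-crossing rate. The 400-sample frame, the 0.01 silence threshold, and the indicator names are hypothetical choices.

```python
import math


def frame_rms(samples, frame_len):
    """Root-mean-square energy of consecutive, non-overlapping frames."""
    n_frames = len(samples) // frame_len
    rms = []
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return rms


def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose sign changes (0 counts as positive)."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(samples) - 1)


def audio_indicators(samples, frame_len=400, silence_threshold=0.01):
    """Reduce a raw waveform to a few scalar indicators (hypothetical set)."""
    rms = frame_rms(samples, frame_len)  # the time-series stage
    return {
        "mean_energy": sum(rms) / len(rms),
        "pause_fraction": sum(1 for r in rms if r < silence_threshold) / len(rms),
        "zero_crossing_rate": zero_crossing_rate(samples),
    }


# Synthetic 1-second clip at 8 kHz: half a second of a 200 Hz tone, then silence.
rate = 8000
samples = [0.5 * math.sin(2 * math.pi * 200 * n / rate) for n in range(rate // 2)]
samples += [0.0] * (rate // 2)
print(audio_indicators(samples))
```

At 8 kHz a 400-sample frame spans 50 ms; real speech pipelines usually use overlapping frames, which this sketch omits for brevity.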
 
Personality Models
[Figure: Model → personality indicators]
• Engagement
• Motivation
• Distress
• Aggression

Feature Generation
Video Indicators
 
[Figure: Signal Processing of the video yields scalar features, e.g. F989, F990, F991]
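Besides averages, a per-frame video signal can be reduced to event-style scalars. A minimal sketch, assuming a hypothetical per-frame expression intensity and hypothetical feature names; which statistics became columns such as F989 to F991 is not stated in the deck.

```python
def threshold_events(signal, threshold, fps):
    """Summarize a per-frame video signal as event-style scalars."""
    above = [v > threshold for v in signal]
    # An onset is a frame above threshold whose previous frame was not.
    onsets = sum(1 for prev, cur in zip([False] + above, above) if cur and not prev)
    longest = run = 0
    for flag in above:
        run = run + 1 if flag else 0  # length of the current above-threshold run
        longest = max(longest, run)
    return {
        "fraction_above": sum(above) / len(above),
        "onset_count": onsets,
        "longest_run_seconds": longest / fps,
    }


# Hypothetical per-frame expression intensity sampled at 4 frames per second.
smile = [0.1, 0.7, 0.8, 0.2, 0.9, 0.95, 0.9, 0.1]
print(threshold_events(smile, threshold=0.5, fps=4))
```

Counting onsets separately from the fraction of time above threshold distinguishes one long expression from several brief ones with the same total duration.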
 
Combining All Features

[Figure: feature matrix X, one row per interview and one column per feature. Example entries include 56.341, -200.45, 60.71, 52.15, -350.12, and 0/1 flags, with NA marking missing values.]

Feature Mapping: As the features are produced, they are stored in a matrix where each column represents a feature and each row represents an interview.
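The feature mapping can be sketched as follows, assuming each interview's extractors return a dict of named features and that missing features become NaN (playing the role of NA). The feature names and the failed-extraction cases are hypothetical.

```python
import math


def build_matrix(feature_dicts):
    """Stack per-interview feature dicts into rows with a shared column order.

    Columns are the sorted union of all feature names; any feature an
    interview lacks is filled with NaN (the NA entries on the slide).
    """
    columns = sorted(set().union(*feature_dicts))
    matrix = [[d.get(col, math.nan) for col in columns] for d in feature_dicts]
    return columns, matrix


interviews = [
    {"audio_mean_energy": 0.18, "video_fraction_above": 0.62},
    {"audio_mean_energy": 0.25},  # hypothetical: video processing failed
    {},                           # hypothetical: nothing could be extracted
]
columns, X = build_matrix(interviews)
print(columns)
for row in X:
    print(row)
```

Keeping NaN rather than silently dropping rows lets the modeling step decide whether to impute, drop, or use a model that tolerates missing values.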

How To Build A Model
[Figure: Model → Best fitness?]

A Lesson On K-folding
 
Folds = 9
Cut your data up into fixed folds.

A Lesson On K-folding
 
[Figure: Folds = 9; Fold = 1, Fold = 2, and so on are each held out in turn, and their predictions fill in Y_pred]
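The k-folding procedure above can be sketched in plain Python: shuffle once with a fixed seed, deal rows into 9 folds, and fill Y_pred only from models that never saw the held-out rows. The toy `fit_mean_diff`/`predict_mean_diff` scorer and the data are hypothetical stand-ins for whatever model was actually used.

```python
import random


def make_folds(n, k=9, seed=0):
    """Shuffle row indices once (fixed seed) and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def out_of_fold_predictions(X, y, fit, predict, k=9, seed=0):
    """Hold out each fold in turn, train on the rest, and predict the
    held-out rows, so every entry of y_pred comes from a model that never
    saw that interview."""
    y_pred = [None] * len(y)
    for held_out in make_folds(len(y), k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in held_out:
            y_pred[i] = predict(model, X[i])
    return y_pred


# Hypothetical toy model: weight each feature by (mean of top performers
# minus mean of everyone else), then score with a dot product.
def fit_mean_diff(X_train, y_train):
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    pos = mean([x for x, t in zip(X_train, y_train) if t == 1])
    neg = mean([x for x, t in zip(X_train, y_train) if t == 0])
    return [p - q for p, q in zip(pos, neg)]


def predict_mean_diff(weights, x):
    return sum(w * v for w, v in zip(weights, x))


# Hypothetical data: 18 interviews, one feature, the last 6 are top performers.
X = [[float(i)] for i in range(18)]
y = [1 if i >= 12 else 0 for i in range(18)]
y_pred = out_of_fold_predictions(X, y, fit_mean_diff, predict_mean_diff)
print([round(p, 1) for p in y_pred])
```

Fixing the seed keeps the folds identical across candidate models, so differences in fitness reflect the models rather than the split.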

Fitness Metric?
• Top performer accuracy
• AUC
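Both candidate fitness metrics can be computed from the out-of-fold scores. Below, AUC uses its pairwise definition; `top_k_precision` is one plausible reading of "top performer accuracy" (an assumption, since the deck does not define it). Labels and scores are hypothetical.

```python
def auc(y_true, scores):
    """ROC AUC as the probability that a randomly chosen top performer is
    scored above a randomly chosen non-top performer (ties count half)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def top_k_precision(y_true, scores, k):
    """Share of true top performers among the k highest-scored candidates.
    ASSUMPTION: one possible reading of "top performer accuracy"."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(y_true[i] for i in ranked[:k]) / k


# Hypothetical labels (1 = top performer) and out-of-fold scores.
y_true = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
print(auc(y_true, scores))                 # 5 of 9 pairs ordered correctly
print(top_k_precision(y_true, scores, 2))  # 1 of the top 2 is a top performer
```

AUC judges the whole ranking without choosing a cutoff, while a top-k measure depends on how many candidates will actually be interviewed; a sorting tool is usually judged on both.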
 

Results:
Model          AUC score
Bernoulli NB   0.75
Other          0.79

• 67.50% reduction in interview evaluation
• >300% increase in concentration

Conclusion:
Using structured features from audio and video, we are able to show predictive sorting value in our out-of-sample interviews.
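A from-scratch sketch of the Bernoulli naive Bayes idea behind the first model: binarize each feature, estimate per-class probabilities with Laplace smoothing, and sort candidates by log-odds of being a top performer. The thresholds, data, and function names are hypothetical, and this is not claimed to be the configuration that produced the 0.75 AUC (scikit-learn's `BernoulliNB` provides a library version of the same model).

```python
import math


def binarize(X, thresholds):
    """Turn each continuous feature into a 0/1 flag (value above threshold)."""
    return [[1 if v > t else 0 for v, t in zip(row, thresholds)] for row in X]


def fit_bernoulli_nb(X, y, alpha=1.0):
    """Estimate class log-priors and P(feature = 1 | class) per feature,
    using add-alpha (Laplace) smoothing so no probability is exactly 0 or 1."""
    n_features = len(X[0])
    model = {}
    for c in (0, 1):
        rows = [x for x, t in zip(X, y) if t == c]
        probs = [
            (sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(n_features)
        ]
        model[c] = (math.log(len(rows) / len(X)), probs)
    return model


def log_odds_top(model, x):
    """Sorting score: log P(top, x) - log P(not top, x). Features are treated
    as independent given the class, and a 0 feature contributes log(1 - p)."""
    def log_joint(c):
        log_prior, probs = model[c]
        return log_prior + sum(
            math.log(p if v else 1.0 - p) for v, p in zip(x, probs)
        )
    return log_joint(1) - log_joint(0)


# Hypothetical raw features and labels (1 = top performer).
X_raw = [[0.9, 12.0], [0.8, 15.0], [0.2, 3.0], [0.3, 4.0], [0.7, 2.0], [0.1, 14.0]]
y = [1, 1, 0, 0, 0, 0]
X_bin = binarize(X_raw, thresholds=[0.5, 10.0])
model = fit_bernoulli_nb(X_bin, y)
for x in X_bin:
    print(x, round(log_odds_top(model, x), 3))
```

Unlike a Gaussian model, the Bernoulli variant also scores the absence of a flag, which suits indicator-style features such as the 0/1 columns in the feature matrix.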
 

[Figure: Feature Engineering → Auto Feature Engineering]
Future Work:
Future work involves offloading the feature engineering tasks to a more automated process, such as deep learning or more advanced ensemble modeling methods.
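As a point of reference for the ensemble direction, here is a sketch of the simplest ensemble for sorting: rank-average the scores of several models so that scores on different scales (e.g. naive Bayes log-odds vs. another model's probabilities) can be combined. This is an illustration only; the deck does not say which ensemble methods are planned, and the scores below are hypothetical.

```python
def ranks(scores):
    """Map scores to ranks 1..n (higher score = higher rank); ties share their mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    r = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = mean_rank
        i = j + 1
    return r


def rank_average(*score_lists):
    """Average each candidate's rank across models."""
    n = len(score_lists[0])
    all_ranks = [ranks(s) for s in score_lists]
    return [sum(r[i] for r in all_ranks) / len(all_ranks) for i in range(n)]


nb_scores = [0.93, -2.65, 0.10]   # hypothetical naive Bayes log-odds
other_scores = [0.2, 0.1, 0.9]    # hypothetical second model's probabilities
print(rank_average(nb_scores, other_scores))
```

Rank averaging discards score magnitudes, which is harmless here because AUC and top-k selection depend only on the ordering.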
 
My Contact Info:
  Twitter: @bentaylordata
  Email: btaylor@hirevue.com
  LinkedIn: bentaylordata

#SIOP15 Presentation On Performance Sorting Using Video Interviews
