HR Analytics: Why are our best and most experienced employees leaving prematurely?

Erik Bebernes
Introduction

This project uses a dataset I found on Kaggle, in which a company has been having difficulty
retaining its best and most experienced employees. The data frame consists of 14,999
observations of 10 variables, which are:

names(hr)
 [1] "satisfaction_level"    "last_evaluation"       "number_project"
 [4] "average_montly_hours"  "time_spend_company"    "Work_accident"
 [7] "left"                  "promotion_last_5years" "sales"
[10] "salary"

Satisfaction level: employee's overall job satisfaction level, based on a survey
Last evaluation: employee's performance score given by their manager
Number of projects: how many projects an employee has been involved in
Average monthly hours: mean hours worked by the employee per month
Time spend company: years the employee has worked for the company
Work accident: binary variable; 1 indicates the employee has had an accident in the workplace
Left: 1 if the employee has left, 0 if the employee is still at the company
Promotion last 5 years: binary variable signaling whether the employee has been promoted
Sales: categorical variable for job type (department)
Salary: categorical variable (low, medium, high) for how much the employee is paid annually

My approach to this project can be summarized in the following steps:
1.) Clean and structure the data set, including imputing missing values if necessary
2.) Create subsets of the best employees that left and the best employees that stayed
3.) Create discrete factor variables and perform association rules analysis
4.) Classify employees through decision tree analysis
5.) Find any significant correlations, and differences in correlations between said subsets
6.) Exploratory visualization analysis in an attempt to explain any discrepancies in correlations
7.) Run a random forest algorithm to confirm significant relationships between the variables, as well as a logistic regression
8.) Provide conclusions and recommendations for management


# stringsAsFactors = TRUE reproduces the Factor columns shown by str() on R >= 4.0
HR_comma_sep <- read.csv("~/Downloads/HR_comma_sep.csv", header = TRUE, stringsAsFactors = TRUE)
View(HR_comma_sep)
hr <- HR_comma_sep

Cleaning and structuring the dataset

At first glance the dataset seems clean, but to make sure I'm going to use the Amelia package
to identify any missingness.

library(Amelia)
missmap(hr)

[Figure: missingness map of hr]

This shows that there is no missing data.
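
The same check can be made without a plot. The sketch below is only an illustration in base R: `toy` is a made-up stand-in for the hr data frame, and its values are hypothetical rather than taken from the dataset.

```r
# Hypothetical stand-in for the hr data frame; values are made up for illustration.
toy <- data.frame(
  satisfaction_level = c(0.38, 0.80, NA, 0.72),
  number_project     = c(2L, 5L, 7L, NA),
  salary             = c("low", "medium", "medium", "low")
)

# Count missing values per column.
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# TRUE only when no cell in the data frame is missing.
no_missing <- sum(na_per_column) == 0
print(no_missing)
```

Running `colSums(is.na(hr))` on the real data should return all zeros if the missingness map is right.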
> str(hr)
'data.frame':   14999 obs. of  10 variables:
 $ satisfaction_level   : num  0.38 0.8 0.11 0.72 0.37 0.41 0.1 0.92 0.89 0.42 ...
 $ last_evaluation      : num  0.53 0.86 0.88 0.87 0.52 0.5 0.77 0.85 1 0.53 ...
 $ number_project       : int  2 5 7 5 2 2 6 5 5 2 ...
 $ average_montly_hours : int  157 262 272 223 159 153 247 259 224 142 ...
 $ time_spend_company   : int  3 6 4 5 3 3 4 5 5 3 ...
 $ Work_accident        : int  0 0 0 0 0 0 0 0 0 0 ...
 $ left                 : int  1 1 1 1 1 1 1 1 1 1 ...
 $ promotion_last_5years: int  0 0 0 0 0 0 0 0 0 0 ...
 $ sales                : Factor w/ 10 levels "accounting","hr",..: 8 8 8 8 8 8 8 8 8 8 ...
 $ salary               : Factor w/ 3 levels "high","low","medium": 2 3 3 2 2 2 2 2 2 2 ...

Subsets

hrbestleft <- hr[which(hr$last_evaluation > .72 & hr$left == 1), ]
# employees with high evaluations who left the company

hrbeststay <- hr[which(hr$last_evaluation > .72 & hr$left == 0), ]
# employees with high evaluations who stayed at the company

Creating Discrete Variables and Association Rules Analysis

quantile(hr$average_montly_hours, .33)
quantile(hr$average_montly_hours, .67)
# NOTE: the cutoffs below are hard-coded; check them against the quantile() output
hr$Hours_Discrete[hr$average_montly_hours <= 69] <- 'low'
hr$Hours_Discrete[hr$average_montly_hours > 69 & hr$average_montly_hours < 134] <- 'average'
hr$Hours_Discrete[hr$average_montly_hours >= 134] <- 'high'

quantile(hr$satisfaction_level, .33)
quantile(hr$satisfaction_level, .67)
quantile(hr$satisfaction_level, .8)

# NOTE: satisfaction_level is on a 0-1 scale (see str(hr)), so with cutoffs of 43 and 68
# every row lands in 'low'; .43 and .68 may have been intended
hr$Sat_Discrete[hr$satisfaction_level <= 43] <- 'low'
hr$Sat_Discrete[hr$satisfaction_level > 43 & hr$satisfaction_level < 68] <- 'average'
hr$Sat_Discrete[hr$satisfaction_level >= 68] <- 'high'
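
One way to keep the bins tied to the data is to pass the quantiles straight into cut() instead of typing the cutoffs by hand. This is a minimal sketch in base R, not the method used in this analysis; `scores` is a hypothetical vector standing in for hr$satisfaction_level.

```r
# Hypothetical scores on a 0-1 scale, standing in for hr$satisfaction_level.
set.seed(1)
scores <- round(runif(200), 2)

# Tercile cutoffs taken from the data itself instead of typed in by hand.
cuts <- quantile(scores, probs = c(0.33, 0.67))

# cut() assigns each value to (-Inf, q33], (q33, q67], or (q67, Inf).
score_bin <- cut(scores,
                 breaks = c(-Inf, cuts, Inf),
                 labels = c("low", "average", "high"))

print(cuts)
print(table(score_bin))
```

Because the breaks come from quantile(), each bin is guaranteed to be populated as long as the two cutoffs differ, which avoids a factor with a single level.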

library(arules)
hr$Work_accident <- as.factor(hr$Work_accident)
hr$left <- as.factor(hr$left)
hr$promotion_last_5years <- as.factor(hr$promotion_last_5years)
hr$Hours_Discrete <- as.factor(hr$Hours_Discrete)
hr$Sat_Discrete <- as.factor(hr$Sat_Discrete)
names(hr)
hrassoc <- hr[, c(6, 7, 8, 9, 10, 11, 12)]
rules <- apriori(hrassoc, parameter = list(support = .2, confidence = .7))

# since the majority of employees haven't left, it will be a good idea to reduce support and
# increase confidence

rules <- apriori(hrassoc, parameter = list(support = .05, confidence = .95))

# still not getting any interesting rules, so I'll make a new dataset with only left = 1

hrleft <- hr[which(hrassoc$left == 1), ]
hrleft <- hrleft[, c(6:12)]
rules <- apriori(hrleft, parameter = list(support = .3, confidence = 1))
inspect(rules)

     lhs                                         rhs                support   confidence lift
[1]  {}                                       => {left=1}           1.0000000 1          1
[2]  {}                                       => {Sat_Discrete=low} 1.0000000 1          1
[3]  {salary=medium}                          => {left=1}           0.3688043 1          1
[4]  {salary=medium}                          => {Sat_Discrete=low} 0.3688043 1          1
[5]  {salary=low}                             => {left=1}           0.6082330 1          1
[6]  {salary=low}                             => {Sat_Discrete=low} 0.6082330 1          1
[7]  {Hours_Discrete=high}                    => {left=1}           0.9106693 1          1
[8]  {Hours_Discrete=high}                    => {Sat_Discrete=low} 0.9106693 1          1
[9]  {Work_accident=0}                        => {left=1}           0.9526743 1          1
[10] {Work_accident=0}                        => {Sat_Discrete=low} 0.9526743 1          1
[11] {promotion_last_5years=0}                => {left=1}           0.9946794 1          1
[12] {promotion_last_5years=0}                => {Sat_Discrete=low} 0.9946794 1          1
[13] {left=1}                                 => {Sat_Discrete=low} 1.0000000 1          1
[14] {Sat_Discrete=low}                       => {left=1}           1.0000000 1          1
[15] {salary=medium, Hours_Discrete=high}     => {left=1}           0.3385606 1          1
[16] {salary=medium, Hours_Discrete=high}     => {Sat_Discrete=low} 0.3385606 1          1
[17] {Work_accident=0, salary=medium}         => {left=1}           0.3480818 1          1
[18] {Work_accident=0, salary=medium}         => {Sat_Discrete=low} 0.3480818 1          1
[19] {promotion_last_5years=0, salary=medium} => {left=1}           0.3674041 1          1
[20] {promotion_last_5years=0, salary=medium} => {Sat_Discrete=low} 0.3674041 1          1
[21] {left=1, salary=medium}                  => {Sat_Discrete=low} 0.3688043 1          1
[22] {salary=medium, Sat_Discrete=low}        => {left=1}           0.3688043 1          1
[23] {salary=low, Hours_Discrete=high}        => {left=1}           0.5527863 1          1
[24] {salary=low, Hours_Discrete=high}        => {Sat_Discrete=low} 0.5527863 1          1
[25] {Work_accident=0, salary=low}            => {left=1}           0.5816298 1          1
[26] {Work_accident=0, salary=low}            => {Sat_Discrete=low} 0.5816298 1          1
[27] {promotion_last_5years=0, salary=low}    => {left=1}           0.6043125 1          1
[28] {promotion_last_5years=0, salary=low}    => {Sat_Discrete=low} 0.6043125 1          1
[29] {left=1, salary=low}                     => {Sat_Discrete=low} 0.6082330 1          1
[30] {salary=low, Sat_Discrete=low}           => {left=1}           0.6082330 1          1

Most interesting rules:
1.) of the people who left, 99% never received a promotion
2.) 95% never had an accident
3.) 60% were low salary
4.) 100% had low job satisfaction

These rules point to a few relationships between the variables that may explain why some
employees are leaving. Of the employees who left, 99% were never promoted, 95% never had an
accident, 60% were low salary and 100% fell in the low job satisfaction bin. That last figure
should be read with caution: satisfaction_level is on a 0-1 scale, so the cutoffs of 43 and 68
used above put every employee in the 'low' bin, and every rule has a lift of 1 because the
subset contains only employees who left. Satisfaction may still be significant in determining
leaving vs. staying, so next I'm going to look at correlations between satisfaction and the
numeric variables.
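
To make the three rule metrics concrete, here is a small base R sketch that computes support, confidence and lift by hand for one rule. The `toy` data frame and its values are hypothetical and are not drawn from the HR dataset; the point is only to show why every rule ending in {left=1} has a lift of 1 once the data is restricted to leavers.

```r
# Hypothetical toy data; 1 = yes, 0 = no. Not taken from the HR dataset.
toy <- data.frame(
  promoted = c(0, 0, 0, 0, 1, 0, 0, 1, 0, 0),
  left     = c(1, 1, 0, 0, 0, 1, 0, 0, 0, 1)
)

# Rule under study: {promoted = 0} => {left = 1}
lhs <- toy$promoted == 0
rhs <- toy$left == 1

support    <- mean(lhs & rhs)            # share of all rows matching both sides
confidence <- sum(lhs & rhs) / sum(lhs)  # share of lhs rows that also match rhs
lift       <- confidence / mean(rhs)     # confidence relative to the base rate of rhs

# Restricting to leavers forces the base rate of {left = 1} to 1,
# so confidence and lift are both 1 for any lhs.
leavers        <- toy[toy$left == 1, ]
conf_in_subset <- mean(leavers$left[leavers$promoted == 0] == 1)
lift_in_subset <- conf_in_subset / mean(leavers$left == 1)

print(c(support = support, confidence = confidence, lift = lift))
print(c(confidence = conf_in_subset, lift = lift_in_subset))
```

In a subset like hrleft, then, the informative column is support: it gives the share of leavers who match the left-hand side.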

Correlation Analysis

Using all employees in the dataset:

cor(hr[,1:5])
                     satisfaction_level last_evaluation number_project average_montly_hours
satisfaction_level           1.00000000       0.1050212     -0.1429696          -0.02004811
last_evaluation              0.10502121       1.0000000      0.3493326           0.33974180
number_project              -0.14296959       0.3493326      1.0000000           0.41721063
average_montly_hours        -0.02004811       0.3397418      0.4172106           1.00000000
time_spend_company          -0.10086607       0.1315907      0.1967859           0.12775491
                     time_spend_company
satisfaction_level           -0.1008661
last_evaluation               0.1315907
number_project                0.1967859
average_montly_hours          0.1277549
time_spend_company            1.0000000

[Figure: correlation plot, all employees]

The plot and output above show correlations between the numeric variables for all employees.
Managers seem to give higher evaluation scores to employees who work more hours (r = 0.34) and
who have more projects (r = 0.35); however, there is a negative correlation between employee
satisfaction and number of projects (r = -0.14). It will be interesting to see how this compares
to correlations using just the best employees.

Correlations using just the best and most experienced employees that left:

> hrbestleft <- hr[which(hr$last_evaluation >= .72 & hr$left == 1), ]
> cor(hrbestleft[,1:5])
                     satisfaction_level last_evaluation number_project
satisfaction_level            1.0000000       0.3611564     -0.7370609
last_evaluation               0.3611564       1.0000000     -0.2150533
number_project               -0.7370609      -0.2150533      1.0000000
average_montly_hours         -0.4771749      -0.1261519      0.5217016
time_spend_company            0.6582700       0.3147566     -0.3644283
                     average_montly_hours time_spend_company
satisfaction_level             -0.4771749          0.6582700
last_evaluation                -0.1261519          0.3147566
number_project                  0.5217016         -0.3644283
average_montly_hours            1.0000000         -0.1572702
time_spend_company             -0.1572702          1.0000000

[Figure: correlation plot, best employees that left]

There are some very notable differences here, including the strong negative correlation
between number of projects and satisfaction level (-0.74) and the large negative correlation
between average monthly hours and satisfaction level (-0.48). This suggests that managers are
overworking their best employees, which leads to lower satisfaction levels, although
correlation alone cannot establish the cause. It's worth looking at the data visually to see if
this is in fact the case. I'll also run a decision tree analysis, which may serve as a
confirmation.
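
The comparison between the full data and a subset can be sketched in a few lines of base R. Everything here is hypothetical: `toy`, `flag`, `hours` and `sat` are invented for illustration, and the data is constructed so that the relationship exists only inside the flagged subgroup.

```r
set.seed(42)
n <- 500
# Hypothetical data: hours and satisfaction are unrelated for most rows, but
# strongly negatively related within a flagged subgroup of 100 rows.
flag  <- rep(c(TRUE, FALSE), times = c(100, 400))
hours <- runif(n, 100, 300)
sat   <- ifelse(flag,
                1 - (hours - 100) / 200 + rnorm(n, sd = 0.05),
                runif(n))
toy <- data.frame(flag, hours, sat)

cols <- c("hours", "sat")
cor_all    <- cor(toy[, cols])
cor_subset <- cor(toy[toy$flag, cols])

# Element-wise difference: large entries mark relationships that are
# specific to the subgroup rather than the whole population.
print(round(cor_subset - cor_all, 2))
```

A strong correlation inside a subgroup can be diluted almost to nothing in the pooled data, which is the pattern seen when comparing the two matrices above.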

Interpreting Correlation Differences Visually

Do the best employees work more hours?

[Figure: histograms of average monthly hours, best employees vs. all employees]

Comparing these histograms, it's clear that employees who score higher on manager
evaluations are working considerably more hours than the workforce as a whole.

Do the best employees work on more projects?

[Figure: number of projects, best employees vs. all employees]

Yes, the best employees usually have more projects. There is a downward trend as the number
of projects increases when you look at the workforce as a whole, and nearly the opposite holds
for the best employees (until you get to 6 projects).

Have the best employees been working at the company for a longer period of time?

[Figure: years at the company, best employees vs. all employees]

Almost all of the best employees have been at the company for at least four years; perhaps this
can be related to learning by doing. It's also a sufficient amount of time to prove to managers
that they are high performing. The dataset as a whole shows that there is an abundance of
employees who have been there for 2 and 3 years. Let's see if anyone is being promoted.

[Figure: promotions in the last five years among the best employees]

As you can see above, of the best performing employees, hardly any have been
promoted in the last five years. In fact, it's only 0.2%. It must be discouraging to these
employees to be highly evaluated and not be rewarded for it.

Next I'm going to look at the relationship between job type and salary. Are there noticeable
differences in pay between different departments of the company? And how many employees
are in each department?

[Figure: salary level by department, best employees]

A couple of things I noticed while looking at this graph are that a majority of the good
employees are on the low end of the salary spectrum and that most of them are working in
sales, support, and technical roles. However, I made the same graph using the dataset as a
whole and didn't see much of a difference, so I'll put these observations aside for now.

As I mentioned earlier during my association rules analysis, satisfaction is most likely significant
in determining why the best employees are leaving. The plot below is an attempt to see that
relationship visually, where the green density is the subset of the best employees that left, the
red density is the best employees that have stayed, and the blue density is the entire dataset.

library(ggplot2)
p1 <- ggplot() +
  geom_density(data = hrbestleft, aes(satisfaction_level), fill = 'green', alpha = .3) +
  geom_density(data = hrbeststay, aes(satisfaction_level), fill = 'red', alpha = .3) +
  geom_density(data = hr, aes(satisfaction_level), fill = 'blue', alpha = .3) +
  theme_light(base_size = 16) + xlab("Satisfaction Level") + ylab("") +
  ggtitle("Satisfaction Levels of Subsets")

[Figure: Satisfaction Levels of Subsets]

The best employees that left (green) are what really stand out here. Many of them have very
low satisfaction levels (<.25), then there is a lull, and then another group with satisfaction
levels greater than .6. It's difficult to say why this might be. Perhaps there is a difference in
how the employees interpret satisfaction. It's possible that they still enjoyed their job despite
being overworked and not being promoted. I think the best way to figure this out is through a
decision tree analysis, where those who left will be classified more accurately. But first, I want
to combine average monthly hours and satisfaction into a plot. Since I noticed earlier that the
good employees that left were working a lot more hours, there should be a strong relationship
between the two.

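# alpha is set outside aes() so it is a fixed transparency, not a mapped variable
plot6 <- ggplot(hr, aes(satisfaction_level, average_montly_hours, color = left)) +
  geom_point(alpha = .3) + ggtitle("Hours and Satisfaction")

[Figure: Hours and Satisfaction]
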
These distributions are very tight, which tells me that the decision tree will be a great addition
to my analysis. The blue box must be underperforming employees: those who have not been
working many hours and aren't that satisfied. The other two blue distributions, judging
by the density plots above, are high-performing employees. My next plot is
another check of that hypothesis, but this time I'm adding years spent at the company.

[Figure: hours and satisfaction with years spent at the company]

The cluster on the right has a lot of employees who have been at the company for a long time;
I think the lack of promotions may have something to do with them leaving.

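Decision Tree Analysis

Decision trees are easiest to read when they are small, so in order to get a few simple rules
(and to avoid over-fitting the model) I fit the tree on a small sample of the data. Note that
sample() rescales prob = c(0.02, 0.3) to sum to 1, so the training set is roughly 6% of the rows
rather than 2%.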
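
The effect of the prob weights in the sampling call used in this section can be checked directly. This is a small base R sketch; `n` is a hypothetical row count chosen to be close to nrow(hr), and the second split is shown only for comparison.

```r
set.seed(421)
n <- 15000  # hypothetical row count, close to nrow(hr)

# Same call shape as in this section: sample() rescales the weights 0.02 and
# 0.3 to 0.02 / 0.32 = 0.0625 and 0.3 / 0.32 = 0.9375.
ind <- sample(2, n, replace = TRUE, prob = c(0.02, 0.3))
share_train <- mean(ind == 1)
print(share_train)

# An explicit 2% / 98% split needs weights in that ratio.
ind_2pct <- sample(2, n, replace = TRUE, prob = c(0.02, 0.98))
share_train_2pct <- mean(ind_2pct == 1)
print(share_train_2pct)
```

The first share should land near 6% and the second near 2%, with some sampling noise in both.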

# install.packages("party")  # run once if the package is not installed
library(party)
set.seed(421)
ind <- sample(2, nrow(hr), replace = TRUE, prob = c(0.02, 0.3))
traindata <- hr[ind == 1, ]
testdata <- hr[ind == 2, ]
form <- left ~ satisfaction_level + average_montly_hours + time_spend_company + last_evaluation
hrtree <- ctree(form, data = traindata, controls = ctree_control(maxsurrogate = 3))
table(predict(hrtree), traindata$left)  # confusion table on the training sample
plot(hrtree, type = "simple")
print(hrtree)

[Figure: plot(hrtree, type = "simple"); each leaf shows n and y = (share stayed, share left)]

1) satisfaction_level (p < 0.001)
   <= 0.46:
      2) time_spend_company (p < 0.001)
         <= 4:
            3) time_spend_company (p = 0.001)
               <= 2:  4) n = 21,  y = (0.952, 0.048)
               > 2:   5) n = 217, y = (0.258, 0.742)
         > 4:   6) n = 46,  y = (0.891, 0.109)
   > 0.46:
      7) time_spend_company (p < 0.001)
         <= 4:  8) n = 562, y = (0.984, 0.016)
         > 4:
            9) last_evaluation (p < 0.001)
               <= 0.8:  10) n = 61, y = (0.951, 0.049)
               > 0.8:
                  11) average_montly_hours (p < 0.001)
                      <= 216:  12) n = 18, y = (1, 0)
                      > 216:
                         13) time_spend_company (p = 0.001)
                             <= 5:  14) n = 37, y = (0.081, 0.919)
                             > 5:   15) n = 22, y = (0.273, 0.727)

Using the variables time spent at company, satisfaction, average monthly hours and last
evaluation (what I think are the most important variables based on the visualizations I made), I
was able to come up with a few rules that help classify employees into the leaving and staying
categories. Here are my key takeaways:
1.) Employees with low satisfaction levels who haven't been at the company long (2 years or
less) will generally stay.
2.) Employees with low satisfaction levels who have been at the company more than 2 and up
to 4 years tend to leave.
3.) Employees with high satisfaction levels who have been working for less than or equal to 4
years stay.
4.) High-performing employees with high satisfaction who have been at the company more
than 4 years leave when they are working too many hours (more than 216 per month).

This tree is 91.5% accurate on the sample it was trained on, which is pretty good considering
how simple the tree is. If I were to show management one graph it would be this one: it
identifies clear-cut patterns and confirms much of what I had been hypothesizing with my
previous analyses.
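
The accuracy figure can be cross-checked from the leaves of the tree alone. The sketch below copies the leaf sizes and class shares from the tree output in this section and assumes each leaf predicts its majority class; the `leaves` data frame and its column names are introduced here for illustration.

```r
# Leaf sizes and class shares copied from the tree output in this section;
# stay / leave are the two components of y = (stay, leave).
leaves <- data.frame(
  node  = c(4, 5, 6, 8, 10, 12, 14, 15),
  n     = c(21, 217, 46, 562, 61, 18, 37, 22),
  stay  = c(0.952, 0.258, 0.891, 0.984, 0.951, 1.000, 0.081, 0.273),
  leave = c(0.048, 0.742, 0.109, 0.016, 0.049, 0.000, 0.919, 0.727)
)

# Each leaf predicts its majority class, so the rows it gets right are n times
# the larger share (rounded, because the shares are printed to 3 digits).
leaves$correct <- round(leaves$n * pmax(leaves$stay, leaves$leave))

training_accuracy <- sum(leaves$correct) / sum(leaves$n)
print(sum(leaves$n))
print(training_accuracy)
```

The result comes out close to the 91.5% quoted above. Since it is computed on the rows the tree was fitted to, accuracy on held-out rows may be lower.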

Random Forest and Logistic Regression

Before offering my final advice to management, I want to see how accurately I can predict who
is going to leave. An accurate machine learning model will allow the company to focus on
specific employees, perhaps offering them a raise or reducing their hours before they decide to
leave. First I'm going to try a logistic regression, which estimates the probability of a binary
dependent variable for each observation. Any employee with a predicted probability greater
than .5 is classified as leaving. Let's see how it goes:

Logistic Regression:

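library(dplyr)
# creating a test and training set (70% / 30%)
set.seed(142)
# sampling row indices directly does not depend on row names surviving sample_frac()
sid <- sample(nrow(hr), floor(.7 * nrow(hr)))
train <- hr[sid, ]
test <- hr[-sid, ]

# ASSUMPTION: the model fit is not shown in the original; the predictors used for the
# random forest later in this section are assumed here
glmmodel <- glm(left ~ satisfaction_level + number_project + average_montly_hours +
                  time_spend_company + promotion_last_5years + last_evaluation,
                data = train, family = binomial)

fitted.results <- predict(glmmodel, newdata = test, type = "response")
# type = "response" converts logits to predicted probabilities
new <- mutate(test, fitted.results)
predicted.to.leave <- filter(new, fitted.results > .5)
predicted.to.stay <- filter(new, fitted.results <= .5)
View(predicted.to.stay)
summary(predicted.to.stay$left)
summary(predicted.to.leave$left)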
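
The thresholding step and the accuracy calculation can be sketched end to end on synthetic data. This is only an illustration in base R: the `toy` data frame, the single `satisfaction` predictor and the coefficients used to simulate it are all hypothetical, not estimates from the HR dataset.

```r
set.seed(142)
n <- 1000
# Hypothetical data: the chance of leaving rises as satisfaction falls.
satisfaction <- runif(n)
left <- rbinom(n, size = 1, prob = plogis(2 - 5 * satisfaction))
toy <- data.frame(satisfaction, left)

# 70 / 30 split on row indices.
train_idx <- sample(n, floor(0.7 * n))
train <- toy[train_idx, ]
test  <- toy[-train_idx, ]

fit <- glm(left ~ satisfaction, data = train, family = binomial)

# Predicted probabilities, then a hard label at the 0.5 cutoff.
prob_left <- predict(fit, newdata = test, type = "response")
pred_left <- as.integer(prob_left > 0.5)

# Accuracy is the share of test rows whose label matches the outcome.
accuracy <- mean(pred_left == test$left)
print(table(predicted = pred_left, actual = test$left))
print(accuracy)
```

The cross-tabulation carries the same information as the two summary() calls above: its diagonal counts the correct predictions, and dividing that by the number of test rows gives the accuracy.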

The model ended up being only 79.4% accurate, which is okay, but considering the decision
tree was 91%, I think I can come up with a better model. Random forest works by combining the
votes of many decision trees and can work very well. Let's try that:

randindex <- sample(1:dim(hr)[1])
cutpoint2_3 <- floor(2 * dim(hr)[1] / 3)
traindata <- hr[randindex[1:cutpoint2_3], ]
testdata <- hr[randindex[(cutpoint2_3 + 1):dim(hr)[1]], ]
library(randomForest)
rfmodel <- randomForest(factor(left) ~ satisfaction_level + number_project +
                          average_montly_hours + time_spend_company +
                          promotion_last_5years + last_evaluation,
                        data = traindata)

plot9 <- plot(rfmodel, ylim = c(0, 0.36))

[Figure: plot(rfmodel) output]

The false positive and false negative error rates are very low, which is a good sign. Let's see
how accurate the model is when I try it on a test set.

library(caret)  # provides confusionMatrix()
prediction <- predict(rfmodel, testdata)

confusionMatrix(prediction, factor(testdata$left))
Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 3786   48
         1   10 1156

               Accuracy : 0.9884
                 95% CI : (0.985, 0.9912)
    No Information Rate : 0.7592
    P-Value [Acc > NIR] : < 2.2e-16

                  Kappa : 0.9679
 Mcnemar's Test P-Value : 1.184e-06

            Sensitivity : 0.9974
            Specificity : 0.9601
         Pos Pred Value : 0.9875
         Neg Pred Value : 0.9914
             Prevalence : 0.7592
         Detection Rate : 0.7572
   Detection Prevalence : 0.7668
      Balanced Accuracy : 0.9787

       'Positive' Class : 0

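The headline statistics in this output follow directly from the four cell counts. The base R sketch below recomputes a few of them from the printed matrix; the object name `cm` is introduced here for illustration. Keep in mind that the 'positive' class is 0 (stayed), so sensitivity describes how well stayers are identified and specificity describes leavers.

```r
# Counts copied from the confusion matrix in this section
# (rows = prediction, columns = reference; class 0 = stayed is the 'positive' class).
cm <- matrix(c(3786, 10, 48, 1156), nrow = 2,
             dimnames = list(prediction = c("0", "1"), reference = c("0", "1")))

accuracy    <- sum(diag(cm)) / sum(cm)           # correct predictions / all rows
sensitivity <- cm["0", "0"] / sum(cm[, "0"])     # stayers correctly predicted to stay
specificity <- cm["1", "1"] / sum(cm[, "1"])     # leavers correctly predicted to leave
no_info     <- max(colSums(cm)) / sum(cm)        # accuracy of always guessing the majority

print(round(c(accuracy = accuracy, sensitivity = sensitivity,
              specificity = specificity, no_info_rate = no_info), 4))
```

For the retention question the specificity line matters most: it is the share of actual leavers the model catches, about 96% here.
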
The model is 98.84% accurate on the test set, which will be very useful for identifying
employees who are likely to leave in the future. Which variables are most important in leaving
vs. staying?

importance(rfmodel)
                      MeanDecreaseGini
satisfaction_level         1226.048093
number_project              665.390311
average_montly_hours        536.922188
time_spend_company          664.193153
promotion_last_5years         4.487941
last_evaluation             430.694068

According to the random forest model, satisfaction, number of projects and time spent at the
company are the three most important variables.

Conclusion and Recommendations

I very much enjoyed learning more about this dataset. I performed so many types of analyses
because retaining a company's best employees is extremely important. High turnover is costly,
and if a company wants to grow it needs the right people leading the way. I've worked for
organizations in the past that have had high turnover rates, and while you want
underperforming employees to leave, you want your best workers to grow with you.

What I found most useful in this project were the visualizations, the decision tree and the
random forest algorithm. They can all be used in different ways. If management wants a basic
understanding of what's going on, I would show them the visuals; if they want to know what
patterns are harming them, I would go over the decision tree; and if they want to know which
employees will leave in the future, the random forest model would be helpful. Based on all of
those, here are the two key points management should know concerning why their best and
most experienced employees are leaving prematurely:

1.) They are being overworked: it's common for managers to take advantage of employees
who do a good job by giving them a heavier workload. This is costing the company,
because those employees are deciding to leave.
2.) They aren't being promoted: good employees expect to be rewarded. There is a large
group of employees with high satisfaction levels who have been at the company for
more than four years, but they decided to leave because there isn't any career growth.

There are a couple of simple, obvious actions management can take. They shouldn't work their
best employees more than anyone else, and they should promote them after 3 or 4 years. In
time, I think they will find that although the company will be less productive in the short run,
reducing the turnover rate of its best employees will lead to incremental growth.
