�ݺ�ߣ

Background
��
Pas
��de
��Poisson
��is
��a
��fishing
��conglomerate
��headquartered
��in
��Montreal,
��CN.
��
��The
��
fleet
��is
��located
��remotely
��in
��two
��loca?ons,
��Halifax,
��NS
��and
��St.
��John’s
��
Newfoundland.
��
��The
��St.
��John’s
��fleets
��primarily
��work
��the
��near
��shore
��fishing
��
grounds
��of
��Nova
��Sco?a
��and
��Newfoundland
��within
��12
��nau?cal
��miles
��from
��
shore.
��
��The
��Halifax
��loca?ons,
��however,
��have
��fishing
��deployments
��that
��are
��
located
��much
��further
��offshore,
��and
��in
��most
��cases,
��using
��U.S.
��territorial
��waters
��
in
��the
��North
��Atlan?c
��under
��the
��CANAM
��bilateral
��agreements.
��
��
��
The
��en?re
��crew
��of
��the
��St.
��Johns
��fleet
��are
��Canadian
��residents.
��
��Hiring
��managers
��
ensure
��that
��90%
��of
��the
��deck
��hands
��working
��on
��the
��Halifax
��fleet
��are
��foreign
��
workers
��as
��the
��labor
��rate
��is
��significantly
��lower
��and
��the
��turnover
��rate
��is
��6
��?mes
��
the
��rate
��of
��St.
��Johns
��because
��the
��weather
��is
��constantly
��rough
��in
��the
��North
��
Atlan?c
��crea?ng
��excep?onally
��poor
��working
��condi?ons,
��but
��paying
��well.
��

Execu?ve
��Summary
��
The
��hiring
��managers
��of
��Pas
��de
��Poissen
��sought
��the
��guidance
��of
��a
��consul?ng
��
firm
��to
��determine
��which
��of
��the
��na?onality
��of
��the
��foreign
��work
��force,
��entering
��
Canada,
��would
��have
��the
��highest
��probability
��that
��a
��judge
��would
��approve
��their
��
appeal
��to
��remain,
��and
��subsequently
��be
��employable
��in
��the
��country.
��
��
��
Establishing
��a
��model
��to
��best
��determine
��which
��candidates
��to
��hire
��provided
��
excep?onal
��cost
��saving
��opportuni?es.
��
��In
��the
��past,
��if
��the
��company
��was
��
informed
��that
��one
��of
��their
��new
��foreign
��na?onal
��workers
��was
��not
��granted
��an
��
appeal,
��and
��was
��ac?vely
��on
��a
��fishing
��deployment,
��at
��?mes
��las?ng
��for
��over
��45
��
days,
��the
��trawler
��was
��forced
��to
��return
��to
��port.
��
��A
��vessel
��having
��to
��return
��
equated
��to
��missed
��opportunis?c
��revenue,
��as
��it
��could
��no
��longer
��fish,
��and
��
unexpected
��fuel
��expenses
��for
��return
��transit.
��
��Furthermore,
��the
��penalty
��for
��
knowing
��employing
��an
��illegal
��foreign
��worker
��was
��harsh
��from
��both
��the
��
Canadian
��and
��U.S
��fisheries
��enforcement
��agencies.
��

Data
��Integrity
��
•  Source:
��Ra[le
��Library
��
•  Name:
��“Green:
��Refugee
��Appeal”
��
•  “Cleaning”
��steps
��
•  Used
��transform
��tag
��to
��remove
��missing
��and
��ignored
��data
��aèr
��
comparing
��the
��original
��and
��“cleaned”
��OOB
��error
��rates.
��
��
Addi?onally,
��the
��categorical
��data
��“judges”
��was
��deemed
��to
��be
��
sta?s?cally
��insignificant
��for
��our
��purposes,
��hence
��it
��was
��omi[ed
��
thus
��increasing
��the
��integrity
��of
��the
��the
��dataset.
��
•  Steps:
��
��In
��order
��to
��fulfill
��the
��hiring
��strategy
��we
��targeted
��informa?on,
��from
��
the
��data
��(using
��ra[le,
��R
��and
��excel),
��that
��would
��serve
��to
��determine
��the
��
informa?on
��necessary
��to
��depict
��future
��hires
��based
��on
��the
��probability
��to
��
determine
��an
��approved
��appeal.
��
��

Forest
��Model
��
!  Imported
��the
��data
��and
��Rescaled
��
��
!  Created
��a
��Forest
��model
��with
��default
��op?ons
��
!  OOB
��error
��=30.62%
��,
��Type
��1=
��16.12
��%
��and
��Type
��2
��=65.5
��%error,
��AUC
��=
��0.644
��
!  Our
��business
��requires
��more
��focus
��on
��Type
��1
��error
��rather
��than
��Type
��2
��error
��
!  Checked
��the
��trend
��of
��errors
��and
��importance
��
!  Created
��a
��sample
��of
��35,35
��
!  OOB
��es?mate
��of
��
��error
��rate:
��35.83%,
��Type
��1
��error
��rate
��=
��35.02%,
��Type
��2
��error
��rate
��=
��
��
37.77%,
��AUC
��=
��0.653
��
!  Error
��rate
��increased,
��Type
��1
��increased-‐
��not
��good
��
!  No
��major
��change,
��although
��type
��2
��decreased
��
!  Look
��for
��a
��be[er
��one.
��Prune
��the
��trees
��at
��minimum
��complexity
��
!  Here
��tree
��=
��421
��and
��complexity
��=
��0.2913
��
!  Now,
��OOB
��es?mate
��of
��
��error
��rate:
��29.32%
��,
��AUC
��=
��0.646,
��Type
��1
��error
��=
��14.28571%,
��
Type
��2
��error=
��65.55%
��
��
!  Type
��2
��is
��s?ll
��large
��but
��we
��are
��not
��much
��concerned
��about
��that.
��
!  Best
��model
��so
��far
��

Forest
��Model
��
!  Create
��Importance
��level
��of
��Type
��1,
��Type
��2
��error
��rate
��by
��sampling
��data
��(35,35)
��
!  randomForest(formula
��=
��IMO_decision
��~
��.,
��data
��=
��crs$dataset[crs$sample,
��
c(crs$input,
��crs$target)],ntree
��=
��421,
��mtry
��=
��5,
��sampsize
��=
��c(35,
��35),
��
importance
��=
��TRUE,
��replace
��=
��FALSE,
��na.ac?on
��=
��na.roughﬁx)
��
! 
��OOB
��es?mate
��of
��
��error
��rate:
��36.48%,
��Type
��1
��error
��rate
��=
��36.4
��%,
��Type
��2
��error
��
rate
��=
��36.6%
��
!  OOB
��increased
��.
��Type
��1
��increased
��as
��expected
��.
��Not
��a
��good
��solu?on
��
��
��
!  Our
��Best
��Solu?on
��so
��far
��is
��
��
��
!  95%
��CI:
��0.5462-‐0.6554
��(DeLong)
��
��
!  OOB
��es?mate
��of
��
��error
��rate:
��29.32%,
��Type
��1
��error
��rate
��=
��14.28%,
��Type
��2
��error
��
rate
��=
��65.6
��%.
��
!  Run
��the
��evalua?on
��on
��the
��test
��data
��set
��to
��get
��the
��ﬁnal
��result.
��

��
��
��
��
��

Final
��Confusion
��Matrix-‐
��Forest
��Model
��

Boos?ng
��Model
��
•  Run
��the
��Boos?ng
��model
��with
��default
��op?ons
��
•  OOB
��es?mate
��of
��
��error
��rate:
��21.8%
��
•  Type
��1
��error
��rate
��is
��6.9%,
��Type
��2
��error
��rate
��is
��61.1
��%.
��Look
��for
��error
��trends
��and
��importance
��of
��variables.
��
Analysis-‐
��Success
��and
��language
��are
��major
��predictors
��
•  Training
��Error
��is
��high
��ini?ally,
��down
��warding
��as
��number
��of
��itera?ons
��increase.
��
•  Try
��to
��look
��at
��the
��point
��where
��error
��graph
��becomes
��constant.
��
•  1’s
��as
��shown
��in
��the
��graph
��depict
��the
��trend,
��but
��the
��trend
��again
��is
��changing
��beyond
��itera?on
��50.
��
•  Build
��more
��itera?ons
��to
��figure
��out
��the
��trend
��and
��the
��point
��aèr
��which
��error
��rate
��is
��constant.
��
•  Analysis-‐
��Success
��and
��language
��are
��major
��predictors
��
•  Build
��the
��model
��with
��itera?on
��=
��200
��
•  Analysis-‐:
��The
��trend
��seems
��clear.
��Aèr
��140
��itera?ons,
��the
��error
��rate
��graph
��becomes
��constant.
��
•  Set
��the
��itera?ons
��to
��140
��and
��con?nue
��the
��boos?ng
��model.
��
��OOB
��error
��is
��21.2
��%
��but
��Type
��
��2
��errors
��are
��very
��large.
��
��
•  AUC
��=68%.
��S?ll
��room
��for
��improvement.
��Set
��the
��importance
��matrix.
��We
��need
��less
��Type
��2
��error.
��
•  Call:
��
ada(IMO_decision
�� ~
�� .,
�� data
�� =
�� crs$dataset[crs$train,
�� c(crs$input,
��
�� crs$target)],
�� control
�� =
��
rpart.control(maxdepth
��=
��30,
��cp
��=
��0.01,
��
��minsplit
��=
��20,
��xval
��=
��10),
��parms
��=
��list(split
��=
��"informa?on",
��
��loss
��=
��
matrix(c(0,
��1,
��1.5,
��0),
��byrow
��=
��TRUE,
��nrow
��=
��2)),
��iter
��=
��140)
��
��

Final
��Confusion
��Matrix-‐
��Boos?ng
��
Model
��
• 
��
��

��
��
��
��
��
��Best
��so
��far,
��although
��type
��2
��error
��is
��
s?ll
��big
��
��
��
•  Giving
��more
��importance
��doesn’t
��help
��
��
��
•  No
��major
��change
��in
��ROC.
��

Comparison
��of
��Models
��
Forest
��Model
�� Boos,ng
��Model
��

Conclusion
��

��
��
��
��With
��the
��best
��dataset,
��it
��shows
��that
��there
��is
��a
��strong
��sta?s?cal
��signiﬁcance
��that
��
Czechoslovakia,
��exhibit
��1,
��is
��the
��na?on
��with
��the
��highest
��probability
��of
��winning
��
appeal
��based
��on
��data
��analyzed
��in
��MS
��Excel.
��
��Furthermore,
��exhibit
��2
��shows
��29%
��of
��
all
��applicants
��are
��denied
��their
��appeal.
��
��Of
��those
��the
��Rater,
��person
��who
��determines
��
the
��merit
��of
��their
��case
��going
��forward
��predicts
��with,
��an
��81%
��conﬁdence
��rate
��that,
��
when
��he
��or
��she
��predicts
��a
��appeal
��denial,
��it
��is
��the
��correct
��predic?on,
��conversely
��
they
��are
��only
��correct
��48%
��of
��the
��?me
��when
��they
��predict
��an
��awarded
��appeal
��by
��the
��
judge.
��
��Finally,
��the
��data
��shows
��that
��most
��applicants
��the
��seek
��an
��appeal
��have
��a
��
higher
��approval
��probability
��with
��the
��courts
��in
��Montreal
��and
��not
��Toronto.
��

��
��
��
��
��As
��with
��the
��Appeal
��data
��(above)
��the
��same
��inferences
��can
��be
��established
��with
��
individual
��Judge
��data.
��For
��the
��judges
��tree,
��exhibit
��3,
��if
��we
��assume
��that
��the
��rater
��
predicts
��success
��for
��33-‐34%
��of
��claimants,
��72%
��of
��those
��posi?ve
��predic?ons
��are
��
cases
��that
��are
��to
��be
��heard
��by
��judges
��that
��ARE
��NOT
��Heald,
��Hugessen,
��Iacobucci,
��
MacGuigan,
��Pra[e,
��and
��Stone.
��We
��can
��infer
��that
��Desjardins,
��Mahoney,
��Marceau,
��
and
��Urie
��ARE
��judges
��that
��will
��have
��the
��highest
��probability
��of
��ruling
��posi?ve
��on
��an
��
appeal.
��
��Therefore,
��as
��Desjardins
��is
��from
��Montreal
��and
��rules
��favorably
��on
��
Czechoslovakian
��na?onals,
��it
��would
��behoove
��the
��company
��to
��create
��a
��goal
��
congruent
��strategy
��that
��favors
��those
��results.
��

Exhibit
��1
��
Appeal
��Rate
��by
��Na?on
��
NATION
�� APPROVED
��APPEAL
��RATE
��
CZECHOSLOVAKIA
�� 73%
��
SRI
��LANKA
�� 36%
��
EL
��SALVADOR
�� 36%
��
ARGENTINA
�� 25%
��
IRAN
�� 25%
��
CHINA
�� 22%
��
BULGARIA
�� 7%
��

�ݺ�ߣ

Predictive Modeling using R

Recommended

More Related Content

Similar to Predictive Modeling using R (20)

Recently uploaded (20)

Predictive Modeling using R