EY ALEPH
deep learning applied to
jurimetrics
papis | june 2018
EY Aleph: Deep Learning Applied to Jurimetrics Practice
relevance
Total lawsuits in the balance sheets (R$ billions) [1]
Tax: 283.4
Civil: 76.9
Labor: 39.1
32%: tax litigation in relation to the market value [1]
Top Litigation Taxes (R$ billions) [1]
ICMS: 90.4
Social Contribution on Net Income: 65.2
Income Tax: 22.8
PIS/Cofins: 15.57
CIDE: 11.72
[1] Source: "O contencioso tributário sob a perspectiva corporativa", Ana Teresa L R Lopes
All data refer to 2014 and cover the 30 top companies in Brazil
real jurimetrics for labor law
legal provisions
deal & defense models
deal pricing
operation optimization
law firms performance
regional labor tribunal
204 courts in region 2
(São Paulo)
universal data
physical lawsuits
more than 7 million
documents (PDF) available
electronic lawsuits
all lawsuits from 2015
more structured (html)
our hacking skills
automatic scraping: not mimicking a human
captcha solving
sometimes it is not easy, even
for pros, but we can do
little tricks…
80% data prep cv 20% + cnns
putting all together
public data
collections
normalization
attribute
extractions
audit visualization
scraping
rpa
deep
learning
computer
vision
document
conversions
unrtf
poppler
regex
machine
learning
human
check
support web
app
web app
viztools
challenge
large-scale processing of files
constant evolution
accessibility
speed
rate OUR setup
stack defined by the AI programming language
& azure public cloud
low cost
job management
back-end app
rest api
front-end app
storage queue
no-sql
sql
gitci
table
cosmosdb
no friction with
data structure changes,
but not good with dataviz
tools for BI teams or the
application stack
mysql
perfect for dataviz tools and
for supporting the app
summary from no-sql
database, read-only scenario
job management
ops
redis as queue management
load
blob: stores objects, with the option of local redundancy (3 copies) or global redundancy (6 copies)
queue: has local redundancy (3 copies of the message); messages expire in 7 days
table: key-value storage, "No-sql like"; does not allow map-reduce operations; filtering only by key is recommended
raw documents
cleaned
documents
ml models
job management
orchestration configurations
continuous deployment
slack everywhere
ai solution in a box
cosmos
no-sql
app insights
sql
aleph admin
ruby
functions
queue blob
tables jenkinstfs
celery
users
api mgnt
redis
cache
cloud for
b2b
customers ey
aleph
ruby + ember
mechanical
turk staff
check ai black boxes but carefully
our open boxes
"In 20/02/2017 was declared …" → 20/02/2017
"In 20 of December of 2017 …" → 20/12/2017
"In the second day of January of two thousand and seventeen …" → 02/01/2017
"In eleventh day of March of 2016 …" → 01/03/2016
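The examples above pair free-text date mentions with a normalized dd/mm/yyyy value. A minimal sketch of one rule-based way to do this for the two simplest forms shown, assuming English month names (the original filings are in Portuguese) and a hypothetical helper called `normalize_date`. Dates spelled out in words, such as "two thousand and seventeen", are deliberately not handled here.

```python
import re

# Assumption for illustration: English month names; real filings would need Portuguese ones.
MONTHS = {name: i for i, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

NUMERIC = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")
WRITTEN = re.compile(r"\b(\d{1,2})\s+of\s+([A-Za-z]+)\s+of\s+(\d{4})\b", re.IGNORECASE)


def normalize_date(text):
    """Return the first date found in text as dd/mm/yyyy, or None if no rule matches."""
    m = NUMERIC.search(text)
    if m:
        day, month, year = (int(g) for g in m.groups())
        return f"{day:02d}/{month:02d}/{year}"
    m = WRITTEN.search(text)
    if m and m.group(2).lower() in MONTHS:
        day, year = int(m.group(1)), int(m.group(3))
        return f"{day:02d}/{MONTHS[m.group(2).lower()]:02d}/{year}"
    return None


print(normalize_date("In 20/02/2017 was declared ..."))  # 20/02/2017
print(normalize_date("In 20 of December of 2017 ..."))   # 20/12/2017
```

Rules like these are easy to audit, which is the appeal of an "open box", but each new surface form needs its own pattern.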
our challenge: real unstructured data…
…
Foundation
Extra Hours
Worker claims that the hours after
work were not …
D E C I S I O N
Of the additional of unhealthiness.
The author worked for the claimed
ones …
II - FOUNDATION
- Rescission sums
The author postulates the payment
of the amounts resulting from the
unmotivated waiver …
J u d g e m e n t
…
moral damages.
The requester claimed that during
his work at …
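The snippets above show the same kind of section heading written in several ways: plain, letter-spaced, numbered with a Roman numeral, or prefixed with a dash. A minimal sketch of how such headings could be brought to one canonical form before classification; `normalize_heading` and its rules are hypothetical and only cover the variants visible on this slide.

```python
import re

# Hypothetical rules for illustration; not the authors' actual pipeline.
ROMAN_PREFIX = re.compile(r"^[IVX]+\s*[-–.]\s+")  # e.g. "II - " in front of a heading
BULLET_PREFIX = re.compile(r"^[-•]\s*")           # e.g. "- " in front of a sub-heading


def normalize_heading(line):
    """Collapse letter-spaced headings and drop numbering or bullet prefixes."""
    text = line.strip()
    tokens = text.split()
    # "D E C I S I O N" -> "DECISION": only when every token is a single character.
    if len(tokens) > 2 and all(len(t) == 1 for t in tokens):
        text = "".join(tokens)
    text = ROMAN_PREFIX.sub("", text)
    text = BULLET_PREFIX.sub("", text)
    return text.upper()


for raw in ["Foundation", "II - FOUNDATION", "D E C I S I O N", "J u d g e m e n t"]:
    print(normalize_heading(raw))
```

A real corpus would need more rules than this, which is part of why the deck moves on to learned classifiers.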
…and it gets worse
REQUESTS
CONCLUSION
Human
Lawyers
Annotated
Database
Classifier
Model
Predictions
ey mechanical turk
TYPE OF PHRASE
- Requests
- Sentences
- Other
DECISION
- Granted
- Overruled
1 2
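The two label sets above can be thought of as fields on each annotated phrase. A minimal sketch of such a record follows; the class and field names are hypothetical, and the rule that only a "sentence" phrase carries a granted/overruled label is an assumption, not something the slides state.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PhraseType(Enum):
    REQUEST = "request"
    SENTENCE = "sentence"
    OTHER = "other"


class Decision(Enum):
    GRANTED = "granted"
    OVERRULED = "overruled"


@dataclass(frozen=True)
class Annotation:
    """One lawyer-labeled phrase (field names are hypothetical)."""
    text: str
    phrase_type: PhraseType
    decision: Optional[Decision] = None

    def __post_init__(self):
        # Assumption: a decision label only makes sense on a sentence phrase.
        if self.decision is not None and self.phrase_type is not PhraseType.SENTENCE:
            raise ValueError("decision is only meaningful for sentence phrases")


example = Annotation("The request for extra hours is granted.",
                     PhraseType.SENTENCE, Decision.GRANTED)
print(example.phrase_type.value, example.decision.value)
```

Keeping the two labels as separate fields lets each one feed its own classifier.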
classifiers
Text Representation
- WORD COUNT
- TF-IDF
Tokenizers
- N-GRAMS + STOPWORDS
- UNIGRAM + STEMMING + STOPWORDS
- UNIGRAM + STOPWORDS
Tested Algorithms
- LOGISTIC REGRESSION
- RANDOM FOREST
- GRADIENT BOOSTING CLASSIFIER
- MULTINOMIAL NAÏVE BAYES
- STOCHASTIC GRADIENT DESCENT
- SUPPORT VECTOR CLASSIFIER
traditional nlp approaches
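To make the TF-IDF representation listed above concrete, here is a minimal pure-Python sketch. It uses raw term frequency and idf = ln(N / df). Libraries such as scikit-learn apply a smoothed variant by default, so their numbers will differ. The toy phrases are invented for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


# Toy phrases invented for illustration, tokenized as unigrams.
docs = [
    "worker claims extra hours".split(),
    "court grants extra hours".split(),
    "court overrules moral damages".split(),
]
for w in tf_idf(docs):
    print(w)
```

Terms that appear in few documents ("worker") get a higher weight than terms shared across documents ("extra"), which is what makes the representation useful for telling phrase types apart.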
first results
logistic regression + tf-idf + no stopwords + stemming
training f1-score: 0.90
testing f1-score: 0.81
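For reference, the F1-score is the harmonic mean of precision and recall. The sketch below computes it for a single positive class on invented labels. The slides do not say how the reported scores were averaged across classes, so this is only an illustration of the metric itself.

```python
def f1_score(y_true, y_pred, positive):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented labels, not the authors' data.
y_true = ["granted", "granted", "overruled", "granted", "overruled"]
y_pred = ["granted", "overruled", "overruled", "granted", "granted"]
print(round(f1_score(y_true, y_pred, "granted"), 3))  # 0.667
```

A gap between training and testing scores, like the one reported above, is the usual sign that a model fits its training data better than it generalizes.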
TYPE OF PHRASE
- Demands
- Sentences
- Nothing
DECISION
- Granted
- Overruled
1 2
DEEP LEARNING
TRADITIONAL NLP
classifiers
EMBEDDINGS (Words)
- Task Specific
- Pre-trained (Word2Vec)
LAYERS (Phrases)
- 1DConvNet
- LSTM
- Bidirectional-LSTM
OPTIMIZER
- SGD
- RMSProp
- Nadam
pre-trained embeddings: f-score = 0.897
task-specific embeddings: f-score = 0.896
"I do not find the defendant - in light of all available evidence
and according to the law and the decision of the jury, and so
it goes and yadda yadda - to be guilty."
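In the quote above, "not" and "guilty" are far apart, so the meaning depends on word order across the whole sentence. A minimal sketch of why an unordered word-count representation struggles with this follows; the two sentences are invented for illustration and `bag_of_words` is a hypothetical helper.

```python
from collections import Counter


def bag_of_words(sentence):
    """Unigram counts: word order is discarded."""
    return Counter(sentence.lower().replace(",", "").split())


a = "the defendant is guilty, not innocent"
b = "the defendant is innocent, not guilty"

print(bag_of_words(a) == bag_of_words(b))  # True: same counts, opposite meaning
print(a.split() == b.split())              # False: the token sequences differ
```

Sequence models such as the recurrent networks on the next slide read tokens in order, so they can in principle tell these two sentences apart.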
recurrent neural networks
RNN: Recurrent Neural Network
LSTM: Long Short-Term Memory
EMBEDDINGS: Pre-trained (Word2Vec)
LAYERS: Bidirectional-LSTM
OPTIMIZER: Nadam
conv1D/dense
alternative approaches
lstm
word2vec
documents /
phrases /
words
stop
guessing
Michel Fernandes
michel.fernandes@br.ey.com
Rafael Kenski
rafael.kenski@br.ey.com
THANK YOU