
EY ALEPH
deep learning applied to jurimetrics
PAPIs | June 2018

EY Aleph: Deep Learning Applied to Jurimetrics Practice

relevance
Total lawsuits in the balance sheets (R$ billions)¹
- Tax: 283.4
- Civil: 76.9
- Labor: 39.1
Tax litigation in relation to the market value¹: 32%
Top litigation taxes (R$ billions)¹
- ICMS: 90.4
- Social Contribution on Net Income: 65.2
- Income Tax: 22.8
- PIS/Cofins: 15.57
- CIDE: 11.72
¹ Source: "O contencioso tributário sob a perspectiva corporativa", Ana Teresa L R Lopes
All data refer to 2014 and cover 30 top companies in Brazil

real jurimetrics for labor law
legal provisions
deal & defense models
deal pricing
operation optimization
law firms performance

regional labor tribunal
204 courts in region 2 (São Paulo)
universal data
physical lawsuits
more than 7 million documents (PDF) available
electronic lawsuits
all lawsuits from 2015
more structured (HTML)

our hacking skills
automatic scraping: no human mimicry
captcha solving

sometimes it is not easy, even for pros, but we can use little tricks…

putting all together
- public data collections: scraping, rpa, deep learning, computer vision
- normalization: document conversions, unrtf, poppler
- attribute extractions: regex, machine learning
- audit: human check, support web app
- visualization: web app, viz tools

challenge
large-scale processing of files
constant evolution
accessibility
speed

rate OUR setup
stack defined by the ai programming language
& azure public cloud
low cost

job management
back-end app
rest api
front-end app
storage queue
no-sql
sql
git / ci
table

cosmosdb
no friction when the data structure changes, but a poor fit for the dataviz tools used by BI teams and for the application stack

mysql
well suited to dataviz tools and to supporting the app
holds a summary of the no-sql database, read-only scenario
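One way to picture the read-only summary is to flatten nested no-sql documents into relational rows that BI tools can query directly. The sketch below uses Python's built-in sqlite3 as a stand-in for mysql. The document shape, the table name `request_summary`, and its columns are assumptions made for illustration, not the schema used in EY Aleph.

```python
import sqlite3

# Hypothetical documents, shaped like records a document store might hold.
docs = [
    {"id": "a1", "court": 12, "requests": [
        {"type": "extra hours", "granted": True},
        {"type": "moral damages", "granted": False},
    ]},
    {"id": "b2", "court": 45, "requests": [
        {"type": "extra hours", "granted": False},
    ]},
]


def build_summary(conn, documents):
    """Flatten nested documents into one row per request."""
    conn.execute(
        "CREATE TABLE request_summary ("
        "lawsuit_id TEXT, court INTEGER, request_type TEXT, granted INTEGER)"
    )
    rows = [
        (d["id"], d["court"], r["type"], int(r["granted"]))
        for d in documents
        for r in d.get("requests", [])
    ]
    conn.executemany("INSERT INTO request_summary VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
build_summary(conn, docs)

# A typical BI-style query: share of granted requests per request type.
grant_rate = conn.execute(
    "SELECT request_type, AVG(granted) FROM request_summary "
    "GROUP BY request_type ORDER BY request_type"
).fetchall()
print(grant_rate)
```

Because the summary is rebuilt from the document store, the relational side can stay read-only while the flexible schema lives in the no-sql database.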

blob
- stores objects with the possibility of local redundancy (3 copies) or global redundancy (6 copies)
- used for: raw documents, cleaned documents, ml models
queue
- has local redundancy (3 copies of the message)
- messages expire in 7 days
- used for: job management
table
- key-value storage, "no-sql like"
- does not allow map-reduce operations
- filters only by key (recommended)
- used for: orchestration configurations

ai solution in a box
- cosmos (no-sql)
- app insights
- sql
- aleph admin (ruby)
- functions
- queue, blob, tables
- jenkins, tfs
- celery
- users
- api management
- redis cache
- cloud for b2b customers
- ey aleph (ruby + ember)
- mechanical turk staff

check ai black boxes but carefully

"In 20/02/2017 was declared …" → 20/02/2017
"In 20 of December of 2017 …" → 20/12/2017
"In the second day of January of two thousand and seventeen …" → 02/01/2017
"In eleventh day of March of 2016 …" → 01/03/2016
our challenge: real unstructured data…
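To make the difficulty concrete, here is a minimal rule-based sketch that covers only the two simplest date forms from this slide. The patterns, the `MONTHS` lookup, and the function name `extract_date` are assumptions for illustration. Fully written-out dates fall through to `None`, which is the kind of case where a learned model (and a careful check of its output) becomes necessary.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical month lookup for the English renderings used on the slide.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["january", "february", "march", "april", "may", "june", "july",
         "august", "september", "october", "november", "december"],
        start=1,
    )
}

NUMERIC = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")        # 20/02/2017
WORDED = re.compile(r"\b(\d{1,2}) of ([A-Za-z]+) of (\d{4})\b")  # 20 of December of 2017


def extract_date(text: str) -> Optional[date]:
    """Return the first date matched by a rule, or None if no rule applies."""
    try:
        match = NUMERIC.search(text)
        if match:
            day, month, year = (int(group) for group in match.groups())
            return date(year, month, day)
        match = WORDED.search(text)
        if match and match.group(2).lower() in MONTHS:
            month = MONTHS[match.group(2).lower()]
            return date(int(match.group(3)), month, int(match.group(1)))
    except ValueError:
        # Matched the pattern but is not a real calendar date (e.g. 31/02/2017).
        return None
    return None


print(extract_date("In 20/02/2017 was declared"))
print(extract_date("In 20 of December of 2017"))
print(extract_date("In the second day of January of two thousand and seventeen"))
```

Each new phrasing needs another hand-written rule, so this approach does not scale to the variety found in real rulings.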

…
Foundation
Extra Hours
Worker claims that the hours after work were not …
D E C I S I O N
Of the additional of unhealthiness.
The author worked for the claimed ones …
II - FOUNDATION
- Rescission sums
The author postulates the payment of the amounts resulting from the unmotivated waiver …
J u d g e m e n t
…
moral damages.
The requester claimed that during his work at …
…and it gets worse
REQUESTS
CONCLUSION
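The excerpts above show headings that are letter-spaced, numbered, or capitalized inconsistently. A minimal sketch of how such headings could be normalized before splitting a ruling into sections follows. The heading vocabulary `HEADINGS` and the function names are assumptions for illustration, and real rulings would need a far larger set of variants.

```python
import re

# Hypothetical heading vocabulary, taken from the English labels on the slide.
HEADINGS = {"FOUNDATION", "DECISION", "JUDGEMENT", "REQUESTS", "CONCLUSION"}

# Runs of single letters separated by single spaces, e.g. "D E C I S I O N".
SPACED = re.compile(r"\b(?:[A-Za-z] ){2,}[A-Za-z]\b")
# Optional roman-numeral prefix such as "II - " at the start of a line.
PREFIX = re.compile(r"^\s*(?:[IVXLC]+\s*[-–.]\s*)?")


def collapse_spaced_letters(line):
    """Turn 'D E C I S I O N' into 'DECISION'; leave ordinary text alone."""
    return SPACED.sub(lambda m: m.group(0).replace(" ", ""), line)


def heading_of(line):
    """Return the normalized heading name, or None for a body line."""
    text = collapse_spaced_letters(line)
    text = PREFIX.sub("", text).strip(" .:-").upper()
    return text if text in HEADINGS else None


def split_sections(lines):
    """Group body lines under the most recent recognized heading."""
    sections, current = {}, None
    for line in lines:
        heading = heading_of(line)
        if heading:
            current = heading
            sections.setdefault(current, [])
        elif current:
            sections[current].append(line.strip())
    return sections


sample = [
    "Foundation",
    "Extra Hours",
    "D E C I S I O N",
    "Of the additional of unhealthiness.",
]
print(split_sections(sample))
```

Rules like these break as soon as a court uses a heading outside the vocabulary, which is part of the motivation for classifying phrases with a trained model.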

Human lawyers → annotated database → classifier model → predictions

classifiers
1. TYPE OF PHRASE
- Requests
- Sentences
- Other
2. DECISION
- Granted
- Overruled

traditional nlp approaches
text representation
- WORD COUNT
- TF-IDF
tokenizers
- N-GRAMS + STOPWORDS
- UNIGRAM + STEMMING + STOPWORDS
- UNIGRAM + STOPWORDS
classifier algorithms
- LOGISTIC REGRESSION
- RANDOM FOREST
- GRADIENT BOOSTING CLASSIFIER
- MULTINOMIAL NAÏVE BAYES
- STOCHASTIC GRADIENT DESCENT
- SUPPORT VECTOR CLASSIFIER
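For readers unfamiliar with the TF-IDF representation listed above, the following is a minimal pure-Python sketch of the textbook definition: term frequency multiplied by the log of inverse document frequency. Whitespace tokenization and the function name `tf_idf` are assumptions for illustration. Library implementations typically add smoothing and normalization, so their numbers will differ.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {token: weight} dict per document.

    tf  = occurrences of the token in the document / tokens in the document
    idf = log(N / df), with N documents and df documents containing the token
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each token appears.
    df = Counter(token for tokens in tokenized for token in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            token: (count / len(tokens)) * math.log(n_docs / df[token])
            for token, count in counts.items()
        })
    return weights


# Hypothetical phrases, not taken from the annotated database.
phrases = ["the claim was granted", "the claim was overruled"]
weights = tf_idf(phrases)
print(weights[0])
```

Tokens shared by every phrase get weight zero, while the tokens that distinguish the phrases ("granted", "overruled") carry the signal a linear classifier can use.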

first results
logistic regression + tf-idf + no stopwords + stemming
training f1-score: 0.90
testing f1-score: 0.81
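The F1-score reported here is the harmonic mean of precision and recall. A minimal sketch for a single positive class follows. The slides do not say which averaging scheme was used across classes, so the single-class form and the example labels are assumptions for illustration.

```python
def f1_score(y_true, y_pred, positive):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no correct positive predictions, so F1 is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels, not taken from the annotated database.
y_true = ["granted", "granted", "overruled", "overruled"]
y_pred = ["granted", "overruled", "overruled", "granted"]
score = f1_score(y_true, y_pred, positive="granted")
print(score)
```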

classifiers
1. TYPE OF PHRASE
- Demands
- Sentences
- Nothing
2. DECISION
- Granted
- Overruled
DEEP LEARNING
TRADITIONAL NLP

EMBEDDINGS (words)
- Task Specific
- Pre-trained (Word2Vec)
LAYERS (phrases)
- 1DConvNet
- LSTM
- Bidirectional-LSTM
LOSS FUNCTION (optimizer)
- SGD
- RMSProp
- Nadam

embeddings, pre-trained: f-score = 0.897

embeddings, task-specific: f-score = 0.896

"I do not find the defendant - in light of all available evidence and according to the law and the decision of the jury, and so it goes and yadda yadda - to be guilty."
recurrent neural networks

RNN
Recurrent Neural Network
LSTM
Long Short-Term Memory
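For reference, the standard textbook formulation of both cells is sketched below. The symbols ($x_t$ for the input at step $t$, $h_t$ for the hidden state, $c_t$ for the cell state, $W$, $U$, $b$ for learned parameters) are notation chosen here, not taken from the slides.

```latex
% Simple recurrent cell
h_t = \tanh(W x_t + U h_{t-1} + b)

% LSTM cell
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

The additive update of $c_t$ lets information from early words survive long interruptions, which matters for sentences where the verdict is separated from its subject by many clauses. A bidirectional LSTM runs one such cell left to right and another right to left, then concatenates the two hidden states at each step.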

EMBEDDINGS: Pre-trained (Word2Vec)
LAYERS: Bidirectional-LSTM
LOSS FUNCTION (optimizer): Nadam

alternative approaches
- documents: conv1D/dense
- phrases: lstm
- words: word2vec

Michel Fernandes
michel.fernandes@br.ey.com
Rafael Kenski
rafael.kenski@br.ey.com
THANK YOU
