Data Science: Predict Success of Legislation with Topics Only
Natural Language Processing with Sunlight Foundation Open States API

Pauline Chow
Fall 2016
What policies and laws relate to your well-being?
When I first asked this question, I was working on, and interested in, transportation policy, especially walking and bicycling.
Why Is Analyzing Elements of Successful Legislation Important?
- Increase transparency of the various levels of decision making: federal, state, and local
- Understand trends in public policy effectively, in order to educate and influence
- Distill the legislative process into a logical system of features
- Extract and identify relationships of decision makers with communities, topics, and laws
Data Science Steps for Predicting Legislative Results
1. Collect data from the Sunlight Foundation API and other open data sources
2. Clean text from legislative bills obtained via web scraping, including removing HTML and stop words and defining the target variable (i.e. bill passage)
3. Extract features from text in Python
4. Build topics from text using Latent Dirichlet Allocation (LDA), a probabilistic approach
5. Implement supervised learning models
6. Analyze results
1. Collect: As the initial step to building predictive models, insights reflect features from the text of California bills between 2009 and 2014
2. Clean: Good Ole Scraping
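
The cleaning step can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code used for the deck: the `STOP_WORDS` set, the `TextExtractor` class, the `clean_bill_html` helper, and the sample HTML are all assumptions made up for this example.

```python
from html.parser import HTMLParser
import re

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "for", "be", "shall"}


class TextExtractor(HTMLParser):
    """Collects text nodes and skips anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def clean_bill_html(html):
    """Strip HTML, lowercase, keep alphabetic tokens, and drop stop words."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.parts).lower()
    tokens = re.findall(r"[a-z]+", text)
    return [t for t in tokens if t not in STOP_WORDS]


# Made-up HTML fragment standing in for a scraped bill page.
sample = (
    "<html><body><h1>AB 291</h1>"
    "<p>Underground storage tanks: the petroleum charges.</p></body></html>"
)
tokens = clean_bill_html(sample)
print(tokens)
```

A real pipeline would likely also lemmatize tokens (the deck's example uses singular forms such as "hut" and "sunglass") and attach the bill's pass/fail label as the target variable.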
3. How to Extract Features from Text?
Token | Sick | of | Having | to | go | 2 | different | hut | buy | pizza | sunglass
Count | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1
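
The count vector above is a bag-of-words representation. A minimal sketch of how such a vector can be built with the standard library is shown here; the token list is an assumption reconstructed from the slide (with "to" appearing twice), and the variable names are illustrative.

```python
from collections import Counter

# Assumed token stream for the slide's example sentence; "to" occurs twice.
tokens = ["sick", "of", "having", "to", "go", "to", "2",
          "different", "hut", "buy", "pizza", "sunglass"]

counts = Counter(tokens)

# Vocabulary in order of first appearance (dict preserves insertion order).
vocab = list(dict.fromkeys(tokens))

# One count per vocabulary entry: this is the document's feature vector.
vector = [counts[word] for word in vocab]

for word, n in zip(vocab, vector):
    print(f"{word}: {n}")
```

Word order is discarded in this representation; only the per-word counts are kept as features.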
4. Build: What is the Latent Dirichlet Allocation (LDA) Topic Model?
- Finds hidden semantic structure, aka context, where topics are clusters of similar words: P(word | context)
- Each document is a mixture of topics, and each topic is a mixture of words and phrases, expressed as probabilities
- Tune parameters: number of words in each topic, mixture within each topic, threshold for frequency and probability
- For example: Topic A (documents 1, 2, 5): breakfast 30%, pizza 10%, smoothie 5%
Example documents:
1. Sick of having to go to two huts for pizza and sunglasses
2. I ate a cold pizza and spinach smoothie for breakfast.
3. I wear my sunglasses at night so I can see
4. Sometimes I get really sick when I go on roller coasters
5. Coffee only for breakfast because coffee is for closers
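
To make the idea of "documents as mixtures of topics" concrete, here is a small sketch of LDA fitted with a collapsed Gibbs sampler in plain Python. It is an illustration under stated assumptions, not the method used in the deck: the deck cites Gensim, and a Gibbs sampler is swapped in here only to keep the example self-contained. The hand-tokenized documents, the `lda_gibbs` function, and the hyperparameter values (`alpha`, `beta`, `iters`) are all hypothetical choices.

```python
import random
from collections import Counter

# The five example sentences, hand-tokenized with stop words removed (an assumption).
docs = [
    "sick huts pizza sunglasses".split(),
    "ate cold pizza spinach smoothie breakfast".split(),
    "wear sunglasses night see".split(),
    "sometimes sick roller coasters".split(),
    "coffee breakfast coffee closers".split(),
]


def lda_gibbs(docs, num_topics=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns (theta, phi)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]         # words in doc d assigned to topic k
    topic_word = [Counter() for _ in range(num_topics)]  # times word w is assigned to topic k
    topic_total = [0] * num_topics                       # total words assigned to topic k
    assign = []
    # Random initial topic assignment for every word occurrence.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment, then resample it.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    # theta[d][k]: share of topic k in document d; phi[k][w]: P(word w | topic k).
    theta = [
        [(n + alpha) / (len(doc) + num_topics * alpha) for n in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]
    phi = [
        {w: (topic_word[k][w] + beta) / (topic_total[k] + vocab_size * beta) for w in vocab}
        for k in range(num_topics)
    ]
    return theta, phi


theta, phi = lda_gibbs(docs)
for k, dist in enumerate(phi):
    top = sorted(dist, key=dist.get, reverse=True)[:3]
    print(f"topic {k}: {top}")
```

Each row of `theta` is a document's topic mixture and each entry of `phi` is a topic's word distribution, which corresponds to the "breakfast 30%, pizza 10%" style of output on the slide. With only five short documents the topics themselves are not meaningful.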
4. Build Word2Vec: Find Similarities
- Extract relationships in unstructured text
- Leverage context of documents and LDA’s probabilistic models
- Hierarchical structure of probabilities
- Derive meaning from cleaned vector of words and phrases
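
Once words are represented as vectors, "finding similarities" usually comes down to comparing those vectors, commonly with cosine similarity. The sketch below shows only that comparison step; it does not train Word2Vec. The three-dimensional embeddings are made-up numbers for illustration, and `cosine_similarity` and `most_similar` are hypothetical helper names.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Made-up embeddings purely for illustration; real vectors would come
# from a trained model and have many more dimensions.
embeddings = {
    "bicycle": [0.9, 0.1, 0.0],
    "roadway": [0.8, 0.2, 0.1],
    "cemetery": [0.0, 0.1, 0.9],
}


def most_similar(word, embeddings):
    """Return the other word whose vector is closest to `word` by cosine similarity."""
    target = embeddings[word]
    others = (w for w in embeddings if w != word)
    return max(others, key=lambda w: cosine_similarity(target, embeddings[w]))


print(most_similar("bicycle", embeddings))
```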
5. Implement Logistic Regression: CA Bills Over Time
1. The model predicts failed legislation better than successful legislation
2. Predictive results for models with 50 versus 100 topics did not differ significantly
3. Precision = TP / (TP + FP)
4. Recall = TP / (TP + FN)
Model Classification Report: All Topics Over Time
target | precision | recall | f1-score | support
failed | 0.70 | 0.98 | 0.82 | 3118
passed | 0.47 | 0.04 | 0.07 | 1360
avg / total | 0.63 | 0.69 | 0.59 | 4478
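
The f1-score and "avg / total" figures in the report follow from the per-class precision, recall, and support. The sketch below recomputes them from the reported values, assuming the averages are support-weighted; the `report` dictionary and helper names are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, support) per class, copied from the classification report.
report = {
    "failed": (0.70, 0.98, 3118),
    "passed": (0.47, 0.04, 1360),
}

total_support = sum(support for _, _, support in report.values())


def weighted_average(values_and_supports):
    """Average weighted by class support (assumed to be how 'avg / total' is computed)."""
    return sum(value * support for value, support in values_and_supports) / total_support


f1_by_class = {name: f1_score(p, r) for name, (p, r, _) in report.items()}
avg_precision = weighted_average((p, s) for p, _, s in report.values())
avg_recall = weighted_average((r, s) for _, r, s in report.values())
avg_f1 = weighted_average((f1_by_class[name], s) for name, (_, _, s) in report.items())

for name, value in f1_by_class.items():
    print(f"{name}: f1 = {value:.2f}")
print(f"avg / total: {avg_precision:.2f} {avg_recall:.2f} {avg_f1:.2f} {total_support}")
```

The low recall for passed bills (0.04) means the model labels almost every bill as failed, which is why the overall numbers look better than the model's ability to identify successful legislation.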
6. Analyze California Bills (100 Topic Models)
- Bills have an average of 6.57 topics each, ranging from 2 to 16
- Passage rate by topic ranged from 18% to 36%, averaging 28% for all bills in the database
- Most frequent topics of legislation relate to local government funding/taxes/leadership initiatives, health care, education, budget and taxes, and the court system
- Highest and lowest passage rate topics are reviewed in the next few slides
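
Summary statistics like these can be computed from a table of bills with their assigned topics and outcomes. The following sketch uses a tiny made-up dataset; the bill IDs, topic assignments, and outcomes are hypothetical and do not come from the deck's data.

```python
from collections import defaultdict

# Hypothetical records: (bill id, passed?, topic ids assigned to the bill).
bills = [
    ("bill-a", True, [11, 48, 70]),
    ("bill-b", False, [11, 48]),
    ("bill-c", False, [48, 79]),
    ("bill-d", True, [11, 94]),
]

# Average number of topics per bill.
avg_topics = sum(len(topics) for _, _, topics in bills) / len(bills)

# Per-topic tallies: topic -> [bills passed, bills total].
tallies = defaultdict(lambda: [0, 0])
for _, passed, topics in bills:
    for topic in topics:
        tallies[topic][1] += 1
        if passed:
            tallies[topic][0] += 1

# Passage rate: share of bills containing the topic that passed.
passage_rate = {topic: passed / total for topic, (passed, total) in tallies.items()}

print(f"average topics per bill: {avg_topics}")
for topic in sorted(passage_rate):
    print(f"topic {topic}: {passage_rate[topic]:.0%} of {tallies[topic][1]} bills passed")
```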
6. Analyze: Distribution of Topics in California Bills
Top 10 Topics by Frequency
Topic # | Frequency
topic 48 | 13569
topic 11 | 11838
topic 51 | 8024
topic 73 | 4913
topic 6 | 3675
topic 63 | 2782
topic 1 | 2615
topic 22 | 1879
topic 64 | 1726
topic 45 | 1663
6. What Topics Support Bill Passage?
Rank | Topic # | Odds Ratio | LDA Topics
1 | 70 | 453.981 | 0.023*tank + 0.019*underground + 0.015*transferor + 0.011*lie + 0.010*decennial + 0.008*storage + 0.008*cotenant + 0.008*stanford + 0.006*petroleum + 0.006*orphan
2 | 74 | 32.797 | 0.020*contribution + 0.014*calendar + 0.010*canyon + 0.009*lincoln + 0.009*shoulder + 0.007*stenographer + 0.006*inflation + 0.005*dispatcher + 0.005*vine + 0.005*boyer
3 | 47 | 32.695 | 0.024*cemetery + 0.011*mexican + 0.010*interment + 0.007*salton + 0.006*elsinore + 0.006*tuberculosis + 0.005*burial + 0.004*bacteria + 0.004*creek + 0.004*coliform
4 | 42 | 28.312 | 0.008*hoover + 0.003*tricare + 0.002*shower + 0.002*crutch + 0.002*contractholders + 0.002*bath + 0.001*dme + 0.001*durable + 0.000*hcpcs + 0.000*wheelchair
5 | 21 | 25.041 | 0.021*wyland + 0.015*reorganization + 0.014*brown + 0.012*gordon + 0.010*presidential + 0.008*ford + 0.008*gerald + 0.008*battalion + 0.007*mitochondrial + 0.007*remembrance
6 | 71 | 9.608 | 0.064*andwhereas + 0.024*awareness + 0.020*week + 0.014*whereas + 0.013*violence + 0.012*woman + 0.010*disease + 0.010*resolution + 0.010*month + 0.009*furtherresolved
7 | 65 | 8.798 | 0.029*pipeline + 0.027*ronald + 0.022*sea + 0.013*coastal + 0.012*marine + 0.009*rise + 0.008*reagan + 0.008*thomas + 0.007*climate + 0.007*arctic
8 | 34 | 8.183 | 0.030*candidate + 0.011*teen + 0.011*precinct + 0.011*nomination + 0.011*poll + 0.009*freeway + 0.009*say + 0.009*dating + 0.008*teenager + 0.007*sca
9 | 58 | 5.086 | 0.022*autism + 0.014*nursing + 0.014*therapist + 0.013*mr + 0.013*calderon + 0.011*backpack + 0.009*credentialing + 0.008*therapy + 0.008*acupuncture + 0.008*marriage
10 | 44 | 4.899 | 0.021*scientist + 0.017*negrete + 0.016*factfinding + 0.015*hepatitis + 0.015*mcleod + 0.011*maternity + 0.009*interdistrict + 0.009*knuckle + 0.008*liver + 0.005*infected
15 | 94 | 2.103 | 0.023*bicycle + 0.017*bus + 0.013*midwife + 0.011*deployed + 0.010*roadway + 0.009*smog + 0.008*schoolbus + 0.007*safer + 0.005*polluter + 0.005*overtaking
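
In logistic regression, an odds ratio is commonly obtained by exponentiating a feature's coefficient. The deck does not state how its odds ratios were computed, so the sketch below assumes that convention; the helper names are illustrative, and the two ratios are taken from the topic tables in this deck.

```python
import math


def odds_ratio(coefficient):
    """Odds ratio implied by a logistic regression coefficient (assumed convention)."""
    return math.exp(coefficient)


def coefficient_from_odds_ratio(ratio):
    """Inverse mapping: the log-odds coefficient behind a reported odds ratio."""
    return math.log(ratio)


# Odds ratios reported in the deck for topic 70 (strong) and topic 79 (weak).
strong_coef = coefficient_from_odds_ratio(453.981)
weak_coef = coefficient_from_odds_ratio(0.000498)

print(f"topic 70 coefficient: {strong_coef:.3f}")
print(f"topic 79 coefficient: {weak_coef:.3f}")
```

Under this reading, a ratio above 1 corresponds to a positive coefficient (the topic raises the odds of passage) and a ratio below 1 to a negative coefficient, with a ratio of exactly 1 meaning no effect.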
Sample CA Bills Containing “Strong” Topics
Bill Status | Bill Session, ID (Link) | Topic # | All Topics
Passed | 2011-2012-0 AB291: Underground storage tanks: petroleum: charges. | 70 | 11, 45, 48, 70
Passed | 2013-2014-0 AB1286: Personal income tax: voluntary contributions: California Breast Cancer Research Fund | 74 | 11, 45, 74
Passed | 2009-2010-0 AB1969: Elsinore Valley Cemetery | 47 | 1, 11, 47, 48
Passed | 2011-2012-0 AB2488: Vehicles: buses: length limitations | 94 | 1, 11, 48, 49, 73, 94
6. What Topics Have Weak Bill Passage?
Rank | Topic # | Odds Ratio | Topics
-10 | 79 | 0.000498 | 0.057*emission + 0.050*greenhouse + 0.041*gas + 0.019*warming + 0.017*global + 0.016*climate + 0.014*air + 0.013*carbon + 0.013*reduction + 0.013*solution
-9 | 92 | 0.000156 | 0.038*trafficking + 0.012*duress + 0.010*menace + 0.010*fiduciary + 0.010*ammunition + 0.009*chvez + 0.009*human + 0.008*achadjian + 0.008*wilk + 0.006*bigelow
-8 | 76 | 0.000150 | 0.071*inmate + 0.065*parole + 0.026*parolee + 0.023*prison + 0.021*correction + 0.019*rehabilitation + 0.010*released + 0.009*recidivism + 0.008*reentry + 0.007*journalist
-7 | 28 | 0.000113 | 0.036*bag + 0.024*plastic + 0.015*carryout + 0.011*positioning + 0.007*tends + 0.007*electorate + 0.006*store + 0.006*deliberately + 0.006*undetermined + 0.005*el
-6 | 50 | 0.000063 | 0.010*romero + 0.009*antipsychotic + 0.008*medication + 0.006*dementia + 0.006*detachable + 0.005*salvage + 0.004*dietary + 0.004*psychotropic + 0.004*repurchase + 0.003*diminishes
-5 | 24 | 0.000053 | 0.023*baby + 0.006*depression + 0.005*paratransit + 0.005*stewardship + 0.004*producer + 0.004*perinatal + 0.003*unwanted + 0.003*obstetrics + 0.003*sleep + 0.002*calhome
-4 | 8 | 0.000035 | 0.054*gang + 0.013*immunity + 0.010*tort + 0.008*rifle + 0.007*european + 0.007*magazine + 0.007*pervasive + 0.005*deadly + 0.005*mentally + 0.005*disordered
-3 | 81 | 0.000013 | 0.014*interpreter + 0.007*excellence + 0.006*digitized + 0.005*reelected + 0.005*easy + 0.004*fluency + 0.004*biodegradable + 0.002*willfulness + 0.002*annoyance + 0.002*disincentive
-2 | 10 | 0.000004 | 0.022*budget + 0.013*muratsuchi + 0.013*sawyer + 0.013*mullin + 0.012*bloom + 0.012*nazarian + 0.012*daly + 0.012*campos + 0.011*rodriguez + 0.010*dababneh
-1 | 62 | 0.000001 | 0.005*consummated + 0.004*nonconsenting + 0.003*nonsupervisory + 0.001*peculiar + 0.001*culminating + 0.000*overdue + 0.000*reputation + 0.000*unimpeded + 0.000*foster + 0.000*licentious
Sample CA Bills Containing “Weak” Topics
Bill Status | Bill Session, ID (Link) | Topic # | All Topics
Passed | 2011-2012-0 SB1219: Recycling Plastic Bags | 28 | 28, 45, 48, 57
Passed | 2013-2014-0 AB1405: Subversive Organization Registration Law: repeal | 92 | 6, 11, 20, 37, 48, 51, 66
Passed | 2011-2012-0 AB220: Interstate Compact for Juveniles. | 76 | 48, 51, 63, 76
Passed | 2009-2010-0 AB863: Public utilities: municipal districts: civil service exemptions. | 62 | 11, 30, 48, 51, 62
Next Steps for Legislative Predictions
- Add time context for bills in terms of legislative session, chamber, and major political events
- Add features about the bill, sponsors, districts, political context, duration, committees, and public comments
- Include exploratory data analysis from bill and legislator data
- Tune the model to apply predictions to current bills
Relevant Citations
- Gensim: Topic Modeling for Humans, by Radim Řehůřek, open source Python package
- Wallach, H. M. (n.d.). Topic Modeling: Beyond Bag-of-Words. Retrieved from poster link
- Gerrish, S. M., & Blei, D. M. (2011). Predicting legislative roll calls from text. In Proc. of ICML. Retrieved from article link
- Rong, X. (2016, June 5). Word2vec Parameter Learning Explained. arXiv:1411.2738 [cs.CL]
- Unsplash for stock photos
Thank you!
@DataThinker
WhenThereIsData.com
