HOW FINRA KEEPS THE MARKETS FAIR WITH SPARK ML
ELENA BOIARSKAIA
LEAD DATA SCIENTIST AT FINRA
JANUARY 30, 2018
OVERVIEW
• FINRA and its mission
• Types of problems
• FINRA’s data
• Dynamic Surveillance Platform
• Spark and Spark ML
• Problem set up
• Market Manipulation examples
• ML solutions
• Lessons learned
• Resources
FINRA Technology
FINRA – WHO WE ARE
• Financial Industry Regulatory Authority
• Independent, non-governmental regulator
• Oversees broker-dealers doing business with the public in the United States
Mission
• Promote investor protection and market integrity
• Deter misconduct by enforcing rules
• Detect and prevent wrongdoing in the U.S. markets
• Discipline those who break rules
LIFECYCLE OF A REGISTERED REPRESENTATIVE
• Get hired by a member firm
• Pass a FINRA exam to obtain an active registration
• Maintain the registration with continuing education
• Adhere to SEC and FINRA rules
• Participate in market activity based on registration
[Diagram: Market Regulation, Member Regulation, Registration and Disclosure]
DATA SCIENCE AT FINRA
Some examples:
• Identify potential cheating behavior on FINRA exams
• Predict and assess broker risk
• Recognize possible market manipulation
• Discover anomalous behavior in the market
FINRA’S MARKET DATA
• Monitor up to 75 billion market events per day
• Rebuild a picture of the U.S. markets
    • NASDAQ, NYSE, BATS, IEX…
• Cloud migration frees the team to focus on analysis
• Make sense of the data with machine learning
DYNAMIC SURVEILLANCE PLATFORM
• Keep up with
    • Growth and volatility of market volume
    • Dynamic evolution of exchanges
    • Market manipulator innovation
• Be able to
    • Reconstruct the market from trillions of events
    • Analyze the data for manipulative patterns
    • Find the needle in a haystack
MODEL DEVELOPMENT THE TRADITIONAL WAY
• Data stored in a relational database or Hive
• Get a sample of the data via SQL query
• Create a model prototype in R or Python
• Data engineers translate the model into a SQL pattern
[Diagram: an iterative research/prototype phase (engineer data and design a prototype from data sources and algorithms using SQL and Python/R; business review; prototype whitepaper and example code), followed by an iterative pattern development phase (SQL implementation; business review)]
DYNAMIC SURVEILLANCE PLATFORM
• Execution Engine
    • Apache Spark with Databricks
• Languages
    • Spark supports Scala, Python, R, Java and SQL
• Feature Framework
    • All data manipulated as a Spark DataFrame
    • Constructs DataFrames of desired features
• Machine Learning Framework
    • Machine Learning libraries from H2O and Spark ML
[Diagram: Data Source and DM; Spark (Scala, Python, R, Java, SQL); Feature Framework; ML Framework (ML, H2O); multiple Models; Surveillance Platform]
SPARK
• Open source
• Distributed computing engine
• Infinite parallelism
• Developed at UC Berkeley's AMPLab
• Response to MapReduce
SPARK ARCHITECTURE
• RDD
• Partition
• Transformation
• Lazy evaluation
• Action
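The terms on this slide can be illustrated without a cluster. What follows is a minimal plain-Python sketch, not Spark's API: the hypothetical `LazyDataset` class stands in for an RDD, holds its data in partitions, records transformations (`map`, `filter`) without running them, and executes the recorded plan only when an action (`collect`, `count`) is called.

```python
from typing import Callable, List


class LazyDataset:
    """Toy stand-in for an RDD (hypothetical class, not part of Spark)."""

    def __init__(self, partitions: List[list], plan: tuple = ()):
        self.partitions = partitions  # data split into independent chunks
        self.plan = plan              # recorded transformations, not yet run

    # Transformations: return a new dataset with a longer plan; no work is done.
    def map(self, fn: Callable) -> "LazyDataset":
        return LazyDataset(self.partitions, self.plan + (("map", fn),))

    def filter(self, pred: Callable) -> "LazyDataset":
        return LazyDataset(self.partitions, self.plan + (("filter", pred),))

    def _run_partition(self, part: list) -> list:
        # Apply every recorded step to one partition, in order.
        for kind, fn in self.plan:
            if kind == "map":
                part = [fn(x) for x in part]
            else:
                part = [x for x in part if fn(x)]
        return part

    # Actions: trigger execution of the whole plan on every partition.
    def collect(self) -> list:
        out = []
        for part in self.partitions:
            out.extend(self._run_partition(part))
        return out

    def count(self) -> int:
        return len(self.collect())


calls = []  # records when the mapped function actually runs


def double(x):
    calls.append(x)
    return 2 * x


ds = LazyDataset([[1, 2, 3], [4, 5, 6]])       # two partitions
big = ds.map(double).filter(lambda x: x > 6)   # transformations only
ran_before_action = len(calls)                 # nothing has executed yet
result = big.collect()                         # the action runs the plan
```

In this toy version the partitions are processed one after another; in a distributed engine each partition could be handled by a different worker, which is where the parallelism comes from.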
SPARK ML PIPELINE
[Diagram of pipeline components]
• Load DataFrame
• Transformer: OneHotEncoder, Scaler, Bucketizer, …
• Estimator: Decision Tree, Logistic Regression, Support Vector Machine
• Evaluator: Binary, Multiclass
• CrossValidator
• Results DataFrame
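The Transformer, Estimator, and Evaluator roles named on this slide can be sketched in plain Python. This is an illustration of the pattern only, not Spark ML code: every class and function name below is hypothetical, rows are plain dictionaries rather than DataFrames, and cross-validation is left out. The point it shows is that fitting an estimator on training data yields a fitted step that can then transform new data.

```python
class MinMaxScalerSketch:
    """Estimator-style step: fit() learns min/max and returns a fitted transformer."""

    def fit(self, rows):
        values = [r["x"] for r in rows]
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0  # avoid division by zero on constant input
        return lambda rs: [dict(r, x_scaled=(r["x"] - lo) / span) for r in rs]


class ThresholdClassifierSketch:
    """Estimator: picks the cut on x_scaled with the most correct training rows."""

    def fit(self, rows):
        candidates = sorted({r["x_scaled"] for r in rows})

        def correct(cut):
            return sum((r["x_scaled"] >= cut) == bool(r["label"]) for r in rows)

        best = max(candidates, key=correct)
        return lambda rs: [dict(r, prediction=int(r["x_scaled"] >= best)) for r in rs]


def fit_pipeline(stages, rows):
    """Fit each stage in order, feeding it the output of the stages before it."""
    fitted = []
    for stage in stages:
        model = stage.fit(rows)
        rows = model(rows)
        fitted.append(model)

    def pipeline_model(new_rows):
        for model in fitted:
            new_rows = model(new_rows)
        return new_rows

    return pipeline_model


def accuracy_evaluator(rows):
    """Evaluator: fraction of rows whose prediction matches the label."""
    return sum(r["prediction"] == r["label"] for r in rows) / len(rows)


# Made-up training and test rows for illustration.
train = [{"x": 1.0, "label": 0}, {"x": 2.0, "label": 0},
         {"x": 8.0, "label": 1}, {"x": 9.0, "label": 1}]
model = fit_pipeline([MinMaxScalerSketch(), ThresholdClassifierSketch()], train)
scored = model([{"x": 1.5, "label": 0}, {"x": 8.5, "label": 1}])
score = accuracy_evaluator(scored)
```

In Spark ML itself these roles are filled by library classes operating on DataFrames; the sketch only mirrors the fit-then-transform contract.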
MODEL DEVELOPMENT AT SCALE
Data Engineering
• Combine data sources to form features from raw data
• Create reusable feature functions
• 80% of effort to create a trained model
Model Selection
• Try multiple ML algorithms
• Select best model based on performance
• 20% of effort to create a trained model
Scoring
• Score new data with trained model
• Evaluate model results
• Provide human feedback
[Diagram: Data Sources and Algorithms feed an iterative loop of Data Engineering (producing Featurized Data) and Model Selection, which yields a Trained Model; Scoring applies the Trained Model to New Data to produce a Prediction]
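One way to read "reusable feature functions" is as small functions that each turn a group of raw events into a single value, registered once and applied to any grouping. The sketch below assumes a made-up event layout (`account`, `type`, `qty`) and made-up feature names; none of these come from the talk, and it uses plain Python lists rather than Spark DataFrames.

```python
from collections import defaultdict


# Each feature function maps a list of raw events to one number.
def order_count(events):
    return len(events)


def cancel_ratio(events):
    cancels = sum(1 for e in events if e["type"] == "cancel")
    return cancels / len(events)


def total_quantity(events):
    return sum(e["qty"] for e in events)


# Registry of features; adding a feature means adding one entry here.
FEATURES = {
    "order_count": order_count,
    "cancel_ratio": cancel_ratio,
    "total_quantity": total_quantity,
}


def featurize(raw_events, key="account"):
    """Group raw events by `key` and apply every registered feature function."""
    grouped = defaultdict(list)
    for event in raw_events:
        grouped[event[key]].append(event)
    return {
        group: {name: fn(events) for name, fn in FEATURES.items()}
        for group, events in grouped.items()
    }


# Hypothetical raw events, for illustration only.
raw = [
    {"account": "A", "type": "new", "qty": 100},
    {"account": "A", "type": "cancel", "qty": 100},
    {"account": "A", "type": "cancel", "qty": 200},
    {"account": "A", "type": "new", "qty": 50},
    {"account": "B", "type": "new", "qty": 10},
]
features = featurize(raw)
```

Because the functions know nothing about the grouping, the same registry could be reused with a different `key` or a different time window.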
MARKET MANIPULATION
Historical examples:
• Information based
    • Pump and Dump
        • South Sea Bubble, London 1720
    • Insider trading
• Order based
    • Marking
    • Wash trade
    • Front Running
    • Layering
    • Spoofing
SUPERVISED ML SET UP QUESTIONS
• What are the outcome labels?
• What does each row represent?
• What features help describe behavior of interest?
• Are sampling techniques needed for unbalanced labels?
• What data to train and test on?
• Which algorithms to try?
• What performance metrics to use?
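For the question about unbalanced labels, one common sampling technique is to downsample the majority class. The sketch below is a generic illustration, not the approach described in the talk: the function name, the `ratio` parameter, and the example rows are all assumptions.

```python
import random


def downsample_majority(rows, label_key="label", ratio=1.0, seed=0):
    """Keep every minority-class row and a random subset of the majority class.

    `ratio` is the number of majority rows kept per minority row.
    """
    pos = [r for r in rows if r[label_key] == 1]
    neg = [r for r in rows if r[label_key] == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)

    keep = min(len(majority), int(len(minority) * ratio))
    rng = random.Random(seed)          # fixed seed for a reproducible sample
    sampled = rng.sample(majority, keep)

    out = minority + sampled
    rng.shuffle(out)                   # avoid all minority rows coming first
    return out


# Hypothetical data: 5 positive rows out of 1000.
rows = [{"id": i, "label": 1 if i < 5 else 0} for i in range(1000)]
balanced = downsample_majority(rows, ratio=2.0)
```

Resampling changes the class proportions the model sees, so performance metrics are usually still measured on data that keeps the original label distribution.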
MODEL TRAINING AND SCORING
[Diagram, two stages]
• Data preparation and model training: Raw Data → Featurized Data (x_i) with Labels (y) → ML Algorithm → f(x_i) = y
• Scoring new data with trained model: New Featurized Data (x_i) → Trained ML Algorithm Model → Prediction f(x_i)
FINDING MARKET MANIPULATION
• Machine learning model is trained to identify manipulative behavior
    • Based on historic cases and business knowledge
• Data needs to be engineered for training a machine learning model
    • Create a Training Dataset for model selection
    • Add features that help describe manipulative behavior in the market
• Select best model for production
• Review model output with business
    • New features and labels?
    • Incorporate business feedback into next models
SELECTING AN APPROPRIATE MODEL
• Identify performance metrics
    • Reduce false negatives; more tolerant of false positives
• Try multiple candidate algorithms
    • Spark ML and H2O
• Train candidate models with differing parameters
    • Examples: features used, tree depth, size of forest, amount of history to train with
• Compare performance
• Select best model
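The comparison step can be made concrete with a small sketch. It assumes that the metric being prioritized is recall (the share of true cases caught, which falls as false negatives rise), with precision as a tie-breaker; that reading, the candidate names, and the labels and predictions are all illustrative assumptions rather than details from the talk.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    return tp, fp, fn, tn


def recall(y_true, y_pred):
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(y_true, y_pred):
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0


def select_best(candidates, y_true):
    """Pick the candidate with the highest recall, breaking ties on precision."""
    return max(
        candidates,
        key=lambda name: (recall(y_true, candidates[name]),
                          precision(y_true, candidates[name])),
    )


# Hypothetical held-out labels and predictions from three candidate models.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
candidates = {
    "shallow_tree": [1, 0, 0, 0, 0, 0, 0, 0],
    "deep_tree":    [1, 1, 1, 1, 1, 0, 0, 0],
    "forest":       [1, 1, 0, 1, 0, 0, 0, 0],
}
best = select_best(candidates, y_true)
```

Here `deep_tree` catches every true case at the cost of two false positives, so it wins under a recall-first rule; a different priority would change the `key` function, not the rest of the code.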
MODEL FEEDBACK
• Feedback loop will provide “real truth”
    • Keep a library of these labeled cases
    • Use these cases for testing new models
    • Benchmark model performance on these cases
    • Monitor changes across models
• Select feedback cases appropriately
    • Low model certainty
    • On the cusp
    • Representative sample for metric estimation
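A minimal sketch of how feedback cases might be chosen along these lines: take the cases whose score sits closest to the decision boundary (low certainty, on the cusp), and separately draw a uniform random sample for estimating metrics. It assumes the model emits a probability per case and that 0.5 is the boundary; the function name, parameters, and scores are hypothetical.

```python
import random


def select_feedback_cases(scored, n_uncertain, n_random, boundary=0.5, seed=0):
    """Choose cases to send for human review.

    `scored` is a list of (case_id, probability) pairs.
    Returns (uncertain, representative):
      uncertain      - the cases closest to the decision boundary
      representative - a uniform random sample of all cases
    """
    by_uncertainty = sorted(scored, key=lambda c: abs(c[1] - boundary))
    uncertain = by_uncertainty[:n_uncertain]

    # Sample from ALL cases so that metric estimates are not skewed
    # toward or away from the uncertain ones.
    rng = random.Random(seed)
    representative = rng.sample(scored, min(n_random, len(scored)))
    return uncertain, representative


# Hypothetical model scores.
scored = [("a", 0.02), ("b", 0.48), ("c", 0.55),
          ("d", 0.97), ("e", 0.30), ("f", 0.90)]
uncertain, representative = select_feedback_cases(scored, n_uncertain=2, n_random=2)
```

The two lists serve different purposes: the uncertain cases are where a label is most informative for the next model, while only the random sample gives an unbiased estimate of performance.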
LESSONS LEARNED
• Relationship with business is key
• The simplest model may work best with good features
• Feedback loop specifics are not straightforward
• Gauging model performance for big data is not easy
RESOURCES
• http://www.finra.org
• https://databricks.com/try-databricks
• https://spark.apache.org/docs/2.2.0/ml-guide.html
• http://docs.h2o.ai/h2o/latest-stable/index.html
THANK YOU!
QUESTIONS?