Accumulo @ Bloomberg
Accumulo Summit 2015
Skand Gupta
Bloomberg LP
Bloomberg
• Bloomberg technology helps drive the world's financial markets
  • We build our own software, digital platforms, mobile applications and state-of-the-art hardware
  • We run one of the world's largest private networks, with over 20,000 routers
  • We have the largest server-side JavaScript deployment in the world: 22 million lines of JavaScript code
  • We developed cloud computing and deployed software as a service well ahead of the general marketplace
  • Our technology has brought transparency to the global financial markets
• Bloomberg technologists
  • More than 3,000 software developers and designers located around the world (London, NYC, SF tech hubs)
  • BloombergLabs.com (@BloombergLabs) is our platform for dialogue between our experts and the broader tech community
• Our clients
  • Over 320,000 subscribers
  • Primarily financial professionals, including investment bankers, CFOs, investor relations, hedge fund managers, foreign exchange, etc.
Source: Wall Street Journal, CFTC, New York Times, Marketplace.org
Importance of Compliance
Source: Wall Street Journal, CFTC, New York Times
Hiding in Plain Sight
Source: Commodity Futures Trading Commission
Compliance Platform and Processing Pipeline
[Diagram: human- and machine-generated data flowing into the compliance platform]
• Sources: Chat, Email, Social Media, Voice, Reference Data, Trade Data, Customer Data, Product Data, Market Data, Counterparty
• Data categories: Communication Data, Transactional Data, User Data
• Components: Surveillance Pipeline, Case Management, Compliance Platform, Compliance Storage
• Compliance Officers: Search, Review, Analyze
Elastic data-processing and analytics stack
[Diagram: stack components]
• Applications
• Open REST API (Play)
• HDFS, Spark, Kafka, Storm
• Mesos (Cluster Resource Manager)
• WORM
• Pre-fabricated Hardware
Need for a robust, scalable, high-performance, geo-distributed data storage and retrieval system
• More than 3 petabytes of archived data
• 80+ billion indexed objects
• Real-time scanning of 35 million objects per day
[Chart: Communication Data Growth; annotation: 100s of gigabytes/year]
[Chart: Cumulative Data Growth; annotation: over 3 petabytes today]
[Chart: Storage Cost, "Storing 1GB of Data"; y-axis $0.00 to $3.00; labels: List Price, Replication, DR, Isolation; values shown: $2.31, $1.15, $0.58, $0.19]
[Time axis shown: 2000 to 2012]
Need for Low Level Security Primitives
[Diagram: security levels across Data Pipe, Data Store, and Application]
• Document Level Security (illustrated with a lorem ipsum placeholder document)
• Company Level Security
• User Level Security
Security Solutions
• Post-process the queries
  • Too slow
  • Nasty bugs
• Generate unique document for each view
  • Exponential growth in number of documents
• Use application-specific features
  • Solr dynamic fields, mangled fields
• Accumulo Visibility
  • Fast, Clean, Generic
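To make the visibility idea concrete, here is a minimal sketch of how a visibility label attached to each entry can be checked against a reader's authorizations. It is a toy model in plain Python, not Accumulo's API: the function names (`is_visible`, `scan`), the label strings, and the authorization tokens are all assumptions made for illustration. Operators are evaluated left to right, so mixed `&` and `|` should be parenthesized.

```python
import re

def is_visible(expression, authorizations):
    """Evaluate a simplified visibility expression against authorization tokens.

    Supports tokens, '&', '|', and parentheses. An empty expression is
    treated as visible to everyone.
    """
    tokens = re.findall(r"[A-Za-z0-9_]+|[&|()]", expression)
    pos = 0

    def parse_term():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            result = parse_expr()
            pos += 1  # skip the closing ")"
            return result
        return token in authorizations

    def parse_expr():
        nonlocal pos
        result = parse_term()
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            op = tokens[pos]
            pos += 1
            rhs = parse_term()
            result = (result and rhs) if op == "&" else (result or rhs)
        return result

    return parse_expr() if tokens else True

# Hypothetical entries: (row ID, visibility label, value).
RECORDS = [
    ("CompanyA_userX_20150426", "CompanyA&(userX|compliance)", b"chat 1"),
    ("CompanyB_userX_20150428", "CompanyB&(userX|compliance)", b"chat 2"),
]

def scan(records, authorizations):
    """Return only the row IDs whose label the reader is allowed to see."""
    return [row for row, label, _ in records if is_visible(label, authorizations)]
```

With this model, a CompanyA compliance officer holding `{"CompanyA", "compliance"}` sees only the CompanyA entry. The check happens in the data store rather than in each application, which is the point the slide makes with "Generic".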
Data Model
Row ID Value
CompanyA_userX_20150426 <bytes>
CompanyA_userX_20150426 <bytes>
CompanyA_userX_20150427 <bytes>
CompanyA_userX_20150428 <bytes>
CompanyA_userY_20150427 <bytes>
CompanyB_userX_20150428 <bytes>
CompanyB_userX_20150428 <bytes>
CompanyB_userX_20150428 <bytes>
Find all Communications for a Set of Users for a Date Range
[Diagram: the Application reads the Data Model table (Row ID, Value) through a Batch Scanner]
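Because the row ID is laid out as company, user, then date, each user's date range maps to one contiguous slice of the sorted table. The sketch below models this with an in-memory sorted list. It is an illustration, not Accumulo client code, and the helper names (`row_range`, `batch_scan`) and message values are assumptions.

```python
from bisect import bisect_left, bisect_right

# Sample rows in sorted order, mirroring the Data Model table.
ROWS = [
    ("CompanyA_userX_20150426", b"msg-1"),
    ("CompanyA_userX_20150426", b"msg-2"),
    ("CompanyA_userX_20150427", b"msg-3"),
    ("CompanyA_userX_20150428", b"msg-4"),
    ("CompanyA_userY_20150427", b"msg-5"),
    ("CompanyB_userX_20150428", b"msg-6"),
    ("CompanyB_userX_20150428", b"msg-7"),
    ("CompanyB_userX_20150428", b"msg-8"),
]

def row_range(company, user, start_date, end_date):
    """Build an inclusive (start, end) row ID range for one user."""
    prefix = f"{company}_{user}_"
    return (prefix + start_date, prefix + end_date)

def batch_scan(sorted_rows, ranges):
    """Return every entry whose row ID falls inside any of the ranges."""
    keys = [row_id for row_id, _ in sorted_rows]
    hits = []
    for start, end in ranges:
        lo = bisect_left(keys, start)   # first row >= start
        hi = bisect_right(keys, end)    # one past the last row <= end
        hits.extend(sorted_rows[lo:hi])
    return hits

RANGES = [row_range("CompanyA", user, "20150426", "20150427")
          for user in ("userX", "userY")]
HITS = batch_scan(ROWS, RANGES)
```

One range per user keeps every lookup a slice of the sorted data rather than a full-table pass. A real batch scanner would fetch the ranges in parallel and may return them in any order.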
Find all Records with Libor
[Diagram: the Application reads the Data Model table through a Batch Scanner, with a Filter applied to the rows]
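A filter of this kind can be pictured as a predicate applied to each entry before it is returned to the application, so that non-matching entries never leave the storage layer. The following is a small sketch under that assumption. The names `filter_iterator` and `mentions`, and the sample messages, are hypothetical.

```python
def filter_iterator(entries, predicate):
    """Yield only the entries accepted by the predicate."""
    for row_id, value in entries:
        if predicate(row_id, value):
            yield row_id, value

def mentions(term):
    """Build a case-insensitive predicate that looks for a term in the value bytes."""
    needle = term.lower().encode("utf-8")
    return lambda row_id, value: needle in value.lower()

# Hypothetical sample entries.
ENTRIES = [
    ("CompanyA_userX_20150426", b"LIBOR fixing looks low today"),
    ("CompanyA_userX_20150427", b"lunch at noon?"),
    ("CompanyB_userX_20150428", b"can you move the libor submission"),
]

MATCHES = list(filter_iterator(ENTRIES, mentions("Libor")))
```

Keeping the predicate separate from the iteration means the same scan path can serve other search terms without changes.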
Count Number of Objects that Match a Filter
[Diagram: Application, Batch Scanner, Filter, and Counting Iterator over the Data Model table]
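The benefit of counting next to the data is that a single number travels back to the application instead of every matching entry. The sketch below illustrates that shape. It is not an Accumulo iterator implementation, and `counting_iterator`, `has_libor`, and the sample entries are assumptions.

```python
def counting_iterator(entries, predicate):
    """Collapse a stream of entries into one ("count", n) result."""
    count = 0
    for row_id, value in entries:
        if predicate(row_id, value):
            count += 1
    # A single entry is emitted regardless of how many rows matched.
    yield ("count", count)

def has_libor(row_id, value):
    """Hypothetical filter: does the value mention 'libor'?"""
    return b"libor" in value.lower()

# Hypothetical sample entries.
ENTRIES = [
    ("CompanyA_userX_20150426", b"LIBOR fixing looks low today"),
    ("CompanyA_userX_20150427", b"lunch at noon?"),
    ("CompanyB_userX_20150428", b"can you move the libor submission"),
]

RESULT = list(counting_iterator(ENTRIES, has_libor))
```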
Scaling Out
[Diagram: three parallel stacks of Batch Scanner, Filter, and Counting Iterator over the Data Model table, feeding Spark processing and the Application]
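Scaling out the count amounts to running the same filter-and-count step on several partitions at once and summing the partial results. The sketch below uses a thread pool as a stand-in for the parallel workers. It does not use Spark, and the partitioning, `count_partition`, and `parallel_count` are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def count_partition(partition, predicate):
    """Count matching entries in one partition (one worker's share)."""
    return sum(1 for row_id, value in partition if predicate(row_id, value))

def parallel_count(partitions, predicate, workers=3):
    """Count matches in all partitions concurrently and add up the partial counts."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = pool.map(lambda part: count_partition(part, predicate), partitions)
        return sum(partial_counts)

def has_libor(row_id, value):
    """Hypothetical filter: does the value mention 'libor'?"""
    return b"libor" in value.lower()

# Hypothetical partitions, split here by row ID prefix.
PARTITIONS = [
    [("CompanyA_userX_20150426", b"LIBOR fixing"), ("CompanyA_userX_20150427", b"lunch?")],
    [("CompanyA_userY_20150427", b"libor again")],
    [("CompanyB_userX_20150428", b"nothing here"), ("CompanyB_userX_20150428", b"Libor desk")],
]

TOTAL = parallel_count(PARTITIONS, has_libor)
```

Only the partial counts are combined at the end, so the amount of data moved to the application stays constant as the number of partitions grows.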
Low Latency Writes using Accumulo File System
RowID Family Qualifier Value
attach.pdf chunk 00001 <bytes>
attach.pdf chunk 00002 <bytes>
   
attach.pdf metadata file_size <file size>
attach.pdf metadata chunk_size <chunk size>
attach.pdf metadata sha256 <checksum>
[Chart: write times (ms), axis 0 to 20, comparing HDFS and Accumulo File System]
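The table layout above suggests a file stored as numbered chunks plus a few metadata entries. Here is a minimal sketch of that layout, assuming five-digit chunk qualifiers and a SHA-256 checksum over the whole file. The function names and the in-memory list standing in for the table are hypothetical, not the presenter's implementation.

```python
import hashlib

def file_to_entries(name, data, chunk_size):
    """Split file bytes into (row ID, family, qualifier, value) entries."""
    entries = []
    offsets = range(0, len(data), chunk_size)
    for index, offset in enumerate(offsets, start=1):
        chunk = data[offset:offset + chunk_size]
        # Zero-padded qualifiers keep chunks in order under lexicographic sort.
        entries.append((name, "chunk", f"{index:05d}", chunk))
    entries.append((name, "metadata", "file_size", str(len(data)).encode()))
    entries.append((name, "metadata", "chunk_size", str(chunk_size).encode()))
    entries.append((name, "metadata", "sha256", hashlib.sha256(data).hexdigest().encode()))
    return entries

def entries_to_file(entries):
    """Reassemble the file from its entries and verify the stored checksum."""
    chunks = sorted((qual, value) for _, fam, qual, value in entries if fam == "chunk")
    data = b"".join(value for _, value in chunks)
    meta = {qual: value.decode() for _, fam, qual, value in entries if fam == "metadata"}
    if hashlib.sha256(data).hexdigest() != meta["sha256"]:
        raise ValueError("checksum mismatch")
    return data

PAYLOAD = b"x" * 25
ENTRIES = file_to_entries("attach.pdf", PAYLOAD, chunk_size=10)
```

Small fixed-size chunks let a write complete as a handful of ordinary table mutations, and the checksum lets a reader detect a missing or corrupted chunk on reassembly.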
Conclusion
• Understand the data
• Free your data but enforce access control
• Need sensible systems that help achieve these goals
Thank You!
http://careers.bloomberg.com	
 
sgupta178@bloomberg.net
We Are Hiring!
