際際滷

際際滷Share a Scribd company logo
+
Hadoop Security Landscape
Sujee Maniyam
Founder / Principal
http://elephantscale.com/
sujee@ElephantScale.com
+
Approach to Security in Hadoop
Until Recently
+
But Security Picture Has Improved
Rapidly
n Lot of work going on in the eco system
n Hadoop vendors (Cloudera / HortonWorks ..) have been
very actively working on security features
n the core features are in
n Ease of use improving as well
+
What Does It Mean to be Secure?
n 1) Control who can get in?
n 2) Verify the persons identity
n 3) safeguard communications with user
n 4) What is allowed for this user
n 5) Audit / log access
n 6) Secure NOSQL
n 7) And finally
Protect data at rest
+
1) Who can get in
n Control which machines can connect to NoSQL cluster
n Dont expose the cluster to public
n Too many open ports
n Too vulnerable
n Solutions:
n Run cluster behind firewall
n Restrict which machines
can connect to cluster
n Linux / Network level security
n Outside the actual NoSQL
+
Trusted Environment
+
Apache Knox Gateway
+
2) User Authentication
n How can we verify the user?
n Username / password (gmail)
n Or use a third person (referee)
n Kerberos
Source : http://1.bp.blogspot.com/
Wolf : Knock
Knock
Wolf : Its me,
little piggy
Pig :who is it?
+
Kerberos : Quick Primer
n Kerberos is a authentication protocol for networked
machines
n Validates client to server and vice-versa
n Strong crypto algorithms (AES, 3DES)
+
Kerberos Protocol for Getting a
Beer in a Carnival / Fair J_
+
Kerberos Protocol Explained :
Getting Beer @ Fair / Party
n Prove your age (identity) to wrist-band issuer
n Ticket Granting Ticket
n Get a wristband  qualifies you to get beer
n Service Ticket
n Go to bartender and ask for beer using your wrist-band
n Service Request
n Get Beer ! J
n For technically correct explanation see :
http://www.roguelynn.com/words/explain-like-im-5-
kerberos/
+
3) Secure Client Communication
n Guard client / server communication (on the wire)
n Done by using SASL (certificates)
n Prevents snooping by third parties
+
4) What Is Allowed For This User?
n In unsecured environment users can read / write to any table
n  not very secure!
n Control which data users can see..
+
5) Audit logging
n See what is going on
USER : tim, resource = hdfs:/data/logs , type = read,
time=.
USER : tim, resource = hive:click_logs , type = read, query =
select *.
+
6) Secure NOSQL
n NoSQL solutions :
n On Hadoop : Hbase, Accumulo
n Other : Cassandra, 
n Access control
n Table level access : can I read / can I write-update-insert ?
n Within a table, column level access
n Who can read column social_security_number ?
+
Accumulo : Quick Intro
n Developed by the National Security Agency (NSA) !
n Google Big Table implementation
n Nosql store on top of HDFS
n Security is a first grade concept
HDFS
Accumulo
+
Accumulo Data Model
Family : info
Columns  name email Last 4 ssn Ssn Gmail
password
Visibility
tokens 
Level 1 Level 1 Level 1 Level 2
OR
Top
clearance
Top
clearance
≒ Every thing in HBase data model
≒ Plus each row has a Visibility Token
+
Users Are Assigned Visibility
Tokens
User id Visibility levels
User 1 Level 1
User 2 Level 1 + Level 2
Edward Snowden Level 1 + Level 2 + Top
Clearance
+
Accumulo only returns cells visible
to user
family
Columns  name email Last 4 SSN Full SSN Gmail
password
person1 Joe joe@gma
il.com
6789 123-45-67
89
JoeSuper
Man!
Visibility
tokens 
Level 1 Level 1 Level 1 Level 2
OR
Top
clearance
Top
clearance
+
What Users Can See
User Visibility Privilage Visible Cells
User 1 Level 1 Name
Email
Last 4 ssn
User 2 Level 1 +
Level 2
Name
Email
Last 4 SSN
Full SSN
Edward Snowden Level 1 +
Level 2 +
Top Clearance
Name
Email
Last 4 SSN
Full SSN
Gmail Password
+
6) Final Step : Encrypt Data At Rest
n Eventually data ends up in disk
n We need to protect the raw data on disk
n To prevent
n Users going to disk directly
n Theft of hardware
+
Transparent Encryption
+
OK, so where are we
Project /
Solution
Purpose Status Vendor
kerberos Identity
management
Available neutral
Knox Secure gateway Hortonworks
CLoudera ?
Sentry Access control incubating Cloudera
Ranger
(similar to
Sentry)
Access control
+ Audit
In development
(HDP 2.2)
(originally XA
secure)
Hortonworks
Rhino Secure HDFS
data at rest
Available from
Hadoop 2.6
Neutral
(originally from
Intel)
Accumulo Secure nosql Available neutral
+
Future.
n Really need a unified standard (no fragmentation)
n Ease of use
n Easy to setup policies
n Integrate with outside systems
n Easy audit tools
+
Thanks! & Questions?
Sujee Maniyam
Founder / Principal
http://elephantscale.com/
sujee@ElephantScale.com

More Related Content

Hadoop security landscape

  • 1. + Hadoop Security Landscape Sujee Maniyam Founder / Principal http://elephantscale.com/ sujee@ElephantScale.com
  • 2. + Approach to Security in Hadoop Until Recently
  • 3. + But Security Picture Has Improved Rapidly n Lot of work going on in the eco system n Hadoop vendors (Cloudera / HortonWorks ..) have been very actively working on security features n the core features are in n Ease of use improving as well
  • 4. + What Does It Mean to be Secure? n 1) Control who can get in? n 2) Verify the persons identity n 3) safeguard communications with user n 4) What is allowed for this user n 5) Audit / log access n 6) Secure NOSQL n 7) And finally Protect data at rest
  • 5. + 1) Who can get in n Control which machines can connect to NoSQL cluster n Dont expose the cluster to public n Too many open ports n Too vulnerable n Solutions: n Run cluster behind firewall n Restrict which machines can connect to cluster n Linux / Network level security n Outside the actual NoSQL
  • 8. + 2) User Authentication n How can we verify the user? n Username / password (gmail) n Or use a third person (referee) n Kerberos Source : http://1.bp.blogspot.com/ Wolf : Knock Knock Wolf : Its me, little piggy Pig :who is it?
  • 9. + Kerberos : Quick Primer n Kerberos is a authentication protocol for networked machines n Validates client to server and vice-versa n Strong crypto algorithms (AES, 3DES)
  • 10. + Kerberos Protocol for Getting a Beer in a Carnival / Fair J_
  • 11. + Kerberos Protocol Explained : Getting Beer @ Fair / Party n Prove your age (identity) to wrist-band issuer n Ticket Granting Ticket n Get a wristband qualifies you to get beer n Service Ticket n Go to bartender and ask for beer using your wrist-band n Service Request n Get Beer ! J n For technically correct explanation see : http://www.roguelynn.com/words/explain-like-im-5- kerberos/
  • 12. + 3) Secure Client Communication n Guard client / server communication (on the wire) n Done by using SASL (certificates) n Prevents snooping by third parties
  • 13. + 4) What Is Allowed For This User? n In unsecured environment users can read / write to any table n not very secure! n Control which data users can see..
  • 14. + 5) Audit logging n See what is going on USER : tim, resource = hdfs:/data/logs , type = read, time=. USER : tim, resource = hive:click_logs , type = read, query = select *.
  • 15. + 6) Secure NOSQL n NoSQL solutions : n On Hadoop : Hbase, Accumulo n Other : Cassandra, n Access control n Table level access : can I read / can I write-update-insert ? n Within a table, column level access n Who can read column social_security_number ?
  • 16. + Accumulo : Quick Intro n Developed by the National Security Agency (NSA) ! n Google Big Table implementation n Nosql store on top of HDFS n Security is a first grade concept HDFS Accumulo
  • 17. + Accumulo Data Model Family : info Columns name email Last 4 ssn Ssn Gmail password Visibility tokens Level 1 Level 1 Level 1 Level 2 OR Top clearance Top clearance ≒ Every thing in HBase data model ≒ Plus each row has a Visibility Token
  • 18. + Users Are Assigned Visibility Tokens User id Visibility levels User 1 Level 1 User 2 Level 1 + Level 2 Edward Snowden Level 1 + Level 2 + Top Clearance
  • 19. + Accumulo only returns cells visible to user family Columns name email Last 4 SSN Full SSN Gmail password person1 Joe joe@gma il.com 6789 123-45-67 89 JoeSuper Man! Visibility tokens Level 1 Level 1 Level 1 Level 2 OR Top clearance Top clearance
  • 20. + What Users Can See User Visibility Privilage Visible Cells User 1 Level 1 Name Email Last 4 ssn User 2 Level 1 + Level 2 Name Email Last 4 SSN Full SSN Edward Snowden Level 1 + Level 2 + Top Clearance Name Email Last 4 SSN Full SSN Gmail Password
  • 21. + 6) Final Step : Encrypt Data At Rest n Eventually data ends up in disk n We need to protect the raw data on disk n To prevent n Users going to disk directly n Theft of hardware
  • 23. + OK, so where are we Project / Solution Purpose Status Vendor kerberos Identity management Available neutral Knox Secure gateway Hortonworks CLoudera ? Sentry Access control incubating Cloudera Ranger (similar to Sentry) Access control + Audit In development (HDP 2.2) (originally XA secure) Hortonworks Rhino Secure HDFS data at rest Available from Hadoop 2.6 Neutral (originally from Intel) Accumulo Secure nosql Available neutral
  • 24. + Future. n Really need a unified standard (no fragmentation) n Ease of use n Easy to setup policies n Integrate with outside systems n Easy audit tools
  • 25. + Thanks! & Questions? Sujee Maniyam Founder / Principal http://elephantscale.com/ sujee@ElephantScale.com