The document discusses the improving security landscape in Hadoop. It covers several aspects of security: controlling access, user authentication using Kerberos, securing communications, authorization for data access, auditing, securing NoSQL databases such as HBase and Accumulo, and encrypting data at rest. Many Hadoop vendors and projects are working on security features; the core functionality is in place, and the focus is shifting to ease of use. Looking ahead, the main needs are a unified security standard and easier administration.
3. +
But Security Picture Has Improved Rapidly
- A lot of work is going on in the ecosystem
- Hadoop vendors (Cloudera, Hortonworks, ...) have been very actively working on security features
- The core features are in
- Ease of use is improving as well
4. +
What Does It Mean to Be Secure?
- 1) Control who can get in
- 2) Verify the person's identity
- 3) Safeguard communications with the user
- 4) Decide what is allowed for this user
- 5) Audit / log access
- 6) Secure NoSQL
- 7) And finally, protect data at rest
5. +
1) Who Can Get In
- Control which machines can connect to the NoSQL cluster
- Don't expose the cluster to the public
  - Too many open ports
  - Too vulnerable
- Solutions:
  - Run the cluster behind a firewall
  - Restrict which machines can connect to the cluster
- Linux / network-level security
  - Outside the actual NoSQL
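
The "restrict which machines can connect" idea can be sketched as an allowlist check. This is only an illustration in Python: in practice the rule would live in a firewall or the operating system, and the networks and the function name `may_connect` below are made-up assumptions, not part of any Hadoop or NoSQL API.

```python
import ipaddress

# Hypothetical allowlist: only these networks may reach the cluster.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/24"),      # assumed application subnet
    ipaddress.ip_network("192.168.1.10/32"),  # assumed admin workstation
]

def may_connect(client_ip: str) -> bool:
    """Return True if client_ip falls inside any allowed network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in ALLOWED_NETWORKS)
```

Anything not explicitly listed is refused, which mirrors the "don't expose the cluster to the public" default.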
8. +
2) User Authentication
- How can we verify the user?
  - Username / password (Gmail)
  - Or use a trusted third party (referee)
    - Kerberos
Source : http://1.bp.blogspot.com/
Wolf: Knock, knock
Pig: Who is it?
Wolf: It's me, little piggy
9. +
Kerberos: Quick Primer
- Kerberos is an authentication protocol for networked machines
- Authenticates the client to the server and vice versa
- Strong crypto algorithms (AES, 3DES)
11. +
Kerberos Protocol Explained: Getting Beer @ Fair / Party
- Prove your age (identity) to the wristband issuer
  - Ticket Granting Ticket
- Get a wristband that qualifies you to get beer
  - Service Ticket
- Go to the bartender and ask for beer using your wristband
  - Service Request
- Get beer! :)
- For a technically correct explanation see: http://www.roguelynn.com/words/explain-like-im-5-kerberos/
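
The three steps of the analogy can be sketched as a toy model in Python. This is not real Kerberos: real tickets are encrypted and carry timestamps, lifetimes, and session keys, while this sketch only signs a string with HMAC. The keys and function names are all invented for illustration.

```python
import hashlib
import hmac

# Toy model only. Both keys are made-up values.
KDC_KEY = b"kdc-secret"        # known only to the ticket issuer
BAR_KEY = b"bartender-secret"  # shared by the issuer and the bartender

def _sign(key: bytes, message: str) -> str:
    return hmac.new(key, message.encode(), hashlib.sha256).hexdigest()

def issue_tgt(user, identity_proven):
    """Step 1: prove identity, receive a ticket-granting ticket."""
    if not identity_proven:
        raise PermissionError("identity not proven")
    return user, _sign(KDC_KEY, "tgt:" + user)

def issue_service_ticket(tgt, service):
    """Step 2: trade a valid TGT for a ticket to one service (the wristband)."""
    user, signature = tgt
    if not hmac.compare_digest(signature, _sign(KDC_KEY, "tgt:" + user)):
        raise PermissionError("invalid TGT")
    return user, service, _sign(BAR_KEY, service + ":" + user)

def serve(ticket):
    """Step 3: the service checks the ticket; no password is sent to it."""
    user, service, signature = ticket
    if not hmac.compare_digest(signature, _sign(BAR_KEY, service + ":" + user)):
        raise PermissionError("invalid service ticket")
    return service + " served to " + user
```

A forged ticket such as `("tim", "beer", "bad-signature")` is rejected by `serve`, because only the issuer and the bartender know the key needed to produce a valid signature.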
12. +
3) Secure Client Communication
- Guard client / server communication (on the wire)
- Done by using SASL, or SSL/TLS (certificates)
- Prevents snooping by third parties
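
As a general illustration of protecting traffic on the wire, here is a minimal Python sketch of a client-side TLS configuration. It shows certificate-based TLS from the Python standard library, not the SASL settings of Hadoop or any NoSQL store, and the minimum protocol version is an assumed policy choice.

```python
import ssl

# Default client context: verifies the server's certificate chain and hostname.
context = ssl.create_default_context()

# Refuse protocol versions older than TLS 1.2 (assumed policy, adjust as needed).
context.minimum_version = ssl.TLSVersion.TLSv1_2
```

A socket wrapped with this context refuses servers whose certificate cannot be verified, which is what stops a third party from silently sitting in the middle.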
13. +
4) What Is Allowed for This User?
- In an unsecured environment, users can read / write any table
  - Not very secure!
- Control which data users can see
14. +
5) Audit Logging
- See what is going on
USER : tim, resource = hdfs:/data/logs, type = read, time=.
USER : tim, resource = hive:click_logs, type = read, query = select *.
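
Records like the ones above could be produced with a small helper. This is a minimal sketch using Python's standard `logging` module; the logger name `audit` and the helper functions are assumptions for illustration, and the field layout simply mirrors the sample lines.

```python
import io
import logging

def make_audit_logger(stream) -> logging.Logger:
    """Build a logger that writes one timestamped audit record per line."""
    logger = logging.getLogger("audit")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()              # avoid duplicate handlers on re-creation
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False
    return logger

def audit(logger, user, resource, access_type):
    """Record who touched which resource and how."""
    logger.info("USER : %s, resource = %s, type = %s", user, resource, access_type)

# Example: write one record to an in-memory buffer.
buffer = io.StringIO()
audit(make_audit_logger(buffer), "tim", "hdfs:/data/logs", "read")
```

In a real deployment the stream would be a file or a log collector rather than an in-memory buffer.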
15. +
6) Secure NoSQL
- NoSQL solutions:
  - On Hadoop: HBase, Accumulo
  - Other: Cassandra, ...
- Access control
  - Table-level access: can I read? Can I write / update / insert?
  - Within a table, column-level access
    - Who can read column social_security_number?
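
Table-level and column-level checks can be sketched as two lookups. This is a simplified Python illustration, not the access-control API of HBase, Accumulo, or Cassandra; the users, the `customers` table, and the function `can_access` are hypothetical.

```python
# Hypothetical table-level grants: (user, table) -> allowed actions.
TABLE_GRANTS = {
    ("alice", "customers"): {"read", "write"},
    ("bob", "customers"): {"read"},
}

# Hypothetical column-level restrictions: only the listed users may read these columns.
RESTRICTED_COLUMNS = {
    ("customers", "social_security_number"): {"alice"},
}

def can_access(user, table, action, column=None):
    """Check the table-level grant first, then any column-level restriction."""
    if action not in TABLE_GRANTS.get((user, table), set()):
        return False
    if column is not None and action == "read":
        allowed_readers = RESTRICTED_COLUMNS.get((table, column))
        if allowed_readers is not None and user not in allowed_readers:
            return False
    return True
```

Here `bob` can read the `customers` table in general but not its `social_security_number` column, while `alice` can read both.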
16. +
Accumulo: Quick Intro
- Developed by the National Security Agency (NSA)!
- An implementation of Google's BigTable design
- NoSQL store on top of HDFS
- Security is a first-class concept
[Diagram: Accumulo running on top of HDFS]
17. +
Accumulo Data Model
Family: info
Column            Visibility token
name              Level 1
email             Level 1
Last 4 SSN        Level 1
SSN               Level 2 OR Top clearance
Gmail password    Top clearance
- Everything in the HBase data model
- Plus each cell has a visibility token
18. +
Users Are Assigned Visibility Tokens
User id           Visibility levels
User 1            Level 1
User 2            Level 1 + Level 2
Edward Snowden    Level 1 + Level 2 + Top Clearance
19. +
Accumulo Only Returns Cells Visible to User
Row: person1, family: info
Column            Value            Visibility token
name              Joe              Level 1
email             joe@gmail.com    Level 1
Last 4 SSN        6789             Level 1
Full SSN          123-45-6789      Level 2 OR Top clearance
Gmail password    JoeSuperMan!     Top clearance
20. +
What Users Can See
User              Visibility privilege                 Visible cells
User 1            Level 1                              Name, Email, Last 4 SSN
User 2            Level 1 + Level 2                    Name, Email, Last 4 SSN, Full SSN
Edward Snowden    Level 1 + Level 2 + Top Clearance    Name, Email, Last 4 SSN, Full SSN, Gmail Password
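
The filtering shown in these tables can be sketched in a few lines of Python. This is a simplified model, not Accumulo's API: each cell's visibility rule is written as a list of alternatives (OR), where each alternative is a set of tokens the user must hold in full (AND). The token spellings and the function `visible_cells` are assumptions for illustration.

```python
# Each cell: (value, visibility rule). A rule is a list of alternatives (OR);
# each alternative is a set of tokens that must all be held (AND).
ROW = {
    "name":           ("Joe",           [{"Level1"}]),
    "email":          ("joe@gmail.com", [{"Level1"}]),
    "last_4_ssn":     ("6789",          [{"Level1"}]),
    "full_ssn":       ("123-45-6789",   [{"Level2"}, {"TopClearance"}]),
    "gmail_password": ("JoeSuperMan!",  [{"TopClearance"}]),
}

# Tokens assigned to each user, as in the slides.
USER_TOKENS = {
    "User 1": {"Level1"},
    "User 2": {"Level1", "Level2"},
    "Edward Snowden": {"Level1", "Level2", "TopClearance"},
}

def visible_cells(user):
    """Return only the cells whose visibility rule the user's tokens satisfy."""
    tokens = USER_TOKENS[user]
    return {
        column: value
        for column, (value, rule) in ROW.items()
        if any(required <= tokens for required in rule)
    }
```

The key point is that filtering happens per cell on the server side, so a user without the right tokens never receives the hidden values at all.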
21. +
7) Final Step: Encrypt Data at Rest
- Eventually data ends up on disk
- We need to protect the raw data on disk
- To prevent:
  - Users going to disk directly
  - Theft of hardware
23. +
OK, So Where Are We?
Project / Solution            Purpose                     Status                                               Vendor
Kerberos                      Identity management         Available                                            Neutral
Knox                          Secure gateway              -                                                    Hortonworks; Cloudera ?
Sentry                        Access control              Incubating                                           Cloudera
Ranger (similar to Sentry)    Access control + Audit      In development (HDP 2.2) (originally XA Secure)      Hortonworks
Rhino                         Secure HDFS data at rest    Available from Hadoop 2.6                            Neutral (originally from Intel)
Accumulo                      Secure NoSQL                Available                                            Neutral
24. +
Future
- Really need a unified standard (no fragmentation)
- Ease of use
  - Easy to set up policies
  - Integrate with outside systems
  - Easy audit tools