- DST has established rules for organizing and securing big data across business units, with each unit representing a separate "data domain"
- Access to data domains is restricted by default between units, but agreements can grant additional access for approved purposes
- The document describes setting up a Linux file system to mirror this structure using users, groups, directories, and access control lists (ACLs) to partition and control access to simulated "data domains" according to the business rules.
2. The enclosed materials are highly sensitive, proprietary and confidential. Please use every effort to safeguard the confidentiality
of these materials. Please do not copy, distribute, use, share or otherwise provide access to these materials to any person inside
or outside DST Systems, Inc. without prior written approval.
This proprietary, confidential presentation is for general informational purposes only and does not constitute an agreement.
By making this presentation available to you, we are not granting any express or implied rights or licenses under any intellectual
property right.
If we permit your printing, copying or transmitting of content in this presentation, it is under a non-exclusive, non-transferable,
limited license, and you must include or refer to the copyright notice contained in this document. You may not create derivative
works of this presentation or its content without our prior written permission. Any reference in this presentation to another
entity or its products or services is provided for convenience only and does not constitute an offer to sell, or the solicitation of
an offer to buy, any products or services offered by such entity, nor does such reference constitute our endorsement, referral,
or recommendation.
Our trademarks and service marks and those of third parties used in this presentation are the property of their respective owners.
© 2015 DST Systems, Inc. All rights reserved.
Disclaimer
3. DST has established internal rules around the use of Big Data
Data flowing into our data lake is partitioned by what we call Data Domains
Each DST business unit is, in essence, at least one Data Domain
Data Domains serve as the primary method of organizing our permissions
Big (or not) Data Security
4. By default, one Business Unit is not granted access to another's data
Agreements between business units are made to access data for a specific purpose
Internal Data Scientists are given cross-Business Unit access to data
Management mandate: secure any data to which access has not been explicitly granted
What This Means
5. These rules result in a very complex matrix of permissions
Example below:
Data Domain Business Unit A may be accessed by Business Unit A and Business Unit D. Business Units B and C may not access this Data Domain
Complexity
Data Domain         BU A   BU B   BU C   BU D
Business Unit A      X                    X
Business Unit B             X             X
Business Unit C             X      X      X
Third Party Data                   X      X
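One way to keep a matrix like this checkable is to encode it as a small lookup. The sketch below is only an illustration and not part of the original deck: the function names `allowed_units` and `can_access` are hypothetical, the short domain names (bua, bub, buc, tpd) are borrowed from the directory names used on the later slides, and the unit lists follow the ACL requirements stated there.

```shell
#!/bin/sh
# allowed_units DOMAIN: print the business units that may access DOMAIN.
# The lists mirror the permission matrix; the names are illustrative.
allowed_units() {
    case "$1" in
        bua) echo "A D" ;;
        bub) echo "B D" ;;
        buc) echo "B C D" ;;
        tpd) echo "C D" ;;
        *)   return 1 ;;    # unknown Data Domain
    esac
}

# can_access UNIT DOMAIN: succeed if UNIT appears in the list for DOMAIN.
can_access() {
    units=$(allowed_units "$2") || return 1
    case " $units " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Report who may read the Business Unit A domain.
for unit in A B C D; do
    if can_access "$unit" bua; then
        echo "BU $unit: allowed"
    else
        echo "BU $unit: denied"
    fi
done
```

Anything not listed is denied, which matches the default-deny rule described earlier.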
6. Let's deal with just text data on a file system on a Linux server
The logical approach is to arrange directories to track the Data Domains
For permissions, create a group and directory for each Data Domain
Assign the group ownership as appropriate
Set umask to 007 so that new files have u:rw-, g:rw-, o:--- permissions
Scenario
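The scenario above could be sketched roughly as follows. This is a hedged illustration, assuming a Linux host with GNU coreutils: the helper names `setup_commands` and `umask_demo` are hypothetical, the group and directory names come from the later slides, and the setup commands are only printed because creating groups requires root.

```shell
#!/bin/sh
# Dry run: print (do not execute) the setup commands for each Data Domain.
setup_commands() {
    for domain in bua bub buc tpd; do
        printf 'groupadd %sg\n' "$domain"
        printf 'mkdir %s\n' "$domain"
        printf 'chgrp %sg %s\n' "$domain" "$domain"
    done
    # Business Unit D has no directory of its own here but still needs a group.
    printf 'groupadd budg\n'
}

# Show the mode a new file gets under umask 007, using a scratch directory.
# The body runs in a subshell so the caller's umask is left untouched.
umask_demo() (
    umask 007
    scratch=$(mktemp -d) || exit 1
    touch "$scratch/newfile"
    stat -c '%a' "$scratch/newfile"   # octal mode of the new file
    rm -rf "$scratch"
)

setup_commands
umask_demo
```

A new file is requested with mode 666, and the mask 007 clears the three "other" bits, which is why the demo reports 660 (rw-rw----).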
10. The directory for the Data Domain Business Unit A can be accessed by
members of the bua group
How can we grant additional access to the bud group, but still restrict
other groups?
Complexity Redux
Data Domain         BU A   BU B   BU C   BU D
Business Unit A      X                    X
Business Unit B             X             X
Business Unit C             X      X      X
Third Party Data                   X      X
11. POSIX Access Control Lists (ACLs) are the answer to our dilemma
They may not be enabled by default and may need to be enabled at the filesystem level
Running mount with the remount and acl options can enable them
mount -o remount,acl /dev/sda5 /home
See your system administrator to make the change permanent
The Secret Sauce
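Before remounting, it can help to see whether the acl option is already listed for a mount point. The following is a minimal sketch, assuming a Linux system that exposes /proc/mounts; the helper names `has_acl_option` and `mount_options` are hypothetical. On many current distributions ACL support is a filesystem default and will not show up as an explicit option, so a negative result here is not proof that ACLs are unavailable.

```shell
#!/bin/sh
# has_acl_option OPTIONS: succeed if the comma-separated list contains "acl".
has_acl_option() {
    case ",$1," in
        *,acl,*) return 0 ;;
        *)       return 1 ;;
    esac
}

# mount_options MOUNTPOINT: print the options field for MOUNTPOINT
# as recorded in /proc/mounts (empty if the mount point is not listed).
mount_options() {
    awk -v mp="$1" '$2 == mp { opts = $4 } END { if (opts != "") print opts }' /proc/mounts
}

opts=$(mount_options /home)
if has_acl_option "$opts"; then
    echo "/home: acl option is set"
else
    echo "/home: acl option not listed (it may still be a filesystem default)"
fi
```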
12. setfacl is used to set the ACL for a file or directory
getfacl is used to query and list the ACL of a file or directory
Our specific need:
In addition to rwx permissions for the group buag, add rwx permissions for
the group budg to the directory bua
In addition to rwx permissions for the group bubg, add rwx permissions for
the group budg to the directory bub
In addition to rwx permissions for the group bucg, add rwx permissions for
the groups bubg and budg to the directory buc
In addition to rwx permissions for the group tpdg, add rwx permissions for the
groups bucg and budg to the directory tpd
The Tools
13. In addition to rwx permissions for the group buag, add rwx permissions
for the group budg to the directory and contents of bua
setfacl -R --set u::rwx,g::rwx,o::-,g:budg:rwx bua
In addition to rwx permissions for the group bubg, add rwx permissions
for the group budg to the directory and contents of bub
setfacl -R --set u::rwx,g::rwx,o::-,g:budg:rwx bub
In addition to rwx permissions for the group bucg, add rwx permissions
for the groups bubg and budg to the directory and contents of buc
setfacl -R --set u::rwx,g::rwx,o::-,g:bubg:rwx,g:budg:rwx buc
In addition to rwx permissions for the group tpdg, add rwx permissions
for the groups bucg and budg to the directory and contents of tpd
setfacl -R --set u::rwx,g::rwx,o::-,g:bucg:rwx,g:budg:rwx tpd
The Commands
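Since the four commands differ only in the directory and the extra groups, they could be generated from the matrix instead of typed by hand. This is a rough sketch under that assumption; `acl_spec` and `acl_command` are hypothetical helper names, and the commands are printed for review, not executed.

```shell
#!/bin/sh
# acl_spec GROUP...: base entries plus an rwx entry for each extra group.
acl_spec() {
    spec="u::rwx,g::rwx,o::-"
    for grp in "$@"; do
        spec="$spec,g:$grp:rwx"
    done
    printf '%s\n' "$spec"
}

# acl_command DIR GROUP...: print the recursive setfacl command for DIR.
acl_command() {
    dir=$1
    shift
    printf 'setfacl -R --set %s %s\n' "$(acl_spec "$@")" "$dir"
}

# One line per Data Domain directory, with the extra groups granted access.
acl_command bua budg
acl_command bub budg
acl_command buc bubg budg
acl_command tpd bucg budg
```

Printing first and running later keeps a reviewable record of exactly which grants are about to be applied.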
15. Hadoop HDFS v2.6 supports POSIX-style ACLs
Make sure to turn it on first
hdfs-site.xml
<property>
<name>dfs.namenode.acls.enabled</name>
<value>true</value>
</property>
Restart the NameNode
Set an ACL
hdfs dfs -setfacl -m user::rwx,group::rwx,other::---,group:budg:rwx /bua
See the ACLs
hdfs dfs -getfacl /bua
How To Hadoop It
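After setting an ACL, it is worth confirming that the expected group entry is really there. The sketch below assumes the usual getfacl line layout (type:name:permissions), which both the Linux and HDFS tools print; `acl_perms_for` is a hypothetical helper, and the sample listing is typed by hand for illustration, not captured from a real cluster.

```shell
#!/bin/sh
# acl_perms_for GROUP: read getfacl-style output on stdin and print the
# three-character permission field of the named group entry, if present.
acl_perms_for() {
    awk -F: -v grp="$1" '$1 == "group" && $2 == grp { print substr($3, 1, 3) }'
}

# Hand-written example listing (owner and group names are hypothetical).
sample_acl='# file: /bua
# owner: hdfs
# group: buag
user::rwx
group::rwx
group:budg:rwx
mask::rwx
other::---'

printf '%s\n' "$sample_acl" | acl_perms_for budg

# Against a live system the same filter could be fed directly, for example:
#   hdfs dfs -getfacl /bua | acl_perms_for budg
```

An empty result means the group has no named entry, so it falls under the "other" permissions.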
16. Use a Default ACL for Automatic Application to New Children
sudo setfacl -d --set u::rwx,g::rwx,o::-,g:budg:rwx bua
sudo setfacl -d --set u::rwx,g::rwx,o::-,g:budg:rwx bub
sudo setfacl -d --set u::rwx,g::rwx,o::-,g:bubg:rwx,g:budg:rwx buc
sudo setfacl -d --set u::rwx,g::rwx,o::-,g:bucg:rwx,g:budg:rwx tpd
And in Hadoop
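hadoop fs -setfacl --set default:user::rwx,default:group::rwx,default:other::---,default:group:budg:rwx bua
hadoop fs -setfacl --set default:user::rwx,default:group::rwx,default:other::---,default:group:budg:rwx bub
hadoop fs -setfacl --set default:user::rwx,default:group::rwx,default:other::---,default:group:bubg:rwx,default:group:budg:rwx buc
hadoop fs -setfacl --set default:user::rwx,default:group::rwx,default:other::---,default:group:bucg:rwx,default:group:budg:rwx tpd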
Other Goodies
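A default ACL spec is simply the access spec with a default prefix on every entry, so one can be derived from the other. Here is a small sketch of that transformation; `with_prefix` is a hypothetical helper, and the prefix is passed as a parameter so the same function can serve whichever spelling a given tool expects.

```shell
#!/bin/sh
# with_prefix PREFIX SPEC: prepend PREFIX to every comma-separated entry in SPEC.
# PREFIX is assumed to contain no characters that are special to sed.
with_prefix() {
    printf '%s\n' "$2" | sed "s/^/$1/; s/,/,$1/g"
}

access_spec="u::rwx,g::rwx,o::-,g:budg:rwx"

# Short prefix form.
with_prefix "d:" "$access_spec"

# Long prefix form, applied to an entry list written with full type names.
with_prefix "default:" "user::rwx,group::rwx,other::---,group:budg:rwx"
```

Deriving both specs from one source string avoids the access and default ACLs drifting apart over time.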
18. Don't forget about the sticky bit
On a directory, it means only root, the directory owner, or a file's owner can delete or rename that file
sudo chmod +t bua
Use the setgid bit so that new files in a directory get the same group owner as the directory
Very handy when paired with default ACLs
sudo chmod g+s bua
Last Extra Bits
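The effect of both bits can be seen on a throwaway directory without touching the real Data Domain directories. This is a minimal sketch, assuming Linux with GNU coreutils; `special_bits_demo` is a hypothetical name, and a scratch directory stands in for bua so that sudo is not needed.

```shell
#!/bin/sh
# Apply the sticky and setgid bits to a scratch directory and print its mode.
# The body runs in a subshell so temporary variables do not leak out.
special_bits_demo() (
    scratch=$(mktemp -d) || exit 1
    chmod 770 "$scratch"     # rwx for owner and group, nothing for others
    chmod +t "$scratch"      # sticky bit: restricts who may delete entries
    chmod g+s "$scratch"     # setgid bit: new files inherit the directory's group
    stat -c '%a' "$scratch"  # octal mode, including the special bits
    rm -rf "$scratch"
)

special_bits_demo
```

In the octal mode, the leading digit carries the special bits: 1 for sticky plus 2 for setgid gives 3, followed by the ordinary 770.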