Issues Securing Big Data
Mike Pluta, Sr Technical Architect | April 23, 2015
The enclosed materials are highly sensitive, proprietary and confidential. Please use every effort to safeguard the confidentiality
of these materials. Please do not copy, distribute, use, share or otherwise provide access to these materials to any person inside
or outside DST Systems, Inc. without prior written approval.
This proprietary, confidential presentation is for general informational purposes only and does not constitute an agreement.
By making this presentation available to you, we are not granting any express or implied rights or licenses under any intellectual
property right.
If we permit your printing, copying or transmitting of content in this presentation, it is under a non-exclusive, non-transferable,
limited license, and you must include or refer to the copyright notice contained in this document. You may not create derivative
works of this presentation or its content without our prior written permission. Any reference in this presentation to another
entity or its products or services is provided for convenience only and does not constitute an offer to sell, or the solicitation of
an offer to buy, any products or services offered by such entity, nor does such reference constitute our endorsement, referral,
or recommendation.
Our trademarks and service marks and those of third parties used in this presentation are the property of their respective owners.
© 2015 DST Systems, Inc. All rights reserved.
Disclaimer
 DST has established internal rules around the use of Big Data
 Data flowing into our data lake is partitioned into what we call Data Domains
 Each DST business unit is, in essence, at least one Data Domain
 Data Domains serve as the primary method of organizing our permissions
Big (or not) Data Security
 By default, one Business Unit is not granted access to another's data
 Agreements between Business Units are made to access data for a specific purpose
 Internal Data Scientists are given cross-Business Unit access to data
 Management mandates securing any data to which access has not been explicitly granted
What This Means
 These rules result in a very complex matrix of permissions
 Example below
 Data Domain Business Unit A may be accessed by Business Unit A and Business Unit D. Business Units B and C may not access this Data Domain
Complexity
Data Domain       BU A  BU B  BU C  BU D
Business Unit A    X                 X
Business Unit B          X           X
Business Unit C          X     X     X
Third Party Data               X     X
 Let's deal with just text data on a file system on a Linux server
 The logical approach is to arrange directories to track the Data Domains
 For permissions, create a group and a directory for each Data Domain
 Assign the group ownership as appropriate
 Set umask to 007 so that new files get u:rw-, g:rw-, o:--- permissions
Scenario
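The effect of the umask can be checked in isolation before any groups exist. The following is a minimal sketch, assuming a Linux system with GNU stat; the scratch directory and the names sample_file and sample_dir are hypothetical and not part of the original setup.

```shell
#!/usr/bin/env bash
# Demonstrate what umask 007 does to newly created files and directories.
set -eu

workdir=$(mktemp -d)      # scratch directory, a stand-in for $HOME/data
umask 007                 # clear every permission bit for "other"

touch "$workdir/sample_file"
mkdir "$workdir/sample_dir"

# GNU stat: %a prints the permission bits in octal
file_mode=$(stat -c '%a' "$workdir/sample_file")
dir_mode=$(stat -c '%a' "$workdir/sample_dir")

echo "file: $file_mode"   # files start from 666, then the mask is removed
echo "dir:  $dir_mode"    # directories start from 777, then the mask is removed

rm -rf "$workdir"
```

On a typical Linux system this reports 660 for the file and 770 for the directory, which corresponds to the u:rw-, g:rw-, o:--- pattern named above.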
sudo useradd buaadm
sudo passwd -d buaadm
sudo useradd bubadm
sudo passwd -d bubadm
sudo useradd bucadm
sudo passwd -d bucadm
sudo useradd budadm
sudo passwd -d budadm
sudo useradd tpdadm
sudo passwd -d tpdadm
Details - Setup Users and Groups
sudo groupadd buag
sudo usermod -G buag buaadm
sudo groupadd bubg
sudo usermod -G bubg bubadm
sudo groupadd bucg
sudo usermod -G bucg bucadm
sudo groupadd budg
sudo usermod -G budg budadm
sudo groupadd tpdg
sudo usermod -G tpdg tpdadm
sudo usermod -a -G buag,bubg,bucg,budg,tpdg dt206031
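The per-domain commands above follow one pattern, so they can be generated from a list of prefixes. This is a sketch only: it prints the commands as a dry run instead of executing them, and the function name setup_commands and the variable domains are hypothetical. The cross-domain user (dt206031 on the slide) is left out because it is site specific.

```shell
#!/usr/bin/env bash
# Dry run: print the account and group commands for each Data Domain prefix.
set -eu

domains="bua bub buc bud tpd"

setup_commands() {
  local d
  for d in $domains; do
    echo "sudo useradd ${d}adm"            # admin account for the domain
    echo "sudo passwd -d ${d}adm"          # same password handling as the slide
    echo "sudo groupadd ${d}g"             # one group per domain
    echo "sudo usermod -G ${d}g ${d}adm"   # put the admin account in its group
  done
}

setup_commands
```

Piping the output to a shell (for example `setup_commands | sh`) would run the commands for real; reviewing the printed list first is the safer habit.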
umask 007
cd $HOME
mkdir data
cd data
mkdir bua
mkdir bub
mkdir buc
mkdir tpd
cd $HOME/data/bua
touch bua_file_1
touch bua_file_2
touch bua_file_3
touch bua_file_4
touch bua_file_5
sudo chown buaadm:buag *
Details - Setup Files
cd $HOME/data/bub
touch bub_file_1
touch bub_file_2
touch bub_file_3
touch bub_file_4
touch bub_file_5
sudo chown bubadm:bubg *
cd $HOME/data/buc
touch buc_file_1
touch buc_file_2
touch buc_file_3
touch buc_file_4
touch buc_file_5
sudo chown bucadm:bucg *
cd $HOME/data/tpd
touch tpd_file_1
touch tpd_file_2
touch tpd_file_3
touch tpd_file_4
touch tpd_file_5
sudo chown tpdadm:tpdg *
cd $HOME/data
sudo chown buaadm:buag bua
sudo chown bubadm:bubg bub
sudo chown bucadm:bucg buc
sudo chown tpdadm:tpdg tpd
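The same tree can be built with two nested loops. The sketch below works in a scratch directory (a hypothetical stand-in for $HOME) and only prints the chown commands, since changing ownership needs root and the accounts created earlier.

```shell
#!/usr/bin/env bash
# Rebuild the data/ tree from the slides in a scratch directory.
set -eu

root=$(mktemp -d)          # stand-in for $HOME
umask 007

for d in bua bub buc tpd; do
  mkdir -p "$root/data/$d"
  for i in 1 2 3 4 5; do
    touch "$root/data/$d/${d}_file_$i"
  done
  # Ownership change is printed, not executed (requires root and the groups)
  echo "sudo chown -R ${d}adm:${d}g $root/data/$d"
done

file_count=$(find "$root/data" -type f | wc -l)
echo "created $file_count files"

rm -rf "$root"
```

Using `chown -R` on each directory covers both the files and the directory itself, which the slides do in two separate steps.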
What It Looks Like
 The directory for the Data Domain Business Unit A can be accessed by members of the buag group
 How can we grant additional access to the budg group, but still restrict other groups?
Complexity Redux
Data Domain       BU A  BU B  BU C  BU D
Business Unit A    X                 X
Business Unit B          X           X
Business Unit C          X     X     X
Third Party Data               X     X
 POSIX Access Control Lists (ACLs) are the answer to our dilemma
 ACLs are not always enabled by default; they are enabled at the filesystem level
 mount with the remount and acl options can enable them
 mount -o remount -o acl /dev/sda5 /home
 See your system administrator to enable ACLs permanently
The Secret Sauce
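Before remounting anything, it can help to probe whether a directory's filesystem already accepts ACLs. The following is a rough sketch, assuming the acl tools (setfacl) are installed; the function name acl_supported and the probe file name are hypothetical.

```shell
#!/usr/bin/env bash
# Probe whether the filesystem holding a directory accepts POSIX ACLs.

acl_supported() {
  local dir=$1 probe status=0
  command -v setfacl >/dev/null 2>&1 || return 2   # acl tools not installed
  probe=$(mktemp "$dir/.acl_probe.XXXXXX" 2>/dev/null) || return 2
  # Adding a named-user entry fails when the filesystem is mounted
  # without ACL support.
  setfacl -m "u:$(id -un):rw" "$probe" 2>/dev/null || status=1
  rm -f "$probe"
  return "$status"
}

rc=0
acl_supported "${1:-$HOME}" || rc=$?
case $rc in
  0) echo "ACLs are supported" ;;
  1) echo "ACLs are not supported here" ;;
  *) echo "could not probe (setfacl missing or directory not writable)" ;;
esac
```

A result of "not supported" is the case where the remount shown above, or a permanent change by the system administrator, is needed.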
 setfacl is used to set the ACL for a file or directory
 getfacl is used to query and list the ACL of a file or directory
 Our specific need:
 In addition to rwx permissions for the group buag, add rwx permissions for
the group budg to the directory bua
 In addition to rwx permissions for the group bubg, add rwx permissions for
the group budg to the directory bub
 In addition to rwx permissions for the group bucg, add rwx permissions for
the groups bubg and budg to the directory buc
 In addition to rwx permissions for the group tpdg, add rwx permissions for the
groups bucg and budg to the directory tpd
The Tools
 In addition to rwx permissions for the group buag, add rwx permissions for the group budg to the directory and contents of bua
 setfacl -R --set u::rwx,g::rwx,o::-,g:budg:rwx bua
 In addition to rwx permissions for the group bubg, add rwx permissions for the group budg to the directory and contents of bub
 setfacl -R --set u::rwx,g::rwx,o::-,g:budg:rwx bub
 In addition to rwx permissions for the group bucg, add rwx permissions for the groups bubg and budg to the directory and contents of buc
 setfacl -R --set u::rwx,g::rwx,o::-,g:bubg:rwx,g:budg:rwx buc
 In addition to rwx permissions for the group tpdg, add rwx permissions for the groups bucg and budg to the directory and contents of tpd
 setfacl -R --set u::rwx,g::rwx,o::-,g:bucg:rwx,g:budg:rwx tpd
The Commands
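Because every ACL specification above has the same base entries plus one named-group entry per extra Business Unit, the specifications can be derived from the permission matrix. A minimal sketch, printing the commands as a dry run; the function name acl_spec and the inline table are hypothetical conveniences.

```shell
#!/usr/bin/env bash
# Build the setfacl specification for each Data Domain from its list of
# additional groups, then print the command instead of running it.
set -eu

acl_spec() {
  # Base entries used on the slides: owner rwx, owning group rwx, other none
  local spec="u::rwx,g::rwx,o::-" grp
  for grp in "$@"; do
    spec="$spec,g:$grp:rwx"
  done
  printf '%s\n' "$spec"
}

# One domain per line: directory, then the extra groups allowed in
while read -r dir extra; do
  # $extra is intentionally unquoted so it splits into separate group names
  echo "setfacl -R --set $(acl_spec $extra) $dir"
done <<'EOF'
bua budg
bub budg
buc bubg budg
tpd bucg budg
EOF
```

Keeping the matrix in one place means a new sharing agreement becomes a one-word change to the table.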
Results
 Hadoop HDFS v2.6 supports POSIX-style ACLs
 Make sure to turn it on first
hdfs-site.xml
<property>
<name>dfs.namenode.acls.enabled</name>
<value>true</value>
</property>
 Restart the NameNode
 Set an ACL
hdfs dfs -setfacl -m u::rwx,g::rwx,o::---,g:budg:rwx /bua
 See the ACLs
hdfs dfs -getfacl /bua
How To Hadoop It
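A quick check of the setting can save a confusing error from setfacl. This sketch assumes an hdfs client on the PATH and uses `hdfs getconf -confKey`, which reports the value from the configuration the client reads; that is not necessarily what a running NameNode has loaded. The function name hdfs_acls_enabled is hypothetical.

```shell
#!/usr/bin/env bash
# Confirm that HDFS ACLs are switched on before trying to set one.

hdfs_acls_enabled() {
  command -v hdfs >/dev/null 2>&1 || return 2     # no Hadoop client found
  # Compare the configured value of the ACL switch against "true"
  [ "$(hdfs getconf -confKey dfs.namenode.acls.enabled 2>/dev/null)" = "true" ]
}

hdfs_rc=0
hdfs_acls_enabled || hdfs_rc=$?
if [ "$hdfs_rc" -eq 0 ]; then
  echo "dfs.namenode.acls.enabled is true"
else
  echo "ACLs are disabled, or no hdfs client was found (status $hdfs_rc)"
fi
```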
 Use a Default ACL for Automatic Application to New Children
sudo setfacl -d --set u::rwx,g::rwx,o::-,g:budg:rwx bua
sudo setfacl -d --set u::rwx,g::rwx,o::-,g:budg:rwx bub
sudo setfacl -d --set u::rwx,g::rwx,o::-,g:bubg:rwx,g:budg:rwx buc
sudo setfacl -d --set u::rwx,g::rwx,o::-,g:bucg:rwx,g:budg:rwx tpd
 And in Hadoop
hadoop fs -setfacl --set d:u::rwx,d:g::rwx,d:o::---,d:g:budg:rwx bua
hadoop fs -setfacl --set d:u::rwx,d:g::rwx,d:o::---,d:g:budg:rwx bub
hadoop fs -setfacl --set d:u::rwx,d:g::rwx,d:o::---,d:g:bubg:rwx,d:g:budg:rwx buc
hadoop fs -setfacl --set d:u::rwx,d:g::rwx,d:o::---,d:g:bucg:rwx,d:g:budg:rwx tpd
Other Goodies
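Default ACL inheritance on Linux can be observed in a scratch directory. The sketch below assumes setfacl and getfacl are installed and that the filesystem supports ACLs; it uses the current user's primary group as a stand-in for budg, and skips itself if ACLs are unavailable.

```shell
#!/usr/bin/env bash
# Show a default ACL being inherited by a file created afterwards.

demo_dir=$(mktemp -d)
grp=$(id -gn)            # stand-in for budg: a group that exists locally
inherited="skipped"

if command -v setfacl >/dev/null 2>&1 &&
   setfacl -d -m "g:$grp:rwx" "$demo_dir" 2>/dev/null; then
  touch "$demo_dir/new_file"
  # The new file should carry a named-group entry copied from the default ACL
  if getfacl -c "$demo_dir/new_file" 2>/dev/null | grep -q "^group:$grp:"; then
    inherited="yes"
  else
    inherited="no"
  fi
fi

echo "default ACL inherited: $inherited"
rm -rf "$demo_dir"
```

Note that the inherited entry on a plain file is limited by the mask, so a file created with touch typically shows an effective permission of rw- even though the default entry says rwx.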
Results With Default ACLs
 Don't forget about the sticky bit
 Makes it so that only root, the directory owner, or a file's own owner can delete that file
sudo chmod +t bua
 Use the setgid bit so that new files in a directory get the same group owner as the directory
 Very handy when paired with default ACLs
sudo chmod g+s bua
Last Extra Bits
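Both bits can be tried without root on a directory you own. A minimal sketch, assuming Linux with GNU stat and a scratch directory whose group is one you belong to; the directory is hypothetical and stands in for bua.

```shell
#!/usr/bin/env bash
# Set the sticky and setgid bits on a directory and inspect the result.
set -eu

shared=$(mktemp -d)       # stand-in for the bua directory
chmod 770 "$shared"
chmod +t "$shared"        # sticky bit: restricts who may delete entries
chmod g+s "$shared"       # setgid bit: new entries inherit the directory's group

mode=$(stat -c '%a' "$shared")
echo "mode: $mode"        # leading digit combines setgid (2) and sticky (1)

touch "$shared/new_file"
dir_group=$(stat -c '%G' "$shared")
file_group=$(stat -c '%G' "$shared/new_file")
echo "directory group: $dir_group, new file group: $file_group"

rm -rf "$shared"
```

The two group names printed at the end should match, which is the behaviour that makes setgid a good companion to default ACLs.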
