This document summarizes CERN's efforts to implement an agile infrastructure using open source tools such as Puppet, Foreman, and OpenStack. It describes CERN's motivation for adopting these tools to manage a growing number of servers, gives details on the implementation of the Puppet and OpenStack components, and outlines next steps to expand the use and scale of the agile infrastructure project.
CERN Agile Infrastructure, Road to Production
1. CERN Agile Infrastructure
Road to Production
Steve Traylen,
steve.traylen@cern.ch
@traylenator
CERN, IT Department
HEPiX Autumn 2012 Workshop
CERN IT Department
CH-1211 Genève 23
Switzerland
www.cern.ch/it
Monday, October 15, 2012
2. CERN Agile Infrastructure
• Motivation
• Component Releases
  - Configuration
    • Puppet, Foreman and Hiera
    • Punch -> Judy
  - Provision
    • OpenStack
  - Other Services
    • koji, git, jira
• Community Interactions.
• AI as a Production Service
  - Expanding user base
3. Motivation for CERN AI.
• CERN IT is changing strategy for machine provision and configuration.
• Rationale
  - Need to manage twice as many servers as today
  - No increase in staff numbers
  - Our deployment of configuration tools is becoming increasingly brittle.
  - New services take far too long to deploy.
• Approach
  - We are no longer a special case for compute.
  - Adopt open source tool chain model
  - Contribute new function back to community.
4. Configuration Components
• Puppet (2.7)
  - Responsible for configuration; an industry standard.
• Foreman (1.0)
  - Groups hosts into hostgroups of similar configuration.
  - Generates kickstart files, after which puppet can take over.
• Hiera (1.0)
  - A data store used by puppet.
• Mcollective (2.2)
  - Publish/subscribe messaging to control and query hosts.
• CDB legacy (old)
  - Still some items in CDB, e.g. warranty information.
5. Configuration - Punch Service
• First puppet infrastructure, known as "Punch"
  - One 4 core node, set up "by hand".
  - puppet, foreman running behind passenger (mod_ruby)
  - Own in-built puppetca (cert authority)
  - All project members with root access.
    • Secret files uploaded by hand.
    • Secret files being distributed by puppet
  - Node started to struggle once 400 puppet agents were attached: CPU limitation on the server.
    • This was with reconfigurations every 15 minutes, which is excessive.
• Punch ran for 6 months.
  - Punch was never a scalable solution.
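A rough load estimate for the Punch figures above, as a minimal sketch. It assumes every agent run triggers one catalog request on the server and that runs are spread evenly over the interval; the function name is illustrative and not part of Puppet. The 30 minute comparison uses Puppet's default runinterval.

```python
def catalog_requests_per_second(agents, interval_minutes):
    """Average catalog requests per second arriving at the puppet master.

    Assumes one request per agent per interval, evenly spread (a simplification).
    """
    return agents / (interval_minutes * 60)


# Punch: 400 agents reconfiguring every 15 minutes.
punch_rate = catalog_requests_per_second(400, 15)
# Same agents at a 30 minute interval (Puppet's default runinterval).
relaxed_rate = catalog_requests_per_second(400, 30)

print(f"15 min interval: {punch_rate:.2f} requests/s")
print(f"30 min interval: {relaxed_rate:.2f} requests/s")
```

Under these assumptions, doubling the interval halves the average load on the single 4 core node.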
6. Configuration - Judy Service
• Punch replaced by Judy in August 2012.
  - All components are deployed with puppet.
  - 2 backend puppetmasters, 2 backend foreman.
  - mod_loadbalence redirecting requests.
• Using CERN CA.
• CertBaby Service
  - Hooks up a user's kerberos identity, machine ownership and certificate requests.
7. Judy Service Scale
• Currently 1200 puppet agents.
  - 500 nodes added in the last week.
  - 100 a day being added right now.
  - Agents are running on
    • Hardware
    • CVI Service (hyper-v)
    • OpenStack Nova (kvm) (all new ones)
  - Organized in 37 hostgroups with 60 subgroups.
• Adding more puppetmasters or foreman backends is easy.
  - Same problem as scaling web pages, e.g.
    • Number of active connections at redirector.
    • Consistency across back end servers.
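The "same problem as scaling web pages" point can be illustrated with a toy round-robin redirector. This is only a sketch of the general idea, not the Judy front end: the class and the backend names are hypothetical, and a real redirector would also have to track active connections and backend health.

```python
class RoundRobinRedirector:
    """Toy redirector that hands requests to backends in turn."""

    def __init__(self, backends):
        self._backends = list(backends)
        self._next = 0  # index of the next request, counted from zero

    def add_backend(self, backend):
        # Scaling out is just adding another entry to the pool.
        self._backends.append(backend)

    def pick(self):
        backend = self._backends[self._next % len(self._backends)]
        self._next += 1
        return backend


# Hypothetical backend names, two puppetmasters as on the Judy slide.
redirector = RoundRobinRedirector(["puppetmaster1", "puppetmaster2"])
first_three = [redirector.pick() for _ in range(3)]
redirector.add_backend("puppetmaster3")
```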
8. Puppet Manifests.
• Puppet manifests are very (too?) quick to develop.
  - Takes little longer than configuring the service.
  - e.g. an apollo module written in two days.
    • while apollo configuration was being learnt.
  - later parameterization of hardcoded values is easy.
• Puppet code to be executed on nodes is distributed by puppet first.
  - i.e. no need to package any puppet modules.
  - Makes new feature development and deployment very fast.
• We and others will get better at sharing puppet manifests as hiera becomes normal.
9. Puppet Git and Environments
• Git used for puppet modules & manifests.
• Git branches map to dynamic environments
  - local development can be 'puppet apply'd.
  - admins push changes to a (gitolite) repository
  - puppet masters pull branches and translate to environments
  - Production, Testing & Devel branches
  - Topic branches for major changes
  - Some services live in their own branches
    • risk of divergence...
• Atlassian Crucible & Fisheye for module review process ... not really started.
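A minimal sketch of how a puppet master might translate pulled branches into environment directories. The slides do not show the actual mechanism, so everything here is an assumption: the function names, the /etc/puppet/environments root, and the rule that environment names are limited to lowercase letters, digits and underscores.

```python
import re
from pathlib import PurePosixPath

# Hypothetical location for per-branch checkouts on the puppet master.
ENVIRONMENTS_ROOT = PurePosixPath("/etc/puppet/environments")


def branch_to_environment(branch):
    """Turn a git branch name into a safe environment name.

    Lowercases the name and replaces anything outside [a-z0-9_] with "_".
    """
    return re.sub(r"[^a-z0-9_]", "_", branch.lower())


def environment_dirs(branches):
    """Map each branch to the directory its environment would live in."""
    return {b: ENVIRONMENTS_ROOT / branch_to_environment(b) for b in branches}


dirs = environment_dirs(["production", "testing", "devel", "topic/new-ntp"])
for branch, path in dirs.items():
    print(f"{branch} -> {path}")
```

Note that two branches can collapse to the same environment name under this rule (for example topic/a-b and topic/a_b), so a real setup would need a naming convention or a collision check.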
10. Foreman
• Groups hosts of similar configuration.
• Top group -> service, e.g. lxbatch, cernfts, ...
• Subgroups may be very different, e.g.
  - cvmfs/stratum0 vs cvmfs/lxcvmfs.
11. Separate Code and Data
• Quattor separated code and data well:
  - It was one motivation to write Quattor and drop LCFGng in the first place.
• hiera takes the separation to a new level:
  - puppet asks for a value from hiera:
    • $myNTP = hiera('ntpservers')
  - result can be string, array, hash, ....
  - The lookup is based on a node's properties, e.g.
    • Since I am at CERN answer is ntp1.cern.ch
    • Since I am in Budapest answer is ntp2.cern.ch
  - The schema of results for CERN nodes, Budapest nodes, SLC5 nodes, debian nodes can be arranged and changed as we please.
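The property-based lookup on this slide can be sketched as a first-match search through data sources ordered from most specific to most general. This is a toy model in Python, not Hiera's implementation: the "site" property name, the source names and the in-memory DATA dictionary are assumptions made for illustration; only the two NTP answers come from the slide.

```python
# Toy data store: source name -> key/value pairs.
DATA = {
    "site/cern": {"ntpservers": "ntp1.cern.ch"},
    "site/budapest": {"ntpservers": "ntp2.cern.ch"},
    "common": {},
}


def hiera(key, node, data=DATA, default=None):
    """Return the first value found for key, most specific source first."""
    sources = [f"site/{node['site']}", "common"]
    for source in sources:
        values = data.get(source, {})
        if key in values:
            return values[key]
    return default


print(hiera("ntpservers", {"site": "cern"}))
print(hiera("ntpservers", {"site": "budapest"}))
```

The calling code never mentions CERN or Budapest; changing which node gets which answer only means editing the data.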
12. Hiera and Hostgroups
• We arrange nodes into (sub)hostgroups in foreman.
• A tree of YAML files stored in git maps on to these, e.g. for castor hostgroups
  - hostgroup/castor/diskserver/atlas.yaml
  - hostgroup/castor/diskserver.yaml
  - hostgroup/castor.yaml
  - os/slc5.yaml
  - common.yaml
      # A YAML file.
      ---
      castorns: ns.cern.ch
• The files above contain increasingly general key/values for look up in hiera.
• Schema can be fully customized to CERN space with no fear of polluting the code.
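How a hostgroup could expand into the ordered list of YAML files above, as a sketch. The function name and its arguments are hypothetical; the ordering (deepest hostgroup first, then operating system, then common) is inferred from the file list on the slide.

```python
def hiera_hierarchy(hostgroup, os_name):
    """List candidate YAML files for a node, most specific first."""
    parts = hostgroup.split("/")
    files = []
    # Walk from the full subgroup path up to the top-level hostgroup.
    for depth in range(len(parts), 0, -1):
        path = "/".join(parts[:depth])
        files.append(f"hostgroup/{path}.yaml")
    files.append(f"os/{os_name}.yaml")
    files.append("common.yaml")
    return files


for name in hiera_hierarchy("castor/diskserver/atlas", "slc5"):
    print(name)
```

A lookup would then read these files in order and stop at the first one that defines the requested key.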
13. Configuration Next Steps
• Deploy puppetdb
  - Performance improvements - community raving.
  - Repository for configuration data mining.
• Deploy mcollective
  - Pub and Sub system for sending action commands to hosts.
  - Message broker needs ACLs on queues corresponding to full diversity of CERN hosts and actions.
  - Data mine puppetdb.
• Workflow
  - Move to git pull request process for central configuration.
14. OpenStack Deployment
• Currently Essex code base from the EPEL repository
• Good experience with the Fedora cloud-sig team
• Cloud-init for contextualisation, oz for images with RHEL/Fedora
• Components
  - Nova on KVM and Hyper-V
  - Keystone integrated with Active Directory
  - Glance with Oz
  - Horizon
• Test bed of 100 Hypervisors, 2000 VMs integrated with CERN infrastructure, Puppet
15. OpenStack AD Integration
• CERN's Active Directory
• Unified identity management across the site
  - 44,000 users, 29,000 groups
  - 200 arrivals/departures per month
• Full integration with Active Directory via LDAP
  - Slightly different schema from OpenLDAP
  - Aim to minimise changes to AD Schema
  - 7 patches submitted around hard coded values and additional filtering
• Now in use for our pre-production instance
  - Model project definitions in Active Directory
  - Map roles to groups
16. Welcome Back Hyper V
• We currently use Hyper-V/System Centre for our server consolidation
  - Over 3,200 VMs, 60% Linux/40% Windows
• Choice of hypervisors should be tactical
  - Performance
  - Compatibility/Support with integration components
  - Image migration
• CERN is working closely with the Hyper-V OpenStack team
  - Puppet to configure hypervisors on Windows
  - Most functions work well but further work on Console, Ceilometer, ...
17. OpenStack Next Steps
• Deploy into production
  - Target for production is start of 2013 with Folsom
  - Use current grid model running on top of OpenStack
• Deploy multi-site
  - Extend to 2nd data centre in Hungary and disaster recovery
• Deploy new functionality
  - Ceilometer for accounting
  - Bare metal for non-virtualised use cases such as high I/O servers
  - PKI and X.509 user certificate authentication
  - Load balancing as a service
• Deploy at scale
  - Move towards 15,000 hypervisors over next two years
  - Estimate 100,000-300,000 virtual machines
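A quick sanity check of the scale targets, using only figures from the slides (15,000 hypervisors, 100,000 to 300,000 VMs, and the current test bed of 100 hypervisors with 2000 VMs). The helper name is illustrative, and an even spread of VMs over hypervisors is an assumption.

```python
def vms_per_hypervisor(vms, hypervisors):
    """Average number of VMs per hypervisor, assuming an even spread."""
    return vms / hypervisors


test_bed = vms_per_hypervisor(2_000, 100)
target_low = vms_per_hypervisor(100_000, 15_000)
target_high = vms_per_hypervisor(300_000, 15_000)

print(f"test bed:    {test_bed:.1f} VMs per hypervisor")
print(f"target low:  {target_low:.1f} VMs per hypervisor")
print(f"target high: {target_high:.1f} VMs per hypervisor")
```

On these numbers the upper estimate corresponds to the density already seen on the test bed, while the lower estimate is about a third of it.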
18. Community Interactions
• CERN presenting to community/vendors.
  - PuppetConf, San Francisco, Sep 2012
  - Openstack Summit, San Francisco, Apr 2012
  - Openstack Summit, San Diego, Oct 2012 (now)
  - PuppetCamp, Geneva, July 2012
• CERN has code contributions to:
  - facter, the foreman, puppet, various puppet modules, mcollective, openstack nova, keystone and swift.
  - This is increasing as new students/fellows are employed for their puppet, ruby, .. skills.
• CERN puppet-users meeting, IT, ATLAS pit, ..
• Share our own http://github.com/cernops
19. Other AI Services
• Agile is not just Puppet and Openstack.
• AI created a gitolite ACL'ed GIT service.
  - CERN IT is now provisioning a public GIT service based on this.
  - AI will migrate its projects ASAP.
• AI created a Koji service for RPMs.
  - Creates RPMs and publishes to yum.
  - The service is now being used by others within IT, e.g. castor builds, data management, lemon, ...
• AI ran jira early before a central service was created.
  - AI already migrated to central service.
20. AI Service in Production
• Several Services running now on AI.
  - Some CVMFS components.
  - SLC6 batch services
  - SLC build machines
  - GIT gateways.
  - CASTOR (compass VO)
  - Test systems, glusterfs, swift, ..
  - New top level hostgroups every week now.
• From November AI opening up more.
  - Experiment services (voboxes) will start to use AI service.
  - Documentation to be updated/consolidated.
21. Conclusions
• Agile Infrastructure Project
• We are ready for hardware arriving in Budapest in 2013.
  - Puppet configured VMs on Puppet configured OpenStack.
• Documentation:
  - More user facing documentation needed.
• Configuration with Puppet:
  - Services needing knowledge of everything
  - Inter sysadmin trust.
  - Test facility for AI.
• OpenStack deployment
  - Increase scale.
22. URLs
• AI Project Pages: http://cern.ch/go/7vFF
• CERN modules: http://github.com/cernops
• CERN agile tickets: https://agileinf.its.cern.ch/jira
• AI Presentations: http://cern.ch/go/6qRG