JAOO 2010
In this session, we'll run a retrospective on our efforts to break down organizational barriers with continuous deployment and other DevOps goodness. We'll talk about what we have done with tools and practices like CI and build pipelines, Puppet and Yum. We'll also address some puzzles we have encountered, such as massive data deployments to many global data centres, and replacing silos with cross-functional teams in a complex, evolving environment.
http://jaoo.dk/aarhus-2010/presentation/Continuous%20Deployment%20and%20DevOps:%20Deprecating%20Silos
Continuous Deployment and DevOps: Deprecating Silos - JAOO 2010
1. CONTINUOUS DEPLOYMENT
AND DEVOPS
DEPRECATING SILOS
JOSH DEVINS, NOKIA JAOO 2010
TOM SULSTON, THOUGHTWORKS ÅRHUS, DENMARK
2. WHO ARE WE AND WHERE
ARE WE FROM?
Josh Devins, Nokia Berlin
Software architect, Location Services
Sysadmin of honour
Tom Sulston, ThoughtWorks
Lead consultant
DevOps, build & deploy
36. JOIN US!
Nokia is hiring in Berlin!
www.nokia.com/careers
ThoughtWorks is hiring in London, Hamburg and further
abroad.
www.thoughtworks.com/jobs
37. THANKS!
JOSH DEVINS www.joshdevins.net @joshdevins
TOM SULSTON www.thoughtworks.com @tomsulston
Editor's Notes
#3: Flip to Ovi Maps, describe what the product is (kind of)
#4: A few words of introduction on what the “before” state was
- web and device
- growth from startup to millions of devices/mo
- free navigation earlier this year increased usage
- rapid feature and team growth
#5: http://www.flickr.com/photos/tonyjcase/4092410854/sizes/l/in/photostream/
Developers and operations teams separated both organisationally and physically
Whole different organisational structure - need to go to C-level (VP-level?) to find a common reporting line
Started as a hardware company, and really bolted on services at the beginning
Poor alignment of technology choices (base OS, packaging, monitoring)
Very little common ground, because...
#6: - lots of technology/approach divergence caused by:
- many ops teams - “operations”, “transitions”, “development support”
- many development teams - frontend, backend, backend function x/y/z
- Conway’s Law
- short term scaled well and fast
- right intention of giving small teams autonomy but...balance needed
- Lots of integration points
- more complexity than necessary
- lots of inventory
- Integration is v. painful
#7: - lots of things done by hand, non-repeatable
QA, almost nothing automated (except where really necessary -- perf tests)
Baroque configuration process
Releases take a long time and a lot of manual testing/verification
Cycle time is very slow
Right intentions, did not scale
- change management process (?)
- carrying knowledge/understanding across silos has a cost (x4)
Frequent rework - fixing the same problem again and again and usually at the last-minute
#8: http://www.flickr.com/photos/14608834@N00/2260818367/sizes/o/in/photostream/
- reality: about one and a half people knew how the whole thing worked end-to-end
- reality: ~10 days to build a new image with Java, 5 Tomcat instances, as many WAR files, nothing else!
- worse: the "image system" was not used anywhere except staging and production, so failures came very late
- maintenance: in dev/QA, regular Debian systems with DEB packaging were used, so we essentially had to maintain two complete distribution mechanisms
- change management process is heavyweight
- ITIL++, multi-tab Excel spreadsheets, CABs in other countries, not directly involved
- often circumvented
- communication gaps between ops teams
- package and config structure (ISO + rsync)
- it worked, but was slow and cryptic
- building whole OS images in CI is very slow and non-parallelisable (4 hrs?)
- multi-phased approach requiring first a custom packaging system and description language (VERY cryptic and bespoke)
- using PXE Linux to boot images from a central control server for configuration rsync
- any booted server can act as a peer to boot other machines
#9: http://www.flickr.com/photos/14608834@N00/2260818367/sizes/o/in/photostream/
- lots of things done by hand, non-repeatable
- “We don’t have time to do it right”
- time-to-recovery is slow
- monitoring is:
inconsistent (lots of false alarms)
unclear (multiple tools, teams)
too coarse (the site is down!)
- hard to triage infrastructure or code issues
- inventory management is weak
- many data centres
- not enough knowledge kept in-house
#10: - Any questions on describing the problem?
- has anyone got similar problems?
- What actions did we take to address these issues?
Time check: 20 mins
#11: http://www.flickr.com/photos/snogging/4688579468/sizes/l/
- what is continuous delivery?
- Continuous Delivery: every SCM commit results in releasable software
- that is, from a purely infrastructural and "binary-level" perspective, the software is always releasable
- This includes layers of testing, not just releasing anything that compiles!
- features may be incomplete, etc., so in practice you might not actually release every commit (releasing every commit would be Continuous Deployment)
- “If something hurts, do it more often”
- You should have gone to Jez’s session this morning!
#12: http://www.uvm.edu/~wbowden/Image_files/Pipeline_at_Kuparuk.jpg
- how do we get from a SCM commit to something that is deployable and tested enough?
- Building the ‘conveyor belt’
- Turn up existing CI practices to 11
- Each team already did “build & unit test” - no deployable package (WARs to Nexus)
- Automated integration of various teams’ work
- Automated integration testing
- Testing deployments - same method on all environments
- Currently using Hudson & Ant - this works OK.
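A minimal sketch of the "conveyor belt" idea from the notes above: one commit moves through ordered stages and stops at the first failure. The stage names and the lambda checks are hypothetical stand-ins, not the actual Hudson/Ant jobs; real checks would call the build tool and test runners.

```python
from typing import Callable, List, Tuple

# Each stage is a (name, check) pair; check returns True when the stage passes.
Stage = Tuple[str, Callable[[str], bool]]


def run_pipeline(commit: str, stages: List[Stage]) -> List[str]:
    """Run stages in order for one commit and stop at the first failure.

    Returns the names of the stages that passed.
    """
    passed = []
    for name, check in stages:
        if not check(commit):
            break
        passed.append(name)
    return passed


# Hypothetical stages and checks, for illustration only.
stages = [
    ("build-and-unit-test", lambda commit: True),
    ("package", lambda commit: True),
    ("integration-test", lambda commit: commit != "bad-commit"),
    ("deploy-to-test-env", lambda commit: True),
]

print(run_pipeline("abc123", stages))
print(run_pipeline("bad-commit", stages))
```

A commit is only "deployable and tested enough" when the returned list contains every stage; anything shorter tells you where the belt stopped.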
#13: http://www.petsincasts.com/?p=162
- workaround: don't use the Maven "release" process or just live with it and do Maven "releases" as often as possible
- lesson learned: don't try to mess with "the Maven way", it gets very hairy and is a huge time suck
- lesson learned: don't depend on SNAPSHOT dependencies unless they are under your own control (can't safely release your module with SNAPSHOT deps meaning you will have to wait for someone else to release their module)
- standard Maven versioning lifecycle: 1.0.0-SNAPSHOT, pull down dependencies (some SNAPSHOTs themselves) from some repository (usually one that is not integrated with your source code repository)
- working away on 1.0.0-SNAPSHOT and I'm ready to release so then do a Maven "release", tagging SCM, and I get version 1.0.0
- crap we found a bug, so we keep working now on version 1.0.1-SNAPSHOT
- okay, ready to release again so I get version 1.0.1
- do some testing and everything is happy so I drop my 1.0.1 war into my production Tomcat
- what's wrong with this picture?
- key: we "release" software BEFORE we are satisfied with its quality
- like we said before, continuous delivery is all about the possibility of releasing to production at all times, from all commits
#14: CDC - Consumer-Driven Contract
http://www.martinfowler.com/articles/consumerDrivenContracts.html
Each service/team provides tests for those teams whose services they consume. (ie: If I use your service, I write you a test that expresses how I am using it. You can then run that test in your build.)
Lets us do quick integration-type testing at the unit/functional level.
Much easier than maintaining stubs.
Designed to catch integration failures earlier (typical failure mode is for clients/servers to diverge while still passing their own tests, only to be caught at manual QA stages)
Ceremony for giving tests to another team
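A small sketch of what a consumer-driven contract test could look like, assuming a hypothetical provider function `geocode` and a hypothetical consuming "routing" team. The consumer writes the test case and hands it over; the provider runs it in their own build. It asserts only on the fields this consumer actually uses, so the provider stays free to change everything else.

```python
import unittest


# Provider side: a hypothetical service function owned by the provider team.
def geocode(address: str) -> dict:
    return {"lat": 52.52, "lon": 13.405, "label": address, "confidence": 0.9}


# Consumer side: a contract written by the (hypothetical) routing team.
# It checks only the parts of the response that this consumer relies on.
class RoutingTeamContract(unittest.TestCase):
    def test_response_has_numeric_coordinates(self):
        result = geocode("1 Example Street")
        self.assertIsInstance(result["lat"], float)
        self.assertIsInstance(result["lon"], float)

    def test_coordinates_are_in_valid_range(self):
        result = geocode("1 Example Street")
        self.assertTrue(-90.0 <= result["lat"] <= 90.0)
        self.assertTrue(-180.0 <= result["lon"] <= 180.0)


# The provider runs every consumer's contract as part of its own build.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(RoutingTeamContract)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

If the provider renames `lat` or starts returning strings, this fails in the provider's build rather than at a manual QA stage.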
#15: http://www.flickr.com/photos/delgrossodotcom/2553424895/
- Build once!
- passing deployable packages (RPMs) up the value chain
- Categorically 100% sure that you’re testing what you’re going to deploy
- Can wrap up all sorts of useful things in OS packages
- reference data
- hook scripts
- dependencies on tiered applications
- build pipeline of repositories
- Each repo means “X level of testing has been done on these packages”
- gotcha: createrepo caching
- gotcha: no concurrent running of createrepo
- gotcha: using metapackages to join versions (Might re-introduce in future)
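One way to picture the pipeline of repositories, as a sketch under assumptions: promotion copies the already-built package file, unchanged, into the next repository, and the index rebuild is serialised with a lock because concurrent runs were a gotcha. The repository names, the package file name, and the `rebuild_index` callback are all hypothetical; a real callback might shell out to createrepo for the target directory.

```python
import shutil
import tempfile
import threading
from pathlib import Path
from typing import Callable

# One lock for all index rebuilds in this process, so they never overlap.
_index_lock = threading.Lock()


def promote(package: str, source_repo: Path, target_repo: Path,
            rebuild_index: Callable[[Path], None]) -> Path:
    """Copy an already-built package, byte for byte, into the next repository."""
    target_repo.mkdir(parents=True, exist_ok=True)
    promoted = target_repo / package
    shutil.copy2(source_repo / package, promoted)
    with _index_lock:
        rebuild_index(target_repo)  # stand-in for regenerating repo metadata
    return promoted


# Demo with temporary directories and a stand-in indexer that records calls.
indexed = []
with tempfile.TemporaryDirectory() as tmp:
    unit_tested = Path(tmp) / "unit-tested"
    integration_tested = Path(tmp) / "integration-tested"
    unit_tested.mkdir()
    name = "myapp-1.0-42.noarch.rpm"  # hypothetical package file name
    (unit_tested / name).write_bytes(b"package bytes")

    promoted = promote(name, unit_tested, integration_tested, indexed.append)
    same_bytes = promoted.read_bytes() == (unit_tested / name).read_bytes()

print(same_bytes, [path.name for path in indexed])
```

Because nothing is rebuilt between repositories, the package that reaches the last repository is the same file that passed the first stage. Note that the lock only covers a single process; separate build jobs would need a file lock or a single promotion job.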
#16: - not doing this yet, but here are some ideas
- Currently using MySQL - is there a need to change to a key/value store?
- RDBMS: check out ???
- NoSQL: big, huge question mark and little tooling support, so consider this seriously if considering NoSQL
- some teams are using BitTorrent to distribute large (GB and TB) datasets around the world - Lucene indices, map files, etc.
- similar to the idea that Twitter uses to deploy stuff with their Murder tool
- can we use dbdeploy?
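For the dbdeploy question, a rough sketch of the general technique (numbered delta scripts plus a changelog table recording what has been applied). This is not dbdeploy itself; it uses an in-memory SQLite database instead of MySQL so it runs anywhere, and the table and script contents are illustrative assumptions.

```python
import sqlite3
from typing import Dict, List


def apply_deltas(conn: sqlite3.Connection, deltas: Dict[int, str]) -> List[int]:
    """Apply, in numeric order, every delta not yet recorded in the changelog.

    Returns the numbers of the deltas applied by this call.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS changelog (change_number INTEGER PRIMARY KEY)"
    )
    applied = {row[0] for row in conn.execute("SELECT change_number FROM changelog")}
    newly_applied = []
    for number in sorted(deltas):
        if number in applied:
            continue
        conn.executescript(deltas[number])
        conn.execute("INSERT INTO changelog (change_number) VALUES (?)", (number,))
        newly_applied.append(number)
    conn.commit()
    return newly_applied


# Hypothetical delta scripts, keyed by their sequence number.
deltas = {
    1: "CREATE TABLE places (id INTEGER PRIMARY KEY, name TEXT);",
    2: "ALTER TABLE places ADD COLUMN country TEXT;",
}

conn = sqlite3.connect(":memory:")
first_run = apply_deltas(conn, deltas)
second_run = apply_deltas(conn, deltas)
columns = [row[1] for row in conn.execute("PRAGMA table_info(places)")]
print(first_run, second_run, columns)
```

Running the same set twice is safe, which is the property that would let schema changes travel through the same pipeline as the code.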
#17: - Puppet overview & alternatives (Chef, CFEngine, hand-rolled tools)
- manifests
- modules and inheritance
- passing Puppet configs with deployable code + configs
- Driven from developer-facing sysadmins
#18: - infrastructure testing with cucumber-puppet
- applying good development practices to the Ops world
- absolutely crucial to having a refactorable infrastructure
- how unchanging are your systems?
- can we start doing Behaviour-driven releases?
- This is alpha software!
- Does not catch all errors
#19: Configurations passed up from development team through Subversion
Deployed with Puppet
Tested with cucumber-puppet
Tested on application start for missing values
Bundling application deployments simplifies configuration
TODO: review architecture of all apps and simplify (easier now that deployment tech debt is reduced)
#20: http://www.flickr.com/photos/jimbl/2881681649/sizes/o/
- scripted checks before anything even happens
- ensure that the stage is set and all known pre-requisites are tested and monitored
- application health-check on startup (are all my config values set?)
- check_http through NRPE
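The startup health-check ("are all my config values set?") could be as small as the sketch below. The key names are hypothetical, and treating a blank value as missing is an assumption, not something the notes specify.

```python
from typing import List, Mapping

# Hypothetical required keys; a real application would list its own.
REQUIRED_KEYS = ["db.url", "db.user", "tiles.base_url"]


def missing_config(config: Mapping[str, str], required: List[str]) -> List[str]:
    """Return the required keys that are absent or blank, in the order given."""
    return [key for key in required if not config.get(key, "").strip()]


def check_config_or_fail(config: Mapping[str, str], required: List[str]) -> None:
    """Refuse to start when any required value is missing."""
    missing = missing_config(config, required)
    if missing:
        raise RuntimeError("Missing configuration values: " + ", ".join(missing))


# Example: one value blank, one absent.
example = {"db.url": "jdbc:mysql://db.example/app", "db.user": " "}
print(missing_config(example, REQUIRED_KEYS))
```

Failing at startup with the full list of missing keys turns a configuration mistake into an immediate, visible deployment failure instead of a runtime surprise.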
#21: http://www.flickr.com/photos/kylesteeddesign/4395772305/sizes/o/
- speaking of monitoring...
- Nagios, NRPE
- cucumber-nagios - Monitoring-driven deployments?
Would like developers to push up monitors alongside features.
- developers and engineers gaining common understanding around monitoring and system behaviour
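If developers are to push up monitors alongside features, a check can be a very small program. The sketch below follows the standard Nagios plugin return codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN); the function name, the thresholds, and the mapping of status codes to states are illustrative assumptions, and the HTTP request itself is left out so the logic can be tested without a network.

```python
from typing import Optional, Tuple

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_http(status: Optional[int], elapsed_seconds: float,
                  warn_after: float = 1.0) -> Tuple[int, str]:
    """Map an HTTP result to a Nagios-style (return code, message) pair.

    status is None when no response was received at all.
    """
    if status is None:
        return CRITICAL, "HTTP CRITICAL - no response"
    if status >= 500:
        return CRITICAL, f"HTTP CRITICAL - status {status}"
    if status >= 400:
        return WARNING, f"HTTP WARNING - status {status}"
    if elapsed_seconds > warn_after:
        return WARNING, f"HTTP WARNING - slow response ({elapsed_seconds:.2f}s)"
    return OK, f"HTTP OK - status {status} in {elapsed_seconds:.2f}s"


print(evaluate_http(200, 0.12))
print(evaluate_http(503, 0.05))
```

A wrapper script would perform the request, print the message, and exit with the return code, which is the contract a plugin runner such as NRPE expects.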
#22: ITIL is a framework. DevOps is a series of practices.
While you could have lightweight ITIL implementations, they tend to be process-heavy.
DevOps is about doing all the good technical diligence in a way that marries with Agile practices and values
- not dependent on tool choice
Build up shared understanding by automation
Jez: A document proves nothing. But a script is real proof that you have done what is in the script.
#23: (same notes as #22)
#24: - not doing continuous deployment, but are making-ready
- it takes time for large organisations to catch up to technical change
- addressing cultural issues
- building common understanding and shared ownership
#26: “Stock photos are the bullet points of the twenty-first century” - Martin Fowler