A Framework for Modeling and Execution of Infrastructure Contention Experiments
Carmelo Ragusa, Philip Robinson and Sergej Svorobej
MERMAT 2013, FIA, 7 May, Dublin
© 2013 SAP AG or an SAP affiliate company. All rights reserved.
Agenda
Introduction
- Problem analysis of conducting experiments on resource sharing in multi-tenant systems
Proposed solution
- COCOMA framework: objectives, design, benefits and stakeholders
Use case
- How resource contention affects users under different partitioning configurations of physical resources
Summary and future work
Motivation
Resource (memory, disk, network, CPU) contention occurs in shared multi-tenant environments such as clouds.
[Figure: the SuT runs on a Cloud Infrastructure shared with Unknown co-located tenants]
Experimenters may be interested in more detailed cloud experiments that investigate how the System under Test (SuT) performs under the various contention patterns that may occur in practice.
Problem:
- How can we study these issues?
- How can we emulate multi-tenancy behaviour?
- How can we reproduce the same experiments?
Requirements for supporting rigorous software testing in contentious, multi-tenant environments
- Scalability
- Reproducibility
- Portability
- Extensibility
- Self-containment
- Controllability
Current approaches to reproducing operational conditions such as contention
- Manual: using a number of resource-specific tools and command-line operations to manipulate operational conditions.
- Client-Based: simulating usage with multithreaded client-side request generators that load the target environment and raise the required operational conditions.
- Ad-Hoc Scripting: using custom-built scripts for higher-level coordination of load generation and test execution.
- Each of these approaches fails, in some way, to meet the requirements listed above.
Proposed solution
A framework for COntrolled COntentious and Malicious (COCOMA) patterns: deliberately making the platform misbehave through contention, faults and attacks.
[Figure: Cloud Infrastructure hosting the SuT, Unknown co-located tenants and COCOMA]
Experimenters will be able to:
- study their system under real-world conditions
- control those conditions
- reproduce the exact conditions
- correlate the conditions with the results of their system under test
- use those findings to discover weaknesses and tune or enhance their system
Design principles
- Separation of concerns
- A unified and coherent interface for staging experiments
- Advanced workload patterns via distribution algorithms
- Abstraction from low-level tools
- Easy composition of complex patterns
- Easy extensibility
[Figure: COCOMA architecture. COCOMA runs in a Virtual Machine and takes a resource contention experiment described in XML; Distribution Algorithms drive Emulators (Stressapptest, Lookbusy, Iperf, etc.)]
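The architecture takes a resource contention experiment described in XML. The sketch below shows how such a description could be parsed into distribution-emulator pairs with Python's standard `xml.etree.ElementTree`. The element, attribute and parameter names (`emulation`, `distribution`, `param`, `resource`, `emulator`, `duration`, `granularity`, `startLoad`, `stopLoad`, `peak`) are assumptions for illustration only, not COCOMA's actual XML schema; the real format is in the COCOMA repository.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical experiment description: element, attribute and parameter
# names are illustrative assumptions, NOT COCOMA's real XML schema.
EXPERIMENT_XML = """
<emulation name="cpu-and-ram-contention">
  <distribution type="linear" resource="CPU" emulator="lookbusy"
                duration="60" granularity="6">
    <param name="startLoad">10</param>
    <param name="stopLoad">90</param>
  </distribution>
  <distribution type="trapezoidal" resource="RAM" emulator="stressapptest"
                duration="120" granularity="12">
    <param name="peak">4096</param>
  </distribution>
</emulation>
"""


@dataclass
class Distribution:
    type: str           # workload trend, e.g. "linear" or "trapezoidal"
    resource: str       # resource type the emulator is bound to
    emulator: str       # low-level tool that executes each run
    duration: int       # seconds (assumed unit)
    granularity: int    # number of runs (assumed meaning)
    params: dict = field(default_factory=dict)


@dataclass
class Emulation:
    name: str
    distributions: list  # one entry per distribution-emulator pair


def parse_emulation(xml_text: str) -> Emulation:
    """Parse the hypothetical XML format into distribution-emulator pairs."""
    root = ET.fromstring(xml_text.strip())
    distributions = []
    for node in root.findall("distribution"):
        params = {p.get("name"): float(p.text) for p in node.findall("param")}
        distributions.append(Distribution(
            type=node.get("type"),
            resource=node.get("resource"),
            emulator=node.get("emulator"),
            duration=int(node.get("duration")),
            granularity=int(node.get("granularity")),
            params=params,
        ))
    return Emulation(name=root.get("name"), distributions=distributions)


emulation = parse_emulation(EXPERIMENT_XML)
for d in emulation.distributions:
    print(d.resource, d.emulator, d.type, d.params)
```

Keeping each emulator bound to one resource type inside its distribution mirrors the separation of concerns above: the distribution decides *how much* load over time, the emulator decides *how* that load is produced.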
Concepts and terminology
- An emulation is composed of distribution-emulator pairs
- An emulator in a distribution is bound to a specific resource type
- A distribution is a workload trend
- Distributions are broken down into multiple runs to create the desired trend
- Runs are single instantiations of low-level tools
- For complex scenarios, users can specify multiple distribution-emulator pairs
[Figure: data model. An Emulation contains one or more Distribution-Emulator pairs; each Emulator is bound to a Resource-Type (e.g. CPU, RAM, Net, I/O). A Distribution has a type (e.g. linear, trapezoidal, exp, trace), a duration, a granularity and parameters[].]
[Figure: emulation timeline from t1 to tn, with Distribution-1 (CPU), Distribution-2 (RAM), Distribution-3 (NET) and Distribution-4 (I/O) running over the emulation time]
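A distribution produces its trend by breaking its duration into runs, each a single invocation of a low-level tool at some load level. The following is a minimal sketch of that sampling idea for the linear and trapezoidal types, assuming that `granularity` is the number of equal-length runs and that a trapezoid ramps up over its first third and down over its last third. The `Run` type and the function names are hypothetical, not taken from COCOMA.

```python
from dataclasses import dataclass


@dataclass
class Run:
    start: float     # offset from the start of the distribution, in seconds
    duration: float  # how long this single tool invocation lasts, in seconds
    load: float      # target load level for the run (e.g. % CPU, MB of RAM)


def linear_runs(duration: float, granularity: int,
                start_load: float, stop_load: float) -> list:
    """Sample a linear trend from start_load to stop_load into equal runs."""
    step = duration / granularity
    runs = []
    for i in range(granularity):
        # Fraction of the way through the trend; the last run hits stop_load.
        frac = i / (granularity - 1) if granularity > 1 else 0.0
        load = start_load + frac * (stop_load - start_load)
        runs.append(Run(start=i * step, duration=step, load=load))
    return runs


def trapezoidal_runs(duration: float, granularity: int, peak: float) -> list:
    """Sample a trapezoid: ramp up (first third), hold, ramp down (last third)."""
    step = duration / granularity
    runs = []
    for i in range(granularity):
        t = (i + 0.5) / granularity  # midpoint of run i, normalised to [0, 1]
        if t < 1 / 3:
            level = 3 * t
        elif t > 2 / 3:
            level = 3 * (1 - t)
        else:
            level = 1.0
        runs.append(Run(start=i * step, duration=step, load=peak * level))
    return runs


for run in linear_runs(duration=60, granularity=6, start_load=10, stop_load=90):
    print(f"t={run.start:>4.0f}s  {run.duration:.0f}s at load {run.load:.0f}")
```

A larger granularity gives a smoother trend at the cost of more tool invocations; several such distributions (CPU, RAM, NET, I/O) can then overlap on the same emulation timeline.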
Use case: how resource contention affects users under different partitioning configurations of physical resources
Test environment: a single machine with 4 CPU cores and 16 GB of RAM. We abstract resource values into ranges from Low to High:

Level            | CPU (cores) | RAM (GB range)
L (Low)          | 1           | 1-4
ML (Medium-Low)  | 2           | 5-8
MH (Medium-High) | 3           | 9-12
H (High)         | 4           | 13-16

Each configuration represents VMs with different resources assigned, providing a specific partitioning of the physical resources:

Conf | COCOMA CPU | COCOMA RAM | COCOMA VMs | SuT CPU | SuT RAM | Inactive VM CPU | Inactive VM RAM | Free CPU | Free RAM
1    | MH         | H          | 1          | L       | L       | n/a             | n/a             | n/a      | n/a
2    | ML         | ML         | 1          | ML      | ML      | n/a             | n/a             | n/a      | n/a
3    | L          | L          | 1          | MH      | H       | n/a             | n/a             | n/a      | n/a
4    | MH         | H          | 3          | L       | L       | n/a             | n/a             | n/a      | n/a
5    | MH         | L          | 3          | L       | L       | n/a             | n/a             | n/a      | n/a
6    | L          | H          | 1          | H       | L       | n/a             | n/a             | n/a      | n/a
7    | ML         | ML         | 1          | L       | L       | n/a             | n/a             | L        | ML
8    | ML         | ML         | 1          | L       | L       | L               | ML              | n/a      | n/a
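The configuration table can be encoded directly to reason about how the physical machine is split. This sketch, with hypothetical names throughout, maps each level to its CPU cores and RAM range as transcribed from the use-case tables, computes COCOMA's share of the 4 physical cores, and groups configurations by COCOMA's CPU level. The Inactive VM and Free resources columns are omitted for brevity.

```python
PHYSICAL_CORES = 4  # test machine: 4 CPU cores, 16 GB RAM

# Level -> (CPU cores, (min RAM GB, max RAM GB)), transcribed from the level table.
LEVELS = {
    "L": (1, (1, 4)),
    "ML": (2, (5, 8)),
    "MH": (3, (9, 12)),
    "H": (4, (13, 16)),
}

# Conf -> (COCOMA CPU, COCOMA RAM, COCOMA VMs, SuT CPU, SuT RAM).
# The Inactive VM and Free resources columns are left out of this sketch.
CONFIGS = {
    1: ("MH", "H", 1, "L", "L"),
    2: ("ML", "ML", 1, "ML", "ML"),
    3: ("L", "L", 1, "MH", "H"),
    4: ("MH", "H", 3, "L", "L"),
    5: ("MH", "L", 3, "L", "L"),
    6: ("L", "H", 1, "H", "L"),
    7: ("ML", "ML", 1, "L", "L"),
    8: ("ML", "ML", 1, "L", "L"),
}


def cocoma_cpu_share(conf: int) -> float:
    """Fraction of the physical cores assigned to COCOMA in a configuration."""
    cores, _ = LEVELS[CONFIGS[conf][0]]
    return cores / PHYSICAL_CORES


def cocoma_ram_range(conf: int) -> tuple:
    """RAM range (GB) assigned to COCOMA in a configuration."""
    _, ram = LEVELS[CONFIGS[conf][1]]
    return ram


def group_by_cocoma_cpu() -> dict:
    """Group configuration numbers by COCOMA's CPU level."""
    groups = {}
    for conf, (cpu_level, *_rest) in CONFIGS.items():
        groups.setdefault(cpu_level, []).append(conf)
    return groups


for level, confs in group_by_cocoma_cpu().items():
    share = cocoma_cpu_share(confs[0])
    print(f"COCOMA CPU {level}: {share:.0%} of cores -> configurations {confs}")
```

Grouping this way yields MH for configurations 1, 4 and 5, ML for 2, 7 and 8, and L for 3 and 6, which are the same three groups the speaker notes identify as clusters in the CPU benchmark results.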
Validation use case
- sysbench CPU benchmark, calculating the first 100K prime numbers
- memspeed RAM benchmark
- tiobench I/O benchmark
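The validation benchmarks measure how much the SuT degrades while COCOMA generates contention. The slides do not define the degradation metric, so the following sketch assumes it is the relative increase in benchmark completion time over an uncontended baseline run. The function name and the timings in the usage line are illustrative, not measured results.

```python
def degradation_pct(baseline_s: float, contended_s: float) -> float:
    """Percentage slowdown of a benchmark under contention (assumed metric).

    baseline_s:  completion time with no contention, in seconds
    contended_s: completion time while COCOMA runs an emulation, in seconds
    """
    if baseline_s <= 0:
        raise ValueError("baseline_s must be positive")
    return (contended_s - baseline_s) / baseline_s * 100.0


# Illustrative timings only: a run taking 30 s instead of 20 s is 50% slower.
print(degradation_pct(20.0, 30.0))  # 50.0
```

Normalising by the baseline makes results comparable across benchmarks with very different absolute run times (CPU, RAM and I/O tests).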
Stakeholders
- Performance testers/engineers
- Cloud Service Providers
- Cloud Application Administrators
- Application Developers and Testers
- Benchmarks and Standards Groups
Summary
- Experimenters need means to recreate the conditions of cloud-like, multi-tenant shared systems in order to test their solutions
- Current approaches are limited, as they fail to meet the requirements of such environments
- We proposed a solution that:
  - abstracts from low-level tools
  - enables repeatable scenarios, from simple to complex, reducing experimenters' effort
  - allows investigating system behaviour and correlating it with the specific conditions created
- A simple validation use case was presented
Future work
- Implementing malicious and fault modules, and the related distributions
- Parsing and replaying real system workload traces
- A distributed COCOMA solution to enable large, complex distributed scenarios
- A Web UI to manage the framework
- Contention at CPU cache level
- A tool to record system traces in a COCOMA-compatible format for replay purposes
- Extending the emulation analyser to the distributed version, to get warnings for emulations of large scenarios
Acknowledgements
This work has been carried out within the BonFIRE project, which has received research funding from the EC's Seventh Framework Programme (EU ICT-2009-257386 IP under the Information and Communication Technologies Programme).
Thank you
Contact information:
Carmelo Ragusa
SAP HANA Cloud Infrastructure, Belfast
carmelo.ragusa@sap.com
COCOMA is released as Open Source under Apache v2 license:
https://github.com/cragusa/cocoma

Editor's Notes

  1. Challenges for multi-tenancy include optimisation of resource sharing and guaranteed isolation against physical limitations, co-located faults and malicious attacks. For these reasons, testing the performance and resilience of applications with different hardware, platforms, configurations, resource sharing conditions and user loads is important for increasing the assurance that providers and consumers have in cloud applications.
  2. Scalability: given that customer applications can vary from 1 to 1000s of nodes, it must be possible to readily set up and execute useful testing environments for 1 to n independent hosts and network elements, avoiding cumbersome, error-prone configuration.
     Reproducibility: it must be possible to easily repeat testing conditions and tests in order to perform viable regression testing and make reliable claims about software quality.
     Portability: as hardware and virtualization technologies change, or as applications are migrated to alternative data centers and platforms, it should be possible to easily reuse and recreate test designs and conditions during these changes.
     Extensibility: in addition to portability, test designs will need to be modified over time to take into account changes in quality constraints, scope, expected demands and software functionality. Systematic, guided procedures for modifying and extending test designs, and mechanisms for these changes, are necessary, as opposed to starting from scratch each time or making changes without knowledge of all dependencies.
     Self-containment: it is desirable to have a single top-level solution, operational interface and workflows for designing and executing tests, as opposed to the tester having to switch between multiple contexts and tools.
     Controllability: in spite of abstraction and higher-level tooling, there is still a need for control over the behaviour of the resources used in the test, minimising disturbances that might introduce unknown variations in test results.
  3. Each requirement is assessed considering four different perspectives that arise in real-world systems with changing business priorities and technologies:
     - Different test types (e.g. functional, load, security) need to be performed.
     - A variety of resource kinds (i.e. network, disk, CPU, memory) need to be manipulated.
     - Heterogeneity of physical hosts, nodes and devices.
     - Given different customers, different scenarios and workload mixes have to be considered in parallel.
  4. The main principles behind the design of the framework are abstraction from the lower-level tools used to create load on the resources, which allows the desired contention to be emulated, and separation of concerns, which provides an effective modularisation of the tool and makes it easy to extend with new emulators and distributions.
  5. A distribution creates its trend through runs, as in a sampling process.
  6. Given a shared physical machine, in each configuration we divide the physical resources among the VMs (SuT and COCOMA), each time assigning them a different percentage of the total.
  7. CPU: SuT CPU benchmark degradation versus the percentage of CPU used by COCOMA.
     - Three clusters can be identified:
       - in the highest are configurations 1, 4 and 5, which according to table II all have the MH CPU setup for COCOMA;
       - below them are configurations 2, 7 and 8, which have the ML CPU setup for COCOMA;
       - in the lowest part are configurations 6 and 3, with the L CPU setup for COCOMA.
     - As expected, the more physical CPU COCOMA controls, the more the SuT is affected when CPU-intensive operations are performed.
     RAM: COCOMA used the maximum assigned RAM while increasing the number of threads performing write operations on the RAM.
     - The amount of RAM and the overall number of VMs assigned to COCOMA influence the results.
     - In configurations 1, 2, 3 and 6, which all have 2 VMs in total, the more RAM is assigned to COCOMA, the more the SuT is affected.
     - Comparing configurations 5 and 1, the difference in results is due only to the total number of VMs, this being the only differentiating parameter between the two configurations.
     I/O:
     - The number of files used for the workload does not make any noticeable difference across configurations.
     - Configurations 4 and 5 (with 3 VMs assigned to COCOMA vs 1 to the SuT) suffer the most.
  8. Performance testers/engineers: practitioners investigating, for example, new colocation algorithms, and generally needing to create a contentious/malicious environment to conduct their tests.
     Cloud Service Providers: a service provider may offer performance isolation mechanisms and therefore want to test the effectiveness of those mechanisms, investigate the possibility of offering them, or study what characteristics applications need in order to coexist.
     Cloud Application Administrators: administrators may need to check, when a system is restored after maintenance or a crash, that performance isolation mechanisms are working correctly.
     Application Developers and Testers: application developers may want to investigate the effects of contention on their system, while testers may want to check whether providers' isolation mechanisms work sufficiently.
     Benchmarks and Standards Groups: the framework can be used to validate cloud patterns and workloads under investigation and/or characterisation.
  9. On the malicious side, we are looking into covert channels (or side channels) at cache level to infer other processes' information and data, as well as at network level to obtain information about co-located guests, such as IP addresses. The latter could be used by fuzzers to send malicious workloads, or to mount a DoS attack.
  10. BonFIRE is an EU project that is designing, building and operating a multi-site cloud-based facility on top of six infrastructures offering heterogeneous Cloud resources, including compute, storage and network. BonFIRE is geared towards experimentation and research into Cloud/IoS, and offers facilities to easily create, manage and monitor experiments, while giving experimenters more information about, and control over, Cloud resources than other public Cloud providers offer.