A presentation about COCOMA, a framework for COntrolled COntentious and MAlicious patterns, presented at MERMAT, 2nd International Workshop on Measurement-based Experimental Research, Methodology and Tools, FIA 2013, Dublin, Ireland
COCOMA presentation, FIA 2013
1. A Framework for Modeling and Execution of Infrastructure Contention Experiments
Carmelo Ragusa, Philip Robinson and Sergej Svorobej
MERMAT 2013, FIA, 7 May, Dublin
2. Agenda
Introduction
Problem analysis of conducting experiments about resource sharing in multi-tenancy
systems
Proposed solution
COCOMA framework: objectives, design, benefits and stakeholders
Use case
How resource contention affects users under different physical resource partitioning configurations
Summary and future work
3. Motivation
Resource (memory, disk, network, CPU) contention occurs in shared multi-tenant environments such as clouds
[Figure: cloud infrastructure hosting the System under Test (SuT) alongside unknown co-located tenants]
More detailed cloud experiments may be needed to investigate how the System under Test (SuT) performs under the various contention patterns that may occur in practice.
Problem:
How can we study these issues?
How can we emulate multi-tenancy behaviour?
How can we reproduce the same experiments?
4. Requirements for supporting rigorous software testing in contentious, multi-tenant environments
Scalability
Reproducibility
Portability
Extensibility
Self-containment
Controllability
5. Current approaches to reproduce operational conditions such as contention
Manual: use of a number of resource-specific tools and command-line
operations in order to manipulate operational conditions.
Client-Based: simulation of usage, using multithreaded client-side request generators in order to load the target environment and raise the required operational conditions.
Ad-Hoc Scripting: use of custom-built scripts for higher-level coordination of
load generation and test execution.
Each fails in some way to meet the previously listed requirements
6. Proposed solution
A framework for COntrolled COntentious and MAlicious (COCOMA) patterns: deliberately making the platform misbehave through contention, faults and attacks.
[Figure: cloud infrastructure hosting the SuT, unknown co-located tenants and COCOMA]
Experimenters will be able to:
study their system under real-world conditions
control those conditions
reproduce exact conditions
correlate conditions and results of their system under test
use those findings to discover weaknesses and tune/enhance their system
7. Design principles
Separation of concerns
Provide a unified and coherent interface for staging experiments
Allow advanced workload patterns via distribution algorithms
Abstraction from low-level tools
Easy composition of complex patterns
Easy extensibility
[Diagram: an experiment described in XML drives COCOMA, whose distribution algorithms control low-level emulators (stressapptest, lookbusy, iperf, etc.) to generate resource contention inside a virtual machine]
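As a rough illustration of the "Experiment XML" box above, the sketch below shows how a declarative experiment description could be turned into distribution-emulator pairs; the element names, attributes and helper function are invented for this example and do not reflect COCOMA's actual schema or API.

# Minimal sketch of a declarative experiment descriptor, under an assumed,
# purely illustrative schema.
import xml.etree.ElementTree as ET

EXPERIMENT_XML = """
<emulation name="cpu-ram-contention" duration="600">
    <pair>
        <distribution type="linear" granularity="60" start="10" end="90"/>
        <emulator resource="CPU" tool="lookbusy"/>
    </pair>
    <pair>
        <distribution type="trapezoidal" granularity="60" peak="12288"/>
        <emulator resource="RAM" tool="stressapptest"/>
    </pair>
</emulation>
"""

def parse_experiment(xml_text):
    """Turn the descriptor into (distribution, emulator) pairs,
    shielding the experimenter from the low-level tools."""
    root = ET.fromstring(xml_text)
    pairs = []
    for pair in root.findall("pair"):
        dist = pair.find("distribution").attrib
        emu = pair.find("emulator").attrib
        pairs.append((dist, emu))
    return root.attrib["name"], int(root.attrib["duration"]), pairs

name, duration, pairs = parse_experiment(EXPERIMENT_XML)
for dist, emu in pairs:
    print(f"{emu['resource']}: {dist['type']} trend via {emu['tool']}")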
8. Concepts and terminology
An emulation is composed of distribution-emulator pairs
An emulator in a distribution is bound to a specific resource type
A distribution is a workload trend
Distributions are broken down into multiple runs to create the desired trend
Runs are single instantiations of low-level tools
For complex scenarios users can specify multiple distribution-emulator pairs
[Class diagram: an Emulation is composed of Distribution-Emulator pairs; each Emulator is bound to a Resource-Type (e.g. CPU, RAM, Net, I/O); a Distribution has a type (e.g. linear, trapezoidal, exponential, trace), duration, granularity and parameters]
[Timeline: Distribution-1 (CPU), Distribution-2 (RAM), Distribution-3 (NET) and Distribution-4 (I/O) running in parallel over the emulation time from t1 to tn]
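To illustrate the distribution/run relationship, the sketch below samples a linear CPU-load trend at a fixed granularity and maps every sample onto one run of a low-level tool; the helper functions and the lookbusy/timeout invocation are assumptions for illustration rather than COCOMA's implementation.

# Sketch: break a linear distribution into runs, each run being a single
# instantiation of a low-level tool (here lookbusy for CPU load).
# Assumption: lookbusy's -c flag sets the target CPU utilisation percentage.

def linear_runs(start, end, duration, granularity):
    """Sample a linear trend from `start` to `end` percent over `duration`
    seconds, one run every `granularity` seconds."""
    steps = max(1, duration // granularity)
    return [round(start + (end - start) * i / (steps - 1)) if steps > 1 else end
            for i in range(steps)]

def cpu_commands(levels, granularity):
    """One run per sample: a time-boxed invocation of the emulator tool."""
    return [f"timeout {granularity} lookbusy -c {level}" for level in levels]

levels = linear_runs(start=10, end=90, duration=600, granularity=60)
for cmd in cpu_commands(levels, granularity=60):
    print(cmd)  # in a real emulation these runs would be executed in sequence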
9. Use case: How resource contention affects users for different physical resource partitioning configurations
Test environment: single machine with 4 CPU cores and 16 GB of RAM
Level              CPU (cores)   RAM (GB range)
L  (Low)           1             1 - 4
ML (Medium-Low)    2             5 - 8
MH (Medium-High)   3             9 - 12
H  (High)          4             13 - 16
        COCOMA                SuT          Inactive VM    Free resources
Conf    CPU   RAM   Num VMs   CPU   RAM    CPU   RAM      CPU   RAM
1       MH    H     1         L     L      n/a   n/a      n/a   n/a
2       ML    ML    1         ML    ML     n/a   n/a      n/a   n/a
3       L     L     1         MH    H      n/a   n/a      n/a   n/a
4       MH    H     3         L     L      n/a   n/a      n/a   n/a
5       MH    L     3         L     L      n/a   n/a      n/a   n/a
6       L     H     1         H     L      n/a   n/a      n/a   n/a
7       ML    ML    1         L     L      n/a   n/a      L     ML
8       ML    ML    1         L     L      L     ML       n/a   n/a
Each configuration represents VMs with different resources assigned, providing a specific physical resource partitioning.
We abstract resource values into ranges from Low to High.
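As a small worked example of the partitioning scheme, the sketch below expands the abstract levels of the tables above into concrete cores and RAM ranges for one configuration; the code is illustrative only.

# Sketch: expand the abstract L/ML/MH/H levels into concrete resources
# for one configuration of the use case (values taken from the tables above).

CORES = {"L": 1, "ML": 2, "MH": 3, "H": 4}                      # CPU cores
RAM_GB = {"L": "1-4", "ML": "5-8", "MH": "9-12", "H": "13-16"}  # GB range

def describe(role, cpu_level, ram_level, num_vms=1):
    return (f"{role}: {num_vms} VM(s), {CORES[cpu_level]} core(s), "
            f"{RAM_GB[ram_level]} GB RAM")

# Configuration 1: one COCOMA VM at MH/H contends with one SuT VM at L/L.
print(describe("COCOMA", "MH", "H"))
print(describe("SuT", "L", "L"))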
10. Validation use case
sysbench CPU benchmark
calculating the first 100K prime
numbers
memspeed RAM benchmark
tiobench IO benchmark
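A hedged sketch of how the SuT-side CPU benchmark above could be driven: the sysbench invocation follows the classic 0.4-style syntax common at the time and may differ on newer versions; memspeed and tiobench would be wrapped in the same way with their own options.

# Sketch: run the CPU benchmark on the SuT and capture its output.
# Assumes the classic sysbench syntax (sysbench --test=cpu ... run);
# newer releases use "sysbench cpu --cpu-max-prime=... run" instead.
import subprocess

def run_sysbench_cpu(max_prime=100000):
    cmd = ["sysbench", "--test=cpu", f"--cpu-max-prime={max_prime}", "run"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout

if __name__ == "__main__":
    output = run_sysbench_cpu()
    # sysbench prints a summary including a "total time" line; show those lines.
    for line in output.splitlines():
        if "total time" in line or "events per second" in line:
            print(line.strip())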
11. Stakeholders
Performance testers/engineers
Cloud Service Providers
Cloud Application Administrators
Application Developers and Testers
Benchmarks and Standards Groups
12. Summary
Experimenters need means to recreate the conditions of cloud-like, multi-tenant shared systems in order to test their solutions
Current approaches are limited as they fail to meet requirements for such
environments
We proposed a solution that
abstracts from low-level tools
enables simple to complex repeatable scenarios, reducing experimenters' effort
allows investigating the system behaviour and correlating it to the specific conditions created
A simple validation use case was presented
13. Future work
Implementing malicious and fault modules, and the corresponding distributions
Real system workload traces parsing and replay capability
Distributed COCOMA solution to enable large complex distributed
scenarios
Web UI to manage the framework
Contention at CPU cache level
Tool to record system traces in a COCOMA-compatible format for replay purposes
Extend the emulation analyser for the distributed version to get warnings for large-scenario emulations
14. Acknowledgements
This work has been carried out within the BonFIRE project, which has received research funding from the EC's Seventh Framework Programme (EU ICT-2009-257386 IP under the Information and Communication Technologies Programme)
15. Thank you
Contact information:
Carmelo Ragusa
SAP HANA Cloud Infrastructure, Belfast
carmelo.ragusa@sap.com
COCOMA is released as Open Source under Apache v2 license:
https://github.com/cragusa/cocoma
Editor's Notes
Challenges for multi-tenancy include optimisation of resource sharing and guaranteed isolation against physical limitations, co-located faults and malicious attacks. For these reasons testing the performance and resilience of applications with different hardware, platforms, configurations, resource sharing conditions and user loads is important for increasing the assurance that providers and consumers have in cloud applications.
Scalability: given that customer applications can vary from 1 to 1000s of nodes, it must be possible to readily set up and execute useful testing environments for 1 to n number of independent hosts and network elements, avoiding cumbersome, error-prone configuration.
Reproducibility: it must be possible to easily repeat testing conditions and tests in order to perform viable regression testing and make reliable claims about software quality.
Portability: as hardware and virtualization technologies change, or as applications may be migrated to alternative data centers and platforms, it should be possible to easily reuse and recreate test designs and conditions during these changes.
Extensibility: in addition to portability, test designs will need to be modified over time in order to take into account changes in quality constraints, scope, expected demands and software functionality. Systematic, guided procedures for modifying and extending test designs and mechanisms for these changes are necessary, as opposed to starting from scratch each time or making changes without knowledge of all dependencies.
Self-containment: it is desirable to have a single top-level solution, operational interface and workflows for designing and executing tests, as opposed to the tester having to switch between multiple contexts and tools.
Controllability: in spite of abstraction and higher-level tooling, there is still a need to have control over the behaviour of resources used in the test, minimising the amount of disturbances that might introduce unknown variations in test results.
Each requirement is assessed considering four different perspectives that arise in real-world systems with changing business priorities and technologies:
Different test types (e.g. functional, load, security) need to be performed.
A variety of resource kinds (i.e. network, disk, CPU, memory) need to be manipulated.
Heterogeneity of physical hosts, nodes and devices.
Given different customers, different scenarios and workload mixes have to be considered in parallel.
The main principles behind the design of the framework are the abstraction from the lower-level tools used to create loads on the resources, allowing the desired contention to be emulated, and the separation of concerns, providing an effective modularisation of the tool which enables easy extensibility and the addition of emulators and distributions.
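A minimal sketch of this modularisation idea, assuming a hypothetical registry of emulators behind a common interface (class names, the register decorator and the tool flags are illustrative, not COCOMA's actual code):

# Sketch of the separation-of-concerns idea: emulators for new resource
# types register themselves behind a common interface, so distributions
# never deal with tool-specific details.
from abc import ABC, abstractmethod

EMULATORS = {}

def register(resource_type):
    def wrap(cls):
        EMULATORS[resource_type] = cls
        return cls
    return wrap

class Emulator(ABC):
    @abstractmethod
    def command(self, level, duration):
        """Return the low-level tool invocation for one run."""

@register("CPU")
class CpuEmulator(Emulator):
    def command(self, level, duration):
        return f"timeout {duration} lookbusy -c {level}"  # -c flag assumed

@register("RAM")
class RamEmulator(Emulator):
    def command(self, level, duration):
        # stressapptest: -M sets MB of memory load, -s the runtime in seconds
        return f"stressapptest -M {level} -s {duration}"

print(EMULATORS["CPU"]().command(level=75, duration=60))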
A distribution creates the trend through runs as in a sampling process
Given a shared physical machine, we divide the physical resources among the VMs (SuT and COCOMA), each time essentially with a different percentage of the total.
CPU: SuT CPU benchmark degradation over the percentage of CPU used by COCOMA. Three clusters can be identified: in the highest one there are configurations 1, 4 and 5, which according to table II all have the MH setup for the COCOMA CPU; below there are configurations 2, 7 and 8, which have the ML setup for the COCOMA CPU; and in the lowest part there are 6 and 3, with the COCOMA CPU setup as L. As expected, the more physical CPU is controlled by COCOMA, the more the SuT is affected when CPU-intensive operations are performed.
RAM: COCOMA used the maximum assigned RAM while increasing the number of threads performing writing operations on the RAM. The amount of RAM and the overall number of VMs assigned to COCOMA influence the results: for configurations 1, 2, 3 and 6, which all have 2 VMs in total, the more RAM is assigned to COCOMA the more the SuT is affected; if we compare configurations 5 and 1, the difference in results is due only to the total number of VMs, this being the only differentiating parameter between the two configurations.
IO: the amount of files used for the workload does not make any noticeable difference across all configurations. Configurations 4 and 5 (with 3 VMs assigned to COCOMA vs 1 to the SuT) suffer most.
Performance testers/engineers: practitioners investigating, for example, new colocation algorithms, and generally in need of creating a contentious/malicious environment to conduct their tests.
Cloud Service Providers: in this case a service provider may offer performance isolation mechanisms and therefore wants to test the effectiveness of those mechanisms, investigate the possibility of offering those mechanisms, or study what characteristics applications need to coexist.
Cloud Application Administrators: administrators may need to check, when a system is restored after some maintenance or a crash, that performance isolation mechanisms are working correctly.
Application Developers and Testers: application developers may want to investigate the effects of contention on their system, while testers may want to check if the provider's isolation mechanisms work sufficiently.
Benchmarks and Standards Groups: in this case it can be used to validate cloud patterns and workloads under investigation and/or characterisation.
On the malicious part, we are looking into covert channels (or side channels) at cache level to infer other processes information and data, as well as at network level to get information about co-located guests, such as IP addresses. The latter could be used by fuzzers to send malicious workloads, or do a DoS.
BonFIRE is an EU project which is designing, building and operating a multi-site cloud-based facility on top of six infrastructures offering heterogeneous Cloud resources, including compute, storage and network. BonFIRE is geared towards experimentation and research into Cloud/IoS, and offers the facilities to easily create, manage and monitor experiments, whilst giving the experimenters more information about and control of Cloud resources than what is offered by other public Cloud providers.