Budapest University of Technology and Economics
Department of Measurement and Information Systems
Fault Tolerant Systems Research Group
Towards a Macrobenchmark Framework
for Performance Analysis of Java Applications
Gábor Szárnyas, Dániel Varró
Model-Driven Development
 Workflow: modeling → code generation → testing
 Early validations and transformations
 Challenges: performance issues, scalability
Model Sizes
 Models = graphs with 100M–1B elements
o Car industry
o Avionics
o Software analysis
o Cyber-physical systems

application          model size
software models      10^8
sensor data          10^9
geo-spatial models   10^12

Validation may take hours
Source: Markus Scheidgen, Automated and Transparent Model Fragmentation for Persisting Large Models, 2012
Research Goal
 Scalable query engine for evaluating complex
queries on large models.
 Latest results are presented in tomorrow’s
session:
o János Maginecz, Gábor Szárnyas:
Sharded Joins for Scalable Incremental Graph Queries
 Today: benchmarking
Motivation
 Jennifer Widom: Tips for Writing Technical Papers
 On performance experiments:
It’s easy to do “hokey” or meaningless
experiments, and many papers do.
Many conferences expect experiments.
Benchmarks in Software Engineering
 Performance experiments = benchmarks
 Common goals
o Compare various tools
o Derive performance metrics
o Assess the scalability of the system
 Microbenchmarks
o Method-level
o Very difficult and not recommended (see the sketch below)
 Macrobenchmarks
o Application-level
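Method-level measurements are discouraged above partly because a naive timing loop is distorted by JIT compilation and dead-code elimination; if they are attempted at all, a dedicated harness is needed. A minimal sketch with JMH, the OpenJDK microbenchmark harness; the measured method and the annotation parameters are illustrative placeholders, not taken from this deck:

```java
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;

// Hypothetical method-level benchmark: JMH handles warmup, forking and statistics,
// which is exactly the bookkeeping that makes hand-rolled timing loops unreliable.
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Warmup(iterations = 5)       // discarded warmup iterations (JIT, caches)
@Measurement(iterations = 10) // measured iterations
@Fork(1)                      // fresh JVM to isolate runtime state
@State(Scope.Thread)
public class StringJoinBenchmark {

    private final String[] parts = {"route", "segment", "switch", "semaphore"};

    @Benchmark
    public String join() {
        return String.join(",", parts); // return the result so it is not optimized away
    }
}
```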
INCQUERY-D Architecture
 Deployed on four servers: Server 0 (transaction handling, database shard 0) and Servers 1–3 (database shards 1–3)
 Components spanning the servers: distributed query evaluation network, distributed indexer, model access adapter
 Three layers: distributed storage, distributed indexing, distributed query network
 Inside the query evaluation network: four Indexer nodes feeding Join, Join and Antijoin nodes
 These nodes run as separate processes
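The query evaluation network follows the Rete approach: each node caches partial results and propagates change sets instead of recomputing the query from scratch. Below is a minimal, single-process sketch of what a Join node maintains; the class and method names are illustrative, not the INCQUERY-D API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified incremental join node: caches the tuples of both parent nodes,
// indexed by the join key, and emits only the new combinations on each update.
class JoinNode {
    private final Map<Object, List<Object[]>> leftIndex = new HashMap<>();
    private final Map<Object, List<Object[]>> rightIndex = new HashMap<>();

    // A tuple arrived on the left input: store it and return the change set
    // (new joined tuples) to be propagated to the next node in the network.
    List<Object[]> insertLeft(Object key, Object[] tuple) {
        leftIndex.computeIfAbsent(key, k -> new ArrayList<>()).add(tuple);
        List<Object[]> delta = new ArrayList<>();
        for (Object[] right : rightIndex.getOrDefault(key, List.of())) {
            delta.add(concat(tuple, right));
        }
        return delta;
    }

    // Symmetric handling of the right input.
    List<Object[]> insertRight(Object key, Object[] tuple) {
        rightIndex.computeIfAbsent(key, k -> new ArrayList<>()).add(tuple);
        List<Object[]> delta = new ArrayList<>();
        for (Object[] left : leftIndex.getOrDefault(key, List.of())) {
            delta.add(concat(left, tuple));
        }
        return delta;
    }

    private static Object[] concat(Object[] a, Object[] b) {
        Object[] joined = new Object[a.length + b.length];
        System.arraycopy(a, 0, joined, 0, a.length);
        System.arraycopy(b, 0, joined, a.length, b.length);
        return joined;
    }
}
```

In the distributed setting each such node runs in its own JVM process, possibly on a different server, which is why the OS-level and cloud-level noise discussed next directly affects the measurements.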
Operating System
(Same architecture, with the OS layer shown on each server)
• File system caching
• Scheduled jobs
• Swapping
Cloud Noise
(Same architecture)
• Other VMs running on the same server
Parallel Execution
(Same architecture)
• Multiple processes in the same OS
• Multithreaded processes
Threats to Validity
 Operating system: caching, scheduled jobs
 Parallel processes
 Multithreaded execution
 Environmental noise in the cloud
 The hardest: Managed Runtime Environments
o Java Virtual Machine
o .NET CLR
Managed Runtime Environments
 Just-in-time compilation
 Runtime optimization
The Effect of Warmup
 Chart: first execution time vs. median execution time of 5 executions, for two Java-based query engines
 Conclusion: perform multiple executions
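The measurement scheme behind the chart can be sketched in a few lines: time the first execution separately and compare it with the median of the remaining ones. The workload below is a placeholder; in the benchmark it would be a query evaluation.

```java
import java.util.Arrays;
import java.util.Random;

public class WarmupDemo {

    // Runs the workload several times and reports the first vs. the median execution time.
    static void measure(Runnable workload, int executions) {
        long[] times = new long[executions];
        for (int i = 0; i < executions; i++) {
            long start = System.nanoTime();
            workload.run();
            times[i] = System.nanoTime() - start;
        }
        long first = times[0];
        long[] sorted = times.clone();
        Arrays.sort(sorted);
        long median = sorted[executions / 2];
        System.out.printf("first: %.2f ms, median of %d: %.2f ms%n",
                first / 1e6, executions, median / 1e6);
    }

    public static void main(String[] args) {
        int[] data = new Random(42).ints(2_000_000).toArray();
        // Placeholder workload standing in for the query engine call.
        measure(() -> Arrays.sort(data.clone()), 5);
    }
}
```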
CASE STUDY:
THE TRAIN BENCHMARK
Gábor Szárnyas, Benedek Izsó, István Ráth, Dániel Varró
2016
Database Benchmarks
Criteria for domain-specific benchmarks
(Jim Gray, Benchmark Handbook, 1993):
 Relevant
 Scaleable
 Portable
 Simple
Two-phase commit, data cubes,
ACID transactions
The Train Benchmark
 Domain: railway network validation
 Goal: measure the scalability of query evaluation
 Supports various representations
 Randomly generated instance models
o Increasing sizes
o Close to real-world instances
o Faults are inserted during generation
Railway Model
 Synthetic model
 Customizable model generator
 Domain concepts: semaphore, route, segment, switch, entry, exit
 Key relation: the switch position required by a route vs. the current position of the switch
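A hedged sketch of how such a generator can be structured: grow the model with the requested number of routes and leave a small, configurable fraction of switches in a position that violates the constraint above. The record types and the error rate are illustrative, not the Train Benchmark's metamodel or generator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative instance-model generator with fault injection (not the Train Benchmark generator).
class RailwayGenerator {
    record Switch(int id, String currentPosition) {}
    record SwitchPosition(Switch target, String requiredPosition) {}
    record Route(int id, List<SwitchPosition> switchPositions) {}

    private final Random random;
    private final double faultProbability;

    RailwayGenerator(long seed, double faultProbability) {
        this.random = new Random(seed);
        this.faultProbability = faultProbability;
    }

    // Generates `size` routes; a small fraction of switches violates the position constraint.
    List<Route> generate(int size) {
        List<Route> routes = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            List<SwitchPosition> positions = new ArrayList<>();
            for (int j = 0; j < 3; j++) {
                String required = "STRAIGHT";
                // Fault injection: with probability faultProbability the switch is left in the wrong position.
                String current = random.nextDouble() < faultProbability ? "DIVERGING" : required;
                positions.add(new SwitchPosition(new Switch(i * 10 + j, current), required));
            }
            routes.add(new Route(i, positions));
        }
        return routes;
    }
}
```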
Validation Queries
 Well-formedness constraints
 Queries are looking for error patterns
o Lots of filtering, joins, etc.
 Transformations
o Fault injections
o Quick fix-like repair operations
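As an illustration of what an error pattern looks like, the sketch below checks the constraint from the Railway Model slide: a switch whose current position differs from the position its route prescribes. In a query engine this becomes a join plus a filter; here it is written as plain Java streams over the same hypothetical records as in the generator sketch, re-declared so the example is self-contained.

```java
import java.util.List;

// Illustrative well-formedness check (hypothetical model classes, not the benchmark's metamodel).
class SwitchPositionCheck {
    record Switch(int id, String currentPosition) {}
    record SwitchPosition(Switch target, String requiredPosition) {}
    record Route(int id, List<SwitchPosition> switchPositions) {}

    // Error pattern: the switch is not in the position required by the route.
    static List<String> findViolations(List<Route> routes) {
        return routes.stream()
                .flatMap(route -> route.switchPositions().stream()
                        .filter(sp -> !sp.requiredPosition().equals(sp.target().currentPosition()))
                        .map(sp -> "Route " + route.id() + ": switch " + sp.target().id()
                                + " is in " + sp.target().currentPosition()
                                + " instead of " + sp.requiredPosition()))
                .toList();
    }
}
```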
Benchmark Phases
1. Load → 2. Validate → 3. Transform → 4. Revalidate
 Run: × 5, Iteration: × 10
 Inputs: model (of increasing size), query, change set size
 Outputs: measurements
 Framework features
o Automated build and unit testing
o Automated visualization
(Result charts: warmup effect, model transformation times, 7 OOMs)
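A minimal sketch of the measurement loop implied by the phases above; the QueryEngine interface and the class names are illustrative, not the framework's API. Each run loads a fresh model and validates it once, then the transform/revalidate pair is repeated, with every phase timed separately.

```java
import java.util.ArrayList;
import java.util.List;

// Skeleton of the Load / Validate / Transform / Revalidate loop (illustrative API).
class BenchmarkRunner {

    interface QueryEngine {
        void load(String modelPath);
        int validate();                    // returns the number of violations found
        void transform(int changeSetSize); // fault injection / repair operations
    }

    record Measurement(int run, int iteration, String phase, long nanos) {}

    static List<Measurement> run(QueryEngine engine, String modelPath,
                                 int runs, int iterations, int changeSetSize) {
        List<Measurement> results = new ArrayList<>();
        for (int run = 0; run < runs; run++) {                                        // Run: x 5
            time(results, run, 0, "load", () -> engine.load(modelPath));              // 1. Load
            time(results, run, 0, "validate", engine::validate);                      // 2. Validate
            for (int it = 1; it <= iterations; it++) {                                // Iteration: x 10
                time(results, run, it, "transform", () -> engine.transform(changeSetSize)); // 3. Transform
                time(results, run, it, "revalidate", engine::validate);                     // 4. Revalidate
            }
        }
        return results;
    }

    private static void time(List<Measurement> results, int run, int iteration,
                             String phase, Runnable action) {
        long start = System.nanoTime();
        action.run();
        results.add(new Measurement(run, iteration, phase, System.nanoTime() - start));
    }
}
```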
Some Conclusions
 Very hard to determine the actual memory consumption
o Manual calls to the garbage collector – not enough
o Profiler – still not enough
o Setting a hard limit is the best approach (see the sketch below)
 Benchmarking is difficult
o Lots of factors can add noise to the results
o Nothing works on the first try
 Visualization helps a lot
o Worth investing time to learn R
o Not a nice language, but very productive
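The hard-limit approach can be automated by rerunning the benchmark in a fresh JVM with progressively smaller -Xmx values and recording the smallest heap with which it still completes. A rough sketch; the jar name, main class, and candidate heap sizes are placeholders.

```java
import java.util.List;

// Finds the smallest heap limit with which the benchmark still completes
// (illustrative tooling, not part of the benchmark framework).
public class HeapLimitSearch {

    public static void main(String[] args) throws Exception {
        // Candidate -Xmx values, from large to small; adjust to the workload.
        List<String> heapSizes = List.of("8g", "4g", "2g", "1g", "512m");
        String smallestWorking = null;

        for (String xmx : heapSizes) {
            // Launch the benchmark in a fresh JVM with a hard heap limit.
            Process process = new ProcessBuilder(
                    "java", "-Xmx" + xmx, "-cp", "benchmark.jar", "BenchmarkMain")
                    .inheritIO()
                    .start();
            int exitCode = process.waitFor();
            if (exitCode == 0) {
                smallestWorking = xmx;   // completed: try an even smaller heap
            } else {
                break;                   // failed (e.g., OutOfMemoryError): stop
            }
        }
        System.out.println("Approximate memory requirement: " + smallestWorking);
    }
}
```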
Related Publications
Benchmark framework
 Szárnyas, G., Izsó, B., Ráth, I., and Varró, D.,
The Train Benchmark for Evaluating the Performance of Continuous Model
Validation, SOSYM journal (work-in-progress)
 Izsó, B., Szárnyas, G., Ráth, I., and Varró, D.,
MONDO-SAM: A Framework to Systematically Assess MDE Scalability,
2nd Workshop on Scalable Model Driven Engineering, 2014
Benchmark applications
 Szárnyas, G., Semeráth, O., Ráth, I., and Varró, D.,
The TTC 2015 Train Benchmark Case for Incremental Model Validation,
Transformation Tool Contest, 2015
 Szárnyas, G., Izsó, B., Ráth, I., Harmath, D., Bergmann, G., and Varró, D.,
IncQuery-D: A Distributed Incremental Model Query Framework in the Cloud,
ACM/IEEE 17th International Conference on Model Driven Engineering Languages
and Systems, 2014
 Izsó, B., Szárnyas, G., Ráth, I., and Varró, D.,
IncQuery-D: Incremental Graph Search in the Cloud,
Proceedings of the Workshop on Scalability in Model Driven Engineering, 2013
Ω