This document discusses the need for macrobenchmarks to evaluate the performance and scalability of large model querying systems. It presents the Train Benchmark, which measures the performance of validation queries on randomly generated railway network models of increasing sizes. The benchmark includes loading models, running validation queries to detect errors, transforming models by injecting faults, and revalidating. It aims to provide a realistic and scalable way to assess model querying tools for domains like software engineering, where models can contain billions of elements.
Towards a Macrobenchmark Framework for Performance Analysis of Java Applications
1. Budapest University of Technology and Economics
Department of Measurement and Information Systems
Fault Tolerant Systems Research Group
Towards a Macrobenchmark Framework
for Performance Analysis of Java Applications
Gábor Szárnyas, Dániel Varró
3. Model Sizes
Models = graphs with 100M–1B elements
o Car industry
o Avionics
o Software analysis
o Cyber-physical systems
Source: Markus Scheidgen, Automated and Transparent Model Fragmentation for Persisting Large Models, 2012
application          model size
software models      10^8
sensor data          10^9
geo-spatial models   10^12
Validation may take hours
4. Research Goal
Scalable query engine for evaluating complex
queries on large models.
Latest results are presented in tomorrow’s
session:
o János Maginecz, Gábor Szárnyas:
Sharded Joins for Scalable Incremental Graph Queries
Today: benchmarking
5. Motivation
Jennifer Widom: Tips for Writing Technical Papers
On performance experiments:
It’s easy to do “hokey” or meaningless
experiments, and many papers do.
Many conferences expect experiments.
6. Benchmarks in Software Engineering
Performance experiments = benchmarks
Common goals
o Compare various tools
o Derive performance metrics
o Assess the scalability of the system
Microbenchmarks
o Method-level
o Very difficult to get right (JIT compilation, dead-code elimination) and not recommended; see the sketch below
Macrobenchmarks
o Application-level
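To illustrate why method-level microbenchmarks are so easy to get wrong, here is a minimal, hypothetical Java sketch (not taken from the presentation or any framework mentioned in it): the first iterations run interpreted while later ones run JIT-compiled, and if the result of the measured method is never used, the compiler may remove the loop entirely.

// Minimal sketch of naive microbenchmark pitfalls (hypothetical example).
public class NaiveMicrobenchmark {
    static long workload() {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Pitfall 1: early iterations run in the interpreter, later ones JIT-compiled,
        // so a single timing mixes very different execution modes.
        // Pitfall 2: if the result were discarded, the JIT could eliminate the work entirely.
        long blackhole = 0;
        long start = System.nanoTime();
        for (int i = 0; i < 100; i++) {
            blackhole += workload();   // consume the result to prevent dead-code elimination
        }
        long elapsed = System.nanoTime() - start;
        System.out.println("avg ns/call: " + elapsed / 100 + " (ignore: " + blackhole + ")");
    }
}

Dedicated harnesses such as JMH automate warm-up iterations and result consumption for exactly these reasons.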
7–11. INCQUERY-D Architecture
[Architecture diagram, built up over several slides: Server 0 hosts the transaction database (shard 0), Servers 1–3 each host a database shard, and a distributed query evaluation network spans the servers.]
Layers of the architecture:
o Distributed storage: database shards 0–3
o Distributed indexing: distributed indexer with a model access adapter
o Distributed query network: the distributed query evaluation network
12–13. INCQUERY-D Architecture
[Diagram: the distributed query evaluation network consists of an indexer node per database shard feeding Join and Antijoin operator nodes.]
o The indexer and operator nodes run as separate processes (a simplified join node is sketched below).
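As a rough intuition only, the following is a hedged, single-process sketch of an incremental join node that caches partial matches from both of its inputs and emits the newly produced combinations. The real INCQUERY-D operators are distributed and considerably more involved; none of these class or method names come from the actual implementation.

import java.util.*;

// Highly simplified sketch of an incremental (Rete-style) join node.
public class JoinNodeSketch {
    // Partial matches arriving from the left and right parent nodes, indexed by the join key.
    private final Map<Object, List<List<Object>>> leftByKey = new HashMap<>();
    private final Map<Object, List<List<Object>>> rightByKey = new HashMap<>();

    // A new partial match arrives on the left input; returns the newly formed combined matches.
    public List<List<Object>> insertLeft(Object key, List<Object> tuple) {
        leftByKey.computeIfAbsent(key, k -> new ArrayList<>()).add(tuple);
        return combine(tuple, rightByKey.getOrDefault(key, List.of()), true);
    }

    // Symmetric handling for the right input.
    public List<List<Object>> insertRight(Object key, List<Object> tuple) {
        rightByKey.computeIfAbsent(key, k -> new ArrayList<>()).add(tuple);
        return combine(tuple, leftByKey.getOrDefault(key, List.of()), false);
    }

    private List<List<Object>> combine(List<Object> tuple, List<List<Object>> others, boolean tupleIsLeft) {
        List<List<Object>> results = new ArrayList<>();
        for (List<Object> other : others) {
            List<Object> match = new ArrayList<>(tupleIsLeft ? tuple : other);
            match.addAll(tupleIsLeft ? other : tuple);
            results.add(match);
        }
        return results;
    }
}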
14. Operating System
[Same architecture diagram, with the OS layer of each server highlighted.]
Noise sources from the operating system (a cache-dropping sketch follows below):
• File system caching
• Scheduled jobs
• Swapping
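One way a harness might mitigate the file-system caching effect listed above is to drop the Linux page cache between runs. This is an assumption about a possible mitigation, not something the slides prescribe; the sketch is Linux-specific and requires root privileges.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hedged sketch: dropping the Linux page cache between benchmark runs
// to reduce file-system caching effects (Linux-only, needs root).
public class DropCaches {
    public static void dropPageCache() throws IOException, InterruptedException {
        // Flush dirty pages to disk first.
        new ProcessBuilder("sync").inheritIO().start().waitFor();
        // Writing "3" drops the page cache, dentries and inodes.
        Files.write(Paths.get("/proc/sys/vm/drop_caches"), "3".getBytes());
    }
}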
15. Cloud Noise
[Same architecture diagram.]
Other virtual machines running on the same physical server add noise.
16. Parallel Execution
[Same architecture diagram.]
o Multiple processes in the same OS
o Multithreaded processes
17. Threats to Validity
Operating system: caching, scheduled jobs
Parallel processes
Multithreaded execution
Environmental noise in the cloud
The hardest: Managed Runtime Environments
o Java Virtual Machine
o .NET CLR
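For the JVM, the usual mitigation is to separate warm-up iterations from measured iterations and to request a garbage collection before each measurement. The sketch below is a minimal illustration of that idea; runScenario is a hypothetical placeholder for the workload, not part of any framework mentioned here.

// Minimal sketch of handling JVM warm-up in a measurement loop.
public class JvmAwareHarness {
    static final int WARMUP_ITERATIONS = 10;
    static final int MEASURED_ITERATIONS = 10;

    static void runScenario() {
        // placeholder workload: load the model, evaluate the queries, etc.
    }

    public static void main(String[] args) {
        // Warm-up: let the JIT compiler reach a steady state before measuring.
        for (int i = 0; i < WARMUP_ITERATIONS; i++) {
            runScenario();
        }
        for (int i = 0; i < MEASURED_ITERATIONS; i++) {
            System.gc();   // only a request, not a guarantee, of a collection before measuring
            long start = System.nanoTime();
            runScenario();
            System.out.printf("iteration %d: %d ms%n", i, (System.nanoTime() - start) / 1_000_000);
        }
    }
}

In practice, starting a fresh JVM for each measured configuration avoids cross-contamination between runs, at the cost of longer total benchmark time.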
22. The Train Benchmark
Domain: railway network validation
Goal: measure the scalability of query evaluation
Supports various representations
Randomly generated instance models
o Increasing sizes
o Close to real-world instances
o Faults are inserted during generation
23. Railway Model
Synthetic model
Customizable model generator
[Metamodel diagram: Semaphore, Route, Segment, Switch; a Route has an Entry and an Exit semaphore; the Route prescribes a switch position, while each Switch has its own current position.]
24. Validation Queries
Well-formedness constraints
Queries look for error patterns
o Lots of filtering, joins, etc. (a simplified example follows below)
Transformations
o Fault injections
o Quick fix-like repair operations
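As a concrete but hypothetical illustration of such an error pattern, the sketch below encodes the mismatch hinted at on the previous slide: a switch whose current position differs from the position the route prescribes. The record types and the plain Java streams query are illustrative only; the Train Benchmark itself supports various representations and query technologies.

import java.util.*;
import java.util.stream.Collectors;

// Hedged sketch of an error-pattern query over a toy in-memory model; the class and
// field names are hypothetical, not the benchmark's actual metamodel or query language.
public class SwitchPositionCheck {
    enum Position { LEFT, RIGHT, STRAIGHT }

    record Switch(int id, Position currentPosition) {}
    // A route prescribes a position for each of its switches.
    record Route(int id, Map<Switch, Position> switchPositions) {}

    // Error pattern: a switch whose current position differs from the prescribed one.
    static List<String> findErrors(Collection<Route> routes) {
        return routes.stream()
            .flatMap(route -> route.switchPositions().entrySet().stream()
                .filter(e -> e.getKey().currentPosition() != e.getValue())
                .map(e -> "Route " + route.id() + ": switch " + e.getKey().id()
                        + " is in " + e.getKey().currentPosition()
                        + " but should be " + e.getValue()))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Switch sw = new Switch(1, Position.LEFT);
        Route route = new Route(10, Map.of(sw, Position.STRAIGHT));
        findErrors(List.of(route)).forEach(System.out::println);
    }
}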
25. Benchmark Phases
Phases: 1. Load, 2. Validate, 3. Transform, 4. Revalidate
o Each run is repeated 5 times; the Transform–Revalidate pair is iterated 10 times per run.
o Inputs: models of increasing size, the query, and the change set size; outputs: measurements.
o Warm-up runs precede the measurements.
Framework features
o Automated build and unit testing
o Automated visualization
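A possible shape of the phase loop, assuming hypothetical loadModel, validate and injectFault placeholders (the framework's real API is not shown on these slides):

// Hedged sketch of the run/iteration structure of the benchmark phases.
public class PhaseLoopSketch {
    static final int RUNS = 5;
    static final int ITERATIONS = 10;

    public static void main(String[] args) {
        for (int run = 0; run < RUNS; run++) {
            Object model = loadModel();                    // Phase 1: Load
            long validateNs = time(() -> validate(model)); // Phase 2: Validate (initial check)
            for (int i = 0; i < ITERATIONS; i++) {
                injectFault(model);                        // Phase 3: Transform (fault injection)
                long revalidateNs = time(() -> validate(model)); // Phase 4: Revalidate
                record(run, i, validateNs, revalidateNs);
            }
        }
    }

    static long time(Runnable r) {
        long start = System.nanoTime();
        r.run();
        return System.nanoTime() - start;
    }

    static Object loadModel() { return new Object(); }    // placeholder
    static void validate(Object model) {}                 // placeholder
    static void injectFault(Object model) {}              // placeholder
    static void record(int run, int iter, long validateNs, long revalidateNs) {}
}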
27. Some Conclusions
Very hard to determine the actual memory consumption
o Manual calls to the garbage collector – not enough
o Profiler – still not enough
o Setting a hard limit is the best approach
Benchmarking is difficult
o Lots of factors can add noise to the results
o Nothing works on the first try
Visualization helps a lot
o Worth investing time to learn R
o Not a nice language, but very productive
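The naive heap measurement mentioned above can be sketched as follows; the hard-limit approach is indicated only as a comment, since the slides do not spell out the exact procedure.

// Hedged sketch: reading heap usage after requesting a GC. As the slide notes, this is
// not reliable; a hard heap limit (e.g. rerunning the benchmark with decreasing -Xmx
// values until it fails) gives a more trustworthy upper bound.
public class MemoryProbe {
    public static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        System.gc();   // only a request; the JVM may ignore it
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        System.out.println("used heap after GC request: " + usedHeapBytes() / (1024 * 1024) + " MB");
        // Hard-limit approach (conceptual): java -Xmx2g -jar benchmark.jar ...
        // and check whether the run completes without an OutOfMemoryError.
    }
}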
28. Related Publications
Benchmark framework
Szárnyas, G., Izsó, B., Ráth, I., and Varró, D.,
The Train Benchmark for Evaluating the Performance of Continuous Model
Validation, SOSYM journal (work-in-progress)
Izsó, B., Szárnyas, G., Ráth, I., and Varró, D.,
MONDO-SAM: A Framework to Systematically Assess MDE Scalability,
2nd Workshop on Scalable Model Driven Engineering, 2014
Benchmark applications
Szárnyas, G., Semeráth, O., Ráth, I., and Varró, D.,
The TTC 2015 Train Benchmark Case for Incremental Model Validation,
Transformation Tool Contest, 2015
Szárnyas, G., Izsó, B., Ráth, I., Harmath, D., Bergmann, G., and Varró, D.,
IncQuery-D: A Distributed Incremental Model Query Framework in the Cloud,
ACM/IEEE 17th International Conference on Model Driven Engineering Languages
and Systems, 2014
Izsó, B., Szárnyas, G., Ráth, I., and Varró, D.,
IncQuery-D: Incremental Graph Search in the Cloud,
Proceedings of the Workshop on Scalability in Model Driven Engineering, 2013