This document outlines research on using virtual clusters and stream processing topologies for integrating sensor data streams. It describes using the Wirbelsturm tool to set up a virtual cluster on which Storm topologies can be deployed to perform real-time processing of sensor observations modeled with ontologies. A use case of integrating heterogeneous environmental sensor data from a sensor cloud is presented where the SSN and SWEET ontologies are used.
Virtual Clusters for (RDF) Stream Processing
1. Alejandro Llaves
Ontology Engineering Group
Universidad Politécnica de Madrid
Madrid, Spain
allaves@fi.upm.es
Oct 21 2015
4. Motivation
• Integrating an unbounded stream of heterogeneous sensor observations
• Solution:
  – Storm topologies for real-time processing
  – Semantic Sensor Network (SSN) ontology for modelling observations
  – SWEET ontology for environmental phenomena
5. Use case: Sensor Cloud data integration (1/3)
Sensor Cloud
• Viticulture, water management, weather monitoring, oyster farming...
• RESTful API – JSON
• Network → Platform → Sensor → Phenomenon → Observation
• Lack of semantic descriptions, e.g. rain_trace vs Rain.
• Multiple HTTP requests to query various streams.
Source: CSIRO
6. Use case: Sensor Cloud data integration (2/3)
• Sensor Cloud messages to field-named tuples
• SWEET annotations for heterogeneous phenomena descriptions
Example message:
<sample time="2015-05-28T16:30" value="15" sensor="bom_gov_au.94961.air.air_temp"/>
Resulting tuple:
["2015-05-28T16:32", "2015-05-28T16:30", "15", "bom_gov_au", "94961", "air", "air_temp", "-43.3167", "147.0075"]
Tuple fields, in order: system time, sampling time, value, network, platform, sensor, phenomenon, latitude, longitude
Bolts: SensorCloudParser Bolt, SweetAnnotations Bolt
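The message-to-tuple step can be sketched in plain Python. This is only an illustration under stated assumptions, not the SensorCloudParser Bolt itself: the function name `parse_sample` and the `SensorTuple` type are hypothetical, the sensor identifier is assumed to always have the form network.platform.sensor.phenomenon, and the system time and coordinates are assumed to be supplied by the caller because they do not appear in the sample.

```python
import xml.etree.ElementTree as ET
from collections import namedtuple

# Field names follow the labels on the slide; SensorTuple itself is a hypothetical name.
SensorTuple = namedtuple(
    "SensorTuple",
    ["system_time", "sampling_time", "value", "network", "platform",
     "sensor", "phenomenon", "latitude", "longitude"],
)


def parse_sample(xml_text, system_time, latitude, longitude):
    """Turn one <sample/> message into a field-named tuple.

    system_time, latitude and longitude are not part of the sample,
    so this sketch assumes the caller provides them.
    """
    sample = ET.fromstring(xml_text)
    # Assumption: the id is always network.platform.sensor.phenomenon.
    network, platform, sensor, phenomenon = sample.attrib["sensor"].split(".")
    return SensorTuple(
        system_time, sample.attrib["time"], sample.attrib["value"],
        network, platform, sensor, phenomenon, latitude, longitude,
    )


message = '<sample time="2015-05-28T16:30" value="15" sensor="bom_gov_au.94961.air.air_temp"/>'
parsed = parse_sample(message, "2015-05-28T16:32", "-43.3167", "147.0075")
```

With the slide's example message, `list(parsed)` reproduces the tuple shown above, and individual fields can be read by name, e.g. `parsed.phenomenon`.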
7. Use case: Sensor Cloud data integration (3/3)
SSN mapping
SSNConverter Bolt
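As a rough idea of what an SSN conversion step might produce, the sketch below turns one field-named observation into N-Triples lines. It is not the mapping used by the SSNConverter Bolt, which these notes do not spell out. The `http://example.org/sensorcloud/` base URI, the `samplingTime` and `value` properties under it, and the function name `to_ntriples` are all hypothetical; only `ssn:Observation`, `ssn:observedBy` and `ssn:observedProperty` are taken from the SSN ontology.

```python
SSN = "http://purl.oclc.org/NET/ssnx/ssn#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/sensorcloud/"  # hypothetical base URI, for illustration only


def to_ntriples(obs):
    """Return N-Triples lines describing one observation (a dict of strings)."""
    sensor_id = ".".join([obs["network"], obs["platform"], obs["sensor"]])
    phenomenon = obs["phenomenon"]
    time = obs["sampling_time"]
    value = obs["value"]
    subject = f"<{EX}observation/{sensor_id}.{phenomenon}/{time}>"
    return [
        f"{subject} <{RDF_TYPE}> <{SSN}Observation> .",
        f"{subject} <{SSN}observedBy> <{EX}sensor/{sensor_id}> .",
        f"{subject} <{SSN}observedProperty> <{EX}property/{phenomenon}> .",
        # Hypothetical shortcut properties: plain literals instead of SSN result nodes.
        f'{subject} <{EX}samplingTime> "{time}" .',
        f'{subject} <{EX}value> "{value}" .',
    ]


observation = {
    "sampling_time": "2015-05-28T16:30", "value": "15",
    "network": "bom_gov_au", "platform": "94961",
    "sensor": "air", "phenomenon": "air_temp",
}
triples = to_ntriples(observation)
```

Attaching the value and time as plain literals keeps the sketch short; a mapping that follows SSN more closely would model the result and sampling time as separate resources.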
8. Topologies everywhere
• A Storm topology “is a graph of stream transformations where each node is a spout or bolt”. https://storm.apache.org/documentation/Tutorial.html
• Example of simple topology
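The spout-and-bolt idea can be illustrated without Storm at all. The sketch below is a plain-Python analogy using generators, not the Storm API: a "spout" produces a stream and each "bolt" transforms the stream it receives. All function names are hypothetical, and the word-count task is chosen only because it is small.

```python
def sentence_spout():
    """Spout: source of the stream (a fixed list here, unbounded in a real topology)."""
    for sentence in ["rain rain", "air temp rain"]:
        yield sentence


def split_bolt(stream):
    """Bolt: split each incoming sentence into words."""
    for sentence in stream:
        for word in sentence.split():
            yield word


def count_bolt(stream):
    """Bolt: emit a running count for each incoming word."""
    counts = {}
    for word in stream:
        counts[word] = counts.get(word, 0) + 1
        yield word, counts[word]


# Wire the graph: spout -> split bolt -> count bolt.
results = list(count_bolt(split_bolt(sentence_spout())))
```

In this analogy the wiring is a simple chain within one process; in Storm, the nodes run as distributed tasks and a topology can branch and merge streams.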
11. Setting up a virtual cluster (1/2)
Wirbelsturm - https://github.com/miguno/wirbelsturm/
• Allows deploying (local or remote) virtual clusters.
• Focus on Big Data technologies: Storm, Kafka, Zookeeper...
• Uses Vagrant for “easy to configure, reproducible, and portable work environments” - https://docs.vagrantup.com/v2/why-vagrant/index.html
• Uses Puppet for provisioning: installation and configuration of SW packages in the cluster nodes.
12. Setting up a virtual cluster (2/2)
• $ ./deploy
• Show wirbelsturm.yaml
• Check Storm GUI - http://localhost:28080/index.html
15. Conclusion
• Wirbelsturm allows easy configuration & deployment of virtual clusters, with a focus on Big Data technologies.
• The SSN and SWEET ontologies are used to model and integrate environmental sensor observations.
• Parallelization of bottleneck tasks reduces the average message processing latency (up to a point). More about Storm parallelization: http://bit.ly/1NVyjU2
• Delaying RDF conversion does not speed up the processing of Sensor Cloud messages in the tested environment.
• Paper submitted to IJSWIS, special issue on Velocity and Variety Dimensions of Big Data – Llaves, Corcho et al.
What's coming next
• Flying faster with Heron - https://blog.twitter.com/2015/flying-faster-with-twitter-heron
16. The presented research has been funded by the Ministerio de Economía y Competitividad (Spain) under the project “4V: Volumen, Velocidad, Variedad y Validez en la Gestión Innovadora de Datos” (TIN2013-46238-C4-2-R) and by the EU Marie Curie IRSES project SemData (612551), and supported by an AWS in Education Research Grant award.
Alejandro Llaves
allaves@fi.upm.es
Thanks!