Alejandro Llaves
Ontology Engineering Group
Universidad Politécnica de Madrid
Madrid, Spain
allaves@fi.upm.es
Oct 21 2015
Virtual Clusters for (RDF) Stream Processing
Outline
• Some context: morph-streams++
• Motivation
• Use case: Sensor Cloud data integration
• Topologies everywhere
• Setting up a virtual cluster
• Deploying Storm topologies
• Conclusion
Some context...
Motivation
• Integrating an unbounded stream of heterogeneous sensor observations
• Solution:
– Storm topologies for real-time processing
– Semantic Sensor Network (SSN) ontology for modelling observations
– SWEET ontology for environmental phenomena
Use case: Sensor Cloud data integration (1/3)
Sensor Cloud
• Viticulture, water management, weather monitoring, oyster farming...
• RESTful API – JSON
• Network → Platform → Sensor → Phenomenon → Observation
• Lack of semantic descriptions, e.g. rain_trace vs Rain.
• Multiple HTTP requests to query various streams.
Source: CSIRO
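One way to bridge labels such as rain_trace and Rain is a lookup from source-specific names to a shared concept. The following is a minimal sketch under stated assumptions: the table is hand-written, the concept labels on the right-hand side are hypothetical placeholders rather than actual SWEET identifiers, and `annotate_phenomenon` is an illustrative name, not part of the presented system.

```python
# Hypothetical lookup from source-specific phenomenon names to a shared
# concept label. The labels are placeholders, not real SWEET URIs.
PHENOMENON_CONCEPTS = {
    "rain_trace": "Rain",
    "rain": "Rain",
    "air_temp": "AirTemperature",
}


def annotate_phenomenon(name):
    """Return the shared concept label for a raw phenomenon name, or None."""
    key = name.strip().lower()
    return PHENOMENON_CONCEPTS.get(key)


if __name__ == "__main__":
    for raw in ("rain_trace", "Rain", "wind_gust"):
        print(raw, "->", annotate_phenomenon(raw))
```

Unknown names return None here, so a caller can decide whether to drop the observation or pass it through without an annotation.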
Use case: Sensor Cloud data integration (2/3)
• Sensor Cloud messages to field-named tuples
• SWEET annotations for heterogeneous phenomena descriptions
<sample time="2015-05-28T16:30" value="15" sensor="bom_gov_au.94961.air.air_temp"/>
["2015-05-28T16:32", "2015-05-28T16:30", "15", "bom_gov_au", "94961", "air", "air_temp", "-43.3167", "147.0075"]
Tuple field labels: system time, sampling time, network, platform, sensor, phenomenon, latitude, longitude
SensorCloudParser Bolt
SweetAnnotations Bolt
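The conversion from a Sensor Cloud sample to a field-named tuple can be sketched in plain Python. This is only an illustration of the parsing step, not the SensorCloudParser bolt itself. It assumes the sensor identifier always has four dot-separated parts (network, platform, sensor, phenomenon) and that system time and coordinates are supplied by the caller, since the sample element does not carry them. The field names and `parse_sample` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical field names, in the order of the example tuple.
FIELDS = ("system_time", "sampling_time", "value", "network", "platform",
          "sensor", "phenomenon", "latitude", "longitude")


def parse_sample(xml_text, system_time, latitude, longitude):
    """Turn one <sample .../> element into a dict keyed by field name."""
    elem = ET.fromstring(xml_text)
    # Assumption: the identifier is network.platform.sensor.phenomenon.
    network, platform, sensor, phenomenon = elem.attrib["sensor"].split(".")
    values = (system_time, elem.attrib["time"], elem.attrib["value"],
              network, platform, sensor, phenomenon, latitude, longitude)
    return dict(zip(FIELDS, values))


if __name__ == "__main__":
    sample = ('<sample time="2015-05-28T16:30" value="15" '
              'sensor="bom_gov_au.94961.air.air_temp"/>')
    print(parse_sample(sample, "2015-05-28T16:32", "-43.3167", "147.0075"))
```

An identifier with more or fewer than four parts raises a ValueError in this sketch, which makes malformed messages visible instead of silently misaligning fields.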
Use case: Sensor Cloud data integration (3/3)
SSN mapping
SSNConverter Bolt
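To give a rough idea of what an SSN mapping produces, the sketch below turns a field-named tuple into N-Triples lines. It is an assumption-laden illustration, not the SSNConverter bolt: the `http://example.org/sensorcloud/` namespace and the `value` and `samplingTime` predicates are hypothetical shortcuts (the SSN ontology models results and times through intermediate nodes), and only `ssn:Observation`, `ssn:observedBy` and `ssn:observedProperty` are taken from the SSN vocabulary. Literal escaping is omitted.

```python
SSN = "http://purl.oclc.org/NET/ssnx/ssn#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/sensorcloud/"  # hypothetical namespace


def to_ntriples(t):
    """Map a field-named tuple (dict) to a list of N-Triples lines."""
    sensor_id = ".".join((t["network"], t["platform"], t["sensor"]))
    obs = "<%sobservation/%s/%s>" % (BASE, sensor_id, t["sampling_time"])
    triples = [
        (obs, "<%s>" % RDF_TYPE, "<%sObservation>" % SSN),
        (obs, "<%sobservedBy>" % SSN, "<%ssensor/%s>" % (BASE, sensor_id)),
        (obs, "<%sobservedProperty>" % SSN,
         "<%sproperty/%s>" % (BASE, t["phenomenon"])),
        # Hypothetical shortcut predicates for the value and sampling time.
        (obs, "<%svalue>" % BASE, '"%s"' % t["value"]),
        (obs, "<%ssamplingTime>" % BASE, '"%s"' % t["sampling_time"]),
    ]
    return ["%s %s %s ." % triple for triple in triples]


if __name__ == "__main__":
    example = {"network": "bom_gov_au", "platform": "94961", "sensor": "air",
               "phenomenon": "air_temp", "value": "15",
               "sampling_time": "2015-05-28T16:30"}
    print("\n".join(to_ntriples(example)))
```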
Topologies everywhere
• A Storm topology “is a graph of stream transformations where each node is a spout or bolt”. https://storm.apache.org/documentation/Tutorial.html
• Example of simple topology
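The graph idea behind a topology can be illustrated without Storm. The sketch below is a single-process toy, not Storm's API: a source (standing in for a spout) feeds tuples to functions (standing in for bolts) along the edges of a graph. All names here (`run_topology`, `split`, `upper`) are hypothetical, and real Storm distributes this work across worker processes.

```python
from collections import deque


def run_topology(source_name, source_tuples, bolts, edges):
    """Push tuples from a source through bolts connected by edges.

    bolts maps a node name to a function returning a list of output tuples;
    edges maps a node name to the names of its downstream nodes.
    """
    outputs = {name: [] for name in bolts}
    queue = deque((target, tup) for tup in source_tuples
                  for target in edges.get(source_name, []))
    while queue:
        name, tup = queue.popleft()
        for out in bolts[name](tup):
            outputs[name].append(out)
            for target in edges.get(name, []):
                queue.append((target, out))
    return outputs


# Toy example: a sentence source, a splitting bolt, an upper-casing bolt.
bolts = {
    "split": lambda sentence: sentence.split(),
    "upper": lambda word: [word.upper()],
}
edges = {"spout": ["split"], "split": ["upper"]}

if __name__ == "__main__":
    print(run_topology("spout", ["a b", "c"], bolts, edges))
```

Each bolt may emit zero, one or many tuples per input, which is why the functions return lists.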
Setting up a virtual cluster (1/2)
Wirbelsturm - https://github.com/miguno/wirbelsturm/
• Allows deploying (local or remote) virtual clusters.
• Focus on Big Data technologies: Storm, Kafka, Zookeeper...
• Uses Vagrant for “easy to configure, reproducible, and portable work environments” - https://docs.vagrantup.com/v2/why-vagrant/index.html
• Uses Puppet for provisioning: installation and configuration of SW packages in the cluster nodes.
Setting up a virtual cluster (2/2)
• $ ./deploy
• Show wirbelsturm.yaml
• Check Storm GUI - http://localhost:28080/index.html
Deploying Storm topologies
• $ ./deploy
• Show wirbelsturm.yaml
• Check Storm GUI - http://localhost:28080/index.html
• Describe simple topology
• Compile & deploy
• Describe a topology set
• Configure Kafka
• Compile & deploy
Conclusion
• Wirbelsturm allows easy configuration and deployment of virtual clusters, with a focus on Big Data technologies.
• SSN and SWEET ontologies are used to model and integrate environmental sensor observations.
• Parallelization of bottleneck tasks reduces the average message processing latency, but only up to a point. More about Storm parallelization: http://bit.ly/1NVyjU2
• Delaying RDF conversion does not speed up the processing of Sensor Cloud messages in the tested environment.
• Paper submitted to IJSWIS, special issue on Velocity and Variety Dimensions of Big Data – Llaves, Corcho et al.
What's coming next
• Flying faster with Heron - https://blog.twitter.com/2015/flying-faster-with-twitter-heron
The presented research has been funded by Ministerio de
Economía y Competitividad (Spain) under the project “4V:
Volumen, Velocidad, Variedad y Validez en la Gestión Innovadora
de Datos” (TIN2013-46238-C4-2-R), by the EU Marie Curie
IRSES project SemData (612551), and supported by an AWS in
Education Research Grant award.
Alejandro Llaves
allaves@fi.upm.es
Thanks!
