Image from: wikipedia.org/wiki/Systematic_review, 2017
Data Stream Processing is the real-time processing of data, performed continuously, concurrently, and in a data-by-data fashion. SP treats data as a continuous, infinite stream integrated from its sources.
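To make the data-by-data idea concrete, here is a minimal Python sketch. It assumes a hypothetical `sensor_source` generator that stands in for an unbounded source; `running_mean` (also a hypothetical name) consumes one element at a time and emits an updated result after each one, keeping only constant-size state instead of storing the stream. It illustrates the concept only and is not code from any of the surveyed frameworks.

```python
import itertools
import random
from typing import Iterable, Iterator, Tuple


def sensor_source(seed: int = 0) -> Iterator[float]:
    """Hypothetical unbounded source: yields simulated readings forever."""
    rng = random.Random(seed)
    while True:
        yield rng.uniform(0.0, 100.0)


def running_mean(stream: Iterable[float]) -> Iterator[Tuple[int, float]]:
    """Process the stream element by element, emitting (count, mean) each time.

    Only the count and the current mean are kept, so memory use stays
    constant no matter how long the stream runs.
    """
    count = 0
    mean = 0.0
    for value in stream:
        count += 1
        mean += (value - mean) / count  # incremental mean update
        yield count, mean


if __name__ == "__main__":
    # The source never ends, so take a finite prefix for the demo.
    for n, m in itertools.islice(running_mean(sensor_source()), 5):
        print(f"after {n} readings: mean = {m:.2f}")
```

Because the source is infinite, the demo uses `itertools.islice` to stop after five results; each result is available as soon as its reading arrives, without waiting for the rest of the data.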
BIG DATA STREAM PROCESSING
SP: Stream Processing
Devices, social media, web content, … generate massive stream signals, denoted as "Big Data Streams".
BD: Big Data
This contrasts with traditional big data approaches, in which constraints such as real-time responsiveness, mobility problems, and energy availability are not considered.
Mohammed Alayyoub, Ali Yazici, and Ziya Karakaya (2016). A Systematic Mapping Study for Big Data Stream Processing Frameworks. JADI - Brazi, vol. 2, pp. 4-11.
Technology Session 11: Research paper titled "A Systematic Mapping Study for Big Data Stream Processing Frameworks"
Summarizing the results of the included studies
RESULTS OF SYSTEMATIC REVIEW
RQ 1. What types of contributions are made by the papers?
A Systematic Mapping Study for Big Data Stream Processing Frameworks [Mohammed Alayyoub et al., 2016]
Contributions
Method/Technique/Approach: 35
Framework: 11
Comparison: 11
Analysis: 10
Other: 7
Model: 6
Tool: 5
Platform: 5
Overview: 4
Architecture: 4
Empirical Study: 3
The review addresses nine research questions (RQs). Of 451 candidate studies retrieved from the selected sources, 91 studies conducted between 2010 and 2015 were evaluated.
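The contribution counts above add up to 101, more than the 91 evaluated studies, which suggests that some papers were assigned more than one contribution type. The short Python sketch below (the variable and function names are illustrative, not from the paper) checks the total and expresses each count as a percentage of the 91 studies.

```python
# Contribution counts as reported on the RQ 1 slide.
CONTRIBUTIONS = {
    "Method/Technique/Approach": 35,
    "Framework": 11,
    "Comparison": 11,
    "Analysis": 10,
    "Other": 7,
    "Model": 6,
    "Tool": 5,
    "Platform": 5,
    "Overview": 4,
    "Architecture": 4,
    "Empirical Study": 3,
}
EVALUATED_STUDIES = 91


def percent_of_studies(counts: dict, studies: int) -> dict:
    """Share of evaluated studies reporting each contribution type (1 decimal)."""
    return {kind: round(100 * n / studies, 1) for kind, n in counts.items()}


total = sum(CONTRIBUTIONS.values())
print(f"total contribution labels: {total} for {EVALUATED_STUDIES} studies")
for kind, pct in percent_of_studies(CONTRIBUTIONS, EVALUATED_STUDIES).items():
    print(f"{kind:<28}{pct:>5}%")
```

Percentages are taken over studies rather than over labels, so they are not expected to sum to 100.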
RESULTS OF SYSTEMATIC REVIEW
RQ 2. What type of research methods are used in the papers?
Solution Proposal: A solution for a problem is proposed.
Validation Research: The techniques investigated have not yet been implemented in practice.
Evaluation Research: Techniques are implemented in practice
and an evaluation of the technique is conducted.
Experience Papers: explain what has been done in practice and how.
Research Methods
Solution Proposal: 20
Validation Research: 39
Evaluation Research: 31
Experience Papers: 1
A Systematic Mapping Study for Big Data Stream Processing Frameworks [Mohammed Alayyoub et al., 2016]
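Unlike the contribution counts, the four research-method counts sum to exactly 91, the number of evaluated studies, which is consistent with each study having been assigned a single research method. A minimal Python check of that, plus each method's share (names are illustrative):

```python
# Research-method counts as reported on the RQ 2 slide.
RESEARCH_METHODS = {
    "Solution Proposal": 20,
    "Validation Research": 39,
    "Evaluation Research": 31,
    "Experience Papers": 1,
}
EVALUATED_STUDIES = 91

total = sum(RESEARCH_METHODS.values())
assert total == EVALUATED_STUDIES, "counts do not match the study total"

# Share of studies per method, rounded to one decimal place.
shares = {m: round(100 * n / total, 1) for m, n in RESEARCH_METHODS.items()}
for method, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{method:<20}{pct:>5}%")
```

On these numbers, Validation Research is the most common method at roughly 43% of the studies, and Experience Papers account for about 1%.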
RESULTS OF SYSTEMATIC REVIEW
RQ 3. What types of research methods are used for each framework in the papers?
[Figure: Research methods for each SP framework. Panels: Spark, Storm, Flink, InfoSphere, S4; categories: Solution Proposal, Validation Research, Evaluation Research; y-axis 0 to 15.]
RESULTS OF SYSTEMATIC REVIEW
RQ 9. What type(s) of data are used most for each Big Data stream processing framework?
[Figure: Data types used for each SP framework. Data types: Sensor, Social Media, Graphical, Geospatial, Log data, Web Content; y-axis 0 to 8.]
RESULTS OF SYSTEMATIC REVIEW
RQ 5. What is the ratio of experimentation type (batch only, stream only or both) used for
each Big Data stream processing framework in the papers?
[Figure: Experimentation forms for each SP framework. Panels: Spark, Storm, Flink, InfoSphere, S4; categories: Batch, Streaming, Both; y-axis 0 to 25.]
RESULTS OF SYSTEMATIC REVIEW
RQ 4. What is the annual number of publications for each Big Data stream processing framework?
[Figure: Annual number of publications for Spark, Storm, Flink, InfoSphere, and S4, 2009 to 2016; y-axis 0 to 15.]
RESULTS OF SYSTEMATIC REVIEW
RQ 6. What is the ratio of contribution purposes (usage enhancement, performance
enhancement or both) for each Big Data stream processing framework in the papers?
[Figure: Contribution purposes for each SP framework. Panels: Spark, Storm, Flink, InfoSphere, S4; categories: Usage enhancement, Performance enhancement, Both; y-axis 0 to 15.]
RESULTS OF SYSTEMATIC REVIEW
RQ 7. Which internal data ingestion source/tool is used most for each framework?
[Figure: Data ingestion sources/tools. Panels: Kafka, RabbitMQ, 0MQ (ZeroMQ), Network Socket, Twitter Streaming API; y-axis 0 to 5.]
Slide annotations: "Client library to build SP apps"; RabbitMQ, ZeroMQ: "asynchronous message queue"; "Third party tool to ingest data from external sources"; "Streams API Libraries".
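A network socket is the simplest of the ingestion sources listed for RQ 7, since Python's standard library covers it. The sketch below is an illustration under stated assumptions, not how any of the surveyed frameworks implement ingestion: records are assumed to be newline-delimited UTF-8 text, `socket.socketpair()` stands in for a real remote producer so the example runs locally, and `ingest_lines` and `demo` are hypothetical names.

```python
import socket
import threading
from typing import Iterator, List


def ingest_lines(sock: socket.socket) -> Iterator[str]:
    """Yield newline-delimited UTF-8 records from a connected socket until EOF."""
    with sock.makefile("r", encoding="utf-8", newline="\n") as stream:
        for line in stream:
            yield line.rstrip("\n")


def demo() -> List[str]:
    """Send three records through a local socket pair and ingest them."""
    # socketpair() returns two connected sockets; one end plays the role
    # of a remote producer so the sketch runs without a network.
    producer, consumer = socket.socketpair()

    def produce() -> None:
        with producer:  # closing the producer signals EOF to the consumer
            for i in range(3):
                producer.sendall(f"event-{i}\n".encode("utf-8"))

    sender = threading.Thread(target=produce)
    sender.start()
    with consumer:
        records = list(ingest_lines(consumer))
    sender.join()
    return records


if __name__ == "__main__":
    for record in demo():
        print("ingested:", record)
```

Because `ingest_lines` is a generator, downstream processing can start on the first record before the producer has finished sending.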
RESULTS OF SYSTEMATIC REVIEW
RQ 8. What is the most preferred range for the number of nodes used in experimentation for
each Big Data stream processing framework?
[Figure: Number of nodes used in experimentation for each SP framework. Panels: Spark, Storm, Flink, InfoSphere, S4; categories: 1 – 5 nodes, 6 – 20 nodes, 20+ nodes; y-axis 0 to 15.]
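The node ranges in the RQ 8 legend overlap at 20 ("6 – 20 nodes" and "20+ nodes"), so any tally of experiments by cluster size has to pick a boundary. The Python sketch below assumes 20 belongs to the middle range and that "20+" means more than 20; the function names and the example node counts are hypothetical, not data from the study.

```python
from collections import Counter
from typing import Iterable


def node_range(nodes: int) -> str:
    """Map a cluster size to an RQ 8 bucket (assumption: 20 -> '6-20 nodes')."""
    if nodes < 1:
        raise ValueError("an experiment needs at least one node")
    if nodes <= 5:
        return "1-5 nodes"
    if nodes <= 20:
        return "6-20 nodes"
    return "20+ nodes"


def tally(node_counts: Iterable[int]) -> Counter:
    """Count how many experiments fall into each node-range bucket."""
    return Counter(node_range(n) for n in node_counts)


# Hypothetical cluster sizes, for illustration only.
print(tally([1, 4, 8, 20, 32]))
```

Moving the boundary (for example, treating 20 as "20+") only requires changing the `<= 20` comparison.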
Questions
