The document summarizes the results of a systematic mapping study on big data stream processing frameworks. It examines 91 studies published between 2010 and 2015. The study addresses nine research questions, including the types of contributions made by the papers, the research methods used, the experimentation types for different frameworks, the most used data ingestion tools, and the preferred number of nodes in experiments. The results break down the findings for each research question by framework, such as Spark, Storm, Flink, and InfoSphere.
Technology Session 11: research paper titled "A Systematic Mapping Study for Big Data Stream Processing Framework"
2. Data Stream Processing
Stream processing (SP) is the real-time processing of data continuously, concurrently, and one data item at a time. SP treats data as a continuous, infinite stream integrated from its sources.
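The item-at-a-time idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `sensor_stream` is a hypothetical, never-ending source and `running_mean` is a hypothetical operator; neither comes from the study or from any of the frameworks it covers.

```python
import itertools
from typing import Iterable, Iterator


def sensor_stream() -> Iterator[float]:
    """Hypothetical unbounded source: yields readings forever."""
    for i in itertools.count():
        yield 20.0 + (i % 5)


def running_mean(stream: Iterable[float]) -> Iterator[float]:
    """Emit an updated mean after every item, without storing the stream."""
    count = 0
    total = 0.0
    for value in stream:
        count += 1
        total += value
        yield total / count


# The source never terminates, so only the first five results are taken.
first_five = list(itertools.islice(running_mean(sensor_stream()), 5))
print(first_five)  # [20.0, 20.5, 21.0, 21.5, 22.0]
```

Each result is available as soon as its input item arrives, which is the contrast with collecting a full batch before computing anything.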
3. BIG DATA STREAM PROCESSING
SP: Stream Processing
Devices, social media, web content, … generate massive stream signals denoted as “Big Data Streams”.
BD: Big Data
This is in contrast to traditional big data approaches, where constraints of real-time responsiveness, mobility problems, and energy availability aren’t considered.
Mohammed Alayyoub, Ali Yazici, and Ziya
Karakaya. (2016). A Systematic Mapping Study
for Big Data Stream Processing Frameworks.
JADI - Brazi, vol.2, pp 4-11.
8. RESULTS OF SYSTEMATIC REVIEW
RQ 1. What types of contributions are made by the papers?
A Systematic Mapping Study for Big Data Stream Processing Frameworks [Mohammed Alayyoub et al., 2016]
Contributions
Method/Technique/Approach : 35
Framework : 11
Comparison : 11
Analysis : 10
Other : 7
Model: 6
Tool : 5
Platform : 5
Overview : 4
Architecture : 4
Empirical Study: 3
The study poses nine research questions (RQs).
451 candidate studies were retrieved from the selected sources.
91 studies conducted between 2010 and 2015 were evaluated.
9. RESULTS OF SYSTEMATIC REVIEW
RQ 2. What type of research methods are used in the papers?
Solution Proposal: a solution for a problem is proposed.
Validation Research: the techniques investigated have not yet been implemented in practice.
Evaluation Research: techniques are implemented in practice and an evaluation of the technique is conducted.
Experience Papers: explain what has been done in practice and how.
Research Methods
Solution Proposal : 20
Validation Research : 39
Evaluation Research : 31
Experience Papers : 1
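As a quick arithmetic check, the four counts can be turned into shares. The sketch below uses only the numbers listed above; the variable names are illustrative. The counts sum to 91, the number of evaluated studies, which suggests (but does not prove) that each paper was assigned exactly one research method.

```python
# Counts as listed on the slide for RQ 2.
research_methods = {
    "Solution Proposal": 20,
    "Validation Research": 39,
    "Evaluation Research": 31,
    "Experience Papers": 1,
}

total = sum(research_methods.values())

# Percentage share of each research method, rounded to one decimal place.
shares = {
    name: round(100 * count / total, 1)
    for name, count in research_methods.items()
}

print(total)  # 91
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

Validation Research is the largest group at roughly 43% of the papers, followed by Evaluation Research at roughly 34%.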
10. RESULTS OF SYSTEMATIC REVIEW
RQ 3. What types of research methods are used for each of the frameworks in the papers?
[Figure: Research methods for each SP framework. One bar-chart panel per framework (Spark, Storm, Flink, InfoSphere, S4); series: Solution Proposal, Validation Research, Evaluation Research; axis ticks 0 to 15.]
11. RESULTS OF SYSTEMATIC REVIEW
RQ 9. What type(s) of data is used most for each Big Data stream processing framework?
[Figure: One bar-chart panel per data type (Sensor, Social Media, Graphical, Geospatial, Log data, Web Content); axis ticks 0 to 8.]
12. RESULTS OF SYSTEMATIC REVIEW
RQ 5. What is the ratio of experimentation type (batch only, stream only or both) used for
each Big Data stream processing framework in the papers?
[Figure: Experimentation forms. One bar-chart panel per framework (Spark, Storm, Flink, InfoSphere, S4); series: Batch, Streaming, Both; axis ticks 0 to 25.]
13. RESULTS OF SYSTEMATIC REVIEW
RQ 4. What is the annual number of publications for each Big Data stream processing framework?
[Figure: Annual number of publications, 2009 to 2016; one series per framework (Spark, Storm, Flink, InfoSphere, S4); axis ticks 0 to 15.]
14. RESULTS OF SYSTEMATIC REVIEW
RQ 6. What is the ratio of contribution purposes (usage enhancement, performance
enhancement or both) for each Big Data stream processing framework in the papers?
[Figure: One bar-chart panel per framework (Spark, Storm, Flink, InfoSphere, S4); series: Usage enhancement, Performance enhancement, Both; axis ticks 0 to 15.]
15. RESULTS OF SYSTEMATIC REVIEW
RQ 7. Which data ingestion internal source/tool is used most for each framework?
Kafka: client library to build SP apps.
RabbitMQ, ZeroMQ (0MQ): asynchronous message queues.
Network Socket
Twitter Streaming API
Also labeled on the slide: third-party tools to ingest data from external sources; Streams API libraries.
[Figure: One bar-chart panel per ingestion source (Kafka, RabbitMQ, 0MQ, Network Socket, Twitter Streaming API); axis ticks 0 to 5.]
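To make the "Network Socket" ingestion source concrete, here is a minimal sketch using only Python's standard library. It is an assumption-laden illustration, not how any of the surveyed frameworks ingest data: `ingest_lines` is a hypothetical helper, the newline-delimited record format is assumed, and a local `socket.socketpair()` stands in for a real remote producer.

```python
import socket
from typing import Iterator


def ingest_lines(conn: socket.socket) -> Iterator[str]:
    """Yield newline-delimited records from a connected socket until EOF."""
    with conn.makefile("r", encoding="utf-8") as reader:
        for line in reader:
            yield line.rstrip("\n")


# A connected socket pair stands in for a remote producer and a local consumer.
producer, consumer = socket.socketpair()
producer.sendall(b"reading=21\nreading=22\nreading=23\n")
producer.close()  # closing the producer side signals end of stream

records = list(ingest_lines(consumer))
consumer.close()
print(records)  # ['reading=21', 'reading=22', 'reading=23']
```

In a real deployment the producer would not close, and the consumer would hand each record to the processing pipeline as it arrives instead of collecting a list.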
16. RESULTS OF SYSTEMATIC REVIEW
RQ 8. What is the most preferred range for the number of nodes used in experimentation for
each Big Data stream processing framework?
[Figure: One bar-chart panel per framework (Spark, Storm, Flink, InfoSphere, S4); series: 1 – 5 nodes, 6 – 20 nodes, 20+ nodes; axis ticks 0 to 15.]
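The three node ranges can be expressed as a small bucketing function. This is a sketch under one explicit assumption: the slide's labels overlap at exactly 20 nodes, so the code places 20 in "6 – 20 nodes" and treats "20+ nodes" as more than 20. The function name and the sample cluster sizes are hypothetical and are not data from the study.

```python
from collections import Counter


def node_range(nodes: int) -> str:
    """Map a cluster size to one of the three ranges used for RQ 8."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    if nodes <= 5:
        return "1 – 5 nodes"
    if nodes <= 20:  # assumption: exactly 20 belongs to the middle range
        return "6 – 20 nodes"
    return "20+ nodes"


# Hypothetical cluster sizes, used only to demonstrate the tally.
sample_sizes = [1, 4, 8, 20, 21, 100]
tally = Counter(node_range(n) for n in sample_sizes)
print(tally)
```

Tallying papers this way, once per framework, would give the per-range counts that the chart plots.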