Literature Review of Stream Data Processing
Author: Nabilahmed Patel
Email: nabilpatel11@gmail.com
References
• A. Gilani. Design and Implementation of Stream Operators, Query Instantiator and Stream Buffer Manager. December 2003. http://itlab.uta.edu/students/alumni/MS/Altaf_Gilani/AGil_MS2003.pdf
• B. K. Kendai. Runtime Optimization and Load Shedding in MavStream: Design and Implementation. December 2006. http://itlab.uta.edu/students/alumni/MS/Balakumar_Kendai/BKen_MS2006.pdf
Sharma Chakravarthy 2
MavStream
• MavStream is a Data Stream Management System (DSMS) for processing continuous queries over data streams.
• It is modelled as a client-server architecture.
• The client accepts input (a query) from the user and sends it to the server.
• The server processes the input and creates server-specific data structures that are used while the continuous query is being processed.
• The output of the query is returned to the client.
MavStream Architecture
[Figure: MavStream architecture diagram]
MavStream Client
• Web-enabled GUI.
• Accepts queries and stream definitions from users.
• Generates an ASCII file from the user input.
• Continuous queries (CQs) are defined in the file by listing the stream operators and the information needed to instantiate them.
• To build the query tree, the user must specify the associations between operators.
• The server takes this text file as its input.
• Communication between client and server is command-driven and protocol-oriented.
MavStream Server
• TCP server.
• Responsible for executing user requests and producing the desired output.
• Integrates and coordinates modules such as the input processor, instantiator, operators, buffer manager and scheduler.
• Some of the commands supported by the server:
    • Register a stream
    • Receive a query plan object
    • Start a query
    • Send all stream information to the client
    • Stop a query
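A command-driven server of this kind can be pictured as a table that maps command names to handlers. The sketch below is only an illustration: the class `StreamServer`, the command strings (`REGISTER_STREAM`, `START_QUERY`, `STOP_QUERY`, `LIST_STREAMS`) and the `OK`/`ERROR` replies are hypothetical, since the slides do not give MavStream's actual wire protocol. Receiving a query plan object is left out, and no TCP handling is shown.

```python
from typing import Callable, Dict, List


class StreamServer:
    """Toy command dispatcher; all command names here are hypothetical."""

    def __init__(self) -> None:
        self.streams: Dict[str, List[str]] = {}   # stream name -> attribute names
        self.running: set = set()                 # ids of queries currently running
        self._handlers: Dict[str, Callable[..., str]] = {
            "REGISTER_STREAM": self.register_stream,
            "START_QUERY": self.start_query,
            "STOP_QUERY": self.stop_query,
            "LIST_STREAMS": self.list_streams,
        }

    def register_stream(self, name: str, *attributes: str) -> str:
        self.streams[name] = list(attributes)
        return "OK"

    def start_query(self, query_id: str) -> str:
        self.running.add(query_id)
        return "OK"

    def stop_query(self, query_id: str) -> str:
        self.running.discard(query_id)
        return "OK"

    def list_streams(self) -> str:
        return ",".join(sorted(self.streams))

    def handle(self, line: str) -> str:
        """Parse one whitespace-separated command line and run its handler."""
        parts = line.split()
        if not parts or parts[0] not in self._handlers:
            return "ERROR unknown command"
        return self._handlers[parts[0]](*parts[1:])
```

In a real server, `handle` would be called once per line read from the client's TCP connection.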
Input Processor
• Generates the query plan object by processing the text input.
• The query plan object is a sequence of operator nodes, each of which describes one operator completely.
• A data structure called “Operator Data” stores the operator definition.
• The “Operator Data” is wrapped in an “Operator Node” that holds references to the parent and child operators.
• The entire query plan object is accessed through a reference to the Operator Node of the root operator.
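The relationship between Operator Data, Operator Node and the query plan object can be sketched as two small classes. This is a minimal illustration under assumptions: the field names (`name`, `kind`, `params`) and the `add_child` helper are hypothetical and are not taken from the MavStream sources.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class OperatorData:
    """Complete description of one operator (field names are assumptions)."""
    name: str
    kind: str                                   # e.g. "select", "join"
    params: Dict[str, str] = field(default_factory=dict)


@dataclass(eq=False)  # compare nodes by identity; parent/child links form cycles
class OperatorNode:
    """Wraps OperatorData and links it to its parent and child operators."""
    data: OperatorData
    parent: Optional["OperatorNode"] = None
    children: List["OperatorNode"] = field(default_factory=list)

    def add_child(self, child: "OperatorNode") -> "OperatorNode":
        child.parent = self
        self.children.append(child)
        return child


# The whole plan is reachable from a single reference to the root node.
root = OperatorNode(OperatorData("p1", "project", {"attrs": "speed"}))
sel = root.add_child(OperatorNode(OperatorData("s1", "select", {"cond": "speed > 60"})))
```

Holding only `root` is enough to reach every operator, which matches the description of the plan being accessed through the root Operator Node.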
Input Processor (cont.)
• Because data flows from the leaves to the root, the query tree is traversed bottom-up.
• On visiting each node, the input processor calls the instantiator module to instantiate the operator.
• After the operators are instantiated, the query tree is processed to create operator paths, segments and simplified segments.
• Computes the processing capacity of the paths and the memory release capacity of the segments, and sorts them.
• Processes the QoS parameters and stores them in the appropriate data structures for use by the system monitor.
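The bottom-up traversal described above is a post-order walk of the query tree: children are visited before their parent. The sketch below is illustrative only. The `Node` class and the function names are hypothetical, and `operator_paths` assumes that an operator path is a leaf-to-root chain of operators; segment construction is not shown because the slides do not define it.

```python
from typing import Callable, List, Optional


class Node:
    """Minimal stand-in for an operator node (hypothetical)."""

    def __init__(self, name: str, children: Optional[List["Node"]] = None) -> None:
        self.name = name
        self.children = children or []


def instantiate_bottom_up(root: Node, instantiate: Callable[[Node], None]) -> None:
    """Visit children before their parent (post-order), mirroring leaf-to-root data flow."""
    for child in root.children:
        instantiate_bottom_up(child, instantiate)
    instantiate(root)


def operator_paths(root: Node) -> List[List[str]]:
    """Return every leaf-to-root chain of operator names."""
    if not root.children:
        return [[root.name]]
    return [path + [root.name]
            for child in root.children
            for path in operator_paths(child)]


plan = Node("join", [Node("select_a"), Node("select_b")])
order: List[str] = []
instantiate_bottom_up(plan, lambda n: order.append(n.name))
# order is now ["select_a", "select_b", "join"]
```

Here the callback only records the visiting order; in the system described, it would be the call into the instantiator module.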
Instantiator
• Initializes and instantiates the stream operators and their associated buffers.
• It creates an instance of each required operator and initializes it from the Operator Node data.
• It extracts the information from the operator node and converts it into the form required by each operator.
• It then associates buffers, configured with the desired parameters, with the operators.
• It also associates a scheduler with each operator to facilitate communication for scheduling.
• The instantiator does not start the operator; it only performs the necessary initialization.
Scheduler
• In MavStream, scheduling is done at the operator level.
• Operators are scheduled based on their state and priority.
• The scheduler maintains a ready queue, which determines the order in which operators are scheduled.
• This queue is initially populated by the server.
• An operator must be in the ready state to be scheduled.
• An operator goes through a number of states while it is being scheduled.
Scheduler (cont.)
• Scheduling strategies supported by MavStream:
    • Round-robin: all operators are assigned the same priority (time quantum).
    • Weighted round-robin: different time quanta are assigned to different operators based on their requirements.
    • Path capacity scheduling: schedules the operator path with the maximum processing capacity.
    • Segment scheduling: schedules the segment with the maximum memory release capacity.
    • Simplified segment scheduling: the same as segment scheduling, except that the segments are constructed differently.
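Two of these strategies are simple enough to sketch directly. The code below is a rough illustration, not MavStream's scheduler: the function names, the dictionary inputs and the idea of returning a schedule as a list are all assumptions made for the example. `weighted_round_robin` cycles through a ready queue and gives each operator its own quantum (equal quanta reduce it to plain round-robin), and `pick_path` selects the path with the largest processing capacity.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


def weighted_round_robin(quanta: Dict[str, int], rounds: int) -> List[Tuple[str, int]]:
    """Cycle through a ready queue, giving each operator its own time quantum."""
    ready: Deque[str] = deque(quanta)          # ready queue, in insertion order
    schedule: List[Tuple[str, int]] = []
    for _ in range(rounds * len(quanta)):
        op = ready.popleft()
        schedule.append((op, quanta[op]))      # "run" op for its quantum
        ready.append(op)                       # back to the tail of the queue
    return schedule


def pick_path(capacities: Dict[str, float]) -> str:
    """Path capacity scheduling: choose the path with the highest processing capacity."""
    return max(capacities, key=capacities.get)
```

Segment scheduling would follow the same pattern as `pick_path`, with memory release capacity as the key.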
Scheduler (cont.)
• Master scheduler:
    • The execution of all schedulers is controlled by the master scheduler.
    • The master scheduler allocates a time quantum to each scheduler.
    • At any instant, the master scheduler allows only one scheduler to run.
    • It also provides an interface to add queries to and remove them from a scheduler's ready queue, and to change the scheduling strategy of queries.
Feeder
• The feeder was developed to feed the tuples of streams to the buffers of leaf operators.
• If several streams are combined and fed as one stream, the user must specify a split condition using the “split” operator supported by MavStream.
• Each stream is fed by a separate thread.
• Because there is no facility for streaming directly from sensors, the feeder reads data from flat files that contain synthetically generated data.
• The mean feeding rate is varied over time, and pauses in feeding are introduced, to simulate the bursty nature of streams.
• The characteristics of feeding can be specified in a configuration file.
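The feeder's behaviour (one thread per stream, periodic pauses to create bursts) can be approximated as follows. This is a simplified sketch under assumptions: the function names and parameters (`delay_s`, `pause_every`, `pause_s`) are hypothetical, rows come from in-memory sequences rather than flat files, and the delay is fixed rather than varied over time.

```python
import queue
import threading
import time
from typing import Dict, Sequence, Tuple


def feed(rows: Sequence[Tuple], buffer: "queue.Queue", delay_s: float,
         pause_every: int, pause_s: float) -> None:
    """Push rows into a leaf operator's buffer, pausing periodically to mimic bursts."""
    for i, row in enumerate(rows, start=1):
        buffer.put(row)
        time.sleep(delay_s)                       # spacing set by the feeding rate
        if pause_every and i % pause_every == 0:
            time.sleep(pause_s)                   # gap between bursts


def start_feeders(streams: Dict[str, Sequence[Tuple]], delay_s: float = 0.0,
                  pause_every: int = 0, pause_s: float = 0.0) -> Dict[str, "queue.Queue"]:
    """Run one feeder thread per stream and return the buffers once all have finished."""
    buffers = {name: queue.Queue() for name in streams}
    threads = [
        threading.Thread(target=feed,
                         args=(rows, buffers[name], delay_s, pause_every, pause_s))
        for name, rows in streams.items()
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return buffers
```

`queue.Queue` is used because it is safe to share between the feeder thread and whatever consumes the buffer.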
Operators
• The operators are designed to handle long-running queries that produce results continuously and incrementally.
• To deal with blocking operators (aggregates and join), operators are designed around the window concept.
• During its life span, an operator is in one of four states: ready, running, suspended or stopped.
• MavStream supports the following operators: split, select, project, join (hash and nested versions), group by and various aggregate operators (sum, average, max, min, count).
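A window turns a blocking operator into one that can emit results incrementally: instead of waiting for the end of an unbounded stream, the operator produces a result each time a window closes. The sketch below shows this for an average aggregate. It assumes a count-based, non-overlapping (tumbling) window purely for illustration; the slides do not say which window types MavStream supports, and `windowed_average` is a hypothetical name.

```python
from typing import Iterable, Iterator, List


def windowed_average(values: Iterable[float], window_size: int) -> Iterator[float]:
    """Emit one average per full window of `window_size` tuples (tumbling window)."""
    window: List[float] = []
    for v in values:
        window.append(v)
        if len(window) == window_size:
            yield sum(window) / window_size
            window.clear()   # start the next window; a partial final window is dropped
```

Because the function is a generator, each average becomes available as soon as its window fills, without consuming the rest of the stream first.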
Buffer Management
• The buffer manager provides a mechanism to handle the mismatch between input rates and processing capacity by using available memory.
• Because main memory is limited, incoming tuples that exceed the upper limit must be either discarded or stored in secondary storage.
• An interface is provided to store tuples either in main-memory buffers or on secondary storage (selected by a configuration option) and to retrieve tuples stored on secondary storage.
• The management of main-memory buffers and secondary storage for tuples is handled entirely by the buffer manager.
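The discard-or-spill choice can be illustrated with a small FIFO buffer. This is a sketch under assumptions rather than MavStream's buffer manager: the class name `SpillBuffer`, the `limit` and `spill` options, and the use of a temporary file with `pickle` as the secondary storage are all hypothetical choices made for the example.

```python
import pickle
import tempfile
from collections import deque
from typing import Any, Deque


class SpillBuffer:
    """FIFO buffer that keeps up to `limit` tuples in memory and spills the rest to disk."""

    def __init__(self, limit: int, spill: bool = True) -> None:
        self.limit = limit
        self.spill = spill                      # False: discard overflow instead
        self.memory: Deque[Any] = deque()
        self.disk = tempfile.TemporaryFile()    # stand-in for secondary storage
        self.read_pos = 0                       # file offset of the next spilled tuple
        self.on_disk = 0                        # spilled tuples not yet read back
        self.dropped = 0                        # tuples discarded in non-spill mode

    def put(self, item: Any) -> None:
        # Once anything is on disk, newer tuples must follow it to preserve order.
        if len(self.memory) < self.limit and self.on_disk == 0:
            self.memory.append(item)
        elif self.spill:
            self.disk.seek(0, 2)                # append at the end of the file
            pickle.dump(item, self.disk)
            self.on_disk += 1
        else:
            self.dropped += 1

    def get(self) -> Any:
        if self.memory:
            return self.memory.popleft()
        if self.on_disk == 0:
            raise IndexError("buffer is empty")
        self.disk.seek(self.read_pos)
        item = pickle.load(self.disk)
        self.read_pos = self.disk.tell()
        self.on_disk -= 1
        return item
```

Callers only see `put` and `get`; whether a tuple lived in memory or on disk is hidden, which mirrors the point that storage management is handled entirely by the buffer manager.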
Runtime Optimizer
• The primary goal of the runtime optimizer is to monitor QoS measures so that user-specified QoS values are met to the best extent possible.
• Based on this monitoring, the system chooses the best (or optimal) scheduling strategy for a query.
• In MavStream, the runtime optimizer consists of a System Monitor, which monitors the values of QoS measures for a query, and a Decision Maker, which chooses the best scheduling strategy for a query and controls the load shedders.
• If the current scheduling strategy meets the QoS requirements of a query, the runtime optimizer takes no action for that query.
Runtime Optimizer (cont.)
• Alternative design: decision making based on the input rate of streams
    • The QoS measures of a query depend on the scheduling strategy chosen for that query and on the arrival rate of its input streams.
    • The processing capacity of any system is fixed.
    • Because the input rates of streams are bursty, any change in the arrival rate of a stream can potentially trigger a change in the scheduling strategy of the queries that depend on that stream.
    • Such an approach would also ignore the actual QoS requirements of the query and could change the scheduling strategy when it is not necessary.
    • Hence MavStream chooses strategies using feedback obtained by monitoring the actual QoS measures, together with a static table called the decision table.
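The feedback approach can be sketched as a lookup in a static table that is consulted only when a monitored QoS measure violates its requirement. The table contents below are hypothetical: the slides do not reproduce MavStream's decision table, so the measure names (`latency`, `memory`), the strategy names and the mapping between them are assumptions for illustration. Each requirement is treated as an upper bound, and load shedding is not modelled.

```python
from typing import Dict

# Hypothetical decision table: for each violated QoS measure, the strategy to switch to.
DECISION_TABLE: Dict[str, str] = {
    "latency": "path_capacity",
    "memory": "segment",
}


def choose_strategy(measured: Dict[str, float], required: Dict[str, float],
                    current: str) -> str:
    """Return the strategy to use next; keep `current` if every QoS limit is met.

    Each required value is treated as an upper bound on the measured value.
    """
    for measure, limit in required.items():
        if measured.get(measure, 0.0) > limit:
            return DECISION_TABLE.get(measure, current)
    return current
```

Note that the input rate of the streams never appears in `choose_strategy`: a burst changes the strategy only if it actually pushes a monitored measure past its limit.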
Thank You!
5/15/2016
