The document describes CONFLuEnCE, a continuous workflow execution engine. It was developed to enable applications involving continuous data streams. CONFLuEnCE extends the Kepler workflow system with a new continuous workflow director, window operators, and push communication capabilities. It implements the continuous workflow model, which includes waves of events, window semantics, and continuously running activities. Two example applications developed with CONFLuEnCE are described: supply chain management and Astroshelf, a collaboration platform for astrophysicists.
1 of 29
Download to read offline
More Related Content
Confluence
1. CONtinuous workFLow ExeCution Engine
Panayiotis (Panickos) Neophytou
Panos K. Chrysanthis
Alexandros Labrinidis
CollaborateCom 2011
Advanced Data Management Technologies Lab
Computer Science Department
University of Pittsburgh
2. Workflows are GREAT!
? Ability to automate processes
? Integrate and orchestrate resources
(including humans) seamlessly and
effectively.
? Service composition.
? Process large data static sets
? Keep track of things (provenance)
? Re-usable
? Easy to program (Visual Languages)
CONFLuEnCE: Implementation and Application Design 2
3. High data rates ¨C Push model
? New type of data sources (proactive): (unsupported)
? Stock price ticker, twitter stream, DSMS tuple streams.
? Polling: blocking, miss updates.
? Data items participate in multiple interleaving WF
invocations.
CONFLuEnCE: Implementation and Application Design 3
4. Our approach
? Goal: Enable monitoring and collaborative
applications that involve processing and integration of
continuous streams of data.
? CONFLuEnCE: Continuous Workflow Execution
Engine
? Define the model. [CollaborateCom 2008]
? Develop the new constructs.
? Window semantics, event waves, support backwards workflow
compatibility, enable push
? Develop the new model of computation
? Continuously running workflow activities.
? Deadline driven scheduling.
? Implement prototype. [Demo SIGMOD 2011]
CONFLuEnCE: Implementation and Application Design 4
5. Overview
? Motivation
? Continuous Workflow Model
? Waves
? Window Operator
? Push communication
? CONFLuEnCE
? CWf Application Scenarios
? Conclusions
CONFLuEnCE: Implementation and Application Design 5
6. Continuous Workflow Model
? Includes all existing workflow constructs.
? Waves of events to distinguish between event
contexts.
? Window operators on queues.
? Continuously running activities.
? Ability to support push communications.
CONFLuEnCE: Implementation and Application Design 6
7. Wave of events
? Distinguish events between multiple invocations of an
activity.
? Waves expose provenance during design/execution.
? Allows synchronization of events of the same lineage.
? E.g., Customer order: multiple items, multiple handlers
CONFLuEnCE: Implementation and Application Design 7
8. Window Operator
? Apply flexible bounds on unbounded stream of events
? Size ¨C Token, Time, Wave, Semantics
? Step (period of recalculation) - Token, Time, Wave, Semantics
? Delete_used_events flag (after activity has finished executing)
? Triggers activities in combination with preconditions.
? Window definition
? Size=5min ? Activity preconditions
? Step=1min if (window.length >= 2)
? Delete_used_events=true fire activity
Out-of-stock
events 10
11
9
80
4
6
7
3
2
5
1 ?
BD
C
B
A Notify
D C B A
11 8 6 0 Manager
Fired: ?
?
Expired If 2 events occur between 5 min
A events of each other, then notify the
manager.
CONFLuEnCE: Implementation and Application Design 8
10. Push Communication
? Push communication patterns:
? Broadcast
? Publish/Subscribe
? In->Out ? Out->In
Port
WF WF
Producer Producer
input input
? Hybrid
Port
WF
Producer
input
Producer
Mediator
Producer
CONFLuEnCE: Implementation and Application Design 10
11. Overview
? Motivation
? Continuous Workflow Model
? CONFLuEnCE
? Kepler¡¯s Actor Oriented Modeling
? Continuous Workflow Director
? Windowed Operator
? Push Communication
? CWf Application Scenarios
? Conclusions
CONFLuEnCE: Implementation and Application Design 11
12. CONFLuEnCE: CONtinuous
workFLow ExeCution Engine
? Implements our Continuous Workflow model, in
Java, as a module in Kepler
? Kepler¡¯s benefits
? Open-source scientific workflow system
? Actor-based workflow modeling
? Built on top of PtolemyII
(modeling, simulating, designing concurrent, real-time
systems)
? Well defined models of computation ¨C extendible, pluggable
? Large number of basic and specialized actors (task
components)
? High-level visual language
CONFLuEnCE: Implementation and Application Design 12
13. Kepler¡¯s Actor Oriented Modeling
Ports
? each actor has a set of input and output ports
? produce/consume data (a.k.a. tokens)
CONFLuEnCE: Implementation and Application Design 13
14. Kepler¡¯s Actor Oriented Modeling
Dataflow Connections
? unidirectional actor ¡°communication¡± channels
? connect output ports with input ports
CONFLuEnCE: Implementation and Application Design 14
16. Kepler¡¯s Actor Oriented Modeling
PN Director
Directors
SDF Director
? defines the execution and communication
semantics of workflow graphs
? executes workflow graph (some schedule)
? sub-workflows may have different directors
? promotes reusability
CONFLuEnCE: Implementation and Application Design 16
17. Kepler Directors ¨C Models of
Computation
Directors separate the concerns of orchestration and
scheduling from conceptual design
? Synchronous Dataflow (SDF)
? Process Networks (PN)
? Dynamic Data Flow (DDF)
? Continuous Time (CT)
? Discrete Event (DE)
? ¡
CONFLuEnCE: Implementation and Application Design 17
18. Continuous Workflow Director
? CWfs require continuous execution of the actors
? Stream data are events in time. Require timestamps
? CWf director:
? Extends the PN director
? Add timestamps on events using TimeKeeper on each
actor.
? Add Window Operators on buffer queues (receivers)
CONFLuEnCE: Implementation and Application Design 18
19. Windowed Receiver
? Kepler extension to support window semantics
CWF Director
I/O Ports
Producer Consumer
windowed
receiver
CONFLuEnCE: Implementation and Application Design 19
20. Push Communication
? Implemented JSON WebSocket Server Actor (Out->In)
? Listens to predefined port
? Converts JSON objects to RecordToken(s)
? Enables continuous connectivity with web-browsers
? Implemented HTTP Socket Stream Source Actor (In->Out)
? Connects directly to an HTTP stream source (e.g., twitter) and
receives data continuously
? Implemented the hybrid approach using PubSubHubbub
[http://code.google.com/apis/pubsubhubbub/]
CONFLuEnCE: Implementation and Application Design 20
24. Astroshelf
? A collaboration platform for astrophysicists
? Annotate sky objects and events.
? CONFLuEnCE: Live annotations & Integration.
Astroshelf team:
? Liz Marai
? Timothy Luciani
? Rebecca Hachey
? Roxana Gheorghiu
? Boyu Sun
Astronomers:
? Arthur Kosowsky
? Jeffrey Newman
? Michael Wood-
Vasley
? Brian Cherinca
CONFLuEnCE: Implementation and Application Design ? Anja Weyant 24
25. Astroshelf
CONFLuEnCE: Implementation and Application Design 25
26. Conclusions
? The Continuous Workflow model
? Foundation for CONFLuEnCE
? CONtinuous workFLow ExeCution Engine
? Built on top of Kepler
? Includes a new director, windowed receiver and, source
actors enabling Push communication.
? Two Monitoring and Collaborative Application
implementations.
? Future: Design a director, which implements
scheduling, sensitive to QoS requirements.
CONFLuEnCE: Implementation and Application Design 26
27. Supported by NSF grants: IIS-0534531 and OIA-1028162
http://db.cs.pitt.edu/group/projects/confluence
http://db.cs.pitt.edu/group/projects/astroshelf
Special thanks to:
Astroshelf team: Astronomy
collaborators:
? Liz Marai,
? Arthur Kosowsky,
? Timothy Luciani, ? Jeffrey Newman,
? Rebecca Hachey, ? Michael Wood-
? Roxana Gheorghiu Vasley,
? Brian Cherinca,
? Anja Weyant.
CONFLuEnCE: Implementation and Application Design 27
28. Conclusions
? The Continuous Workflow model
? Foundation for CONFLuEnCE
? CONtinuous workFLow ExeCution Engine
? Built on top of Kepler
? Includes a new director, windowed receiver and, source
actors enabling Push communication.
? Two Monitoring and Collaborative Application
implementations.
? Future: Design a director, which implements
scheduling, sensitive to QoS requirements.
http://db.cs.pitt.edu/group/projects/confluence
http://db.cs.pitt.edu/group/projects/astroshelf
28
29. Workflows vs. DSMS vs. CWfs
DSMS CWfs WFs
Static
configuration Flexibility
QoS/QoD General purpose
driven
Declarative &
Stream
Procedural
processing
Human
integration
Declarative Feedback
Loops
CONFLuEnCE: Implementation and Application Design 29
Editor's Notes
#3: Support provenance: which is keeping track of processing and intermediate results
#4: Unfortunately current workflows cannot support the new data types. What I¡¯m referring to is data produced by stock price updates, sensors etcand also the current workflow model cannot integrate DSMSIf we rely on the existing model of polling data sources, we are missing updates and have blocking operations which are insufficient for this type of computation.What we envision is a workflow model where streaming sources feed the workflow with data in various input points.
#5: Our approach to supporting data stream processing for enablingmonitoring and collaborative applications is [click] CONFLUENCEInterestingly this project first introduced in collaborate com in 2008 where I unveiled our model, right here in Orlando.Since then we have implemented confluence by developing new workflow constructs, a new model of computation and have demoed our prototype in SIGMOD 2011.-- And in fact in this presentation I will share with you my experience in implementing this prototype.
#6: Following, I will describe 3 key constructs of the continuous workflow modelSome implementation detailsAnd present two representative apps from the business and scientific domains
#10: Additionally a group-by clause can be defined on a windowed queue. Besides support of the standard group-by on simple data types we also support complexIn this case you need to specify what needs to be used as grouping element.
#11: Finally a key requirement in enabling stream processing is the ability for workflows to receive push updates.We enable it using three techniques.Firstly from inside the workflow going out.
#12: And now I¡¯ll present details of how the key continuous workflow constructs were implemented.
#13: In summary, confluence is built in Java on top of Kepler.We chose Kepler because of openness and wide deployment.It¡¯s based upon PtelemyIIAnd it provides a wide range of basic as well as specialized actors.And workflows are composed using a high level visual language.
#14: Each task in Kepler is modeled as an actor, defined with input and output ports to consume and produce data tokens.Effectively this is the visual interface.
#15: Actors are interconnected in the workflow using channels.
#16: And some complex actors can be composed by other actors, and are classified as sub-workflows forming hierarchies.
#17: The execution and communication semantics are facilitated by an entity called the director.It executes the workflow using a scheduleand since directors define the communication and execution semantics, an actor configuration can be reused with a different director thus exhibiting different behavior.
#18: Kepler provides a variety of directors but none of them meets the requirements of the continuous workflow model.
#19: ¡so we have developed our own director which enables the continuous execution of actors by running each one in a separate thread (much like the PN director), --adds timestamps to events-- and adds the window operator to the input queues of actors as a new type of receiver.
#20: The receivers are objects contained inside the input ports of any actor.We have also made modifications to the port configuration dialogue box to define the size and step of windows, delete_used_token, as well as the the group-by flag and expression.
#21: Supporting push communications meant implementing new workflow data source actors:JSON:Javascript object notation
#22: Two representative applications from the business and science domains
#24: Here is the Kepler interface which defines the workflow.Only implemented the source actors. Everything else is off-the-shelf actors.
#25: Our second example is from the scientific domain. It¡¯s development was driven by real astronomer¡¯s.
#26: Astroshelf is the source of events and the feedback panel. Annotations engine records the meta-data.Confluence completes the feedback loop by processing the events and producing new meta-data.At this point our collaborators are considering to test this prototype in a classroom setting.Details of these interactions are described in the paper.
#27: -- turns our Kepler has proven to be a good choice, that allowed us to realize our model within a reasonable time and without unnecessary complexity.--that allowed us to show the usability of our model
#31: TODO: add pictures and animateB2B enables dynamic interactions by interpolating internal and external applicationsEstablish Virtual EnterprisesMiddleware infrastructure: ¡°the Grid¡±Seamlessly bring together the power of resources to the desktop.