Slideshows by User: jssm1th / Thu, 06 Oct 2016 22:33:29 GMT SlideShare feed for Slideshows by User: jssm1th
ReStream: Accelerating Backtesting and Stream Replay with Serial-Equivalent Parallel Processing /slideshow/restream-accelerating-backtesting-and-stream-replay-with-serialequivalent-parallel-processing/66828702 soccslideshare-161006223330
Real-time predictive applications can demand continuous and agile development, with new models constantly being trained, tested, and then deployed. Training and testing are done by replaying stored event logs, running new models in the context of historical data in a form of backtesting or "what if?" analysis. To replay weeks or months of logs while developers wait, we need systems that can stream event logs through prediction logic many times faster than the real-time rate. A challenge with high-speed replay is preserving sequential semantics while harnessing parallel processing power. The crux of the problem lies with causal dependencies inherent in the sequential semantics of log replay. We introduce an execution engine that produces serial-equivalent output while accelerating throughput with pipelining and distributed parallelism. This is made possible by optimizing for high throughput rather than the traditional stream processing goal of low latency, and by aggressive sharing of versioned state, a technique we term Multi-Versioned Parallel Streaming (MVPS). In experiments we see that this engine, which we call ReStream, performs as well as batch processing and more than an order of magnitude better than a single-threaded implementation.

Thu, 06 Oct 2016 22:33:29 GMT /slideshow/restream-accelerating-backtesting-and-stream-replay-with-serialequivalent-parallel-processing/66828702 jssm1th@slideshare.net(jssm1th)
ReStream: Accelerating Backtesting and Stream Replay with Serial-Equivalent Parallel Processing from Johann Schleier-Smith
An Architecture for Agile Machine Learning in Real-Time Applications /slideshow/an-architecture-for-agile-machine-learning-in-realtime-applications/51480596 jsskdd2015final-150811011408-lva1-app6892
Presented at KDD, August 11, 2015. Abstract of the paper: Machine learning techniques have proved effective in recommender systems and other applications, yet teams working to deploy them lack many of the advantages that those in more established software disciplines today take for granted. The well-known Agile methodology advances projects in a chain of rapid development cycles, with subsequent steps often informed by production experiments. Support for such workflows in machine learning applications remains primitive. The platform developed at if(we) embodies a specific machine learning approach and a rigorous data architecture constraint, thereby allowing teams to work in rapid iterative cycles. We require models to consume data from a time-ordered event history, and we focus on facilitating creative feature engineering. We make it practical for data scientists to use the same model code in development and in production deployment, and make it practical for them to collaborate on complex models. We deliver real-time recommendations at scale, returning top results from among 10,000,000 candidates with sub-second response times and incorporating new updates in just a few seconds. Using the approach and architecture described here, our team can routinely go from ideas for new models to production-validated results within two weeks.

Tue, 11 Aug 2015 01:14:08 GMT /slideshow/an-architecture-for-agile-machine-learning-in-realtime-applications/51480596 jssm1th@slideshare.net(jssm1th)
An Architecture for Agile Machine Learning in Real-Time Applications from Johann Schleier-Smith
Architecting for Data Science /slideshow/architecting-for-data-science/46152444 ulrqosqoq9uli8jicwdx-signature-4ef123c01ebc69e3274a22a4f9e401d08d72f9267313c50fa9eef83192134a22-poli-150322223814-conversion-gate01
Presented at the O'Reilly Software Architecture Conference, Boston, March 19, 2015.

Sun, 22 Mar 2015 22:38:14 GMT /slideshow/architecting-for-data-science/46152444 jssm1th@slideshare.net(jssm1th)
Architecting for Data Science from Johann Schleier-Smith
Agile Machine Learning for Real-time Recommender Systems /slideshow/jssmith-mlconf-2014finalweb/41664871 nkboosy4q2slllpn26jw-signature-ccd5d6154d225c18d0aa95e0bd65f88e230de47867ef38b28a0dff304f58d1e7-poli-141117122051-conversion-gate01
These are slides presented at MLconf in San Francisco, November 14, 2014. I share the approach to real-time machine learning for recommender systems developed at if(we). We achieve rapid iterative cycles by adhering to a strict approach to structuring and accessing our data, as well as to building the online features that comprise our models. These developments support teams of data scientists and data engineers, who work together to solve complex recommendation problems. We also introduce the Antelope Realtime Events framework, an open-source demonstration application that derives from our scalable proprietary software stack.

Mon, 17 Nov 2014 12:20:51 GMT /slideshow/jssmith-mlconf-2014finalweb/41664871 jssm1th@slideshare.net(jssm1th)
Agile Machine Learning for Real-time Recommender Systems from Johann Schleier-Smith
https://johann.schleier-smith.com/