Slideshows by User: ilganeli / Thu, 06 Apr 2017 18:21:30 GMT / SlideShare feed for Slideshows by User: ilganeli
Compression talk /slideshow/compression-talk/74572468 compressiontalk-170406182130
In the engineering world, we don’t always have the luxury of owning our data pipelines end to end. If only we could influence those outside components… Well, we tried, and this is our story, replete with failure, discovery, and the serenity of enlightenment. Join us on our journey as we learned more than we ever wanted to know about compression in different Apache projects, deployed our own ingestion pipeline in Apache Flume, and ultimately unified these in a robust framework built on Apache Apex handling 1 TB of data per day. We end with some reflections on the joys and tribulations of the open source realm and some key lessons for other large applications built atop multiple Apache solutions.
Thu, 06 Apr 2017 18:21:30 GMT /slideshow/compression-talk/74572468 ilganeli@slideshare.net(ilganeli)
Compression talk from Ilya Ganelin
Your Guide to Streaming - The Engineer's Perspective /slideshow/your-guide-to-streaming-the-engineers-perspective/69039027 dbtbstreamingilyaganelin-161116021042
It feels like every week there's a new open-source streaming platform out there. Yet, if you only look at the descriptions, performance metrics, or even the architecture, they all start to look exactly the same! In short, nothing really stands out, whether it be Storm, Flink, Apex, Gearpump, Samza, Kafka Streams, Akka Streams, or any of the other myriad technologies. So if they all look the same, how do you really pick a streaming platform to solve the problem that YOU have? This talk is about how to really compare these platforms. It turns out that they do have key differences; they're just not the ones you usually think about. If you're building something to last, a well-engineered system, you need to compare these platforms on how they handle durability and availability, how easy they are to install and use, and how they deal with failures.
Wed, 16 Nov 2016 02:10:42 GMT /slideshow/your-guide-to-streaming-the-engineers-perspective/69039027 ilganeli@slideshare.net(ilganeli)
Your Guide to Streaming - The Engineer's Perspective from Ilya Ganelin
How to Actually Tune Your Spark Jobs So They Work /slideshow/how-to-actually-tune-your-spark-jobs-so-they-work/62873108 sparktuning-160608221356
Tuning your Spark applications for production readiness!
Wed, 08 Jun 2016 22:13:55 GMT /slideshow/how-to-actually-tune-your-spark-jobs-so-they-work/62873108 ilganeli@slideshare.net(ilganeli)
How to Actually Tune Your Spark Jobs So They Work from Ilya Ganelin
Stream Computing (The Engineer's Perspective) /slideshow/stream-computing-the-engineers-perspective/61586813 streamingucsflecture1-160502164515
This is a ground-up introduction to stream processing. The focus is on what differentiates stream processing platforms: this turns out not to be performance, but how they solve the challenges of scalability, availability, durability, and failure handling. We look at Storm, Flink, and Apex as case studies to understand the space.
Mon, 02 May 2016 16:45:14 GMT /slideshow/stream-computing-the-engineers-perspective/61586813 ilganeli@slideshare.net(ilganeli)
Stream Computing (The Engineer's Perspective) from Ilya Ganelin
Frustration-Reduced PySpark: Data engineering with DataFrames /slideshow/frustrationreduced-pyspark-data-engineering-with-dataframes/58943986 sparklabusfdataframestuningmarch1-160302031113
In this talk I discuss my recent experience working with Spark DataFrames in Python. For DataFrames, the focus will be on usability. Specifically, a lot of the documentation does not cover common use cases such as the intricacies of creating DataFrames, adding or manipulating individual columns, and doing quick and dirty analytics.
Wed, 02 Mar 2016 03:11:13 GMT /slideshow/frustrationreduced-pyspark-data-engineering-with-dataframes/58943986 ilganeli@slideshare.net(ilganeli)
Frustration-Reduced PySpark: Data engineering with DataFrames from Ilya Ganelin
Frustration-Reduced Spark: DataFrames and the Spark Time-Series Library /slideshow/frustrationreduced-spark-dataframes-and-the-spark-timeseries-library/57936244 lunchandlearnfeb1dataframestimeseries-160205195222
In this talk I discuss my recent experience working with Spark DataFrames and the Spark TimeSeries library. For DataFrames, the focus will be on usability. Specifically, a lot of the documentation does not cover common use cases such as the intricacies of creating DataFrames, adding or manipulating individual columns, and doing quick and dirty analytics. For the time series library, I dive into the kinds of use cases it supports and why it’s actually super useful.
Fri, 05 Feb 2016 19:52:21 GMT /slideshow/frustrationreduced-spark-dataframes-and-the-spark-timeseries-library/57936244 ilganeli@slideshare.net(ilganeli)
Frustration-Reduced Spark: DataFrames and the Spark Time-Series Library from Ilya Ganelin
Next-Gen Decision Making in Under 2ms /slideshow/nextgen-decision-making-in-under-2ms/54980772 apextalkv4externaledit-151111040701-lva1-app6891
This talk covers the Vault 8 team's journey at Capital One, where we investigated a wide variety of stream processing solutions to build a next-generation real-time decisioning platform to power Capital One's infrastructure. Our analysis showed Apache Storm, Apache Flink, and Apache Apex to be the prime contenders for our use case, with Apache Apex ultimately proving to be the solution of choice based on its present readiness for enterprise deployment and its excellent performance.
Wed, 11 Nov 2015 04:07:01 GMT /slideshow/nextgen-decision-making-in-under-2ms/54980772 ilganeli@slideshare.net(ilganeli)
Next-Gen Decision Making in Under 2ms from Ilya Ganelin