Slideshows by User: AnyaBida / Tue, 21 Jun 2022 17:03:28 GMT
SlideShare feed for Slideshows by User: AnyaBida
Run Apache Spark on Kubernetes in Large Scale_ Challenges and Solutions-2.pdf /slideshow/run-apache-spark-on-kubernetes-in-large-scale-challenges-and-solutions2pdf/252031752 runapachesparkonkubernetesinlargescalechallengesandsolutions-2-220621170328-fd08b4a3
Speaker: Bo Yang
Summary: More and more people are running Apache Spark on Kubernetes due to the popularity of Kubernetes. Because Spark was not originally designed for Kubernetes, this raises many challenges, for example easily submitting and managing applications, accessing the Spark UI, and allocating resource queues based on CPU and memory. This talk presents how to address these challenges and provide Spark as a service at large scale.
Tue, 21 Jun 2022 17:03:28 GMT AnyaBida@slideshare.net(AnyaBida)
Run Apache Spark on Kubernetes in Large Scale_ Challenges and Solutions-2.pdf from Anya Bida
297 0 https://cdn.slidesharecdn.com/ss_thumbnails/runapachesparkonkubernetesinlargescalechallengesandsolutions-2-220621170328-fd08b4a3-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Just Enough DevOps for Data Scientists Part II: Handling Infra Failures When Running Spark Handling Infra Failures When Running Spark /slideshow/just-enough-devops-for-data-scientists-part-ii-handling-infra-failures-when-running-spark-handling-infra-failures-when-running-spark/92388717 wit-justenoughdevopsfordatascientists-180330072852
Abstract: Imagine we have Ada, our data science intern. Let's run through a very simple wordcount Spark job and find a handful of potential failure points. Dozens of failures can and should happen when running Spark jobs on commodity hardware. Given the basic foundation for infrastructure-level expectations, this talk gives Ada tools to ensure her job isn't caught dead. Once the simple example job runs reliably, with the potential to scale, our data scientist can apply the same toolset to focus on some more interesting algorithms. Turn SNAFUs into successes by anticipating and handling infra failures gracefully.
Note: this talk is a Spark-focused extension of Part I, "Just Enough DevOps For Data Scientists" from Scale by The Bay 2018 https://www.youtube.com/watch?v=RqpnBl5NgW0&t=19s
Bio: Anya Bida (https://www.linkedin.com/in/anyabida/)
Fri, 30 Mar 2018 07:28:51 GMT AnyaBida@slideshare.net(AnyaBida)
Just Enough DevOps for Data Scientists Part II: Handling Infra Failures When Running Spark Handling Infra Failures When Running Spark from Anya Bida
408 2 https://cdn.slidesharecdn.com/ss_thumbnails/wit-justenoughdevopsfordatascientists-180330072852-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
JustEnoughDevOpsForDataScientists /slideshow/justenoughdevopsfordatascientists/82293013 justenoughdevopsfordatascientists-171118235701
Let's say you're a data scientist, and you've been asked to build infrastructure. Here I've distilled some best practices as an introduction for people who are new to DevOps.
Sat, 18 Nov 2017 23:57:01 GMT AnyaBida@slideshare.net(AnyaBida)
JustEnoughDevOpsForDataScientists from Anya Bida
461 2 https://cdn.slidesharecdn.com/ss_thumbnails/justenoughdevopsfordatascientists-171118235701-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Spark tuning2016may11bida /slideshow/spark-tuning2016may11bida/61932233 sparktuning2016may11bida-160512051553
This is a version of a talk I presented at Spark Summit East 2016 with Rachel Warren. In this version, I also discuss memory management on the JVM with pictures from Alexey Grishchenko, Sandy Ryza, and Mark Grover.
Thu, 12 May 2016 05:15:53 GMT AnyaBida@slideshare.net(AnyaBida)
Spark tuning2016may11bida from Anya Bida
594 6 https://cdn.slidesharecdn.com/ss_thumbnails/sparktuning2016may11bida-160512051553-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Spark Tuning for Enterprise System Administrators /slideshow/spark-tuning-for-enterprise-system-administrators/61652313 sparktuningsfmay032016-160504052244
Video: https://www.youtube.com/watch?v=DNWaMR8uKDc&feature=youtu.be
Wed, 04 May 2016 05:22:44 GMT AnyaBida@slideshare.net(AnyaBida)
Spark Tuning for Enterprise System Administrators from Anya Bida
467 5 https://cdn.slidesharecdn.com/ss_thumbnails/sparktuningsfmay032016-160504052244-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Spark Tuning For Enterprise System Administrators, Spark Summit East 2016 /slideshow/bida-sse2016final-58237248/58237248 bidasse2016final-160214054407
by Anya Bida and Rachel Warren from Alpine Data
https://spark-summit.org/east-2016/events/spark-tuning-for-enterprise-system-administrators/

Spark offers the promise of speed, but many enterprises are reluctant to make the leap from Hadoop to Spark. Indeed, System Administrators will face many challenges with tuning Spark performance. This talk is a gentle introduction to Spark Tuning for the Enterprise System Administrator, based on experience assisting two enterprise companies running Spark in yarn-cluster mode. The initial challenges can be categorized in two FAQs. First, with so many Spark Tuning parameters, how do I know which parameters are important for which jobs? Second, once I know which Spark Tuning parameters I need, how do I enforce them for the various users submitting various jobs to my cluster? This introduction to Spark Tuning will enable enterprise system administrators to overcome common issues quickly and focus on more advanced Spark Tuning challenges. The audience will understand the cheat-sheet posted here: http://techsuppdiva.github.io/

Key takeaways:
FAQ 1: With so many Spark Tuning parameters, how do I know which parameters are important for which jobs?
Solution 1: The Spark Tuning cheat-sheet! A visualization that guides the System Administrator to quickly overcome the most common hurdles to algorithm deployment. [1] http://techsuppdiva.github.io/
FAQ 2: Once I know which Spark Tuning parameters I need, how do I enforce them at the user level? Job level? Algorithm level? Project level? Cluster level?
Solution 2: We'll approach these challenges using job and cluster configuration, the Spark context, and third-party tools, of which Alpine will be one example. We'll operationalize Spark parameters according to user, job, algorithm, workflow pipeline, or cluster levels.
Sun, 14 Feb 2016 05:44:07 GMT AnyaBida@slideshare.net(AnyaBida)
Spark Tuning For Enterprise System Administrators, Spark Summit East 2016 from Anya Bida
4371 9 https://cdn.slidesharecdn.com/ss_thumbnails/bidasse2016final-160214054407-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0