Slideshows by User: jcrobak (Fri, 06 Sep 2013 08:25:43 GMT)

Workflow Engines for Hadoop
Building a reliable pipeline of data ingress, batch computation, and data egress with Hadoop can be a major challenge. Most folks start out with cron to manage workflows, but soon discover that it doesn't scale past a handful of jobs. There are a number of open-source workflow engines with support for Hadoop, including Azkaban (from LinkedIn), Luigi (from Spotify), and Apache Oozie. Having deployed all three of these systems in production, Joe will talk about what features and qualities are important for a workflow system.
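To illustrate the core idea these engines share (the job names here are hypothetical, not from the talk): unlike cron, which fires each job on a fixed clock schedule, a workflow engine models jobs as a dependency graph and runs them in dependency order. A minimal sketch of that idea using Python's standard-library `graphlib`:

```python
from graphlib import TopologicalSorter

# Hypothetical ingress -> batch computation -> egress pipeline.
# Each job maps to the set of jobs it depends on, the way a
# workflow engine (Azkaban, Luigi, Oozie) would model it.
pipeline = {
    "ingest_logs": set(),           # data ingress
    "aggregate": {"ingest_logs"},   # batch computation
    "export_report": {"aggregate"}, # data egress
}

def run_pipeline(jobs):
    """Run jobs in dependency order -- the guarantee cron lacks."""
    order = list(TopologicalSorter(jobs).static_order())
    for job in order:
        print(f"running {job}")
    return order

run_pipeline(pipeline)
```

Real engines add much more on top of this ordering (retries, backfills, monitoring, distributed execution), which is where the three systems above differ.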

Workflow Engines for Hadoop, by Joe Crobak
I’m passionate about creating scalable and efficient software for the toughest challenges in distributed systems. I enjoy participating in all phases of software engineering, especially system evaluation, architecture design, and deployment. I strive to automate everything and to be involved with all parts of systems, including production support.

Specialties:
• The Hadoop stack: HDFS, MapReduce, Hive, Luigi, Kafka, ZooKeeper
• High-performance database and distributed caching systems: Memcached, MySQL, HBase
• Java and Scala
• Linux system administration, deployment, and automation using Puppet and Chef

http://www.crobak.org