Slideshows by User: huguk / Sun, 18 Sep 2016 09:00:12 GMT
SlideShare feed for Slideshows by User: huguk
Data Wrangling on Hadoop - Olivier De Garrigues, Trifacta /slideshow/data-wrangling-on-hadoop-olivier-de-garrigues-trifacta/66135246 trifactapresentationhuguk-160918090012
As Hadoop became mainstream, the need to simplify and speed up analytics processes grew rapidly. Data wrangling emerged as a necessary step in any analytical pipeline, and is often considered to be its crux, taking as much as 80% of an analyst's time. In this presentation we will discuss how data wrangling solutions can be leveraged to streamline, strengthen and improve data analytics initiatives on Hadoop, including use cases from Trifacta customers. Bio: Olivier is EMEA Solutions Lead at Trifacta. He has 7 years' experience in analytics, with prior roles as technical lead for business analytics at Splunk and quantitative analyst at Accenture and Aon.

Sun, 18 Sep 2016 09:00:12 GMT huguk@slideshare.net(huguk)
1305 4 https://cdn.slidesharecdn.com/ss_thumbnails/trifactapresentationhuguk-160918090012-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
ether.camp - Hackathon & ether.camp intro /slideshow/ethercamp-hackathon-ethercamp-intro/65909196 hackethercamp-5mins-stevemb2-160911163940
Stephen Taylor is the community manager for Ether Camp. They provide an analysis tool for the Ethereum blockchain, Block Explorer, and an Integrated Development Environment (IDE) that empowers developers to build, test and deploy applications in a sandbox environment. This November they are launching their second annual hackathon, hack.ether.camp, which aims to deliver a more sustained approach to the hackathon ideology by utilising blockchain technology.

Sun, 11 Sep 2016 16:39:39 GMT huguk@slideshare.net(huguk)
510 1 https://cdn.slidesharecdn.com/ss_thumbnails/hackethercamp-5mins-stevemb2-160911163940-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop /slideshow/google-cloud-dataproc-easier-faster-more-costeffective-spark-and-hadoop-65909195/65909195 googleclouddataproc-lonhug-160911163939
At Google Cloud Platform, we're combining the Apache Spark and Hadoop ecosystem with our software and hardware innovations. We want to make these awesome tools easier, faster, and more cost-effective, from 3 to 30,000 cores. This presentation will showcase how Google Cloud Platform is innovating with the goal of bringing the Hadoop ecosystem to everyone. Bio: "I love data because it surrounds us - everything is data. I also love open source software, because it shows what is possible when people come together to solve common problems with technology. While they are awesome on their own, I am passionate about combining the power of open source software with the potential unlimited uses of data. That's why I joined Google. I am a product manager for Google Cloud Platform and manage Cloud Dataproc and Apache Beam (incubating). I've previously spent time hanging out at Disney and Amazon. Beyond Google, love data, amateur radio, Disneyland, photography, running and Legos."

Sun, 11 Sep 2016 16:39:39 GMT huguk@slideshare.net(huguk)
6571 12 https://cdn.slidesharecdn.com/ss_thumbnails/googleclouddataproc-lonhug-160911163939-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox, digital.Arup /slideshow/using-big-data-techniques-to-query-and-store-openstreetmap-data-stephen-knox-digitalarup/59162324 usingbigdatatechniqueswithopenstreetmaplondonhadoop-160306192306
This talk will describe Stephen Knox's research into using Hadoop to query and manage big geographic datasets, specifically OpenStreetMap (OSM). OSM is an open-source map of the world that is growing rapidly and currently holds around 5 TB of data. The talk will introduce OSM, detail some aspects of the research, and discuss his experiences with using the SpatialHadoop stack on Azure and Google Cloud.

Sun, 06 Mar 2016 19:23:06 GMT huguk@slideshare.net(huguk)
2407 9 https://cdn.slidesharecdn.com/ss_thumbnails/usingbigdatatechniqueswithopenstreetmaplondonhadoop-160306192306-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Extracting maximum value from data while protecting consumer privacy. Jason McFall, Privitar /huguk/extracting-maximum-value-from-data-while-protecting-consumer-privacy-jason-mcfall-privitar jasonmcfallhugfeb2016-160306192052
Big organisations have a wealth of rich customer data, which opens up huge new opportunities. However, they face the challenge of extracting value from this data while protecting the privacy of their individual customers. Jason McFall will talk about the risks organisations face and what they should do about them. He will survey the techniques which can be used to make data safe for analysis, and talk briefly about how they are solving this problem at Privitar.

Sun, 06 Mar 2016 19:20:52 GMT huguk@slideshare.net(huguk)
668 6 https://cdn.slidesharecdn.com/ss_thumbnails/jasonmcfallhugfeb2016-160306192052-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watson /slideshow/intelligence-augmented-vs-artificial-intelligence-alex-flamant-ibm-watson/59162047 hadoopmeetup-160306191522
IBM is developing the Watson Ecosystem to leverage its Developer Cloud, APIs, Content Store and Talent Hub. This is part of IBM's recent announcement of the $1B investment in Watson as a new business unit including Silicon Alley NYC headquarters. For the first time, IBM will open up Watson as a development platform in the Cloud to spur innovation and fuel a new ecosystem of entrepreneurial software app providers who will bring forward a new generation of applications infused with Watson's cognitive computing intelligence.

Sun, 06 Mar 2016 19:15:22 GMT huguk@slideshare.net(huguk)
698 9 https://cdn.slidesharecdn.com/ss_thumbnails/hadoopmeetup-160306191522-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Streaming Dataflow with Apache Flink /slideshow/streaming-dataflow-with-apache-flink/54634625 hug-london-151102085759-lva1-app6892
In this talk about Apache Flink we will touch on three main things: an introductory look at Flink, a look under the hood, and a demo.
* In the introduction we will briefly look at the history of Flink and then go on to the API and different use cases. Here we will also see how it can be deployed in practice and what some of the pitfalls in a cluster setting can be.
* In the second section we will look at the streaming execution engine that lies at the heart of Flink. Here we will see what makes it tick and also what distinguishes it from other approaches, such as the mini-batch execution model.
* In the final section we will see a live demo of a fault-tolerant streaming job that performs analysis of the Wikipedia edit stream.
Ufuk Celebi - PMC member at Apache Flink and co-founder and software engineer at data Artisans

Mon, 02 Nov 2015 08:57:59 GMT huguk@slideshare.net(huguk)
1158 4 https://cdn.slidesharecdn.com/ss_thumbnails/hug-london-151102085759-lva1-app6892-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Lambda architecture on Spark, Kafka for real-time large scale ML /slideshow/lambda-architecture-on-spark-kafka-for-realtime-large-scale-ml/54634624 oryx2-151102085756-lva1-app6892
Sean Owen, Director of Data Science @Cloudera
Building machine learning models is all well and good, but how do they get productionized into a service? It's a long way from a Python script on a laptop to a fault-tolerant system that learns continuously, serves thousands of queries per second, and scales to terabytes. The confederation of open source technologies we know as Hadoop now offers data scientists the raw materials from which to assemble an answer: the means to build models but also ingest data and serve queries, at scale. This short talk will introduce Oryx 2, a blueprint for building this type of service on Hadoop technologies. It will survey the problem and the standard technologies and ideas that Oryx 2 combines: Apache Spark, Kafka, HDFS, the lambda architecture, PMML, REST APIs. The talk will touch on a key use case for this architecture -- recommendation engines.

Mon, 02 Nov 2015 08:57:56 GMT huguk@slideshare.net(huguk)
2697 5 https://cdn.slidesharecdn.com/ss_thumbnails/oryx2-151102085756-lva1-app6892-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Todays reality Hadoop with Spark- How to select the best Data Science approach when using Big Data Platforms and Technologies? /huguk/todays-reality-hadoop-with-spark-how-to-select-the-best-data-science-approach-when-using-big-data-platforms-and-technologies hugthinkbig-151102085347-lva1-app6891
Martin Oberhuber and Eliano Marques, Senior Data Scientists @Think Big International
In this talk, Think Big International Lead Data Scientists will discuss the options that exist today for engineering and data science teams aiming to use big data patterns to solve new business problems. With the enterprise adoption of the Hadoop ecosystem and the emerging momentum of open source projects like Spark, it is becoming mandatory to have an approach that solves for business results but remains flexible enough to adapt and change with the open source market.

Mon, 02 Nov 2015 08:53:46 GMT huguk@slideshare.net(huguk)
767 4 https://cdn.slidesharecdn.com/ss_thumbnails/hugthinkbig-151102085347-lva1-app6891-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Jonathon Southam: Venture Capital, Funding & Pitching /slideshow/jonathon-southam-venture-capital-funding-pitching/52835605 vcfundingpitchingjonathonsoutham-150916074037-lva1-app6891
Keynote about going from an idea to a startup, pitching to a VC, and getting funding.
Wed, 16 Sep 2015 07:40:37 GMT /slideshow/jonathon-southam-venture-capital-funding-pitching/52835605 huguk@slideshare.net(huguk)
1184 7 https://cdn.slidesharecdn.com/ss_thumbnails/vcfundingpitchingjonathonsoutham-150916074037-lva1-app6891-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Signal Media: Real-Time Media & News Monitoring /slideshow/signal-media-realtime-media-news-monitoring/52835601 signalmediawesleyhall-150916074036-lva1-app6892
Startup pitch presented by CTO Wesley Hall. Signal Media is a real-time media and news monitoring platform that tracks media outlets. News items are analysed for brand & media monitoring as well as market intelligence.
Wed, 16 Sep 2015 07:40:36 GMT /slideshow/signal-media-realtime-media-news-monitoring/52835601 huguk@slideshare.net(huguk)
657 4 https://cdn.slidesharecdn.com/ss_thumbnails/signalmediawesleyhall-150916074036-lva1-app6892-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Dean Bryen: Scaling The Platform For Your Startup /slideshow/dean-bryen-scaling-the-platform-for-your-startup/52835600 scalingplatformforyourstartupdeanbryen-150916074036-lva1-app6892
Keynote about scaling your startup on the AWS platform, presented by Dean Bryen, AWS Solutions Architect.
Wed, 16 Sep 2015 07:40:36 GMT /slideshow/dean-bryen-scaling-the-platform-for-your-startup/52835600 huguk@slideshare.net(huguk)
544 4 https://cdn.slidesharecdn.com/ss_thumbnails/scalingplatformforyourstartupdeanbryen-150916074036-lva1-app6892-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Peter Karney: Intro to the Digital catapult /huguk/peter-karney-intro-to-the-digital-catapult digitalcatapultpeterkarney-150916074035-lva1-app6891
Introduction to the Digital Catapult, presented by Peter Karney, head of engineering.
Wed, 16 Sep 2015 07:40:35 GMT /huguk/peter-karney-intro-to-the-digital-catapult huguk@slideshare.net(huguk)
1037 6 https://cdn.slidesharecdn.com/ss_thumbnails/digitalcatapultpeterkarney-150916074035-lva1-app6891-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Cytora: Real-Time Political Risk Analysis /slideshow/cytora-realtime-political-risk-analysis/52835597 cytoraaeneaswiener-150916074032-lva1-app6892
Startup pitch presented by Aeneas Wiener. Cytora is a real-time geopolitical risk analysis platform that extracts events from open-source intelligence and evaluates their geopolitical impact.
Wed, 16 Sep 2015 07:40:32 GMT /slideshow/cytora-realtime-political-risk-analysis/52835597 huguk@slideshare.net(huguk)
1313 7 https://cdn.slidesharecdn.com/ss_thumbnails/cytoraaeneaswiener-150916074032-lva1-app6892-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Cubitic: Predictive Analytics /slideshow/cubitic-predictive-analytics/52835596 cubiticjacoels-150916074032-lva1-app6892
Startup pitch presented by co-founder and CEO Jaco Els. Cubitic offers a predictive analytics platform that allows developers to build custom solutions for analytics and visualisation on top of a machine learning engine.
Wed, 16 Sep 2015 07:40:32 GMT /slideshow/cubitic-predictive-analytics/52835596 huguk@slideshare.net(huguk)
611 4 https://cdn.slidesharecdn.com/ss_thumbnails/cubiticjacoels-150916074032-lva1-app6892-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Bird.i: Earth Observation Data Made Social /slideshow/birdi-earth-observation-data-made-social/52835594 bird-150916074031-lva1-app6891
Startup pitch presented by co-founder and CEO Corentin Guillo. Bird.i is building a platform for up-to-date earth observation data that will bring satellite imagery to the mass market. Providing fresh imagery together with analytics for forecasting localised demand opens up innovative opportunities in sectors like construction, tourism, real estate and remote facility monitoring.
Wed, 16 Sep 2015 07:40:30 GMT /slideshow/birdi-earth-observation-data-made-social/52835594 huguk@slideshare.net(huguk)
1013 4 https://cdn.slidesharecdn.com/ss_thumbnails/bird-150916074031-lva1-app6891-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Aiseedo: Real Time Machine Intelligence /slideshow/aiseedo-real-time-machine-intelligence/52835580 aiseedonicgrenwaylaureandrieux-150916074011-lva1-app6892
Startup pitch presented by co-founders Laure Andrieux and Nic Greenway. Aiseedo applies real-time machine learning, where the model of the world is constantly updated, to build adaptive systems which can be applied to robotics, the Internet of Things and healthcare.
Wed, 16 Sep 2015 07:40:11 GMT /slideshow/aiseedo-real-time-machine-intelligence/52835580 huguk@slideshare.net(huguk)
660 5 https://cdn.slidesharecdn.com/ss_thumbnails/aiseedonicgrenwaylaureandrieux-150916074011-lva1-app6892-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Secrets of Spark's success - Deenar Toraskar, Think Reactive /huguk/why-spark-50456579 whyspark-150713075423-lva1-app6891
This talk will cover the design and implementation decisions that have been key to the success of Apache Spark over competing cluster computing frameworks. It will delve into the white paper behind Spark and cover the design of Spark RDDs, the abstraction that enables the Spark execution engine to be extended to support a wide variety of use cases: Spark SQL, Spark Streaming, MLlib and GraphX. RDDs allow Spark to outperform existing models by up to 100x in multi-pass analytics.
Mon, 13 Jul 2015 07:54:23 GMT /huguk/why-spark-50456579 huguk@slideshare.net(huguk)
834 6 https://cdn.slidesharecdn.com/ss_thumbnails/whyspark-150713075423-lva1-app6891-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewalski & Cyril Papadacci, King /slideshow/150604-hadoop-and-tv-marketing/50456526 150604-hadoopandtvmarketing-150713075239-lva1-app6892
Technical developments in data warehousing have allowed companies to push their analysis a step further and, in turn, allowed data scientists to deliver more value to business areas. In this session, we will focus on the case of performance marketing at King and demonstrate how we use Hadoop capabilities to exploit user-level data efficiently. This approach yields a more holistic view in return-on-investment analysis of TV advertising.
Mon, 13 Jul 2015 07:52:39 GMT /slideshow/150604-hadoop-and-tv-marketing/50456526 huguk@slideshare.net(huguk)
856 4 https://cdn.slidesharecdn.com/ss_thumbnails/150604-hadoopandtvmarketing-150713075239-lva1-app6892-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Hadoop - Looking to the Future By Arun Murthy /slideshow/london-hug-20150413/47982802 londonhug20150413-150511065608-lva1-app6891
Hadoop - Looking to the Future By Arun Murthy (Founder of Hortonworks, Creator of YARN). The Apache Hadoop ecosystem began as just HDFS & MapReduce nearly 10 years ago, in 2006. Very much like the Ship of Theseus (http://en.wikipedia.org/wiki/Ship_of_Theseus), Hadoop has undergone an incredible amount of transformation, from multi-purpose YARN to interactive SQL with Hive/Tez to machine learning with Spark. Much more lies ahead: whether you want sub-second SQL with Hive, effective use of SSDs and memory in HDFS, or metadata-driven security policies in Ranger, the Hadoop ecosystem in the Apache Software Foundation continues to evolve to meet new challenges and use cases. Arun C Murthy has been involved with Apache Hadoop since the beginning of the project, nearly 10 years now. In the beginning he led MapReduce, went on to create YARN and then drove Tez & the Stinger effort to get to interactive & sub-second Hive. Recently he has been very involved in the Metadata and Governance efforts. In between he founded Hortonworks, the first public Hadoop distribution company.
Mon, 11 May 2015 06:56:08 GMT /slideshow/london-hug-20150413/47982802 huguk@slideshare.net(huguk)
763 1 https://cdn.slidesharecdn.com/ss_thumbnails/londonhug20150413-150511065608-lva1-app6891-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0