Slideshows by User: kkrugler / http://www.slideshare.net/images/logo.gif Wed, 11 May 2016 19:38:43 GMT SlideShare feed for Slideshows by User: kkrugler
Faster Workflows, Faster /slideshow/faster-workflows-faster/61920390 scaleunlimited-kkruglerapacheconbd2016-160511193843
Slides from my talk on using Flink to run complex big data workflows defined via Cascading's Java API.
Wed, 11 May 2016 19:38:43 GMT /slideshow/faster-workflows-faster/61920390 kkrugler@slideshare.net(kkrugler) Faster Workflows, Faster kkrugler
Faster Workflows, Faster from Ken Krugler
677 4 https://cdn.slidesharecdn.com/ss_thumbnails/scaleunlimited-kkruglerapacheconbd2016-160511193843-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Similarity at scale /slideshow/similarity-at-scale/35922441 similarityatscale-140616090303-phpapp02
This is a presentation I gave at Hadoop Summit San Jose 2014, on doing fuzzy matching at large scale using combinations of Hadoop & Solr-based techniques.
Mon, 16 Jun 2014 09:03:03 GMT /slideshow/similarity-at-scale/35922441 kkrugler@slideshare.net(kkrugler) Similarity at scale kkrugler
Similarity at scale from Ken Krugler
2946 8 https://cdn.slidesharecdn.com/ss_thumbnails/similarityatscale-140616090303-phpapp02-thumbnail.jpg?width=120&height=120&fit=bounds presentation White http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Suicide Risk Prediction Using Social Media and Cassandra /slideshow/scale-unlimited-kkrugler/23113041 scaleunlimited-kkrugler-130617103853-phpapp01
Presentation at Cassandra Summit 2013
Mon, 17 Jun 2013 10:38:53 GMT /slideshow/scale-unlimited-kkrugler/23113041 kkrugler@slideshare.net(kkrugler) Suicide Risk Prediction Using Social Media and Cassandra kkrugler
Suicide Risk Prediction Using Social Media and Cassandra from Ken Krugler
1809 5 https://cdn.slidesharecdn.com/ss_thumbnails/scaleunlimited-kkrugler-130617103853-phpapp01-thumbnail.jpg?width=120&height=120&fit=bounds presentation White http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Faster, Cheaper, Better - Replacing Oracle with Hadoop & Solr /slideshow/thu-225pm-scale-unlimitedkrugler/18131581 thu225pmscaleunlimitedkrugler-130403163952-phpapp01
Our client helps advertisers target publishers/networks and improve ad results by analyzing millions of web pages every day. They have been able to cut monthly costs by more than 50%, improve response time by 4x, and quickly add new features by switching from a traditional DB-centric approach to one based on Hadoop & Solr. This analysis is handled by a complex Hadoop-based workflow, where the end result is a set of unique, highly optimized Solr indexes. The data processing platform provided by Hadoop also enables scalable machine learning using Mahout. This presentation covers some of the unique challenges in switching the web site from relying on slow, expensive real-time analytics using database queries to fast, affordable batch analytics and search using Hadoop and Solr.
Wed, 03 Apr 2013 16:39:52 GMT /slideshow/thu-225pm-scale-unlimitedkrugler/18131581 kkrugler@slideshare.net(kkrugler) Faster, Cheaper, Better - Replacing Oracle with Hadoop & Solr kkrugler
Faster, Cheaper, Better - Replacing Oracle with Hadoop & Solr from Ken Krugler
2150 4 https://cdn.slidesharecdn.com/ss_thumbnails/thu225pmscaleunlimitedkrugler-130403163952-phpapp01-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Strata web mining tutorial /kkrugler/strata-web-mining-tutorial stratawebminingtutorial-120228174751-phpapp02
These are the slides from my tutorial at Strata Santa Clara 2012 today.
Tue, 28 Feb 2012 17:47:49 GMT /kkrugler/strata-web-mining-tutorial kkrugler@slideshare.net(kkrugler) Strata web mining tutorial kkrugler
Strata web mining tutorial from Ken Krugler
14441 9 https://cdn.slidesharecdn.com/ss_thumbnails/stratawebminingtutorial-120228174751-phpapp02-thumbnail.jpg?width=120&height=120&fit=bounds presentation White http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
A (very) short intro to Hadoop /slideshow/a-very-short-intro-to-hadoop/10637386 shortintrotohadoop-111219085334-phpapp02
A very short introduction to Hadoop, from the talk I gave at the BigDataCamp held in Washington DC this past November 2011. Some of this content is also covered in the various big data classes we offer via on-site training (see http://www.scaleunlimited.com/training/).
Mon, 19 Dec 2011 08:53:33 GMT /slideshow/a-very-short-intro-to-hadoop/10637386 kkrugler@slideshare.net(kkrugler) A (very) short intro to Hadoop kkrugler
A (very) short intro to Hadoop from Ken Krugler
22899 19 https://cdn.slidesharecdn.com/ss_thumbnails/shortintrotohadoop-111219085334-phpapp02-thumbnail.jpg?width=120&height=120&fit=bounds presentation White http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
A (very) short history of big data /slideshow/a-very-short-history-of-big-data/10632926 shorthistoryofbigdata-111218213303-phpapp01
My lightning talk from the BigDataCamp in Washington, DC this past November (2011).
Sun, 18 Dec 2011 21:33:02 GMT /slideshow/a-very-short-history-of-big-data/10632926 kkrugler@slideshare.net(kkrugler) A (very) short history of big data kkrugler
A (very) short history of big data from Ken Krugler
5751 9 https://cdn.slidesharecdn.com/ss_thumbnails/shorthistoryofbigdata-111218213303-phpapp01-thumbnail.jpg?width=120&height=120&fit=bounds presentation White http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Thinking at scale with hadoop /slideshow/thinking-at-scale-with-hadoop/5286926 thinkingatscalewithhadoop-100925172426-phpapp02
Presentation by Ken Krugler at the SDForum SAM SIG (Software Architecture & Modeling) meeting on Sept 22nd, 2010. This talk provides a brief introduction to Map-Reduce & Hadoop, then discusses challenges of implementing complex data processing using low-level Map-Reduce support, and a number of solutions.
Sat, 25 Sep 2010 17:24:17 GMT /slideshow/thinking-at-scale-with-hadoop/5286926 kkrugler@slideshare.net(kkrugler) Thinking at scale with hadoop kkrugler
Thinking at scale with hadoop from Ken Krugler
508 1 https://cdn.slidesharecdn.com/ss_thumbnails/thinkingatscalewithhadoop-100925172426-phpapp02-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Elastic Web Mining /slideshow/elastic-web-mining-2407818/2407818 acmtalk-slideshare-091102203022-phpapp02
PDF version (with notes) of my talk at the ACM Data Mining Unconference on 01 Nov 2009. How to use an open source stack (Hadoop, Cascading, Bixo) in EC2 for cost-effective, scalable and reliable web mining.
Mon, 02 Nov 2009 20:30:17 GMT /slideshow/elastic-web-mining-2407818/2407818 kkrugler@slideshare.net(kkrugler) Elastic Web Mining kkrugler
Elastic Web Mining from Ken Krugler
2344 6 https://cdn.slidesharecdn.com/ss_thumbnails/acmtalk-slideshare-091102203022-phpapp02-thumbnail.jpg?width=120&height=120&fit=bounds document Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Elastic Web Mining /slideshow/elastic-web-mining/2407600 acmuctalk-091102194640-phpapp02
My talk at the ACM Data Mining Unconference on 01 Nov 2009. How to use an open source stack (Hadoop, Cascading, Bixo) in EC2 for cost-effective, scalable and reliable web mining.
Mon, 02 Nov 2009 19:46:29 GMT /slideshow/elastic-web-mining/2407600 kkrugler@slideshare.net(kkrugler) Elastic Web Mining kkrugler
Elastic Web Mining from Ken Krugler
1252 9 https://cdn.slidesharecdn.com/ss_thumbnails/acmuctalk-091102194640-phpapp02-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
https://cdn.slidesharecdn.com/profile-photo-kkrugler-48x48.jpg?cb=1677694385
President of Scale Unlimited. Consulting and training for big data processing and web mining problems, using Hadoop, Cascading, Cassandra and Solr. Apache Software Foundation member, committer for Tika open source content extraction toolkit.
www.scaleunlimited.com
https://cdn.slidesharecdn.com/ss_thumbnails/scaleunlimited-kkruglerapacheconbd2016-160511193843-thumbnail.jpg?width=320&height=320&fit=bounds slideshow/faster-workflows-faster/61920390 Faster Workflows, Faster
https://cdn.slidesharecdn.com/ss_thumbnails/similarityatscale-140616090303-phpapp02-thumbnail.jpg?width=320&height=320&fit=bounds slideshow/similarity-at-scale/35922441 Similarity at scale
https://cdn.slidesharecdn.com/ss_thumbnails/scaleunlimited-kkrugler-130617103853-phpapp01-thumbnail.jpg?width=320&height=320&fit=bounds slideshow/scale-unlimited-kkrugler/23113041 Suicide Risk Predictio...