Slideshows by User: thelabdude
SlideShare feed for Slideshows by User: thelabdude, Tue, 19 Sep 2017 19:11:10 GMT

Running Solr in the Cloud at Memory Speed with Alluxio
/slideshow/running-solr-in-the-cloud-at-memory-speed-with-alluxio/79949790
In this talk, I introduce Alluxio, the fastest-growing open source project in the big data ecosystem, and show how to use it to optimize Solr performance. I'll begin with a brief introduction to how Alluxio works and why it's interesting for the Solr community. Next, I describe how to run Solr on Alluxio and cover basic integration scenarios. Lastly, I provide some performance comparisons between running Solr on Alluxio and running it on a local FS and on HDFS. Attendees will come away with a new toolset to help them use Solr to tackle a wide array of big data problems.

Tue, 19 Sep 2017 19:11:10 GMT thelabdude@slideshare.net (thelabdude)
NYC Lucene/Solr Meetup: Spark / Solr
/slideshow/nyc-lucenesolr-meetup-spark-solr/50095073
Thu, 02 Jul 2015 13:37:36 GMT thelabdude@slideshare.net (thelabdude)
ApacheCon NA 2015 Spark / Solr Integration
/slideshow/apachecon-na-2015-spark-solr-integration/47143130
Apache Solr has been adopted by all major Hadoop platform vendors because of its ability to scale horizontally to meet even the most demanding big data search problems. Apache Spark has emerged as the leading platform for real-time big data analytics and machine learning. In this presentation, Timothy Potter presents several common use cases for integrating Solr and Spark. Specifically, Tim covers how to populate Solr from a Spark Streaming job as well as how to expose the results of any Solr query as an RDD. The Solr RDD makes efficient use of deep paging cursors and SolrCloud sharding to maximize parallel computation in Spark. After covering basic use cases, Tim digs a little deeper to show how to use MLlib to enrich documents before indexing them in Solr, with techniques such as sentiment analysis (logistic regression), language detection, topic modeling (LDA), and document classification.

Sat, 18 Apr 2015 09:02:44 GMT thelabdude@slideshare.net (thelabdude)
Benchmarking Solr Performance at Scale
/slideshow/solr-performance/38977250
Organizations continue to adopt Solr because of its ability to scale to meet even the most demanding workflows. Recently, LucidWorks has been leading the effort to identify, measure, and expand the limits of Solr. As part of this effort, we've learned a few things along the way that should prove useful for any organization wanting to scale Solr. Attendees will come away with a better understanding of how sharding and replication impact performance. No benchmark is useful unless it is repeatable, so Tim will also cover how to perform similar tests using the Solr-Scale-Toolkit in Amazon EC2.

Thu, 11 Sep 2014 10:34:05 GMT thelabdude@slideshare.net (thelabdude)
Solr Exchange: Introduction to SolrCloud
/slideshow/solr-exchange-introtosolrcloud/33761366
SolrCloud is a set of features in Apache Solr that enable elastic scaling of search indexes using sharding and replication. In this presentation, Tim Potter will provide an architectural overview of SolrCloud and highlight its most important features. Specifically, Tim covers sharding, replication, ZooKeeper fundamentals, leaders and replicas, and failure and recovery scenarios. No discussion of a complex distributed system would be complete without the CAP theorem, so Tim will describe why Solr is considered a CP system and how that impacts the design of a search application.

Mon, 21 Apr 2014 10:47:42 GMT thelabdude@slideshare.net (thelabdude)
Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
/slideshow/apache-con-managingsolrcloudinthecloud/33289636
SolrCloud is a set of features in Apache Solr that enable elastic scaling of search indexes using sharding and replication. In this presentation, Tim Potter will demonstrate how to provision, configure, and manage a SolrCloud cluster in Amazon EC2, using a Fabric/boto-based solution for automating SolrCloud operations. Attendees will come away with a solid understanding of how to operate a large-scale Solr cluster, as well as tools to help them do it. Tim will also demonstrate these tools live during his presentation. Covered technologies include Apache Solr, Apache ZooKeeper, Linux, Python, Fabric, boto, Apache Kafka, and Apache JMeter.

Tue, 08 Apr 2014 15:12:56 GMT thelabdude@slideshare.net (thelabdude)
Integrate Solr with real-time stream processing applications
/slideshow/lsr-dublin-timothypotterstorm/32549618
Storm is a real-time distributed computation system used to process massive streams of data. Many organizations are turning to technologies like Storm to complement batch-oriented big data technologies, such as Hadoop, to deliver time-sensitive analytics at scale. This talk introduces an emerging architectural pattern of integrating Solr and Storm to process big data in real time. There are a number of natural integration points between Solr and Storm, such as populating a Solr index or supplying data to Storm using Solr’s real-time get support. In this session, Timothy will cover the basic concepts of Storm, such as spouts and bolts. He’ll then provide examples of how to integrate Solr into Storm to perform large-scale indexing in near real time. In addition, we'll see how to embed Solr in a Storm bolt to match incoming tuples against pre-configured queries, a pattern commonly known as a percolator. Attendees will come away from this presentation with a good introduction to stream processing technologies and several real-world use cases of how to integrate Solr with Storm.

Thu, 20 Mar 2014 14:10:22 GMT thelabdude@slideshare.net (thelabdude)
Scaling Through Partitioning and Shard Splitting in Solr 4
/slideshow/tjp-solr-webinar/24402788
Over the past several months, Solr has reached a critical milestone: it can elastically scale out to handle indexes reaching into the hundreds of millions of documents. At Dachis Group, we've scaled our largest Solr 4 index to nearly 900M documents and growing. As our index grows, so does our need to manage this growth. In practice, it's common for indexes to continue to grow as organizations acquire new data. Over time, even the best-designed Solr cluster will reach a point where individual shards are too large to maintain query performance. In this webinar, you'll learn about new features in Solr to help manage large-scale clusters. Specifically, we'll cover data partitioning and shard splitting. Partitioning helps you organize subsets of data based on data contained in your documents, such as a date or customer ID. We'll see how to use custom hashing to route documents to specific shards during indexing. Shard splitting allows you to split a large shard into two smaller shards to increase parallelism during query execution. Attendees will come away from this presentation with a real-world use case that shows Solr 4 is elastically scalable, stable, and production-ready.

Thu, 18 Jul 2013 18:11:16 GMT thelabdude@slideshare.net (thelabdude)
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Analytics
/slideshow/lsr-13-ustimothypotterthurs/20482940
My presentation focuses on how we implemented Solr 4 to be the cornerstone of our social marketing analytics platform. Our platform analyzes relationships, behaviors, and conversations between 30,000 brands and 100M social accounts every 15 minutes. Combined with our Hadoop cluster, we have achieved throughput rates greater than 8,000 documents per second. Our index currently contains more than 620M documents and is growing by 3 to 4 million documents per day. My presentation will include details about: 1) designing a Solr Cloud cluster for scalability and high availability using sharding and replication with ZooKeeper; 2) operations concerns such as monitoring and how to handle a failed node; 3) how we index big data from Pig/Hadoop, as an example of using the CloudSolrServer in SolrJ and managing searchers for high indexing throughput; 4) example uses of key features like real-time gets, atomic updates, custom hashing, and distributed facets. Attendees will come away from this presentation with a real-world use case that shows Solr 4 is scalable, stable, and production-ready.

Fri, 03 May 2013 09:43:21 GMT /slideshow/lsr-13-ustimothypotterthurs/20482940 thelabdude@slideshare.net(thelabdude) Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Analytics thelabdude
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Analytics from thelabdude
2985 4 https://cdn.slidesharecdn.com/ss_thumbnails/lsr13ustimothypotterthurs-130503094321-phpapp01-thumbnail.jpg?width=120&height=120&fit=bounds presentation White http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Boosting Documents in Solr (Lucene Revolution 2011) /slideshow/boosting-documents-in-solr-lucene-revolution-2011/16327437 lucenerevolution2011-timthypotter-110608233035-phpapp01-130203132928-phpapp01
Presentation on boosting and/or filtering documents by recency, popularity, and personal preferences.
Sun, 03 Feb 2013 13:29:28 GMT /slideshow/boosting-documents-in-solr-lucene-revolution-2011/16327437 thelabdude@slideshare.net(thelabdude) Boosting Documents in Solr (Lucene Revolution 2011) thelabdude
Boosting Documents in Solr (Lucene Revolution 2011) from thelabdude
2074 4 https://cdn.slidesharecdn.com/ss_thumbnails/lucenerevolution2011-timthypotter-110608233035-phpapp01-130203132928-phpapp01-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Dachis Group Pig Hackday: Pig 202 /slideshow/dachis-group-pig-hackday-pig-202/12895677 dgpig202may112012-120511101605-phpapp02
Slides for Pig 202 tutorial presented by Timothy Potter at DG Pig Hackday, May 11, 2012
Fri, 11 May 2012 10:16:03 GMT /slideshow/dachis-group-pig-hackday-pig-202/12895677 thelabdude@slideshare.net(thelabdude) Dachis Group Pig Hackday: Pig 202 thelabdude
Dachis Group Pig Hackday: Pig 202 from thelabdude
1036 4 https://cdn.slidesharecdn.com/ss_thumbnails/dgpig202may112012-120511101605-phpapp02-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0