Slideshows by jjkoshy (Joel Koshy, Data Infrastructure at LinkedIn) on SlideShare

Redlining Kafka Pipelines
Published Wed, 19 Apr 2017 | /slideshow/redlining-kafka-pipelines-75165675/75165675
LinkedIn’s deployment of Kafka and its use cases have grown tremendously over the last couple of years. Given our large-scale deployments, we keep a careful watch on performance and capex. In this talk we will take a close look at some of the shifting performance bottlenecks and cost considerations that we have had to grapple with over the years, and how newer features in Kafka 0.10 and hardware improvements have helped address them.

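As a purely illustrative aside (not taken from the deck; the broker addresses and topic name are made up), the producer-side knobs that trade per-record latency for throughput and network/storage savings look roughly like this:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class ThroughputTunedProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092,broker2:9092"); // hypothetical brokers
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("compression.type", "lz4"); // fewer bytes on the wire and on disk, at some CPU cost
            props.put("batch.size", "65536");     // larger per-partition batches
            props.put("linger.ms", "50");         // wait up to 50 ms to fill a batch
            props.put("acks", "1");               // leader-only acks: cheaper, weaker durability

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                for (int i = 0; i < 1000; i++) {
                    producer.send(new ProducerRecord<>("example-topic", "key-" + i, "value-" + i));
                }
            }
        }
    }

Batching and compression cut both request counts and bytes moved per message, which is the usual first lever in this kind of throughput-versus-cost tuning.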

Kafkaesque Days at LinkedIn in 2015
Published Wed, 27 Apr 2016 | /jjkoshy/kafkaesque-days-at-linked-in-in-2015
Presented at the inaugural Kafka Summit (2016), hosted by Confluent in San Francisco.

Abstract: Kafka is a backbone for various data pipelines and asynchronous messaging at LinkedIn and beyond. 2015 was an exciting year at LinkedIn in that we hit a new level of scale with Kafka: we now process more than 1 trillion published messages per day across nearly 1300 brokers. We ran into some interesting production issues at this scale, and I will dive into some of the most critical incidents we encountered at LinkedIn in the past year:
- Data loss: We have extremely stringent SLAs on latency and completeness that were violated on a few occasions. Some of these incidents were due to subtle configuration problems or even missing features.
- Offset resets: As of early 2015, Kafka-based offset management was still a relatively new feature and we occasionally hit offset resets. Troubleshooting these incidents turned out to be extremely tricky and resulted in various fixes in offset management, log compaction, and our monitoring.
- Cluster unavailability due to high request/response latencies: Such incidents demonstrate how even subtle performance regressions and monitoring gaps can lead to an eventual cluster meltdown.
- Power failures: What happens when an entire data center goes down? We experienced this first hand, and it was not so pretty.
- ...and more.
This talk goes over how we detected, investigated, and remediated each of these issues, and summarizes some of the features in Kafka that we are working on that will help eliminate or mitigate such incidents in the future.

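As an editorial illustration of the configuration-related data-loss risks mentioned above (a minimal sketch, not LinkedIn’s actual settings; the broker address and topic are hypothetical), a durability-oriented producer and the broker-side settings it pairs with might look like:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class DurabilityFirstProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092"); // hypothetical broker
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("acks", "all");                                  // wait for the full in-sync replica set
            props.put("retries", Integer.toString(Integer.MAX_VALUE)); // retry transient send failures
            props.put("max.in.flight.requests.per.connection", "1");   // preserve ordering across retries
            // Broker/topic-side counterparts (server.properties or per-topic overrides):
            //   unclean.leader.election.enable=false  -- never elect an out-of-sync replica as leader
            //   min.insync.replicas=2                 -- with acks=all, require two in-sync copies per write
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("example-topic", "key", "value"));
            }
        }
    }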

Troubleshooting Kafka’s socket server: from incident to resolution
Published Thu, 19 Nov 2015 | /slideshow/troubleshooting-kafkas-socket-server-from-incident-to-resolution/55313682
LinkedIn’s Kafka deployment is nearing 1300 brokers that move close to 1.3 trillion messages a day. While operating Kafka smoothly even at this scale is a testament to both Kafka’s scalability and the operational expertise of LinkedIn SREs, we occasionally run into some very interesting bugs. In this talk I will dive into a production issue that we recently encountered as an example of how even a subtle bug can suddenly manifest at scale and cause a near meltdown of the cluster. We will go over how we detected and responded to the situation, how we investigated it after the fact, and summarize some lessons learned and best practices from this incident.

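As a hedged illustration of the kind of request-level visibility this incident calls for (the MBean names are standard Kafka broker metrics, but the probe itself and the JMX endpoint are hypothetical and not from the talk), per-request-type latency and request-queue depth can be polled over JMX:

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class BrokerRequestLatencyProbe {
        public static void main(String[] args) throws Exception {
            // Hypothetical JMX endpoint; brokers expose one when started with JMX enabled.
            JMXServiceURL url =
                new JMXServiceURL("service:jmx:rmi:///jndi/rmi://broker1:9999/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection conn = connector.getMBeanServerConnection();
                // Total time spent servicing produce requests (a Yammer histogram exposed over JMX).
                ObjectName produceTime = new ObjectName(
                    "kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce");
                Object p99 = conn.getAttribute(produceTime, "99thPercentile");
                // Depth of the request queue feeding the broker's I/O threads.
                ObjectName requestQueue = new ObjectName(
                    "kafka.network:type=RequestChannel,name=RequestQueueSize");
                Object queueDepth = conn.getAttribute(requestQueue, "Value");
                System.out.println("Produce TotalTimeMs p99 = " + p99
                    + ", request queue depth = " + queueDepth);
            } finally {
                connector.close();
            }
        }
    }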

Consumer offset management in Kafka
Published Tue, 24 Mar 2015 | /slideshow/offset-management-in-kafka/46247626
An overview of consumer offset management in Kafka, presented at the Kafka meetup at LinkedIn (March 24, 2015).

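As a minimal sketch of what Kafka-based offset management looks like from the client side (this uses the newer Java consumer API rather than the consumers covered in the 2015 deck; the broker, group, and topic names are made up), offsets can be committed explicitly per partition and read back:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;

    public class ManualOffsetCommitExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092"); // hypothetical broker
            props.put("group.id", "example-group");         // offsets are tracked per consumer group
            props.put("enable.auto.commit", "false");       // commit explicitly instead of on a timer
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("example-topic"));
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // Process the record, then commit the next offset to consume for its partition.
                    TopicPartition tp = new TopicPartition(record.topic(), record.partition());
                    OffsetAndMetadata next = new OffsetAndMetadata(record.offset() + 1);
                    consumer.commitSync(Collections.singletonMap(tp, next));
                }
                // Committed offsets can be read back per partition.
                TopicPartition tp0 = new TopicPartition("example-topic", 0);
                System.out.println("Committed offset for partition 0: " + consumer.committed(tp0));
            }
        }
    }

Committing the offset of the next record to consume (record offset plus one) is the convention the consumer expects; on restart it resumes from the committed position.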