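This document introduces the architecture of Apache Pulsar and its implementation of serverless event stream processing, focusing on the design of Go Functions and their use cases, including ETL, data filtering, and dynamic routing. Pulsar Functions, a lightweight event-streaming framework, supports multiple languages and runtimes and relies on automatic load balancing to manage and schedule functions. The document also mentions some security risks and limitations of using Go.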
Improvements Made in KoP 2.9.0 - Pulsar Summit Asia 2021 (StreamNative)
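This document details the design and latest progress of KoP (Kafka on Pulsar). It focuses on why KoP is needed to migrate from Kafka to Pulsar and on three approaches to implementing such a migration. It also covers KoP's basic design, including protocol handling, authentication, message storage, and how KoP reconciles Kafka offsets with Pulsar message IDs. New features in KoP 2.9 include support for multiple Kafka client versions, a tenant-based group coordinator, and improved authorization.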
Kubernetes, meaning 'helmsman' in Greek, is a production-grade system that automates the deployment, scaling, and management of containerized applications. Key components include the master node for orchestrating worker nodes, pods as the smallest deployable units, and the use of kubectl for administrative tasks. Kubernetes enhances resource utilization by permitting multiple pods to run on a single node and offers various methods to expose services and ensure application availability.
Ansible is an automation tool for deployment and management, licensed under GPL 3.0 and maintained by Red Hat. It configures nodes over SSH using YAML playbooks and modules. It differs from tools like Puppet and Chef in that any computer can act as the controller. The document provides a quick start guide, an overview of inventory and playbook structure, and details on roles and handlers for task management.
Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022 (StreamNative)
The document covers a presentation on Kafka-on-Pulsar (KoP) by Ricardo Ferreira, a senior developer advocate at AWS, given at the Pulsar Summit in San Francisco on August 18, 2022. It highlights his extensive background in distributed systems and messaging. The presentation includes a link to the code related to the topic.
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa... (StreamNative)
The document outlines Klaviyo's development of an asynchronous application framework using Python and Pulsar, addressing challenges of reliability, scalability, and process ownership. It details the transition from RabbitMQ to Pulsar, highlighting the improvements in system architecture and team ownership dynamics. Future plans include managing internal adoption, scaling capabilities for peak seasons, and introducing a 'publish gateway' for enhanced performance.
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys... (StreamNative)
The document discusses Toast's adoption and use of Apache Pulsar for asynchronous messaging in their microservices architecture. It describes how they built a "Pulsar Toggle" leveraging Envoy proxy to enable blue/green deployments of Pulsar consumers. The Pulsar Toggle allows consumers to be paused and resumed based on their status in the Envoy control plane, improving the reliability and usability of deploying changes to Pulsar-based services. Toast has seen increased adoption of Pulsar and benefits from its stability and scalability.
Distributed Database Design Decisions to Support High Performance Event Strea... (StreamNative)
Peter Corless, Director of Technical Advocacy at ScyllaDB, discusses the design decisions needed for distributed databases to effectively support high-performance event streaming. He emphasizes the importance of database architectures that accommodate real-time data changes, high availability, and scalability while highlighting ScyllaDB's evolution towards robust event streaming capabilities. The presentation also covers the future of database technology in the context of increasing demands for data storage and processing speeds.
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022 (StreamNative)
The document discusses Pulsar Functions and Function Mesh, as presented at the Pulsar Summit in San Francisco on August 18, 2022. It highlights their development, benefits, and use cases such as ETL jobs and real-time processing. It emphasizes a new SQL abstraction that simplifies Pulsar Function pipelines and details its components and REST API for easier development and management. Planned future work includes broader syntax support and built-in aggregation operations.
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022 (StreamNative)
This document summarizes Matteo Merli's talk on moving Apache Pulsar to a ZooKeeper-less metadata model. Pulsar currently stores its metadata in ZooKeeper, which limits scalability. The talk outlines PIP-45, a plan to replace direct ZooKeeper usage with a pluggable metadata backend. This would allow alternative stores such as etcd and improve scalability. The talk also covers what Pulsar 2.10 has already achieved by abstracting metadata access, and future goals for scaling to millions of topics.
The document discusses validating Apache Pulsar's behavior under failure conditions, emphasizing its high availability, strong message ordering, and low latency guarantees. It outlines expectations regarding service availability, message delivery, and methods for testing and analysis, including chaos testing and resilience engineering. The importance of well-defined test plans and effective tooling for monitoring and validation in real-world production environments is also highlighted.
Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac... (StreamNative)
The document presents a detailed overview of integrating Apache Flink with Pulsar for creating data pipelines, addressing stream processing challenges, and showcasing Flink SQL's capabilities. It includes demos illustrating the functionality of Flink SQL for real-time processing and the advantages of a unified data stack. The document also outlines community involvement opportunities and additional resources for further learning.
Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022 (StreamNative)
This document summarizes the message redelivery process in Apache Pulsar. It discusses how messages are redelivered when producing or consuming messages. When producing, messages are redelivered if the broker does not acknowledge receipt in a timely manner. When consuming, messages are redelivered under three circumstances: if the acknowledgment times out, if messages are negatively acknowledged, or if delivery is delayed. The document provides details on the commands and objects involved in establishing connections, publishing, consuming, acknowledging, and redelivering messages between Pulsar clients and brokers.
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ... (StreamNative)
The document discusses a presentation on unlocking the power of lakehouse architectures using Apache Pulsar and Apache Hudi, presented at the Pulsar Summit in San Francisco. It covers the integration of these technologies, detailing lakehouse concepts, the roles of Apache Hudi in data management, and the advantages of using Pulsar for real-time data streaming. The presentation also includes technical demonstrations and potential use cases, highlighting the benefits of combining these platforms for efficient data processing and analytics.
Understanding Broker Load Balancing - Pulsar Summit SF 2022 (StreamNative)
The document discusses broker load balancing in Apache Pulsar and the mechanisms and logic used to distribute topic load efficiently across brokers. It outlines strategies for dynamic topic rebalancing, bundle management, and performance metrics, and emphasizes Pulsar's separation of the serving and persistence layers. The presentation also covers operational tips and future improvements to load balancing performance.
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa... (StreamNative)
The document discusses the development of an asynchronous application framework using Python and Pulsar, highlighting challenges in reliability, scalability, and team ownership with previous architectures like RabbitMQ. It presents a solution involving a new framework that enables better handling of workloads, schema management, and community collaboration while addressing issues like outages and process inefficiencies. Future plans include scaling the framework for high-demand periods and enhancing features like online schema changes and complex workflows.
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022 (StreamNative)
The document discusses Yahoo's implementation of Apache Pulsar as a messaging system in cloud environments, highlighting its features such as multi-tenancy, cost-effectiveness, and high performance. It addresses the challenges of cloud messaging systems, including data security and availability, and emphasizes the secure connectivity and deployment benefits offered by Pulsar. Additionally, it summarizes the advantages of using Pulsar on public and hybrid clouds, including low latency and geo-replication capabilities.
Event-Driven Applications Done Right - Pulsar Summit SF 2022 (StreamNative)
This document contains the agenda for a Pulsar Summit keynote on event-driven applications. The keynote will feature talks from Sijie Guo, Co-Founder and CEO of StreamNative, and Matteo Merli, CTO of StreamNative. Guo will discuss the growth of the Pulsar community and platform. Merli will cover the evolution of event-driven applications and the five fundamentals of modern event-driven architecture: data abstraction, API, primitives, processing semantics, and tools. The keynote aims to explain how Pulsar solves challenges in building complex event-driven applications.
Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022 (StreamNative)
Ignacio Alvarez from Mercado Libre discussed running Pulsar at scale and its benefits, describing how the company handles 200 million requests per minute across 1,000 instances. He emphasized the importance of abstractions and operational tooling for flexibility, and described the challenges of achieving low latency and scalability. Key learnings included the effectiveness of their event-driven ecosystem and the need to address operational issues and design flaws.
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022 (StreamNative)
In his Pulsar Summit keynote, Xiang Fu discussed data democracy and the evolution toward user-facing analytics, highlighting advances in real-time analytics technologies such as Apache Pinot. Pinot supports real-time ingestion and query processing, which improves analytics efficiency and reduces latency. The presentation emphasized growing demand for high-quality insights delivered to external users as the way to fully democratize data access.
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022 (StreamNative)
The document outlines a keynote by Byron Ellis on Apache Beam and its integration with Pulsar for stream processing. Apache Beam is described as a unified platform for batch and streaming data processing, with essential I/O connectors maintained within the project. The presentation also discusses ongoing development of a Pulsar connector for Beam to extend the platform's functionality.
Welcome and Opening Remarks - Pulsar Summit SF 2022 (StreamNative)
The document outlines the agenda for the Pulsar Summit, including keynotes, sessions, and training opportunities. It introduces Carolyn King, VP of Marketing, and highlights events such as breakfast, lunch, and a happy hour. It also describes training sessions for developers and operators and encourages social media engagement with the hashtag #pulsarsummitsf.
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa... (StreamNative)
The document outlines the architecture and features of Milvus, an open-source vector database designed for efficient similarity searches on dense vectors. It emphasizes the importance of unstructured data processing, the use of Apache Pulsar for log storage, and the database's scalability and ease of use. Real-world use cases include applications in customer service chatbots and face recognition, showcasing Milvus's capacity for real-time data ingestion and system extensibility.
MoP (MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi... (StreamNative)
The document outlines the MQTT on Pulsar (MoP) plugin, which lets MQTT clients interact with Apache Pulsar without changes to existing client code. It details the protocol handler, the installation process, and supported features such as multiple MQTT versions, authentication methods, and metrics. Future work includes performance testing and features such as QoS 2 support and a management API.