Embulk, an open-source plugin-based parallel bulk data loader (Sadayuki Furuhashi)
The document discusses Embulk, an open-source, plugin-based parallel bulk data loader. Embulk loads records from many kinds of sources ("A") into many kinds of targets ("B"), with a plugin handling each source and target type. This takes much of the pain out of data integration. Embulk executes loads in parallel, validates data, handles errors, behaves deterministically, and supports idempotent retries of bulk loads.
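To make the properties listed above concrete, here is a minimal Python sketch of a plugin-style bulk load. It is an illustration under stated assumptions, not Embulk's code: Embulk's actual plugin interfaces are Java APIs and are not reproduced here, and `ListInput`, `StagedOutput`, and `run_bulk_load` are hypothetical names. The sketch assumes one task per input partition, runs tasks in a thread pool, separates records that fail validation instead of aborting, stages each task's output under its task index so that a retried task overwrites rather than duplicates its data, and commits in task order so the result is deterministic.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterator, List

Record = Dict[str, object]


class ListInput:
    """Hypothetical input plugin: task i reads partition i of source "A"."""

    def __init__(self, partitions: List[List[Record]]):
        self.partitions = partitions

    def task_count(self) -> int:
        return len(self.partitions)

    def read(self, task_index: int) -> Iterator[Record]:
        return iter(self.partitions[task_index])


class StagedOutput:
    """Hypothetical output plugin for target "B": stage per task, commit once."""

    def __init__(self) -> None:
        self.staged: Dict[int, List[Record]] = {}
        self.committed: List[Record] = []

    def write_task(self, task_index: int, records: List[Record]) -> None:
        # Keyed by task index, so a retried task overwrites its own slot
        # instead of appending duplicates (idempotent retry).
        self.staged[task_index] = records

    def commit(self) -> None:
        # Order by task index so the result does not depend on thread timing.
        self.committed = [r for i in sorted(self.staged) for r in self.staged[i]]


def run_bulk_load(inp: ListInput, out: StagedOutput,
                  validate: Callable[[Record], bool],
                  max_retries: int = 2, workers: int = 4) -> List[Record]:
    """Run all tasks in parallel; return the records that failed validation."""

    def run_task(i: int) -> List[Record]:
        for attempt in range(max_retries + 1):
            try:
                good: List[Record] = []
                bad: List[Record] = []
                for rec in inp.read(i):
                    (good if validate(rec) else bad).append(rec)
                out.write_task(i, good)  # written only after the whole task succeeded
                return bad
            except OSError:
                if attempt == max_retries:
                    raise  # give up after the last retry
        return []  # reached only if max_retries is negative

    rejected: List[Record] = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in task order, keeping `rejected` deterministic.
        for bad in pool.map(run_task, range(inp.task_count())):
            rejected.extend(bad)
    out.commit()
    return rejected


src = ListInput([[{"id": 1}, {"id": None}], [{"id": 2}]])
dst = StagedOutput()
rejected = run_bulk_load(src, dst, validate=lambda r: r["id"] is not None)
print(dst.committed)  # [{'id': 1}, {'id': 2}]
print(rejected)       # [{'id': None}]
```

Staging by task index is one simple way to get idempotent retries; a real target (a database table or object store) would need its own atomic commit step, such as loading into a temporary table and swapping it in.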
This document discusses messaging queues and messaging platforms. It begins with an introduction to messaging queues and their core components, then provides a table comparing eight popular open-source messaging platforms: Apache Kafka, ActiveMQ, RabbitMQ, NATS, NSQ, Redis, ZeroMQ, and Nanomsg. It discusses using Apache Kafka for streaming and its integration with Google Pub/Sub, Dataflow, and BigQuery, and it covers benchmark tests of these platforms that compare throughput and latency. Finally, it emphasizes that messaging queues help applications by letting producers and consumers communicate asynchronously.
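The asynchronous producer/consumer pattern described above can be sketched in-process with Python's `asyncio.Queue`. This is only a stand-in: here the queue plays the role of the broker, whereas Kafka, RabbitMQ, NATS and the other platforms listed are separate services with their own client APIs, none of which are shown. The function names and the `None` end-of-stream sentinel are assumptions made for the example.

```python
import asyncio
from typing import List, Optional


async def producer(queue: "asyncio.Queue[Optional[str]]", n: int) -> None:
    for i in range(n):
        # Returns as soon as the message is buffered; the producer never
        # waits for the consumer to process it (only for free buffer space).
        await queue.put(f"msg-{i}")
    await queue.put(None)  # sentinel: no more messages


async def consumer(queue: "asyncio.Queue[Optional[str]]", received: List[str]) -> None:
    while True:
        msg = await queue.get()
        if msg is None:
            break
        await asyncio.sleep(0)  # stand-in for slow processing
        received.append(msg)


async def main(n: int = 5) -> List[str]:
    # A bounded buffer applies back-pressure when the consumer falls behind.
    queue: "asyncio.Queue[Optional[str]]" = asyncio.Queue(maxsize=2)
    received: List[str] = []
    await asyncio.gather(producer(queue, n), consumer(queue, received))
    return received


if __name__ == "__main__":
    print(asyncio.run(main()))  # ['msg-0', 'msg-1', 'msg-2', 'msg-3', 'msg-4']
```

The producer and consumer never call each other directly; they share only the queue, which is the decoupling that a messaging platform provides across processes and machines.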
Jakarta EE Frontline: The Current State of Jakarta EE, Its Roadmap, and More (オラクルエンジニア通信)
This document provides an overview of Jakarta EE 8 and the future plans for Jakarta EE 9 and beyond. It discusses how Java EE was donated to the Eclipse Foundation and renamed to Jakarta EE. It outlines the release timeline and key features of Jakarta EE 8. It also summarizes the goals and proposed changes for Jakarta EE 9, including renaming javax packages to jakarta and tooling updates. Finally, it speculates on potential new specifications that could be added to Jakarta EE in the future.
There are many talks about Quarkus that explain its basic development mechanics and advertise its extremely small memory footprint and slim deployment artifacts. In all of those talks, however, the audience simply has to "believe": almost nobody explains how Quarkus achieves this, or which tools and approaches work under the hood. I'm going to provide a balanced explanation of how it works behind the scenes, without going into long, complex theoretical stories that put people to sleep during the talk.
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P... (Databricks)
The document discusses optimizations to Spark SQL performance on Parquet files at ByteDance. It describes how Spark originally reads Parquet files and identifies two main areas for optimization: Parquet filter pushdown and the Parquet reader. For filter pushdown, sorting the data by the filtered columns tightened the column statistics and reduced data reads by 30%. For the reader, splitting the read so that the filter columns are read first and the remaining columns are read afterward avoided loading unnecessary data. These changes improved Spark SQL performance at ByteDance without requiring changes to existing jobs.
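Both ideas can be illustrated with a toy, pure-Python simulation. It is not Spark's or Parquet's actual reader, and `RowGroup`, `write_row_groups`, and `scan_equals` are hypothetical names. The sketch assumes per-row-group (min, max) statistics recorded at write time, as in a Parquet footer. Pushdown skips any row group whose range cannot contain the filter value, and sorting by the filter column makes those ranges narrower, so more groups are skipped. The two-step scan reads the filter column first and touches the other columns only at matching positions. "Materialized" simply counts values touched; real readers decode whole pages, so these numbers show the direction of the effect, not the 30% figure.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Row = Dict[str, int]


@dataclass
class RowGroup:
    columns: Dict[str, List[int]]      # column name -> values in this row group
    stats: Dict[str, Tuple[int, int]]  # column name -> (min, max), like footer statistics


def write_row_groups(rows: List[Row], size: int) -> List[RowGroup]:
    """Split rows into row groups and record per-column min/max at write time."""
    groups = []
    for start in range(0, len(rows), size):
        chunk = rows[start:start + size]
        cols = {c: [r[c] for r in chunk] for c in chunk[0]}
        stats = {c: (min(v), max(v)) for c, v in cols.items()}
        groups.append(RowGroup(cols, stats))
    return groups


def scan_equals(groups: List[RowGroup], filter_col: str, value: int,
                other_cols: List[str]) -> Tuple[List[Row], int, int]:
    """Evaluate `filter_col == value`; return (rows, groups skipped, values materialized)."""
    rows: List[Row] = []
    skipped = materialized = 0
    for g in groups:
        lo, hi = g.stats[filter_col]
        if not lo <= value <= hi:
            skipped += 1                   # pushdown: skip the group using statistics alone
            continue
        keys = g.columns[filter_col]       # step 1: read only the filter column
        materialized += len(keys)
        hits = [i for i, k in enumerate(keys) if k == value]
        materialized += len(hits) * len(other_cols)  # step 2: other columns, matches only
        for i in hits:
            row = {filter_col: keys[i]}
            row.update({c: g.columns[c][i] for c in other_cols})
            rows.append(row)
    return rows, skipped, materialized


ids = [5, 1, 7, 3, 8, 2, 6, 4]
rows = [{"user_id": u, "amount": u * 10} for u in ids]
for label, data in [("unsorted", rows),
                    ("sorted", sorted(rows, key=lambda r: r["user_id"]))]:
    hits, skipped, work = scan_equals(write_row_groups(data, 2), "user_id", 7, ["amount"])
    print(label, hits, "skipped:", skipped, "materialized:", work)
```

With the unsorted data, two of the four row groups are skipped and 5 values are materialized; after sorting by `user_id`, three groups are skipped and only 3 values are materialized. Reading every column of each scanned group, without the two-step split, would have materialized 8 values in the unsorted case (2 groups × 2 rows × 2 columns).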