This document discusses using HyperLogLog (HLL) to estimate cardinality for count(distinct) queries in PostgreSQL.
HLL is an algorithm that uses constant memory to estimate the number of unique elements in a large set. It works by mapping elements to registers in a bitmap and tracking the number of leading zeros in each hash value. The harmonic mean of these counts is used to estimate cardinality.
PG-Strom implements HLL in PostgreSQL to enable fast count(distinct) queries on GPUs. On a table with 60 million rows and 87GB in size, HLL estimated the distinct count within 0.3% accuracy in just 9 seconds, over 40x faster than the regular count(distinct).
PG-Strom is an extension of PostgreSQL that utilizes GPUs and NVMe SSDs to enable terabyte-scale data processing and in-database analytics. It features SSD-to-GPU Direct SQL, which loads data directly from NVMe SSDs to GPUs using RDMA, bypassing CPU and RAM. This improves query performance by reducing I/O traffic over the PCIe bus. PG-Strom also uses Apache Arrow columnar storage format to further boost performance by transferring only referenced columns and enabling vector processing on GPUs. Benchmark results show PG-Strom can process over a billion rows per second on a simple 1U server configuration with an NVIDIA GPU and multiple NVMe SSDs.
This document provides an introduction to HeteroDB, Inc. and its chief architect, KaiGai Kohei. It discusses PG-Strom, an open source PostgreSQL extension developed by HeteroDB for high performance data processing using heterogeneous architectures like GPUs. PG-Strom uses techniques like SSD-to-GPU direct data transfer and a columnar data store to accelerate analytics and reporting workloads on terabyte-scale log data using GPUs and NVMe SSDs. Benchmark results show PG-Strom can process terabyte workloads at throughput nearing the hardware limit of the storage and network infrastructure.
4. 使ってみる(2)
$ make
Please use `make <target>' where <target> is one of
html to make standalone HTML files
dirhtml to make HTML files named index.html in directories
singlehtml to make a single large HTML file
pickle to make pickle files
json to make JSON files
htmlhelp to make HTML files and a HTML help project
qthelp to make HTML files and a qthelp project
devhelp to make HTML files and a Devhelp project
epub to make an epub
latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter
latexpdf to make LaTeX files and run them through pdflatex
text to make text files
man to make manual pages