The document describes a quicksort algorithm adapted for graphics processors. It discusses:
1) The system model of GPUs, including their many-core architecture, use of shared memory, and thread block organization.
2) A two-phase quicksort algorithm that first partitions the data across thread blocks, then has each block sort its portion independently using shared memory for synchronization.
3) Experiments testing the algorithm on different data distributions, GPU hardware, and comparisons to other sorting methods, finding the quicksort approach competitive or faster in many cases.
1 of 67
Downloaded 149 times
More Related Content
GPU-Quicksort
1. A Practical Quicksort Algorithmfor Graphics ProcessorsDaniel Cederman and Philippas TsigasDistributed Computing and SystemsChalmers University of Technology
4. CPU vs. GPUNormal processorsGraphics processorsMulti-coreLarge cacheFew threadsSpeculative executionMany-coreSmall cacheWide and fast memory busThousands of threads hides memory latency
67. Minimal, manual cacheUsed only for prefix sum and bitonic32-word SIMD instructionMain part executes same instructionsCoalesced memory accessAll reads coalescedNo block synchronizationOnly required in first phaseExpensive synchronization primitivesTwo passes amortizes costConclusions
68. Quicksort is a viable sorting method for graphics processors and can be implemented in a data parallel wayIt is competitiveConclusions