ComputeBitonicSort
new ComputeBitonicSort(dataBuffer : StorageBufferNode, options : Object)GPU-accelerated parallel bitonic sort for Three.js TSL compute shaders.
ComputeBitonicSort provides an efficient O(n log²n) parallel sorting algorithm that runs entirely on the GPU. It's designed for sorting large arrays of key-value pairs where deterministic, stable ordering is required. It is used in SPH, PBF, Boids, IndirectBatchedMesh, ComputeInstanceCulling.
Features
- Fully GPU-accelerated: All sorting operations execute as compute shaders
- Stable sorting: Elements with equal keys maintain their relative order via tie-breaker IDs
- Deterministic: Produces identical results across frames and devices
- Dual-backend support: Optimized path for WebGPU, fallback for WebGL
- Workgroup optimization: Uses shared memory for efficient local sorting (WebGPU)
- Bounds-safe: Handles non-power-of-two counts with sentinel values
Usage
Data Format
The data buffer must contain uvec2 elements where:
- x: The primary sort key (e.g., spatial hash, distance, priority)
- y: A unique stable ID to break ties deterministically
Elements are sorted in ascending order by (x, y) lexicographically.
Performance Considerations
- Buffer count should ideally be a power of 2 for optimal performance
- Larger workgroup sizes (64-256) typically perform better on modern GPUs
- The
compute()method executes all steps synchronously; usecomputeStep()for amortized sorting across multiple frames
Constructor Parameters
dataBufferStorageBufferNodeoptionsoptionalObjectDefault is
{}.workgroupSizeoptionalnumberThe workgroup size for compute shaders. Must be a power of 2. Larger values (64-256) typically perform better but are limited by GPU capabilities. Will be clamped to valid ranges automatically.
Default is64.
Example
import { instancedArray } from 'three/tsl';
import { ComputeBitonicSort } from './ComputeBitonicSort.js';
// Create a buffer of uvec2 pairs: (sortKey, stableId)
const count = 1024;
const dataBuffer = instancedArray( count, 'uvec2' );
// Initialize the sorter
const sorter = new ComputeBitonicSort( dataBuffer, { workgroupSize: 64 } );
// In your render loop:
sorter.compute( renderer ); // Full sort in one call
// Or for amortized sorting across frames:
sorter.computeStep( renderer ); // One step per frame
Properties
# .dataBuffer : StorageBufferNode
The storage buffer containing the data to be sorted. Elements are uvec2 pairs: (sortKey, stableId).
# .count : number
The number of elements in the data buffer.
# .dispatchSize : number
The number of thread groups to dispatch for compute operations. Each thread handles one comparison, so dispatchSize = count / 2.
# .workgroupSize : number
The number of threads per workgroup. Determines how many elements can be sorted in local shared memory. Each workgroup sorts 2 * workgroupSize elements locally.
# .localStorage : WorkgroupInfoNode|null
Workgroup-scoped shared memory buffer for local sorting operations. Holds 2 * workgroupSize elements during local sort phases. Only initialized for WebGPU backends.
# .tempBuffer : StorageBufferNode
Temporary storage buffer for ping-pong operations during global swaps. Global operations read from one buffer and write to the other to avoid read-after-write hazards.
# .infoStorage : StorageBufferNode
Storage buffer containing algorithm state: [stepType, currentSwapSpan, maxSwapSpan]. Used to coordinate multi-dispatch sorting operations.
# .swapOpCount : number
Total number of distinct swap operations (flips + disperses) in a complete sort. For n elements: (log2(n) * (log2(n) + 1)) / 2
# .stepCount : number
Total number of compute dispatches needed for a complete sort. Depends on whether local sorting is available (WebGPU) or not (WebGL).
# .readBufferName : string
Name of the buffer currently being read from ('Data' or 'Temp'). Used for ping-pong buffer management during global operations.
# .flipGlobalNodes : Object.
Compute nodes for global flip operations, keyed by source buffer name.
# .disperseGlobalNodes : Object.
Compute nodes for global disperse operations, keyed by source buffer name.
# .swapLocalFn : ComputeNode|null
Compute node for complete local sorting (flip + all disperse stages). Only available on WebGPU backends.
# .disperseLocalNodes : Object.|null
Compute nodes for local disperse operations, keyed by source buffer name. Only available on WebGPU backends.
# .setAlgoFn : ComputeNode|null
Compute node that advances the algorithm state to the next step.
# .alignFn : ComputeNode
Compute node that copies data from temp buffer back to data buffer. Used to ensure results end up in the original data buffer.
# .resetFn : ComputeNode|null
Compute node that resets algorithm state for a new sort operation.
# .currentDispatch : number
Current dispatch index within the sort operation (0 to stepCount - 1).
# .globalOpsRemaining : number
Number of remaining global operations before switching to local disperse. Used for WebGPU path state management.
# .globalOpsInSpan : number
Total global operations in the current span. Increments as we process larger swap spans.
# .initialized : boolean
Whether the sorter has been initialized with a renderer.
# .isWebGL : boolean
Whether the current backend is WebGL (lacks workgroup shared memory).
Methods
init#
init(renderer : WebGPURenderer)Initializes the sorter for the given renderer.
This method detects the backend type (WebGL vs WebGPU) and creates the
appropriate compute shaders. Must be called before sorting, but is
automatically invoked by compute() and computeStep() if needed.
Parameters
rendererWebGPURenderercomputeStep#
computeStep(renderer : WebGPURenderer)Executes a single step of the bitonic sort.
Call this method repeatedly (once per frame) to amortize sorting cost
over multiple frames. A complete sort requires stepCount calls.
The method automatically:
- Initializes on first call
- Advances through flip/disperse stages
- Handles buffer ping-ponging
- Resets state when sort completes
Parameters
rendererWebGPURenderercompute#
compute(renderer : WebGPURenderer)Executes a complete bitonic sort in a single call.
This method runs all sorting steps synchronously, which may cause
frame drops for large arrays. For real-time applications, consider
using computeStep() to spread the work across multiple frames.
Parameters
rendererWebGPURendererExample
const sorter = new ComputeBitonicSort( dataBuffer, { workgroupSize: 128 } );