GPU-instanced particle rendering #1404

@obiot

Description

Summary

Render particles using GPU instancing instead of individual `addQuad()` calls. Per-particle state (position, rotation, scale, alpha, tint, life) is uploaded to a GPU buffer and all particles are rendered in a single draw call. Optionally, particle physics (gravity, fade, spin) can be computed in the vertex shader from spawn parameters, which would bring the per-particle CPU cost per frame close to zero.

Current State

  • `ParticleEmitter` creates individual `Particle` renderables
  • Each particle is drawn via `addQuad()` in the quad batcher — 4 vertices pushed per particle per frame
  • CPU handles all per-particle updates: position, velocity, gravity, rotation, alpha fade, scaling
  • Performance tops out at roughly 500-1000 visible particles before frames start to drop
  • Each particle's `update()` and `draw()` is called individually

Proposed Architecture

Instance buffer

A single vertex buffer containing per-instance data for all active particles:

  • `vec2 position` — world position
  • `float rotation` — current angle
  • `float scale` — current size
  • `float alpha` — current opacity
  • `uint32 tint` — color tint (packed ARGB)
  • `float life` — remaining lifetime (0-1 normalized)

Updated once per frame via `bufferSubData()` — one upload for all particles.
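
As a rough sketch of the layout above, the instance data could live in one `ArrayBuffer` viewed both as floats and as unsigned integers (for the packed tint). The names `createInstanceStore`, `writeInstance` and `uploadInstances` are hypothetical, and the upload assumes a WebGL2 context whose buffer was already sized with `bufferData()`:

```javascript
// Hypothetical layout for the seven fields listed above (names are illustrative).
const SLOTS_PER_INSTANCE = 7;                       // 6 floats + 1 packed uint32
const BYTES_PER_INSTANCE = SLOTS_PER_INSTANCE * 4;  // 28 bytes

function createInstanceStore(maxParticles) {
    const buffer = new ArrayBuffer(maxParticles * BYTES_PER_INSTANCE);
    return {
        buffer,
        f32: new Float32Array(buffer), // float fields
        u32: new Uint32Array(buffer),  // same memory, used for the packed tint
    };
}

function writeInstance(store, index, p) {
    const o = index * SLOTS_PER_INSTANCE;
    store.f32[o + 0] = p.x;
    store.f32[o + 1] = p.y;
    store.f32[o + 2] = p.rotation;
    store.f32[o + 3] = p.scale;
    store.f32[o + 4] = p.alpha;
    store.u32[o + 5] = p.tint >>> 0; // packed ARGB, 0xAARRGGBB
    store.f32[o + 6] = p.life;
}

// Upload only the active range. Assumes `gl` is a WebGL2 context and
// `glBuffer` was allocated earlier with gl.bufferData() at full capacity.
function uploadInstances(gl, glBuffer, store, activeCount) {
    gl.bindBuffer(gl.ARRAY_BUFFER, glBuffer);
    // WebGL2 overload: srcOffset and length are counted in elements of store.f32
    gl.bufferSubData(gl.ARRAY_BUFFER, 0, store.f32, 0, activeCount * SLOTS_PER_INSTANCE);
}
```

With this layout each instance takes 28 bytes, so the per-frame upload scales linearly with the number of active particles.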

Vertex shader

Computes the final quad corners for each instance:

  • Applies position, rotation, scale from the instance buffer
  • Optionally computes physics on GPU (velocity, gravity, fade) if spawn parameters are uploaded instead of per-frame state
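
The two options above can be sketched as follows. The GLSL source is a minimal stateful variant (it only applies position, rotation and scale); the attribute and uniform names are assumptions, not existing engine names. `transformCorner` mirrors the same arithmetic in JavaScript for reference, and `stateAt` shows the closed-form evaluation a stateless variant could run in the shader, assuming `gravity` and `wind` act as constant accelerations on y and x respectively:

```javascript
// GLSL ES 3.00 sketch; all attribute/uniform names are assumptions.
const PARTICLE_VERTEX_SHADER = `#version 300 es
layout(location = 0) in vec2 aCorner;    // per-vertex: unit quad corner in [-0.5, 0.5]
layout(location = 1) in vec2 aPosition;  // per-instance
layout(location = 2) in float aRotation; // per-instance, radians
layout(location = 3) in float aScale;    // per-instance
layout(location = 4) in float aAlpha;    // per-instance
uniform mat4 uProjection;
out vec2 vUV;
out float vAlpha;
void main() {
    float c = cos(aRotation);
    float s = sin(aRotation);
    vec2 p = aCorner * aScale;
    vec2 world = aPosition + vec2(p.x * c - p.y * s, p.x * s + p.y * c);
    gl_Position = uProjection * vec4(world, 0.0, 1.0);
    vUV = aCorner + 0.5;
    vAlpha = aAlpha;
}`;

// CPU mirror of the shader math: scale, rotate, then translate one corner.
function transformCorner(corner, inst) {
    const c = Math.cos(inst.rotation);
    const s = Math.sin(inst.rotation);
    const px = corner[0] * inst.scale;
    const py = corner[1] * inst.scale;
    return [inst.x + px * c - py * s, inst.y + px * s + py * c];
}

// Stateless variant: derive the current state from spawn parameters and age.
// Assumes constant acceleration and a linear alpha fade over the lifetime.
function stateAt(spawn, age) {
    const t = Math.min(age, spawn.lifetime);
    return {
        x: spawn.x + spawn.vx * t + 0.5 * spawn.wind * t * t,
        y: spawn.y + spawn.vy * t + 0.5 * spawn.gravity * t * t,
        rotation: spawn.rotation + spawn.spin * t,
        alpha: 1 - t / spawn.lifetime,
    };
}
```

The stateless form only needs a time uniform updated per frame, at the cost of not supporting forces that change after spawn. It may also drift from the existing CPU integration if that one accumulates velocity step by step.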

Single draw call

`gl.drawArraysInstanced(gl.TRIANGLE_STRIP, 0, 4, particleCount)` — one call renders all particles regardless of count.
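
For that call to consume one record per particle, each instance attribute needs a divisor of 1. A minimal WebGL2 sketch follows, assuming the 28-byte layout from the Instance buffer section and hypothetical attribute locations 1 to 5 (location 0 being the per-vertex quad corner):

```javascript
const STRIDE = 28; // bytes per instance, assuming the seven fields listed earlier

// [attribute location, component count, byte offset]; locations are assumptions
const INSTANCE_ATTRIBUTES = [
    [1, 2, 0],  // position (vec2)
    [2, 1, 8],  // rotation
    [3, 1, 12], // scale
    [4, 1, 16], // alpha
];

function setupInstanceAttributes(gl, instanceBuffer) {
    gl.bindBuffer(gl.ARRAY_BUFFER, instanceBuffer);
    for (const [location, size, offset] of INSTANCE_ATTRIBUTES) {
        gl.enableVertexAttribArray(location);
        gl.vertexAttribPointer(location, size, gl.FLOAT, false, STRIDE, offset);
        gl.vertexAttribDivisor(location, 1); // advance once per instance
    }
    // Tint read as 4 normalized unsigned bytes at offset 20. On little-endian
    // hardware a packed 0xAARRGGBB value arrives as (B, G, R, A), so the
    // shader would need a .bgra swizzle.
    gl.enableVertexAttribArray(5);
    gl.vertexAttribPointer(5, 4, gl.UNSIGNED_BYTE, true, STRIDE, 20);
    gl.vertexAttribDivisor(5, 1);
}

function drawParticles(gl, particleCount) {
    if (particleCount > 0) {
        // 4 strip vertices per quad, repeated particleCount times
        gl.drawArraysInstanced(gl.TRIANGLE_STRIP, 0, 4, particleCount);
    }
}
```

The per-vertex buffer would hold the four unit-quad corners in strip order, for example (-0.5, -0.5), (0.5, -0.5), (-0.5, 0.5), (0.5, 0.5), with its divisor left at 0.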

Integration

  • New `GPUParticleEmitter` class (or opt-in mode on existing `ParticleEmitter`)
  • Shares the same configuration API: `minLife`, `maxLife`, `speed`, `gravity`, `wind`, etc.
  • Falls back to existing CPU particle system for Canvas renderer
  • Requires WebGL2 for `drawArraysInstanced` (or the `ANGLE_instanced_arrays` extension on WebGL1, which exposes it as `drawArraysInstancedANGLE`)
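
One possible way to select the code path is sketched below. `getInstancingSupport` is a hypothetical helper, not an existing engine function. It returns a small wrapper over whichever instancing API is available, or `null` when the emitter should fall back to the CPU particle system:

```javascript
// Hypothetical helper: pick WebGL2 instancing, the WebGL1 extension, or neither.
function getInstancingSupport(gl) {
    if (!gl) {
        return null; // Canvas renderer: no GL context at all
    }
    if (typeof gl.drawArraysInstanced === "function") {
        // WebGL2: instancing is part of the core API
        return {
            drawArraysInstanced: (mode, first, count, n) => gl.drawArraysInstanced(mode, first, count, n),
            vertexAttribDivisor: (location, divisor) => gl.vertexAttribDivisor(location, divisor),
        };
    }
    const ext = gl.getExtension("ANGLE_instanced_arrays");
    if (ext) {
        // WebGL1: same functionality under ANGLE-suffixed names
        return {
            drawArraysInstanced: (mode, first, count, n) => ext.drawArraysInstancedANGLE(mode, first, count, n),
            vertexAttribDivisor: (location, divisor) => ext.vertexAttribDivisorANGLE(location, divisor),
        };
    }
    return null; // WebGL1 without the extension: use the CPU path
}
```

Note that a WebGL1 path would also need GLSL ES 1.00 shaders (`attribute`/`varying` instead of `in`/`out`), so supporting both means maintaining two shader variants.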

Performance expectations

  • CPU: near-zero per-frame cost — no per-particle update loop, no per-particle vertex pushing
  • GPU: single draw call, single texture bind, hardware instancing handles the rest
  • Capacity: 50,000-100,000+ particles vs current ~500-1000 cap
  • Memory: one instance buffer (about 28 bytes per particle with the layout above: six 4-byte floats plus a 4-byte tint), so 100K particles = ~2.8MB

API Sketch

// opt-in GPU mode on existing emitter
const emitter = new ParticleEmitter(x, y, {
    image: texture,
    totalParticles: 10000,
    gpu: true, // enable GPU instancing
    // same config as before...
});

// or a dedicated class
const gpuEmitter = new GPUParticleEmitter(x, y, {
    image: texture,
    totalParticles: 50000,
    speed: { min: 1, max: 5 },
    gravity: 0.5,
    wind: 0.1,
});

Challenges

  • Particle spawn/despawn: need a ring buffer or free-list on the CPU side to manage active particles without reallocating
  • Per-particle animation: frame-based sprite animation would need a frame index in the instance data + atlas UV lookup in the shader
  • Sorting: transparent particles may need back-to-front sorting for correct blending; this can be done on the CPU before upload or approximated (additive blending is order-independent and needs no sorting)
  • WebGL1 fallback: the `ANGLE_instanced_arrays` extension provides instancing on WebGL1 but is not universally available
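
For the spawn/despawn point, one alternative to a ring buffer or free-list is a dense pool with swap-remove: a dead particle is overwritten by the last active one, so the active range stays contiguous and can be uploaded in a single `bufferSubData()` call. `ParticlePool` below is a hypothetical sketch, not an existing class:

```javascript
// Hypothetical dense pool: active particles always occupy slots [0, count).
class ParticlePool {
    constructor(capacity, floatsPerInstance) {
        this.capacity = capacity;
        this.stride = floatsPerInstance;
        this.data = new Float32Array(capacity * floatsPerInstance); // allocated once
        this.count = 0;
    }

    // Returns the slot index, or -1 when the pool is full.
    spawn(values) {
        if (this.count === this.capacity) {
            return -1;
        }
        const index = this.count++;
        this.data.set(values, index * this.stride);
        return index;
    }

    // Swap-remove: move the last active particle into the freed slot.
    despawn(index) {
        if (index < 0 || index >= this.count) {
            return;
        }
        const last = --this.count;
        if (index !== last) {
            const s = this.stride;
            this.data.copyWithin(index * s, last * s, (last + 1) * s);
        }
    }
}
```

Swap-remove does not preserve particle order, which is fine for additive blending but would conflict with a back-to-front sort kept across frames. When despawning inside a forward loop, the same index needs to be re-checked after each removal.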

References

  • `ParticleEmitter`: `src/particles/emitter.js`
  • `Particle`: `src/particles/particle.js`
  • `QuadBatcher`: `src/video/webgl/batchers/quad_batcher.js`
  • WebGL2 instancing: `gl.drawArraysInstanced()`
  • WebGL1 extension: `ANGLE_instanced_arrays`
