Summary
Render particles using GPU instancing instead of individual `addQuad()` calls. Per-particle state (position, scale, rotation, alpha, tint, remaining life) is stored in a single GPU instance buffer and rendered in one draw call. Optionally, particle physics (gravity, fade, spin) can be computed in the vertex shader from spawn parameters, which also removes the per-particle CPU update each frame.
Current State
- `ParticleEmitter` creates individual `Particle` renderables
- Each particle is drawn via `addQuad()` in the quad batcher — 4 vertices pushed per particle per frame
- CPU handles all per-particle updates: position, velocity, gravity, rotation, alpha fade, scaling
- Frame rate starts to drop at around 500-1000 visible particles
- Each particle's `update()` and `draw()` are called individually
Proposed Architecture
Instance buffer
A single vertex buffer containing per-instance data for all active particles:
- `vec2 position` — world position
- `float rotation` — current angle
- `float scale` — current size
- `float alpha` — current opacity
- `uint32 tint` — color tint (packed ARGB)
- `float life` — remaining lifetime (0-1 normalized)
Updated once per frame via `bufferSubData()` — one upload for all particles.
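A minimal sketch of how the layout above could be packed on the CPU side, assuming the fields are tightly packed in the listed order as seven 4-byte values (28 bytes per instance). The names `createInstanceData`, `writeParticle`, and `uploadInstances` are hypothetical and not part of the existing engine; `uploadInstances` assumes a WebGL2 context.

```javascript
// Assumed layout (hypothetical): 4-byte fields in the order listed above.
// position.x, position.y, rotation, scale, alpha, tint (uint32), life
const FIELDS_PER_INSTANCE = 7;
const STRIDE_BYTES = FIELDS_PER_INSTANCE * 4; // 28 bytes

function createInstanceData(maxParticles) {
    const buffer = new ArrayBuffer(maxParticles * STRIDE_BYTES);
    return {
        buffer,
        f32: new Float32Array(buffer), // view used for the float fields
        u32: new Uint32Array(buffer),  // view used for the packed tint
    };
}

// Writes one particle's current state into slot `index`.
function writeParticle(data, index, p) {
    const o = index * FIELDS_PER_INSTANCE;
    data.f32[o + 0] = p.x;
    data.f32[o + 1] = p.y;
    data.f32[o + 2] = p.rotation;
    data.f32[o + 3] = p.scale;
    data.f32[o + 4] = p.alpha;
    data.u32[o + 5] = p.tint >>> 0; // packed ARGB, stored as raw bits
    data.f32[o + 6] = p.life;
}

// Uploads only the active range, once per frame (WebGL2 overload of bufferSubData).
function uploadInstances(gl, glBuffer, data, activeCount) {
    gl.bindBuffer(gl.ARRAY_BUFFER, glBuffer);
    gl.bufferSubData(gl.ARRAY_BUFFER, 0, data.f32, 0, activeCount * FIELDS_PER_INSTANCE);
}
```

Keeping both typed-array views over one `ArrayBuffer` lets the float fields and the packed tint share the same interleaved memory without per-frame allocation.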
Vertex shader
Computes the final quad corners for each instance:
- Applies position, rotation, scale from the instance buffer
- Optionally computes physics on GPU (velocity, gravity, fade) if spawn parameters are uploaded instead of per-frame state
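The per-instance transform described above could look roughly like the following GLSL ES 3.00 shader, held in a JavaScript string. This is a sketch only: the attribute names, the `uProjection` uniform, and the unit-quad corner attribute `aCorner` are assumptions, and tint and life are left out for brevity. The `quadCorner` function mirrors the same math in JavaScript so it can be checked without a GL context.

```javascript
// Hypothetical vertex shader; all attribute and uniform names are assumptions.
const PARTICLE_VERTEX_SHADER = `#version 300 es
in vec2 aCorner;     // per-vertex: unit quad corner in [-0.5, 0.5]
in vec2 aPosition;   // per-instance: world position
in float aRotation;  // per-instance: angle in radians
in float aScale;     // per-instance: size
in float aAlpha;     // per-instance: opacity
uniform mat4 uProjection;
out vec2 vUV;
out float vAlpha;

void main() {
    float c = cos(aRotation);
    float s = sin(aRotation);
    vec2 scaled = aCorner * aScale;
    vec2 rotated = vec2(scaled.x * c - scaled.y * s,
                        scaled.x * s + scaled.y * c);
    gl_Position = uProjection * vec4(aPosition + rotated, 0.0, 1.0);
    vUV = aCorner + 0.5;
    vAlpha = aAlpha;
}`;

// CPU mirror of the shader math: scale, then rotate, then translate one corner.
function quadCorner(corner, p) {
    const c = Math.cos(p.rotation);
    const s = Math.sin(p.rotation);
    const sx = corner[0] * p.scale;
    const sy = corner[1] * p.scale;
    return [p.x + sx * c - sy * s, p.y + sx * s + sy * c];
}
```

For example, `quadCorner([0.5, 0.5], { x: 10, y: 20, rotation: 0, scale: 2 })` yields the corner one unit right of and one unit below the particle centre in a y-down coordinate system.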
Single draw call
`gl.drawArraysInstanced(gl.TRIANGLE_STRIP, 0, 4, particleCount)` — one call renders all particles regardless of count.
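A sketch of the attribute setup that would precede that call on WebGL2, assuming the seven-field, 28-byte instance layout listed under "Instance buffer". The function name `drawParticles` and the attribute names in `locations` are hypothetical; the per-vertex unit-quad attribute (divisor 0) is assumed to be configured elsewhere, for example in a vertex array object.

```javascript
// `locations` maps hypothetical attribute names to values from gl.getAttribLocation().
function drawParticles(gl, instanceBuffer, locations, particleCount) {
    const STRIDE = 28; // bytes per instance (assumed layout)
    gl.bindBuffer(gl.ARRAY_BUFFER, instanceBuffer);

    const attribs = [
        // [location, component count, type, normalized, byte offset]
        [locations.aPosition, 2, gl.FLOAT, false, 0],
        [locations.aRotation, 1, gl.FLOAT, false, 8],
        [locations.aScale, 1, gl.FLOAT, false, 12],
        [locations.aAlpha, 1, gl.FLOAT, false, 16],
        // Packed ARGB read as 4 normalized bytes; on little-endian hosts the
        // bytes arrive as B, G, R, A, so the shader would need to swizzle.
        [locations.aTint, 4, gl.UNSIGNED_BYTE, true, 20],
        [locations.aLife, 1, gl.FLOAT, false, 24],
    ];

    for (const [loc, size, type, normalized, offset] of attribs) {
        if (loc < 0) continue; // attribute unused by the shader
        gl.enableVertexAttribArray(loc);
        gl.vertexAttribPointer(loc, size, type, normalized, STRIDE, offset);
        gl.vertexAttribDivisor(loc, 1); // advance once per instance, not per vertex
    }

    // 4 vertices per quad as a triangle strip, one instance per particle.
    gl.drawArraysInstanced(gl.TRIANGLE_STRIP, 0, 4, particleCount);
}
```

The divisor of 1 is what makes each attribute step once per particle rather than once per vertex, so all four corners of a quad see the same instance data.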
Integration
- New `GPUParticleEmitter` class (or opt-in mode on existing `ParticleEmitter`)
- Shares the same configuration API: `minLife`, `maxLife`, `speed`, `gravity`, `wind`, etc.
- Falls back to existing CPU particle system for Canvas renderer
- Requires WebGL2 for `drawArraysInstanced()`; on WebGL1, the `ANGLE_instanced_arrays` extension provides the equivalent `drawArraysInstancedANGLE()`
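The fallback chain above could be decided once at emitter creation. The following is a sketch under assumptions: the function name `selectParticleBackend` and its return strings are hypothetical, and a missing context stands in for the Canvas renderer.

```javascript
// Returns which particle path to use for a given rendering context.
function selectParticleBackend(gl) {
    if (!gl) {
        return "cpu"; // Canvas renderer: no WebGL context available
    }
    if (typeof gl.drawArraysInstanced === "function") {
        return "webgl2"; // native instancing
    }
    const ext = gl.getExtension("ANGLE_instanced_arrays");
    if (ext) {
        // Draw with ext.drawArraysInstancedANGLE() and
        // set divisors with ext.vertexAttribDivisorANGLE().
        return "webgl1-angle";
    }
    return "cpu"; // WebGL1 without the extension
}
```

Checking for the method rather than the context class keeps the function usable with wrapped or mocked contexts.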
Performance expectations
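- CPU: near-zero per-frame cost when physics runs in the vertex shader (no per-particle update loop, no per-particle vertex pushing); if per-frame state is uploaded instead, the update loop remains but vertex pushing is replaced by a single buffer upload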
- GPU: single draw call, single texture bind, hardware instancing handles the rest
- Capacity: 50,000-100,000+ particles vs current ~500-1000 cap
- Memory: one instance buffer at 28 bytes per particle for the fields listed above (seven 4-byte values), so 100K particles = ~2.8MB
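The "no per-particle update loop" case relies on each particle's state being a pure function of its spawn parameters and age. A sketch of that closed-form evaluation, written in JavaScript so it can be checked on the CPU; a vertex shader would perform the same arithmetic per vertex from a time uniform. The field names (`x0`, `vx`, `spin`, `spawnTime`, `lifetime`), the linear fade, and the omission of wind are assumptions, and the result may differ slightly from a per-frame stepped integration.

```javascript
// Evaluates one particle's state at time `now` from its spawn parameters only.
function particleStateAt(spawn, gravity, now) {
    const t = now - spawn.spawnTime;     // age
    const life = 1 - t / spawn.lifetime; // 1 at spawn, 0 at expiry
    return {
        x: spawn.x0 + spawn.vx * t,
        // constant acceleration: y0 + vy*t + g*t^2/2
        y: spawn.y0 + spawn.vy * t + 0.5 * gravity * t * t,
        rotation: spawn.rotation0 + spawn.spin * t,
        alpha: Math.max(0, life),        // linear fade-out (assumed)
        alive: t >= 0 && life > 0,
    };
}
```

With this formulation the instance buffer only needs rewriting when particles spawn, since nothing in it changes from frame to frame.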
API Sketch
// opt-in GPU mode on existing emitter
const emitter = new ParticleEmitter(x, y, {
    image: texture,
    totalParticles: 10000,
    gpu: true, // enable GPU instancing
    // same config as before...
});
// or a dedicated class
const gpuEmitter = new GPUParticleEmitter(x, y, {
    image: texture,
    totalParticles: 50000,
    speed: { min: 1, max: 5 },
    gravity: 0.5,
    wind: 0.1,
});
Challenges
- Particle spawn/despawn: need a ring buffer or free-list on the CPU side to manage active particles without reallocating
- Per-particle animation: frame-based sprite animation would need a frame index in the instance data + atlas UV lookup in the shader
- Sorting: transparent particles may need back-to-front sorting for correct blending — can be done on CPU before upload or approximated
- WebGL1 fallback: the `ANGLE_instanced_arrays` extension provides instancing on WebGL1 but is not universally available
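For the spawn/despawn challenge, one option besides a ring buffer or free list is a dense pool with swap-remove, which keeps live particles contiguous at the front of the array so the first `count` instances can be drawn directly. This is a sketch of that alternative, not the proposal's chosen design; the class and method names are hypothetical. Swap-remove reorders particles, so any back-to-front sort would have to run after removals.

```javascript
class DenseParticlePool {
    constructor(capacity, floatsPerParticle) {
        this.capacity = capacity;
        this.stride = floatsPerParticle;
        this.data = new Float32Array(capacity * floatsPerParticle); // allocated once
        this.count = 0; // live particles occupy slots [0, count)
    }

    // Copies `values` into the next free slot; returns its index, or -1 if full.
    spawn(values) {
        if (this.count === this.capacity) {
            return -1;
        }
        this.data.set(values, this.count * this.stride);
        return this.count++;
    }

    // Removes the particle at `index` by moving the last live particle into its slot.
    despawn(index) {
        if (index < 0 || index >= this.count) {
            return;
        }
        const last = --this.count;
        if (index !== last) {
            const src = last * this.stride;
            this.data.copyWithin(index * this.stride, src, src + this.stride);
        }
    }
}
```

Both operations are constant time and never reallocate, at the cost of unstable slot indices: a particle's index can change whenever another particle is removed.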
References
- `ParticleEmitter`: `src/particles/emitter.js`
- `Particle`: `src/particles/particle.js`
- `QuadBatcher`: `src/video/webgl/batchers/quad_batcher.js`
- WebGL2 instancing: `gl.drawArraysInstanced()`
- WebGL1 extension: `ANGLE_instanced_arrays`