WebGPU represents a major advance in web graphics technology: it allows web pages to use the device’s GPU for rendering and general-purpose computation. It is a practical upgrade that builds on the foundation laid by WebGL and significantly improves web graphics performance.
WebGPU was first introduced in Google Chrome in April 2023, and support is gradually arriving in other browsers such as Safari and Firefox. It’s still in development, but the potential is clear.
WebGPU allows developers to create stunning 3D graphics on an HTML canvas and to run GPU computations efficiently. It comes with its own shading language, WGSL, designed to make development more straightforward.
This tutorial jumps straight into a very specific WebGPU technique: using compute shaders for image effects. If you want to build a solid understanding of WebGPU first, we highly recommend reading the introductory tutorials My First WebGPU App and WebGPU Basics before continuing with this tutorial.
If you want to learn more about reaction-diffusion algorithms, check out these resources: Karl Sims’ Reaction-Diffusion Tutorial and The Coding Train’s Reaction-Diffusion Algorithm in p5.js.
For now, the demo only runs in Chrome, so here’s a short video of what the demo looks like.
Browser support:
- Chromium: supported from version 113 onwards
- Firefox: not supported
- Internet Explorer: not supported
- Safari: not supported
- Opera: not supported
Overview
In this tutorial, we will explore an important aspect of WebGPU: leveraging compute shaders for image effects. Coming from a WebGL background, I found it quite difficult to understand how to efficiently use compute shaders for image effects that involve convolution with filter kernels (such as a Gaussian blur). Therefore, this tutorial focuses on one way to use compute shaders for such purposes. The method I present is based on the image blur sample from the excellent WebGPU samples website.
Program structure
This tutorial covers only a few of the more interesting parts of the demo application; for everything else, we hope the inline comments help you find your way through the source code.
The main components are two WebGPU pipelines:
- A compute pipeline that performs multiple iterations of the reaction-diffusion algorithm (js/rd-compute.js and js/shader/rd-compute-shader.js).
- A render pipeline that takes the result of the compute pipeline and creates the final composition by rendering a fullscreen triangle (js/composite.js and js/shader/composite-shader.js).
WebGPU is a very chatty API, so to make it a little easier to use, we’re using the webgpu-utils library by Gregg Tavares. Additionally, I’ve included the float16 library by Kenta Moriuchi, which is used to create and update storage textures in the compute pipeline.
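The main convenience of webgpu-utils is that it can generate correctly padded typed-array views for uniform structs declared in WGSL. Here is a minimal, hypothetical sketch of that idea (it assumes the shader declares a uniform named params with a deltaTime field; the demo’s actual setup differs):

```js
import { makeShaderDataDefinitions, makeStructuredView } from 'webgpu-utils';

// parse the WGSL source and build a structured view for the "params" uniform
// (the uniform name and its fields are assumptions for this sketch)
const defs = makeShaderDataDefinitions(shaderCode);
const uniformValues = makeStructuredView(defs.uniforms.params);

const uniformBuffer = device.createBuffer({
  size: uniformValues.arrayBuffer.byteLength,
  usage: GPUBufferUsage.UNIFORM | GPUBufferUsage.COPY_DST,
});

// set values by name and upload the correctly laid-out buffer in one go
uniformValues.set({ deltaTime: 0.016 });
device.queue.writeBuffer(uniformBuffer, 0, uniformValues.arrayBuffer);
```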
Compute workflow
A common way to run reaction-diffusion simulations on the GPU is a technique called “texture ping-pong.” This involves creating two textures: one holds the current state of the simulation and is read from, while the other stores the result of the current iteration. After each iteration, the textures are swapped.
This method can also be implemented in WebGL using fragment shaders and framebuffers. In WebGPU, however, the same thing can be accomplished with compute shaders and storage textures. The advantage is that you can write directly to any pixel of the texture you want, and compute shaders can also provide performance benefits.
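As a rough sketch (with assumed names like bindGroupAtoB and iterationsPerFrame, not the demo’s exact code), the ping-pong can be expressed as two bind groups that wire the textures in opposite directions, alternating between them each iteration:

```js
// Minimal texture ping-pong sketch: two bind groups connect the two storage
// textures in opposite read/write directions, and we alternate between them.
const pass = commandEncoder.beginComputePass();
pass.setPipeline(computePipeline);
for (let i = 0; i < iterationsPerFrame; i++) {
  // even iterations read from texture A and write to B, odd iterations do the opposite
  pass.setBindGroup(0, i % 2 === 0 ? bindGroupAtoB : bindGroupBtoA);
  pass.dispatchWorkgroups(dispatchX, dispatchY, 1);
}
pass.end();
```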
Initialization
The first thing to do is initialize the pipeline with all required layout descriptors. Additionally, all buffers, textures, and binding groups must be set up. Using the webgpu-utils library will save you a lot of work here.
WebGPU does not allow you to change the size of a buffer or texture after it is created. Therefore, it is necessary to distinguish between buffers whose size does not change (e.g. uniforms) and buffers whose size changes in certain situations (e.g. textures when the canvas is resized). In the latter case, you need a way to recreate them if necessary and discard the old ones.
All textures used for the reaction-diffusion simulation are a fraction of the canvas size (for example, one quarter of it). Processing fewer pixels frees up computing resources for more iterations, so the simulation runs faster with relatively little visual loss.
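A minimal sketch of this idea (not the demo’s actual code, and assuming an rgba16float storage format, which would fit the float16 library mentioned above) creates the two simulation textures at a quarter of the canvas size and recreates them on resize:

```js
// Minimal sketch: create the two ping-pong storage textures at a quarter of
// the canvas size (format and scale factor are assumptions for this sketch).
function createSimulationTextures(device, canvas) {
  const width = Math.max(1, Math.floor(canvas.width / 4));
  const height = Math.max(1, Math.floor(canvas.height / 4));
  return [0, 1].map(() =>
    device.createTexture({
      size: [width, height],
      format: 'rgba16float',
      usage: GPUTextureUsage.STORAGE_BINDING |
             GPUTextureUsage.TEXTURE_BINDING |
             GPUTextureUsage.COPY_DST,
    })
  );
}

function onResize(device, canvas) {
  // textures cannot be resized, so destroy the old ones and create new ones;
  // any bind groups referencing them must be recreated as well
  simulationTextures.forEach((t) => t.destroy());
  simulationTextures = createSimulationTextures(device, canvas);
}
```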
In addition to the two textures used for ping-pong, the demo has a third texture, which I call the seed texture. It contains the image data of an HTML canvas onto which the clock text is drawn. The seed texture is used as a kind of influence map in the reaction-diffusion simulation to make the clock text visible. If the WebGPU canvas is resized, this texture and the corresponding HTML canvas must also be recreated/resized.
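One possible way to fill such a seed texture (a hypothetical sketch with assumed names like simWidth and seedTexture, not the demo’s actual code) is to draw the current time onto a 2D canvas and copy it into a GPU texture:

```js
// Hypothetical seed-texture sketch: draw the current time onto a 2D canvas
// and copy it into a GPU texture.
const seedCanvas = document.createElement('canvas');
seedCanvas.width = simWidth;
seedCanvas.height = simHeight;
const ctx = seedCanvas.getContext('2d');
ctx.fillStyle = '#fff';
ctx.font = `${Math.floor(simHeight * 0.3)}px sans-serif`;
ctx.textAlign = 'center';
ctx.textBaseline = 'middle';
ctx.fillText(new Date().toLocaleTimeString(), simWidth / 2, simHeight / 2);

// the destination texture must have COPY_DST and RENDER_ATTACHMENT usage
device.queue.copyExternalImageToTexture(
  { source: seedCanvas },
  { texture: seedTexture },
  [simWidth, simHeight]
);
```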
Run the simulation
Once all the necessary initialization is complete, you can focus on actually running the reaction-diffusion simulation using the compute shader. First, let’s review some general aspects of compute shaders.
A compute shader runs its threads in workgroups that execute in parallel. The number of threads per workgroup is defined by the compute shader’s workgroup size, and the number of workgroups executed is defined by the dispatch size (total number of threads = workgroup size * dispatch size).
These size values are specified in three dimensions. So a compute shader that processes 64 threads in parallel looks like this:
@compute @workgroup_size(8, 8, 1) fn compute()
Dispatching 256 workgroups of this shader creates 16,384 threads and requires a dispatch size of:
pass.dispatchWorkgroups(16, 16, 1);
A reaction-diffusion simulation must touch every pixel of the texture. One way to achieve this would be a workgroup size of 1 and a dispatch size equal to the total number of pixels (mimicking a fragment shader, in a way). However, this is not very performant: multiple threads within a single workgroup run faster than the same number of workgroups with only one thread each.
On the other hand, one might suggest using a workgroup size equal to the number of pixels and dispatching it only once (dispatch size = 1). However, this is not possible because the maximum workgroup size is limited. The general advice for WebGPU is to choose a workgroup size of 64 threads. This means dividing the pixels of the texture into blocks of 64 (the workgroup size) and dispatching enough workgroups to cover the entire texture. The division rarely works out exactly, but our shader can handle that.
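As a minimal sketch (assuming a texture of textureWidth × textureHeight pixels and one pixel per thread), the dispatch size can simply be rounded up:

```js
// Minimal sketch: cover the whole texture with 8x8 workgroups, one pixel per thread
const workgroupSize = [8, 8];
pass.dispatchWorkgroups(
  Math.ceil(textureWidth / workgroupSize[0]),
  Math.ceil(textureHeight / workgroupSize[1]),
  1
);
```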
Now you have a constant value for the workgroup size and can find the appropriate dispatch size to run the simulation. But there are other things you can optimize.
Pixels per thread
We introduce a tile size to allow each workgroup to cover a larger area (more pixels). The tile size defines how many pixels each individual thread processes. This requires nested for loops within the shader, so it makes sense to keep the tile size very small (e.g. 2×2).
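With a tile size of 2×2, each 8×8 workgroup covers 16×16 pixels, so fewer workgroups need to be dispatched (a rough sketch, ignoring the kernel border discussed further below):

```js
// Rough sketch: with a 2x2 tile size, each 8x8 workgroup covers 16x16 pixels
const pixelsPerWorkgroup = [8 * 2, 8 * 2];
pass.dispatchWorkgroups(
  Math.ceil(textureWidth / pixelsPerWorkgroup[0]),
  Math.ceil(textureHeight / pixelsPerWorkgroup[1]),
  1
);
```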
Pixel cache
A key step of the reaction-diffusion simulation is the convolution with a Laplacian kernel, a 3×3 matrix. So for each pixel we process, we need to read all 9 pixels covered by the kernel to perform the calculation. Because the kernels of neighboring pixels overlap, this results in many redundant texture reads.
Fortunately, compute shaders allow you to share memory between threads. So you can create what I call a pixel cache. The idea (from the image blur sample) is that each thread reads the pixels of its tile and writes them to the cache. Once all threads in the workgroup have cached their pixels (workgroup barriers ensure this), actual processing only needs to use the prefetched pixels from the cache. Therefore, no further texture reading is required. The structure of the compute function looks like this:
// the pixel cache shared across all threads of the workgroup
var<workgroup> cache: array<array<vec4f, 128>, 128>;

@compute @workgroup_size(8, 8, 1)
fn compute_main(/* ...builtin variables */) {

  // add the pixels of this thread's tile to the cache
  for (var c = 0u; c < 2; c++) {
    for (var r = 0u; r < 2; r++) {
      // ...calculate the pixel coords from the builtin variables

      // store the pixel value in the cache
      cache[y][x] = value;
    }
  }

  // don't continue until all threads have reached this point
  workgroupBarrier();

  // process every pixel of this thread's tile
  for (var c = 0u; c < 2; c++) {
    for (var r = 0u; r < 2; r++) {
      // ...perform the reaction-diffusion algorithm

      textureStore(/* ... */);
    }
  }
}
But there is one more tricky aspect to be aware of: the kernel convolution needs to read more pixels than the workgroup ultimately processes, because the kernel reaches beyond the edges of the tile area. One could simply enlarge the pixel cache accordingly, but the memory shared by the threads of a workgroup is limited to 16,384 bytes. Therefore, the number of pixels each workgroup actually processes has to be reduced by (kernelSize - 1) / 2 on each side, and the dispatch size adjusted accordingly. I hope the following diagram makes these steps clearer.
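Putting the pieces together, a hedged sketch of the dispatch-size calculation (assumed names; the demo’s actual code may differ) could look like this:

```js
// Hedged sketch: dispatch size that accounts for the tile size and the
// kernel border that each workgroup can only read but not write.
const workgroupSize = [8, 8];
const tileSize = [2, 2];
const kernelRadius = 1; // (kernelSize - 1) / 2 for a 3x3 Laplacian kernel

// pixels cached per workgroup in each dimension
const cacheDim = [
  workgroupSize[0] * tileSize[0],
  workgroupSize[1] * tileSize[1],
];
// pixels actually processed per workgroup (minus the border on each side)
const processedDim = [
  cacheDim[0] - 2 * kernelRadius,
  cacheDim[1] - 2 * kernelRadius,
];

pass.dispatchWorkgroups(
  Math.ceil(textureWidth / processedDim[0]),
  Math.ceil(textureHeight / processedDim[1]),
  1
);
```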
UV distortion
One disadvantage of compute shaders compared to a fragment shader solution is that you cannot use samplers on storage textures within a compute shader (you can only read pixels at integer coordinates). If you want to animate the simulation by moving through texture space (that is, distorting the UV coordinates by fractional amounts), you have to do the sampling yourself.
One way to deal with this is a manual bilinear sampling function. The sampling function used in the demo is based on the one shown here, with some adjustments made for use within a compute shader. It makes it possible to sample pixel values at fractional coordinates.
fn texture2D_bilinear(t: texture_2d<f32>, coord: vec2f, dims: vec2u) -> vec4f {
  let f: vec2f = fract(coord);
  let sample: vec2u = vec2u(coord + (0.5 - f));
  let tl: vec4f = textureLoad(t, clamp(sample, vec2u(1, 1), dims), 0);
  let tr: vec4f = textureLoad(t, clamp(sample + vec2u(1, 0), vec2u(1, 1), dims), 0);
  let bl: vec4f = textureLoad(t, clamp(sample + vec2u(0, 1), vec2u(1, 1), dims), 0);
  let br: vec4f = textureLoad(t, clamp(sample + vec2u(1, 1), vec2u(1, 1), dims), 0);
  let tA: vec4f = mix(tl, tr, f.x);
  let tB: vec4f = mix(bl, br, f.x);
  return mix(tA, tB, f.y);
}
The pulsating movement from the center of the simulation that can be seen in the demo was created in this way.
Parameter animation
One of the things I really like about reaction-diffusion is that you can get different patterns by changing just a few parameters. Animating these changes over time or in response to user interaction can create some really interesting effects. For example, in the demo, some parameters change depending on the distance from the center and the speed of the pointer.
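As an illustrative, hypothetical sketch (the names and values below are made up, not taken from the demo), such an animation could ease a simulation parameter towards a target derived from the pointer speed:

```js
// Hypothetical sketch: ease a reaction-diffusion parameter towards a target
// that is driven by the pointer speed (names and values are placeholders).
let pointerSpeed = 0; // updated elsewhere from pointermove events
let killRate = 0.06;  // example simulation parameter

function updateParams(deltaTime) {
  // faster pointer movement shifts the parameter towards a second preset
  const target = 0.06 + Math.min(pointerSpeed, 1) * 0.004;
  // exponential smoothing so the change animates over time
  killRate += (target - killRate) * (1 - Math.exp(-3 * deltaTime));
  // ...the value is then written into the uniform buffer used by the compute shader
}
```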
Composition rendering
Once the reaction-diffusion simulation is complete, all you need to do is draw the results on the screen. This is the job of the composition render pipeline.
I would like to give a quick overview of the steps involved in the demo application, although they largely depend on the style you want to achieve. The main adjustments made during the composition pass of the demo were:
- Bulge Distortion: Bulge distortion is applied to the UV coordinates (based on this ShaderToy code) before sampling the texture of the reaction-diffusion result. This adds a sense of depth to the scene.
- Color: A color palette (by Inigo Quilez) is applied.
- Emboss Filter: A simple emboss effect adds some volume to the “veins”.
- Fake Rainbow: This subtle effect is based on a different color palette, applied to the negative space of the embossed result. The fake rainbow colors make the scene look a little more vibrant.
- Vignette: A vignette overlay is used to darken the edges.
Conclusion
As far as performance is concerned, I created a very basic performance test comparing the fragment shader variant with the compute shader variant (including bilinear sampling). At least on my device, the compute variant is significantly faster. The performance test is located in a separate folder within the repository; you can switch between the fragment and compute variants in main.js, and the GPU time is measured with the timestamp-query API.
I’m still a beginner when it comes to WebGPU development, so if you spot anything in this tutorial that is incorrect or could be improved, I’d be happy to hear from you.
Unfortunately, I was not able to go into every detail and could only superficially explain the idea behind using compute shaders for reaction-diffusion simulations. However, I hope you enjoyed this tutorial and find it useful in your own projects. Thank you for reading!