### Description

// TODO: we need to talk about compute passes. The usage of the compute pipeline presented here is obsolete.

The new WebGPU compute API allows offloading computations from the CPU to the GPU using WebGPU compute shaders
through the new `vtkWebGPUComputePipeline` and `vtkWebGPUComputePass` classes. These objects are used
to set the inputs and outputs of the compute shader, execute it and get the results back to the CPU.

### Compute pipeline - Usage outside a rendering pipeline

The basic usage of a compute pipeline outside a rendering pipeline goes as follows:

- Create a `vtkWebGPUComputePipeline`
- Obtain a compute pass from this compute pipeline
- Set its shader source code
- Set its shader entry point
- Create the `vtkWebGPUComputeBuffer`s that contain the data manipulated by the compute pass
- Add the buffers to the compute pass
- Set the number of workgroups
- Dispatch the compute shader
- `ReadBufferFromGPU()` to make results from the GPU available to the CPU
- `Update()` the pipeline so that the compute pass is executed

Let's now review these steps in more detail:

All the major setup steps are done on the `vtkWebGPUComputePass` object. The `vtkWebGPUComputePass`
is the core of the compute shader. `vtkWebGPUComputePipeline` only exists to allow multiple passes to execute one after
the other and to share resources.

A `vtkWebGPUComputePass` object can be obtained by calling `vtkWebGPUComputePipeline::CreateComputePass()`. After obtaining
a `vtkWebGPUComputePass` object, you must first indicate the source code of the compute shader
to use. This is done through the `SetShaderSource()` method. This method expects raw WGSL shader code. On top
of setting the shader source code, the entry point (function name) of the compute shader must also be given via
`SetShaderEntryPoint()`.
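
As an illustration, here is a minimal sketch of what the WGSL source and its entry point name might look like when stored on the C++ side. The shader body (doubling each input value), the buffer names `inputValues` / `outputValues` and the workgroup size of 32 are assumptions made for this example only; the entry point name matches the one used in the example further down.

```cpp
#include <string>

// Hypothetical WGSL source: doubles every element of an input array.
// The buffer names, the workgroup size and the shader body are illustrative only.
const std::string computeShaderSource = R"(
@group(0) @binding(0) var<storage, read> inputValues: array<f32>;
@group(0) @binding(1) var<storage, read_write> outputValues: array<f32>;

@compute @workgroup_size(32, 1, 1)
fn computeMainFunction(@builtin(global_invocation_id) id: vec3<u32>)
{
  // Invocations past the end of the data do nothing
  if (id.x >= arrayLength(&inputValues))
  {
    return;
  }

  outputValues[id.x] = inputValues[id.x] * 2.0;
}
)";

// Name of the WGSL function that would be given to SetShaderEntryPoint()
const std::string computeShaderEntryPoint = "computeMainFunction";
```

The entry point string has to match the name of the `@compute` function in the source exactly.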

The next step (although the order of these compute pass set-up operations does not matter) is to set the input
and output buffers that will be used by the compute shader. This is done through the `AddBuffer()` method.
This method expects a `vtkWebGPUComputeBuffer` object as input. A `vtkWebGPUComputeBuffer` object represents a set of parameters
that will be used to create the buffer on the GPU and upload data to it.

To be valid, all `vtkWebGPUComputeBuffer` objects need to be given a group (via `SetGroup()`), a binding (via `SetBinding()`)
and a mode (modes are described by `vtkWebGPUComputeBuffer::BufferMode`) using `SetMode()`.
Every group/binding combination must be unique as these two values define the "location" of your buffer in the shader (they
directly relate to the `@group` and `@binding` WGSL annotations). It follows that no buffer can have the same
group and binding (the same location) as another buffer (or texture view).

You also need to indicate what data to use for input buffers using `SetData()`.
`SetData()` accepts both `vtkDataArray` and `std::vector` data containers.
Note that the data isn't uploaded (copied to the GPU) until the buffer is actually added to the compute pass
by `vtkWebGPUComputePass::AddBuffer()` so the data needs to stay valid (i.e. not destroyed) on the CPU until
`AddBuffer()` is called.

For buffers used as outputs of the compute shader, you only need to specify their size using `SetByteSize()`.
For input buffers, their size is automatically determined when `SetData()` is called. `SetByteSize()` should not be called on input buffers.

Once a buffer is set up, it can be added to the compute pass using `AddBuffer()`. This uploads the data to the GPU, after which
the CPU-side data can safely be destroyed.

After the shader source, its entry point and the buffers have been set on the compute pass, one more step needs to be
taken care of before you can dispatch the compute shader: setting the number of workgroups using `SetWorkgroups(X, Y, Z)`. The number of
workgroups you need to set depends on the size of your data and the `@compute @workgroup_size(WX, WY, WZ)` annotation
in your shader. As a simple rule, the product `X * Y * Z * WX * WY * WZ` must be at least as big as the length of your
input data. You can have a look at the compute pass tests under `Rendering/WebGPU/Testing` for more details and implementation
examples.
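
For one-dimensional data, the rule above boils down to a rounded-up integer division. The helper below is a minimal sketch of that arithmetic; `WorkgroupCount` is a hypothetical name and is not part of the VTK API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: smallest number of workgroups along one axis such that
// workgroupCount * workgroupSize >= elementCount.
std::size_t WorkgroupCount(std::size_t elementCount, std::size_t workgroupSize)
{
  assert(workgroupSize > 0);

  // Integer division rounded up
  return (elementCount + workgroupSize - 1) / workgroupSize;
}
```

With `@workgroup_size(32, 1, 1)` and 129 input elements, this gives 5 workgroups along X, so the call would be `SetWorkgroups(5, 1, 1)`. Since 5 * 32 = 160 invocations are then launched for 129 elements, the shader should check `global_invocation_id` against the data length before accessing its buffers.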

With the number of workgroups set, you can then `Dispatch()` the compute pass. Note that `Dispatch()` does not actually
execute the compute shader. Calling `Dispatch()` only produces a "command" that is executed when the `Update()` method
of the `vtkWebGPUComputePipeline` this compute pass belongs to is called (see below).

After the compute pass dispatch command has been emitted, you probably want to read back the results of the compute shader.
Because the compute shader executes on the GPU, the results are still on the GPU and the CPU cannot read them directly.
The results first need to be copied back to the CPU using the `ReadBufferFromGPU()` method.
This method takes a buffer index as its first argument (returned by the `AddBuffer()` method). This is the index of the buffer that is
going to be copied to the CPU.
The second argument is a function that will be called after the buffer has been successfully sent from the GPU.
This is typically the function that will copy the data into a CPU-side buffer (`std::vector` or equivalent). That `std::vector`
(or equivalent) can be passed through the third argument of `ReadBufferFromGPU` which is a pointer to some data that you want
accessible from the callback (second argument). This third argument can also be a structure if you need more than one argument
to be available in the callback.
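
The sketch below illustrates the "structure as user data" pattern in isolation, without any VTK call. `ReadbackContext` and `OnBufferMapped` are hypothetical names; only the `(const void*, void*)` callback shape is taken from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bundle of everything the callback needs
struct ReadbackContext
{
  std::vector<float>* Output; // CPU-side destination of the results
  std::size_t ElementCount;   // Number of floats to copy from the mapped data
};

// Same shape as the callback described above: (mapped data, user data)
void OnBufferMapped(const void* mappedData, void* userdata)
{
  ReadbackContext* context = static_cast<ReadbackContext*>(userdata);
  assert(context != nullptr && context->Output != nullptr);

  // Typed view of the mapped GPU data
  const float* mappedFloats = static_cast<const float*>(mappedData);
  context->Output->assign(mappedFloats, mappedFloats + context->ElementCount);
}
```

A pointer to a `ReadbackContext` instance would then be passed as the third argument of `ReadBufferFromGPU()`. That instance, and the vector it points to, must stay alive until the callback has run.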

The very last step is simple: you need to call the `Update()` method of the `vtkWebGPUComputePipeline` to actually execute the GPU commands
(queued by the `Dispatch()` and `ReadBufferFromGPU()` calls for example).

Here's an example of what this can look like in practice:

```c++
  std::vector<float> inputValues;
  // ...
  // Fill the input vector with data
  // ...

  // Creating the input buffer to the compute shader
  vtkNew<vtkWebGPUComputeBuffer> inputBuffer;
  inputBuffer->SetGroup(0);
  inputBuffer->SetBinding(0);
  inputBuffer->SetMode(vtkWebGPUComputeBuffer::BufferMode::INPUT_COMPUTE_STORAGE);
  inputBuffer->SetData(inputValues);

  // Creating the output buffer of the compute shader
  vtkNew<vtkWebGPUComputeBuffer> outputBuffer;
  outputBuffer->SetGroup(0);
  outputBuffer->SetBinding(1);
  outputBuffer->SetMode(vtkWebGPUComputeBuffer::BufferMode::OUTPUT_COMPUTE_STORAGE);
  outputBuffer->SetByteSize(inputValues.size() * sizeof(OutputDataType));

  // Creating the compute pipeline
  vtkNew<vtkWebGPUComputePipeline> computePipeline;
  vtkSmartPointer<vtkWebGPUComputePass> computePass = computePipeline->CreateComputePass();

  computePass->SetShaderSource(computeShaderSource);
  computePass->SetShaderEntryPoint("computeMainFunction");
  computePass->AddBuffer(inputBuffer);
  // Getting the index of the output buffer for later mapping with ReadBufferFromGPU()
  int outputBufferIndex = computePass->AddBuffer(outputBuffer);

  computePass->SetWorkgroups(workgroupsX, workgroupsY, workgroupsZ);

  // We've set up everything, ready to dispatch
  computePass->Dispatch();

  // One output element per input element, matching the byte size given to the output buffer
  std::vector<OutputDataType> outputData(inputValues.size());
  auto onBufferMapped = [](const void* mappedData, void* userdata)
  {
    // userdata here is the third argument of ReadBufferFromGPU() '&outputData'
    std::vector<OutputDataType>* outputDataVector = reinterpret_cast<std::vector<OutputDataType>*>(userdata);
    std::size_t elementCount = outputDataVector->size();

    // Typed view of the mapped GPU data
    const OutputDataType* mappedValues = static_cast<const OutputDataType*>(mappedData);
    for (std::size_t i = 0; i < elementCount; i++)
    {
      (*outputDataVector)[i] = mappedValues[i];
    }
  };

  // Mapping the buffer on the CPU to get the results from the GPU
  computePass->ReadBufferFromGPU(outputBufferIndex, onBufferMapped, &outputData);
  // Update() to actually execute WebGPU commands. Without this, the compute shader won't execute and
  // the data that we try to map here may not be available yet
  computePipeline->Update();

  // ... Do something with the output data here
```

Note also that `Update()` does not need to be called after every call to the compute pass. Calling it once
at the end is valid.

The `Update()` method executes commands in the order they were emitted by calls to the compute pass methods.

This means that:

```c++
computePass->ReadBufferFromGPU(outputBufferIndex, /* callback parameters */);
computePass->Dispatch();
computePipeline->Update();
```

is not going to produce the expected results as the buffer would be mapped (and read by the CPU)
before the compute shader has executed.
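
To make the ordering issue concrete, here is a small stand-alone mock of the "record now, execute on `Update()`" behaviour. It does not use VTK at all: `MockCommandQueue` and `ReadbackResult` are hypothetical names, and the "GPU" is just an `int`.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a pipeline that records commands and runs them in Update()
class MockCommandQueue
{
public:
  void Record(std::function<void()> command) { this->Commands.push_back(std::move(command)); }

  // Executes the recorded commands in the order they were recorded
  void Update()
  {
    for (const auto& command : this->Commands)
    {
      command();
    }
    this->Commands.clear();
  }

private:
  std::vector<std::function<void()>> Commands;
};

// Returns the value that the CPU reads back for a given recording order
int ReadbackResult(bool dispatchBeforeRead)
{
  int gpuValue = 0;  // Pretend GPU buffer, not yet written by the "shader"
  int cpuValue = -1; // CPU-side copy filled by the "readback"

  MockCommandQueue queue;
  const auto dispatch = [&gpuValue]() { gpuValue = 42; };
  const auto readBuffer = [&gpuValue, &cpuValue]() { cpuValue = gpuValue; };

  if (dispatchBeforeRead)
  {
    queue.Record(dispatch);
    queue.Record(readBuffer);
  }
  else
  {
    queue.Record(readBuffer);
    queue.Record(dispatch);
  }

  queue.Update();
  return cpuValue;
}
```

In this mock, recording the dispatch first lets the readback see the value written by the "shader", whereas recording the readback first copies the initial content of the buffer.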

### Uniforms

Uniforms are simply treated as `vtkWebGPUComputeBuffer` with the `vtkWebGPUComputeBuffer::BufferMode::UNIFORM_BUFFER` mode.

Example:

```c++
const char* shaderSource = R"(
@group(0) @binding(0) var<storage, read> inputBuffer: array<i32, 128>;
@group(0) @binding(1) var<uniform> myUniform: f32;

@compute @workgroup_size(32, 1, 1)
fn computeFunction(@builtin(global_invocation_id) id: vec3<u32>)
{
  // ...
  // Do something
  // ...
})";

int main()
{
  // Input data vector
  std::vector<int> inputVector1Values(128);
  // ...
  // Fill the data
  // ...

  // Creating the input buffer to the compute shader
  vtkNew<vtkWebGPUComputeBuffer> inputBuffer;
  inputBuffer->SetGroup(0);
  inputBuffer->SetBinding(0);
  inputBuffer->SetMode(vtkWebGPUComputeBuffer::BufferMode::READ_ONLY_COMPUTE_STORAGE);
  inputBuffer->SetData(inputVector1Values);
  inputBuffer->SetLabel("First input buffer");

  // Creating a buffer for the additional uniform
  float myUniform = 2.5f;
  std::vector<float> uniformData = { myUniform };
  vtkNew<vtkWebGPUComputeBuffer> uniformBuffer;
  uniformBuffer->SetGroup(0);
  uniformBuffer->SetBinding(1);
  uniformBuffer->SetMode(vtkWebGPUComputeBuffer::BufferMode::UNIFORM_BUFFER);
  uniformBuffer->SetData(uniformData);
  uniformBuffer->SetLabel("Uniform buffer");
}
```

Because uniforms are buffers, you could also have 'myUniform' be an array in the shader and upload
more than one float (or other types) when calling `SetData()`.

### Textures

Compute passes can also manipulate `vtkWebGPUComputeTexture`s. They follow the same principles as the
`vtkWebGPUComputeBuffer` presented earlier, although with a few differences.

It starts the same way as with a `vtkWebGPUComputeBuffer`, that is by creating a `vtkWebGPUComputeTexture`:
```c++
vtkNew<vtkWebGPUComputeTexture> myTexture;
```

The texture then needs to be configured:

```c++
  myTexture->SetLabel("My texture");
  myTexture->SetMode(vtkWebGPUComputeTexture::TextureMode::WRITE_ONLY_STORAGE);
  myTexture->SetFormat(vtkWebGPUComputeTexture::TextureFormat::RGBA);
  myTexture->SetDimension(vtkWebGPUComputeTexture::TextureDimension::DIMENSION_2D);
  myTexture->SetSampleType(vtkWebGPUComputeTexture::TextureSampleType::FLOAT);
  myTexture->SetSize(TEXTURE_WIDTH, TEXTURE_HEIGHT);
```

You may have noticed that there is no group / binding configuration here. This is explained later.

The mode of the texture indicates how the texture is going to be used in the shader (whether it is write
only, read only or read and write). The 'storage' suffix indicates that a sampler cannot be used with
the texture. It is indeed impossible to sample a write-only texture since GPU samplers are not made for
writing to textures. You will probably not have to worry too much about that storage suffix anyway since
the compute API essentially exposes only one type of READ_ONLY, WRITE_ONLY and READ_WRITE.

The format indicates how pixels are encoded in the texture. RGBA indicates 8-bit red, green, blue and alpha channels
for each pixel.

The dimension is either 1D, 2D or 3D.

The sample type is the format of the value you will get after sampling the texture in the shader.

The size of the texture is its extents in the X, Y and Z direction. Just to be clear, if your texture is 2D,
the Z size is unused. Same for the Y size if your texture is a 1D texture.

Once the texture is configured, you can add it to a compute pass:

```c++
  int myTextureIndex = myComputePass->AddTexture(myTexture);
```

This is where the major difference with compute buffers starts. Where you would be done with the configuration
of a compute buffer, you will now have to create a view of your texture.
Texture views give access to the memory block that is your texture. You cannot just access textures directly as
with buffers. You need to go through a view.
Because views are the only way to access textures, your compute shader will also have to go through a view to
access the texture's data. This means that the texture view will be bound to the shader, not the texture
itself. This is why the group / binding configuration is done on the texture view and not the texture.

A `vtkWebGPUComputeTextureView` of a texture can be obtained by calling `GetTextureView()` and passing the
index of the texture you want a view of as a parameter.

```c++
  vtkSmartPointer<vtkWebGPUComputeTextureView> textureView = myComputePass->GetTextureView(myTextureIndex);
```

It can then be configured before being added to a compute pass:

```c++
  textureView->SetGroup(0);
  textureView->SetBinding(0);
  textureView->SetAspect(vtkWebGPUComputeTextureView::ASPECT_ALL);
  textureView->SetMode(vtkWebGPUComputeTextureView::TextureViewMode::WRITE_ONLY_STORAGE);

  myComputePass->AddTextureView(textureView);
```

// TODO explain that `SetMode()` on a texture view is optional if a `vtkWebGPUComputeBinding` is configured


The group and binding define a location in the shader for the texture view. This behaves exactly the same
as with buffers.

The aspect is the part of the texture that will be sampled in the shader. This is only really relevant
for textures that contain multiple aspects such as Depth + Stencil textures. For such textures, you may only
be interested in the depth or the stencil part of the texture in your shader but not both. You would then choose
either `vtkWebGPUComputeTextureView::ASPECT_DEPTH` or `vtkWebGPUComputeTextureView::ASPECT_STENCIL`.
For regular color textures, the aspect must be `vtkWebGPUComputeTextureView::ASPECT_ALL`. This is the default
value of a `vtkWebGPUComputeTextureView`, so you don't need to call `SetAspect()` only to pass
`vtkWebGPUComputeTextureView::ASPECT_ALL` as the parameter.

The mode of the texture view determines whether the texture data will only be read from or only be written to. Read + write
texture views are not supported by the compute API for portability reasons. As an example, if you wanted to be able
to read from and write to a texture from a single compute shader, you would need to set the mode of the texture to
`vtkWebGPUComputeTexture::TextureMode::READ_WRITE_STORAGE` and then create two texture views, each on a unique
group / binding combination:

- One view with the `vtkWebGPUComputeTextureView::TextureViewMode::READ_ONLY` mode
- A second view with the `vtkWebGPUComputeTextureView::TextureViewMode::WRITE_ONLY_STORAGE` mode

A texture can be read from the GPU by the CPU following the exact same principle as with buffers. One very important
detail is going to be the new `bytesPerRow` parameter of the callback.

```c++
  // Output buffer for the result data
  std::vector<unsigned char> outputPixels(TEXTURE_HEIGHT * TEXTURE_WIDTH * myTexture->GetBytesPerPixel());

  struct CallbackData
  {
    std::vector<unsigned char>* outputPixels;
    int textureWidth, textureHeight;
  };

  // Note the mandatory bytesPerRow parameter here
  auto onTextureMapped = [](const void* mappedData, int bytesPerRow, void* userdata)
  {
    CallbackData* data = reinterpret_cast<CallbackData*>(userdata);
    std::vector<unsigned char>* outputPixels = data->outputPixels;
    const unsigned char* mappedDataChar = reinterpret_cast<const unsigned char*>(mappedData);

    for (int y = 0; y < data->textureHeight; y++)
    {
      for (int x = 0; x < data->textureWidth; x++)
      {
        int outputPixelsIndex = x + y * data->textureWidth;
        // Dividing by 4 here because we want to multiply Y by the 'width' which is in number of
        // pixels RGBA, not bytes
        int mappedIndex = x + y * (bytesPerRow / 4);

        // Copying the RGBA channels of each pixel
        (*outputPixels)[outputPixelsIndex * 4 + 0] = mappedDataChar[mappedIndex * 4 + 0];
        (*outputPixels)[outputPixelsIndex * 4 + 1] = mappedDataChar[mappedIndex * 4 + 1];
        (*outputPixels)[outputPixelsIndex * 4 + 2] = mappedDataChar[mappedIndex * 4 + 2];
        (*outputPixels)[outputPixelsIndex * 4 + 3] = mappedDataChar[mappedIndex * 4 + 3];
      }
    }
  };

  CallbackData callbackData;
  callbackData.outputPixels = &outputPixels;
  callbackData.textureWidth = TEXTURE_WIDTH;
  callbackData.textureHeight = TEXTURE_HEIGHT;

  // Mapping the texture on the CPU to get the results from the GPU
  myComputePass->ReadTextureFromGPU(myTextureIndex, 0, onTextureMapped, &callbackData);
```

This should all be familiar except for the `bytesPerRow` parameter. This parameter represents the number of bytes
between the start of each row of the mapped texture in the mapped data. This parameter is necessary because one
restriction of WebGPU is that the size of each row of a mapped texture must be a multiple of 256 bytes.

For example:
  - You created an RGBA8 texture of 70x80 pixels.
  - RGBA8 uses 4 bytes per pixel.
  - Your texture is 70 pixels wide. This corresponds to 4 * 70 = 280 bytes per row for your texture.
  - 280 bytes per row is not a multiple of 256 bytes so the WebGPU restriction is not satisfied.
  - Thus, when the texture is mapped so that its data can be read from the CPU, the compute API
    pads each row of the texture to reach a multiple of 256 bytes.
  - The next multiple of 256 after 280 is 512, so each row of the mapped data ends with 232 bytes of irrelevant padding.
  - In this case, the `bytesPerRow` parameter will be equal to 512 when the callback is called.
  - To ensure correct results, you need to skip these extra 232 bytes when copying from the mapped data
    to your custom buffer (`outputPixels` in the example). This is what the `mappedIndex` variable does in the above
    piece of code. It multiplies `y` by `bytesPerRow / 4`, the padded row width in pixels (and not by `data->textureWidth`),
    to make sure that we skip the padding.
    // TODO the rest
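
The padding arithmetic from the example above can be sketched with two small helpers. Both `PaddedBytesPerRow` and `StripRowPadding` are hypothetical names that are not part of the VTK API; the 256-byte alignment is the WebGPU restriction described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: rounds the tightly packed row size up to the next multiple of 256 bytes
std::size_t PaddedBytesPerRow(std::size_t width, std::size_t bytesPerPixel)
{
  const std::size_t alignment = 256;
  const std::size_t tightBytesPerRow = width * bytesPerPixel;

  return (tightBytesPerRow + alignment - 1) / alignment * alignment;
}

// Hypothetical helper: copies the rows of padded mapped data into a tightly packed vector
std::vector<unsigned char> StripRowPadding(const unsigned char* mappedData, std::size_t width,
  std::size_t height, std::size_t bytesPerPixel, std::size_t bytesPerRow)
{
  const std::size_t tightBytesPerRow = width * bytesPerPixel;
  assert(bytesPerRow >= tightBytesPerRow);

  std::vector<unsigned char> pixels(tightBytesPerRow * height);
  for (std::size_t y = 0; y < height; y++)
  {
    // Rows start every 'bytesPerRow' bytes in the mapped data but only the
    // first 'tightBytesPerRow' bytes of each row are actual pixel data
    const unsigned char* rowStart = mappedData + y * bytesPerRow;
    std::copy(rowStart, rowStart + tightBytesPerRow, pixels.begin() + y * tightBytesPerRow);
  }

  return pixels;
}
```

For the 70 pixels wide RGBA8 texture of the example, `PaddedBytesPerRow(70, 4)` gives 512. Copying whole rows like this is an alternative to the per-pixel loop shown earlier and works for any number of bytes per pixel.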

// TODO explain vtkWebGPUComputeBindings

### Compute pipeline - Integration in an existing rendering pipeline

A compute pass can also be used to access and modify data buffers used for rendering
(point/cell data attributes: colors, positions, normals, ...).

The following documentation only discusses point data attributes but everything applies to cell data attributes
as well (using `vtkWebGPUPolyDataMapper::AcquireCellAttributeComputeRenderBuffer()`).

The overall usage is the same as when using the pipeline outside of the rendering pipeline. The main difference
is that you do not create the `vtkWebGPUComputeBuffer` yourself (since you're using the existing buffers from the
render pipeline). You can retrieve existing buffers by calling `vtkWebGPUPolyDataMapper::AcquirePointAttributeComputeRenderBuffer()`
(or `AcquireCellAttributeComputeRenderBuffer()`):

```c++
int bufferGroup = 0, bufferBinding = 0;
int uniformsGroup = 0, uniformsBinding = 1;

vtkSmartPointer<vtkWebGPUComputeRenderBuffer> pointColorsRenderBuffer =
  webGPUMapper->AcquirePointAttributeComputeRenderBuffer(vtkWebGPUPolyDataMapper::PointDataAttributes::COLORS,
  bufferGroup, bufferBinding, uniformsGroup, uniformsBinding);
```

Sidenote: if you need to access specific textures used by the rendering pipeline (such as the depth buffer of
the current `RenderWindow` for example), you can do so by acquiring `vtkWebGPUComputeRenderTexture`s that
follow the same principles as `vtkWebGPUComputeRenderBuffer`s. Have a look at the various
`AcquireXXXXRenderTexture` functions from `vtkWebGPURenderWindow`.

All the point attribute data (positions, colors, normals, tangents, UVs, ...) are actually contained within
the single returned buffer. The first parameter of `vtkWebGPUPolyDataMapper::AcquirePointAttributeComputeRenderBuffer()` lets
you specify which attribute of the whole buffer you're interested in.

The next 2 parameters indicate where the buffer is going to be bound in your compute shader, same as when creating a buffer
yourself.

Because the buffer you get from `vtkWebGPUPolyDataMapper::AcquirePointAttributeComputeRenderBuffer()` contains all the point data of
the mapper you called the function on (not only the attribute you requested), you are going to need a way to identify where
the part of the buffer that you requested (positions, colors, ... depending on the first parameter of the function) begins and ends.

This is made available through the last 2 parameters of the function. The compute pipeline will automatically bind a
uniform buffer of two u32 values to your shader. These u32 values are, in order:
- The buffer offset, corresponding to the location where the attribute you requested starts in the whole buffer, expressed in **number of float elements**.
- The buffer length, in terms of **number of attributes**.

For example, if you requested colors and the buffer length uniform is 2, then you have 8 floats of relevant data (a color value corresponding to 4 floats)
beginning at `buffer[bufferOffset]` where buffer is a WGSL `array<f32>` and `bufferOffset` is the first u32 uniform.
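
The indexing arithmetic that your shader has to perform with these two uniforms can be sketched on the CPU as follows. `ReadAttribute` is a hypothetical helper written for this illustration only, and `componentsPerAttribute` (4 for colors, as stated above) is a value you have to know for the attribute you requested.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU-side illustration: returns the components of the attribute number
// 'attributeIndex', given the offset (in floats) and length (in attributes) uniforms
std::vector<float> ReadAttribute(const std::vector<float>& wholeBuffer, std::size_t bufferOffset,
  std::size_t bufferLength, std::size_t componentsPerAttribute, std::size_t attributeIndex)
{
  assert(attributeIndex < bufferLength);

  // First float of the requested attribute in the whole buffer
  const std::size_t first = bufferOffset + attributeIndex * componentsPerAttribute;
  assert(first + componentsPerAttribute <= wholeBuffer.size());

  const auto begin = wholeBuffer.begin() + static_cast<std::ptrdiff_t>(first);
  return std::vector<float>(begin, begin + static_cast<std::ptrdiff_t>(componentsPerAttribute));
}
```

In WGSL, the same computation would read `buffer[bufferOffset + index * 4u + component]` for colors, with `index` compared against the buffer length uniform before the access.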

These two uniform values are bound in your shader as a uniform buffer at the group and binding given as the fourth and fifth parameters.

The returned render buffer can then be added to the compute pass by calling `AddRenderBuffer()` (not `AddBuffer()`,
since we're manipulating a buffer used by the VTK rendering pipeline here):

```c++
myComputePass->AddRenderBuffer(pointColorsRenderBuffer);
```

The compute pipeline then needs to be added to a `vtkWebGPURenderer`:

```c++
vtkWebGPURenderer* webgpuRenderer = vtkWebGPURenderer::SafeDownCast(renWin->GetRenderers()->GetFirstRenderer());
webgpuRenderer->AddComputePipeline(myComputePipeline);
```

And that's it. When your pipeline has been added to a `vtkWebGPURenderer`, you do not need to call the
`Update()` method: this is done automatically by the rendering pipeline on each frame.
