Orillusion

shuangliu

The W3C documentation can be hard to reach from mainland China; Orillusion's official translation is an alternative:
https://www.orillusion.com/zh/wgsl.html#var-and-let

shuangliu

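offset specifies the byte offset of each attribute within the vertex buffer. In your other thread
https://forum.orillusion.com/topic/28/关于setvertexbuffer-插槽的理解
setVertexBuffer(index, vertexBuffer) is called several times with different index values. Each index selects one entry of the pipeline's vertex buffers array, and the attributes of that entry feed their own location slots in the vertex shader. That is one way to do it.

Besides using several indices, all vertex data can be submitted with a single setVertexBuffer call, which is more efficient. A geometry typically carries three kinds of data: position, normal or color, and uv. These can all go into one large buffer: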

const cubeVertexArray = new Float32Array([
  // float4 position, float4 color, float2 uv,
  1, -1, 1, 1,   1, 0, 1, 1,  1, 1,
  -1, -1, 1, 1,  0, 0, 1, 1,  0, 1,
  -1, -1, -1, 1, 0, 0, 0, 1,  0, 0,
  // ... (omitted for brevity)
  1, 1, -1, 1,   1, 1, 0, 1,  1, 0,
  1, -1, -1, 1,  1, 0, 0, 1,  1, 1,
  -1, 1, -1, 1,  0, 1, 0, 1,  0, 0,
]);

When the pipeline is created, offset then gives the byte position of each attribute inside the buffer, one per location slot:

const renderPipeline = this.device.createRenderPipeline({
    vertex: {
        module:  vertexShader,
        entryPoint: 'main',
        buffers: [{
            arrayStride: 10 * 4, // 10 float32 values (4 bytes each) per vertex
            attributes: [
                {
                    // position
                    shaderLocation: 0, // location slot index
                    offset: 0, // starts at byte 0
                    format: 'float32x4',  // position: 4 x float32
                },
                {
                    // color
                    shaderLocation: 1, // location slot index
                    offset: 4 * 4, // skips 4 floats = 16 bytes
                    format: 'float32x4', // color: 4 x float32
                },
                {
                    // uv
                    shaderLocation: 2, // location slot index
                    offset: 8 * 4, // skips 8 floats = 32 bytes
                    format: 'float32x2',  // uv: 2 x float32
                }
            ],
        }]
    },
    // ... (omitted)
})
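
The offsets above follow mechanically from the attribute formats, so they can also be computed. The following is a minimal sketch, not part of any WebGPU API: `FORMAT_BYTES` and `buildInterleavedLayout` are hypothetical names, and it assumes a tightly packed layout with shader locations numbered in order.

```javascript
// Byte sizes of the vertex formats used in this post (float32 = 4 bytes per component).
const FORMAT_BYTES = { float32x2: 8, float32x3: 12, float32x4: 16 };

// Hypothetical helper: each attribute starts where the previous one ends,
// and the stride is the total size of one vertex.
function buildInterleavedLayout(formats) {
  let offset = 0;
  const attributes = formats.map((format, shaderLocation) => {
    const attribute = { shaderLocation, offset, format };
    offset += FORMAT_BYTES[format];
    return attribute;
  });
  return { arrayStride: offset, attributes };
}

const layout = buildInterleavedLayout(['float32x4', 'float32x4', 'float32x2']);
console.log(layout.arrayStride); // 40
console.log(layout.attributes.map((a) => a.offset).join(', ')); // 0, 16, 32
```

The returned object has the same shape as one entry of `vertex.buffers`, so it could be passed there directly.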

The vertex buffer then only needs to be set once:

const verticesBuffer = device.createBuffer({
    size: cubeVertexArray.byteLength,
    usage: GPUBufferUsage.VERTEX,
    mappedAtCreation: true,
});
new Float32Array(verticesBuffer.getMappedRange()).set(cubeVertexArray);
verticesBuffer.unmap();

// ... (omitted)

passEncoder.setVertexBuffer(0, verticesBuffer);

In the shader, each location then receives the attribute read at its offset:

@vertex
fn main(@location(0) position : vec4<f32>,
        @location(1) color : vec4<f32>,
        @location(2) uv : vec2<f32>,
) -> VertexOutput {
        // ... (omitted)
}

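This approach is more efficient. A high-performance WebGPU program should keep the number of commands and CPU/GPU data transfers to a minimum, so:

Reduce the number of setVertexBuffer calls: submitting all related data at once is much faster than setting several indices.
On the data side, create one GPUBuffer rather than many small scattered ones. This reduces the number of CPU-to-GPU transfers and improves memory locality inside the GPU, which is also an important gain.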
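
The "one buffer instead of many" advice can be sketched on the CPU side as follows. `packIntoOneBuffer` is a hypothetical helper, not a WebGPU call; it assumes all inputs are Float32Arrays, so every recorded offset is a multiple of 4 bytes.

```javascript
// Hypothetical helper: pack several Float32Arrays into one contiguous array
// and record the byte offset at which each one starts.
function packIntoOneBuffer(arrays) {
  const total = arrays.reduce((sum, a) => sum + a.length, 0);
  const data = new Float32Array(total);
  const byteOffsets = [];
  let cursor = 0; // write position, counted in floats
  for (const a of arrays) {
    byteOffsets.push(cursor * Float32Array.BYTES_PER_ELEMENT);
    data.set(a, cursor);
    cursor += a.length;
  }
  return { data, byteOffsets };
}

const packed = packIntoOneBuffer([
  new Float32Array([0, 0, 0, 1, 1, 1]), // e.g. geometry A
  new Float32Array([2, 2, 2]),          // e.g. geometry B
]);
console.log(packed.byteOffsets.join(', ')); // 0, 24
```

In a browser, `packed.data` would be uploaded into a single GPUBuffer once, and each geometry could then be bound with the optional offset argument, e.g. `passEncoder.setVertexBuffer(0, buffer, packed.byteOffsets[1])`.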

shuangliu

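Usage flags mark whether a texture can be the source or the destination of a copy. COPY_DST means it can be a copy destination, for example when copyExternalImageToTexture copies an image into the texture. Likewise, COPY_SRC means it can be a copy source, i.e. it can be copied from. With copyTextureToTexture from t1 to t2, t1 needs COPY_SRC and t2 needs COPY_DST.
Textures can have several dimensions. Besides ordinary 2d textures, WebGPU also supports 1d, 3d, 2d-array, cube, and cube-array views, so depthOrArrayLayers gives either the depth or the number of array layers. For a cube it must be 6; for a 2d-array, set it to the number of images.
rgba and bgra differ only in the order of the channels, which corresponds to little-endian versus big-endian layout; there is no other difference. I am not sure that bgra8unorm is strictly required when sampleCount is 4, i.e. when MSAA is enabled.
If it is, the likely reason is that WebGPU's default MSAA currently needs hardware and driver support, so the preferred format has to be used.
On most devices the default format is currently bgra; at least that is the case on the Windows, Mac, and iOS devices I have at hand. So with the default MSAA enabled, bgra8unorm is required there.
As for why bgra is the default: that channel order matches the memory layout used by the CPU/GPU, so no extra conversion is needed on read and the underlying driver API can be used directly.
In practice, calling context.getPreferredFormat() to get the system default format is recommended (the current API exposes this as navigator.gpu.getPreferredCanvasFormat()). It avoids unnecessary problems and in theory performs better too.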
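
The copy rule and the cube layer count can be written down as a small check. This is a sketch only: `TextureUsage`, `canCopyTextureToTexture`, and `cubeDescriptor` are hypothetical names, the 512 pixel size is arbitrary, and the numeric flag values are a fallback for environments without the browser's `GPUTextureUsage` global.

```javascript
// Use the browser's GPUTextureUsage when present; otherwise fall back to the
// numeric flag values defined by the WebGPU spec so the sketch runs anywhere.
const TextureUsage = globalThis.GPUTextureUsage ?? {
  COPY_SRC: 0x01,
  COPY_DST: 0x02,
  TEXTURE_BINDING: 0x04,
  STORAGE_BINDING: 0x08,
  RENDER_ATTACHMENT: 0x10,
};

// t1 -> t2 is only allowed when t1 has COPY_SRC and t2 has COPY_DST.
function canCopyTextureToTexture(srcUsage, dstUsage) {
  return (srcUsage & TextureUsage.COPY_SRC) !== 0 &&
         (dstUsage & TextureUsage.COPY_DST) !== 0;
}

// Descriptor for a texture meant to be viewed as a cube: six layers, one per face.
const cubeDescriptor = {
  size: { width: 512, height: 512, depthOrArrayLayers: 6 },
  format: 'rgba8unorm',
  usage: TextureUsage.TEXTURE_BINDING | TextureUsage.COPY_DST,
};

console.log(canCopyTextureToTexture(TextureUsage.COPY_SRC, cubeDescriptor.usage)); // true
```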

shuangliu

A simple data-upload performance test comparing WebGL bufferSubData with WebGPU writeBuffer, simulating the submission of 1 million transform matrices to the GPU:

My platform:
CPU: Intel i7 8700k
Mem: 16 GB
GPU: Intel UHD 630
OS: macOS 12.01
Chrome: 100.0.4862.3
Firefox: 99.0a1

For > 500k matrices (1000k used as the heavy test):

| API | Chrome 100 | Firefox 99 |
| --- | --- | --- |
| WebGL | ~13 ms | 26-30 ms |
| writeBuffer | 50-70 ms | 26-30 ms |
| mapAsync/unmap | 40-50 ms | 70-100 ms |

writeBuffer on Chrome is clearly not ready for large datasets: in this test it is roughly 4-5x slower than bufferSubData, while mapAsync is slightly faster than writeBuffer.

But for < 150k matrices, writeBuffer is faster:

| API | Chrome 100 | Firefox 99 |
| --- | --- | --- |
| WebGL | 2-4 ms | 5-6 ms |
| writeBuffer | 1-2 ms | ~2 ms |
| mapAsync/unmap | 5-10 ms | 60-100 ms |

For small datasets, mapAsync is slower ...

In a real application, WebGPU can be up to 20x slower than WebGL in Chrome, because the device queue is blocked while writing multiple texture/matrix/index/vertex buffers ...

In our comparison test:
https://contrast.orillusion.com

Even though Orillusion gets a much better overall FPS than Three.js instanced drawing, Orillusion actually spends more than 10x longer than Three.js sending data to the GPU. With 300k boxes, Orillusion takes 50-70 ms in writeBuffer, whereas Three.js takes only 3-4 ms in bufferSubData.

This performance issue has been discussed for a long time, e.g. https://bugs.chromium.org/p/chromium/issues/detail?id=1266727&q=writeBuffer&can=2
But it has still not been addressed as of Chrome Canary v101.

For now, the best practice in Chrome is to try mapAsync for very large datasets, although it may not help much in a real application. writeBuffer is always a good choice for simple, relatively small data. Overall, though, neither is as fast as WebGL so far.

By the way, in our experience mapAsync may not be a good choice for heavy, interactive rendering applications.
The async callback/promise can be delayed by 4 ms to 20 ms by other work such as mouse, keyboard, ajax, or networking events, or by other mapAsync jobs.

For a compute job, that delay is acceptable. For real-time rendering, however, it can sometimes cause erratic FPS and unstable frame times that are hard to control from the JS side. mapAsync versus writeBuffer is a trade-off between relatively fast but unstable FPS and relatively slow but stable FPS.

Anyone is welcome to try the test and post their results.
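
For anyone who wants to reproduce the comparison, a timing harness along the following lines may help. It is a sketch under stated assumptions, not the code behind the numbers above: `makeMatrices` and `medianUploadTime` are hypothetical helpers, the upload used here is a plain CPU copy so that the snippet runs outside a browser, and it only measures how long the call blocks JavaScript, not when the GPU finishes.

```javascript
// Build `count` 4x4 identity matrices (16 floats each) as stand-in transform data.
function makeMatrices(count) {
  const data = new Float32Array(count * 16);
  for (let i = 0; i < count; i++) {
    const base = i * 16;
    data[base] = data[base + 5] = data[base + 10] = data[base + 15] = 1;
  }
  return data;
}

// Call `upload(data)` several times and return the median wall-clock time in ms.
function medianUploadTime(upload, data, runs = 5) {
  const times = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    upload(data);
    times.push(performance.now() - start);
  }
  times.sort((a, b) => a - b);
  return times[Math.floor(times.length / 2)];
}

// Stand-in upload: copy into a preallocated array. In a browser this callback
// would be e.g. (d) => device.queue.writeBuffer(gpuBuffer, 0, d)
// or (d) => gl.bufferSubData(gl.ARRAY_BUFFER, 0, d).
const count = 100000; // raise to 1000000 to match the heavy test
const staging = new Float32Array(count * 16);
const ms = medianUploadTime((d) => staging.set(d), makeMatrices(count));
console.log(`median upload time: ${ms.toFixed(2)} ms`);
```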

shuangliu

1. First, you do not need one pipeline per texture. Within the same pipeline you can use different bind groups, each with its own texture binding, e.g.

let sampler = device.createSampler({
    magFilter: 'linear',
    minFilter: 'linear',
});
let texture1 = device.createTexture({ ... });
let texture2 = device.createTexture({ ... });
const group1 = device.createBindGroup({
    layout: pipeline.getBindGroupLayout(0),
    entries: [
        {
            binding: 1,
            resource: sampler,
        },
        {
            binding: 2,
            resource: texture1.createView(),
        }
    ]
});
const group2 = device.createBindGroup({
    layout: pipeline.getBindGroupLayout(0),
    entries: [
        {
            binding: 1,
            resource: sampler,
        },
        {
            binding: 2,
            resource: texture2.createView(),
        }
    ]
});

Then switch between group1 and group2 with setBindGroup in the rendering loop.
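
The switching step could look like the sketch below. `drawWithGroups` is a hypothetical helper; it assumes `passEncoder` is a GPURenderPassEncoder that is already recording and that the vertex buffer has been set.

```javascript
// Draw the same geometry once per bind group while reusing a single pipeline.
function drawWithGroups(passEncoder, pipeline, groups, vertexCount) {
  passEncoder.setPipeline(pipeline); // set once, shared by every draw
  for (const group of groups) {
    passEncoder.setBindGroup(0, group); // index 0 matches getBindGroupLayout(0)
    passEncoder.draw(vertexCount);
  }
}

// Usage inside a render pass, e.g.:
// drawWithGroups(passEncoder, pipeline, [group1, group2], 36);
```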

2. If a set of textures share the same size and format, you can store multiple images in a single texture_2d_array. Even within one bind group, the shader can then switch textures simply by changing the array index.
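
Filling such an array texture could be sketched as follows. `uploadImagesAsLayers` is a hypothetical helper; it assumes `images` are ImageBitmaps of identical size and that rgba8unorm is an acceptable format for them.

```javascript
// Copy each image into its own layer of one array texture.
function uploadImagesAsLayers(device, images) {
  const { width, height } = images[0];
  const texture = device.createTexture({
    size: [width, height, images.length], // third value = number of array layers
    format: 'rgba8unorm',
    // copyExternalImageToTexture needs COPY_DST and RENDER_ATTACHMENT on the destination
    usage: GPUTextureUsage.TEXTURE_BINDING |
           GPUTextureUsage.COPY_DST |
           GPUTextureUsage.RENDER_ATTACHMENT,
  });
  images.forEach((image, layer) => {
    device.queue.copyExternalImageToTexture(
      { source: image },
      { texture, origin: [0, 0, layer] }, // z of the origin selects the layer
      [width, height]
    );
  });
  return texture;
}
```

When binding, the view would be created with `texture.createView({ dimension: '2d-array' })`, and the WGSL side would declare `texture_2d_array<f32>` and pass the layer index as the extra argument of `textureSample`.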

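For a dynamic video texture, importExternalTexture can import the video directly, so the texture follows the video as it plays. Note that an imported GPUExternalTexture expires, so it should be re-imported (and its bind group rebuilt) for each frame that is rendered.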

const video = document.createElement('video');
video.loop = true;
video.autoplay = true;
video.muted = true;
video.src = '...';
await video.play();

const group = device.createBindGroup({
    layout: pipeline.getBindGroupLayout(0),
    entries: [
        {
            binding: 1,
            resource: sampler,
        },
        {
            binding: 2,
            resource: device.importExternalTexture({
                source: video,
            })
        }
    ]
});
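
Because an imported external texture does not stay valid indefinitely, a common pattern is to rebuild the bind group every frame. The sketch below assumes the same binding numbers as the snippet above; `createVideoBindGroup` is a hypothetical helper name.

```javascript
// Build a fresh bind group around the video's current frame.
function createVideoBindGroup(device, pipeline, sampler, video) {
  return device.createBindGroup({
    layout: pipeline.getBindGroupLayout(0),
    entries: [
      { binding: 1, resource: sampler },
      { binding: 2, resource: device.importExternalTexture({ source: video }) },
    ],
  });
}

// Usage inside the frame callback, before encoding the render pass, e.g.:
// function frame() {
//   const group = createVideoBindGroup(device, pipeline, sampler, video);
//   /* encode the pass with setBindGroup(0, group), then submit */
//   requestAnimationFrame(frame);
// }
```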

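3. To update a static texture, copy into it according to the type of the source image/texture. WebGPU currently provides several copy commands. The one used most often is copyExternalImageToTexture, which uploads an external image to a GPU texture: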

let texture = device.createTexture({ ... });
const img1 = document.createElement('img');
img1.src = '....';
await img1.decode();
const image1 = await createImageBitmap(img1);
device.queue.copyExternalImageToTexture(
      { source: image1 },
      { texture: texture },
      [image1.width, image1.height]
);

// Update the texture with a new image
const img2 = document.createElement('img');
img2.src = '....';
await img2.decode();
const image2 = await createImageBitmap(img2);
device.queue.copyExternalImageToTexture(
      { source: image2 },
      { texture: texture },
      [image2.width, image2.height]
);

Alternatively, use a commandEncoder with copyTextureToTexture to copy between two GPU textures:

// e.g. create two textures: texture1 is displayed, tempTexture receives the external image; once the image has loaded, copy it across
let texture1 = device.createTexture({ ... });
...
let tempTexture = device.createTexture({ ... });
let newImage = await loadImage() // load the image asynchronously
device.queue.copyExternalImageToTexture(
    { source: newImage },
    { texture: tempTexture },
    [newImage.width, newImage.height]
);
const commandEncoder = device.createCommandEncoder();
commandEncoder.copyTextureToTexture(
    { texture: tempTexture },
    { texture: texture1 },
    [newImage.width, newImage.height] // copySize is a required third argument
);
device.queue.submit([commandEncoder.finish()]);

For the other texture update APIs, see https://www.orillusion.com/webgpu.html#image-copies

4. For samplers, WebGPU currently has no API to update a sampler in place. You have to create a new sampler and render with a new bind group, e.g.

let linearSampler = device.createSampler({
    magFilter: 'linear',
    minFilter: 'linear',
});
const group = device.createBindGroup({
    layout: pipeline.getBindGroupLayout(0),
    entries: [
        {
            binding: 1,
            resource: linearSampler,
        },
        {
            binding: 2,
            resource: texture.createView()
        }
    ]
});
let nearestSampler = device.createSampler({
    magFilter: 'nearest',
    minFilter: 'nearest',
});
const newGroup = device.createBindGroup({
    layout: pipeline.getBindGroupLayout(0),
    entries: [
        {
            binding: 1,
            resource: nearestSampler,
        },
        {
            binding: 2,
            resource: texture.createView()
        }
    ]
});

The other resources in the group do not need to be recreated and can be reused.

shuangliu

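Orillusion is not a strict ECS architecture. It plays down the notion of a system and mostly puts system logic into the components themselves.

To look up components, see the relevant object APIs. For example, getComponentsInChild finds all components of a given type under a node: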

// get all MeshRenderer components in obj
let meshList = obj.getComponentsInChild(MeshRenderer)

Similarly, you can traverse the scene to find all components of a given type:

// get all attack components in scene
let attackList = scene.getComponentsInChild(Attack)
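
Conceptually, a lookup like this is a recursive walk over the node tree. The sketch below illustrates the idea only and is not Orillusion's implementation: `SceneNode`, `collectComponents`, `Attack`, and `Health` are all hypothetical stand-ins.

```javascript
// Minimal stand-in for a scene node that owns components and child nodes.
class SceneNode {
  constructor(name) {
    this.name = name;
    this.components = [];
    this.children = [];
  }
}

// Depth-first search: collect every component that is an instance of ComponentClass.
function collectComponents(node, ComponentClass, out = []) {
  for (const component of node.components) {
    if (component instanceof ComponentClass) out.push(component);
  }
  for (const child of node.children) {
    collectComponents(child, ComponentClass, out);
  }
  return out;
}

class Attack {}
class Health {}

const scene = new SceneNode('scene');
const enemy = new SceneNode('enemy');
enemy.components.push(new Attack(), new Health());
scene.children.push(enemy);

console.log(collectComponents(scene, Attack).length); // 1
```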

shuangliu

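This has nothing to do with WGSL specifically. No shader language ships with an API like Math.random(); developers have to implement a basic pseudo-random algorithm themselves. There are many public algorithms and examples online, such as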

https://zhuanlan.zhihu.com/p/390862782
https://gamedev.stackexchange.com/questions/32681/random-number-hlsl

Basically, applying fract/sin/cos to position or uv is enough for a basic pseudo-random value.
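
To show the arithmetic, here is the widely circulated fract(sin(dot(...))) one-liner written in JavaScript rather than shader code. The constants are the ones commonly seen in online examples, not something specific to WGSL or Orillusion, and `fract` and `hash2` are hypothetical names. The result is deterministic and only looks random; it is not suitable where statistical quality matters.

```javascript
// Fractional part, mirroring the shader built-in of the same name.
const fract = (x) => x - Math.floor(x);

// Hash a 2D coordinate (for example a uv) to a pseudo-random value between 0 and 1.
function hash2(x, y) {
  const dot = x * 12.9898 + y * 78.233; // dot(vec2(x, y), vec2(12.9898, 78.233))
  return fract(Math.sin(dot) * 43758.5453);
}

console.log(hash2(0.25, 0.75), hash2(0.26, 0.75)); // nearby inputs, unrelated-looking outputs
```

The same expression ports directly to a shader function taking a `vec2<f32>`, since `fract`, `sin`, and `dot` are all shader built-ins.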

shuangliu

@wenhao0807 This is determined by how you arrange your own vertex data:

const cubeVertexArray = new Float32Array([
  // float4 position, float4 color, float2 uv,
  1, -1, 1, 1,   1, 0, 1, 1,  1, 1,
  -1, -1, 1, 1,  0, 0, 1, 1,  0, 1,
  -1, -1, -1, 1, 0, 0, 0, 1,  0, 0,
  // ... (omitted for brevity)
  1, 1, -1, 1,   1, 1, 0, 1,  1, 0,
  1, -1, -1, 1,  1, 0, 0, 1,  1, 1,
  -1, 1, -1, 1,  0, 1, 0, 1,  0, 0,
]);

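In this example each row has 10 numbers, and you decide how they are arranged: values 1-4 are the position, 5-8 are the color (or, more commonly, the normal), and 9-10 are the uv. The offset values in the pipeline follow from that.
For an ordinary object/mesh, the vertex buffer usually has 8 values per vertex: 3 for position, 3 for normal, 2 for uv: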

buffers: [
    {
        arrayStride: 8 * 4,
        attributes: [
            {
                // position
                shaderLocation: 0,
                offset: 0,
                format: 'float32x3',
            },
            {
                // normal
                shaderLocation: 1,
                offset: 3 * 4,
                format: 'float32x3',
            },
            {
                // uv
                shaderLocation: 2,
                offset: 6 * 4,
                format: 'float32x2',
            }
        ],
    }
]

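Other data such as transform, color, light, and material parameters are passed through uniform/storage buffers. You are free to arrange things differently, of course. For lines/points, for instance, normal and uv are not needed, so the vertex buffer would carry color instead.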
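
If the source data arrives as separate position, normal, and uv arrays, it can be interleaved into the 8-float layout on the CPU first. A minimal sketch, assuming all three inputs are Float32Arrays describing the same number of vertices; `interleaveVertices` is a hypothetical helper.

```javascript
// Interleave separate attribute arrays into [px, py, pz, nx, ny, nz, u, v] per vertex.
function interleaveVertices(positions, normals, uvs) {
  const vertexCount = positions.length / 3;
  const out = new Float32Array(vertexCount * 8);
  for (let i = 0; i < vertexCount; i++) {
    out.set(positions.subarray(i * 3, i * 3 + 3), i * 8);   // byte offset 0
    out.set(normals.subarray(i * 3, i * 3 + 3), i * 8 + 3); // byte offset 3 * 4
    out.set(uvs.subarray(i * 2, i * 2 + 2), i * 8 + 6);     // byte offset 6 * 4
  }
  return out;
}

const vertices = interleaveVertices(
  new Float32Array([0, 0, 0, 1, 0, 0]), // two positions
  new Float32Array([0, 0, 1, 0, 0, 1]), // two normals
  new Float32Array([0, 0, 1, 0])        // two uvs
);
console.log(vertices.length); // 16
```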

shuangliu

A bundle sample has been added:
https://orillusion.github.io/orillusion-webgpu-samples/#cubesRenderBundle

shuangliu

@jkwang007 The removeChild bugs are fixed in 0.6.2.

shuangliu

Global illumination will also be added in one of the next few releases.

shuangliu

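Large models generally cannot be loaded straight into memory or VRAM unless the machine has an exceptionally large GPU and RAM. Displaying them therefore usually requires backend model preprocessing combined with real-time network streaming.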

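The model is normally preprocessed into tiles with LODs first. 3D Tiles, a model standard common in GIS applications, is one example: it splits the whole model into spatial tiles with LOD levels, and the application engine then loads selectively. Only the tiles and LODs inside the current viewport are loaded, the rest are evicted from memory and VRAM, and different tiles are requested from the server as the viewport changes. At any moment the engine holds only a small part of the model rather than all tens of gigabytes.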
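
The selective-loading idea can be illustrated with a deliberately simplified sketch. It uses plain camera distance as a stand-in for real viewport culling and is not how 3D Tiles or Orillusion select tiles; `selectTiles`, the tile shape, and the thresholds are all hypothetical.

```javascript
// Keep only tiles within maxDistance of the camera and pick a LOD level by
// distance (0 = most detailed). Everything else would be evicted or never requested.
function selectTiles(tiles, camera, maxDistance, lodStep) {
  const selected = [];
  for (const tile of tiles) {
    const distance = Math.hypot(
      tile.center[0] - camera[0],
      tile.center[1] - camera[1],
      tile.center[2] - camera[2]
    );
    if (distance > maxDistance) continue; // too far away: do not load
    selected.push({ id: tile.id, lod: Math.floor(distance / lodStep) });
  }
  return selected;
}

const tiles = [
  { id: 'a', center: [0, 0, 0] },
  { id: 'b', center: [150, 0, 0] },
  { id: 'c', center: [1000, 0, 0] },
];
console.log(selectTiles(tiles, [0, 0, 0], 500, 100).map((t) => `${t.id}:lod${t.lod}`).join(' '));
// a:lod0 b:lod1
```

Running the selection again whenever the camera moves yields the set of tiles to request from the server and the set to drop.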

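The current version of Orillusion does not yet support mainstream tiled formats such as 3D Tiles, but extensions for them are planned, as are backend tools and services for tiling models. If you have GIS development experience, PRs are very welcome.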

shuangliu

Are you running on localhost or on a domain? If it is not localhost, Chrome by default only enables WebGPU over https.

shuangliu

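That looks like a typo; it should be webgpu. Most likely the page is not running on localhost.
Chrome only allows WebGPU on localhost or over https. For an IP-address URL, https has to be enabled manually before it will load.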
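
A quick way to tell these cases apart at runtime is to check for a secure context before blaming the browser. The sketch below takes its inputs as a plain object so it can run anywhere; `explainWebGPUAvailability` and its return strings are hypothetical.

```javascript
// Classify why WebGPU may be unavailable. navigator.gpu is only exposed in
// secure contexts (https, or localhost during development).
function explainWebGPUAvailability(env) {
  if (env.gpu) return 'ok';
  if (!env.isSecureContext) return 'insecure-context';
  return 'unsupported';
}

// Usage in a browser, e.g.:
// explainWebGPUAvailability({ gpu: navigator.gpu, isSecureContext: window.isSecureContext });
console.log(explainWebGPUAvailability({ gpu: undefined, isSecureContext: false })); // insecure-context
```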

shuangliu

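Can you share the code? The build runs in a Node environment, where there is no window object. globalThis works in both Node and the browser; the exact usage depends on your code.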
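
One common pattern is to guard browser-only access behind a check on globalThis, so the same module can be imported during a Node build. A minimal sketch; `isBrowser` and `getDevicePixelRatio` are hypothetical names.

```javascript
// True only when both window and document exist, i.e. in a real browser page.
const isBrowser =
  typeof globalThis.window !== 'undefined' &&
  typeof globalThis.document !== 'undefined';

// Read a browser-only value with a safe fallback for Node builds.
function getDevicePixelRatio() {
  return isBrowser ? globalThis.devicePixelRatio : 1;
}

console.log(isBrowser, getDevicePixelRatio());
```

Code that touches window at module top level can be moved behind such a guard, or deferred until it runs in the browser.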

shuangliu

Please upload a zip of the model, or open an issue on GitHub, and we will take a look.
