
XRManager: Fix tone mapping and output color space. #30387

Draft · wants to merge 2 commits into dev
Conversation

@Mugen87 (Collaborator) commented Jan 23, 2025

Related issue: #28968

Description

This PR makes sure tone mapping and output color space conversion are supported with XRManager.

@cabanier XRManager is the new WebXRManager for WebGPURenderer. The implementation of the new manager is not on par with WebXRManager yet (e.g. support for WEBGL_multisampled_render_to_texture, depth sensing and performance tweaks are still missing) but a few features have already been implemented.

This PR adds the missing tone mapping and output color space conversion. In WebGPURenderer we do not perform this step inline in the material shaders anymore but with an additional render pass. We know this adds overhead but it fixes long-standing issues with transparency and color spaces.
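For reference, here is a minimal sketch of the output color space conversion such a pass performs, expressed in plain JavaScript. The actual conversion runs in the pass's fragment shader; this only illustrates the standard piecewise sRGB transfer function applied per channel:

```javascript
// Standard linear-to-sRGB transfer function, applied per color channel
// when converting from the linear working color space to the sRGB output
// color space. In three.js this happens in shader code; this JS version
// is just for illustration.
function linearToSRGB( c ) {

	return ( c <= 0.0031308 ) ? c * 12.92 : 1.055 * Math.pow( c, 1 / 2.4 ) - 0.055;

}

console.log( linearToSRGB( 0 ) );   // 0
console.log( linearToSRGB( 0.5 ) ); // ~0.735 (mid grays brighten noticeably)
```

Doing this in a separate fullscreen pass (instead of inline at the end of each material shader) is what allows correct blending of transparent objects in linear space before the conversion.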

At least on desktop the difference in performance is negligible, but mobile XR is a different story. Even one additional pass with a really simple fragment shader noticeably hits performance on a Quest 2. I have not measured the FPS yet, but you can immediately see stuttering. Can you give us a hint about what we are doing wrong? I suspect multi-pass rendering is not uncommon in native XR apps, but I guess they manage bindings and state more efficiently. I've also checked the browser console via remote debugging and made sure there are no WebGL errors/warnings.

Below are two links to the same demo but with different XR managers for testing.

Another related issue: Both demos should render identically, but if you hit the Quest/Meta button while presenting, the menu is rendered differently. With WebXRManager/WebGLRenderer, the menu is properly embedded in the 3D scene whereas in XRManager/WebGPURenderer it looks broken: the 3D scene is rendered flat "behind" the menu. I suspect this has to do with missing depth information when using multiple render passes. When WebGPURenderer performs tone mapping and color space conversion (or potentially any kind of FX), it does not write the scene's depth into the XR render target, just the final color values. Is this considered a bug on our side (meaning, do we provide an incomplete framebuffer)?

XRManager (WebGPURenderer): webgpu_xr_cubes
WebXRManager (WebGLRenderer): webxr_xr_cubes

@cabanier (Contributor) commented Jan 23, 2025

Thanks for the heads up! Yes, this is likely caused by multipass rendering, which is slow on mobile GPUs and is likely exacerbated in VR because the scene is rendered twice (once for each eye), so 4 passes instead of just 1 previously.
I'll see if I can find some time this week to investigate.

@cabanier (Contributor)

> Another related issue: Both demos should render identically, but if you hit the Quest/Meta button, the menu is rendered differently. With WebXRManager/WebGLRenderer, the menu is properly embedded in the 3D scene whereas in XRManager/WebGPURenderer it looks broken: the 3D scene is rendered flat "behind" the menu. I suspect this has to do with missing depth information when using multiple render passes. When WebGPURenderer performs tone mapping and color space conversion (or potentially any kind of FX), it does not write the scene's depth into the XR render target, just the final color values. Is this considered a bug on our side (meaning, do we provide an incomplete framebuffer)?

Quest does not use the depth buffer (except when using space warp which is uncommon). However, other headsets might need depth information to help in reprojection so we need a way to preserve it.

@cabanier (Contributor)

The GPU profiler shows this for the previous code path:

```
Surface 0 | 3360x1760 | color 32bit, depth 24bit, stencil 0 bit, MSAA 4, Mode: 1 (HwBinning) | 72 288x320 bins ( 44 rendered) | 2.61 ms | 91 stages : Binning : 0.275ms Render : 0.854ms StoreColor : 0.391ms Blit : 0.006ms Preempt : 0.716ms
```

and this for the new code path:

```
Surface 0 | 3360x1760 | color 64bit, depth 24bit, stencil 0 bit, MSAA 4, Mode: 1 (HwBinning) | 96 288x224 bins ( 96 rendered) | 9.41 ms | 294 stages : Binning : 0.193ms Render : 1.196ms StoreColor : 1.49ms Blit : 0.548ms Preempt : 2.322ms StoreDepthStencil : 0.251ms
Surface 1 | 3360x1760 | color 32bit, depth 24bit, stencil 8 bit, MSAA 1, Mode: 1 (HwBinning) | 16 864x448 bins ( 16 rendered) | 1.86 ms | 50 stages : Binning : 0.07ms Render : 1.412ms StoreColor : 0.04ms Blit : 0.005ms LoadDepthStencil : 0.188ms
```

A lot of the slowdown is coming from the loading/storing of depth and color, which comes from postprocessing. three is also rendering to a 64-bit color buffer, which is a lot slower. Is that needed?
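To put rough numbers on the 64-bit color buffer question, a quick back-of-envelope estimate using the surface size from the trace above (an illustration only; real bandwidth also depends on tiling and framebuffer compression):

```javascript
// Bandwidth estimate for the 3360x1760 eye buffer in the trace.
// RGBA16F stores 8 bytes per pixel vs. 4 bytes for RGBA8, so every
// store/load of the color buffer moves twice the data.
const width = 3360, height = 1760;
const pixels = width * height;

const rgba8Bytes = pixels * 4;   // 8 bits per channel
const rgba16fBytes = pixels * 8; // half float per channel

console.log( ( rgba8Bytes / 1024 / 1024 ).toFixed( 1 ) + ' MiB' );   // 22.6 MiB
console.log( ( rgba16fBytes / 1024 / 1024 ).toFixed( 1 ) + ' MiB' ); // 45.1 MiB
```

On a tiled mobile GPU those extra bytes are paid on every StoreColor/LoadColor stage, which lines up with the StoreColor jump from 0.391 ms to 1.49 ms in the traces above.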

@cabanier (Contributor)

I looked at how the new rendering pipeline is constructed and unfortunately, this is not going to work for us. In order to get good performance, the scene needs to render with foveation.
Only the textures that are created by WebXR have this feature enabled, so it won't apply with your current approach that uses renderbuffers :-\

@Mugen87 (Collaborator, Author) commented Jan 24, 2025

> three is also rendering to a 64-bit color buffer which is a lot slower. Is that needed?

That is a good point. Right now, the intermediate framebuffer for the beauty pass is RGBAFormat + HalfFloatType, which is essentially gl.RGBA16F. This has been configured to support HDR by default. Related code:

```js
frameBufferTarget = new RenderTarget( width, height, {
	depthBuffer: depth,
	stencilBuffer: stencil,
	type: HalfFloatType, // FloatType
	format: RGBAFormat,
	colorSpace: LinearSRGBColorSpace,
	generateMipmaps: false,
	minFilter: LinearFilter,
	magFilter: LinearFilter,
	samples: this.samples
} );
```

To improve performance we should try to use gl.RGBA8 whenever possible.
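A minimal sketch of what that selection could look like. The constant values below mirror three.js' `UnsignedByteType`/`HalfFloatType` constants, but `selectFrameBufferType` is a hypothetical helper, not an existing API:

```javascript
// Hypothetical helper: fall back to an 8-bit color buffer (gl.RGBA8)
// unless the pipeline actually needs the extended range of gl.RGBA16F
// (e.g. HDR rendering before tone mapping). The numeric values match
// three.js' texture type constants.
const UnsignedByteType = 1009;
const HalfFloatType = 1016;

function selectFrameBufferType( needsHDR ) {

	return needsHDR ? HalfFloatType : UnsignedByteType;

}

console.log( selectFrameBufferType( false ) === UnsignedByteType ); // true
```

In practice this would likely be exposed as a renderer option so apps that don't need HDR can opt into the cheaper buffer.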

@Mugen87 (Collaborator, Author) commented Jan 24, 2025

> Only the textures that are created by WebXR have this feature enabled so it won't apply with your current approach that uses renderbuffers :-\

Does that mean any kind of FX won't work and we always have to render directly into the WebXR textures?

I guess we need to think about a separate solution for Quest (or XR in general) then 🤔 .

@cabanier (Contributor)

> Does that mean any kind of FX won't work and we always have to render directly into the WebXR textures?
> I guess we need to think about a separate solution for Quest then 🤔 .

I believe it's possible to read from and write to the same texture with some limitations. So in the second step, we would read from and write to the same render target.
Another possibility is to add a new API to WebXR that marks any texture as foveated. That would allow fancy postprocessing effects but might be a bit slower since there would still be a copy during the second pass.

@Mugen87 (Collaborator, Author) commented Jan 24, 2025

> Another possibility is to add a new API to WebXR that marks any texture as foveated.

I wasn't aware this is possible. How do you do that?

@cabanier (Contributor)

> Another possibility is to add a new API to WebXR that marks any texture as foveated.

> I wasn't aware this is possible. How do you do that?

We use an OpenGL API [1] to apply foveation to our swapchain textures. However, the API applies to any texture, so we could create a function to do this with a WebGL texture.

1: https://registry.khronos.org/OpenGL/extensions/QCOM/QCOM_texture_foveated.txt

@mrxz (Contributor) commented Jan 27, 2025

> I believe it's possible to read from and write to the same texture with some limitations. So in the second step, we would read from and write to the same render target.

@cabanier Correct me if I'm wrong, and I'd like to be wrong on this one, but I don't think this is possible with WebGL2 in a meaningful way for post processing. EXT_shader_framebuffer_fetch has been rejected in favour of WEBGL_shader_pixel_local_storage, which is still a draft extension.

@cabanier (Contributor)

> @cabanier Correct me if I'm wrong, and I'd like to be wrong on this one, but I don't think this is possible with WebGL2 in a meaningful way for post processing. EXT_shader_framebuffer_fetch has been rejected in favour of WEBGL_shader_pixel_local_storage, which is still a draft extension.

I haven't talked to our driver team yet but if this extension solves the problem of the extra renderpass, we could turn it on in Quest Browser.
@toji , do you know why this extension isn't enabled yet?

@toji (Contributor) commented Jan 27, 2025

I don't, but I know it's been under development for a long time. I'll see if I can find out any additional details.

@cabanier (Contributor)

@Mugen87 I made a couple of changes that seem to make the processing a lot faster. See Mugen87@2fe59c2
The changes I made:

  • there was a bug where a stencil buffer was accidentally added
  • keep track of whether the compositor needs depth
  • don't attach the depth buffer to the non-msaa framebuffer if it's not needed by the system
  • switch to byte from half float

With these changes, I see the following in the trace:

```
Surface 0 | 3360x1760 | color 32bit, depth 0 bit, stencil 0 bit, MSAA 4, Mode: 1 (HwBinning) | 35 480x384 bins ( 35 rendered) | 1.25 ms | 72 stages : Binning : 0.115ms Render : 0.517ms StoreColor : 0.09ms Blit : 0.308ms
Surface 1 | 3360x1760 | color 32bit, depth 24bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 16 864x448 bins ( 17 rendered) | 1.97 ms | 35 stages : Render : 0.61ms StoreColor : 0.227ms Blit : 0.252ms Preempt : 0.78ms
Surface 2 | 3360x1760 | color 32bit, depth 0 bit, stencil 0 bit, MSAA 4, Mode: 1 (HwBinning) | 35 480x384 bins ( 35 rendered) | 3.09 ms | 74 stages : Binning : 0.115ms Render : 0.543ms StoreColor : 0.088ms Blit : 0.301ms Preempt : 1.685ms
Surface 3 | 3360x1760 | color 32bit, depth 24bit, stencil 0 bit, MSAA 1, Mode: 2 (SwBinning) | 16 864x448 bins ( 17 rendered) | 1.75 ms | 34 stages : Render : 0.6ms StoreColor : 0.229ms Blit : 0.006ms Preempt : 0.756ms
```

Which is a lot faster than before. The second pass adds around 1ms per frame of overhead.

If this looks reasonable and we can integrate it (or something similar), I can tackle the render_to_texture approach instead of using renderbuffers. I think the non-msaa path also needs work but it looks like that isn't implemented yet.

If we combine this with WEBGL_shader_pixel_local_storage, we can likely have very similar performance (with foveation) and the door would be open to decent post-processing. This should make @CodyJasonBennett happy!

@CodyJasonBennett (Contributor) commented Jan 29, 2025

You don't need pixel local storage for post-processing, nor do you even need storage memory. That's also not exclusively what #26160 is about; it fixes offscreen rendering in general, which is otherwise completely broken. I will not entertain that narrative any further and will reiterate that I have a $10,000 bounty #26160 (comment) for a strict improvement in 1 LoC Rik alone has been gatekeeping. I don't use three.js myself, but even Meta asks me about this on a weekly basis, and I would hate to see it regress any further #28968 (comment). On a related note, it is desirable to keep inline tone mapping if not working in HDR or simply upscaling #29429. This whole area has regressed since r155, but that's a way to walk back in a non-breaking way.

@Mugen87 (Collaborator, Author) commented Jan 29, 2025

@cabanier That's great news! I think we need multiple PRs to integrate these changes since the switch from a 64 to 32 bit color buffer should be done with a new renderer property as mentioned in #30392 (comment).

@Mugen87 (Collaborator, Author) commented Jan 29, 2025

Also, there were some recent PRs after I filed this one, see #30418 and #30417. So we might need some additional changes.

@cabanier (Contributor)

> Also, there were some recent PRs after I filed this one, see #30418 and #30417. So we might need some additional changes.

No problem! I'll sync to the dev branch and apply my changes there. I will make sure that it also works correctly on devices that don't support render_to_texture or need the depth buffer.

@Mugen87 (Collaborator, Author) commented Jan 29, 2025

Awesome!

>   • there was a bug where a stencil buffer was accidentally added
>   • keep track of whether the compositor needs depth
>   • don't attach the depth buffer to the non-msaa framebuffer if it's not needed by the system
>   • switch to byte from half float

In the meantime, points 1 and 4 from your list are fixed on dev 🙌 .

@cabanier (Contributor)

> Also, there were some recent PRs after I filed this one, see #30418 and #30417. So we might need some additional changes.

@Mugen87 I see that the dev branch is no longer using an extra renderpass and is using the WEBGL_multisampled_render_to_texture. Do you still want to explore adding that extra pass?

@Mugen87 (Collaborator, Author) commented Jan 30, 2025

> Do you still want to explore adding that extra pass?

Yes, I think it would be good to have this implemented so we have a concrete use case for WEBGL_shader_pixel_local_storage. The extra pass would perform the tone mapping and color space conversion which is still missing.

It would be great if we could make this path as fast as possible so we can more easily evaluate the gap between this approach and inline tone mapping/color space conversion.
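To illustrate the kind of curve such a tone mapping pass applies per channel, here is a sketch using Krzysztof Narkowicz's well-known ACES filmic approximation in plain JavaScript. This is not three.js' actual shader implementation, just an example of the operation the extra pass performs:

```javascript
// Narkowicz's ACES filmic approximation, applied per color channel.
// It compresses HDR input into [0, 1], rolling highlights off to white.
function acesFilmic( x ) {

	const mapped = ( x * ( 2.51 * x + 0.03 ) ) / ( x * ( 2.43 * x + 0.59 ) + 0.14 );
	return Math.min( Math.max( mapped, 0 ), 1 );

}

console.log( acesFilmic( 0 ) );  // 0
console.log( acesFilmic( 10 ) ); // 1 (bright HDR values clip gracefully to white)
```

Because the curve is nonlinear, it has to run after all lighting and blending is done in linear space, which is exactly why it wants to live in a final fullscreen pass rather than in each material shader.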

@cabanier (Contributor)

> Do you still want to explore adding that extra pass?

> Yes, I think it would be good to have this implemented so we have a concrete use case for WEBGL_shader_pixel_local_storage. The extra pass would perform the tone mapping and color space conversion which is still missing.

OK, so I shouldn't investigate maintaining foveation support for the two-pass solution?
I was thinking of adding a new API to WebXR to do so...
