opengl performance improvements #1410
Conversation
Instead of repeatedly calling glClientWaitSync with a 1 ns timeout, call it once with a 20 ms timeout and the flush flag set. This decreases average GPU utilisation on my testbench by about 10 percentage points (~85% -> ~75%, 4x 1080i5000 on a K620).
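A minimal sketch of the change, assuming GLEW; the helper names (wait_for_fence etc.) are illustrative, not CasparCG's actual code:

```cpp
#include <GL/glew.h>

// Before: busy-poll the fence with a ~1 ns timeout, spinning while the
// driver re-checks the sync object over and over.
void wait_for_fence_polling(GLsync fence)
{
    while (glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT, 1) == GL_TIMEOUT_EXPIRED)
        ; // spin until the GPU passes the fence
}

// After: one blocking wait with a 20 ms timeout. GL_SYNC_FLUSH_COMMANDS_BIT
// ensures the fence command itself is flushed to the GPU, so the wait cannot
// stall on a fence that was never submitted.
bool wait_for_fence(GLsync fence)
{
    const GLuint64 timeout_ns = 20'000'000; // 20 ms
    const GLenum r = glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT, timeout_ns);
    return r == GL_ALREADY_SIGNALED || r == GL_CONDITION_SATISFIED;
}
```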
force-pushed from c4f3c9a to f1ae1dc
When running, it does the blending via OpenGL; this is a tad faster.
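Presumably this means letting the fixed-function blend stage composite layers rather than sampling the destination in the shader; a minimal sketch, assuming premultiplied-alpha sources:

```cpp
// Classic premultiplied "over" on the blend units (ROPs): the shader writes
// only the source colour and the hardware computes src + dst * (1 - src.a).
glEnable(GL_BLEND);
glBlendFuncSeparate(GL_ONE, GL_ONE_MINUS_SRC_ALPHA,  // RGB
                    GL_ONE, GL_ONE_MINUS_SRC_ALPHA); // alpha
// ... draw the layer quad ...
glDisable(GL_BLEND);
```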
force-pushed from f1ae1dc to e710ba2
Since we keep filling the command buffer, there is no need to flush and we can safely forgo it. This marginally improves performance.
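A sketch of this follow-up, under the same assumptions as the earlier wait sketch: with the command buffer being refilled continuously, the fence will be submitted regardless, so the flush bit can be dropped:

```cpp
#include <GL/glew.h>

// Same 20 ms wait, but with flags = 0: no flush is forced, because the
// steadily filled command buffer gets the fence to the GPU anyway.
bool wait_for_fence_no_flush(GLsync fence)
{
    const GLuint64 timeout_ns = 20'000'000; // 20 ms
    const GLenum r = glClientWaitSync(fence, 0, timeout_ns);
    return r == GL_ALREADY_SIGNALED || r == GL_CONDITION_SATISFIED;
}
```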
Trying this with 4x 1080i50 channels (each playing 2 AMB) on Ubuntu 22.04 with a GTX 1060, I am seeing GPU usage go from 40-45% to 38-42%, which is not a significant improvement. What GPU and OS are you using?

On Windows it gets stuck in an error loop when playing any media with
It has been quite a while (~10 years) since I have had to think about optimising CUDA code, but from what I remember, branching is only an issue when threads in the same cluster may take different routes. So for us, different branches being used for each frame being composited should have no major impact?

What is the cost of frequently switching shaders? Some layers on a channel could be on the fast shader and some on the slow one.

As it currently stands, I am not convinced that this will give a noticeable performance benefit to most users, so I am not convinced it is worth the extra complexity.
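For illustration, here is a hypothetical pair of GLSL fragments (held as C++ raw strings; not from this PR) contrasting the two kinds of branching under discussion:

```cpp
// Uniform flow control: the condition is a uniform, constant for the whole
// draw call, so every invocation in a warp takes the same path.
const char* uniform_branch = R"glsl(
    uniform bool straight_alpha;
    vec4 shade(vec4 c) {
        if (straight_alpha)           // identical for all fragments
            c.rgb *= c.a;
        return c;
    }
)glsl";

// Non-uniform flow control: the condition depends on per-fragment data, so
// neighbouring fragments in one warp can disagree and the hardware executes
// both sides of the branch (divergence).
const char* divergent_branch = R"glsl(
    uniform sampler2D key;
    vec4 shade(vec4 c, vec2 uv) {
        if (texture(key, uv).a > 0.5) // varies fragment to fragment
            c *= texture(key, uv).a;  // some lanes run this...
        else
            c = vec4(0.0);            // ...while neighbours run this
        return c;
    }
)glsl";
```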
- Low-hanging fruit: change the glClientWaitSync behaviour (see the wait sketch above)
- Optimize the fragment shader to discard invisible fragments (alpha < 0.01); see the sketch after this list
- Change the shader to reduce branching caused by non-uniform flow control
- Use a separate shader program as a "fast path" plus the OpenGL blend func (executed on the GPU's ROPs, fewer texture reads); this shaves off another ~5%

Total improvement: from ~80-85% to ~50-55% GPU utilization when running 4 HD 50i channels, each with 2 layers, which is close to the GPU utilization of 2.1.
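A minimal sketch of how the discard and fast-path items might fit together; the program handles and the needs_effects predicate are hypothetical, not CasparCG's actual code:

```cpp
#include <GL/glew.h>

// Trimmed "fast path" fragment shader: no branching on effect parameters and
// no destination reads; fully transparent fragments are discarded so they
// never reach the blend stage at all.
const char* fast_fragment_shader = R"glsl(
    #version 330 core
    uniform sampler2D plane;
    in vec2 uv;
    out vec4 frag;
    void main() {
        vec4 c = texture(plane, uv);
        if (c.a < 0.01)
            discard;   // invisible: skip blending and framebuffer writes
        frag = c;      // premultiplied alpha, blended by the ROPs
    }
)glsl";

// Hypothetical per-layer dispatch: pick the fast program when a layer needs
// no colour processing, otherwise fall back to the full shader.
void draw_layer(GLuint full_program, GLuint fast_program, bool needs_effects)
{
    if (needs_effects) {
        glUseProgram(full_program);  // slow path: keyers, levels, etc.
        glDisable(GL_BLEND);         // blending stays in-shader as before
    } else {
        glUseProgram(fast_program);  // compiled from fast_fragment_shader
        glEnable(GL_BLEND);          // "over" on the ROPs, as sketched earlier
        glBlendFuncSeparate(GL_ONE, GL_ONE_MINUS_SRC_ALPHA,
                            GL_ONE, GL_ONE_MINUS_SRC_ALPHA);
    }
    // ... bind textures and issue the draw call ...
}
```

Frequent program switches do have a cost, which is exactly the trade-off questioned in the review comment above.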