better device memory allocation strategy #744

Open
w23 opened this issue Nov 27, 2024 · 1 comment
Comments

w23 (Owner) commented Nov 27, 2024

The problem: big allocations (say, larger than half of the default devmem allocation size, currently 64M/2 = 32M) lead to too many device memory objects, hitting the hardcoded limit. Example: UHD window resolution with the RT pipeline leads to allocating many large "g-buffer" textures and hitting the max devmem assert.
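For a concrete sense of scale, a back-of-the-envelope calculation (assuming RGBA16F g-buffer targets; the actual formats used by the renderer are an assumption here):

```c
#include <stdio.h>

int main(void) {
	// Illustrative arithmetic only; the actual g-buffer formats may differ.
	// One UHD RGBA16F render target:
	const unsigned long long bytes = 3840ULL * 2160ULL * 4 /* channels */ * 2 /* bytes per FP16 channel */;
	printf("one UHD RGBA16F target: %llu bytes (~%.1f MiB)\n", bytes, bytes / (1024.0 * 1024.0));
	// => 66355200 bytes, ~63.3 MiB: nearly a whole 64M devmem block per image,
	// so a dozen such targets consumes a dozen slots out of a fixed-size table.
	return 0;
}
```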

A better devmem allocation strategy could have the following properties:

  • Dynamic devmem array, not limited by a hardcoded max count
  • Special handling for big allocations, e.g.:
    • allocate a dedicated devmem object for each big alloc (e.g. > 32M), as sketched below
    • allocation classes: allocate 128M/256M devmem objects specifically for big allocs
    • lazy allocation: first collect how much memory will be needed, and only then allocate devmem when it is actually required.
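A minimal sketch of the first two points combined: a growable block array plus a dedicated, exactly-sized block for anything over the threshold. All names (`devmem_t`, `devmemAllocate`, etc.) are hypothetical, not the actual ref_vk API:

```c
#include <stdlib.h>
#include <vulkan/vulkan.h>

#define DEFAULT_DEVMEM_SIZE (64u * 1024u * 1024u)
#define BIG_ALLOC_THRESHOLD (32u * 1024u * 1024u)

typedef struct {
	VkDeviceMemory memory;
	VkDeviceSize size, used;
	uint32_t type_index;
} devmem_t;

// Growable array instead of a fixed-size one with a hardcoded max count.
static devmem_t *g_blocks;
static int g_blocks_count;

static devmem_t *allocateBlock(VkDevice device, VkDeviceSize size, uint32_t type_index) {
	g_blocks = realloc(g_blocks, sizeof(devmem_t) * (g_blocks_count + 1));
	devmem_t *const block = g_blocks + g_blocks_count++;
	const VkMemoryAllocateInfo mai = {
		.sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO,
		.allocationSize = size,
		.memoryTypeIndex = type_index,
	};
	vkAllocateMemory(device, &mai, NULL, &block->memory);
	block->size = size;
	block->used = 0;
	block->type_index = type_index;
	return block;
}

// Big allocations get a dedicated, exactly-sized block; small ones are
// suballocated from shared DEFAULT_DEVMEM_SIZE blocks as before.
static devmem_t *devmemAllocate(VkDevice device, VkDeviceSize size, uint32_t type_index) {
	if (size > BIG_ALLOC_THRESHOLD)
		return allocateBlock(device, size, type_index);
	// ... find an existing shared block with enough free space, or:
	return allocateBlock(device, DEFAULT_DEVMEM_SIZE, type_index);
}
```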

Related: #502, it would be good to have devmem allocation stats: how many objects there are, how large they are, etc.
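For the stats side of #502, even a trivial dump over the block array would answer that question. A sketch continuing the hypothetical `devmem_t` above; a real version would presumably print through the engine console rather than stdout:

```c
#include <stdio.h>

// Answers "how many and how large" in one pass over the block array.
static void devmemDumpStats(void) {
	VkDeviceSize total = 0, used = 0;
	for (int i = 0; i < g_blocks_count; ++i) {
		total += g_blocks[i].size;
		used += g_blocks[i].used;
		printf("devmem[%d]: size=%llu used=%llu type=%u\n", i,
			(unsigned long long)g_blocks[i].size,
			(unsigned long long)g_blocks[i].used,
			g_blocks[i].type_index);
	}
	printf("devmem: %d blocks, %llu/%llu bytes used\n",
		g_blocks_count, (unsigned long long)used, (unsigned long long)total);
}
```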

w23 added the enhancement, performance, dev-tools, potential bug, workaround_known, refactoring, and skill issue labels Nov 27, 2024
w23 (Owner, Author) commented Dec 12, 2024

Related issues:

What we could do is:

Split allocations into three classes (see the sketch after this list)

  • Static stuff allocated at render start time: various buffers, geometry, uniforms, acceleration structures*, etc. This is allocated once and never changes afterwards. (* -- AS could also be semi-dynamic, but that's out of scope for now)
  • Long-life heap: e.g. textures. These are usually mass-allocated and deallocated at map change events, with only a few exceptions.
  • Dynamic heap: G-buffer stuff. This depends on the RT pipeline configuration, max frame size, etc. Its contents might also be aliasable based on the pipeline structure. It can also change between frames, e.g. when resizing the window.
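A hypothetical tag for these classes, attached to each allocation request (names are illustrative only):

```c
typedef enum {
	DevMemClass_Static,   // buffers, geometry, uniforms, AS: allocated once at init
	DevMemClass_LongLife, // textures: mass-allocated/freed at map change
	DevMemClass_Dynamic,  // G-buffer images: tied to RT pipeline config and frame size
} devmem_class_t;
```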

Implement different allocation strategies for them

  • Static stuff can be allocated lazily. I.e. in R_VkInit() we collect all requirements, build a list of things to allocate, collect all necessary GPU memory heaps, etc. At the end of the init function we know the needed sizes exactly and can allocate devmem objects with exactly those sizes (see the sketch after this list).
  • Long heap: keep the current strategy of filling up existing devmem objects and allocating new ones when needed. Could even do compaction/defragmentation at map boundaries, if we see that fragmentation is a problem.
  • Dynamic heap is similar to static, in the sense that on RT pipeline reload or swapchain parameter changes we can collect all the requirements (sizes, aliasing, etc.) and allocate devmems exactly. This also helps in the sense that it doesn't interfere with other allocations and doesn't introduce extra fragmentation.
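A minimal sketch of the lazy collect-then-allocate pass for the static class, restricted to a single memory type for brevity; all names here are hypothetical, not the actual ref_vk API:

```c
#include <vulkan/vulkan.h>

#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))
#define MAX_STATIC_REQUESTS 256

typedef struct {
	VkMemoryRequirements reqs;
	VkDeviceSize offset; // filled in by devmemCommitStatic()
} devmem_request_t;

static devmem_request_t g_requests[MAX_STATIC_REQUESTS];
static int g_requests_count;

// Pass 1: during R_VkInit() callers only register their requirements.
static devmem_request_t *devmemRequestStatic(const VkMemoryRequirements *reqs) {
	devmem_request_t *const req = g_requests + g_requests_count++;
	req->reqs = *reqs;
	return req;
}

// Pass 2: at the end of init the total is known exactly, so a single
// exactly-sized devmem object can back all static resources.
static VkDeviceSize devmemCommitStatic(void) {
	VkDeviceSize total = 0;
	for (int i = 0; i < g_requests_count; ++i) {
		total = ALIGN_UP(total, g_requests[i].reqs.alignment);
		g_requests[i].offset = total;
		total += g_requests[i].reqs.size;
	}
	// ... vkAllocateMemory(total) once, then vkBindBufferMemory() /
	// vkBindImageMemory() at each stored offset.
	return total;
}
```

The same collect-then-commit shape would cover the dynamic heap too: rerun both passes on RT pipeline reload or swapchain parameter change, since those are the only events that invalidate its requirements.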

w23 added a commit that referenced this issue Jan 30, 2025
Some maps fail to load. Possibly drivers have changed and require a bigger
buffer.

Related: #744, #502, etc