Perf: Reduce StridedMemoryView construction time
#449
Labels
cuda.core: Everything related to the cuda.core module
enhancement: Any code-related improvements
P0: High priority - Must do!
Currently it takes 3.4-3.45 µs (depending on whether stream ordering is used) to create a memory view object, which could be a bit expensive in a tight loop. We should try to reduce it to 1 µs, or O(100) ns if possible.
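A minimal sketch of how a per-construction cost like this could be measured with the standard-library `timeit` module. `DummyView` and `per_call_ns` are hypothetical names used only for illustration (they are not part of cuda.core); in practice the factory passed in would construct the real memory view object.

```python
import timeit


class DummyView:
    """Hypothetical stand-in for the object whose construction is timed."""

    __slots__ = ("obj",)

    def __init__(self, obj):
        self.obj = obj


def per_call_ns(factory, number=100_000, repeat=5):
    """Return the best observed per-call time of factory() in nanoseconds."""
    timer = timeit.Timer(factory)
    # Take the minimum over several repeats to reduce scheduling noise.
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number * 1e9


if __name__ == "__main__":
    buf = bytearray(16)
    ns = per_call_ns(lambda: DummyView(buf))
    print(f"construction: {ns:.0f} ns per call")
```

Note that the lambda wrapper adds one extra Python call to every iteration, so the result is a slight overestimate of the constructor cost alone.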
cc @shwina for vis