Looks cool! I've fixed some bugs in the MSM and will merge them this week. This should enable experimenting with larger sizes and adding optimizations such as this one.
Description
The current msm.cu implementation launches a monolithic kernel in several places like this:
This assumes the GPU has enough threads to process every element. Grid-stride loops let a kernel process more elements than there are GPU threads by having each thread handle several elements, and they can preserve memory coalescing on larger inputs, since on every iteration consecutive threads still access consecutive memory. I think this should be one of the easier fixes to efficiently enable larger msm sizes.
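To make the difference concrete, here is a minimal sketch of the two launch styles. It is not icicle's actual code: `scale_monolithic`, `scale_grid_stride`, and `run_grid_stride_demo` are hypothetical names, and doubling an integer stands in for the real per-element MSM work.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical monolithic kernel: one thread per element. It only covers the
// whole input if the launch supplies at least n threads.
__global__ void scale_monolithic(const unsigned* in, unsigned* out, size_t n)
{
  size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = 2u * in[i];
}

// Hypothetical grid-stride kernel: each thread starts at its global index and
// advances by the total number of threads in the grid, so any grid size covers
// all n elements. Within one iteration, neighbouring threads touch
// neighbouring addresses, which keeps the accesses coalesced.
__global__ void scale_grid_stride(const unsigned* in, unsigned* out, size_t n)
{
  size_t stride = (size_t)gridDim.x * blockDim.x;
  for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
    out[i] = 2u * in[i];
}

// Runs the grid-stride kernel with far fewer threads than elements and checks
// every output on the host. Returns true when all results match.
bool run_grid_stride_demo()
{
  const size_t n = 1u << 20;                 // 1,048,576 elements
  const size_t bytes = n * sizeof(unsigned);
  std::vector<unsigned> h_in(n), h_out(n, 0u);
  for (size_t i = 0; i < n; ++i) h_in[i] = (unsigned)i;

  unsigned* d_in = nullptr;
  unsigned* d_out = nullptr;
  if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
  if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return false; }
  cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

  // Deliberately undersized launch: 32 blocks * 256 threads = 8192 threads,
  // so each thread handles 128 elements.
  scale_grid_stride<<<32, 256>>>(d_in, d_out, n);
  cudaError_t err = cudaDeviceSynchronize();
  if (err == cudaSuccess)
    err = cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);

  cudaFree(d_in);
  cudaFree(d_out);
  if (err != cudaSuccess) return false;

  for (size_t i = 0; i < n; ++i)
    if (h_out[i] != 2u * h_in[i]) return false;
  return true;
}

int main()
{
  bool ok = run_grid_stride_demo();
  std::printf(ok ? "all elements processed\n" : "mismatch or CUDA error\n");
  return ok ? 0 : 1;
}
```

With the same 32 x 256 launch, the monolithic kernel would write only the first 8192 outputs. The grid-stride version decouples the launch configuration from the input size, so the grid can be sized to the device rather than to the data.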
I'm working on a pull request, but I'm getting comfortable with the msm Rust binding for testing first.
Motivation
From the icicle Discord I see "msm for large sizes" is a sprint priority.