-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix 0 byte sends/receives with user buffer registration #804
Fix 0 byte sends/receives with user buffer registration #804
Conversation
The rdma transport does not support FI_MR_ENDPOINT, so there is no reason to have every memory region pinned twice per domain (one on the control endpoint and one on the data endpoint). This also happily simplifies the code. Signed-off-by: Brian Barrett <[email protected]>
The fid_domain pointer in the endpoint is a remnant of the world before we had a domain pointer. We can easily get the domain object anywhere we need the fid_domain, so remove the endpoint fid_domain. Signed-off-by: Brian Barrett <[email protected]>
The rdma transport doesn't support FI_MR_ENDPOINT, so swap all the MR oriented interfaces to be domain-oriented instead of endpoint-oriented. This cleans up the code a little bit, but also makes it considerably easier to add future features, like reducing the number of flush buffers. Signed-off-by: Brian Barrett <[email protected]>
Rather than passing fields in the domain into various MR helper functions, just pass the domain itself. Simplifies the interface, especially as we clean up needed error paths.
Static analysis is really complaining about our error cleanup paths for MRs, which were pretty well spread out and convoluted anyway. Flatten the call paths for registration and deregistration, move all the cleanup to just calling dereg_mr(), and hopefully keep the static analysis tools from getting angry again. Signed-off-by: Brian Barrett <[email protected]>
Since MRs are entirely domain-oriented and we don't care about the data in the flush buffer, move the flush buffer to the domain. It reduces the number of pinned pages and simplifies some later work to fix 0-byte send and receives. Signed-off-by: Brian Barrett <[email protected]>
Static analysis found a path where we could try to unmap a buffer that didn't exist because map had failed. I think we'd have aborted anyway, but handle that case. Signed-off-by: Brian Barrett <[email protected]>
NCCL versions prior to 2.24 take a shortcut when using user buffer registration and pass the user's buffer but the mhandle for the channel buffer. This saves a lookup in the MR cache and is legal enough on IB because the SGE data isn't looked at for a 0 byte transfer in IB. But EFA does enforce all the SGE fields on a 0 byte transfer, so a recent change to not always use eager messages for 0 byte transfers broke 0 byte user registered buffer cases. This patch fixes that case by overwriting both the base buffer and the mhandle to use the domain's flush buffer. Since no data will actually be transfered, this won't cause data corruption issues, but does provide a buffer and mhandle that match. Signed-off-by: Brian Barrett <[email protected]>
54ef352
to
61b16e7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this change, we can just drop the FI_MR_ENDPOINT piece of https://github.com/aws/aws-ofi-nccl/blob/26b3d5929f15d1f6b4c2a0b8feb98d47e21d4b5e/src/nccl_ofi_net.c#L467-#L482 then since we fail init for providers that require it anyway.
I only dropped FI_MR_ENDPOINT in the RDMA transport. The sendrecv transport still supports FI_MR_ENDPOINT. Given that CXI only supports FI_MR_ENDPOINT, we probably shouldn't drop that for send/recv. |
The primary goal of this patch series is to fix another bug introduced with #614 that #794 didn't fix, specifically an issue with how NCCL 2.24 and earlier behaved with 0 byte send/recv from user registered buffers. In those configurations, NCCL would pass an mhandle that was valid but not one that covered the base pointer of the buffer. This patch fixes that by using the flush buffer for the source / dest of 0 byte send / recv, so that we always have a valid MR for the buffer.
That was simplest by moving the flush buffer into the domain, so that we didn't have to add a ton of flush buffers, one for each new communicator. However, that simplified the code just enough that static analysis suddenly found lots of existing leaks, potential leaks, and use after frees, which required some significant code cleanup in the reg_mr and dereg_mr code paths. So this patch series grew significantly longer than originally anticipated.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.