-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
v1.14.x: Cherry pick commits from master #806
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
We don't think we need eager with the performance improvements on P5 and P5en, so set the default to -1 (disabled). Signed-off-by: Brian Barrett <[email protected]>
This patch eliminates PortaFiducia's setups that are unncessary when running tests on containers. Signed-off-by: Vishwas Dsouza <[email protected]>
The rdma transport does not support FI_MR_ENDPOINT, so there is no reason to have every memory region pinned twice per domain (one on the control endpoint and one on the data endpoint). This also happily simplifies the code. Signed-off-by: Brian Barrett <[email protected]>
The fid_domain pointer in the endpoint is a remnant of the world before we had a domain pointer. We can easily get the domain object anywhere we need the fid_domain, so remove the endpoint fid_domain. Signed-off-by: Brian Barrett <[email protected]>
The rdma transport doesn't support FI_MR_ENDPOINT, so swap all the MR oriented interfaces to be domain-oriented instead of endpoint-oriented. This cleans up the code a little bit, but also makes it considerably easier to add future features, like reducing the number of flush buffers. Signed-off-by: Brian Barrett <[email protected]>
Rather than passing fields in the domain into various MR helper functions, just pass the domain itself. Simplifies the interface, especially as we clean up needed error paths.
Static analysis is really complaining about our error cleanup paths for MRs, which were pretty well spread out and convoluted anyway. Flatten the call paths for registration and deregistration, move all the cleanup to just calling dereg_mr(), and hopefully keep the static analysis tools from getting angry again. Signed-off-by: Brian Barrett <[email protected]>
Since MRs are entirely domain-oriented and we don't care about the data in the flush buffer, move the flush buffer to the domain. It reduces the number of pinned pages and simplifies some later work to fix 0-byte send and receives. Signed-off-by: Brian Barrett <[email protected]>
Static analysis found a path where we could try to unmap a buffer that didn't exist because map had failed. I think we'd have aborted anyway, but handle that case. Signed-off-by: Brian Barrett <[email protected]>
NCCL versions prior to 2.24 take a shortcut when using user buffer registration and pass the user's buffer but the mhandle for the channel buffer. This saves a lookup in the MR cache and is legal enough on IB because the SGE data isn't looked at for a 0 byte transfer in IB. But EFA does enforce all the SGE fields on a 0 byte transfer, so a recent change to not always use eager messages for 0 byte transfers broke 0 byte user registered buffer cases. This patch fixes that case by overwriting both the base buffer and the mhandle to use the domain's flush buffer. Since no data will actually be transfered, this won't cause data corruption issues, but does provide a buffer and mhandle that match. Signed-off-by: Brian Barrett <[email protected]>
Commits 27548d4 and a1ac0f5 fixed providers that required memory registrations when FI_MR_LOCAL was set, but also broke providers that clear FI_MR_LOCAL (such as HPE's provider), because I did not account for the mr_local handling in the send/recv transport. We don't have a great way to test that case from AWS, the vast majority of transfers will either be HMEM (which creates an MR anyway) or control messages (which have a freelist which creates an MR), and the Libfabric specification allows passing an MR descriptor to a local operation even if the provider clears the FI_MR_LOCAL bit. Therefore, the best path forward seems to be removing the code to skip registration if FI_MR_LOCAL is cleared, and always creating an MR. Signed-off-by: Brian Barrett <[email protected]>
Plugin Optimization Details: This change set early completion by default to be enabled when data progress model is FI_PROGRESS_AUTO. Receiver Side: - Marks request completion immediately after CTRL message send completion - Does not wait for RDMA write operation completion Sender Side: - Uses fi_write instead of fi_writedata, to eliminate unnecessary CQ entries on RX side Requirements: - Eager msg mode is disabled: OFI_NCCL_EAGER_MAX_SIZE == -1. (With the plugin version at the time of this PR, by default, eager mode is disabled) - Provider must use FI_PROGRESS_AUTO data progress model
bwbarrett
approved these changes
Mar 12, 2025
rajachan
approved these changes
Mar 12, 2025
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description of changes:
Cherry-picked below commits into 1.14.x branch
skipped following from master
only conflict that needed manual resolution was in
rdma: Support early completion of recv() requests
, when callingfi_writedata
, there was a context2 change in master that needed to remove.Testing:
nccl-test pass against both efa-rdm and efa-direct branch.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.