Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rdma: Support early completion of recv() requests #797

Merged
merged 2 commits into from
Mar 11, 2025

Conversation

yexiang-aws
Copy link
Contributor

@yexiang-aws yexiang-aws commented Feb 27, 2025

Description of changes:

Background:
When using LL (Low Latency) or LL128 protocols, NCCL sets the request pointer to NCCL_NET_OPTIONAL_RECV_COMPLETION in irecv() calls. This indicates that the plugin can complete a receiver request early without plugin explicitly polling the CQ to validate data arrival. This is achievable because NCCL itself following LL protocol semantics will validate data arrival by checking the flag bytes.

Plugin Optimization Details:
This change set early completion by default to be enabled when data progress model is FI_PROGRESS_AUTO.

  1. Receiver Side:
    1. Marks request completion immediately after CTRL message send completion
    2. Does not wait for RDMA write operation completion
  2. Sender Side:
    1. Uses fi_write instead of fi_writedata, to eliminate unnecessary CQ entries on RX side

Requirements:

  • Eager msg mode is disabled: eager_max_size == -1. (With the plugin version at the time of this PR, by default, eager mode is disabled)
  • Provider must use FI_PROGRESS_AUTO data progress model

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@yexiang-aws yexiang-aws requested review from bwbarrett and a team as code owners February 27, 2025 01:31
Copy link
Contributor

@bwbarrett bwbarrett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's something missing in this patch. The receiver needs to specify whether to use early completion or not for each request. There's no change to request header data in this patch, so we're clearly not doing that.

@yexiang-aws yexiang-aws force-pushed the feature/LL128_early_completion branch 2 times, most recently from 46c478d to a635123 Compare February 28, 2025 02:09
@yexiang-aws yexiang-aws force-pushed the feature/LL128_early_completion branch from a635123 to dc27609 Compare February 28, 2025 16:59
bwbarrett
bwbarrett previously approved these changes Feb 28, 2025
@yexiang-aws yexiang-aws force-pushed the feature/LL128_early_completion branch 4 times, most recently from ec05e81 to 68d1981 Compare February 28, 2025 17:58
@yexiang-aws yexiang-aws force-pushed the feature/LL128_early_completion branch from 68d1981 to 7d8d380 Compare March 10, 2025 16:52
rauteric
rauteric previously approved these changes Mar 10, 2025
Copy link
Contributor

@rauteric rauteric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a rebase is needed.

rauteric
rauteric previously approved these changes Mar 10, 2025
@yexiang-aws yexiang-aws force-pushed the feature/LL128_early_completion branch from 31497a5 to 0d24e6e Compare March 11, 2025 18:05
@yexiang-aws yexiang-aws marked this pull request as draft March 11, 2025 18:06
@xwangxin xwangxin requested a review from rauteric March 11, 2025 18:41
@yexiang-aws yexiang-aws force-pushed the feature/LL128_early_completion branch 2 times, most recently from 7fe98f3 to c11cccf Compare March 11, 2025 19:02
@yexiang-aws yexiang-aws force-pushed the feature/LL128_early_completion branch 2 times, most recently from f13057f to ad38cfa Compare March 11, 2025 20:21
@yexiang-aws yexiang-aws force-pushed the feature/LL128_early_completion branch 2 times, most recently from 57a8239 to 7cecc37 Compare March 11, 2025 20:26
@yexiang-aws yexiang-aws marked this pull request as ready for review March 11, 2025 20:29
Plugin Optimization Details:
This change set early completion by default to be enabled when
data progress model is FI_PROGRESS_AUTO.

Receiver Side:
- Marks request completion immediately after CTRL message send completion
- Does not wait for RDMA write operation completion
Sender Side:
- Uses fi_write instead of fi_writedata, to eliminate unnecessary CQ
entries on RX side

Requirements:
- Eager msg mode is disabled: OFI_NCCL_EAGER_MAX_SIZE == -1. (With the
plugin version at the time of this PR, by default, eager mode is disabled)
- Provider must use FI_PROGRESS_AUTO data progress model
@yexiang-aws yexiang-aws force-pushed the feature/LL128_early_completion branch from 7cecc37 to d869312 Compare March 11, 2025 20:32
@yexiang-aws
Copy link
Contributor Author

bot:aws:retest

@yexiang-aws
Copy link
Contributor Author

AWS CI failed due to infra issue on 4_g4dn_ubuntu2204 Jenkins. Other platforms are passing

==================================== ERRORS ====================================
_______ ERROR at setup of test_run_nccl_tests[0x0-base-all_gather_perf] ________
/home/ubuntu/PortaFiducia/tests/conftest.py:526: in aws_ofi_nccl_install_path
    return install_aws_ofi_nccl_plugin("nvidia", mpi=mpi, checkout_target=aws_ofi_nccl_plugin_version, enable_debug=enable_debug, source_repo=testing_config["test_aws_ofi_nccl_repo"])
/home/ubuntu/PortaFiducia/libraries/aws_ofi_nccl/install.py:78: in install_aws_ofi_nccl_plugin
    raise RuntimeError("Source location {} already exists!".format(source_path))
E   RuntimeError: Source location /home/ubuntu/PortaFiducia/build/libraries/aws_ofi_nccl/797/openmpi/4.1.6/source already exists!```

@yexiang-aws yexiang-aws merged commit a97f82b into aws:master Mar 11, 2025
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants