
nvshmem 3.1.7 #28647

Open: wants to merge 4 commits into main

billysuh7 (Contributor) commented Dec 19, 2024

Checklist

  • Title of this PR is meaningful: e.g. "Adding my_nifty_package", not "updated meta.yaml".
  • License file is packaged (see here for an example).
  • Source is from official source.
  • Package does not vendor other packages. (If a package uses the source of another package, they should be separate packages or the licenses of all packages need to be packaged).
  • If static libraries are linked in, the license of the static library is packaged.
  • Package does not ship static libraries. If static libraries are needed, follow CFEP-18.
  • Build number is 0.
  • A tarball (url) rather than a repo (e.g. git_url) is used in your recipe (see here for more details).
  • GitHub users listed in the maintainer section have posted a comment confirming they are willing to be listed there.
  • When in trouble, please check our knowledge base documentation before pinging a team.

Xref: #28111

Please also refer to https://docs.nvidia.com/nvshmem/release-notes-install-guide/install-guide/abstract.html for the NVSHMEM requirements, and for which of them this package was or was not able to meet.

Contributor

Hi! This is the staged-recipes linter and your PR looks excellent! 🚀

conda-forge-admin (Contributor)

Hi! This is the friendly automated conda-forge-linting service.

I wanted to let you know that I linted all conda-recipes in your PR (recipes/nvshmem/meta.yaml, recipes/libnvshmem/meta.yaml) and found some lint.

Here's what I've got...

For recipes/libnvshmem/meta.yaml:

  • ❌ The license item is expected in the about section.

For recipes/libnvshmem/meta.yaml:

  • ℹ️ The recipe is not parsable by parser conda-souschef (grayskull). This parser is not currently used by conda-forge, but may be in the future. We are collecting information to see which recipes are compatible with grayskull.

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/12416875254. Examine the logs at this URL for more detail.

billysuh7 force-pushed the topic/bsuh/libnvshmem branch from c71139b to 19e4343 (December 19, 2024 16:40)
conda-forge-admin (Contributor) commented Dec 19, 2024

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipes/nvshmem/meta.yaml, recipes/libnvshmem/meta.yaml) and found it was in an excellent condition.

I do have some suggestions for making it better though...

For recipes/libnvshmem/meta.yaml:

  • ℹ️ The recipe is not parsable by parser conda-souschef (grayskull). This parser is not currently used by conda-forge, but may be in the future. We are collecting information to see which recipes are compatible with grayskull.

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/12835109483. Examine the logs at this URL for more detail.

billysuh7 force-pushed the topic/bsuh/libnvshmem branch from 19e4343 to 40d4c1f (December 19, 2024 21:17)
billysuh7 force-pushed the topic/bsuh/libnvshmem branch from 40d4c1f to a82dd91 (December 19, 2024 21:54)
rdma-core: provided by MOFED
openmpi: DOE/DOD supercomputing clusters already have their own MPI installations
libfabric: Slingshot and EFA NICs use a custom version and plugin
test:
  commands:
    - test -f $PREFIX/lib/libnvshmem_device.a
    - test -f $PREFIX/lib/libnvshmem.a


If there are any size constraints on the package, we can leave libnvshmem.a out.


Can we also add the bitcode library now? That might be easier than doing another pass. The first 3.2 builds are now in URM:

https://urm.nvidia.com/ui/repos/tree/General/sw-nvshmem-generic-local/NVSHMEM/gpu_comms_cuda12.0_compatibility/3.2.2/libnvshmem_cuda12-linux-x86_64-3.2.2.tar.gz

billysuh7 (Contributor Author), Jan 16, 2025


The size is about 130MB, and I understand conda generally discourages static packages, so I'll take them out. As for the bitcode library: this PR is for nvshmem 3.1.7, so once it has gone through vetting and is released for conda, I can work on the next version and include the bitcode library at that time.


Sorry, I meant just libnvshmem.a. libnvshmem_device.a likely has to stay for JIT compilation; this was communicated to me by cuBLASMp.

billysuh7 (Contributor Author)

I see. Anyway, I just realized that out of the 130MB, the static conda package only takes up 12MB, so it is inconsequential either way :) For now, I'll take out libnvshmem.a.
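The thread concludes that the package keeps libnvshmem_device.a and drops libnvshmem.a. Below is a minimal sketch of a matching check. It is written as a hypothetical shell helper: `check_static_libs` is an illustrative name and is not part of the recipe. The helper takes the install prefix as its argument. The absence check for libnvshmem.a is an assumed way to guard the change, not a command taken from this PR.

```shell
#!/bin/sh
# Hypothetical helper (not from the recipe): verify the static-library
# layout agreed on in this review, given an install prefix as $1.
check_static_libs() {
  prefix="$1"
  # libnvshmem_device.a stays in the package because it is needed for JIT compilation.
  test -f "$prefix/lib/libnvshmem_device.a" || return 1
  # libnvshmem.a is intentionally dropped, so fail if it is present.
  test ! -f "$prefix/lib/libnvshmem.a" || return 1
  return 0
}
```

During a conda-build test phase, where `$PREFIX` points at the test environment, this would be called as `check_static_libs "$PREFIX"`.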

name: libnvshmem-split
version: {{ version }}

source:


I have checked to make sure the files are in the required places.

Let me run an additional test to make sure the binaries can find the libraries correctly.


I have confirmed the following:

  • Various NVSHMEM bootstraps can be used, as long as LD_LIBRARY_PATH is set to include the library directories of the desired packages.
  • The bootstraps are found automatically by the library.
  • The performance tests in src can be compiled against NVSHMEM, and the kitmaker libraries work properly.
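The recipe notes that clusters usually bring their own MPI. Together with that, the first point can be illustrated with a hypothetical shell helper. `use_external_mpi` and the library path passed to it are illustrative and do not come from the PR. The helper prepends a site MPI's library directory to LD_LIBRARY_PATH and selects the MPI bootstrap through NVSHMEM_BOOTSTRAP. That variable name and its `MPI` value are taken from NVSHMEM's environment-variable documentation as we understand it, so confirm them against the 3.1.7 docs.

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): point NVSHMEM at a site-provided MPI.
# $1 is the library directory of the cluster's own MPI installation.
use_external_mpi() {
  # Prepend the MPI lib dir, keeping any existing LD_LIBRARY_PATH entries.
  export LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
  # Ask NVSHMEM to initialize through its MPI bootstrap (assumed variable/value).
  export NVSHMEM_BOOTSTRAP=MPI
}
```

For example, run `use_external_mpi /path/to/site/mpi/lib` and then launch with the site's usual `mpirun`. The reviewer reports that the library locates its bootstrap plugins on its own. For that reason, the helper adds only the MPI library directory and not the NVSHMEM install directory.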

From my end, the configuration is good.

billysuh7 marked this pull request as ready for review (January 17, 2025 19:41)
billysuh7 (Contributor Author)

@conda-forge/cuda please review
