use 'sagemaker-python-sdk' instead of 'sagemaker' #504

Merged: 14 commits, merged on Feb 13, 2025
1 change: 1 addition & 0 deletions .gitignore
@@ -28,6 +28,7 @@ jupyter_execute/

# files manually written by example code
source/examples/rapids-azureml-hpo/Dockerfile
+source/examples/rapids-sagemaker-hpo/Dockerfile

# exclusions
!source/examples/rapids-1brc-single-node/lookup.csv
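The ignored `Dockerfile` paths are written by the example notebooks when they run. In a notebook this is commonly done with IPython's `%%writefile` cell magic, and `%%writefile -a` appends to an existing file; the HPO notebook's "append the remaining pieces of the Dockerfile" step suggests the latter, though that is an assumption. The plain-Python sketch below mirrors that write-then-append pattern. The function name and the Dockerfile fragments are hypothetical.

```python
import tempfile
from pathlib import Path


def write_then_append(path: Path, base: str, extra: str) -> str:
    """Write `base` to `path`, then append `extra` (like %%writefile, then %%writefile -a)."""
    path.write_text(base)  # overwrite any existing file, like %%writefile
    with path.open("a") as f:  # append mode, like %%writefile -a
        f.write(extra)
    return path.read_text()


if __name__ == "__main__":
    # Hypothetical fragments; the real ones come from the example notebook.
    base = "FROM example/base-image:tag\n"
    extra = 'ENV CLOUD_PATH="/opt/ml/code"\n'
    with tempfile.TemporaryDirectory() as tmp:
        print(write_then_append(Path(tmp) / "Dockerfile", base, extra))
```

Because the file is regenerated on every run, keeping it in `.gitignore` avoids committing a stale copy.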
15 changes: 7 additions & 8 deletions source/cloud/aws/sagemaker.md
@@ -10,16 +10,15 @@ RAPIDS can be used in a few ways with [AWS SageMaker](https://aws.amazon.com/sag

To get started head to [the SageMaker console](https://console.aws.amazon.com/sagemaker/) and create a [new SageMaker Notebook Instance](https://console.aws.amazon.com/sagemaker/home#/notebook-instances/create).

-Choose `Notebook > Notebook Instances > Create notebook instance`.
+Choose `Applications and IDEs > Notebooks > Create notebook instance`.

### Select your instance

If a field is not mentioned below, leave the default values:

-- **NOTEBOOK_INSTANCE_NAME** = Name of the notebook instance
-- **NOTEBOOK_INSTANCE_TYPE** = Type of notebook instance. Select a RAPIDS-compatible GPU ([see the RAPIDS docs](https://docs.rapids.ai/install#system-req)) as the SageMaker Notebook instance type (e.g., `ml.p3.2xlarge`).
-- **PLATFORM_IDENTIFIER** = 'Amazon Linux 2, Jupyter Lab 3'
-- **IAM_ROLE** = Create a new role > Create role
+- **Notebook instance name** = Name of the notebook instance
+- **Notebook instance type** = Type of notebook instance. Select a RAPIDS-compatible GPU ([see the RAPIDS docs](https://docs.rapids.ai/install#system-req)) as the SageMaker Notebook instance type (e.g., `ml.p3.2xlarge`).
+- **Platform identifier** = 'Amazon Linux 2, Jupyter Lab 4'
Review comment (Member, Author):

Changes here:

  • use the same case as the UI
  • remove "IAM Role" (these examples work with the default role added by SageMaker)
  • update to Amazon Linux 2, Jupyter Lab 4 (the latest platform SageMaker supports)


![Screenshot of the create new notebook screen with a ml.p3.2xlarge selected](../../images/sagemaker-create-notebook-instance.png)

@@ -29,7 +28,7 @@ If a field is not mentioned below, leave the default values:

We can add a RAPIDS conda environment to the set of Jupyter ipython kernels available in our SageMaker notebook instance by installing in a [lifecycle configuration script](https://docs.aws.amazon.com/sagemaker/latest/dg/notebook-lifecycle-config.html).

-Create a new lifecycle configuration (via the 'Additional Options' dropdown).
+Create a new lifecycle configuration (via the 'Additional Configuration' dropdown).

![Screenshot of the create lifecycle configuration screen](../../images/sagemaker-create-lifecycle-configuration.png)

@@ -42,10 +41,10 @@ set -e

sudo -u ec2-user -i <<'EOF'

-mamba create -y -n rapids {{ rapids_conda_channels }} {{ rapids_conda_packages }} \
+mamba create -y -n rapids {{ rapids_conda_channels }} {{ rapids_sagemaker_conda_packages }} \
boto3 \
ipykernel \
-sagemaker
+'sagemaker-python-sdk>=2.239.0'

conda activate rapids

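The docs create the lifecycle configuration in the console. The same object can presumably also be created through the SageMaker API: the boto3 SageMaker client has `create_notebook_instance_lifecycle_config`, which takes a configuration name plus `OnCreate` and/or `OnStart` lists of `{"Content": ...}` entries holding base64-encoded scripts. The sketch below assumes the script should run on every start (`OnStart`); the configuration name `rapids-example` and the placeholder script are hypothetical, and the real script must be the fully rendered one, with the `{{ ... }}` placeholders already filled in.

```python
import base64


def lifecycle_config_request(name: str, script: str) -> dict:
    """Build keyword arguments for SageMaker's CreateNotebookInstanceLifecycleConfig call."""
    # The API expects each lifecycle script as base64-encoded text.
    encoded = base64.b64encode(script.encode("utf-8")).decode("ascii")
    return {
        "NotebookInstanceLifecycleConfigName": name,
        # OnStart scripts run every time the instance starts; OnCreate runs only at creation.
        "OnStart": [{"Content": encoded}],
    }


def create_lifecycle_config(name: str, script: str) -> dict:
    """Send the request (needs boto3 and AWS credentials with SageMaker permissions)."""
    import boto3  # imported lazily so the request builder works without boto3

    client = boto3.client("sagemaker")
    return client.create_notebook_instance_lifecycle_config(
        **lifecycle_config_request(name, script)
    )


if __name__ == "__main__":
    # Hypothetical placeholder; substitute the rendered script from the docs.
    script = "#!/bin/bash\nset -e\necho 'install RAPIDS here'\n"
    print(lifecycle_config_request("rapids-example", script))
```

Keeping the request builder separate from the boto3 call makes the encoding step easy to inspect without touching AWS.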
8 changes: 7 additions & 1 deletion source/conf.py
@@ -34,16 +34,22 @@
"rapids_conda_packages": f"rapids={stable_version} python=3.12 cuda-version=12.5",
"rapids_pip_index": "https://pypi.nvidia.com",
"rapids_pip_version": stable_version,
+# SageMaker Notebook Instance examples need to stay pinned to an older RAPIDS until this is resolved:
+# https://github.com/rapidsai/deployment/issues/520
+"rapids_sagemaker_conda_packages": "rapids=24.12 python=3.12 cuda-version=12.5",
},
"nightly": {
-"rapids_version": f"{nightly_version}-nightly",
+"rapids_version": f"{nightly_version}",
Review comment (Member, Author):

Nothing in the repo relies on this -nightly suffix being there, as far as I can tell:

git grep rapids_version

In fact, this difference is causing bugs. For example, https://docs.rapids.ai/deployment/nightly/platforms/databricks/#install-rapids-and-dask tells users to install dask-cuda==25.02-nightly, which does not exist.

[Screenshot 2025-02-11 at 9 00 42 PM]

Reply (Member):

Yeah, I think this might be a holdover from a previous naming scheme. Fine to remove it.

"rapids_api_docs_version": "nightly",
"rapids_container": f"rapidsai/base:{nightly_version + 'a'}-cuda12.5-py3.12",
"rapids_notebooks_container": f"rapidsai/notebooks:{nightly_version + 'a'}-cuda12.5-py3.12",
"rapids_conda_channels": "-c rapidsai-nightly -c conda-forge -c nvidia",
"rapids_conda_packages": f"rapids={nightly_version} python=3.12 cuda-version=12.5",
"rapids_pip_index": "https://pypi.anaconda.org/rapidsai-wheels-nightly/simple",
"rapids_pip_version": f"{nightly_version}.*,>=0.0.0a0",
+# SageMaker Notebook Instance examples need to stay pinned to an older RAPIDS until this is resolved:
+# https://github.com/rapidsai/deployment/issues/520
+"rapids_sagemaker_conda_packages": "rapids=24.12 python=3.12 cuda-version=12.5",
},
}
rapids_version = (
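The bug in the review thread is a plain string-substitution effect: pages write `{{ rapids_version }}`, the docs build fills in the configured value, and so the `-nightly` suffix leaks into version pins. The sketch below reproduces that with a hypothetical `render` helper (the real substitution is done by the docs' Sphinx tooling, not this function). `25.02` stands in for `nightly_version`, matching the version in the review comment, and the `pip install dask-cuda==` line is an illustrative template, not copied from the Databricks page.

```python
import re


def render(template: str, context: dict) -> str:
    """Replace each {{ name }} placeholder with context[name] (hypothetical helper)."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: context[m.group(1)], template)


nightly_version = "25.02"  # assumed value, matching the version in the review comment

old_context = {"rapids_version": f"{nightly_version}-nightly"}  # before this PR
new_context = {"rapids_version": f"{nightly_version}"}  # after this PR

template = "pip install dask-cuda=={{ rapids_version }}"  # illustrative template line

if __name__ == "__main__":
    print(render(template, old_context))  # pip install dask-cuda==25.02-nightly
    print(render(template, new_context))  # pip install dask-cuda==25.02
```

Dropping the suffix at the source fixes every page that interpolates `rapids_version`, rather than patching individual pins.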
2 changes: 1 addition & 1 deletion source/examples/rapids-sagemaker-higgs/Dockerfile
@@ -7,7 +7,7 @@ RUN conda install --yes -n base \
cupy \
flask \
protobuf \
-sagemaker
+'sagemaker-python-sdk>=2.239.0'

# Copies the training code inside the container
COPY rapids-higgs.py /opt/ml/code/rapids-higgs.py
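Each changed install line now requires `sagemaker-python-sdk>=2.239.0`. Conda enforces that bound at install time, but a quick runtime check of the same bound can help when debugging an existing environment. The stdlib-only sketch below compares only the numeric release segment; `release_tuple` and `meets_minimum` are hypothetical helpers, and the sketch assumes the installed distribution is registered under the name `sagemaker` (the name of the PyPI package).

```python
import re
from importlib.metadata import PackageNotFoundError, version

MINIMUM = "2.239.0"  # the lower bound pinned in this PR


def release_tuple(v: str) -> tuple:
    """Parse the leading numeric release segment, e.g. '2.239.0' -> (2, 239, 0).

    Pre-release, post-release, and local tags are ignored, so '2.239.0rc1' is
    treated like '2.239.0'. The `packaging` library's Version class handles full
    PEP 440 ordering if that matters.
    """
    match = re.match(r"\d+(?:\.\d+)*", v)
    if match is None:
        raise ValueError(f"no numeric release segment in {v!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str = MINIMUM) -> bool:
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so '2.239' compares equal to '2.239.0'.
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


if __name__ == "__main__":
    try:
        # Assumption: the SDK's distribution metadata is registered as "sagemaker".
        installed = version("sagemaker")
    except PackageNotFoundError:
        print("SageMaker Python SDK not installed")
    else:
        print(installed, "OK" if meets_minimum(installed) else f"older than {MINIMUM}")
```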
2 changes: 1 addition & 1 deletion source/examples/rapids-sagemaker-higgs/notebook.ipynb
@@ -188,7 +188,7 @@
" cupy \\\n",
" flask \\\n",
" protobuf \\\n",
-" sagemaker\n",
+" 'sagemaker-python-sdk>=2.239.0'\n",
"\n",
"# Copies the training code inside the container\n",
"COPY rapids-higgs.py /opt/ml/code/rapids-higgs.py\n",
27 changes: 0 additions & 27 deletions source/examples/rapids-sagemaker-hpo/Dockerfile

This file was deleted.

9 changes: 6 additions & 3 deletions source/examples/rapids-sagemaker-hpo/notebook.ipynb
@@ -779,7 +779,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"Next let's append write the remaining pieces of the Dockerfile, namely adding the sagemaker-training-toolkit, flask, dask-ml, and copying our python code."
+"Next let's append the remaining pieces of the Dockerfile, namely adding dependencies and our Python code."
]
},
{
@@ -805,10 +805,13 @@
"\n",
"# install a few more dependencies\n",
"RUN conda install --yes -n base \\\n",
+" {{ rapids_conda_channels }} \\\n",
" cupy \\\n",
+" dask-ml \\\n",
Review comment (Member, Author):

This was missing, so the example code failed like this:

Traceback (most recent call last):
  File "/opt/ml/code/train.py", line 75, in <module>
    train()
  File "/opt/ml/code/train.py", line 27, in train
    ml_workflow = create_workflow(hpo_config)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/ml/code/MLWorkflow.py", line 43, in create_workflow
    from workflows.MLWorkflowMultiGPU import MLWorkflowMultiGPU
  File "/opt/ml/code/workflows/MLWorkflowMultiGPU.py", line 34, in <module>
    from dask_ml.model_selection import train_test_split
ModuleNotFoundError: No module named 'dask_ml'

" flask \\\n",
" protobuf \\\n",
-" sagemaker\n",
+" rapids-dask-dependency=${{ rapids_version }} \\\n",
Review comment (Member, Author):

This pin ensures that installing dask-ml doesn't upgrade or downgrade dask and distributed, which would in turn change the installed versions of the RAPIDS libraries.

+" 'sagemaker-python-sdk>=2.239.0'\n",
"\n",
"# path where SageMaker looks for code when container runs in the cloud\n",
"ENV CLOUD_PATH=\"/opt/ml/code\"\n",
@@ -855,7 +858,7 @@
" cupy \\\n",
" flask \\\n",
" protobuf \\\n",
-" sagemaker\n",
+" 'sagemaker-python-sdk>=2.239.0'\n",
"\n",
"# path where SageMaker looks for code when container runs in the cloud\n",
"ENV CLOUD_PATH=\"/opt/ml/code\"\n",
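The `ModuleNotFoundError` for `dask_ml` in the HPO notebook review only surfaced once the training job imported `workflows.MLWorkflowMultiGPU`. A preflight check run early, for example at container start, would report a missing package sooner. In the sketch below, `REQUIRED_MODULES` is a hypothetical list of top-level import names; note that the import name `dask_ml` differs from the conda package name `dask-ml`.

```python
import importlib.util

# Hypothetical list of top-level modules the HPO training code imports.
REQUIRED_MODULES = ["cupy", "dask_ml", "flask"]


def missing_modules(names):
    """Return the names that cannot be found on the current import path."""
    # find_spec locates a top-level module without importing it; None means not found.
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_environment(names=REQUIRED_MODULES):
    """Raise early, with every missing name, instead of failing mid-training."""
    missing = missing_modules(names)
    if missing:
        raise ModuleNotFoundError(f"missing modules: {', '.join(missing)}")


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("missing:", missing or "none")
```

For the container image itself, a Dockerfile line such as `RUN python -c "import dask_ml"` gives the same early failure at build time instead of at training time.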
Binary file modified source/images/sagemaker-create-lifecycle-configuration.png
Binary file modified source/images/sagemaker-create-notebook-instance.png