
Listing DaskGateway clusters created via Python code alongside those created via the dask labextension UI #204

Open
consideRatio opened this issue Jul 28, 2021 · 9 comments


@consideRatio

What happened:

I can create a dask-gateway cluster via the dask-labextension sidebar, and it is then visible in the cluster list there.

[Screenshot: starting a new dask cluster via the labextension sidebar]

But if I create a dask-gateway cluster from a notebook using code like the following, then no dask cluster shows up in the list of clusters.

from dask_gateway import Gateway
gateway = Gateway()
cluster = gateway.new_cluster()

My wish

My wish is that dask clusters created from code would also be listed in the extension's UI. I'm not sure whether this is possible, but I'd like to describe the wish here and explore whether we can make it happen one way or another.

Environment:

JupyterHub (1.1.1 Helm chart) + Dask-Gateway (0.9.0 Helm chart).

$ conda list | grep dask
dask                      2021.6.0           pyhd8ed1ab_0    conda-forge
dask-core                 2021.6.0           pyhd8ed1ab_0    conda-forge
dask-gateway              0.9.0            py38h578d9bd_0    conda-forge
dask-glm                  0.2.0                      py_1    conda-forge
dask-kubernetes           2021.3.1           pyhd8ed1ab_0    conda-forge
dask-labextension         5.0.2              pyhd8ed1ab_0    conda-forge
dask-ml                   1.9.0              pyhd8ed1ab_0    conda-forge
pangeo-dask               2021.06.05           hd8ed1ab_0    conda-forge
$ python --version
Python 3.8.10

Operating System: Ubuntu 20.04
Install method: conda-forge

# The current environment and dask configuration via environment
DASK_DISTRIBUTED__DASHBOARD_LINK=/user/{JUPYTERHUB_USER}/proxy/{port}/status
DASK_GATEWAY__ADDRESS=http://10.100.116.39:8000/services/dask-gateway/
DASK_GATEWAY__AUTH__TYPE=jupyterhub
DASK_GATEWAY__CLUSTER__OPTIONS__IMAGE={JUPYTER_IMAGE_SPEC}
DASK_GATEWAY__PROXY_ADDRESS=gateway://traefik-prod-dask-gateway.prod:80
DASK_GATEWAY__PUBLIC_ADDRESS=/services/dask-gateway/
DASK_LABEXTENSION__FACTORY__CLASS=GatewayCluster
DASK_LABEXTENSION__FACTORY__MODULE=dask_gateway
DASK_ROOT_CONFIG=/srv/conda/etc
@ian-r-rose
Collaborator

Thanks for raising this @consideRatio . In general, this is a hard problem, as dask doesn't really have a built-in cluster discovery method. Short of port sniffing, I'm not sure I know of a good way to handle auto-detecting any cluster in a given notebook (or set of notebooks). Indeed, part of the reason for creating the cluster manager sidebar in the first place was to be able to build some user interfaces around starting, stopping, and scaling clusters that the extension can actually keep track of and reason about.

That being said, my goal for this extension is to get out of the game of managing clusters directly, and instead investigate a solution like dask-ctl. This could allow different cluster providers to set up their own discovery and control services, which the labextension could then consume. There is some detailed discussion of this in #189, I encourage you to weigh in!

@jacobtomlinson
Member

I would really like dask-ctl to be the solution for this.

@dharhas

dharhas commented Apr 13, 2023

Related question: how do you configure the lab extension to use dask-gateway for creating new clusters? I can't find that anywhere in the docs, but clearly from the screenshot above it is possible.

@ian-r-rose
Collaborator

@dharhas I haven't tried it recently myself, but the configuration that @consideRatio posted above looks like the correct approach to me (though it could also be configured using a YAML file or the like):

# The current environment and dask configuration via environment
DASK_DISTRIBUTED__DASHBOARD_LINK=/user/{JUPYTERHUB_USER}/proxy/{port}/status
DASK_GATEWAY__ADDRESS=http://10.100.116.39:8000/services/dask-gateway/
DASK_GATEWAY__AUTH__TYPE=jupyterhub
DASK_GATEWAY__CLUSTER__OPTIONS__IMAGE={JUPYTER_IMAGE_SPEC}
DASK_GATEWAY__PROXY_ADDRESS=gateway://traefik-prod-dask-gateway.prod:80
DASK_GATEWAY__PUBLIC_ADDRESS=/services/dask-gateway/
DASK_LABEXTENSION__FACTORY__CLASS=GatewayCluster
DASK_LABEXTENSION__FACTORY__MODULE=dask_gateway
DASK_ROOT_CONFIG=/srv/conda/etc

In particular, the factory class and factory module options tell the labextension what to use when starting a new cluster.
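For completeness, here is a sketch of what the same configuration might look like as a YAML file, following dask's usual `DASK_A__B__C` environment-variable to `a.b.c` key mapping. The exact nesting and key spellings below are my assumptions, not taken verbatim from the docs:

```yaml
# Hypothetical dask config file (e.g. placed under the directory named by
# DASK_ROOT_CONFIG); mirrors the environment variables quoted above.
distributed:
  dashboard:
    link: "/user/{JUPYTERHUB_USER}/proxy/{port}/status"
gateway:
  address: "http://10.100.116.39:8000/services/dask-gateway/"
  public-address: "/services/dask-gateway/"
  proxy-address: "gateway://traefik-prod-dask-gateway.prod:80"
  auth:
    type: jupyterhub
  cluster:
    options:
      image: "{JUPYTER_IMAGE_SPEC}"
labextension:
  factory:
    module: dask_gateway
    class: GatewayCluster
```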

@minrk
Contributor

minrk commented Feb 5, 2025

I have a reason, and may have the time, to pick this up at the moment.

One approach is for the Manager class itself to be configurable, just like the factory_class.

The Manager right now has in-memory-only state in a private _clusters variable. With Gateway, the gateway has this state, so Manager.list_clusters() could call out to Gateway.list_clusters() and lose the local state (or keep the local state as a cache that gets refreshed by calls to list_clusters()).

Since Gateway is likely the main use case for this, implementing a single GatewayManager class here, rather than making the Manager fully configurable, would be a simpler (if less flexible) option.

The only tricky bit I've encountered is the scaling state. I don't quite understand how to get the manual scaling/adaptive state from the Gateway API, but surely there is a way?
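To make the cache-refresh idea above concrete, here is a minimal, self-contained sketch of the pattern. `FakeGateway` and `GatewayManager` are hypothetical stand-ins I made up for illustration, not the real dask_gateway or dask-labextension classes:

```python
# Sketch of "local state as a cache": list_clusters() rebuilds a private
# _clusters dict from the gateway on every call, so clusters created
# outside the extension (e.g. from a notebook) show up too.

class FakeGateway:
    """Stand-in for dask_gateway.Gateway; holds the authoritative state."""
    def __init__(self):
        self._server_side = {}

    def new_cluster(self, name):
        self._server_side[name] = {"name": name, "workers": 0}
        return self._server_side[name]

    def list_clusters(self):
        return list(self._server_side.values())


class GatewayManager:
    """Hypothetical manager whose local _clusters is only a cache."""
    def __init__(self, gateway):
        self.gateway = gateway
        self._clusters = {}  # cache, refreshed on every list_clusters()

    def list_clusters(self):
        # Pull the authoritative list from the gateway and rebuild the cache.
        self._clusters = {c["name"]: c for c in self.gateway.list_clusters()}
        return list(self._clusters.values())


gateway = FakeGateway()
manager = GatewayManager(gateway)

# A cluster created directly against the gateway (i.e. from a notebook) ...
gateway.new_cluster("notebook-cluster")

# ... is picked up by the manager on the next refresh.
names = [c["name"] for c in manager.list_clusters()]
print(names)  # -> ['notebook-cluster']
```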

@jacobtomlinson
Member

A few years ago I started an effort to standardise the discovery of existing clusters via dask-ctl. I would love an excuse to dust that off and make good use of it here.

@minrk
Contributor

minrk commented Feb 6, 2025

That sounds really cool! So gateway would need to implement discovery via https://dask-ctl.readthedocs.io/en/latest/integrating.html? This seems like it would be pretty easy, since discovery is just Gateway().list_clusters().

I've actually got this working already with a pretty small DaskGatewayManager subclass that doesn't do much other than call Gateway().list_clusters(), but I cannot for the life of me figure out why the worker count doesn't update. When I instantiate the exact same class in a notebook, the worker count updates just fine.
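For the dask-ctl route mentioned above, a gateway-side discovery hook might look roughly like the following. As I recall, dask-ctl's discovery entry points are async generators yielding (name, class) pairs, but the exact signature, and the `fake_list_clusters` stub standing in for `Gateway().list_clusters()`, are assumptions rather than the real dask-ctl or dask-gateway API:

```python
# Hedged sketch of a dask-ctl-style discovery hook for gateway clusters.
import asyncio


class GatewayCluster:  # stand-in for dask_gateway.GatewayCluster
    pass


def fake_list_clusters():
    # A real implementation would call Gateway().list_clusters() here.
    return [{"name": "abc123"}, {"name": "def456"}]


async def discover():
    # Yield one (name, class) pair per running cluster, so the consumer
    # can reconstruct a handle for each cluster from its name.
    for report in fake_list_clusters():
        yield report["name"], GatewayCluster


async def main():
    return [name async for name, cls in discover()]


names = asyncio.run(main())
print(names)  # -> ['abc123', 'def456']
```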

@jacobtomlinson
Member

Yeah I think wiring up dask-gateway and dask-ctl should be straightforward.

@minrk
Contributor

minrk commented Feb 7, 2025

fwiw, here's my temporary package which adds gateway support: https://github.com/minrk/dask-labextension-gateway

I'll try to investigate a PR here once I think I have a handle on what the 'right' way to do it is (as opposed to my Works for Me solution that I have right now).

As it is, I went with something that's either/or: instead of adding Gateway clusters alongside local ones, it's fully Gateway-only, unlike what using dask-ctl would be. That just happens to be what I want for what I'm working on right now.
