Add option to run the datachain query to Studio #579
The options are:

positional arguments:
- `query_file`: The query file to run.

options:
- `--team TEAM`: The team to run a job for. By default, it will use team from config.
- `--env-file ENV_FILE`: File containing environment variables to set for the job.
- `--envs ENVS [ENVS ...]`: Environment variables to set for the job.
- `--workers WORKERS`: Number of workers to use for the job.
- `--files FILES [FILES ...]`: Files to include in the job.
- `--python-version PYTHON_VERSION`: Python version to use for the job (e.g. '3.9', '3.10', '3.11').
- `--req-file REQ_FILE`: File containing Python package requirements.
- `--reqs REQS [REQS ...]`: Python package requirements.

Example run:
------------

Example script to run

```sh
$ datachain studio run example_query.py --env-file=env_file.txt --envs="ENV_FROM_ARGS=1" --workers=2 --files file.txt --python-version=3.12 --req-file=reqs.txt --reqs="oneliners"
```

Files:
------

`run/env_file.txt`:
```
ENV_FROM_FILE = 'environments.txt'
```

`run/file.txt`
```
content from file
```

`run/reqs.txt`
```
pyjokes
```

`run/example_query.py`
```py
from datachain import DataChain
from os import environ
from oneliners import get_random
import pyjokes


# Define the UDF:
def path_len(path):
    if path.endswith(".json"):
        return (-1,)
    return (len(path),)


if __name__ == "__main__":
    # Run in chain
    print("Environment set from file:", environ["ENV_FROM_FILE"])
    print("Environment set from args:", environ["ENV_FROM_ARGS"])
    print("Oneliners from reqs(args):", get_random())
    print("Joke from pyjokes:(from reqs file)", pyjokes.get_joke())
    print("Content from files(args):", open("file.txt").read())
    DataChain.from_storage(
        uri="gs://datachain-demo/dogs-and-cats/",
    ).map(
        path_len,
        params=["file.path"],
        output={"path_len": int},
    ).show()
```

TODO:
- Rename the argument names to better names
- Add tests
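To see how the list-valued flags behave, here is a minimal `argparse` sketch rebuilt from the help text above. It is an illustration, not the parser from this PR: the function name `build_run_parser` is hypothetical, and `type=int` for `--workers` is an assumption.

```python
import argparse


def build_run_parser() -> argparse.ArgumentParser:
    """Hypothetical reconstruction of the options listed in the description."""
    parser = argparse.ArgumentParser(prog="datachain studio run")
    parser.add_argument("query_file", help="The query file to run.")
    parser.add_argument("--team", help="The team to run a job for.")
    parser.add_argument("--env-file", help="File containing environment variables.")
    # nargs="+" collects one or more values into a list.
    parser.add_argument("--envs", nargs="+", help="Environment variables to set.")
    # Assumption: workers is parsed as an integer.
    parser.add_argument("--workers", type=int, help="Number of workers to use.")
    parser.add_argument("--files", nargs="+", help="Files to include in the job.")
    parser.add_argument("--python-version", help="Python version to use for the job.")
    parser.add_argument("--req-file", help="File containing package requirements.")
    parser.add_argument("--reqs", nargs="+", help="Python package requirements.")
    return parser


# Same arguments as the example run above, already split the way a shell would.
args = build_run_parser().parse_args(
    [
        "example_query.py",
        "--env-file=env_file.txt",
        "--envs=ENV_FROM_ARGS=1",
        "--workers=2",
        "--files",
        "file.txt",
        "--python-version=3.12",
        "--req-file=reqs.txt",
        "--reqs=oneliners",
    ]
)
print(args.envs, args.files, args.reqs)
```

Note that `--envs=ENV_FROM_ARGS=1` is split only at the first `=`, so the value reaches the job as the single list entry `ENV_FROM_ARGS=1`.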
Deploying datachain-documentation
Latest commit: 1feefc7
Status: ✅ Deploy successful!
Preview URL: https://16045bd7.datachain-documentation.pages.dev
Branch Preview URL: https://amrit-create-job.datachain-documentation.pages.dev
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #579 +/- ##
==========================================
- Coverage 87.71% 87.70% -0.01%
==========================================
Files 112 112
Lines 10694 10753 +59
Branches 1439 1448 +9
==========================================
+ Hits 9380 9431 +51
- Misses 954 956 +2
- Partials 360 366 +6
Looks good to me.
In follow-up issues/PRs we may want to stream state/logs/progress from Studio.
Also, how can we check job execution status? Maybe a first step would be to add a `job status` command, and a `job logs` command before streaming?
src/datachain/cli.py
Outdated
studio_run_parser.add_argument(
    "--envs",
    nargs="+",
    help="Environment variables to set for the job.",
Do we want to be more verbose on how to set env variables via the `--envs` flag? Also consider renaming this to `--env`.
Agreed on the naming (and probably `--req`).
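For reference, a small sketch of how `KEY=VALUE` entries could be validated before they are sent. It assumes each `--envs` entry has the `KEY=VALUE` shape used in the example run (`ENV_FROM_ARGS=1`); `parse_env_pairs` is a hypothetical helper, not something this PR adds.

```python
def parse_env_pairs(entries):
    """Turn ["KEY=VALUE", ...] into a dict (hypothetical helper)."""
    env = {}
    for entry in entries:
        # Split at the first "=" only, so values may themselves contain "=".
        key, sep, value = entry.partition("=")
        if not sep or not key:
            raise ValueError(f"expected KEY=VALUE, got {entry!r}")
        env[key] = value
    return env


print(parse_env_pairs(["ENV_FROM_ARGS=1", "URL=a=b"]))
```

A check like this would also give the help text something concrete to describe, for example "one or more `KEY=VALUE` pairs".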
src/datachain/cli.py
Outdated
studio_run_parser.add_argument(
    "--reqs",
    nargs="+",
    help="Python package requirements.",
Same here, more information on how to add requirements from command-line flag would be nice to have.
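One possible shape for a more descriptive flag, sketched with plain `argparse`. The singular name `--req` follows the rename suggested in the thread above; the repeatable `action="append"` form, the `PACKAGE` metavar, and the help wording are assumptions for illustration only.

```python
import argparse

parser = argparse.ArgumentParser(prog="datachain studio run")
parser.add_argument(
    "--req",
    action="append",  # each occurrence of the flag adds one entry to the list
    default=[],
    metavar="PACKAGE",
    help=(
        "Python package requirement in pip syntax, e.g. --req 'pandas>=2'. "
        "Repeat the flag to add several packages."
    ),
)

args = parser.parse_args(["--req", "pyjokes", "--req", "pandas>=2"])
print(args.req)
```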
    reqs: Optional[str] = None,
    req_file: Optional[str] = None,
):
    query_type = "PYTHON" if query_file.endswith(".py") else "SHELL"
This might be a problem in the future. I think we should let the user choose the file type, either by an additional argument or even by a separate command.
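A minimal sketch of the "additional argument" variant: an explicit value wins, and the extension check from the snippet above stays as the fallback. The function `resolve_query_type` and its `explicit` parameter are hypothetical; only the `"PYTHON"` and `"SHELL"` values come from the diff.

```python
from typing import Optional

VALID_TYPES = {"PYTHON", "SHELL"}


def resolve_query_type(query_file: str, explicit: Optional[str] = None) -> str:
    """Pick the query type, preferring an explicit user choice (hypothetical)."""
    if explicit is not None:
        kind = explicit.upper()
        if kind not in VALID_TYPES:
            raise ValueError(f"unknown query type: {explicit!r}")
        return kind
    # Fallback: same extension-based guess as in the reviewed code.
    return "PYTHON" if query_file.endswith(".py") else "SHELL"


print(resolve_query_type("example_query.py"))
print(resolve_query_type("job.txt", explicit="python"))
```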
with open(query_file) as f:
    query = f.read()
Can we check that this is a valid Python file before sending it, maybe?
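One lightweight way to do such a check is a syntax-only parse with the standard library's `ast.parse`, which does not execute the script. This is a sketch under assumptions: `check_python_source` is a hypothetical helper, and the check uses the local interpreter's grammar, which may differ from the `--python-version` requested for the job.

```python
import ast


def check_python_source(source: str, filename: str = "<query>") -> None:
    """Raise ValueError if `source` does not parse as Python (hypothetical)."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError as exc:
        raise ValueError(
            f"{filename} is not valid Python: {exc.msg} (line {exc.lineno})"
        ) from exc


check_python_source("print('ok')\n")  # parses fine, returns None

try:
    check_python_source("def broken(:\n", filename="bad.py")
except ValueError as err:
    print(err)
```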
environment = "\n".join(envs) if envs else ""
if env_file:
    with open(env_file) as f:
        environment = f.read() + "\n" + environment
Empty line in the beginning if no `envs` defined, but that may be OK.
requirements = "\n".join(reqs) if reqs else ""
if req_file:
    with open(req_file) as f:
        requirements = f.read() + "\n" + requirements
Same: empty line in the beginning if no `reqs` defined.
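Both snippets concatenate with an unconditional `"\n"`, so a stray blank line can appear whenever one of the two sources is empty. A small sketch that joins only the non-empty parts, usable for environment variables and requirements alike; `join_lines` is a hypothetical helper, not code from this PR.

```python
def join_lines(file_text, extra):
    """Merge file content and command-line entries without blank lines.

    `file_text` is the content read from the file (or None) and `extra` is
    the list of values given on the command line (or None).
    """
    parts = []
    if file_text and file_text.strip():
        # Drop leading/trailing newlines so the join adds exactly one.
        parts.append(file_text.strip("\n"))
    if extra:
        parts.extend(extra)
    return "\n".join(parts)


print(repr(join_lines("pyjokes\n", ["oneliners"])))
print(repr(join_lines("pyjokes\n", None)))
```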
Companion PR: https://github.com/iterative/studio/pull/10897