feat: import resources from git repo or archive file #628
base: dev
Conversation
Force-pushed b933b2d to e39cb71
Basic import functionality is now working. To test it with the included hello world plugin, go to the … For git, set … For file upload, first create an archive containing the necessary files (from the root dioptra dir in this branch):

tar -czf plugins.tar.gz -C . plugins examples/hello-world.yaml dioptra.toml

then set …
Force-pushed f27ae03 to b4372ce
Force-pushed b4372ce to 634c16e
Force-pushed b1fcb42 to 5099e70
Force-pushed 8447f70 to 695ff9b
Converting back to Draft. The client needs to be updated to support the new workflow, and the tests should be updated to use the client. Blocked by #694
Force-pushed 31f1f66 to fc97c56
Force-pushed c9e02a3 to 16d3401
Force-pushed 16d3401 to c5300d6
This commit adds a new workflow to the Dioptra REST API for importing resources into a deployment. Resources (Plugins, Entrypoints, and PluginParameterTypes) can be imported via a combination of a TOML config file, YAML files, and Python source files. They can be sourced from a multi-file upload, an archive file upload, or a git repository. This commit defines a new TOML config format and includes a JSON schema for validation. It updates the client to support the new import workflow and adds a new test suite that covers the new functionality.

This commit also includes a new "Hello World" example consisting of a Jupyter notebook which uses a provided plugin and entrypoint. The notebook makes use of the new import functionality for registering example resources with a deployment.

The `dioptra-resources.toml` and "Hello World" entrypoint and plugin have been organized under the `extra` directory. This is intended to be the new structure for any entrypoints and plugins that are distributed as part of the Dioptra codebase.
Force-pushed c244c2a to 2cbbb5a
The core behavior and functionality worked for me when I tested things, so that's good: I am able to reproduce things on my machine. Besides some refactoring suggestions, which should be straightforward, I noticed a few places where we weren't validating user-provided input, such as whether the user-provided filepaths for an import are legal. I left commentary pointing out what we should do and, where applicable, examples already in the code base that you can refer to.
extra/plugins/hello_world/tasks.py
Outdated
LOGGER = structlog.get_logger()

@pyplugs.register()
While it works if you include the parentheses, my IDE's static type checker does not like including them. This should also work:
- @pyplugs.register()
+ @pyplugs.register
fixed here: 3bd3e2a
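For context, decorators can be written so that both spellings work. The following is a sketch of the usual dual-use pattern, not pyplugs' actual implementation (the `register` helper and `_registered_name` attribute here are made up for illustration):

```python
def register(func=None, *, name=None):
    """Decorator usable both bare (@register) and called (@register(name=...)).

    Sketch of the dual-use pattern; NOT pyplugs' real code.
    """
    def wrap(f):
        # Record the function under either the given name or its own name.
        f._registered_name = name or f.__name__
        return f

    if func is None:
        # Called with arguments, e.g. @register(name=...): return the decorator.
        return wrap
    # Used bare, e.g. @register: func is the decorated function itself.
    return wrap(func)


@register
def hello():
    return "Hello"


@register(name="world_task")
def world():
    return "World"
```

Supporting the bare form avoids the static-checker complaint while keeping the parenthesized form backward compatible.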
extra/dioptra-resources.toml
Outdated
Wanted to verify whether there was a preference for giving this the longer name `dioptra-resources.toml` instead of the simpler `dioptra.toml` that is implied by the default value in the API.
No strong preference either way for me, but I should keep it consistent. Should I just stick with `dioptra.toml`?
I just made it `dioptra.toml`: 7f51ce1
git_env = {"GIT_TERMINAL_PROMPT": "0"}
git_sparse_args = ["--filter=blob:none", "--no-checkout", "--depth=1"]
git_branch_args = ["-b", git_branch] if git_branch else []
clone_cmd = ["git", "clone", *git_sparse_args, *git_branch_args, git_url, str(dir)]
Somewhere we need to do a check that the `git` command exists, and raise an informative error message if it doesn't. The `shutil.which` function is one way to do this (it also tries to find the absolute path to the command, which I think is safer to pass to a subprocess call). Here's an example from `tests/unit/restapi/lib/server.py`:

dioptra/tests/unit/restapi/lib/server.py, lines 112 to 118 in 9e56076
cmd = shutil.which("flask")
if cmd is None:
    raise RuntimeError("Flask command not found.")
return cmd
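Applied to the clone step, the guard might look like the sketch below. The function name `clone_repo` and its arguments are illustrative, not the PR's actual code; only the `shutil.which` check and the env/args from the quoted diff are taken from the discussion:

```python
import os
import shutil
import subprocess


def clone_repo(git_url: str, dest: str) -> None:
    """Sketch: fail fast with an informative error when git is missing."""
    git_cmd = shutil.which("git")
    if git_cmd is None:
        raise RuntimeError(
            "git command not found; install git to import from a repository."
        )
    # Pass the resolved absolute path rather than the bare name "git",
    # and disable interactive credential prompts for server-side use.
    subprocess.run(
        [git_cmd, "clone", "--depth=1", git_url, dest],
        check=True,
        env={**os.environ, "GIT_TERMINAL_PROMPT": "0"},
    )
```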
src/dioptra/client/workflows.py
Outdated
@overload
def import_resources(
    self,
    git_url: str,
    config_path: str | None = "dioptra.toml",
    resolve_name_conflicts_strategy: Literal["fail", "overwrite"] | None = "fail",
) -> DioptraResponseProtocol:
    """Signature for using import_resources from a git repo"""
    ...  # pragma: nocover

@overload
def import_resources(
    self,
    archive_file: DioptraFile,
    config_path: str | None = "dioptra.toml",
    resolve_name_conflicts_strategy: Literal["fail", "overwrite"] | None = "fail",
) -> DioptraResponseProtocol:
    """Signature for using import_resources from an archive file"""
    ...  # pragma: nocover

@overload
def import_resources(
    self,
    files: list[DioptraFile],
    config_path: str | None = "dioptra.toml",
    resolve_name_conflicts_strategy: Literal["fail", "overwrite"] | None = "fail",
) -> DioptraResponseProtocol:
    """Signature for using import_resources from a list of files"""
    ...  # pragma: nocover

def import_resources(
    self,
    group_id,
    git_url=None,
    archive_file=None,
    files=None,
    config_path="dioptra.toml",
    resolve_name_conflicts_strategy="fail",
):
While giving feedback on #719, I revisited the use and purpose of the `@overload` decorator, and my understanding is now that it's primarily used to disambiguate function signatures like the following:

def myfunc(a: str | int) -> str | float:
    ...

when, for example, you know that the only outcomes are `str` returns `str` or `int` returns `float`. The `import_resources` method always returns `DioptraResponseProtocol`, so there's nothing to disambiguate. My IDE's static type checker is yelling that it doesn't agree with the use of `@overload` here either.

Basically, I don't think we can use `@overload` to signify that the `git_url`, `archive_file`, and `files` arguments are mutually exclusive. I think I was the one that suggested using `@overload` like this in the first place, so that's on me.
Looking at this again, the type of each of `git_url`, `archive_file`, and `files` is unique, indicating that the type itself can be used to determine what you're uploading. What if we just do something like this for the function signature:

def import_resources(
    self,
    group_id: int,
    source: str | DioptraFile | list[DioptraFile],
    config_path: str = "dioptra.toml",
    resolve_name_conflicts_strategy: Literal["fail", "overwrite"] = "fail",
) -> DioptraResponseProtocol:
And then lines 189-199 could be updated as follows:
if not isinstance(source, (str, DioptraFile, list)):
    raise SomeError
if isinstance(source, list):
    if not all([isinstance(x, DioptraFile) for x in source]):
        raise SomeError
    data["sourceType"] = "upload_files"
    files_["files"] = source
if isinstance(source, str):
    data["sourceType"] = "git"
    data["gitUrl"] = source
if isinstance(source, DioptraFile):
    data["sourceType"] = "upload_archive"
    files_["archiveFile"] = source
What do you think?
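For illustration, the type-based dispatch can be sketched in isolation like this. `DioptraFile` is stubbed as a plain dataclass here (the real class lives in the Dioptra client package), and `build_request` is a hypothetical helper, not the client's actual method:

```python
from dataclasses import dataclass


@dataclass
class DioptraFile:
    """Stand-in for the client's DioptraFile; illustrative only."""
    filename: str
    contents: bytes


def build_request(source):
    """Dispatch on source's type: the three upload modes are mutually
    exclusive because their types are distinct."""
    data, files_ = {}, {}
    if isinstance(source, list) and all(isinstance(x, DioptraFile) for x in source):
        data["sourceType"] = "upload_files"
        files_["files"] = source
    elif isinstance(source, str):
        data["sourceType"] = "git"
        data["gitUrl"] = source
    elif isinstance(source, DioptraFile):
        data["sourceType"] = "upload_archive"
        files_["archiveFile"] = source
    else:
        raise TypeError("source must be a str, DioptraFile, or list[DioptraFile]")
    return data, files_
```

This keeps a single signature while still rejecting mixed or invalid inputs at runtime.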
took your suggestion: 614f5db
if config_path is not None:
    data["configPath"] = config_path

if resolve_name_conflicts_strategy is not None:
    data["resolveNameConflictsStrategy"] = resolve_name_conflicts_strategy
Are these arguments optional in the API itself, filling in their own defaults? If so, then maybe the default values for `config_path` and `resolve_name_conflicts_strategy` should both be `None`.
They do have defaults in the API, so they could default to None here and have the same behavior. I replicated the defaults in the client, so it is clear to the user what the default behavior is without needing to look up the API docs. However, this may be inconsistent with other parts of the client.
bytes = file.stream.read()
with open(working_dir / file.filename, "wb") as f:
    f.write(bytes)
hashes = hashes + sha256(bytes).digest()
Is this hash independent of the order you pass the list of files? I don't think it is.
You're right. Attempted to resolve this by sorting by file path: 844cb1c. I'm not sure if there is a better strategy.
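Sorting does make the digest order-independent. A minimal sketch of the idea (a hypothetical helper, not the PR's exact code): hash files in sorted-name order, and bind each name to its content so that swapping contents between two files still changes the result.

```python
from hashlib import sha256


def combined_digest(files: dict[str, bytes]) -> str:
    """Digest of a set of named files that is stable under reordering.

    Hashing (name, content) pairs in sorted-name order means the upload
    order cannot affect the result, while a content swap between files
    still produces a different digest.
    """
    h = sha256()
    for name in sorted(files):
        h.update(sha256(name.encode("utf-8")).digest())
        h.update(sha256(files[name]).digest())
    return h.hexdigest()
```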
jsonschema.validate(config, schema)

# all resources are relative to the config file directory
os.chdir((working_dir / config_path).parent)
If we must change the directory, we should control it via a context manager. I don't see this reversed anywhere, so its long-term effect on the REST API's processes/threads could be unpredictable. The SDK has the following context manager; if we don't want to import from that module within the restapi, we can duplicate it into the restapi's utils.py:
dioptra/src/dioptra/sdk/utilities/paths/set_cwd.py, lines 23 to 39 in 3b81da4

@contextlib.contextmanager
def set_cwd(temp_cwd: Union[str, Path]) -> Iterator[None]:
    """
    A convenient way to temporarily set the current working directory to a
    particular directory, while executing a particular block of code. Just
    place the code into a with-statement which uses this function as a
    contextmanager.

    Args:
        temp_cwd: The directory to temporarily set as current
    """
    saved_cwd = Path.cwd()
    os.chdir(temp_cwd)
    try:
        yield
    finally:
        os.chdir(saved_cwd)
tests/unit/restapi/v1/conftest.py
Outdated
@pytest.fixture
def resources_tar_file() -> DioptraFile:
    os.chdir(Path(__file__).absolute().parent / "workflows" / "resource_import_files")
Like mentioned elsewhere, I'd use a contextmanager to control the lifetime of a changed directory:
dioptra/src/dioptra/sdk/utilities/paths/set_cwd.py, lines 23 to 39 in 3b81da4 (quoted above)
tests/unit/restapi/v1/conftest.py
Outdated
@pytest.fixture
def resources_files() -> DioptraFile:
    os.chdir(Path(__file__).absolute().parent / "workflows" / "resource_import_files")
Like mentioned elsewhere, I'd use a contextmanager to control the lifetime of a changed directory:
dioptra/src/dioptra/sdk/utilities/paths/set_cwd.py, lines 23 to 39 in 3b81da4 (quoted above)
response_types = set(param["name"] for param in response.json()["data"])
expected_types = set(param["name"] for param in expected["plugin_param_types"])
assert (
    response.status_code == 200
Use something like this for status codes:

from http import HTTPStatus

assert response.status_code == HTTPStatus.OK
Good catch. I think this code may have actually been written before Harold refactored the errors module and took Andy's suggestion of using the HTTPStatus codes.
resolves: