
feat: import resources from git repo or archive file #628

Open · wants to merge 19 commits into dev from workflow-import
Conversation

@keithmanville linked an issue on Aug 23, 2024 that may be closed by this pull request
@keithmanville force-pushed the workflow-import branch 2 times, most recently from b933b2d to e39cb71 on September 23, 2024
@keithmanville (Collaborator Author):

Basic import functionality is now working. To test it with the included hello world plugin, go to the /api/v1/workflows/resourceImport endpoint in Swagger, then either import via git or upload:

For git, set sourceType to "git" and gitUrl to https://github.com/usnistgov/dioptra#workflow-import.

For file upload, first create an archive containing the necessary files (from the root dioptra directory in this branch):

tar -czf plugins.tar.gz -C . plugins examples/hello-world.yaml dioptra.toml

Then set sourceType to "upload" and select the archive you created.

@keithmanville marked this pull request as ready for review on October 30, 2024
@keithmanville force-pushed the workflow-import branch 2 times, most recently from 8447f70 to 695ff9b on December 24, 2024
@keithmanville marked this pull request as draft on December 24, 2024
@keithmanville (Collaborator Author):

Converting back to draft.

The client needs to be updated to support the new workflow, and the tests should be updated to use the client.

Blocked by #694

This commit adds a new workflow to the Dioptra REST API for importing resources into a deployment. Resources (Plugins, Entrypoints, and PluginParameterTypes) can be imported via a combination of a TOML config file, YAML files, and Python source files. They can be sourced from a multi-file upload, an archive file upload, or a git repository. This commit defines a new TOML config format and includes a JSON schema for validation. It updates the client to support the new import workflow and adds a new test suite that covers the new functionality.

This commit also includes a new "Hello World" example consisting of a Jupyter notebook that uses a provided plugin and entrypoint. The notebook makes use of the new import functionality to register the example resources with a deployment.

The `dioptra-resources.toml` and "Hello World" entrypoint and plugin have been organized under the `extra` directory. This is intended to be the new structure for any entrypoints and plugins that are distributed as part of the Dioptra codebase.
@jkglasbrenner (Collaborator) left a comment:

The core behavior and functionality worked when I tested it, so I am able to reproduce things on my machine. Besides some refactoring suggestions, which should be straightforward, I noticed a few places where we weren't validating user-provided input, such as whether the user-provided filepaths for an import are legal. I left commentary pointing out what we should do and, where they exist, examples already in the code base that you can refer to.

LOGGER = structlog.get_logger()


@pyplugs.register()
Collaborator:

While it works if you include the parentheses, my IDE's static type checker does not like including them. This should also work:

Suggested change:
- @pyplugs.register()
+ @pyplugs.register

Collaborator Author:

fixed here: 3bd3e2a

Collaborator:

Wanted to verify whether there is a preference for giving this the longer name dioptra-resources.toml instead of the simpler dioptra.toml that is implied by the default value in the API.

Collaborator Author:

No strong preference either way for me, but I should keep it consistent.

Should I just stick with dioptra.toml?

Collaborator Author:

I just made it dioptra.toml: 7f51ce1

git_env = {"GIT_TERMINAL_PROMPT": "0"}
git_sparse_args = ["--filter=blob:none", "--no-checkout", "--depth=1"]
git_branch_args = ["-b", git_branch] if git_branch else []
clone_cmd = ["git", "clone", *git_sparse_args, *git_branch_args, git_url, str(dir)]
Collaborator:

Somewhere we need to check that the git command exists, and raise an informative error message if it doesn't. The shutil.which command is one way to do this (it also tries to find the absolute path to the command, which I think is safer to pass to a subprocess call). Here's an example from tests/unit/restapi/lib/server.py:

cmd = shutil.which("flask")
if cmd is None:
    raise RuntimeError("Flask command not found.")
return cmd
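
Applied to the clone code quoted above, such a check might look something like this sketch (the clone_git_repository helper name matches the function mentioned in the reply below; the RuntimeError and the environment merging are assumptions, not the actual PR code):

import os
import shutil
import subprocess
from pathlib import Path

def clone_git_repository(git_url: str, git_branch: str | None, dir: Path) -> None:
    # Hypothetical sketch: resolve an absolute path to git and fail fast
    # with an informative error if the command is not installed.
    git_cmd = shutil.which("git")
    if git_cmd is None:
        raise RuntimeError("git command not found.")

    git_sparse_args = ["--filter=blob:none", "--no-checkout", "--depth=1"]
    git_branch_args = ["-b", git_branch] if git_branch else []
    clone_cmd = [git_cmd, "clone", *git_sparse_args, *git_branch_args, git_url, str(dir)]
    # Merge GIT_TERMINAL_PROMPT=0 into the inherited environment so the
    # clone never blocks on a credentials prompt.
    subprocess.run(clone_cmd, env={**os.environ, "GIT_TERMINAL_PROMPT": "0"}, check=True)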

Collaborator Author:

Added this here: f258bc8.
Also added a docstring for the clone_git_repository function here: 90f4139.

Comment on lines 116 to 154
@overload
def import_resources(
    self,
    git_url: str,
    config_path: str | None = "dioptra.toml",
    resolve_name_conflicts_strategy: Literal["fail", "overwrite"] | None = "fail",
) -> DioptraResponseProtocol:
    """Signature for using import_resources from a git repo"""
    ...  # pragma: nocover

@overload
def import_resources(
    self,
    archive_file: DioptraFile,
    config_path: str | None = "dioptra.toml",
    resolve_name_conflicts_strategy: Literal["fail", "overwrite"] | None = "fail",
) -> DioptraResponseProtocol:
    """Signature for using import_resources from an archive file"""
    ...  # pragma: nocover

@overload
def import_resources(
    self,
    files: list[DioptraFile],
    config_path: str | None = "dioptra.toml",
    resolve_name_conflicts_strategy: Literal["fail", "overwrite"] | None = "fail",
) -> DioptraResponseProtocol:
    """Signature for using import_resources from a list of files"""
    ...  # pragma: nocover

def import_resources(
    self,
    group_id,
    git_url=None,
    archive_file=None,
    files=None,
    config_path="dioptra.toml",
    resolve_name_conflicts_strategy="fail",
):
@jkglasbrenner (Collaborator) · Feb 19, 2025:

While giving feedback on #719, I revisited the use and purpose of the @overload decorator, and my understanding now is that it's primarily used to disambiguate function signatures like the following:

def myfunc(a: str | int) -> str | float:
   ...

when, for example, you know that the only possible pairings are str in, str out and int in, float out.
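
Spelled out, the disambiguating overloads for that toy example would be:

from typing import overload

@overload
def myfunc(a: str) -> str: ...
@overload
def myfunc(a: int) -> float: ...

def myfunc(a: str | int) -> str | float:
    # Single runtime implementation; the overloads above only inform the
    # type checker about the input/output pairings.
    return a.upper() if isinstance(a, str) else float(a)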

The import_resources method always returns DioptraResponseProtocol, so there's nothing to disambiguate. My IDE's static type checker is yelling that it doesn't agree with the use of @overload here either.

Basically, I don't think we can use @overload to signify that the git_url, archive_file, and files arguments are mutually exclusive. I think I was the one that suggested using @overload like this in the first place, so that's on me.

Looking at this again, the type for each of git_url, archive_file, and files is unique, meaning the type itself can be used to determine what you're uploading. What if we just do something like this for the function signature:

def import_resources(
    self,
    group_id: int,
    source: str | DioptraFile | list[DioptraFile],
    config_path: str = "dioptra.toml",
    resolve_name_conflicts_strategy: Literal["fail", "overwrite"] = "fail",
) -> DioptraResponseProtocol:

And then lines 189-199 could be updated as follows:

if not isinstance(source, (str, DioptraFile, list)):
    raise SomeError

if isinstance(source, list):
    if not all([isinstance(x, DioptraFile) for x in source]):
        raise SomeError

    data["sourceType"] = "upload_files"
    files_["files"] = source

if isinstance(source, str):
    data["sourceType"] = "git"
    data["gitUrl"] = source

if isinstance(source, DioptraFile):
    data["sourceType"] = "upload_archive"
    files_["archiveFile"] = source

What do you think?
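
(For illustration, calls under this proposed signature would read like the following sketch; the client handle, group id, and file variables are hypothetical.)

# The source type is inferred from the Python type of the argument.
client.import_resources(group_id=1, source="https://github.com/usnistgov/dioptra")  # git repo
client.import_resources(group_id=1, source=archive_file)                            # archive upload
client.import_resources(group_id=1, source=[toml_file, plugin_file])                # multi-file upload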

Collaborator Author:

took your suggestion: 614f5db

Comment on lines +201 to +205
if config_path is not None:
    data["configPath"] = config_path

if resolve_name_conflicts_strategy is not None:
    data["resolveNameConflictsStrategy"] = resolve_name_conflicts_strategy
Collaborator:

Are these arguments optional in the API itself, with the API filling in its own defaults? If so, then maybe the default values for config_path and resolve_name_conflicts_strategy should both be None.

Collaborator Author:

They do have defaults in the API, so they could default to None here and have the same behavior. I replicated the defaults in the client, so it is clear to the user what the default behavior is without needing to look up the API docs. However, this may be inconsistent with other parts of the client.

bytes = file.stream.read()
with open(working_dir / file.filename, "wb") as f:
    f.write(bytes)
hashes = hashes + sha256(bytes).digest()
Collaborator:

Is this hash independent of the order you pass the list of files? I don't think it is.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. Attempted to resolve this by sorting by file path: 844cb1c

I'm not sure if there is a better strategy.
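
(A minimal sketch of the sorted approach, assuming each uploaded file exposes the filename and stream attributes used in the snippet above:)

from hashlib import sha256

hashes = b""
# Sorting by file path makes the combined digest independent of upload order.
for file in sorted(files, key=lambda f: f.filename):
    contents = file.stream.read()
    with open(working_dir / file.filename, "wb") as out:
        out.write(contents)
    hashes += sha256(contents).digest()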

jsonschema.validate(config, schema)

# all resources are relative to the config file directory
os.chdir((working_dir / config_path).parent)
Collaborator:

If we must change the directory, we should control it via a context manager. I don't see this reversed anywhere, so its long-term effect on the REST API's processes/threads could be unpredictable.

The SDK has the following contextmanager, although if we don't want to import from that module within the restapi, we can duplicate it into the restapi utils.py:

@contextlib.contextmanager
def set_cwd(temp_cwd: Union[str, Path]) -> Iterator[None]:
    """
    A convenient way to temporarily set the current working directory to a
    particular directory, while executing a particular block of code. Just
    place the code into a with-statement which uses this function as a
    contextmanager.

    Args:
        temp_cwd: The directory to temporarily set as current
    """
    saved_cwd = Path.cwd()
    os.chdir(temp_cwd)
    try:
        yield
    finally:
        os.chdir(saved_cwd)
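
Applied to the snippet above, that could look like the following sketch (register_resources is a hypothetical stand-in for whatever follows the chdir):

jsonschema.validate(config, schema)

# All resources are relative to the config file directory; the context
# manager restores the original working directory when the block exits.
with set_cwd((working_dir / config_path).parent):
    register_resources(config)  # hypothetical follow-on step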


@pytest.fixture
def resources_tar_file() -> DioptraFile:
    os.chdir(Path(__file__).absolute().parent / "workflows" / "resource_import_files")
Collaborator:

As mentioned elsewhere, I'd use a contextmanager to control the lifetime of a changed directory (the same set_cwd contextmanager quoted above).


@pytest.fixture
def resources_files() -> DioptraFile:
    os.chdir(Path(__file__).absolute().parent / "workflows" / "resource_import_files")
Collaborator:

As mentioned elsewhere, I'd use a contextmanager to control the lifetime of a changed directory (again, the set_cwd contextmanager quoted above).


response_types = set(param["name"] for param in response.json()["data"])
expected_types = set(param["name"] for param in expected["plugin_param_types"])
assert (
    response.status_code == 200
Collaborator:

Use something like this for status codes:

from http import HTTPStatus

assert response.status_code == HTTPStatus.OK

Collaborator Author:

Good catch. I think this code may have actually been written before Harold refactored the errors module and took Andy's suggestion of using the HTTPStatus codes.

9762690

Successfully merging this pull request may close these issues:

Add a workflow for importing a plugin from a git repository or file upload