Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue #99 proposal #101

Closed
wants to merge 2 commits into from
Closed

Conversation

thirtytwobits
Copy link
Member

@thirtytwobits thirtytwobits commented Mar 30, 2024

This is a draft PR to review a proposed change to the public APIs for pydsdl.

After playing with a Nunavut integration I realized the previous design was deficient.

This is non-breaking change that adds an optional named tuple to the read_namespace method. This named tuple allows specifying a set of filtering rules to apply when discovering DSDL types from files under a given namespace filetree. The intent is to connect the proposed TypeFilterEngine to the _construct_dsdl_definitions_from_namespace method after it creates a DSDLDefinition to apply the initial filtering of the target type. From there I need to figure out how to mark that selected type and its transitive dependencies as either weakly-excluded or strongly-included and detect any conflict where user provided rules conflict as processing proceeds. Any advice as to where I could best work with a tree-representation ahead of any expensive processing would be appreciated.

The new proposal uses a visitor pattern that is more powerful and will allow Nunavut to fully describe all file dependencies closing OpenCyphal/nunavut#58 and allowing the #99 filter to be implemented in nunavut.

@pavel-kirienko
Copy link
Member

pavel-kirienko commented Apr 2, 2024

I want to postpone reviewing this until this discussion is concluded: https://forum.opencyphal.org/t/proposal-for-a-type-manifest-specification/2107/6

One point that is probably invariant to the conclusion is that I would like to bring PyDSDL into the context of processing DSDL manifests, since manifests are closely related to the problem domain of PyDSDL itself. PyDSDL shouldn't be involved in parsing the manifests but it should probably accept some abstract representation of the manifest at the input (optionally, of course, as we don't want it to be a breaking change).

Also, I would welcome a more functional-style design, if there are such options on the table.

@thirtytwobits
Copy link
Member Author

This is separable from the manifest format. pydsdl needs to support the concepts of target types instead of target namespaces and transitive closures for those target types across namespaces. For example: if

# TARGET                       |   DEPENDENCY               
root_ns_b/d/FType.1.0 -> root_ns_a/b/GType.1.0
root_ns_b/d/HType.1.0 -> root_ns_b/b/FType.1.0
root_ns_a/b/CType.1.0 -> root_ns_b/d/EType.1.0

The current API requires two calls, one with a target NS of root_ns_b that produces one tree and then another root_ns_a target NS that produces a second tree one has to merge with the previous tree. I guess I could design a new API but the old one cannot be retrofitted to support this in a single call.

As for "functional" can you explain? The dependencies are found by parser events so it seems natural to support dependency discovery events.

@pavel-kirienko
Copy link
Member

This is separable from the manifest format. pydsdl needs to support the concepts of target types instead of target namespaces and transitive closures for those target types across namespaces.

I agree.

What I was trying to communicate is that exposing a generic visitor-based API works but the utility of this solution is not very high because it requires the user to implement much of the manifest processing manually. It will be implemented in Nunavut and then again in other software involved in the processing of DSDL definitions; I would like to avoid such duplication. Hence, I am trying to say that the concerns related to the selection of data types based on user-provided requirements should be addressed within PyDSDL instead of being pushed up the stack via the abstract visitors.

I guess I could design a new API but the old one cannot be retrofitted to support this in a single call.

Can we explore the idea of adding a new top-level entry point beside read_namespace that accepts an explicit list of types of interest -- closely following the manifest model (note that I am not talking about the manifest format here) -- and the set of root namespace directories to look for them in?

The existing read_namespace may or may not be marked as deprecated.

This is a draft PR to review a proposed change to the public APIs for pydsdl.

This is non-breaking change that adds a new public method and datatype

## New Method

read_files - a file-oriented entry point to the front end. This takes a
list of target DSDL files allowing the user to maintain an explicit list
instead of depending on globular filesystem discovery.

## New Datatype

DsdlFile – We define and publish a new abstract data type that encapsulates
all the information collected about and from a given dsdl file. This allows
a backend to associate datatypes with files for the purposes of managing
dependencies (e.g. this would allow the creation of .d files by a back end).
@thirtytwobits
Copy link
Member Author

I've updated the proposal. Python docs aren't fixed up nor is there testing. This is just the first thwack to see if we're aligned. Please review _namespace.py::read_files (again, I didn't fix the docs sorry). Example usage would be:

target_types = [
    Path("/path/to/root/taxonomy/animalia/chordata/mammalia/artiodactyla/suidae/sus/sus_domesticus.1.0.dsdl"),
    Path("/another_path/to/roots/taxonomy/animalia/chordata/mammalia/artiodactyla/suidae/sus/sus_scrofa.1.0.dsdl"),
    Path("/path/to/root/plantae/tracheophytes/gymnospermae/pinophyta/pinopsida/cupressales/cupressaceae/sequoia/sequoia_sempervirens.1.0.dsdl")
]
target_dsdl, dependency_dsdl = pydsdl.read_files(target_types, ["animalia", "plantae"], path_to_public_types)

@emrainey
Copy link

I'm a little confused by the example, am I getting the same set of files (or types or ??) in target_dsdl that I put into target_types (which are just file paths, not types)? It essentially discovers all the dependency_dsdl that I have amongst the provided namespaces and roots paths for that particular set of target_dsdl?

@thirtytwobits
Copy link
Member Author

I'm a little confused by the example, am I getting the same set of files (or types or ??) in target_dsdl that I put into target_types (which are just file paths, not types)? It essentially discovers all the dependency_dsdl that I have amongst the provided namespaces and roots paths for that particular set of target_dsdl?

Right. you get a tuple where the target_dsdl "files" are abstractions that provide the CompositeType as a getter associated with the metadata about the file used to create that type. This is 1:1 for the target types passed into the method. The second member of the tuple is the same but contains all dependent types discovered using the lookup paths and needed by the transitive closure of the target set.

Copy link
Member

@pavel-kirienko pavel-kirienko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice.

I would like to avoid exposing anything but CompositeType to the client; we can add new properties to that class for the missing context information. By adding an entity that claims to deduce information from the file path we would create an obstacle for the evolution of the DSDL language because eventually certain properties may be moved from the path into the file itself; in addition, it adds a pesky failure mode where the information may be deduced differently by the parser itself and the entity that processes the file path, which is actually recognized in the doc comments you added (where you talk about the possible divergence between the two).

All error types must inherit from either InternalError or InvalidDefinitionError; only the former is of relevance for this changeset.

Why do we need to return the dependent types as a separate entity?

Some of the visitors introduced with the previous iteration appear to be unnecessary and should be removed.

I would prefer to avoid the inclusion of the DSDL submodule in this changeset and focus on this later on. The CI already tests PyDSDL against the standard uavcan namespace here, but this approach admittedly has drawbacks:

python -c "import pydsdl; pydsdl.read_namespace('.dsdl-test/uavcan', [])"

Perhaps it would be wiser to move it into a separate nox session because adding a submodule just for the sake of sourcing test data is undesirable. Let us please look into this later.

@thirtytwobits
Copy link
Member Author

I would like to avoid exposing anything but CompositeType to the client; we can add new properties to that class for the missing context information.

Please look at the latest proposal. It only exposes CompositeType but adds properties to it.

Copy link
Member

@pavel-kirienko pavel-kirienko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the overall idea but the implementation appears to be too heavy and convoluted; I shared a few ideas how to simplify it (mostly by dialing down on the OOP a little)

raise NotImplementedError()


class DsdlFileBuildable(DsdlFile):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current name breaks the naming consistency. It should be BuildableDSDLFile or, in this specific case, I would name it ReadableDSDLFile, because it has read() not build()

"""
return self._source_file_path

@property
def source_file_path_to_root(self) -> Path:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def source_file_path_to_root(self) -> Path:
def root_namespace_path(self) -> Path:

Comment on lines +268 to +269
For synthesized types such as service request/response sections, this property is the path to the service type
since request and response types are defined within the service type's dsdl file.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
For synthesized types such as service request/response sections, this property is the path to the service type
since request and response types are defined within the service type's dsdl file.
For service request/response types this property is the path to the service type
since request and response types are defined within the service type's DSDL file.

@@ -198,6 +198,7 @@ def resolve_versioned_data_type(self, name: str, version: _serializable.Version)
del name
found = list(filter(lambda d: d.full_name == full_name and d.version == version, self._lookup_definitions))
if not found:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

"""Invoked when the frontend encounters a print directive or needs to output a generic diagnostic."""


class DsdlFile(ABC):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this particular project "DSDL" is spelled as DSDL in definitions, not Dsdl, so we should follow that for consistency.

def normalize_paths_argument(
namespaces_or_namespace: Union[None, Path, str, Iterable[Union[Path, str]]],
type_cast: Callable[[Iterable], PathListT],
) -> PathListT:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am looking at the usages and it seems like this can be simplified such that type_cast is removed (always perform something roughly similar to {Path(p) for p in namespaces_or_namespace}) and the return type can be a concrete list[Path] or set[Path]. Or am I missing something?


from abc import ABC, abstractmethod
from pathlib import Path
from typing import Callable, Iterable, List, Optional, Set, Tuple, TypeVar, Union
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
from typing import Callable, Iterable, List, Optional, Set, Tuple, TypeVar, Union
from typing import Callable, Iterable, Optional, TypeVar

Deprecated items should not be used in new code; see PEP 585

Union is not needed anymore because it can be written in shorthand as A | B

Optional arguably might not be needed as well because it is representable as T | None

Comment on lines +111 to +114
class DependentFileError(RuntimeError):
"""
Raised by the DefinitionVisitor when it encounters a dependent dsdl file that is not allowed by the user.
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is used for control flow, not for error reporting, so it shouldn't be here. If it were to be kept here it would have to implement FrontendError.

Comment on lines +101 to +115
class DefinitionVisitor(ABC):
"""
An interface that allows visitors to index dependent types of a target DSDL file. This allows visitors to
build a closure of dependent types while parsing a set of target DSDL files.
"""

@abstractmethod
def on_discover_lookup_dependent_file(self, target_dsdl_file: DsdlFile, dependent_type: DsdlFile) -> None:
"""
Called by the parser after if finds a dependent type but before it parses a file in a lookup namespace.
:param DsdlFile target_dsdl_file: The target DSDL file that has dependencies the parser is searching for.
:param DsdlFile dependent_type: The dependency of target_dsdl_file file the parser is about to parse.
:raises DependentFileError: If the dependent file is not allowed by the visitor.
"""
raise NotImplementedError()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest replacing this with a callback function and avoid using exceptions for control flow, as we can communicate the fact that the dependent file is not allowed by returning a tag type:

@dataclasses.dataclass(frozen=True)  # ok this is optional
class DependentFileNotAllowed:
    pass  # some context may be added later

#...

def on_discover_lookup_dependent_file(self, target_dsdl_file: DsdlFile, dependent_type: DsdlFile) -> None | DependentFileNotAllowed:

return union_files


class Closure(DefinitionVisitor):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this approach might be too stateful and OOP-heavy to be considered Pythonic. Continuing the idea of converting the DefinitionVisitor into a callback, this would also become a function:

@dataclasses.dataclass(frozen=True)
class Result:
    direct: set[DSDLFile]
    transitive: set[DSDLFile]
    
    @property
    def all(self) -> set[DSDLFile]:
        return self.direct | self.transitive


def read_definitions(target_definitions: SortedFileList,
                     lookup_definitions: SortedFileList,
                     allow_unregulated_fixed_port_id: bool,
                     print_output_handler: PrintOutputHandler | None) -> Result:
    ...

The FilesContainer are an implementation detail of this module so they should not be visible nor returned

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oof. Okay. I think I need to start over then. Let me try again in a new pull-request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants