Support for graph operations #107
Comments
#74 (or #74 (comment)) can be added to your list, I think.
This is somewhat different: We want to save task-graphs, which we know to be a DAG. Much less complexity there than what
Visualization is kind of unrelated to graphs. The problems we face there will not be solved by a graph library or something along those lines, imho.
Note that this is essentially solved already in the current implementation.
Does it solve anything else apart from #43?
Why do you say it is expensive? Can you elaborate?
Shouldn't providers also be treated as nodes?
Is it wasteful to a relevant amount?
I don't feel this is going to work, given the support for param tables and generic providers. Furthermore, it would make such a graph library a mandatory dependency. Until we have clear evidence that the benefit is large (see my question a couple paragraphs above, does it solve anything but #43?) I do not see a justification for that.
Can you show me an example of a pipeline that is not a DAG? Can this exist given that we can convert pipelines to task graphs?
Yes. But it would be helped by traversal functions. We can of course implement them ourselves.
It provides the primitive operations needed to compose everything else. Plus high level ones that can make our lives easier. But yes, if you are looking for a single function that fixes an open issue, it's only #43
Because of
Can be. The choice here matches how our and Dask's task graphs are implemented. But if this doesn't work in general, we can make providers nodes.
Depends on how many checks you run and whether they can share the intermediate graph. And, of course, what latency you are willing to allow for
I don't think it is possible in general to think of a pipeline as a specific graph. Due to (in particular) generic providers, it is not possible to draw a graph without knowing which keys the user wants to compute. Furthermore, it is possible to create a pipeline with cycles, etc., without errors.
Not sure what you have in mind. Are you talking about visualizing the pipeline as a graph (as opposed to the resulting task-graphs)?
Compose what else, can you elaborate? And what "high level ones"?
I do not understand what you are computing there, and why it is needed. Why do we need to rebuild the graph for every provided type in the pipeline?
I think we need to differentiate here between
I know it depends, that is why I am asking. ;)
Yes, the inspection algorithms (finding unused params, missing params) need to inspect the entire pipeline, no? So we need some way of operating on it. Currently, that is not really possible. We can only get a concrete task graph. The purpose of
I don't think this is correct. Generic providers are not truly generic but always restricted to a concrete (open) set of types. This means that given
def a() -> X[int]: ...
def b() -> X[float]: ...
def c(x: X[T]): ...
we can have the combined
Is it? How can such a pipeline ever produce a DAG?
The primitive operations are traversal, adding and removing nodes and edges. We have the latter but not the former. By "high level", I mean pretty much everything else. We can, at least in principle, express those in terms of the primitives. E.g.,
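To illustrate how a higher-level operation can be composed from a traversal primitive, here is a minimal sketch. It is not the example from the original comment; the toy graph, the node names, and the `ancestors` helper are all hypothetical. NetworkX ships an equivalent as `nx.ancestors`.

```python
from collections import deque

# Hypothetical toy graph: each node maps to the set of its direct predecessors.
predecessors = {
    "A": set(),
    "f": {"A"},
    "B": {"f"},
    "g": {"A", "B"},
    "C": {"g"},
}


def ancestors(preds, node):
    """Return every node from which `node` can be reached.

    Only uses the primitive "give me the direct predecessors of a node".
    """
    seen = set()
    queue = deque(preds.get(node, ()))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        queue.extend(preds.get(current, ()))
    return seen
```

With such a helper, checks like "which inputs does this key depend on" reduce to one call, e.g. `ancestors(predecessors, "B")`.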
I am confused now, you appear to be mixing two of my comments (visualization of pipeline vs. cost of "graph" setup).
Are they? How?
What about
The pipeline does not build a graph upon creation (as you have experienced, this is nontrivial without knowing, e.g., the output keys). Even if there are cycles, the pipeline may have acyclic parts that can produce a DAG.
You mean we do not have traversal? Is this what we do when a pipeline builds a task graph, or do you have something else in mind?
For generic providers, the output type is still restricted by the
Yes, but is the cyclic part valid? Can you give an example where a cycle in a pipeline is both valid and useful? (I.e. not in a part of the pipeline that is never used for a task graph.)
In this case, the traversal is entangled with the
Yes, if by 'largest possible' you mean the entire pipeline. We don't need to instantiate generic parameters for this but can connect the generics directly within the graph.
How about this?
graph TD;
compute([compute])
to_list([to_list])
list["list[T]"]
int-->compute-->float;
int-->to_list;
float-->to_list;
list-->to_list;
to_list-->list;
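The diagram above contains a loop between `list` and `to_list`. As a rough sketch of how such a loop could be detected programmatically, the following uses only the standard library's `graphlib` (Python 3.9+); the `find_cycle` helper and the dictionary layout are assumptions for illustration, not part of the pipeline code. NetworkX offers `nx.is_directed_acyclic_graph` and `nx.find_cycle` for the same purpose.

```python
from graphlib import CycleError, TopologicalSorter

# Edges of the diagram above, stored as node -> set of direct predecessors.
predecessors = {
    "compute": {"int"},
    "float": {"compute"},
    "to_list": {"int", "float", "list"},
    "list": {"to_list"},
}


def find_cycle(preds):
    """Return the nodes of one cycle, or None if the graph is a DAG."""
    try:
        TopologicalSorter(preds).prepare()
    except CycleError as err:
        # The second element of `args` holds the detected cycle.
        return err.args[1]
    return None


cycle = find_cycle(predecessors)

# The same graph without the list --> to_list edge is acyclic.
acyclic = dict(predecessors, to_list={"int", "float"})
```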
If you draw the loop from
Yes, my bad, I fixed it.
Not necessarily useful for people to look at (except experts). But useful for inspecting the pipeline.
Let us assume for now one can internally represent a Considering data nodes,
After staring some more at #111 I think that our problems may be solved under the assumption that all generics use typevars that are constrained. We can then, e.g., represent a provider for This should then allow for running analysis on the graph, e.g., to tell which combinations of There may also be some solution for bound (not constrained) typevars, but as we rarely use those, I am not going to worry about this until I am confident that my thoughts for the constrained case work out.
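One possible reading of the constrained-typevar idea, as a sketch only: a constrained `TypeVar` exposes its allowed types via `__constraints__`, so a generic type can be expanded into a finite list of concrete types. The class `X` and the helper `concrete_types` are hypothetical names; how the pipeline would actually represent such providers is not specified here.

```python
from typing import Generic, TypeVar

T = TypeVar("T", int, float)  # constrained: only int or float are allowed
B = TypeVar("B", bound=float)  # bound: no finite list of allowed types


class X(Generic[T]):
    """Hypothetical generic domain type."""


def concrete_types(generic, typevar):
    """Expand `generic` over the constraints of `typevar`.

    Returns an empty list for unconstrained or bound typevars,
    because those do not define a finite set of types.
    """
    return [generic[c] for c in typevar.__constraints__]
```

Here `concrete_types(X, T)` yields `X[int]` and `X[float]`, which could each become a node, while the bound typevar `B` yields nothing to enumerate.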
Kind of done (or outdated) by rewrite.
At the moment, Pipeline is largely a black box in terms of the graph it represents. It can build TaskGraph objects but only when the user knows suitable nodes to pass to build or get (as in tg = p.get(p._providers.keys()) below). We have several issues (open and closed) now that relate to graph operations and could be solved easily if we had such operations:
NetworkX seems like a good choice for a graph implementation because it provides all basic algorithms out of the box. It becomes trivial to solve #43 as suggested by @YooSunYoung, #92 for several formats, and maybe visualisation (might get removed from NetworkX).
Current graph traversal
Unfortunately, it is currently cumbersome, expensive, and not technically allowed to traverse graphs. E.g., to build a NetworkX graph, we need something like (note the protected attributes and functions)
Potential solutions
I talk about NetworkX here, but we can also use a different library or custom graph implementation. The general ideas remain the same.
1. Implement to_networkx in Pipeline
This is the least invasive and easiest to implement. But it would mean that operations that need to inspect the graph need to make a temporary nx.DiGraph, which is wasteful.
2. Represent the internal state of Pipeline as a DiGraph
NetworkX allows storing arbitrary data in nodes and edges. So we can stuff everything we need into the graph. This would make graph operations first class citizens and would open up more possibilities. (E.g., we can use node metadata to indicate whether a provider is a param or not. Something that is currently difficult (impossible?) to determine.)
On the other hand, this would require a complete refactoring of Pipeline. And I don't even know if there is a straightforward way of representing everything as a graph. In particular when it comes to subproviders.
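The node-metadata idea from option 2 can be sketched without any library. Everything below is hypothetical (the node names, the `kind` field, and both helpers); it only shows how tagging nodes makes "is this a param?" and "is this param unused?" simple lookups. In NetworkX the same tag would be a node attribute, e.g. `G.add_node("Scale", kind="param")`.

```python
# Hypothetical in-memory layout: node name -> metadata and outgoing edges.
graph = {
    "Filename": {"kind": "param", "successors": {"load"}},
    "Scale": {"kind": "param", "successors": set()},
    "load": {"kind": "provider", "successors": {"Data"}},
    "Data": {"kind": "data", "successors": set()},
}


def params(g):
    """Names of all nodes tagged as params."""
    return {name for name, node in g.items() if node["kind"] == "param"}


def unused_params(g):
    """Params that no provider consumes, i.e. with no outgoing edges."""
    return {name for name in params(g) if not g[name]["successors"]}
```

In this toy graph, `unused_params(graph)` reports `Scale`, since nothing depends on it.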