Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recursive priorities #55

Open
erikrose opened this issue Feb 27, 2015 · 10 comments
Open

Recursive priorities #55

erikrose opened this issue Feb 27, 2015 · 10 comments

Comments

@erikrose
Copy link
Contributor

If we wanted to quash all priority issues, once and for all, we'd have to go one final, additional step from #54: rather than constructing Schema objects lazily, in validate(), we should traverse all the way down the hierarchy, turning things into Schema subclasses, in Schema.__init__ (or, more likely, in Schema.__new__ or some other factory function, so we can spit out different classes). Then we would have the complete compound priority, like (VALIDATOR, VALIDATOR, COMPARABLE) for Optional(And(int, is_even)), and any remaining problems with nondeterminism would be scared into a little corner where we can't decide which of, say, 2 callables has higher priority. If someone wanted to explicitly set priorities, they could write their own class with a priority() method, or we could even introduce a manual priority= kwarg on Schema for those special, hard-to-reach cases—or for cases where people don't want to go by our concept of specificity.

Creating this issue as a place for discussion about this.

@keleshev
Copy link
Owner

keleshev commented Mar 4, 2015

Yes 😄

@erikrose
Copy link
Contributor Author

erikrose commented Mar 4, 2015

Yay! It'll probably even increase performance, since we'll construct all those Schema instances once and then can validate against them 10,000 times without doing it again.

@erikrose
Copy link
Contributor Author

erikrose commented Mar 5, 2015

Another factor that could be considered in priorities is the order of keys in the dict, if it were an OrderedDict. I wonder how much of the rest of this that makes unnecessary.

@erikrose
Copy link
Contributor Author

erikrose commented Mar 9, 2015

I haven't thought it through exhaustively, but I want to elevate this to the level of "proposal" and see what you think, @keleshev. What if we removed literal dicts and sets and replaced them with Dicts and Sets, similar schema-provided constructs that are ordered? Then priority could go away as a concept. This would mean…

  • Less to teach users
  • Easier-to-reason-about schemas—it would just try things in order until one matched
  • No need for any priority args or method overrides to vary from our default idea of priority
  • A lot less code. We could even rip out a fair amount of what's currently in there.

Of course, it wouldn't be backward-compatible; you probably care more about that than I do. And I'd miss the dict-literal syntax. But for so much simplification, I find it compelling.

@keleshev
Copy link
Owner

Yeah, compelling. Let me see how it looks

# now
Schema({
    'shape': str,
    Optional('color', default='blue'): str,
    str: str,
})

# proposed
Dict(
    ('shape', str),
    (Optional('color', default='blue'), str),
    (str, str),
)

@sjakobi
Copy link
Contributor

sjakobi commented Oct 6, 2015

So I had shot at this issue. You can find the preliminary results in my recursive-priorities branch.

I decided to go with a factory function schema that more or less replaces the old Schema class.

Example:

In [2]: s = schema({Optional("a"): int, Optional(str): object})

In [3]: s.validate({"a": 3, "b": 1})
Out[3]: {'a': 3, 'b': 1}

In [4]: s
Out[4]: Dict([(Optional(Comparable('a')), Type(int)), (Optional(Type(str)), Type(object))])

The design isn't finished IMO and the test suite still contains a lot of failing tests, mostly due to reprs turning out differently now.

I just wanted to hear what you think about it before I adjust all the error messages in the test suite. ;)

@skorokithakis
Copy link
Collaborator

Can someone fill me in? What are the priority problems? An example would be especially helpful, as I've never run into these problems.

@sjakobi
Copy link
Contributor

sjakobi commented Oct 6, 2015

I don't have time for a full response right now, but take a look at #23, especially this comment.

@skorokithakis
Copy link
Collaborator

I see, thank you. Your example doesn't really make it clear, as an int is also an object, so we don't know which of the two matched. I'll have a look at your code to see how you do this, but it seems like a welcome change. It does throw a spanner in the works for the code I've been working on, which is self-generating documentation from a schema, but that's life.

@Suor
Copy link

Suor commented Nov 3, 2019

Any performance gains would be highly beneficial to dvc. Now validating 20k yaml files like:

md5: 09f910d6eb6b6013d6fd292e9b7b2884
outs:
- md5: 038a480057ac11cb3ba68e35d29e044f
  path: 1.txt
  cache: true
  metric: false
  persist: false

takes ~10s for me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants