Recursive priorities #55

erikrose · 2015-02-27T17:21:11Z

If we wanted to quash all priority issues, once and for all, we'd have to go one final, additional step from #54: rather than constructing Schema objects lazily, in validate(), we should traverse all the way down the hierarchy, turning things into Schema subclasses, in Schema.__init__ (or, more likely, in Schema.__new__ or some other factory function, so we can spit out different classes). Then we would have the complete compound priority, like (VALIDATOR, VALIDATOR, COMPARABLE) for Optional(And(int, is_even)), and any remaining problems with nondeterminism would be scared into a little corner where we can't decide which of, say, 2 callables has higher priority. If someone wanted to explicitly set priorities, they could write their own class with a priority() method, or we could even introduce a manual priority= kwarg on Schema for those special, hard-to-reach cases—or for cases where people don't want to go by our concept of specificity.

Creating this issue as a place for discussion about this.

The text was updated successfully, but these errors were encountered:

keleshev · 2015-03-04T13:20:45Z

Yes 😄

erikrose · 2015-03-04T15:53:15Z

Yay! It'll probably even increase performance, since we'll construct all those Schema instances once and then can validate against them 10,000 times without doing it again.

erikrose · 2015-03-05T17:48:52Z

Another factor that could be considered in priorities is the order of keys in the dict, if it were an OrderedDict. I wonder how much of the rest of this that makes unnecessary.

erikrose · 2015-03-09T23:26:47Z

I haven't thought it through exhaustively, but I want to elevate this to the level of "proposal" and see what you think, @keleshev. What if we removed literal dicts and sets and replaced them with Dicts and Sets, similar schema-provided constructs that are ordered? Then priority could go away as a concept. This would mean…

Less to teach users
Easier-to-reason-about schemas—it would just try things in order until one matched
No need for any priority args or method overrides to vary from our default idea of priority
A lot less code. We could even rip out a fair amount of what's currently in there.

Of course, it wouldn't be backward-compatible; you probably care more about that than I do. And I'd miss the dict-literal syntax. But for so much simplification, I find it compelling.

keleshev · 2015-03-10T11:32:47Z

Yeah, compelling. Let me see how it looks

# now
Schema({
    'shape': str,
    Optional('color', default='blue'): str,
    str: str,
})

# proposed
Dict(
    ('shape', str),
    (Optional('color', default='blue'), str),
    (str, str),
)

sjakobi · 2015-10-06T06:57:44Z

So I had shot at this issue. You can find the preliminary results in my recursive-priorities branch.

I decided to go with a factory function schema that more or less replaces the old Schema class.

Example:

In [2]: s = schema({Optional("a"): int, Optional(str): object})

In [3]: s.validate({"a": 3, "b": 1})
Out[3]: {'a': 3, 'b': 1}

In [4]: s
Out[4]: Dict([(Optional(Comparable('a')), Type(int)), (Optional(Type(str)), Type(object))])

The design isn't finished IMO and the test suite still contains a lot of failing tests, mostly due to reprs turning out differently now.

I just wanted to hear what you think about it before I adjust all the error messages in the test suite. ;)

skorokithakis · 2015-10-06T11:46:03Z

Can someone fill me in? What are the priority problems? An example would be especially helpful, as I've never run into these problems.

sjakobi · 2015-10-06T12:07:47Z

I don't have time for a full response right now, but take a look at #23, especially this comment.

skorokithakis · 2015-10-06T13:03:11Z

I see, thank you. Your example doesn't really make it clear, as an int is also an object, so we don't know which of the two matched. I'll have a look at your code to see how you do this, but it seems like a welcome change. It does throw a spanner in the works for the code I've been working on, which is self-generating documentation from a schema, but that's life.

Suor · 2019-11-03T16:55:28Z

Any performance gains would be highly beneficial to dvc. Now validating 20k yaml files like:

md5: 09f910d6eb6b6013d6fd292e9b7b2884
outs:
- md5: 038a480057ac11cb3ba68e35d29e044f
  path: 1.txt
  cache: true
  metric: false
  persist: false

takes ~10s for me.

keleshev mentioned this issue Mar 4, 2015

Prioritize Optionals based on one level of depth traversal. Close #44, #45. #54

Open

erikrose mentioned this issue Aug 11, 2017

Throw some documentation down for design-by-docs willkg/treevert#1

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Recursive priorities #55

Recursive priorities #55

erikrose commented Feb 27, 2015

keleshev commented Mar 4, 2015

erikrose commented Mar 4, 2015

erikrose commented Mar 5, 2015

erikrose commented Mar 9, 2015

keleshev commented Mar 10, 2015

sjakobi commented Oct 6, 2015

skorokithakis commented Oct 6, 2015

sjakobi commented Oct 6, 2015

skorokithakis commented Oct 6, 2015

Suor commented Nov 3, 2019

Recursive priorities #55

Recursive priorities #55

Comments

erikrose commented Feb 27, 2015

keleshev commented Mar 4, 2015

erikrose commented Mar 4, 2015

erikrose commented Mar 5, 2015

erikrose commented Mar 9, 2015

keleshev commented Mar 10, 2015

sjakobi commented Oct 6, 2015

skorokithakis commented Oct 6, 2015

sjakobi commented Oct 6, 2015

skorokithakis commented Oct 6, 2015

Suor commented Nov 3, 2019