Skip to content

Commit c48cc79

Browse files
fix: parallel file validation for a datapackage (#1722)
- fixes #1644 ### Context - In validator.py, `Resource` was imported as a type, but then used as a class to retrieve a class method (`Resource.from_descriptor`). - The tests would not intercept this error, as they were performed on a schema with a foreign key, and the validation would fall back on synchronous validation in this case. - Importing `Resource` (not as a type) into `validator.py` leads to a circular import. A small workaround would have been to import Resource locally, but this was not very satisfactory. ### Suggested changes and fixes - I repaired the tests, using datapackages with no foreign keys to test parallel validation. I checked that these tests would have failed because with the bug. - I moved the implementation of `Validator.validate_resource` and `Validator.validate_package` methods to the `Resource.validate` and `Package.validate` methods. This solves the circular import problem, and removes duplication (method signatures). - The tests have been dispatched as well. - I soft deprecated the `Validator` class, keeping the methods (forwarding input parameters) as they are part of the public API. No warning, only a deprecation message in the docstring, and a mention that there is no plan to remove the class in the future. ### Note I could not produce any significant performance gain in my tests, I wonder if there is some shared resources' lock that prevents the validation from effectively running the validation in parallel (but I have not spent much time on this yet). <details> <summary>Archive of explorations</summary> - [X] Test that it works correctly - Works without error, however, I could not produce any significant performance gain in my tests. - [X] Check why the tests did not intercept this error - The tests for parallel processing are run for a schema with a foreign key - which fall backs to single-core processing. So the `validate_parallel` function was actually never executed in the tests. - [X] See if I can find a more elegant way to deal with the circular import. I actually don't see the benefit of having this validator.py class instead of dispatching the methods to `Resource` and `DataPackage` directly, but I may have missed something. </details>
1 parent f5a4573 commit c48cc79

27 files changed

+875
-853
lines changed

CHANGELOG.md

+8
Original file line numberDiff line numberDiff line change
@@ -2,6 +2,14 @@
22

33
Here described only the breaking and most significant changes. The full changelog and documentation for all released versions could be found in nicely formatted [commit history](https://github.com/frictionlessdata/frictionless-py/commits/main).
44

5+
## v5.18.1
6+
7+
- feat: add pass_row as configurable parameter for field_update
8+
([#1729](https://github.com/frictionlessdata/frictionless-py/pull/1729))
9+
- fix: vendor unmaintained stringcase library
10+
([#1727](https://github.com/frictionlessdata/frictionless-py/pull/1727))
11+
- Various bug fixes
12+
513
## v5.18
614

715
- Support `ignore_constraints` option for the `Indexer` (#1691)
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,49 @@
1+
{
2+
"name": "testing",
3+
"resources": [
4+
{
5+
"name": "data",
6+
"path": "data.csv",
7+
"schema": {
8+
"fields": [
9+
{
10+
"name": "id",
11+
"type": "string",
12+
"constraints": {
13+
"required": true
14+
}
15+
},
16+
{
17+
"name": "name",
18+
"type": "string"
19+
},
20+
{
21+
"name": "description",
22+
"type": "string"
23+
},
24+
{
25+
"name": "amount",
26+
"type": "number"
27+
}
28+
],
29+
"primaryKey": "id"
30+
}
31+
},
32+
{
33+
"name": "data2",
34+
"path": "data2.csv",
35+
"schema": {
36+
"fields": [
37+
{
38+
"type": "string",
39+
"name": "parent"
40+
},
41+
{
42+
"type": "string",
43+
"name": "comment"
44+
}
45+
]
46+
}
47+
}
48+
]
49+
}

frictionless/__init__.py

+2
Original file line numberDiff line numberDiff line change
@@ -40,4 +40,6 @@
4040
from .table import Lookup as Lookup
4141
from .table import Row as Row
4242
from .transformer import Transformer as Transformer
43+
44+
# Deprecated
4345
from .validator import Validator as Validator

0 commit comments

Comments
 (0)