Commit f13dd7a

harm-matthias-harms, ElenaKhaustova, lrcouto, and ankatiyar authored Oct 10, 2024
feat(datasets): Replace geopandas.GeoJSONDataset with geopandas.GenericDataset (#812)
* feat(datasets): Add geopandas ParquetDataset
* Add release notes
* Add parquet dataset to docs
* Fix typo in tests
* Fix pylint type
* Discard changes to kedro-datasets/docs/source/api/kedro_datasets.rst
* Discard changes to kedro-datasets/kedro_datasets/geopandas/__init__.py
* Extend geojson dataset to support more file types
* Update RELEASE.md
* Add test for unsupported file format
* Cleanup GeoJSONDataset
* Fix lint
* Replace GeoJSONDataset by GenericDataset
* Update pyproject.toml
* Update RELEASE.md
* Use new default fs args
* Fix pattern in test
* Use fiona for python < 3.11
* Install fiona dependency for python < 3.11
* Revert fiona test
* Use fiona because pyogrio doesnt support fsspec
* Format file
* Update kedro-datasets/kedro_datasets/geopandas/__init__.py
* Improve none file system target error message
* Update RELEASE.md

---------

Signed-off-by: Harm Matthias Harms <[email protected]>
Signed-off-by: Harm Matthias Harms <[email protected]>
Signed-off-by: Ankita Katiyar <[email protected]>
Co-authored-by: ElenaKhaustova <[email protected]>
Co-authored-by: L. R. Couto <[email protected]>
Co-authored-by: Ankita Katiyar <[email protected]>
Co-authored-by: Ankita Katiyar <[email protected]>
1 parent 2b1228e commit f13dd7a

7 files changed, +214 -88 lines changed


‎kedro-datasets/RELEASE.md

+2
@@ -23,6 +23,7 @@

 ## Breaking Changes
 * Exposed `load` and `save` publicly for each dataset. This requires Kedro version 0.19.7 or higher.
+* Replaced the `geopandas.GeoJSONDataset` with `geopandas.GenericDataset` to support parquet and feather file formats.

 ## Community contributions
 Many thanks to the following Kedroids for contributing PRs to this release:
@@ -32,6 +33,7 @@ Many thanks to the following Kedroids for contributing PRs to this release:
 * [janickspirig](https://github.com/janickspirig)
 * [Galen Seilis](https://github.com/galenseilis)
 * [Mariusz Wojakowski](https://github.com/mariusz89016)
+* [harm-matthias-harms](https://github.com/harm-matthias-harms)
 * [Felix Scherz](https://github.com/felixscherz)

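For projects that keep catalog entries as Python dictionaries, the breaking change above comes down to swapping the dataset type string. The helper below is a hypothetical sketch: `migrate_geopandas_entry` is not part of Kedro or kedro-datasets, and it assumes entries carry the dataset class under a `type` key. All other keys are left untouched.

```python
from copy import deepcopy

OLD_TYPE = "geopandas.GeoJSONDataset"
NEW_TYPE = "geopandas.GenericDataset"


def migrate_geopandas_entry(entry: dict) -> dict:
    """Return a copy of a catalog entry with the renamed dataset type.

    Hypothetical helper for illustration; entries of any other type are
    returned unchanged.
    """
    migrated = deepcopy(entry)
    if migrated.get("type") == OLD_TYPE:
        migrated["type"] = NEW_TYPE
    return migrated


# Example entry; the filepath is made up for the sketch.
old_entry = {"type": "geopandas.GeoJSONDataset", "filepath": "data/regions.geojson"}
new_entry = migrate_geopandas_entry(old_entry)
print(new_entry["type"])
```

Because the new `file_format` argument defaults to `"file"`, an entry migrated this way keeps going through `geopandas.read_file` and `GeoDataFrame.to_file`.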

‎kedro-datasets/docs/source/api/kedro_datasets.rst

+1-1
@@ -17,7 +17,7 @@ kedro_datasets
 dask.ParquetDataset
 databricks.ManagedTableDataset
 email.EmailMessageDataset
-geopandas.GeoJSONDataset
+geopandas.GenericDataset
 holoviews.HoloviewsWriter
 huggingface.HFDataset
 huggingface.HFTransformerPipelineDataset

‎kedro-datasets/kedro_datasets/geopandas/README.md

-31
This file was deleted.

kedro-datasets/kedro_datasets/geopandas/__init__.py

+3-3
@@ -1,12 +1,12 @@
-"""``GeoJSONDataset`` is an ``AbstractVersionedDataset`` to save and load GeoJSON files."""
+"""``GenericDataset`` is an ``AbstractVersionedDataset`` to save and load GeoDataFrames."""

 from typing import Any

 import lazy_loader as lazy

 # https://github.com/pylint-dev/pylint/issues/4300#issuecomment-1043601901
-GeoJSONDataset: Any
+GenericDataset: Any

 __getattr__, __dir__, __all__ = lazy.attach(
-    __name__, submod_attrs={"geojson_dataset": ["GeoJSONDataset"]}
+    __name__, submod_attrs={"generic_dataset": ["GenericDataset"]}
 )

‎kedro-datasets/kedro_datasets/geopandas/geojson_dataset.py ‎kedro-datasets/kedro_datasets/geopandas/generic_dataset.py

+72-27
@@ -1,7 +1,8 @@
-"""GeoJSONDataset loads and saves data to a local geojson file. The
+"""GenericDataset loads and saves data to a local file. The
 underlying functionality is supported by geopandas, so it supports all
 allowed geopandas (pandas) options for loading and saving geosjon files.
 """
+
 from __future__ import annotations

 import copy
@@ -18,30 +19,35 @@
     get_protocol_and_path,
 )

+# pyogrio currently supports no alternate file handlers https://github.com/geopandas/pyogrio/issues/430
+gpd.options.io_engine = "fiona"
+
+NON_FILE_SYSTEM_TARGETS = ["postgis"]
+

-class GeoJSONDataset(
+class GenericDataset(
     AbstractVersionedDataset[
         gpd.GeoDataFrame, gpd.GeoDataFrame | dict[str, gpd.GeoDataFrame]
     ]
 ):
-    """``GeoJSONDataset`` loads/saves data to a GeoJSON file using an underlying filesystem
+    """``GenericDataset`` loads/saves data to a file using an underlying filesystem
     (eg: local, S3, GCS).
     The underlying functionality is supported by geopandas, so it supports all
-    allowed geopandas (pandas) options for loading and saving GeoJSON files.
+    allowed geopandas (pandas) options for loading and saving files.

     Example:

     .. code-block:: pycon

         >>> import geopandas as gpd
-        >>> from kedro_datasets.geopandas import GeoJSONDataset
+        >>> from kedro_datasets.geopandas import GenericDataset
         >>> from shapely.geometry import Point
         >>>
         >>> data = gpd.GeoDataFrame(
         ...     {"col1": [1, 2], "col2": [4, 5], "col3": [5, 6]},
         ...     geometry=[Point(1, 1), Point(2, 4)],
         ... )
-        >>> dataset = GeoJSONDataset(filepath=tmp_path / "test.geojson", save_args=None)
+        >>> dataset = GenericDataset(filepath=tmp_path / "test.geojson")
         >>> dataset.save(data)
         >>> reloaded = dataset.load()
         >>>
@@ -50,35 +56,41 @@ class GeoJSONDataset(
     """

     DEFAULT_LOAD_ARGS: dict[str, Any] = {}
-    DEFAULT_SAVE_ARGS = {"driver": "GeoJSON"}
+    DEFAULT_SAVE_ARGS: dict[str, Any] = {}
+    DEFAULT_FS_ARGS: dict[str, Any] = {"open_args_save": {"mode": "wb"}}

     def __init__(  # noqa: PLR0913
         self,
         *,
         filepath: str,
+        file_format: str = "file",
         load_args: dict[str, Any] | None = None,
         save_args: dict[str, Any] | None = None,
         version: Version | None = None,
         credentials: dict[str, Any] | None = None,
         fs_args: dict[str, Any] | None = None,
         metadata: dict[str, Any] | None = None,
     ) -> None:
-        """Creates a new instance of ``GeoJSONDataset`` pointing to a concrete GeoJSON file
+        """Creates a new instance of ``GenericDataset`` pointing to a concrete file
         on a specific filesystem fsspec.

         Args:

-            filepath: Filepath in POSIX format to a GeoJSON file prefixed with a protocol like
+            filepath: Filepath in POSIX format to a file prefixed with a protocol like
                 `s3://`. If prefix is not provided `file` protocol (local filesystem) will be used.
                 The prefix should be any protocol supported by ``fsspec``.
                 Note: `http(s)` doesn't support versioning.
-            load_args: GeoPandas options for loading GeoJSON files.
+            file_format: String which is used to match the appropriate load/save method on a best
+                effort basis. For example if 'parquet' is passed in the `geopandas.read_parquet` and
+                `geopandas.DataFrame.to_parquet` will be identified. An error will be raised unless
+                at least one matching `read_{file_format}` or `to_{file_format}` method is
+                identified. Defaults to 'file'.
+            load_args: GeoPandas options for loading files.
                 Here you can find all available arguments:
                 https://geopandas.org/en/stable/docs/reference/api/geopandas.read_file.html
-            save_args: GeoPandas options for saving geojson files.
+            save_args: GeoPandas options for saving files.
                 Here you can find all available arguments:
                 https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.to_file.html
-                The default_save_arg driver is 'GeoJSON', all others preserved.
             version: If specified, should be an instance of
                 ``kedro.io.core.Version``. If its ``load`` attribute is
                 None, the latest version will be loaded. If its ``save``
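The `file_format` lookup described in the docstring above can be sketched without geopandas installed. This is a minimal illustration, not the dataset's implementation: `resolve_reader` and `fake_gpd` are hypothetical names, `fake_gpd` is a stand-in for the geopandas module, and `ValueError` is used in place of Kedro's `DatasetError`.

```python
from types import SimpleNamespace

# Known targets that have no file path, mirroring NON_FILE_SYSTEM_TARGETS in the diff.
NON_FILE_SYSTEM_TARGETS = ["postgis"]


def resolve_reader(namespace, file_format: str):
    """Look up ``read_<file_format>`` on ``namespace`` or raise ValueError."""
    file_format = file_format.lower()
    # Fail fast for formats that exist as readers but do not take a filepath.
    if file_format in NON_FILE_SYSTEM_TARGETS:
        raise ValueError(f"file_format '{file_format}' has no filepath source")
    reader = getattr(namespace, f"read_{file_format}", None)
    if reader is None:
        raise ValueError(f"no 'read_{file_format}' function found")
    return reader


# Stand-in for the geopandas module, used only so the sketch runs on its own.
fake_gpd = SimpleNamespace(
    read_file=lambda path: f"read_file({path})",
    read_parquet=lambda path: f"read_parquet({path})",
    read_postgis=lambda sql: f"read_postgis({sql})",
)

print(resolve_reader(fake_gpd, "Parquet")("regions.parquet"))
print(resolve_reader(fake_gpd, "file")("regions.geojson"))
```

The save side follows the same pattern, except that the `to_<file_format>` attribute is looked up on the data object rather than on the module.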
@@ -94,6 +106,9 @@ def __init__( # noqa: PLR0913
             metadata: Any arbitrary metadata.
                 This is ignored by Kedro, but may be consumed by users or external plugins.
         """
+
+        self._file_format = file_format.lower()
+
         _fs_args = copy.deepcopy(fs_args) or {}
         _fs_open_args_load = _fs_args.pop("open_args_load", {})
         _fs_open_args_save = _fs_args.pop("open_args_save", {})
@@ -114,28 +129,57 @@ def __init__( # noqa: PLR0913
             glob_function=self._fs.glob,
         )

-        self._load_args = copy.deepcopy(self.DEFAULT_LOAD_ARGS)
-        if load_args is not None:
-            self._load_args.update(load_args)
-
-        self._save_args = copy.deepcopy(self.DEFAULT_SAVE_ARGS)
-        if save_args is not None:
-            self._save_args.update(save_args)
+        # Handle default load and save and fs arguments
+        self._load_args = {**self.DEFAULT_LOAD_ARGS, **(load_args or {})}
+        self._save_args = {**self.DEFAULT_SAVE_ARGS, **(save_args or {})}
+        self._fs_open_args_load = {
+            **self.DEFAULT_FS_ARGS.get("open_args_load", {}),
+            **(_fs_open_args_load or {}),
+        }
+        self._fs_open_args_save = {
+            **self.DEFAULT_FS_ARGS.get("open_args_save", {}),
+            **(_fs_open_args_save or {}),
+        }

-        _fs_open_args_save.setdefault("mode", "wb")
-        self._fs_open_args_load = _fs_open_args_load
-        self._fs_open_args_save = _fs_open_args_save
+    def _ensure_file_system_target(self) -> None:
+        # Fail fast if provided a known non-filesystem target
+        if self._file_format in NON_FILE_SYSTEM_TARGETS:
+            raise DatasetError(
+                f"Cannot load or save a dataset of file_format '{self._file_format}' as it "
+                f"does not support a filepath target/source."
+            )

     def load(self) -> gpd.GeoDataFrame | dict[str, gpd.GeoDataFrame]:
+        self._ensure_file_system_target()
+
         load_path = get_filepath_str(self._get_load_path(), self._protocol)
-        with self._fs.open(load_path, **self._fs_open_args_load) as fs_file:
-            return gpd.read_file(fs_file, **self._load_args)
+        load_method = getattr(gpd, f"read_{self._file_format}", None)
+        if load_method:
+            with self._fs.open(load_path, **self._fs_open_args_load) as fs_file:
+                return load_method(fs_file, **self._load_args)
+        raise DatasetError(
+            f"Unable to retrieve 'geopandas.read_{self._file_format}' method, please ensure that your "
+            "'file_format' parameter has been defined correctly as per the GeoPandas API "
+            "https://geopandas.org/en/stable/docs/reference/io.html"
+        )

     def save(self, data: gpd.GeoDataFrame) -> None:
+        self._ensure_file_system_target()
+
         save_path = get_filepath_str(self._get_save_path(), self._protocol)
-        with self._fs.open(save_path, **self._fs_open_args_save) as fs_file:
-            data.to_file(fs_file, **self._save_args)
-        self.invalidate_cache()
+        save_method = getattr(data, f"to_{self._file_format}", None)
+        if save_method:
+            with self._fs.open(save_path, **self._fs_open_args_save) as fs_file:
+                # KEY ASSUMPTION - first argument is path/buffer/io
+                save_method(fs_file, **self._save_args)
+                self.invalidate_cache()
+        else:
+            raise DatasetError(
+                f"Unable to retrieve 'geopandas.DataFrame.to_{self._file_format}' method, please "
+                "ensure that your 'file_format' parameter has been defined correctly as "
+                "per the GeoPandas API "
+                "https://geopandas.org/en/stable/docs/reference/io.html"
+            )

     def _exists(self) -> bool:
         try:
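The default-argument handling in the hunk above replaces `deepcopy` plus `update` with dictionary unpacking. The sketch below reproduces that merge for the filesystem open arguments in isolation; `split_fs_args` is a hypothetical helper name, and the `anon` key in the usage line is only an example of a pass-through filesystem option.

```python
import copy
from typing import Optional

# Same default as DEFAULT_FS_ARGS in the diff: files are opened for binary writing.
DEFAULT_FS_ARGS = {"open_args_save": {"mode": "wb"}}


def split_fs_args(fs_args: Optional[dict] = None) -> tuple:
    """Return (filesystem args, load open args, save open args) with defaults applied."""
    remaining = copy.deepcopy(fs_args) or {}
    open_args_load = remaining.pop("open_args_load", {})
    open_args_save = remaining.pop("open_args_save", {})
    # Later unpacked dicts win, so user-supplied values override the defaults.
    merged_load = {**DEFAULT_FS_ARGS.get("open_args_load", {}), **(open_args_load or {})}
    merged_save = {**DEFAULT_FS_ARGS.get("open_args_save", {}), **(open_args_save or {})}
    return remaining, merged_load, merged_save


print(split_fs_args(None))
print(split_fs_args({"open_args_save": {"mode": "ab"}, "anon": True}))
```

One thing to keep in mind with this style: `{**a, **b}` is a shallow merge, so nested mutable values in the defaults would be shared rather than copied. With the empty and flat defaults used here that makes no practical difference.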
@@ -147,6 +191,7 @@ def _exists(self) -> bool:
     def _describe(self) -> dict[str, Any]:
         return {
             "filepath": self._filepath,
+            "file_format": self._file_format,
             "protocol": self._protocol,
             "load_args": self._load_args,
             "save_args": self._save_args,

‎kedro-datasets/pyproject.toml

+4-4
@@ -40,8 +40,8 @@ dask = ["kedro-datasets[dask-parquetdataset, dask-csvdataset]"]
 databricks-managedtabledataset = ["kedro-datasets[spark-base,pandas-base,delta-base,hdfs-base,s3fs-base]"]
 databricks = ["kedro-datasets[databricks-managedtabledataset]"]

-geopandas-geojsondataset = ["geopandas>=0.6.0, <1.0", "pyproj~=3.0"]
-geopandas = ["kedro-datasets[geopandas-geojsondataset]"]
+geopandas-genericdataset = ["geopandas>=0.8.0, <2.0", "fiona >=1.8, <2.0"]
+geopandas = ["kedro-datasets[geopandas-genericdataset]"]

 holoviews-holoviewswriter = ["holoviews>=1.13.0"]
 holoviews = ["kedro-datasets[holoviews-holoviewswriter]"]
@@ -215,8 +215,9 @@ test = [
     "deltalake>=0.10.0",
     "dill~=0.3.1",
     "filelock>=3.4.0, <4.0",
+    "fiona >=1.8, <2.0",
     "gcsfs>=2023.1, <2023.3",
-    "geopandas>=0.6.0, <1.0",
+    "geopandas>=0.8.0, <2.0",
     "hdfs>=2.5.8, <3.0",
     "holoviews>=1.13.0",
     "ibis-framework[duckdb,examples]",
@@ -243,7 +244,6 @@ test = [
     "pyarrow>=1.0; python_version < '3.11'",
     "pyarrow>=7.0; python_version >= '3.11'", # Adding to avoid numpy build errors
     "pyodbc~=5.0",
-    "pyproj~=3.0",
     "pyspark>=3.0; python_version < '3.11'",
     "pyspark>=3.4; python_version >= '3.11'",
     "pytest-cov~=3.0",

‎kedro-datasets/tests/geopandas/test_geojson_dataset.py ‎kedro-datasets/tests/geopandas/test_generic_dataset.py

+132-22
@@ -10,7 +10,7 @@
 from s3fs import S3FileSystem
 from shapely.geometry import Point

-from kedro_datasets.geopandas import GeoJSONDataset
+from kedro_datasets.geopandas import GenericDataset


 @pytest.fixture(params=[None])
@@ -24,16 +24,36 @@ def save_version(request):


 @pytest.fixture
-def filepath(tmp_path):
+def filepath_geojson(tmp_path):
     return (tmp_path / "test.geojson").as_posix()


+@pytest.fixture
+def filepath_parquet(tmp_path):
+    return (tmp_path / "test.parquet").as_posix()
+
+
+@pytest.fixture
+def filepath_feather(tmp_path):
+    return (tmp_path / "test.feather").as_posix()
+
+
+@pytest.fixture
+def filepath_postgis(tmp_path):
+    return (tmp_path / "test.sql").as_posix()
+
+
+@pytest.fixture
+def filepath_abc(tmp_path):
+    return tmp_path / "test.abc"
+
+
 @pytest.fixture(params=[None])
 def load_args(request):
     return request.param


-@pytest.fixture(params=[{"driver": "GeoJSON"}])
+@pytest.fixture(params=[None])
 def save_args(request):
     return request.param

@@ -47,20 +67,77 @@ def dummy_dataframe():


 @pytest.fixture
-def geojson_dataset(filepath, load_args, save_args, fs_args):
-    return GeoJSONDataset(
-        filepath=filepath, load_args=load_args, save_args=save_args, fs_args=fs_args
+def geojson_dataset(filepath_geojson, load_args, save_args, fs_args):
+    return GenericDataset(
+        filepath=filepath_geojson,
+        load_args=load_args,
+        save_args=save_args,
+        fs_args=fs_args,
+    )
+
+
+@pytest.fixture
+def parquet_dataset(filepath_parquet, load_args, save_args, fs_args):
+    return GenericDataset(
+        filepath=filepath_parquet,
+        file_format="parquet",
+        load_args=load_args,
+        save_args=save_args,
+        fs_args=fs_args,
+    )
+
+
+@pytest.fixture
+def parquet_dataset_bad_config(filepath_parquet, load_args, save_args, fs_args):
+    return GenericDataset(
+        filepath=filepath_parquet,
+        load_args=load_args,
+        save_args=save_args,
+        fs_args=fs_args,
+    )
+
+
+@pytest.fixture
+def feather_dataset(filepath_feather, load_args, save_args, fs_args):
+    return GenericDataset(
+        filepath=filepath_feather,
+        file_format="feather",
+        load_args=load_args,
+        save_args=save_args,
+        fs_args=fs_args,
+    )
+
+
+@pytest.fixture
+def postgis_dataset(filepath_postgis, load_args, save_args, fs_args):
+    return GenericDataset(
+        filepath=filepath_postgis,
+        file_format="postgis",
+        load_args=load_args,
+        save_args=save_args,
+        fs_args=fs_args,
     )


 @pytest.fixture
-def versioned_geojson_dataset(filepath, load_version, save_version):
-    return GeoJSONDataset(
-        filepath=filepath, version=Version(load_version, save_version)
+def abc_dataset(filepath_abc, load_args, save_args, fs_args):
+    return GenericDataset(
+        filepath=filepath_abc,
+        file_format="abc",
+        load_args=load_args,
+        save_args=save_args,
+        fs_args=fs_args,
     )


-class TestGeoJSONDataset:
+@pytest.fixture
+def versioned_geojson_dataset(filepath_geojson, load_version, save_version):
+    return GenericDataset(
+        filepath=filepath_geojson, version=Version(load_version, save_version)
+    )
+
+
+class TestGenericDataset:
     def test_save_and_load(self, geojson_dataset, dummy_dataframe):
         """Test that saved and reloaded data matches the original one."""
         geojson_dataset.save(dummy_dataframe)
@@ -72,7 +149,7 @@ def test_save_and_load(self, geojson_dataset, dummy_dataframe):
     @pytest.mark.parametrize("geojson_dataset", [{"index": False}], indirect=True)
     def test_load_missing_file(self, geojson_dataset):
         """Check the error while trying to load from missing source."""
-        pattern = r"Failed while loading data from dataset GeoJSONDataset"
+        pattern = r"Failed while loading data from dataset GenericDataset"
         with pytest.raises(DatasetError, match=pattern):
             geojson_dataset.load()

@@ -82,6 +159,39 @@ def test_exists(self, geojson_dataset, dummy_dataframe):
         geojson_dataset.save(dummy_dataframe)
         assert geojson_dataset.exists()

+    def test_load_parquet_dataset(self, parquet_dataset, dummy_dataframe):
+        parquet_dataset.save(dummy_dataframe)
+        reloaded_df = parquet_dataset.load()
+        assert_frame_equal(reloaded_df, dummy_dataframe)
+
+    def test_load_feather_dataset(self, feather_dataset, dummy_dataframe):
+        feather_dataset.save(dummy_dataframe)
+        reloaded_df = feather_dataset.load()
+        assert_frame_equal(reloaded_df, dummy_dataframe)
+
+    def test_bad_load(
+        self, parquet_dataset_bad_config, dummy_dataframe, filepath_parquet
+    ):
+        dummy_dataframe.to_parquet(filepath_parquet)
+        pattern = r"Failed while loading data from dataset GenericDataset(.*)"
+        with pytest.raises(DatasetError, match=pattern):
+            parquet_dataset_bad_config.load()
+
+    def test_none_file_system_target(self, postgis_dataset, dummy_dataframe):
+        pattern = "Cannot load or save a dataset of file_format 'postgis' as it does not support a filepath target/source."
+        with pytest.raises(DatasetError, match=pattern):
+            postgis_dataset.save(dummy_dataframe)
+
+    def test_unknown_file_format(self, abc_dataset, dummy_dataframe, filepath_abc):
+        pattern = "Unable to retrieve 'geopandas.DataFrame.to_abc' method"
+        with pytest.raises(DatasetError, match=pattern):
+            abc_dataset.save(dummy_dataframe)
+
+        filepath_abc.write_bytes(b"")
+        pattern = "Unable to retrieve 'geopandas.read_abc' method"
+        with pytest.raises(DatasetError, match=pattern):
+            abc_dataset.load()
+
     @pytest.mark.parametrize(
         "load_args", [{"crs": "init:4326"}, {"crs": "init:2154", "driver": "GeoJSON"}]
     )
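A side note on the new error-message tests above: `pytest.raises(..., match=pattern)` applies the pattern with `re.search`, so plain strings such as the `postgis` message are treated as regular expressions. The sketch below uses only the standard library to show the difference between the unescaped pattern and an escaped one; the variable names are illustrative, and the tests in this commit pass either way.

```python
import re

message = (
    "Cannot load or save a dataset of file_format 'postgis' as it "
    "does not support a filepath target/source."
)

# Unescaped: the trailing "." matches any character, so a message ending in "!" also matches.
loose = "does not support a filepath target/source."
altered = message[:-1] + "!"
loose_matches_altered = re.search(loose, altered) is not None

# Escaped: only the literal text matches.
strict = re.escape(loose)
strict_matches_original = re.search(strict, message) is not None
strict_matches_altered = re.search(strict, altered) is not None

print(loose_matches_altered, strict_matches_original, strict_matches_altered)
```

Wrapping the expected message in `re.escape` is one way to keep such assertions literal if the message ever gains characters like parentheses or brackets.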
@@ -118,7 +228,7 @@ def test_open_extra_args(self, geojson_dataset, fs_args):
         ],
     )
     def test_protocol_usage(self, path, instance_type):
-        geojson_dataset = GeoJSONDataset(filepath=path)
+        geojson_dataset = GenericDataset(filepath=path)
         assert isinstance(geojson_dataset._fs, instance_type)

         path = path.split(PROTOCOL_DELIMITER, 1)[-1]
@@ -129,18 +239,18 @@ def test_protocol_usage(self, path, instance_type):
     def test_catalog_release(self, mocker):
         fs_mock = mocker.patch("fsspec.filesystem").return_value
         filepath = "test.geojson"
-        geojson_dataset = GeoJSONDataset(filepath=filepath)
+        geojson_dataset = GenericDataset(filepath=filepath)
         geojson_dataset.release()
         fs_mock.invalidate_cache.assert_called_once_with(filepath)


-class TestGeoJSONDatasetVersioned:
+class TestGenericDatasetVersioned:
     def test_version_str_repr(self, load_version, save_version):
         """Test that version is in string representation of the class instance
         when applicable."""
         filepath = "test.geojson"
-        ds = GeoJSONDataset(filepath=filepath)
-        ds_versioned = GeoJSONDataset(
+        ds = GenericDataset(filepath=filepath)
+        ds_versioned = GenericDataset(
             filepath=filepath, version=Version(load_version, save_version)
         )
         assert filepath in str(ds)
@@ -149,8 +259,8 @@ def test_version_str_repr(self, load_version, save_version):
         assert filepath in str(ds_versioned)
         ver_str = f"version=Version(load={load_version}, save='{save_version}')"
         assert ver_str in str(ds_versioned)
-        assert "GeoJSONDataset" in str(ds_versioned)
-        assert "GeoJSONDataset" in str(ds)
+        assert "GenericDataset" in str(ds_versioned)
+        assert "GenericDataset" in str(ds)
         assert "protocol" in str(ds_versioned)
         assert "protocol" in str(ds)

@@ -163,7 +273,7 @@ def test_save_and_load(self, versioned_geojson_dataset, dummy_dataframe):

     def test_no_versions(self, versioned_geojson_dataset):
         """Check the error if no versions are available for load."""
-        pattern = r"Did not find any versions for GeoJSONDataset\(.+\)"
+        pattern = r"Did not find any versions for GenericDataset\(.+\)"
         with pytest.raises(DatasetError, match=pattern):
             versioned_geojson_dataset.load()

@@ -178,7 +288,7 @@ def test_prevent_override(self, versioned_geojson_dataset, dummy_dataframe):
         version."""
         versioned_geojson_dataset.save(dummy_dataframe)
         pattern = (
-            r"Save path \'.+\' for GeoJSONDataset\(.+\) must not "
+            r"Save path \'.+\' for GenericDataset\(.+\) must not "
             r"exist if versioning is enabled"
         )
         with pytest.raises(DatasetError, match=pattern):
@@ -197,7 +307,7 @@ def test_save_version_warning(
         the subsequent load path."""
         pattern = (
             rf"Save version '{save_version}' did not match load version "
-            rf"'{load_version}' for GeoJSONDataset\(.+\)"
+            rf"'{load_version}' for GenericDataset\(.+\)"
         )
         with pytest.warns(UserWarning, match=pattern):
             versioned_geojson_dataset.save(dummy_dataframe)
@@ -206,7 +316,7 @@ def test_http_filesystem_no_versioning(self):
         pattern = "Versioning is not supported for HTTP protocols."

         with pytest.raises(DatasetError, match=pattern):
-            GeoJSONDataset(
+            GenericDataset(
                 filepath="https://example/file.geojson", version=Version(None, None)
             )
