Commit 70d2e51

[SEDONA-704] Add Stac Python Wrapper for STAC Reader (#1793)
* [SEDONA-704] Add Stac Python Wrapper for STAC Reader
* update schema
* downgrade pystac version so python 3.7 runtime is supported
1 parent 1260245 commit 70d2e51

8 files changed: +1028 -0 lines changed

docs/api/sql/Stac.md

+150
@@ -146,6 +146,156 @@ In this example, the data source will push down the temporal filter to the under
In this example, the data source will push down the spatial filter to the underlying data source.

# Python API

The Python API allows you to interact with a SpatioTemporal Asset Catalog (STAC) API using the Client class. This class provides methods to open a connection to a STAC API, retrieve collections, and search for items with various filters.

## Client Class

## Methods

### `open(url: str) -> Client`

Opens a connection to the specified STAC API URL.

**Parameters:**

- `url` (*str*): The URL of the STAC API to connect to.
  **Example:** `"https://planetarycomputer.microsoft.com/api/stac/v1"`

**Returns:**

- `Client`: An instance of the `Client` class connected to the specified URL.
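
Judging from the `client.py` source in this commit, `open` is a classmethod that only stores the URL and returns a new `Client`; nothing in that file sends a request at this point. The sketch below mirrors that pattern with a hypothetical stand-in class, `SketchClient`, so it runs without Sedona installed. It is an illustration of the constructor pattern, not the library's code.

```python
class SketchClient:
    """Hypothetical stand-in mirroring the shape of sedona.stac.client.Client."""

    def __init__(self, url: str):
        self.url = url

    @classmethod
    def open(cls, url: str) -> "SketchClient":
        # Alternative constructor: it only records the URL, no request is sent here.
        return cls(url)


client = SketchClient.open("https://planetarycomputer.microsoft.com/api/stac/v1")
print(client.url)
```

Using a classmethod here keeps the call site close to other STAC clients (`Client.open(url)`) while leaving `__init__` trivial.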

---

### `get_collection(collection_id: str) -> CollectionClient`

Retrieves a collection client for the specified collection ID.

**Parameters:**

- `collection_id` (*str*): The ID of the collection to retrieve.
  **Example:** `"aster-l1t"`

**Returns:**

- `CollectionClient`: An instance of the `CollectionClient` class for the specified collection.

---

### `search(*ids: Union[str, list], collection_id: str, bbox: Optional[list] = None, datetime: Optional[Union[str, datetime.datetime, list]] = None, max_items: Optional[int] = None, return_dataframe: bool = True) -> Union[Iterator[PyStacItem], DataFrame]`

Searches for items in the specified collection with optional filters.

**Parameters:**

- `ids` (*Union[str, list]*): A variable number of item IDs to filter the items. Because the signature declares `*ids`, IDs are passed positionally and every other parameter must be passed by keyword.
  **Example:** `"item_id1"` or `["item_id1", "item_id2"]`
- `collection_id` (*str*): The ID of the collection to search in.
  **Example:** `"aster-l1t"`
- `bbox` (*Optional[list]*): A list of bounding boxes for filtering the items. Each bounding box is represented as a list of four float values: `[min_lon, min_lat, max_lon, max_lat]`.
  **Example:** `[[-180.0, -90.0, 180.0, 90.0]]`
- `datetime` (*Optional[Union[str, datetime.datetime, list]]*): A single `datetime.datetime`, an RFC 3339-compliant timestamp string, or a list of date-time ranges for filtering the items. Partial dates such as `"2020"` or `"2020-05"` are expanded to the full interval they cover.
  **Example:**
    - `"2020-01-01T00:00:00Z"`
    - `datetime.datetime(2020, 1, 1)`
    - `[["2020-01-01T00:00:00Z", "2021-01-01T00:00:00Z"]]`
- `max_items` (*Optional[int]*): The maximum number of items to return from the search, even if there are more matching results.
  **Example:** `100`
- `return_dataframe` (*bool*): If `True` (default), return the result as a Spark DataFrame instead of an iterator of `PyStacItem` objects.
  **Example:** `True`

**Returns:**

- *Union[Iterator[PyStacItem], DataFrame]*: An iterator of `PyStacItem` objects or a Spark DataFrame containing the items that match the specified filters.
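
With the signature shown in this commit, item IDs are collected by `*ids`, so they are positional, and every parameter after them is keyword-only. The stub below is a hypothetical function (`search_signature_demo`) that copies only the parameter layout, not the behavior, to show which call shapes Python accepts.

```python
from typing import Optional, Union


def search_signature_demo(
    *ids: Union[str, list],
    collection_id: str,
    bbox: Optional[list] = None,
    max_items: Optional[int] = None,
):
    """Hypothetical stub with the same parameter layout as Client.search."""
    return {"ids": ids, "collection_id": collection_id, "bbox": bbox, "max_items": max_items}


# Item IDs go first, positionally; everything else is passed by keyword.
ok = search_signature_demo("item_id1", "item_id2", collection_id="aster-l1t")
print(ok["ids"])

# Passing ids as a keyword does not bind to *ids and raises TypeError.
try:
    search_signature_demo(ids=["item_id1"], collection_id="aster-l1t")
except TypeError as exc:
    keyword_error = str(exc)
    print(keyword_error)
```

The same rule means `collection_id` can never be supplied positionally: any positional argument is treated as an item ID.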

## Sample Code

### Initialize the Client

```python
from sedona.stac.client import Client

# Initialize the client
client = Client.open("https://planetarycomputer.microsoft.com/api/stac/v1")
```

### Search Items on a Collection Within a Year

```python
items = client.search(
    collection_id="aster-l1t",
    datetime="2020",
    return_dataframe=False
)
```
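
The `Client.search` docstring in this commit says that partial dates are expanded into intervals: `"YYYY"` covers the whole year, `"YYYY-mm"` the whole month, `"YYYY-mm-dd"` the whole day, and a full timestamp is used as both ends. The helper below is a sketch of those rules under that reading; `expand_partial_datetime` is a hypothetical name and not the wrapper's actual code.

```python
import calendar
import re
from typing import List


def expand_partial_datetime(value: str) -> List[str]:
    """Hypothetical helper: expand "YYYY", "YYYY-mm" or "YYYY-mm-dd" into [start, end]."""
    if re.fullmatch(r"\d{4}", value):
        return [f"{value}-01-01T00:00:00Z", f"{value}-12-31T23:59:59Z"]
    if re.fullmatch(r"\d{4}-\d{2}", value):
        year, month = (int(part) for part in value.split("-"))
        last_day = calendar.monthrange(year, month)[1]  # number of days in that month
        return [f"{value}-01T00:00:00Z", f"{value}-{last_day:02d}T23:59:59Z"]
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return [f"{value}T00:00:00Z", f"{value}T23:59:59Z"]
    # Anything else (for example a full timestamp) is used as both ends of the interval.
    return [value, value]


print(expand_partial_datetime("2020"))
print(expand_partial_datetime("2020-05"))
print(expand_partial_datetime("2020-02"))
```

Under these rules, `datetime="2020"` in the example above would select items from 2020-01-01T00:00:00Z through 2020-12-31T23:59:59Z.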

### Search Items on a Collection Within a Month and Max Items

```python
items = client.search(
    collection_id="aster-l1t",
    datetime="2020-05",
    return_dataframe=False,
    max_items=5
)
```
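
`max_items` is documented as an upper bound on the number of returned items, even when more items match. How the wrapper enforces it is not visible in this commit excerpt; one common way to cap a lazy iterator is `itertools.islice`, sketched here with a hypothetical helper `cap_items` and a generator standing in for the search results.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def cap_items(items: Iterable[T], max_items: Optional[int] = None) -> Iterator[T]:
    """Hypothetical helper: yield at most max_items elements; None means no cap."""
    return islice(items, max_items)


matching = (f"item-{i}" for i in range(1000))  # stands in for a lazy result stream
first_five = list(cap_items(matching, max_items=5))
print(first_five)
```

Because the cap is applied lazily, the remaining 995 stand-in items are never generated.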

### Search Items with Bounding Box and Interval

```python
items = client.search(
    # Item IDs are positional: the signature declares *ids, so ids=... would raise TypeError.
    "AST_L1T_00312272006020322_20150518201805",
    collection_id="aster-l1t",
    bbox=[-180.0, -90.0, 180.0, 90.0],
    datetime=["2006-01-01T00:00:00Z", "2007-01-01T00:00:00Z"],
    return_dataframe=False
)
```

### Search Multiple Items with Multiple Bounding Boxes

```python
bbox_list = [
    [-180.0, -90.0, 180.0, 90.0],
    [-100.0, -50.0, 100.0, 50.0]
]
items = client.search(
    collection_id="aster-l1t",
    bbox=bbox_list,
    return_dataframe=False
)
```
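
The parameter documentation describes `bbox` as a list of boxes, each `[min_lon, min_lat, max_lon, max_lat]`, while one sample above passes a single flat box. Whether the wrapper normalizes both shapes itself is not shown in this commit excerpt. The sketch below is one way a caller could normalize and sanity-check the input before searching; `normalize_bboxes` is a hypothetical helper.

```python
from typing import List, Sequence


def normalize_bboxes(bbox: Sequence) -> List[List[float]]:
    """Hypothetical helper: accept one bbox or a list of bboxes, return a list of bboxes."""
    # A flat sequence of four numbers is treated as a single bounding box.
    boxes = [bbox] if bbox and not isinstance(bbox[0], (list, tuple)) else list(bbox)
    result = []
    for box in boxes:
        if len(box) != 4:
            raise ValueError(f"expected [min_lon, min_lat, max_lon, max_lat], got {box!r}")
        min_lon, min_lat, max_lon, max_lat = (float(v) for v in box)
        # Longitude order is not checked: a box crossing the antimeridian has min_lon > max_lon.
        if min_lat > max_lat:
            raise ValueError(f"min_lat exceeds max_lat in {box!r}")
        result.append([min_lon, min_lat, max_lon, max_lat])
    return result


bbox_list = [[-180.0, -90.0, 180.0, 90.0], [-100.0, -50.0, 100.0, 50.0]]
print(normalize_bboxes(bbox_list))
print(normalize_bboxes([-180, -90, 180, 90]))
```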

### Search Items and Get DataFrame as Return with Multiple Intervals

```python
interval_list = [
    ["2020-01-01T00:00:00Z", "2020-06-01T00:00:00Z"],
    ["2020-07-01T00:00:00Z", "2021-01-01T00:00:00Z"]
]
df = client.search(
    collection_id="aster-l1t",
    datetime=interval_list,
    return_dataframe=True
)
df.show()
```

### Save Items in DataFrame to GeoParquet with Both Bounding Boxes and Intervals

```python
# Save items in DataFrame to GeoParquet with both bounding boxes and intervals.
# bbox_list is the list defined in the "Multiple Bounding Boxes" example.
client.get_collection("aster-l1t").save_to_geoparquet(
    output_path="/path/to/output",
    bbox=bbox_list,
    datetime="2020-05"
)
```

These examples demonstrate how to use the `Client` class to search for items in a STAC collection with various filters and return the results as either an iterator of `PyStacItem` objects or a Spark DataFrame.

# References

- STAC Specification: https://stacspec.org/

python/Pipfile

+1
@@ -29,6 +29,7 @@ attrs="*"
pyarrow="*"
keplergl = "==0.3.2"
pydeck = "===0.8.0"
pystac = "===1.5.0"
rasterio = ">=1.2.10"

[requires]

python/sedona/stac/__init__.py

+16
@@ -0,0 +1,16 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

python/sedona/stac/client.py

+112
@@ -0,0 +1,112 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
from typing import Union, Optional, Iterator

from sedona.stac.collection_client import CollectionClient

import datetime as python_datetime
from pystac import Item as PyStacItem

from pyspark.sql import DataFrame


class Client:
    def __init__(self, url: str):
        self.url = url

    @classmethod
    def open(cls, url: str):
        """
        Opens a connection to the specified STAC API URL.

        This class method creates an instance of the Client class with the given URL.

        Parameters:
        - url (str): The URL of the STAC API to connect to.
          Example: "https://planetarycomputer.microsoft.com/api/stac/v1"

        Returns:
        - Client: An instance of the Client class connected to the specified URL.
        """
        return cls(url)

    def get_collection(self, collection_id: str):
        """
        Retrieves a collection client for the specified collection ID.

        This method creates an instance of the CollectionClient class for the given collection ID,
        allowing interaction with the specified collection in the STAC API.

        Parameters:
        - collection_id (str): The ID of the collection to retrieve.
          Example: "aster-l1t"

        Returns:
        - CollectionClient: An instance of the CollectionClient class for the specified collection.
        """
        return CollectionClient(self.url, collection_id)

    def search(
        self,
        *ids: Union[str, list],
        collection_id: str,
        bbox: Optional[list] = None,
        datetime: Optional[Union[str, python_datetime.datetime, list]] = None,
        max_items: Optional[int] = None,
        return_dataframe: bool = True,
    ) -> Union[Iterator[PyStacItem], DataFrame]:
        """
        Searches for items in the specified collection with optional filters.

        Parameters:
        - ids (Union[str, list]): A variable number of item IDs to filter the items.
          Example: "item_id1" or ["item_id1", "item_id2"]

        - collection_id (str): The ID of the collection to search in.
          Example: "aster-l1t"

        - bbox (Optional[list]): A list of bounding boxes for filtering the items.
          Each bounding box is represented as a list of four float values: [min_lon, min_lat, max_lon, max_lat].
          Example: [[-180.0, -90.0, 180.0, 90.0]]  # This bounding box covers the entire world.

        - datetime (Optional[Union[str, python_datetime.datetime, list]]): A single datetime, RFC 3339-compliant timestamp,
          or a list of date-time ranges for filtering the items. The datetime can be specified in various formats:
          - "YYYY" expands to ["YYYY-01-01T00:00:00Z", "YYYY-12-31T23:59:59Z"]
          - "YYYY-mm" expands to ["YYYY-mm-01T00:00:00Z", "YYYY-mm-<last_day>T23:59:59Z"]
          - "YYYY-mm-dd" expands to ["YYYY-mm-ddT00:00:00Z", "YYYY-mm-ddT23:59:59Z"]
          - "YYYY-mm-ddTHH:MM:SSZ" remains as ["YYYY-mm-ddTHH:MM:SSZ", "YYYY-mm-ddTHH:MM:SSZ"]
          - A list of date-time ranges can be provided for multiple intervals.
          Example: "2020-01-01T00:00:00Z" or python_datetime.datetime(2020, 1, 1) or [["2020-01-01T00:00:00Z", "2021-01-01T00:00:00Z"]]

        - max_items (Optional[int]): The maximum number of items to return from the search, even if there are more matching results.
          Example: 100

        - return_dataframe (bool): If True, return the result as a Spark DataFrame instead of an iterator of PyStacItem objects.
          Example: True

        Returns:
        - Union[Iterator[PyStacItem], DataFrame]: An iterator of PyStacItem objects or a Spark DataFrame that match the specified filters.
        """
        client = self.get_collection(collection_id)
        if return_dataframe:
            return client.get_dataframe(
                *ids, bbox=bbox, datetime=datetime, max_items=max_items
            )
        else:
            return client.get_items(
                *ids, bbox=bbox, datetime=datetime, max_items=max_items
            )
