[DOCS] Run Python black on Markdown code blocks #1797

Merged 1 commit on Feb 11, 2025
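
For readers who want to check this kind of change locally, here is a minimal sketch of how Python code blocks can be located in a Markdown file and sanity-checked for valid syntax before or after a formatter such as black is applied. It uses only the standard library. The helper names `find_python_blocks` and `parses_cleanly` are hypothetical, and this is not necessarily the tooling used to produce this PR.

```python
import ast
import re

# Built at runtime so that this snippet can itself sit inside a Markdown fence.
FENCE = "`" * 3

# A block opens with a fence tagged "python" and closes at the next bare fence line.
BLOCK_RE = re.compile(
    rf"^{FENCE}python\n(.*?)^{FENCE}[ \t]*$", re.DOTALL | re.MULTILINE
)


def find_python_blocks(markdown_text):
    """Return the body of every Python fenced block found in the text."""
    return BLOCK_RE.findall(markdown_text)


def parses_cleanly(source):
    """Return True if the snippet is syntactically valid Python."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True
```

A syntax check like this only confirms that a block can be parsed; it says nothing about whether the formatting matches black's style.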
38 changes: 28 additions & 10 deletions README.md
@@ -93,25 +93,43 @@ This example loads NYC taxi trip records and taxi zone information stored as .CS
#### Load NYC taxi trips and taxi zones data from CSV Files Stored on AWS S3

```python
-taxidf = sedona.read.format('csv').option("header","true").option("delimiter", ",").load("s3a://your-directory/data/nyc-taxi-data.csv")
-taxidf = taxidf.selectExpr('ST_Point(CAST(Start_Lon AS Decimal(24,20)), CAST(Start_Lat AS Decimal(24,20))) AS pickup', 'Trip_Pickup_DateTime', 'Payment_Type', 'Fare_Amt')
+taxidf = (
+    sedona.read.format("csv")
+    .option("header", "true")
+    .option("delimiter", ",")
+    .load("s3a://your-directory/data/nyc-taxi-data.csv")
+)
+taxidf = taxidf.selectExpr(
+    "ST_Point(CAST(Start_Lon AS Decimal(24,20)), CAST(Start_Lat AS Decimal(24,20))) AS pickup",
+    "Trip_Pickup_DateTime",
+    "Payment_Type",
+    "Fare_Amt",
+)
```

```python
-zoneDf = sedona.read.format('csv').option("delimiter", ",").load("s3a://your-directory/data/TIGER2018_ZCTA5.csv")
-zoneDf = zoneDf.selectExpr('ST_GeomFromWKT(_c0) as zone', '_c1 as zipcode')
+zoneDf = (
+    sedona.read.format("csv")
+    .option("delimiter", ",")
+    .load("s3a://your-directory/data/TIGER2018_ZCTA5.csv")
+)
+zoneDf = zoneDf.selectExpr("ST_GeomFromWKT(_c0) as zone", "_c1 as zipcode")
```

#### Spatial SQL query to only return Taxi trips in Manhattan

```python
-taxidf_mhtn = taxidf.where('ST_Contains(ST_PolygonFromEnvelope(-74.01,40.73,-73.93,40.79), pickup)')
+taxidf_mhtn = taxidf.where(
+    "ST_Contains(ST_PolygonFromEnvelope(-74.01,40.73,-73.93,40.79), pickup)"
+)
```
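
To make the filter above concrete, the following plain-Python sketch mirrors what the predicate checks for a single point, assuming the envelope arguments are ordered `(min_x, min_y, max_x, max_y)`. The function `envelope_contains` and the sample pickups are hypothetical illustrations, not Sedona APIs or real trip data, and Sedona itself evaluates the predicate on geometry objects across the cluster rather than like this.

```python
def envelope_contains(min_x, min_y, max_x, max_y, x, y):
    """Return True if (x, y) lies strictly inside the axis-aligned envelope.

    Strict inequalities are used because a polygon "contains" a point only
    when the point is in its interior, not on its boundary.
    """
    return min_x < x < max_x and min_y < y < max_y


# Same envelope as in the query: longitude range, then latitude range.
MANHATTAN = (-74.01, 40.73, -73.93, 40.79)

# Made-up pickup coordinates as (longitude, latitude) pairs.
pickups = [(-73.98, 40.75), (-73.80, 40.65), (-74.01, 40.75)]

inside = [p for p in pickups if envelope_contains(*MANHATTAN, *p)]
print(inside)
```

Only the first pickup passes: the second is outside the envelope and the third sits exactly on its western edge.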

#### Spatial Join between Taxi Dataframe and Zone Dataframe to Find taxis in each zone

```python
-taxiVsZone = sedona.sql('SELECT zone, zipcode, pickup, Fare_Amt FROM zoneDf, taxiDf WHERE ST_Contains(zone, pickup)')
+taxiVsZone = sedona.sql(
+    "SELECT zone, zipcode, pickup, Fare_Amt FROM zoneDf, taxiDf WHERE ST_Contains(zone, pickup)"
+)
```

#### Show a map of the loaded Spatial Dataframes using GeoPandas
@@ -120,14 +138,14 @@ taxiVsZone = sedona.sql('SELECT zone, zipcode, pickup, Fare_Amt FROM zoneDf, tax
zoneGpd = gpd.GeoDataFrame(zoneDf.toPandas(), geometry="zone")
taxiGpd = gpd.GeoDataFrame(taxidf.toPandas(), geometry="pickup")

-zone = zoneGpd.plot(color='yellow', edgecolor='black', zorder=1)
-zone.set_xlabel('Longitude (degrees)')
-zone.set_ylabel('Latitude (degrees)')
+zone = zoneGpd.plot(color="yellow", edgecolor="black", zorder=1)
+zone.set_xlabel("Longitude (degrees)")
+zone.set_ylabel("Latitude (degrees)")

zone.set_xlim(-74.1, -73.8)
zone.set_ylim(40.65, 40.9)

-taxi = taxiGpd.plot(ax=zone, alpha=0.01, color='red', zorder=3)
+taxi = taxiGpd.plot(ax=zone, alpha=0.01, color="red", zorder=3)
```

## Docker image
7 changes: 6 additions & 1 deletion docs/api/sql/Raster-visualizer.md
@@ -78,9 +78,14 @@ Example:

```python
from sedona.raster_utils.SedonaUtils import SedonaUtils
+
# Or from sedona.spark import *

-df = sedona.read.format('binaryFile').load(DATA_DIR + 'raster.tiff').selectExpr("RS_FromGeoTiff(content) as raster")
+df = (
+    sedona.read.format("binaryFile")
+    .load(DATA_DIR + "raster.tiff")
+    .selectExpr("RS_FromGeoTiff(content) as raster")
+)
htmlDF = df.selectExpr("RS_AsImage(raster, 500) as raster_image")
SedonaUtils.display_image(htmlDF)
```
84 changes: 58 additions & 26 deletions docs/api/sql/Spider.md
@@ -24,9 +24,18 @@ Sedona offers a spatial data generator called Spider. It is a data source that g
Once you have your [`SedonaContext` object created](../Overview#quick-start), you can create a DataFrame with the `spider` data source.

```python
-df_random_points = sedona.read.format("spider").load(n=1000, distribution='uniform')
-df_random_boxes = sedona.read.format("spider").load(n=1000, distribution='gaussian', geometryType='box', maxWidth=0.05, maxHeight=0.05)
-df_random_polygons = sedona.read.format("spider").load(n=1000, distribution='bit', geometryType='polygon', minSegment=3, maxSegment=5, maxSize=0.1)
+df_random_points = sedona.read.format("spider").load(n=1000, distribution="uniform")
+df_random_boxes = sedona.read.format("spider").load(
+    n=1000, distribution="gaussian", geometryType="box", maxWidth=0.05, maxHeight=0.05
+)
+df_random_polygons = sedona.read.format("spider").load(
+    n=1000,
+    distribution="bit",
+    geometryType="polygon",
+    minSegment=3,
+    maxSegment=5,
+    maxSize=0.1,
+)
```

Now we have three DataFrames with random spatial data. We can show the first three rows of the `df_random_points` DataFrame to verify the data is generated correctly.
@@ -57,22 +66,24 @@ import matplotlib.pyplot as plt
import geopandas as gpd

# Convert DataFrames to GeoDataFrames
-gdf_random_points = gpd.GeoDataFrame(df_random_points.toPandas(), geometry='geometry')
-gdf_random_boxes = gpd.GeoDataFrame(df_random_boxes.toPandas(), geometry='geometry')
-gdf_random_polygons = gpd.GeoDataFrame(df_random_polygons.toPandas(), geometry='geometry')
+gdf_random_points = gpd.GeoDataFrame(df_random_points.toPandas(), geometry="geometry")
+gdf_random_boxes = gpd.GeoDataFrame(df_random_boxes.toPandas(), geometry="geometry")
+gdf_random_polygons = gpd.GeoDataFrame(
+    df_random_polygons.toPandas(), geometry="geometry"
+)

# Create a figure and a set of subplots
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# Plot each GeoDataFrame on a different subplot
-gdf_random_points.plot(ax=axes[0], color='blue', markersize=5)
-axes[0].set_title('Random Points')
+gdf_random_points.plot(ax=axes[0], color="blue", markersize=5)
+axes[0].set_title("Random Points")

-gdf_random_boxes.boundary.plot(ax=axes[1], color='red')
-axes[1].set_title('Random Boxes')
+gdf_random_boxes.boundary.plot(ax=axes[1], color="red")
+axes[1].set_title("Random Boxes")

-gdf_random_polygons.boundary.plot(ax=axes[2], color='green')
-axes[2].set_title('Random Polygons')
+gdf_random_polygons.boundary.plot(ax=axes[2], color="green")
+axes[2].set_title("Random Polygons")

# Adjust the layout
plt.tight_layout()
@@ -122,8 +133,11 @@ Example:

```python
import geopandas as gpd
-df = sedona.read.format("spider").load(n=300, distribution='uniform', geometryType='box', maxWidth=0.05, maxHeight=0.05)
-gpd.GeoDataFrame(df.toPandas(), geometry='geometry').boundary.plot()
+
+df = sedona.read.format("spider").load(
+    n=300, distribution="uniform", geometryType="box", maxWidth=0.05, maxHeight=0.05
+)
+gpd.GeoDataFrame(df.toPandas(), geometry="geometry").boundary.plot()
```

![Uniform Distribution](../../image/spider/spider-uniform.png)
@@ -145,8 +159,11 @@ Example:

```python
import geopandas as gpd
-df = sedona.read.format("spider").load(n=300, distribution='gaussian', geometryType='polygon', maxSize=0.05)
-gpd.GeoDataFrame(df.toPandas(), geometry='geometry').boundary.plot()
+
+df = sedona.read.format("spider").load(
+    n=300, distribution="gaussian", geometryType="polygon", maxSize=0.05
+)
+gpd.GeoDataFrame(df.toPandas(), geometry="geometry").boundary.plot()
```

![Gaussian Distribution](../../image/spider/spider-gaussian.png)
@@ -170,8 +187,11 @@ Example:

```python
import geopandas as gpd
-df = sedona.read.format("spider").load(n=300, distribution='bit', geometryType='point', probability=0.2, digits=10)
-gpd.GeoDataFrame(df.toPandas(), geometry='geometry').plot(markersize=1)
+
+df = sedona.read.format("spider").load(
+    n=300, distribution="bit", geometryType="point", probability=0.2, digits=10
+)
+gpd.GeoDataFrame(df.toPandas(), geometry="geometry").plot(markersize=1)
```

![Bit Distribution](../../image/spider/spider-bit.png)
@@ -195,8 +215,11 @@ Example:

```python
import geopandas as gpd
-df = sedona.read.format("spider").load(n=300, distribution='diagonal', geometryType='point', percentage=0.5, buffer=0.5)
-gpd.GeoDataFrame(df.toPandas(), geometry='geometry').plot(markersize=1)
+
+df = sedona.read.format("spider").load(
+    n=300, distribution="diagonal", geometryType="point", percentage=0.5, buffer=0.5
+)
+gpd.GeoDataFrame(df.toPandas(), geometry="geometry").plot(markersize=1)
```

![Diagonal Distribution](../../image/spider/spider-diagonal.png)
@@ -218,8 +241,11 @@ Example:

```python
import geopandas as gpd
-df = sedona.read.format("spider").load(n=2000, distribution='sierpinski', geometryType='point')
-gpd.GeoDataFrame(df.toPandas(), geometry='geometry').plot(markersize=1)
+
+df = sedona.read.format("spider").load(
+    n=2000, distribution="sierpinski", geometryType="point"
+)
+gpd.GeoDataFrame(df.toPandas(), geometry="geometry").plot(markersize=1)
```

![Sierpinski Distribution](../../image/spider/spider-sierpinski.png)
@@ -237,8 +263,11 @@ Example:

```python
import geopandas as gpd
-df = sedona.read.format("spider").load(n=300, distribution='parcel', dither=0.5, splitRange=0.5)
-gpd.GeoDataFrame(df.toPandas(), geometry='geometry').boundary.plot()
+
+df = sedona.read.format("spider").load(
+    n=300, distribution="parcel", dither=0.5, splitRange=0.5
+)
+gpd.GeoDataFrame(df.toPandas(), geometry="geometry").boundary.plot()
```

![Parcel Distribution](../../image/spider/spider-parcel.png)
@@ -274,8 +303,11 @@ Example:

```python
import geopandas as gpd
-df_random_points = sedona.read.format("spider").load(n=1000, distribution='uniform', translateX=0.5, translateY=0.5, scaleX=2, scaleY=2)
-gpd.GeoDataFrame(df_random_points.toPandas(), geometry='geometry').plot(markersize=1)
+
+df_random_points = sedona.read.format("spider").load(
+    n=1000, distribution="uniform", translateX=0.5, translateY=0.5, scaleX=2, scaleY=2
+)
+gpd.GeoDataFrame(df_random_points.toPandas(), geometry="geometry").plot(markersize=1)
```

The data is now in the region `[0.5, 2.5] x [0.5, 2.5]`.
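
The stated region can be checked by hand with a small sketch. It assumes the generator produces coordinates in the unit square and that the transform is `x' = scaleX * x + translateX` (likewise for `y`) with no skew; the `transform` helper below is a hypothetical illustration in plain Python, not part of Sedona.

```python
def transform(x, y, translate_x=0.0, translate_y=0.0, scale_x=1.0, scale_y=1.0):
    """Scale a point, then shift it (assumed order: scale first, translate second)."""
    return (scale_x * x + translate_x, scale_y * y + translate_y)


# Opposite corners of the assumed unit square [0, 1] x [0, 1].
corners = [(0.0, 0.0), (1.0, 1.0)]

# Same parameters as the example above: translate by 0.5, scale by 2.
moved = [transform(x, y, 0.5, 0.5, 2, 2) for x, y in corners]
print(moved)  # [(0.5, 0.5), (2.5, 2.5)]
```

The lower-left corner maps to `(0.5, 0.5)` and the upper-right corner to `(2.5, 2.5)`, which matches the region quoted in the text.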
40 changes: 14 additions & 26 deletions docs/api/sql/Stac.md
@@ -34,15 +34,19 @@ df.show()
You can load a STAC collection from a s3 collection file object:

```python
-df = sedona.read.format("stac").load("s3a://example.com/stac_bucket/stac_collection.json")
+df = sedona.read.format("stac").load(
+    "s3a://example.com/stac_bucket/stac_collection.json"
+)
df.printSchema()
df.show()
```

You can also load a STAC collection from an HTTP/HTTPS endpoint:

```python
-df = sedona.read.format("stac").load("https://earth-search.aws.element84.com/v1/collections/sentinel-2-pre-c1-l2a")
+df = sedona.read.format("stac").load(
+    "https://earth-search.aws.element84.com/v1/collections/sentinel-2-pre-c1-l2a"
+)
df.printSchema()
df.show()
```
@@ -225,20 +229,15 @@ client = Client.open("https://planetarycomputer.microsoft.com/api/stac/v1")

```python
items = client.search(
-    collection_id="aster-l1t",
-    datetime="2020",
-    return_dataframe=False
+    collection_id="aster-l1t", datetime="2020", return_dataframe=False
)
```

### Search Items on a Collection Within a Month and Max Items

```python
items = client.search(
-    collection_id="aster-l1t",
-    datetime="2020-05",
-    return_dataframe=False,
-    max_items=5
+    collection_id="aster-l1t", datetime="2020-05", return_dataframe=False, max_items=5
)
```

@@ -250,35 +249,26 @@ items = client.search(
    ids=["AST_L1T_00312272006020322_20150518201805"],
    bbox=[-180.0, -90.0, 180.0, 90.0],
    datetime=["2006-01-01T00:00:00Z", "2007-01-01T00:00:00Z"],
-    return_dataframe=False
+    return_dataframe=False,
)
```

### Search Multiple Items with Multiple Bounding Boxes

```python
-bbox_list = [
-    [-180.0, -90.0, 180.0, 90.0],
-    [-100.0, -50.0, 100.0, 50.0]
-]
-items = client.search(
-    collection_id="aster-l1t",
-    bbox=bbox_list,
-    return_dataframe=False
-)
+bbox_list = [[-180.0, -90.0, 180.0, 90.0], [-100.0, -50.0, 100.0, 50.0]]
+items = client.search(collection_id="aster-l1t", bbox=bbox_list, return_dataframe=False)
```

### Search Items and Get DataFrame as Return with Multiple Intervals

```python
interval_list = [
    ["2020-01-01T00:00:00Z", "2020-06-01T00:00:00Z"],
-    ["2020-07-01T00:00:00Z", "2021-01-01T00:00:00Z"]
+    ["2020-07-01T00:00:00Z", "2021-01-01T00:00:00Z"],
]
df = client.search(
-    collection_id="aster-l1t",
-    datetime=interval_list,
-    return_dataframe=True
+    collection_id="aster-l1t", datetime=interval_list, return_dataframe=True
)
df.show()
```
@@ -288,9 +278,7 @@ df.show()
```python
# Save items in DataFrame to GeoParquet with both bounding boxes and intervals
client.get_collection("aster-l1t").save_to_geoparquet(
-    output_path="/path/to/output",
-    bbox=bbox_list,
-    datetime="2020-05"
+    output_path="/path/to/output", bbox=bbox_list, datetime="2020-05"
)
```

23 changes: 16 additions & 7 deletions docs/setup/azure-synapse-analytics.md
@@ -75,14 +75,23 @@ Start your notebook with:
```python
from sedona.spark import SedonaContext

-config = SedonaContext.builder() \
-    .config('spark.jars.packages',
-        'org.apache.sedona:sedona-spark-shaded-3.4_2.12-1.6.1,'
-        'org.datasyslab:geotools-wrapper-1.6.1-28.2') \
-    .config("spark.serializer","org.apache.spark.serializer.KryoSerializer") \
-    .config("spark.kryo.registrator", "org.apache.sedona.core.serde.SedonaKryoRegistrator") \
-    .config("spark.sql.extensions", "org.apache.sedona.viz.sql.SedonaVizExtensions,org.apache.sedona.sql.SedonaSqlExtensions") \
+config = (
+    SedonaContext.builder()
+    .config(
+        "spark.jars.packages",
+        "org.apache.sedona:sedona-spark-shaded-3.4_2.12-1.6.1,"
+        "org.datasyslab:geotools-wrapper-1.6.1-28.2",
+    )
+    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
+    .config(
+        "spark.kryo.registrator", "org.apache.sedona.core.serde.SedonaKryoRegistrator"
+    )
+    .config(
+        "spark.sql.extensions",
+        "org.apache.sedona.viz.sql.SedonaVizExtensions,org.apache.sedona.sql.SedonaSqlExtensions",
+    )
    .getOrCreate()
+)

sedona = SedonaContext.create(config)
```
1 change: 1 addition & 0 deletions docs/setup/databricks.md
@@ -56,6 +56,7 @@ SedonaSQLRegistrator.registerAll(spark)

```python
from sedona.register.geo_registrator import SedonaRegistrator
+
SedonaRegistrator.registerAll(spark)
```
