Update for new Demand Model #45

Margherita-Capitani · 2024-06-28T15:24:49Z

This Pull Request aims to update the methodology for estimating demand in the microgrid.

The main changes concern the following rules:

ramp build demand profile
build demand
cluster building

And the files related to them.
The main change is realtive to the introduction of a bottom-up model for demand estimation, based on the use of building data downloaded from OSM and the "RAMP demand" tool.
The proposed update still needs to be improved in particular to allow selection of the demand modeling method from the config file, and to refine the techniques proposed in build_demand for demand estimation.

The work is currently running locally but some output data still needs to be analyzed.

davide-f · 2024-06-29T13:46:32Z

The CI is not triggered for some weird reason.
With the next commit it should be triggered again. I'll add few comments in the between

davide-f

Great work @Margherita-Capitani! :DDD
Added some comments; happy to discuss if need support

davide-f · 2024-06-29T13:47:14Z

.gitignore

@@ -20,7 +20,7 @@ img/
 .snakemake/
 benchmarks/
 cutouts/
-data/
+# data/


Shall this be restored by having the exeptions "!data..."?

davide-f · 2024-06-29T13:47:31Z

Snakefile

@@ -47,7 +47,7 @@ run = config.get("run", {})
 RDIR = run["name"] + "/" if run.get("name") else ""
 countries = config["countries"]

-ATLITE_NPROCESSES = config["atlite"].get("nprocesses", 5)
+ATLITE_NPROCESSES = config["atlite"].get("nprocesses", 1)


Please restore

davide-f · 2024-06-29T13:47:57Z

Snakefile

+        electric_load_1="resources/demand/microgrid_load_1.csv",
+        electric_load_2="resources/demand/microgrid_load_2.csv",


What are these?

These are the three calculated load outputs:

With Denise's rule readjust

With the rule with Ramp without considering the standard deviation

With the rule with Ramp considering the stndard deviation as sqrt(number_person)*random(std)

I would like to leave one but introduce the possibility to select the type of question modeling from a config parameter.
the proposal could be to make this change in the rule:

if build_demand_model == 0:
calculate_load(
n,
snakemake.config["load"]["scaling_factor"],
worldpop_path,
snakemake.input["microgrid_shapes"],
sample_profile,
snakemake.input["clusters_with_buildings"],
snakemake.output["electric_load"],
snakemake.input["building_csv"],
)

elif build_demand_model == 1: calculate_load_ramp( snakemake.input["clusters_with_buildings"], n, snakemake.config["load"]["scaling_factor"], worldpop_path, snakemake.input["microgrid_shapes"], sample_profile, snakemake.output["electric_load"], snakemake.input["profile_Tier1"], snakemake.input["profile_Tier2"], snakemake.input["profile_Tier3"], snakemake.input["profile_Tier4"], snakemake.input["profile_Tier5"], snakemake.output["electric_load"], tier_percent, ) elif build_demand_model == 2: calculate_load_ramp_std( snakemake.input["clusters_with_buildings"], n, snakemake.config["load"]["scaling_factor"], worldpop_path, snakemake.input["microgrid_shapes"], sample_profile, snakemake.output["electric_load"], snakemake.input["profile_Tier1"], snakemake.input["profile_Tier2"], snakemake.input["profile_Tier3"], snakemake.input["profile_Tier4"], snakemake.input["profile_Tier5"], snakemake.output["electric_load"], tier_percent, )

Where build_demand_model is a config. parameter to select the modeling type for demand

davide-f · 2024-06-29T13:49:18Z

Snakefile

+        profile_tier1="resources/ramp/daily_type_demand_Tier1.xlsx",
+        profile_tier2="resources/ramp/daily_type_demand_Tier2.xlsx",
+        profile_tier3="resources/ramp/daily_type_demand_Tier3.xlsx",
+        profile_tier4="resources/ramp/daily_type_demand_Tier4.xlsx",
+        profile_tier5="resources/ramp/daily_type_demand_Tier5.xlsx",


These may not be needed as they are loaded using the first **{...} block. Is it possible to revise them accordingly?

Sure, you're right!

davide-f · 2024-06-29T13:49:44Z

config.distribution.yaml

+ramp:
+  days: 365
+
+tier:
+  tier_percent: [0.3, 0.2, 0.2, 0.1, 0.15, 0.05]
+
+house_area_limit:
+  area_limit: 255


It would be good to add few comments here to explain the reasoning of these parameters

davide-f · 2024-06-29T14:04:23Z

scripts/cluster_buildings.py

+                "cluster": i,
+            }
+        )
+    central_features = gpd.GeoDataFrame(central_features)


specify the crs of the other dataframe to avoid crs issues; you can do:
gpd.GeoDataFrame(central_features, crs=microgrid_buildings.crs)

davide-f · 2024-06-29T14:05:34Z

scripts/cluster_buildings.py

+    centroids_building = [
+        (row.geometry.centroid.x, row.geometry.centroid.y)
+        for row in cleaned_buildings.itertuples()
+    ]


It seems this is repeated. Could you explain why?
What is the difference between the previous function and this one?
Same comment on cleaned_buildings.geometry.centroid applies

I had kept the previous structure with two separate functions, and in order to split the buildings into the various clusters it was necessary to recalculate the position of the centroids, I was thinking, what if I compact these two functions into one function with the two separate outputs?

See comment below for proposed function

davide-f · 2024-06-29T14:06:47Z

scripts/cluster_buildings.py

+    for i, row in enumerate(cleaned_buildings.itertuples()):
+        if row.geometry.type == "Polygon":
+            cluster_id = kmeans.labels_[i]
+            features.append(
+                {
+                    "properties": {
+                        "id": row.Index,
+                        "cluster_id": cluster_id,
+                        "tags_building": row.tags_building,
+                        "area_m2": row.area_m2,
+                    },
+                    "geometry": row.geometry,
+                }
            )


As above this can be revised with something similar to below but adapted:
microgrid_buildings = microgrid_buildings.loc[microgrid_buildings.geometry.type != "Point"]
However, I'm unsure we need this part

This part was to create a new geodataframe where the buildings would be associated with their own cluster but certainly it can be written better and compacted. What do you think if I made a single function with the two ouputs compacted in this way?

Suggested change

for i, row in enumerate(cleaned_buildings.itertuples()):

if row.geometry.type == "Polygon":

cluster_id = kmeans.labels_[i]

features.append(

{

"properties": {

"id": row.Index,

"cluster_id": cluster_id,

"tags_building": row.tags_building,

"area_m2": row.area_m2,

},

"geometry": row.geometry,

}

)

def get_central_points_geojson_with_buildings(

input_filepath, output_filepath_centroids, n_clusters, crs, house_area_limit,output_filepath_buildings

):

microgrid_buildings = buildings_classification(input_filepath, crs, house_area_limit)

centroids_building = [

(row.geometry.centroid.x, row.geometry.centroid.y)

for row in microgrid_buildings.itertuples()

]

centroids_building = np.array(centroids_building)

kmeans = KMeans(n_clusters=n_clusters, random_state=0).fit(centroids_building)

centroids = kmeans.cluster_centers_

central_points = []

for i in range(kmeans.n_clusters):

cluster_points = centroids_building[kmeans.labels_ == i]

distances = np.linalg.norm(cluster_points - centroids[i], axis=1)

central_point_idx = np.argmin(distances)

central_points.append(cluster_points[central_point_idx])

central_features = []

for i, central_point in enumerate(central_points):

central_features.append(

{

"geometry": Point(central_point),

"cluster": i,

}

)

central_features = gpd.GeoDataFrame(central_features,crs=microgrid_buildings.crs)

central_features.to_file(output_filepath_centroids)

clusters = []

for i, row in enumerate(microgrid_buildings.itertuples()):

cluster_id = kmeans.labels_[i]

clusters.append(cluster_id)

microgrid_buildings_gdf = gpd.GeoDataFrame(microgrid_buildings, crs=microgrid_buildings.crs)

microgrid_buildings_gdf.to_file(output_filepath_buildings)

davide-f · 2024-06-29T14:10:09Z

scripts/cluster_buildings.py

+    buildings_geodataframe = gpd.read_file(input_filepath)
+
+    grouped_buildings = buildings_geodataframe.groupby("cluster_id")
+    clusters = np.sort(buildings_geodataframe["cluster_id"].unique())
+    counts = []
+    for cluster in clusters:
+        cluster_buildings = pd.DataFrame(grouped_buildings.get_group(cluster))
+        building_tag = cluster_buildings["tags_building"]
+        building_tag = pd.Series(building_tag)
+        count = building_tag.value_counts()
+        counts.append(count)
+    counts = pd.DataFrame(counts).fillna(0).astype(int)
+    counts["cluster"] = clusters
+    counts.set_index("cluster", inplace=True)
+    counts.to_csv(output_filepath)


Sorry, I don't get exactly the goal of this.
What about
buildings_geodataframe.cluster_id.value_counts() ?
if you wish to also have the difference by tags_buildings, maybe:
buildings_geodataframe.groupby("cluster_id").tags_building.value_counts()

Absolutely better, I had done it this way because I was not very familiar with values_count() and wanted to try to create a table to use in build_demand.
I made a few changes to the build_demand rule to make it work with this proposal of yours, which is definitely better and lighter, thanks!

I would say I can compact everything in the function above , adding under the proposal you made above also the row:

building_class = buildings_geodataframe.groupby("cluster_id").tags_building.value_counts()

and also adding this last varible to the outputs

davide-f · 2024-06-29T14:11:16Z

scripts/cluster_buildings.py

    get_central_points_geojson(
-        snakemake.output["cleaned_buildings_geojson"],
+        snakemake.input["buildings_geojson"],
        snakemake.output["clusters"],
        snakemake.config["buildings"]["n_clusters"],
+        crs,
+        house_area_limit,
    )

    get_central_points_geojson_with_buildings(
-        snakemake.output["cleaned_buildings_geojson"],
+        snakemake.input["buildings_geojson"],
        snakemake.output["clusters_with_buildings"],
        snakemake.config["buildings"]["n_clusters"],
+        crs,
+        house_area_limit,
+    )
+
+    get_number_type_buildings(
+        snakemake.output["clusters_with_buildings"],
+        snakemake.output["number_buildings_type"],


In the functions I see a lot of input/output. Ideally snakemake.input["buildings_geojson"] can be loaded outside and passed to the function so avoiding reading it multiple times.
It saves time.
Same applies for the others

In case it is okay to compact the functions and keep the three outputs, the buildings_geojson would only be called once, can I leave it in instead of putting it out?

davide-f

Great @Margherita-Capitani :D

This PR WORKS that is amazing! and contains quite a large number of well needed features.
Improvements are indeed possible, but I'd be prone to merge as is and improve later.

Fantastic job!

davide-f · 2024-07-04T10:54:40Z

scripts/cluster_buildings.py

+    centroids_building = [
+        (row.geometry.centroid.x, row.geometry.centroid.y)
+        for row in microgrid_buildings.itertuples()
+    ]


davide-f · 2024-07-04T10:55:56Z

scripts/cluster_buildings.py

-
-    # Create GeoJSON feature for each central point
-    features = []
+    central_features = []
    for i, central_point in enumerate(central_points):


Also this one can be improved, but I'd be prone to keep like this and improve later.
This PR contains alread a lot of well needed features and it works :)

Margherita-Capitani added 4 commits June 28, 2024 17:12

First Working Update for new Demand Model

73f5ec2

Update config test

076efbf

Fix ci-linux

c7af5eb

Revise CI

40cde15

davide-f marked this pull request as ready for review June 28, 2024 18:09

davide-f reviewed Jun 29, 2024

View reviewed changes

Margherita-Capitani added 3 commits July 3, 2024 16:07

Fix after first review

7d53942

Config.distribution test fix

b3d3118

fix

6f2d973

davide-f approved these changes Jul 4, 2024

View reviewed changes

davide-f merged commit 64f873c into pypsa-meets-earth:main Jul 4, 2024
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update for new Demand Model #45

Update for new Demand Model #45

Margherita-Capitani commented Jun 28, 2024

davide-f commented Jun 29, 2024

davide-f left a comment

davide-f Jun 29, 2024

davide-f Jun 29, 2024

davide-f Jun 29, 2024

Margherita-Capitani Jul 3, 2024

davide-f Jun 29, 2024

Margherita-Capitani Jul 3, 2024

davide-f Jun 29, 2024

davide-f Jun 29, 2024

Margherita-Capitani Jun 30, 2024

davide-f Jun 29, 2024

Margherita-Capitani Jun 30, 2024

Margherita-Capitani Jul 2, 2024

davide-f Jun 29, 2024

Margherita-Capitani Jul 2, 2024

davide-f Jun 29, 2024

Margherita-Capitani Jul 3, 2024

davide-f Jun 29, 2024

Margherita-Capitani Jul 3, 2024

Margherita-Capitani Jul 3, 2024

davide-f left a comment

davide-f Jul 4, 2024

davide-f Jul 4, 2024

		electric_load_1="resources/demand/microgrid_load_1.csv",
		electric_load_2="resources/demand/microgrid_load_2.csv",

-    for i, row in enumerate(cleaned_buildings.itertuples()):
-        if row.geometry.type == "Polygon":
-            cluster_id = kmeans.labels_[i]
-            features.append(
-                {
-                    "properties": {
-                        "id": row.Index,
-                        "cluster_id": cluster_id,
-                        "tags_building": row.tags_building,
-                        "area_m2": row.area_m2,
-                    },
-                    "geometry": row.geometry,
-                }
-            )
+def get_central_points_geojson_with_buildings(
+    input_filepath, output_filepath_centroids, n_clusters, crs, house_area_limit,output_filepath_buildings
+):
+    microgrid_buildings = buildings_classification(input_filepath, crs, house_area_limit)
+    centroids_building = [
+        (row.geometry.centroid.x, row.geometry.centroid.y)
+        for row in microgrid_buildings.itertuples()
+    ]
+    centroids_building = np.array(centroids_building)
+    kmeans = KMeans(n_clusters=n_clusters, random_state=0).fit(centroids_building)
+    centroids = kmeans.cluster_centers_
+    central_points = []
+    for i in range(kmeans.n_clusters):
+        cluster_points = centroids_building[kmeans.labels_ == i]
+        distances = np.linalg.norm(cluster_points - centroids[i], axis=1)
+        central_point_idx = np.argmin(distances)
+        central_points.append(cluster_points[central_point_idx])
+    central_features = []
+    for i, central_point in enumerate(central_points):
+        central_features.append(
+            {
+                "geometry": Point(central_point),
+                "cluster": i,
+            }
+        )
+    central_features = gpd.GeoDataFrame(central_features,crs=microgrid_buildings.crs)
+    central_features.to_file(output_filepath_centroids)
+    clusters = []
+    for i, row in enumerate(microgrid_buildings.itertuples()):
+        cluster_id = kmeans.labels_[i]
+        clusters.append(cluster_id)
+    microgrid_buildings_gdf = gpd.GeoDataFrame(microgrid_buildings, crs=microgrid_buildings.crs)
+    microgrid_buildings_gdf.to_file(output_filepath_buildings)

Update for new Demand Model #45

Update for new Demand Model #45

Conversation

Margherita-Capitani commented Jun 28, 2024

davide-f commented Jun 29, 2024

davide-f left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

davide-f left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment