
Update for new Demand Model #45

Merged

Conversation

Margherita-Capitani
Contributor

This Pull Request aims to update the methodology for estimating demand in the microgrid.

The main changes concern the following rules:

  • ramp build demand profile
  • build demand
  • cluster building

and the files related to them.
The main change is the introduction of a bottom-up model for demand estimation, based on building data downloaded from OSM and the "RAMP" demand-simulation tool.
The proposed update still needs to be improved, in particular to allow selection of the demand modeling method from the config file and to refine the estimation techniques proposed in build_demand.

The workflow currently runs locally, but some output data still needs to be analyzed.

@davide-f marked this pull request as ready for review June 28, 2024 18:09
@davide-f
Member

The CI is not triggered for some weird reason.
With the next commit it should be triggered again. I'll add a few comments in the meantime.

Member

@davide-f left a comment

Great work @Margherita-Capitani! :DDD
Added some comments; happy to discuss if you need support.

.gitignore Outdated
@@ -20,7 +20,7 @@ img/
.snakemake/
benchmarks/
cutouts/
data/
# data/
Member

Shall this be restored by adding the exceptions "!data..."?

Snakefile Outdated
@@ -47,7 +47,7 @@ run = config.get("run", {})
RDIR = run["name"] + "/" if run.get("name") else ""
countries = config["countries"]

ATLITE_NPROCESSES = config["atlite"].get("nprocesses", 5)
ATLITE_NPROCESSES = config["atlite"].get("nprocesses", 1)
Member

Please restore

Snakefile Outdated
Comment on lines 124 to 125
electric_load_1="resources/demand/microgrid_load_1.csv",
electric_load_2="resources/demand/microgrid_load_2.csv",
Member

What are these?

Contributor Author

These are the three calculated load outputs:

  • With Denise's rule, readjusted
  • With the RAMP-based rule, without considering the standard deviation
  • With the RAMP-based rule, considering the standard deviation as sqrt(number_person)*random(std)

I would like to keep only one, but introduce the possibility to select the type of demand modeling from a config parameter.
The proposal could be to make this change in the rule:

if build_demand_model == 0:
    calculate_load(
        n,
        snakemake.config["load"]["scaling_factor"],
        worldpop_path,
        snakemake.input["microgrid_shapes"],
        sample_profile,
        snakemake.input["clusters_with_buildings"],
        snakemake.output["electric_load"],
        snakemake.input["building_csv"],
    )

elif build_demand_model == 1:
    calculate_load_ramp(
        snakemake.input["clusters_with_buildings"],
        n,
        snakemake.config["load"]["scaling_factor"],
        worldpop_path,
        snakemake.input["microgrid_shapes"],
        sample_profile,
        snakemake.output["electric_load"],
        snakemake.input["profile_Tier1"],
        snakemake.input["profile_Tier2"],
        snakemake.input["profile_Tier3"],
        snakemake.input["profile_Tier4"],
        snakemake.input["profile_Tier5"],
        snakemake.output["electric_load"],
        tier_percent,
    )
elif build_demand_model == 2:

    calculate_load_ramp_std(
        snakemake.input["clusters_with_buildings"],
        n,
        snakemake.config["load"]["scaling_factor"],
        worldpop_path,
        snakemake.input["microgrid_shapes"],
        sample_profile,
        snakemake.output["electric_load"],
        snakemake.input["profile_Tier1"],
        snakemake.input["profile_Tier2"],
        snakemake.input["profile_Tier3"],
        snakemake.input["profile_Tier4"],
        snakemake.input["profile_Tier5"],
        snakemake.output["electric_load"],
        tier_percent,
    )

Where build_demand_model is a config parameter that selects the demand modeling type.
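A minimal sketch of how this selector could be read inside the rule script, assuming the parameter sits at the top level of the config with a default of 0 (both the location and the default are assumptions, not part of this PR):

# Hypothetical: read the demand-model selector from the config
# ("build_demand_model" at the top level of the config is an assumption)
build_demand_model = snakemake.config.get("build_demand_model", 0)

if build_demand_model not in (0, 1, 2):
    raise ValueError(f"Unknown build_demand_model: {build_demand_model}")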

Snakefile Outdated
Comment on lines 117 to 121
profile_tier1="resources/ramp/daily_type_demand_Tier1.xlsx",
profile_tier2="resources/ramp/daily_type_demand_Tier2.xlsx",
profile_tier3="resources/ramp/daily_type_demand_Tier3.xlsx",
profile_tier4="resources/ramp/daily_type_demand_Tier4.xlsx",
profile_tier5="resources/ramp/daily_type_demand_Tier5.xlsx",
Member

These may not be needed as they are loaded using the first **{...} block. Is it possible to revise them accordingly?
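For reference, a hypothetical fragment showing the unpacking pattern this refers to: the five tier profiles can be declared once via a dict comprehension expanded with **, which would make the explicit per-tier entries redundant (the rule name and paths follow the ones already used in this PR; this is a sketch, not the actual Snakefile):

rule build_demand:
    input:
        **{
            f"profile_Tier{i}": f"resources/ramp/daily_type_demand_Tier{i}.xlsx"
            for i in range(1, 6)
        },
        # ... other inputs unchanged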

Contributor Author

Sure, you're right!

Comment on lines 44 to 51
ramp:
  days: 365

tier:
  tier_percent: [0.3, 0.2, 0.2, 0.1, 0.15, 0.05]

house_area_limit:
  area_limit: 255
Member

It would be good to add a few comments here to explain the reasoning behind these parameters.

"cluster": i,
}
)
central_features = gpd.GeoDataFrame(central_features)
Member

Specify the CRS of the other dataframe to avoid CRS issues; you can do:
gpd.GeoDataFrame(central_features, crs=microgrid_buildings.crs)

Contributor Author

Perfect!

Comment on lines 93 to 96
centroids_building = [
    (row.geometry.centroid.x, row.geometry.centroid.y)
    for row in cleaned_buildings.itertuples()
]
Member

It seems this is repeated. Could you explain why?
What is the difference between the previous function and this one?
Same comment on cleaned_buildings.geometry.centroid applies
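For illustration, a minimal sketch of the vectorized alternative hinted at here, assuming cleaned_buildings is a GeoDataFrame in a projected CRS (names taken from the snippet above):

import numpy as np

# Build the (x, y) centroid array without iterating row by row
centroids_building = np.column_stack(
    (cleaned_buildings.geometry.centroid.x, cleaned_buildings.geometry.centroid.y)
)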

Contributor Author

I had kept the previous structure with two separate functions, and in order to split the buildings into the various clusters it was necessary to recalculate the positions of the centroids. I was thinking: what if I compact these two functions into one function with the two separate outputs?

Contributor Author

See the comment below for the proposed function.

Comment on lines 109 to 122
for i, row in enumerate(cleaned_buildings.itertuples()):
    if row.geometry.type == "Polygon":
        cluster_id = kmeans.labels_[i]
        features.append(
            {
                "properties": {
                    "id": row.Index,
                    "cluster_id": cluster_id,
                    "tags_building": row.tags_building,
                    "area_m2": row.area_m2,
                },
                "geometry": row.geometry,
            }
        )
Member

As above, this can be revised with something similar to the line below, adapted as needed:
microgrid_buildings = microgrid_buildings.loc[microgrid_buildings.geometry.type != "Point"]
However, I'm unsure we need this part.

Contributor Author

This part was to create a new geodataframe where the buildings are associated with their own cluster, but it can certainly be written better and compacted. What do you think if I made a single function with the two outputs, compacted in this way?

Suggested change
def get_central_points_geojson_with_buildings(
    input_filepath,
    output_filepath_centroids,
    n_clusters,
    crs,
    house_area_limit,
    output_filepath_buildings,
):
    microgrid_buildings = buildings_classification(input_filepath, crs, house_area_limit)
    centroids_building = [
        (row.geometry.centroid.x, row.geometry.centroid.y)
        for row in microgrid_buildings.itertuples()
    ]
    centroids_building = np.array(centroids_building)
    kmeans = KMeans(n_clusters=n_clusters, random_state=0).fit(centroids_building)
    centroids = kmeans.cluster_centers_
    central_points = []
    for i in range(kmeans.n_clusters):
        cluster_points = centroids_building[kmeans.labels_ == i]
        distances = np.linalg.norm(cluster_points - centroids[i], axis=1)
        central_point_idx = np.argmin(distances)
        central_points.append(cluster_points[central_point_idx])
    central_features = []
    for i, central_point in enumerate(central_points):
        central_features.append(
            {
                "geometry": Point(central_point),
                "cluster": i,
            }
        )
    central_features = gpd.GeoDataFrame(central_features, crs=microgrid_buildings.crs)
    central_features.to_file(output_filepath_centroids)
    clusters = []
    for i, row in enumerate(microgrid_buildings.itertuples()):
        cluster_id = kmeans.labels_[i]
        clusters.append(cluster_id)
    # attach the cluster label to each building before writing it out
    microgrid_buildings["cluster_id"] = clusters
    microgrid_buildings_gdf = gpd.GeoDataFrame(microgrid_buildings, crs=microgrid_buildings.crs)
    microgrid_buildings_gdf.to_file(output_filepath_buildings)

Comment on lines 128 to 142
buildings_geodataframe = gpd.read_file(input_filepath)

grouped_buildings = buildings_geodataframe.groupby("cluster_id")
clusters = np.sort(buildings_geodataframe["cluster_id"].unique())
counts = []
for cluster in clusters:
    cluster_buildings = pd.DataFrame(grouped_buildings.get_group(cluster))
    building_tag = cluster_buildings["tags_building"]
    building_tag = pd.Series(building_tag)
    count = building_tag.value_counts()
    counts.append(count)
counts = pd.DataFrame(counts).fillna(0).astype(int)
counts["cluster"] = clusters
counts.set_index("cluster", inplace=True)
counts.to_csv(output_filepath)
Member

Sorry, I don't exactly get the goal of this.
What about
buildings_geodataframe.cluster_id.value_counts() ?
If you wish to also have the breakdown by tags_building, maybe:
buildings_geodataframe.groupby("cluster_id").tags_building.value_counts()

Contributor Author

Absolutely better. I had done it this way because I was not very familiar with value_counts() and wanted to try to create a table to use in build_demand.
I made a few changes to the build_demand rule to make it work with this proposal of yours, which is definitely better and lighter, thanks!

I would say I can compact everything into the function above, adding under your proposal also the row:

building_class = buildings_geodataframe.groupby("cluster_id").tags_building.value_counts()

and also adding this last variable to the outputs.

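For reference, a hedged sketch of how that grouped value_counts could be reshaped into the same cluster-by-tag table the original loop produced (the unstack step and the write-out are assumptions added for illustration):

building_class = (
    buildings_geodataframe.groupby("cluster_id")
    .tags_building.value_counts()
    .unstack(fill_value=0)
)
building_class.to_csv(output_filepath)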
Comment on lines 158 to 176
get_central_points_geojson(
    snakemake.output["cleaned_buildings_geojson"],
    snakemake.input["buildings_geojson"],
    snakemake.output["clusters"],
    snakemake.config["buildings"]["n_clusters"],
    crs,
    house_area_limit,
)

get_central_points_geojson_with_buildings(
    snakemake.output["cleaned_buildings_geojson"],
    snakemake.input["buildings_geojson"],
    snakemake.output["clusters_with_buildings"],
    snakemake.config["buildings"]["n_clusters"],
    crs,
    house_area_limit,
)

get_number_type_buildings(
    snakemake.output["clusters_with_buildings"],
    snakemake.output["number_buildings_type"],
Member

In the functions I see a lot of input/output. Ideally, snakemake.input["buildings_geojson"] can be loaded outside and passed to the functions, so it is not read multiple times.
It saves time.
The same applies to the others.
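A minimal sketch of that refactor, assuming the compacted function is adjusted to accept a GeoDataFrame instead of a file path (the adjusted signature is an assumption):

import geopandas as gpd

# Read the buildings file once...
buildings_gdf = gpd.read_file(snakemake.input["buildings_geojson"])

# ...and pass the GeoDataFrame to the function instead of the path,
# so the file is not re-read inside every call.
get_central_points_geojson_with_buildings(
    buildings_gdf,
    snakemake.output["clusters"],
    snakemake.config["buildings"]["n_clusters"],
    crs,
    house_area_limit,
    snakemake.output["clusters_with_buildings"],
)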

Contributor Author

Perfect!

Contributor Author

If it is okay to compact the functions and keep the three outputs, the buildings_geojson would only be read once; can I leave it inside the function instead of moving it out?

Member

@davide-f left a comment

Great @Margherita-Capitani :D

This PR WORKS, which is amazing, and it contains quite a large number of much-needed features.
Improvements are indeed possible, but I'd be prone to merge as is and improve later.

Fantastic job!

Comment on lines +59 to +62
centroids_building = [
    (row.geometry.centroid.x, row.geometry.centroid.y)
    for row in microgrid_buildings.itertuples()
]
Member

Ok! :)


# Create GeoJSON feature for each central point
features = []
central_features = []
for i, central_point in enumerate(central_points):
Member

Also this one can be improved, but I'd be prone to keep it like this and improve later.
This PR already contains a lot of much-needed features, and it works :)

@davide-f merged commit 64f873c into pypsa-meets-earth:main Jul 4, 2024
2 checks passed