Update for new Demand Model #45
Conversation
The CI is not triggered for some weird reason.
Great work @Margherita-Capitani! :DDD
Added some comments; happy to discuss if you need support.
.gitignore
Outdated
```diff
@@ -20,7 +20,7 @@ img/
 .snakemake/
 benchmarks/
 cutouts/
-data/
+# data/
```
Shall this be restored by having the exceptions "!data..."?
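For context, a re-include along those lines could look like the sketch below (the file names are hypothetical; note that gitignore cannot re-include files inside a directory that is itself excluded, so the wildcard form `data/*` is needed rather than `data/`):

```gitignore
# ignore everything inside data/ ...
data/*
# ... except explicitly whitelisted entries (illustrative names)
!data/.gitkeep
```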
Snakefile
Outdated
```diff
@@ -47,7 +47,7 @@ run = config.get("run", {})
 RDIR = run["name"] + "/" if run.get("name") else ""
 countries = config["countries"]

-ATLITE_NPROCESSES = config["atlite"].get("nprocesses", 5)
+ATLITE_NPROCESSES = config["atlite"].get("nprocesses", 1)
```
Please restore
Snakefile
Outdated
```python
electric_load_1="resources/demand/microgrid_load_1.csv",
electric_load_2="resources/demand/microgrid_load_2.csv",
```
What are these?
These are the three calculated load outputs:
- With Denise's rule readjusted
- With the RAMP rule without considering the standard deviation
- With the RAMP rule considering the standard deviation as sqrt(number_person)*random(std)

I would like to keep only one, but introduce the possibility of selecting the demand modelling type through a config parameter.
The proposal could be to make this change in the rule:

```python
if build_demand_model == 0:
    calculate_load(
        n,
        snakemake.config["load"]["scaling_factor"],
        worldpop_path,
        snakemake.input["microgrid_shapes"],
        sample_profile,
        snakemake.input["clusters_with_buildings"],
        snakemake.output["electric_load"],
        snakemake.input["building_csv"],
    )
elif build_demand_model == 1:
    calculate_load_ramp(
        snakemake.input["clusters_with_buildings"],
        n,
        snakemake.config["load"]["scaling_factor"],
        worldpop_path,
        snakemake.input["microgrid_shapes"],
        sample_profile,
        snakemake.output["electric_load"],
        snakemake.input["profile_Tier1"],
        snakemake.input["profile_Tier2"],
        snakemake.input["profile_Tier3"],
        snakemake.input["profile_Tier4"],
        snakemake.input["profile_Tier5"],
        tier_percent,
    )
elif build_demand_model == 2:
    calculate_load_ramp_std(
        snakemake.input["clusters_with_buildings"],
        n,
        snakemake.config["load"]["scaling_factor"],
        worldpop_path,
        snakemake.input["microgrid_shapes"],
        sample_profile,
        snakemake.output["electric_load"],
        snakemake.input["profile_Tier1"],
        snakemake.input["profile_Tier2"],
        snakemake.input["profile_Tier3"],
        snakemake.input["profile_Tier4"],
        snakemake.input["profile_Tier5"],
        tier_percent,
    )
```
where build_demand_model is a config parameter to select the demand modelling type.
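As an alternative to the if/elif chain, the config value could select the function from a mapping. A minimal sketch, using hypothetical stand-in functions in place of the real `calculate_load*` implementations:

```python
# Illustrative stand-ins for the three demand-modelling functions;
# the real ones take the snakemake inputs/outputs shown above.
def calculate_load(*args):
    return "plain"

def calculate_load_ramp(*args):
    return "ramp"

def calculate_load_ramp_std(*args):
    return "ramp_std"

# Map each value of the proposed build_demand_model config key
# to the corresponding modelling function.
DEMAND_MODELS = {
    0: calculate_load,
    1: calculate_load_ramp,
    2: calculate_load_ramp_std,
}

def build_demand(build_demand_model, *args):
    try:
        model = DEMAND_MODELS[build_demand_model]
    except KeyError:
        raise ValueError(f"Unknown build_demand_model: {build_demand_model}")
    return model(*args)
```

This keeps the rule body to a single call and makes unknown config values fail loudly instead of silently falling through the elif chain.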
Snakefile
Outdated
```python
profile_tier1="resources/ramp/daily_type_demand_Tier1.xlsx",
profile_tier2="resources/ramp/daily_type_demand_Tier2.xlsx",
profile_tier3="resources/ramp/daily_type_demand_Tier3.xlsx",
profile_tier4="resources/ramp/daily_type_demand_Tier4.xlsx",
profile_tier5="resources/ramp/daily_type_demand_Tier5.xlsx",
```
These may not be needed as they are loaded using the first **{...} block. Is it possible to revise them accordingly?
Sure, you're right!
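If the five tier inputs are kept, they can also be generated from one expression and unpacked into the rule's input section with `**` — a sketch (the exact key names used in the Snakefile may differ):

```python
# Build the five tier-profile paths programmatically instead of
# listing them one by one; unpack with **tier_inputs in the rule.
tier_inputs = {
    f"profile_tier{i}": f"resources/ramp/daily_type_demand_Tier{i}.xlsx"
    for i in range(1, 6)
}
```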
config.distribution.yaml
Outdated
```yaml
ramp:
  days: 365

tier:
  tier_percent: [0.3, 0.2, 0.2, 0.1, 0.15, 0.05]

house_area_limit:
  area_limit: 255
```
It would be good to add a few comments here to explain the reasoning behind these parameters.
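For instance, something like the fragment below; the explanations in the comments are assumptions about the parameters' intent and should be checked by the author:

```yaml
ramp:
  days: 365            # number of daily profiles simulated by RAMP (one year)

tier:
  # assumed share of households in each demand tier; should sum to 1
  tier_percent: [0.3, 0.2, 0.2, 0.1, 0.15, 0.05]

house_area_limit:
  area_limit: 255      # footprint (m2) above which a building is not classed as a house
```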
scripts/cluster_buildings.py
Outdated
```python
            "cluster": i,
        }
    )
central_features = gpd.GeoDataFrame(central_features)
```
Specify the crs of the other dataframe to avoid crs issues; you can do:

```python
gpd.GeoDataFrame(central_features, crs=microgrid_buildings.crs)
```
Perfect!
scripts/cluster_buildings.py
Outdated
```python
centroids_building = [
    (row.geometry.centroid.x, row.geometry.centroid.y)
    for row in cleaned_buildings.itertuples()
]
```
It seems this is repeated. Could you explain why?
What is the difference between the previous function and this one?
Same comment on cleaned_buildings.geometry.centroid applies
I had kept the previous structure with two separate functions, and in order to split the buildings into the various clusters it was necessary to recalculate the position of the centroids. What if I compact these two functions into one function with the two separate outputs?
See comment below for proposed function
scripts/cluster_buildings.py
Outdated
```python
for i, row in enumerate(cleaned_buildings.itertuples()):
    if row.geometry.type == "Polygon":
        cluster_id = kmeans.labels_[i]
        features.append(
            {
                "properties": {
                    "id": row.Index,
                    "cluster_id": cluster_id,
                    "tags_building": row.tags_building,
                    "area_m2": row.area_m2,
                },
                "geometry": row.geometry,
            }
        )
```
As above, this can be revised with something similar to the line below, but adapted:

```python
microgrid_buildings = microgrid_buildings.loc[microgrid_buildings.geometry.type != "Point"]
```

However, I'm unsure we need this part.
This part was to create a new geodataframe where the buildings are associated with their own cluster, but it can certainly be written better and compacted. What do you think if I made a single function with the two outputs compacted in this way?
```python
def get_central_points_geojson_with_buildings(
    input_filepath,
    output_filepath_centroids,
    n_clusters,
    crs,
    house_area_limit,
    output_filepath_buildings,
):
    microgrid_buildings = buildings_classification(input_filepath, crs, house_area_limit)
    centroids_building = [
        (row.geometry.centroid.x, row.geometry.centroid.y)
        for row in microgrid_buildings.itertuples()
    ]
    centroids_building = np.array(centroids_building)
    kmeans = KMeans(n_clusters=n_clusters, random_state=0).fit(centroids_building)
    centroids = kmeans.cluster_centers_
    # For each cluster, pick the building centroid closest to the cluster center
    central_points = []
    for i in range(kmeans.n_clusters):
        cluster_points = centroids_building[kmeans.labels_ == i]
        distances = np.linalg.norm(cluster_points - centroids[i], axis=1)
        central_point_idx = np.argmin(distances)
        central_points.append(cluster_points[central_point_idx])
    central_features = []
    for i, central_point in enumerate(central_points):
        central_features.append(
            {
                "geometry": Point(central_point),
                "cluster": i,
            }
        )
    central_features = gpd.GeoDataFrame(central_features, crs=microgrid_buildings.crs)
    central_features.to_file(output_filepath_centroids)
    # Attach each building's cluster label before writing the second output
    microgrid_buildings["cluster_id"] = kmeans.labels_
    microgrid_buildings_gdf = gpd.GeoDataFrame(microgrid_buildings, crs=microgrid_buildings.crs)
    microgrid_buildings_gdf.to_file(output_filepath_buildings)
```
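The nearest-to-centroid selection inside that function can be exercised in isolation. A minimal sketch on synthetic 2-D points, standing in for building centroids and KMeans cluster centers:

```python
import numpy as np

# Synthetic building centroids and their (precomputed) cluster labels
points = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.2, 5.1]])
labels = np.array([0, 0, 1, 1])
# Stand-ins for kmeans.cluster_centers_
centroids = np.array([[0.04, 0.0], [5.05, 5.0]])

central_points = []
for i in range(len(centroids)):
    cluster_points = points[labels == i]
    # Euclidean distance of each cluster member to its centroid
    distances = np.linalg.norm(cluster_points - centroids[i], axis=1)
    central_points.append(cluster_points[np.argmin(distances)])
```

Each entry of `central_points` is a real input point (not the synthetic centroid), which is what makes the selected "central" location correspond to an actual building.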
scripts/cluster_buildings.py
Outdated
```python
buildings_geodataframe = gpd.read_file(input_filepath)

grouped_buildings = buildings_geodataframe.groupby("cluster_id")
clusters = np.sort(buildings_geodataframe["cluster_id"].unique())
counts = []
for cluster in clusters:
    cluster_buildings = pd.DataFrame(grouped_buildings.get_group(cluster))
    building_tag = cluster_buildings["tags_building"]
    building_tag = pd.Series(building_tag)
    count = building_tag.value_counts()
    counts.append(count)
counts = pd.DataFrame(counts).fillna(0).astype(int)
counts["cluster"] = clusters
counts.set_index("cluster", inplace=True)
counts.to_csv(output_filepath)
```
Sorry, I don't get exactly the goal of this. What about

```python
buildings_geodataframe.cluster_id.value_counts()
```

? If you also wish to have the breakdown by tags_building, maybe:

```python
buildings_geodataframe.groupby("cluster_id").tags_building.value_counts()
```
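A quick check of the suggested one-liners on toy data, with an `unstack` added to recover the cluster-by-tag table shape the original loop was building:

```python
import pandas as pd

# Toy stand-in for the buildings GeoDataFrame's attribute columns
df = pd.DataFrame({
    "cluster_id": [0, 0, 0, 1, 1],
    "tags_building": ["house", "house", "school", "house", "shop"],
})

# Buildings per cluster
per_cluster = df.cluster_id.value_counts()

# Buildings per cluster and tag, reshaped to a table with 0 where a tag
# is absent in a cluster (same information as the loop-built counts table)
table = df.groupby("cluster_id").tags_building.value_counts().unstack(fill_value=0)
```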
Absolutely better! I had done it this way because I was not very familiar with value_counts() and wanted to create a table to use in build_demand.
I made a few changes to the build_demand rule to make it work with this proposal of yours, which is definitely better and lighter, thanks!
I would say I can compact everything into the function above, adding under your proposal also the row:

```python
building_class = buildings_geodataframe.groupby("cluster_id").tags_building.value_counts()
```

and adding this last variable to the outputs.
scripts/cluster_buildings.py
Outdated
```python
get_central_points_geojson(
    snakemake.output["cleaned_buildings_geojson"],
    snakemake.input["buildings_geojson"],
    snakemake.output["clusters"],
    snakemake.config["buildings"]["n_clusters"],
    crs,
    house_area_limit,
)

get_central_points_geojson_with_buildings(
    snakemake.output["cleaned_buildings_geojson"],
    snakemake.input["buildings_geojson"],
    snakemake.output["clusters_with_buildings"],
    snakemake.config["buildings"]["n_clusters"],
    crs,
    house_area_limit,
)

get_number_type_buildings(
    snakemake.output["clusters_with_buildings"],
    snakemake.output["number_buildings_type"],
)
```
In the functions I see a lot of input/output. Ideally snakemake.input["buildings_geojson"] can be loaded outside and passed to the functions, avoiding reading it multiple times. It saves time.
The same applies to the others.
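The pattern being suggested, sketched with hypothetical stand-in functions (the real script would pass the object returned by `gpd.read_file(...)` instead of a list):

```python
# Read the data once, then pass the in-memory object to every consumer
# instead of having each function re-read the file from disk.
def classify_buildings(buildings):   # illustrative stand-in
    return len(buildings)

def count_building_types(buildings):  # illustrative stand-in
    return sorted(set(buildings))

buildings = ["house", "school", "house"]  # stand-in for gpd.read_file(path)
n_buildings = classify_buildings(buildings)
building_types = count_building_types(buildings)
```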
Perfect!
If it is okay to compact the functions and keep the three outputs, buildings_geojson would only be read once; can I leave it inside the function instead of moving it out?
Great @Margherita-Capitani :D
This PR WORKS, which is amazing! And it contains quite a large number of well-needed features.
Improvements are indeed possible, but I'd be prone to merge as is and improve later.
Fantastic job!
```python
centroids_building = [
    (row.geometry.centroid.x, row.geometry.centroid.y)
    for row in microgrid_buildings.itertuples()
]
```
Ok! :)
```python
# Create GeoJSON feature for each central point
features = []
central_features = []
for i, central_point in enumerate(central_points):
```
Also this one can be improved, but I'd be prone to keep it like this and improve later.
This PR already contains a lot of well-needed features, and it works :)
This Pull Request aims to update the methodology for estimating demand in the microgrid.
The main changes concern the following rules and the files related to them.
The main change is the introduction of a bottom-up model for demand estimation, based on building data downloaded from OSM and the RAMP demand tool.
The proposed update still needs to be improved, in particular to allow selection of the demand modelling method from the config file and to refine the techniques proposed in build_demand for demand estimation.
The work currently runs locally, but some output data still needs to be analyzed.