Consistent data repositories across HPCs #36

Open
kasra-keshavarz opened this issue Feb 8, 2024 · 1 comment
@kasra-keshavarz
Collaborator

The data repository needs to become consistent so that it can be shared across the various HPCs. The data repository itself is not included in this project; however, the path names used here need to be updated soon.

Some recommendations:

  1. lower case directory names,
  2. download workflows,
  3. TBD
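
As a rough illustration of the first recommendation, the sketch below renames every directory under a given root to lower case. It is only a sketch under assumptions: `to_lower` and `lowercase_dirs` are hypothetical names, nothing here is part of datatool, and it assumes bash with a `find` that supports `-print0`. A rename is skipped when the lower-case target already exists.

```bash
#!/bin/bash
# Hypothetical helpers (not part of datatool): lower-case directory names.

# Print the lower-case form of a name.
function to_lower() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}

# Rename every directory below the given root to lower case.
# Directories are visited deepest first (-depth), so children are
# renamed before their parents and paths stay valid during the walk.
function lowercase_dirs() {
  local root="$1"
  local dir base parent target
  find "${root}" -mindepth 1 -depth -type d -print0 |
    while IFS= read -r -d '' dir; do
      base="$(basename "${dir}")"
      parent="$(dirname "${dir}")"
      target="${parent}/$(to_lower "${base}")"
      if [[ "${dir}" != "${target}" ]]; then
        if [[ -e "${target}" ]]; then
          # Do not overwrite an existing lower-case path.
          echo "skip: ${target} already exists" >&2
        else
          mv -- "${dir}" "${target}"
        fi
      fi
    done
}
```

Calling `lowercase_dirs /path/to/repository` would apply the renaming in place; the root directory itself is left untouched, and trying it on a copy first is advisable.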
@kasra-keshavarz kasra-keshavarz added documentation Improvements or additions to documentation enhancement New feature or request labels Feb 8, 2024
@kasra-keshavarz kasra-keshavarz self-assigned this Feb 8, 2024
@kasra-keshavarz kasra-keshavarz changed the title Consistent data repository Consistent data repositories across HPCs Feb 23, 2024
kasra-keshavarz added a commit that referenced this issue Mar 5, 2024
In this commit, the following are addressed:
 * Correcting paths for the local scripts,
 * Renaming scripts to reflect the owner of the script for further
   clarification,
 * Adding parallelization schemes based on model, ensemble, and scenario,
 * Adding gcc/9.3.0 as the reference clib for the modules loaded to
   prevent mismatch between various environments defined on the HPCs,
 * Assuring EPSG:4326 is considered for the input shape file if there is
   no CRS defined,
 * Getting rid of \t characters in the help messages,
 * Correcting short help message to be more informative,
 * Adding function declarations to follow Google’s shell scripting
   guidelines,
 * Assuring --account=STR is described in the help message.

Signed-off-by: Kasra Keshavarz <[email protected]>
@kasra-keshavarz kasra-keshavarz mentioned this issue Mar 5, 2024
kasra-keshavarz added a commit that referenced this issue Mar 6, 2024
* Ouranos ESPO-G6-R2 script + new capabilities

This script introduces new features to the tool, notably the capability
to process climate datasets that consist of multiple models, submodels
(those with specific configuration sets), ensemble members, and
scenarios (SSPs). The parent calling script is in charge of the
parallelization scheme, if needed.

With this script, a few issues related to the current deficiencies of
datatool could be resolved simultaneously.

Signed-off-by: Kasra Keshavarz <[email protected]>

* Fixing short usage and comments

* Adding new parallelization schemes

Multiple parallelization schemes are added, so the package not only
submits array jobs based on the given date range and the chunk schemes,
but also submits jobs based on various models, ensemble members, and
scenarios. These new parallelization schemes mostly apply to climate
datasets, but not exclusively.
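
A minimal sketch of how a calling script might enumerate the model,
ensemble, and scenario combinations and map a job index onto one of
them. This is not datatool's actual implementation: the functions
`list_combinations` and `pick_combination`, the comma-separated input
format, and the sample values in the usage note are all hypothetical.

```bash
#!/bin/bash
# Hypothetical sketch (not datatool's code): enumerate combinations.

# Print one "model ensemble scenario" line per combination.
# Each argument is a comma-separated list.
function list_combinations() {
  local -a models ensembles scenarios
  local model ensemble scenario
  IFS=',' read -r -a models <<< "$1"
  IFS=',' read -r -a ensembles <<< "$2"
  IFS=',' read -r -a scenarios <<< "$3"
  for model in "${models[@]}"; do
    for ensemble in "${ensembles[@]}"; do
      for scenario in "${scenarios[@]}"; do
        printf '%s %s %s\n' "${model}" "${ensemble}" "${scenario}"
      done
    done
  done
}

# Print the combination at a zero-based index, e.g. an array task ID.
function pick_combination() {
  local index="$1"
  shift
  list_combinations "$@" | sed -n "$((index + 1))p"
}
```

For example, `pick_combination 3 'm1,m2' 'r1i1p1f1' 'ssp245,ssp585'`
prints `m2 r1i1p1f1 ssp585`. Under Slurm, the index could be taken from
`SLURM_ARRAY_TASK_ID`, with the array size set to the line count of
`list_combinations`.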

This commit aims to save time for the user and speed up the processing
of datasets.

This commit resolves issue #25 on remote GitHub hosting repository.
Furthermore, it adds the ESPO dataset to the list of datasets as well.

Moreover, a new option is implemented to show the list of currently
available datasets to the users.

Signed-off-by: Kasra Keshavarz <[email protected]>

* Separating dataset information from the main extract-dataset script

This is meant to clearly organize the information provided inside the
package. The new file lists all the available datasets and the keyword
that users can provide to the `--dataset` option. Previously, this
information was part of the main usage message (`--help`) of the main
script.

Signed-off-by: Kasra Keshavarz <[email protected]>

* Adding GDDP-NEX-CMIP6 info

* Fixing DOI value for ab-gov dataset

* Adding NASA GDDP-NEX-CMIP6 script address

* ESPO-G6-R2 data processing example

* Multiple minor modifications

1. the "function" keyword is added to make the style compatible with
   that of Google's recommendations,
2. required arguments and options are revised alongside the relevant
   comments,
3. typos are fixed

Signed-off-by: Kasra Keshavarz <[email protected]>

* AB Government Climate Dataset Script

The script deals with the Climate Dataset produced by the Alberta
Government. The dataset is not public yet, and is planned to be
available soon.

Signed-off-by: Kasra Keshavarz <[email protected]>

* Adding variable list for various elevation levels

Since some hydrological models can use near-surface level or 40m level
data, the necessary lists of variables for both levels are added.

Furthermore, a link to the official website for the dataset is added for
further clarity.

Signed-off-by: Kasra Keshavarz <[email protected]>

* Path to the dataset for rpp-kshook allocation is updated

Since multiple HPCs are now used for the workflows, it is important to
have consistent datasets synchronized regularly. Therefore, this commit
attempts to reflect these efforts by creating consistent paths for
various HPCs/allocations.

Signed-off-by: Kasra Keshavarz <[email protected]>

* Bumping version to v0.5.0

* Addressing issues #39, #37, #36, #35, #34, and #25

In this commit, the following are addressed:
 * Correcting paths for the local scripts,
 * Renaming scripts to reflect the owner of the script for further
   clarification,
 * Adding parallelization schemes based on model, ensemble, and scenario,
 * Adding gcc/9.3.0 as the reference clib for the modules loaded to
   prevent mismatch between various environments defined on the HPCs,
 * Assuring EPSG:4326 is considered for the input shape file if there is
   no CRS defined,
 * Getting rid of \t characters in the help messages,
 * Correcting short help message to be more informative,
 * Adding function declarations to follow Google’s shell scripting
   guidelines,
 * Assuring --account=STR is described in the help message.

Signed-off-by: Kasra Keshavarz <[email protected]>

* Assuring compatibility of the style with Google's shell scripting guidelines

* Organizing the assets directory

Various files within this directory are categorized to be more
informative for the users/devs.

Signed-off-by: Kasra Keshavarz <[email protected]>

* README file for ab-gov dataset

The README file for this dataset is added, offering necessary
information for the users.

Signed-off-by: Kasra Keshavarz <[email protected]>

* Minor structural changes

This commit assures all dataset scripts follow the convention of
<institute>-<dataset-name> under the `scripts` path.

Furthermore, necessary adjustments to the styles of the scripts have
been implemented, including:
  * adding `--model`, `--scenario`, and `--ensemble` options, if missing,
    for compatibility with the main caller script, as these options are
    given to the script by the `extract-dataset.sh` script,
  * assuring the scripting style follows Google's shell scripting
    guidelines,
  * the paths to the externally called scripts are properly adjusted,
    after modifications to the structure of datatool's `assets`
    directory, and
  * minor changes to the source code to assure compatibility with
    v0.5.0 of datatool.
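
A small sketch of how the <institute>-<dataset-name> convention could be
checked for a script directory name. The function `is_valid_script_name`
and its pattern are hypothetical and not part of datatool; in
particular, requiring lower-case names is an assumption borrowed from
the lower-case directory recommendation in this issue.

```bash
#!/bin/bash
# Hypothetical check (not part of datatool) for the
# <institute>-<dataset-name> naming convention.

# Return 0 if the name has a lower-case institute part followed by at
# least one hyphen-separated dataset part; return 1 otherwise.
function is_valid_script_name() {
  local name="$1"
  # Assumed pattern: lower-case letters and digits, with dots and
  # underscores also allowed in the dataset part(s).
  local pattern='^[a-z0-9]+(-[a-z0-9._]+)+$'
  [[ "${name}" =~ ${pattern} ]]
}
```

With this pattern, `eccc-rdrs` and `ab-gov` pass, while `ECCC-RDRS` and
a bare `rdrs` are rejected.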

Signed-off-by: Kasra Keshavarz <[email protected]>

* Tracking LICENSE of eccc-rdrs

* Tracking eccc-rdrs script

* Tracking GWF-NCAR CONUS-I script

* Documentation for NASA's NEX-GDDP-CMIP6 dataset

This commit addresses issue #27 by describing NASA's NEX-GDDP-CMIP6
dataset and the relevant scripts for it. Furthermore, it provides the
necessary information to enable users to use `datatool` for extracting
subsets of the dataset for any temporal and spatial extents.

Signed-off-by: Kasra Keshavarz <[email protected]>

* Script for NASA's NEX-GDDP-CMIP6 dataset

This commit addresses issue #27 and provides a script to extract subsets
from NASA's NEX-GDDP-CMIP6 dataset. This script can work with
various models, scenarios, ensemble members, and variables offered by
this dataset.

Signed-off-by: Kasra Keshavarz <[email protected]>

* Adding Ouranos ESPO-G6-R2 Dataset Script

This commit addresses issue #34 and processes this dataset that contains
multiple GCM model outputs, including various sub-models, scenarios,
ensemble members, and variables.

Signed-off-by: Kasra Keshavarz <[email protected]>

* Documenting Ouranos ESPO-G6-R2 Dataset script

Necessary information to use `datatool` for this script is provided to
the user via the README.md file.

Signed-off-by: Kasra Keshavarz <[email protected]>

* Updating changelog for v0.5.0

* Adding a section for WIP directories

* Restructuring script directory

With the growing number of scripts, this commit tries to restructure
this directory to provide more clarity and organization for the users.

Signed-off-by: Kasra Keshavarz <[email protected]>

* Updates to the documentation

The help message has been revised to provide more information to the
users. This includes noting that values provided to `--lon-lims` must be
within the [-180, +180] limits. This had not been mentioned to the users
before and could have caused confusion, as there are multiple
conventions for describing longitudes.
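
For users whose longitudes follow the [0, 360] convention, a conversion
along the following lines could be applied before passing values to
`--lon-lims`. This is a sketch only: `normalize_lon` is a hypothetical
helper, not something datatool provides, and it assumes an `awk` is
available on the system.

```bash
#!/bin/bash
# Hypothetical helper (not part of datatool): map a longitude in
# degrees onto the [-180, +180] range.
function normalize_lon() {
  awk -v lon="$1" 'BEGIN {
    x = lon % 360                # remainder keeps the sign of lon
    if (x > 180)  x -= 360       # e.g. 270 becomes -90
    if (x < -180) x += 360       # e.g. -200 becomes 160
    print x
  }'
}
```

For instance, `normalize_lon 270` prints `-90` and `normalize_lon 190.5`
prints `-169.5`; values already within the limits are returned
unchanged.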

Furthermore, the list of datasets on the main page of the repository has
been updated to reflect the most up-to-date list.

Signed-off-by: Kasra Keshavarz <[email protected]>

* Upgrading style of warning message

* Upgrading style of warning message

* Updating link addresses for CONUS I & II

* Updating link address to ERA5 dataset

* Removing dead link for the Ouranos MRCC5 dataset for now

---------

Signed-off-by: Kasra Keshavarz <[email protected]>
@kasra-keshavarz
Collaborator Author

Partially resolved with #43
