add more documentation on met office dataset on hf #49

Merged
merged 1 commit into from
Jan 26, 2025
10 changes: 9 additions & 1 deletion getting_started.md
@@ -117,7 +117,15 @@ For contributors unfamiliar with these concepts, the [Machine Learning Terms](#m

Datasets form the backbone of solar forecasting by providing the historical and real-time data required for model training and evaluation. This project leverages a variety of datasets, including weather, solar generation, and climate data.

For a detailed list of datasets and their descriptions, please refer to the [Datasets Guide](datasets.md).
### Met Office UK Deterministic (UKV)
A numerical weather prediction (NWP) dataset used for UK solar forecasting. See [Met Office Dataset Documentation](met_office_dataset.md) for detailed information about:
- Variables and their impact on solar forecasting
- Dataset structure and format
- Data quality considerations
- Access instructions via Hugging Face

### Other Weather Datasets
For a complete list of available weather datasets and their descriptions, see the [Datasets Guide](datasets.md).

---

155 changes: 155 additions & 0 deletions met_office_dataset.md
@@ -0,0 +1,155 @@
# Met Office UK Deterministic (UKV) Dataset

## Overview
The Met Office UK Deterministic (UKV) dataset provides high-resolution weather forecasts for the UK region. This document details the dataset structure, variables, and their relevance to solar forecasting.

## Dataset Structure
The dataset uses a Lambert Azimuthal Equal Area projection centered on the UK with:
- Height: 970 pixels
- Width: 1042 pixels
- Grid Resolution: 2 km
- Temporal Resolution: 60 minutes
- Forecast Range: 54 hours
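
As a rough sanity check on the figures above, the following sketch derives the approximate grid extent and the number of hourly forecast steps. The helper names are illustrative, and whether the archive includes the t+0 step is an assumption, not something this document states.

```python
# Values taken from the Dataset Structure list above.
HEIGHT_PX = 970
WIDTH_PX = 1042
GRID_RES_KM = 2
FORECAST_RANGE_H = 54
STEP_MIN = 60


def grid_extent_km(height_px: int, width_px: int, res_km: int) -> tuple[int, int]:
    """Approximate (north-south, east-west) extent as pixel count times cell size."""
    return height_px * res_km, width_px * res_km


def n_forecast_steps(range_h: int, step_min: int, include_t0: bool = True) -> int:
    """Number of time steps in one forecast run.

    include_t0 is an assumption: set it to False if the run starts at t+1h.
    """
    steps = range_h * 60 // step_min
    return steps + 1 if include_t0 else steps


if __name__ == "__main__":
    print(grid_extent_km(HEIGHT_PX, WIDTH_PX, GRID_RES_KM))
    print(n_forecast_steps(FORECAST_RANGE_H, STEP_MIN))
```

The extent is only approximate because it ignores whether coordinates refer to cell centres or cell edges.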

## Variables

### Cloud Coverage Variables
These variables are critical for predicting solar irradiance attenuation:

- **high_type_cloud_area_fraction**
  - Description: Fraction of high-altitude clouds
  - Units: 1 (fraction)
  - Impact: High clouds typically have less impact on solar radiation than lower clouds
  - Typical Range: 0-1

- **medium_type_cloud_area_fraction**
  - Description: Fraction of medium-altitude clouds
  - Units: 1 (fraction)
  - Impact: Moderate impact on solar radiation
  - Typical Range: 0-1

- **low_type_cloud_area_fraction**
  - Description: Fraction of low-altitude clouds
  - Units: 1 (fraction)
  - Impact: Most significant impact on solar radiation
  - Typical Range: 0-1

- **cloud_area_fraction**
  - Description: Total cloud coverage
  - Units: 1 (fraction)
  - Impact: Overall indicator of solar radiation reduction
  - Typical Range: 0-1

### Radiation Flux Variables
Forecast radiation flux components at the surface (model output, not direct measurements):

- **surface_downwelling_shortwave_flux_in_air**
  - Description: Total downward solar radiation at surface
  - Units: W m⁻²
  - Impact: Primary predictor for solar PV generation
  - Typical Range: 0-1000+ W m⁻²
  - Notes: Includes both direct and diffuse radiation

- **surface_downwelling_longwave_flux_in_air**
  - Description: Thermal radiation from atmosphere
  - Units: W m⁻²
  - Impact: Affects panel temperature and efficiency
  - Typical Range: 200-500 W m⁻²

- **surface_downwelling_ultraviolet_flux_in_air**
  - Description: UV component of solar radiation
  - Units: W m⁻²
  - Impact: Can affect panel degradation and specific PV technologies
  - Typical Range: 0-100 W m⁻²

### Meteorological Variables
Environmental conditions affecting solar panel efficiency:

- **air_temperature**
  - Description: Air temperature at 2 m height
  - Units: K (Kelvin)
  - Impact: PV panel efficiency decreases as temperature rises
  - Typical Range: 250-320 K
  - Note: Convert to Celsius by subtracting 273.15

- **wind_speed**
  - Description: Wind speed near the surface (10 m height, per the `wind_speed_at_10m` entry in the project config)
  - Units: m s⁻¹
  - Impact: Affects panel cooling and efficiency
  - Typical Range: 0-30 m s⁻¹

- **wind_from_direction**
  - Description: Direction the wind blows from, near the surface
  - Units: degrees
  - Impact: Can influence panel temperature and local weather patterns
  - Range: 0-360°
  - Note: 0° is North, 90° is East

- **lwe_thickness_of_surface_snow_amount**
  - Description: Surface snow amount expressed as liquid water equivalent (lwe) depth
  - Units: m
  - Impact: Affects ground albedo and potential panel coverage
  - Typical Range: 0-1 m
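
The unit notes above can be illustrated with a small sketch. It converts temperature from Kelvin to Celsius and splits wind speed and direction into eastward (`u`) and northward (`v`) components, using the standard meteorological convention that `wind_from_direction` gives the direction the wind comes from. The function names are illustrative and not part of the dataset or the project code.

```python
import math


def kelvin_to_celsius(temp_k: float) -> float:
    """Convert air_temperature from Kelvin to degrees Celsius."""
    return temp_k - 273.15


def wind_to_uv(speed: float, from_direction_deg: float) -> tuple[float, float]:
    """Split wind speed and wind_from_direction into (u, v) components.

    u is the eastward component and v the northward component, in the same
    units as speed. Because the direction is where the wind comes FROM
    (0 deg = North, 90 deg = East), both components carry a minus sign.
    """
    rad = math.radians(from_direction_deg)
    u = -speed * math.sin(rad)
    v = -speed * math.cos(rad)
    return u, v


if __name__ == "__main__":
    print(kelvin_to_celsius(290.0))
    # A northerly wind (from 0 deg) blows towards the south, so v is negative.
    print(wind_to_uv(10.0, 0.0))
```

Using `u`/`v` components instead of raw degrees avoids the discontinuity between 359° and 0° when the direction is used as a model input.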

### Coordinate System
The dataset uses Lambert Azimuthal Equal Area projection:

- **projection_x_coordinate**
  - Description: X-axis grid coordinates
  - Units: m
  - Range: Covers UK extent

- **projection_y_coordinate**
  - Description: Y-axis grid coordinates
  - Units: m
  - Range: Covers UK extent

### Time Variables
Temporal information for forecasts:

- **forecast_period**
  - Description: Time offset (lead time) from `forecast_reference_time`
  - Type: timedelta64[ns]

- **forecast_reference_time**
  - Description: Start (initialisation) time of the forecast
  - Type: datetime64[ns]

- **time**
  - Description: Valid time for forecast step
  - Type: datetime64[ns]
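
The three time variables are related by `time = forecast_reference_time + forecast_period`. The sketch below shows that arithmetic with the standard library's `datetime` types. The dataset itself stores `datetime64[ns]` and `timedelta64[ns]` values, for which the same addition applies. The helper name `valid_time` is illustrative.

```python
from datetime import datetime, timedelta


def valid_time(reference_time: datetime, forecast_period: timedelta) -> datetime:
    """Valid time of a forecast step: initialisation time plus lead time."""
    return reference_time + forecast_period


if __name__ == "__main__":
    init = datetime(2023, 1, 16, 0)  # forecast_reference_time
    # Hourly lead times for the first few steps (forecast_period).
    leads = [timedelta(hours=h) for h in range(4)]
    for lead in leads:
        print(lead, "->", valid_time(init, lead))
```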

## Usage in Solar Forecasting

### Primary Predictors
1. **surface_downwelling_shortwave_flux_in_air**: Direct indicator of solar energy availability
2. **cloud_area_fraction** and the high, medium, and low cloud type fractions: Key for radiation attenuation
3. **air_temperature**: Critical for panel efficiency calculations

### Secondary Factors
1. **wind_speed**: Panel cooling effects
2. **lwe_thickness_of_surface_snow_amount**: Ground reflectance and panel coverage
3. **surface_downwelling_ultraviolet_flux_in_air**: Specific panel technology considerations

## Data Quality Considerations
- Least significant digit information provided for each variable
- Grid mapping information available in lambert_azimuthal_equal_area variable
- All variables follow CF-1.7 conventions

## Data Availability and Format
This dataset is hosted on Hugging Face at [openclimatefix/met-office-uk-deterministic-solar](https://huggingface.co/datasets/openclimatefix/met-office-uk-deterministic-solar).

### File Format
- Files are stored in `.zarr.zip` format
- Each file represents a specific timestamp (e.g., `2023-01-16-00.zarr.zip`)
- Zarr format is optimized for:
  - Cloud storage access
  - Parallel I/O operations
  - Efficient chunked access to large arrays
  - Integration with data science tools (xarray, pandas, dask)
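
A minimal sketch of working with these files is shown below. It assumes the file name encodes a timestamp as `YYYY-MM-DD-HH`, which is inferred from the single example above rather than documented. It also assumes the archive has already been downloaded to a local path, because the directory layout inside the Hugging Face repository is not described here. `archive_name` and `open_forecast` are hypothetical helpers, not part of the project code.

```python
from datetime import datetime


def archive_name(timestamp: datetime) -> str:
    """Build a file name such as '2023-01-16-00.zarr.zip'.

    Assumption: names follow YYYY-MM-DD-HH, inferred from the example above.
    """
    return timestamp.strftime("%Y-%m-%d-%H") + ".zarr.zip"


def open_forecast(local_path: str):
    """Open a locally downloaded .zarr.zip archive as an xarray Dataset.

    xarray and zarr are imported lazily so that archive_name() can be used
    without them installed.
    """
    import xarray as xr
    import zarr

    # ZipStore exposes the zipped Zarr hierarchy as a read-only store.
    store = zarr.storage.ZipStore(local_path, mode="r")
    return xr.open_zarr(store)


if __name__ == "__main__":
    print(archive_name(datetime(2023, 1, 16, 0)))
```

Once a file is available locally, `open_forecast("2023-01-16-00.zarr.zip")` would return a lazily loaded dataset whose variables can be inspected with `ds.data_vars`.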

## License
British Crown copyright 2022-2024, the Met Office, licensed under [CC BY-SA](https://creativecommons.org/licenses/by-sa/4.0/).

## Citation
Met Office UK Deterministic (UKV) 2 km on a 2-year rolling archive, accessed from [AWS Registry](https://registry.opendata.aws/met-office-uk-deterministic).
4 changes: 2 additions & 2 deletions src/open_data_pvnet/configs/met_office_uk_data_config.yaml
@@ -43,8 +43,8 @@ input_data:
- radiation_flux_in_uv_downward_at_surface # Downward UV radiation flux at the surface (W/m²)
- wind_speed_at_10m # 10-meter wind speed (m/s)
- wind_direction_at_10m # 10-meter wind direction (degrees from north)
nwp_image_size_pixels_height: 12
nwp_image_size_pixels_width: 12
nwp_image_size_pixels_height: 970
nwp_image_size_pixels_width: 1042
nwp_provider: met_office
nwp_zarr_path: PLACEHOLDER.zarr
time_resolution_minutes: 60