Batch / Sample class - draft #125

Draft: wants to merge 24 commits into main

@felix-e-h-p (Contributor) commented Jan 15, 2025

Overview

First stage of a standardised sample-handling system with a unified interface across dataset types (see issue #71). The initial dataset type in focus is PVNet UK Regional. At present, base.py and uk_regional.py act as the parent and child classes, respectively.

Main Implementation

base.py

•	Implements the abstract SampleBase class
•	Handles both flat and nested data structures

uk_regional.py

•	Implements PVNetSample
•	Defines PV-specific feature requirements
•	Integrates with existing dataset structure
•	Plotting, save and load functionality
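A minimal sketch of how the parent/child split described above might look, assuming SampleBase wraps a dict that may contain nested groups (e.g. NWP data keyed by provider) and exposes a flattened view. The `flatten` and `validate` methods and the two-key `REQUIRED_KEYS` subset are hypothetical illustrations, not the PR's actual API.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional

import numpy as np


class SampleBase(ABC):
    """Hypothetical parent class wrapping a possibly nested dict of arrays."""

    def __init__(self, data: Optional[Dict[str, Any]] = None) -> None:
        self._data: Dict[str, Any] = dict(data or {})

    def __getitem__(self, key: str) -> Any:
        return self._data[key]

    def __setitem__(self, key: str, value: Any) -> None:
        self._data[key] = value

    def flatten(self, sep: str = ".") -> Dict[str, Any]:
        """Return a one-level dict, joining nested keys with `sep`."""
        flat: Dict[str, Any] = {}

        def _walk(node: Dict[str, Any], prefix: str) -> None:
            for key, value in node.items():
                name = f"{prefix}{sep}{key}" if prefix else str(key)
                if isinstance(value, dict):
                    _walk(value, name)  # recurse into nested groups
                else:
                    flat[name] = value

        _walk(self._data, "")
        return flat

    @abstractmethod
    def validate(self) -> None:
        """Each dataset-specific subclass defines its own requirements."""


class PVNetSample(SampleBase):
    # Illustrative subset only; the PR's REQUIRED_KEYS uses its own key enums.
    REQUIRED_KEYS = {"nwp", "gsp"}

    def validate(self) -> None:
        missing = self.REQUIRED_KEYS - set(self._data)
        if missing:
            raise ValueError(f"Missing required keys: {sorted(missing)}")


sample = PVNetSample({"gsp": np.zeros(4), "nwp": {"ukv": np.ones((2, 3))}})
sample.validate()
print(sorted(sample.flatten()))  # ['gsp', 'nwp.ukv']
```

Keeping the requirements in a class attribute lets each dataset-specific child declare its own keys while the base class owns the generic flat/nested handling.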

@felix-e-h-p (Contributor Author)

One thing to mention: is there any value in making the type hints more robust?

For example, using TypeVar:

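from abc import ABC, abstractmethod
from typing import Generic, TypeVar, Union
import numpy as np
import torch
import xarray as xr
T = TypeVar('T', bound='SampleBase')
ArrayType = Union[np.ndarray, torch.Tensor, xr.DataArray]
class SampleBase(ABC, Generic[T]):
    @abstractmethod
    def to_torch(self) -> T: ...  # subclasses return their own concrete type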

I suppose this matters for catching type-related errors early, particularly across multiple refactors.
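A common alternative to making SampleBase generic is to annotate `self` with the bound TypeVar, so a type checker infers that each subclass's conversion methods return that subclass without writing `SampleBase['PVNetSample']` (Python 3.11+ also offers `typing.Self` for this). The sketch uses `to_numpy` with plain numpy so it runs without torch; `map_values` is a hypothetical helper, and a `to_torch` variant would pass `torch.as_tensor` instead.

```python
from typing import Any, Callable, Dict, TypeVar

import numpy as np

T = TypeVar("T", bound="SampleBase")


class SampleBase:
    def __init__(self, data: Dict[str, Any]) -> None:
        self.data = data

    def map_values(self: T, fn: Callable[[Any], Any]) -> T:
        # type(self) rebuilds the calling subclass, matching the `-> T` annotation
        return type(self)({key: fn(value) for key, value in self.data.items()})

    def to_numpy(self: T) -> T:
        # a to_torch variant would pass torch.as_tensor instead
        return self.map_values(np.asarray)


class PVNetSample(SampleBase):
    pass


converted = PVNetSample({"gsp": [1.0, 2.0]}).to_numpy()
print(type(converted).__name__)  # PVNetSample
```

A static checker such as mypy then types `converted` as `PVNetSample` rather than `SampleBase`, which is where the early-error benefit comes from.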

@felix-e-h-p (Contributor Author)

Updates implemented, @Sukh-P.

@felix-e-h-p (Contributor Author)

Further update, @Sukh-P: this one only touches uk_regional.py, adding numbered file saving and loading similar to the reference logic provided:

•	__init__ updated to manage file paths
•	__call__ implemented to handle numbered sample saving, using the filename format {sample_num:08}.pt
•	__getitem__ implemented to support both dict-style access and indexed file loading
•	Save function updated to match the additions above

The previous version of the function (and the pre-change __init__) is left in uk_regional.py for now, just to check that this is all in line with the overall requirements here.
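A minimal sketch of the numbered saving and dual-mode `__getitem__` described above. It assumes samples are plain dicts and uses `pickle` as a stand-in for the `torch.save`/`torch.load` the `.pt` format implies, so the files get a `.pkl` suffix by default; the `save_dir` and `suffix` parameters and the `_path` helper are hypothetical.

```python
import os
import pickle
import tempfile
from typing import Any, Dict, Optional, Union


class PVNetSample:
    """Hypothetical sketch of numbered saving/loading, not the PR's code."""

    def __init__(self, data: Optional[Dict[str, Any]] = None,
                 save_dir: Optional[str] = None, suffix: str = ".pkl") -> None:
        self._data = dict(data or {})
        self.save_dir = save_dir
        self.suffix = suffix
        if save_dir is not None:
            os.makedirs(save_dir, exist_ok=True)

    def _path(self, sample_num: int) -> str:
        # Zero-padded to 8 digits, e.g. 00000042.pkl
        return os.path.join(self.save_dir, f"{sample_num:08}{self.suffix}")

    def __call__(self, sample: Dict[str, Any], sample_num: int) -> str:
        """Save `sample` under its zero-padded number and return the path."""
        if self.save_dir is None:
            raise ValueError("save_dir must be set to save samples")
        path = self._path(sample_num)
        with open(path, "wb") as f:
            pickle.dump(sample, f)  # stand-in for torch.save
        return path

    def __getitem__(self, key: Union[str, int]) -> Any:
        if isinstance(key, str):
            return self._data[key]  # dict-style access
        if isinstance(key, int):
            with open(self._path(key), "rb") as f:
                return pickle.load(f)  # indexed file loading
        raise TypeError(f"Key must be str or int, got {type(key)}")


with tempfile.TemporaryDirectory() as tmp:
    store = PVNetSample({"gsp": [0.1]}, save_dir=tmp)
    path = store({"gsp": [0.5]}, sample_num=42)
    print(os.path.basename(path))  # 00000042.pkl
    print(store[42], store["gsp"])  # {'gsp': [0.5]} [0.1]
```

Zero-padding to a fixed width keeps lexicographic and numeric ordering of the saved files identical, so a plain directory listing comes back in sample order.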

REQUIRED_KEYS = {
    'nwp',
    GSPSampleKey.gsp,
    SatelliteSampleKey.satellite_actual,
Contributor

This could be optional
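One way the suggestion could be realised, sketched under the assumption that satellite data simply moves from a required set to an optional one. The plain-string keys stand in for the PR's `GSPSampleKey`/`SatelliteSampleKey` entries, and `OPTIONAL_KEYS` and `validate_keys` are hypothetical names.

```python
from typing import Any, Dict, FrozenSet

# Plain-string stand-ins for the PR's 'nwp', GSPSampleKey.gsp and
# SatelliteSampleKey.satellite_actual entries (hypothetical values).
REQUIRED_KEYS: FrozenSet[str] = frozenset({"nwp", "gsp"})
OPTIONAL_KEYS: FrozenSet[str] = frozenset({"satellite_actual"})


def validate_keys(sample: Dict[str, Any]) -> FrozenSet[str]:
    """Raise if a required key is missing; return the optional keys present."""
    present = frozenset(sample)
    missing = REQUIRED_KEYS - present
    if missing:
        raise KeyError(f"Missing required keys: {sorted(missing)}")
    return OPTIONAL_KEYS & present


print(sorted(validate_keys({"nwp": 0, "gsp": 0})))  # []
print(sorted(validate_keys({"nwp": 0, "gsp": 0, "satellite_actual": 0})))  # ['satellite_actual']
```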


logger = logging.getLogger(__name__)


Contributor

Could be a duplicate; to check.


# Fixture definition
@pytest.fixture
def pvnet_config_filename(tmp_path):
Contributor

Move to JSON; check it isn't already in conftest.


def plot(self, **kwargs) -> None:
    """Sample visualisation definition."""
    logger.debug("Creating PVNetSample visualisation")
Contributor

Could update this later using the ocf-datapipes visualisation.

logger.error(f"Invalid key type: {type(key)}")
raise TypeError(f"Key must be str or int, got {type(key)}")

# REFERENCE
Contributor

delete

GSPSampleKey.solar_elevation
}

# REFERENCE
Contributor

remove


3 participants