
Logic to determine frequency of verification dataset fails when monthly data times are on different days of the month #858

Open
gmacgilchrist opened this issue Jun 28, 2024 · 1 comment

@gmacgilchrist

Description of bug
For monthly average data, it is not uncommon for the time index to fall on the middle day of each month, and that day varies from month to month (e.g. the 15th of January but the 14th of February). This breaks the logic in return_time_series_freq, which only identifies a monthly frequency if the day of the time index is the same for every month.
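One possible direction, sketched here as an assumption rather than climpred's actual behaviour, is to classify the time axis from the spacing between consecutive time stamps instead of from whether the day of month is constant. The helper name infer_freq_from_spacing and its day thresholds are hypothetical choices. The sketch uses datetime.date so that it runs with the standard library alone; cftime datetimes should work the same way, since subtracting two of them yields a datetime.timedelta.

```python
from datetime import date, timedelta
from statistics import median


def infer_freq_from_spacing(times):
    """Hypothetical helper: classify a time axis as 'day', 'month' or 'year'.

    Uses the median spacing between consecutive time stamps, so monthly
    stamps that fall on different days of the month still count as monthly.
    The day thresholds below are illustrative assumptions.
    """
    if len(times) < 2:
        raise ValueError("need at least two time stamps to infer a frequency")
    # Differences of datetime-like objects are timedeltas; convert to days.
    steps = [(b - a) / timedelta(days=1) for a, b in zip(times[:-1], times[1:])]
    spacing = median(steps)
    if spacing < 2:
        return "day"
    if 27 <= spacing <= 32:  # covers 28- to 31-day months and 360-day calendars
        return "month"
    if 360 <= spacing <= 367:  # covers 360-, 365- and 366-day years
        return "year"
    raise ValueError(f"cannot classify a median spacing of {spacing} days")


# Mid-month stamps on different days, as in the code sample of this issue
print(infer_freq_from_spacing([date(1, 1, 15), date(1, 2, 14), date(1, 3, 15)]))
```

With the mid-month stamps from the code sample (15, 14 and 15), the steps are 30 and 29 days, so the median of 29.5 falls in the monthly band rather than being reported as "day".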

I encountered the issue while trying to generate an uninitialized forecast. I suspect it was also silently affecting the persistence forecast, which previously produced NaNs but works fine after a hack (changing the time index of the verification dataset to match what is expected).

Code sample (reproducing the core logic of return_time_series_freq)

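import cftime
import xarray as xr

# Monthly time axis with mid-month stamps on different days (15, 14, 15)
times = [
    cftime.DatetimeNoLeap(1, 1, 15),
    cftime.DatetimeNoLeap(1, 2, 14),
    cftime.DatetimeNoLeap(1, 3, 15),
]
ds = xr.Dataset(coords={"time": times})

# Report the first of day/month/year whose value is not the same
# for every time stamp as it is for the first one
for freq in ["day", "month", "year"]:
    if not (
        getattr(ds.isel({"time": 0})["time"].dt, freq) == getattr(ds["time"].dt, freq)
    ).all():
        print(freq)  # prints "day" because the day of month differs
        break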

This prints a frequency of "day" instead of "month", which results in subsequent errors. To work around this, a user has to modify at least the verification dataset so that every month in the time index has the same day.
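The workaround of aligning the verification times can be scripted. This minimal sketch moves every time stamp to day 1 of its month; the helper name snap_to_month_start is hypothetical, and it assumes the stamps provide a replace method, which datetime.date, datetime.datetime and cftime datetimes all do. It refuses inputs where two stamps share a month, since they would collapse onto the same date.

```python
from datetime import date


def snap_to_month_start(times):
    """Hypothetical helper: move every time stamp to day 1 of its month.

    Works for datetime-like objects with a ``replace`` method (stdlib
    date/datetime and cftime datetimes). Any time of day is left unchanged.
    """
    # Two stamps in the same month would collapse onto one date; refuse that.
    if len({(t.year, t.month) for t in times}) != len(times):
        raise ValueError("more than one time stamp falls in the same month")
    return [t.replace(day=1) for t in times]


print(snap_to_month_start([date(1, 1, 15), date(1, 2, 14), date(1, 3, 15)]))
```

For an xarray verification dataset, the snapped values could be assigned back with ds.assign_coords(time=snap_to_month_start(ds["time"].values)) before the dataset is passed to climpred, after which the day-of-month check sees a constant day.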

Would it be undesirable to let the user specify the frequency of the verification dataset, in the same way that the units of the initialized dataset's lead time already have to be specified?
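A sketch of how such an override could behave, assuming a hypothetical freq keyword (this is not an existing climpred option): an explicitly given frequency is validated and used as-is, and only when it is absent is the frequency inferred with the same first-differing-attribute check as in the code sample. The function name time_series_freq is also hypothetical, and plain datetime.date objects stand in for the time coordinate.

```python
from datetime import date

FREQS = ("day", "month", "year")  # same order as the check in the code sample


def time_series_freq(times, freq=None):
    """Hypothetical variant of the frequency check with a user override.

    If ``freq`` is given, it is validated and returned unchanged. Otherwise
    the first of day/month/year whose value is not constant across the
    time stamps is returned, or None if all stamps are identical.
    """
    if freq is not None:
        if freq not in FREQS:
            raise ValueError(f"freq must be one of {FREQS}, got {freq!r}")
        return freq
    for candidate in FREQS:
        first = getattr(times[0], candidate)
        if any(getattr(t, candidate) != first for t in times):
            return candidate
    return None


mid_month = [date(1, 1, 15), date(1, 2, 14), date(1, 3, 15)]
print(time_series_freq(mid_month))                # inferred, reproduces the bug
print(time_series_freq(mid_month, freq="month"))  # user override
```

Keeping inference as the default means existing workflows are unaffected; the override only matters for time axes like the mid-month one above.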

Output of climpred.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:36:13) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 4.18.0-553.5.1.el8_10.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

climpred: 2.4.0
xarray: 2023.12.0
pandas: 2.2.2
numpy: 1.26.4
scipy: 1.13.1
cftime: 1.6.3
netcdf4: None
nc_time_axis: 1.4.1
matplotlib: 3.8.2
cf_xarray: 0.9.2
xclim: 0.50.0
dask: 2024.5.0
distributed: 2024.5.0
setuptools: 69.5.1
pip: 24.0
conda: None
IPython: 8.25.0
sphinx: None
@aaronspring
Collaborator

Usually I have gone for "changing the time index of the verification dataset to match what's expected", i.e. fixing the times before using climpred. Mostly I move them to the beginning of the month so that every day is 1.

I'm not sure how difficult the change would be to implement, but feel free to give it a try.
