Logic to determine frequency of verification dataset fails when monthly data times are on different days of the month #858

gmacgilchrist · 2024-06-28T14:30:08Z

Description of bug
For monthly average data, it is not uncommon for time indices to be on the middle day of the month, which varies from month to month. This breaks the logic in return_time_series_freq, which only picks out a monthly frequency if the time index day is the same for each month.

I encountered the issue while attempting to generate an uninitialized forecast. I think it was likewise causing silent issues in generating a persistence forecast, which was previously producing NaNs but works fine after implementing a hack (changing the time index of the verification dataset to match what's expected).

Code sample (reproducing the core logic of return_time_series_freq)

import cftime
import xarray as xr

# monthly separated time array (mid-month days differ: 15, 14, 15)
times = [cftime.DatetimeNoLeap(1,1,15),cftime.DatetimeNoLeap(1,2,14),cftime.DatetimeNoLeap(1,3,15)]
ds = xr.Dataset(coords={'time':times})

for freq in ['day','month','year']:
    # the first component whose value at index 0 differs from any other
    # time step is reported as the frequency
    if not (
        getattr(ds.isel({'time': 0})['time'].dt, freq) == getattr(ds['time'].dt, freq)
    ).all():
        print(freq)
        break

This returns a frequency of "day", which results in subsequent errors. To work around this, a user has to manipulate at least the verification dataset to have the same "day" for each month in the time index.
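To illustrate the workaround, here is a minimal sketch that mirrors the check above in plain Python and then snaps every time stamp to the first day of its month. It uses the standard-library datetime only so that it runs without extra dependencies; the helper names first_varying_component and snap_to_month_start are hypothetical and are not part of climpred. cftime datetime objects also provide a replace method, so the same snapping should carry over to a cftime-based index, although that is not exercised here.

```python
from datetime import datetime


def first_varying_component(times, components=("day", "month", "year")):
    """Mimic the check from the code sample: return the first component
    whose value at index 0 differs from at least one other entry."""
    for name in components:
        first = getattr(times[0], name)
        if any(getattr(t, name) != first for t in times):
            return name
    return None


def snap_to_month_start(times):
    """Hypothetical helper: move every time stamp to day 1 of its month."""
    return [t.replace(day=1) for t in times]


# Mid-month time stamps, as in the issue: the day varies between 14 and 15.
times = [datetime(2001, 1, 15), datetime(2001, 2, 14), datetime(2001, 3, 15)]
print(first_varying_component(times))  # day

# After snapping, the day is constant, so the month is the first to vary.
fixed = snap_to_month_start(times)
print(first_varying_component(fixed))  # month
```

With an xarray dataset, the snapped values would then need to be written back to the time coordinate, for example with ds.assign_coords(time=...), before passing the verification dataset to climpred.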

Would it be undesirable for the frequency of the verification dataset to be user specified in the same way as the units of the initialized dataset lead time need to be specified?

Output of climpred.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:36:13) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 4.18.0-553.5.1.el8_10.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

climpred: 2.4.0
xarray: 2023.12.0
pandas: 2.2.2
numpy: 1.26.4
scipy: 1.13.1
cftime: 1.6.3
netcdf4: None
nc_time_axis: 1.4.1
matplotlib: 3.8.2
cf_xarray: 0.9.2
xclim: 0.50.0
dask: 2024.5.0
distributed: 2024.5.0
setuptools: 69.5.1
pip: 24.0
conda: None
IPython: 8.25.0
sphinx: None


aaronspring · 2024-06-29T08:51:07Z

I have usually gone for "changing the time index of the verification dataset to match what's expected", i.e. fixing it before using climpred, mostly by moving the times to the beginning of the month so that every day is 1.

Not sure how difficult a change would be to implement but feel free.

gmacgilchrist added the bug label Jun 28, 2024
