Answer changes on Izumi after the hardware rebuild #2862
I redid the ctsm5.3.009 baselines and called them ctsm5.3.009.redo.
The email on the rebuild, sent out on Oct/23/2024, was:
In looking at the coupler fields for the test SMS.f10_f10_mg37.I2000Clm50BgcCrop.izumi_gnu.clm-crop, it appears that many fields differ at roundoff level, but larger differences have already propagated to some fields after only a few days. So the change is likely roundoff level, but it propagates to larger-than-roundoff differences. It also only appears for the Nag and gnu compilers. The differences for the Nag compiler seem to be roundoff level; for example, for the nag equivalent of the above, SMS_D.f10_f10_mg37.I2000Clm60BgcCrop.izumi_nag.clm-crop:
grep RMS /scratch/cluster/erik/tests_ctsm539redoacl/SMS_D.f10_f10_mg37.I2000Clm60BgcCrop.izumi_nag.clm-crop.GC.ctsm539redoacl_nag/run/SMS_D.f10_f10_mg37.I2000Clm60BgcCrop.izumi_nag.clm-crop.GC.ctsm539redoacl_nag.clm2.h0.2000-01-06-00000.nc.cprnc.out
The longer nag mpi-serial single point case, ERS_Ly5_Mmpi-serial.1x1_smallvilleIA.I1850Clm50BgcCrop.izumi_gnu.clm-ciso_monthly, shows that the difference has propagated and no longer appears as roundoff.
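A minimal sketch of how the `grep RMS` check above could be automated. It assumes each RMS line in the cprnc output has the shape ` RMS <field> <rms> NORMALIZED <normalized rms>`; that layout, the function names, and the `1e-10` threshold are assumptions for illustration, not something taken from cprnc or the CTSM test system.

```python
import re

# Assumed line shape: " RMS <field> <rms> NORMALIZED <normalized rms>".
# The NORMALIZED part is treated as optional.
RMS_LINE = re.compile(r"^\s*RMS\s+(\S+)\s+(\S+)(?:\s+NORMALIZED\s+(\S+))?")


def parse_rms(lines):
    """Return {field: (rms, normalized_rms or None)} for every RMS line."""
    result = {}
    for line in lines:
        match = RMS_LINE.match(line)
        if not match:
            continue
        name, rms, norm = match.groups()
        try:
            result[name] = (float(rms), float(norm) if norm else None)
        except ValueError:
            # Skip values Python cannot parse as floats.
            continue
    return result


def above_threshold(rms_by_field, threshold=1e-10):
    """Fields whose normalized RMS (or plain RMS if absent) exceeds the
    hypothetical threshold, sorted from largest to smallest."""
    def size(item):
        rms, norm = item[1]
        return norm if norm is not None else rms

    flagged = [item for item in rms_by_field.items() if size(item) > threshold]
    return [name for name, _ in sorted(flagged, key=size, reverse=True)]


if __name__ == "__main__":
    # Made-up sample lines, not real output from the test above.
    sample = [
        " RMS TSOI   1.2345E-14   NORMALIZED  4.5E-17",
        " RMS GPP    3.0E-06      NORMALIZED  2.0E-03",
        "SUMMARY of cprnc:",
    ]
    print(above_threshold(parse_rms(sample)))
```

To use it on a real file, pass the open file object to `parse_rms` in place of `sample`. Sorting by normalized RMS puts the fields that have drifted furthest from roundoff at the top.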
And here's the email from the day of the rebuild on Oct/30th...
I met with Joseph on this and we talked it over. He potentially sees ways to fix this if I worked with him on going through libraries. But this could be difficult, and it's impossible to predict a timeline for it. From looking at our CTSM tests, the change appears to be roundoff level: most field differences are small (appearing roundoff level) for short tests but grow for longer ones, which is normal for roundoff-level changes that affect answers. To be certain of this we'd need to do more extensive testing, and it's unclear whether that's worth doing. We'll talk more at the CSEG meeting to evaluate. I wondered whether there should have been a warning that this might change answers. However, since it was just an OS change and NOT a hardware change, he didn't anticipate changes to answers, which is reasonable. His process included a stepwise iteration allowing people to test beforehand, so his process needn't change. We should maybe anticipate that more things can change answers, though, and elect to do more testing in cases like this.
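As a toy illustration of why roundoff-level changes look small in short tests but grow in longer ones, here is a sketch using the logistic map, a standard chaotic system. It has nothing to do with CTSM itself; the map, the parameter `r=3.9`, the starting value, and the `1e-15` perturbation are all choices made for this example.

```python
def logistic_trajectory(x0, steps, r=3.9):
    """Iterate x -> r * x * (1 - x) and return all values, including x0."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


def divergence(x0=0.2, perturbation=1e-15, steps=200):
    """Absolute difference, step by step, between a baseline run and a run
    whose initial value is perturbed at roughly roundoff level."""
    base = logistic_trajectory(x0, steps)
    perturbed = logistic_trajectory(x0 + perturbation, steps)
    return [abs(a - b) for a, b in zip(base, perturbed)]


if __name__ == "__main__":
    diffs = divergence()
    for step in (1, 5, 50, 100, 200):
        print(f"step {step:3d}: |difference| = {diffs[step]:.3e}")
```

The first few steps stay near the size of the perturbation, while later steps differ by an amount comparable to the values themselves. This is the same pattern as short tests looking like roundoff while multi-year tests do not.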
Working with Joseph, we figured I should try building and running on a queue with the old OS and verify that answers are identical to our old baselines. This is worth doing, so I will do that. The old OS is in the "upgrade" queue and on the login node. Other queues have the new OS.
Sending a baseline comparison to the "upgrade" queue, which has the old OS, I get identical answers for the tests that ran. The nag mpi-serial tests still mostly didn't run. @fischer-ncar is going to run the ECT test on Izumi with CESM2.1 to assess whether these changes are truly roundoff level.
45 of the aux_clm tests show changes to answers after the hardware rebuild, which means most of our baselines will need to be regenerated if we want to use them going forward. There were 25 tests in aux_clm that did NOT change answers, but most did. I would expect the changes to be roundoff level.
Here's the list of answer changes from aux_clm:
ERI_D_Ld9_P48x1.f10_f10_mg37.I2000Clm50BgcCru.izumi_nag.clm-reduceOutput
ERI_D_Ld9_P48x1.f10_f10_mg37.I2000Clm50Sp.izumi_nag.clm-SNICARFRC
ERI_D_Ld9_P48x1.f10_f10_mg37.I2000Clm50Sp.izumi_nag.clm-reduceOutput
ERP_D_Ld5_P48x1.f10_f10_mg37.I1850Clm50Bgc.izumi_nag.clm-ciso
ERP_D_Ld5_P48x1.f10_f10_mg37.I1850Clm60Bgc.izumi_nag.clm-ciso
ERP_D_Ld5_P48x1.f10_f10_mg37.I1850Clm60Bgc.izumi_nag.clm-ciso--clm-matrixcnOn
ERP_D_Ld5_P48x1.f10_f10_mg37.I2000Clm50BgcCru.izumi_nag.clm-flexCN_FUN
ERP_D_Ld5_P48x1.f10_f10_mg37.I2000Clm50BgcCru.izumi_nag.clm-flexCN_FUN--clm-matrixcnOn
ERP_D_Ld5_P48x1.f10_f10_mg37.I2000Clm50BgcCru.izumi_nag.clm-luna
ERP_D_Ld5_P48x1.f10_f10_mg37.I2000Clm50BgcCru.izumi_nag.clm-noFUN_flexCN
ERP_D_Ld5_P48x1.f10_f10_mg37.I2000Clm50BgcCru.izumi_nag.clm-noFUN_flexCN--clm-matrixcnOn
ERP_D_Ld5_P48x1.f10_f10_mg37.I2000Clm50BgcCru.izumi_nag.clm-reduceOutput
ERP_D_Ld5_P48x1.f10_f10_mg37.I2000Clm50Sp.izumi_nag.clm-o3lombardozzi2015
ERP_D_Ld9.f10_f10_mg37.I1850Clm60BgcCrop.izumi_nag.clm-clm60cam7LndTuningModeLDust
ERP_D_P48x1.f10_f10_mg37.IHistClm60Bgc.izumi_nag.clm-decStart
ERP_D_P48x1.f10_f10_mg37.IHistClm60Bgc.izumi_nag.clm-decStart--clm-matrixcnOn_ignore_warnings
ERS_D.f10_f10_mg37.I1850Clm50BgcCrop.izumi_nag.clm-ciso_monthly_matrixcn_spinup
ERS_D.f10_f10_mg37.I1850Clm60Sp.izumi_nag.clm-ExcessIceStreams
ERS_D_Ld5.f10_f10_mg37.I2000Clm50Fates.izumi_nag.clm-FatesCold
ERS_Lm20_Mmpi-serial.1x1_smallvilleIA.I1850Clm50BgcCrop.izumi_gnu.clm-cropMonthlyNoinitial
ERS_Lm40_Mmpi-serial.1x1_numaIA.I2000Clm50BgcCropQianRs.izumi_gnu.clm-cropMonthlyNoinitial
ERS_Ly3_Mmpi-serial.1x1_smallvilleIA.IHistClm50BgcCropQianRs.izumi_gnu.clm-cropMonthOutput
ERS_Ly5_Mmpi-serial.1x1_smallvilleIA.I1850Clm50BgcCrop.izumi_gnu.clm-ciso_monthly
ERS_Ly5_Mmpi-serial.1x1_smallvilleIA.I1850Clm50BgcCrop.izumi_gnu.clm-ciso_monthly--clm-matrixcnOn
SMS.f10_f10_mg37.I2000Clm50BgcCrop.izumi_gnu.clm-crop
SMS_D.f10_f10_mg37.I1850Clm60BgcCrop.izumi_nag.clm-ciso_soil_matrixcn_only
SMS_D.f10_f10_mg37.I2000Clm60BgcCrop.izumi_gnu.clm-crop
SMS_D.f10_f10_mg37.I2000Clm60BgcCrop.izumi_nag.clm-crop
SMS_D_Ld1_Mmpi-serial.f45_f45_mg37.I2000Clm50SpRs.izumi_gnu.clm-ptsRLA
SMS_D_Ld1_P48x1.f10_f10_mg37.I2000Clm45BgcCrop.izumi_nag.clm-oldhyd
SMS_D_Ld1_P48x1.f10_f10_mg37.I2000Clm50BgcCru.izumi_nag.clm-datm_bias_correct_cruv7
SMS_D_Ld3.f10_f10_mg37.I2000Clm60Bgc.izumi_nag.clm-HillslopeD
SMS_D_Ld5.f10_f10_mg37.I1850Clm45BgcCrop.izumi_nag.clm-crop
SMS_D_Ld5.f10_f10_mg37.I2000Clm50BgcCrop.izumi_nag.clm-irrig_alternate
SMS_D_Ld5.f10_f10_mg37.I2000Clm50FatesRs.izumi_nag.clm-FatesCold
SMS_D_Ld5.f45_f45_mg37.I2000Clm60Fates.izumi_nag.clm-FatesCold
SMS_D_Ld65.f10_f10_mg37.I2000Clm60BgcCrop.izumi_nag.clm-FireLi2024GSWP
SMS_D_Ld65.f10_f10_mg37.IHistClm60BgcCrop.izumi_nag.clm-cropMonthOutput--clm-RxCropCalsAdaptGGCMI
SMS_D_P48x1_Ld5.f10_f10_mg37.I2000Clm50BgcCrop.izumi_nag.clm-irrig_spunup
SMS_Ld5_D_P48x1.f10_f10_mg37.IHistClm50Bgc.izumi_nag.clm-monthly
SMS_Ld5_D_P48x1.f10_f10_mg37.IHistClm60Bgc.izumi_nag.clm-decStart
SMS_Ld5_Mmpi-serial.1x1_brazil.IHistClm60Bgc.izumi_gnu.clm-mimics
SMS_Ln9.f10_f10_mg37.I1850Clm45Bgc.izumi_gnu.clm-clm45cam4LndTuningModeZDustSoilErod
SMS_Ly3_Mmpi-serial.1x1_numaIA.I2000Clm50BgcDvCropQianRs.izumi_gnu.clm-ignor_warn_cropMonthOutputColdStart
SMS_Ly5_Mmpi-serial.1x1_smallvilleIA.IHistClm60BgcCropQianRs.izumi_gnu.clm-gregorian_cropMonthOutput
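A small sketch for tallying a list like the one above by compiler. It assumes the test names are dot-separated with the fourth field being `<machine>_<compiler>`, as they appear in this list; the function names are made up for this example.

```python
from collections import Counter


def compiler_of(test_name):
    """Return the compiler part of a test name, assuming the fourth
    dot-separated field looks like '<machine>_<compiler>'."""
    parts = test_name.strip().split(".")
    if len(parts) < 4:
        raise ValueError(f"unexpected test name: {test_name!r}")
    _machine, _, compiler = parts[3].rpartition("_")
    return compiler


def tally_by_compiler(test_names):
    """Count test names per compiler, ignoring blank lines."""
    return Counter(compiler_of(name) for name in test_names if name.strip())


if __name__ == "__main__":
    # A few names copied from the list above; paste the full list to tally all of it.
    names = [
        "ERI_D_Ld9_P48x1.f10_f10_mg37.I2000Clm50BgcCru.izumi_nag.clm-reduceOutput",
        "ERS_Ly5_Mmpi-serial.1x1_smallvilleIA.I1850Clm50BgcCrop.izumi_gnu.clm-ciso_monthly",
        "SMS.f10_f10_mg37.I2000Clm50BgcCrop.izumi_gnu.clm-crop",
        "SMS_D.f10_f10_mg37.I2000Clm60BgcCrop.izumi_nag.clm-crop",
    ]
    for compiler, count in sorted(tally_by_compiler(names).items()):
        print(f"{compiler}: {count}")
```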