Add GEFS regression test suite from EP5r2 configuration/case + Update UPP with new variables #2442

Merged
merged 127 commits into from
Feb 27, 2025

Conversation

NickSzapiro-NOAA
Collaborator

@NickSzapiro-NOAA NickSzapiro-NOAA commented Sep 19, 2024

Commit Queue Requirements:

  • Fill out all sections of this template.
  • All sub component pull requests have been reviewed by their code managers.
  • Run the full Intel+GNU RT suite (compared to current baselines) on either Hera/Derecho/Hercules
  • Commit 'test_changes.list' from previous step

Description:

This PR updates the cpld_bmark_p8 tests to a prototype GEFS test case of fully coupled s2swa+IAU+stochastics in atmosphere and ocean, with configuration and warm starts from restarts of EP5r2 ensemble member 1 for 2021-03-25 06Z. The EP5r2 test case was kindly provided by @bingfu-NOAA via @junwang-noaa with aerosol input data and configurations from @lipan-NOAA.

A separate INPUTDATA_ROOT_BMIC is no longer needed and is removed.

The regression test suite samples basic reproducibility/quality checks, particularly:

  • control reproduces itself
  • restart reproduces control
  • changing number of tasks reproduces control
  • changing number of threads reproduces control
  • Intel debug version reproduces itself
  • GNU debug version reproduces itself
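Each check above amounts to asking whether two runs produced identical output files. A minimal sketch of such a comparison follows, assuming that "reproduces" means byte-for-byte identical files; the function names and directory layout are hypothetical, and this is not the comparison the regression test scripts actually perform.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_runs(control_dir: Path, test_dir: Path) -> dict[str, str]:
    """Compare each file under control_dir with the same relative path in test_dir.

    Returns relative path -> "OK", "DIFFERENT", or "MISSING".
    """
    results = {}
    for ctrl_file in sorted(p for p in control_dir.rglob("*") if p.is_file()):
        rel = ctrl_file.relative_to(control_dir)
        candidate = test_dir / rel
        if not candidate.is_file():
            results[rel.as_posix()] = "MISSING"
        elif file_digest(ctrl_file) == file_digest(candidate):
            results[rel.as_posix()] = "OK"
        else:
            results[rel.as_posix()] = "DIFFERENT"
    return results


def reproduces(control_dir: Path, test_dir: Path) -> bool:
    """True only if at least one file was compared and every file matched."""
    results = compare_runs(control_dir, test_dir)
    return bool(results) and all(status == "OK" for status in results.values())
```

Under this assumption, a restart, task-count, or thread-count test would pass when `reproduces(control_dir, test_dir)` returns `True` for the control run and the modified run.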

Not all tests pass on all platforms. Results are summarized in this regression test suite matrix:
[image: regression test suite matrix]
and in summary slides on pre-test and common issues.

Some tests fail in common across platforms. This commit shares reproducers so that the remaining issues can be followed up. These tests are commented out, so users need to uncomment them to run them. Resolving the failures may require library or platform support. Committing this test suite as a work in progress should facilitate collaborative development, particularly in:

  • Platforms with GNU support for coupled model
  • Sensitivity to spack-stack updates (including dcp failure & splitting ESMF_MeshCreate in WW3, and intel-debug failure & HDF update)
  • Derecho test failures
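Re-enabling the commented-out tests could be scripted rather than done by hand. The sketch below assumes the test list is a text file whose entries are `|`-separated lines beginning with `RUN`, followed by the test name, and that a disabled entry simply carries a leading `#`. That layout and the helper name are assumptions for illustration, so check the actual configuration file before relying on it.

```python
def uncomment_tests(conf_text: str, test_names: set[str]) -> str:
    """Remove the leading '#' from commented-out RUN entries for the named tests.

    Assumed (hypothetical) entry layout: "RUN | <test name> | ...".
    Other comments and already-active lines are left unchanged.
    """
    out = []
    for line in conf_text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("#"):
            candidate = stripped.lstrip("#").lstrip()
            fields = [field.strip() for field in candidate.split("|")]
            if len(fields) >= 2 and fields[0] == "RUN" and fields[1] in test_names:
                line = candidate
        out.append(line)
    result = "\n".join(out)
    # Preserve a trailing newline if the input had one.
    if conf_text.endswith("\n"):
        result += "\n"
    return result
```

Matching on the exact test name, not a substring, keeps unrelated comments and similarly named tests untouched.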

Note that there are three intentional differences from the GEFS workflow configuration (please let me know if you see other differences): 1) aerosols are 1-way coupled in diagnostic mode; 2) the wave element mask has been modified as discussed in NOAA-EMC/WW3#1328; 3) the ice restart has been quality controlled as discussed in #2562.

In the future, depending on aerosol coupling, GOCART .rc files and ExtData directory structure may be revised for consistency with global-workflow. This benchmark configuration and case may be updated as well, particularly with GEFS reforecast or UFS case study.

These new tests use new input data. ICs and ExtData are in a new GEFS subdirectory. A new WW3 grid is added to the existing WW3_input_data_20250212.

This also updates the UPP hash, which adds new variable capabilities including snow-liquid-ratio, stream function, and velocity potential. See #2599

Commit Message:

* UFSWM - Add GEFS regression test suite from EP5r2 configuration/case
  * FV3 - Update UPP hash
    * upp - Update UPP with new variables capabilities, including snow-liquid-ratio, stream function, and velocity potential

Priority:

  • High: Intended to support GEFS reforecast

Git Tracking

UFSWM:

Sub component Pull Requests:

UFSWM Blocking Dependencies:

  • None

Changes

Regression Test Changes (Please commit test_changes.list):

  • PR Adds New Tests/Baselines.

Input data Changes:

  • New input data.

Library Changes/Upgrades:

  • No Updates

Testing Log:

  • RDHPCS
    • Hera
    • Orion
    • Hercules
    • Gaea C5
    • Gaea C6
    • Derecho
  • WCOSS2
    • Dogwood/Cactus
    • Acorn
  • CI
  • opnReqTest (complete task if unnecessary)

NickSzapiro-NOAA and others added 30 commits May 6, 2024 06:24
@jkbk2004
Collaborator

@NickSzapiro-NOAA @jkbk2004:

Here are the updates we need to make this work for future cases:

My directory is here: /scratch1/NCEPDEV/climate/Jessica.Meixner/PR_WW3/newinputforNick/WW3_input_data_20250212

Diffs between /scratch2/NAGAPE/epic/UFS-WM_RT/NEMSfv3gfs/input-data-20240501/WW3_input_data_20250212 and /scratch1/NCEPDEV/climate/Jessica.Meixner/PR_WW3/newinputforNick/WW3_input_data_20250212 are:

diff -r /scratch2/NAGAPE/epic/UFS-WM_RT/NEMSfv3gfs/input-data-20240501/WW3_input_data_20250212/createmoddefs/creategridfiles.sh WW3_input_data_20250212/createmoddefs/creategridfiles.sh
98c98
<         export grids='mx025lite mx025 mx050 mx100 gwes_30m natl_6m points glo_900'
---
>         export grids='mx025lite mx025 mx050 mx100 gwes_30m natl_6m points glo_900 glo_025'
Only in WW3_input_data_20250212/createmoddefs: ww3_grid.inp.glo_025
Only in /scratch2/NAGAPE/epic/UFS-WM_RT/NEMSfv3gfs/input-data-20240501/WW3_input_data_20250212: ww3_grid.inp.glo_025

So we need to update /scratch2/NAGAPE/epic/UFS-WM_RT/NEMSfv3gfs/input-data-20240501/WW3_input_data_20250212/createmoddefs/creategridfiles.sh to include glo_025 in the grids list, and we need to move /scratch2/NAGAPE/epic/UFS-WM_RT/NEMSfv3gfs/input-data-20240501/WW3_input_data_20250212/ww3_grid.inp.glo_025 to the createmoddefs directory.

I think it's safe to make these changes in the existing directory.
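A quick consistency check for this kind of change is sketched below. It assumes that every grid named in an `export grids='...'` line of creategridfiles.sh should have a matching `ww3_grid.inp.<grid>` file in the createmoddefs directory; that rule is inferred from the diff above and is not confirmed, and the function names are hypothetical.

```python
import re
from pathlib import Path

# Matches lines such as:  export grids='mx025 mx050 glo_025'
GRIDS_RE = re.compile(r"^\s*export\s+grids='([^']*)'", re.MULTILINE)


def grids_in_script(script_text: str) -> list[str]:
    """Collect grid names from every export grids='...' line, in first-seen order."""
    grids = []
    for match in GRIDS_RE.finditer(script_text):
        for name in match.group(1).split():
            if name not in grids:
                grids.append(name)
    return grids


def missing_grid_inputs(createmoddefs: Path,
                        script_name: str = "creategridfiles.sh") -> list[str]:
    """Return grids listed in the script that lack ww3_grid.inp.<grid> (assumed naming)."""
    script_text = (createmoddefs / script_name).read_text()
    return [grid for grid in grids_in_script(script_text)
            if not (createmoddefs / f"ww3_grid.inp.{grid}").is_file()]
```

Run against the createmoddefs directory, an empty list would indicate that, under the stated assumption, the script and the input files agree.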

@NickSzapiro-NOAA /scratch2/NAGAPE/epic/UFS-WM_RT/NEMSfv3gfs/input-data-20240501/WW3_input_data_20250212 is now updated.

@jkbk2004 added the "Ready for Commit Queue" (The PR is ready for the Commit Queue. All checkboxes in PR template have been checked.) and "jenkins-ort" (run ORT testing) labels Feb 25, 2025
@FernandoAndrade-NOAA added and removed the "jenkins-ort" (run ORT testing) label Feb 26, 2025
@DusanJovic-NOAA
Collaborator

On WCOSS2 one test (cpld_dcp_gefs_intel) fails at runtime with this error:

aborting job:
Fatal error in PMPI_Send: Other MPI error, error stack:
PMPI_Send(163)............: MPI_Send(buf=0x1d299070, count=3132356, MPI_DOUBLE, dest=1, tag=9, comm=0xc4000028) failed
MPID_Send(499)............: 
MPIDI_send_unsafe(58).....: 
MPIDI_OFI_send_normal(372): OFI tagged senddata failed (ofi_send.h:372:MPIDI_OFI_send_normal:Bad address)
nid002736.cactus.wcoss2.ncep.noaa.gov: rank 1150 exited with code 255
forrtl: error (78): process killed (SIGTERM)
Image              PC                Routine            Line        Source             
fv3.exe            00000000069B0C3B  Unknown               Unknown  Unknown
libpthread-2.31.s  00001499B2850910  Unknown               Unknown  Unknown
libpthread-2.31.s  00001499B284B70A  pthread_cond_wait     Unknown  Unknown
fv3.exe            0000000001466E89  _Z10vmkt_catchP6v         251  ESMCI_VMKernel.C
fv3.exe            0000000001467F2A  _ZN5ESMCI3VMK4exi        2679  ESMCI_VMKernel.C
fv3.exe            0000000000E44F5F  c_esmc_compwait_         1095  ESMCI_FTable.C
fv3.exe            0000000000A22796  esmf_compmod_mp_e        1278  ESMF_Comp.F90
fv3.exe            00000000008247B6  esmf_gridcompmod_        1419  ESMF_GridComp.F90
fv3.exe            000000000045B367  nuopc_driver_mp_l        2889  NUOPC_Driver.F90
fv3.exe            0000000000477D6D  nuopc_driver_mp_i        1982  NUOPC_Driver.F90
fv3.exe            000000000048D239  nuopc_driver_mp_i         487  NUOPC_Driver.F90
fv3.exe            0000000000E43674  _ZN5ESMCI6FTable1        2167  ESMCI_FTable.C
fv3.exe            0000000000E472AA  ESMCI_FTableCallE         824  ESMCI_FTable.C
fv3.exe            0000000001449B5F  _ZN5ESMCI3VMK5ent        2501  ESMCI_VMKernel.C
fv3.exe            000000000143264F  _ZN5ESMCI2VM5ente        1216  ESMCI_VM.C
fv3.exe            0000000000E44AB7  c_esmc_ftablecall         981  ESMCI_FTable.C
fv3.exe            0000000000A22600  esmf_compmod_mp_e        1252  ESMF_Comp.F90
fv3.exe            00000000008247B6  esmf_gridcompmod_        1419  ESMF_GridComp.F90
fv3.exe            000000000042C23E  MAIN__                    392  UFS.F90
fv3.exe            000000000042B062  Unknown               Unknown  Unknown
libc-2.31.so       00001499AE83E24D  __libc_start_main     Unknown  Unknown
fv3.exe            000000000042AF7A  Unknown               Unknown  Unknown

Should this test be disabled on WCOSS2?
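For scale, the size of the message in the failing MPI_Send can be read off the error line. The sketch below parses that line and multiplies the element count by the datatype size; the size table lists typical values (8 bytes for MPI_DOUBLE) and the helper name is hypothetical. It only reports the payload size and says nothing about the cause of the failure.

```python
import re

# Extracts count and datatype from a line like:
#   MPI_Send(buf=0x..., count=3132356, MPI_DOUBLE, dest=1, tag=9, comm=0x...) failed
SEND_RE = re.compile(r"MPI_Send\(buf=\S+, count=(\d+), (MPI_\w+), dest=\d+, tag=\d+")

# Typical datatype sizes in bytes (assumed; extend as needed).
DATATYPE_BYTES = {"MPI_DOUBLE": 8, "MPI_FLOAT": 4, "MPI_INT": 4}


def send_payload_bytes(error_line: str):
    """Return count * datatype size in bytes, or None if the line cannot be parsed."""
    match = SEND_RE.search(error_line)
    if match is None:
        return None
    count, datatype = int(match.group(1)), match.group(2)
    size = DATATYPE_BYTES.get(datatype)
    return None if size is None else count * size


line = ("PMPI_Send(163)............: MPI_Send(buf=0x1d299070, count=3132356, "
        "MPI_DOUBLE, dest=1, tag=9, comm=0xc4000028) failed")
payload = send_payload_bytes(line)
print(f"{payload} bytes = {payload / 2**20:.1f} MiB")  # 25058848 bytes = 23.9 MiB
```

So the send that failed was a single message of 3132356 doubles, roughly 24 MiB.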

@NickSzapiro-NOAA
Collaborator Author

Thanks @DusanJovic-NOAA! Yes, that's the same result as in pre-test and on Gaea C5. The idea is to turn it off for now, and I'll open issues to follow up after the commit.

@NickSzapiro-NOAA changed the title from "Add GEFS regression test suite from EP5r2 configuration/case" to "Add GEFS regression test suite from EP5r2 configuration/case + Update UPP with new variables" Feb 27, 2025
@jkbk2004 removed the "jenkins-ort" (run ORT testing) label Feb 27, 2025
@jkbk2004
Collaborator

jkbk2004 commented Feb 27, 2025

The Gaea role account access issue continues, so we will skip Gaea C5. For the ORT runs, it sounds like an issue between Hera and Jenkins. I see some runs finish OK: /scratch1/NCEPDEV/stmp2/role.epic/FV3_OPNREQ_TEST/opnReqTest_395058. We can start the merging process.

@jkbk2004
Collaborator

@NickSzapiro-NOAA new fv3 hash is NOAA-EMC/fv3atm@5132aa6. Can you update?

@NickSzapiro-NOAA
Collaborator Author

Thanks everyone. Please let me know if ok @jkbk2004

@jkbk2004 jkbk2004 merged commit 6cb9e1d into ufs-community:develop Feb 27, 2025
4 checks passed
@jkbk2004 jkbk2004 mentioned this pull request Feb 27, 2025
Labels
  • EPIC Support Requested
  • New Baselines: New baselines will be added to project.
  • New Input Data Req'd: This PR requires new data to be sync across platforms
  • Ready for Commit Queue: The PR is ready for the Commit Queue. All checkboxes in PR template have been checked.