Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the mean EnKF soil increment to the deterministic member #3295

Merged

Conversation

ClaraDraper-NOAA
Copy link
Contributor

@ClaraDraper-NOAA ClaraDraper-NOAA commented Feb 4, 2025

Description

Add mean EnKF soil increment to the deterministic member. Includes new regriding script and executable (in GDASApp) to perform masked bi-linear interpolation of the soil increments.

Resolves 2773

Type of change

  • Bug fix (fixes something broken)
  • [ x] New feature (adds functionality)
  • Maintenance (code refactor, clean-up, new CI test, etc.)

Change characteristics

  • Is this a breaking change (a change in existing functionality)? NO
  • Does this change require a documentation update? NO
  • Does this change require an update to any of the following submodules?
    • [ YES] UFS-utils

OUTDATED:
Exercizing the new soil analysis feature requires updating UFS_UTILS to at least commit e718003

UPDATE Feb 5:
After responding to reviews, requires updating to UFS_UTILS PR 1022

How has this been tested?

Tested on hera:
C96_atm3DVar, C96C48_hybatmaerosnowDA, and the default test ran succesfully.
Confirmed new functionality works as expected.
Confirmed that the commits do not change output from cycling the DA if the the new feature is not selected.

Checklist

  • [ x] Any dependent changes have been merged and published
  • [x ] My code follows the style guidelines of this project
  • [ x] I have performed a self-review of my own code
  • [x ] I have commented my code, particularly in hard-to-understand areas
  • [x ] I have documented my code, including function, input, and output descriptions
  • [ x] My changes generate no new warnings
  • [ x] New and existing tests pass with my changes
  • This change is covered by an existing CI test or a new one has been added
  • [x ] Any new scripts have been added to the .github/CODEOWNERS file with owners
  • [x ] I have made corresponding changes to the system documentation if necessary

NOTE:

  • Exercizing the new feature requires some additional fix files, currently at $TMP_FIX_FILES in ush/regrid_gsiSfcIncr_to_tile.sh . These will need to be added to a fix directory, if this PR is accepted as is (note: these files could also be generated on the fly)
  • Exercizing the new feature requires jobs/rocoto/sfcanl.sh and sfc.sh to load the ufsda modules in place of the fv3gfs modules. My understanding is that the modules will eventually be unified, but for now I've added comments to the relevant scripts noting this.

Copy link

@github-advanced-security github-advanced-security bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ShellCheck found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

Copy link
Contributor

@CoryMartin-NOAA CoryMartin-NOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

first pass

Copy link
Contributor

@aerorahul aerorahul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the changes looks good. seems like some checks are failing. I tried to recommend a couple updates that might help.

ClaraDraper-NOAA and others added 2 commits February 4, 2025 15:12
Co-authored-by: Cory Martin <cory.r.martin@noaa.gov>
Co-authored-by: Cory Martin <cory.r.martin@noaa.gov>
@aerorahul aerorahul added CI-Hera-Ready **CM use only** PR is ready for CI testing on Hera CI-Wcoss2-Ready **CM use only** PR is ready for CI testing on WCOSS labels Feb 25, 2025
@emcbot emcbot added CI-Hera-Building **Bot use only** CI testing is cloning/building on Hera and removed CI-Hera-Ready **CM use only** PR is ready for CI testing on Hera labels Feb 25, 2025
@RussTreadon-NOAA
Copy link
Contributor

WCOSS2 note

g-w CI does not run C96C48_hybatmaerosnowDA on WCOSS2. GFS v17 will exercise updates from this PR. GFS v17 runs on WCOSS2. This is problematic. The gdas and enkfgdas snowanl jobs currently fail on Cactus.

ClaraDraper-NOAA:feature/soilanal_det_clean at 7fd68e8 was installed on Cactus. g-w CI C96C48_hybatmaerosnowDA was set up and launched. As expected the 00Z snowanl jobs died

russ.treadon@clogin08:/lfs/h2/emc/ptmp/russ.treadon/EXPDIR/C96C48_hybatmaerosnowDA_pr3295> rocotostat -d C96C48_hybatmaerosnowDA_pr3295.db -w C96C48_hybatmaerosnowDA_pr3295.xml |grep DEAD
202112210000            gdas_snowanl                   181634255                DEAD                   1         2          59.0
202112210000       enkfgdas_esnowanl                   181634258                DEAD                   1         2          94.0

The reason for the failure is the same in each job

ImportError: /opt/cray/pe/lib64/libmpifort_intel.so.12: undefined symbol: MPIX_Enqueue_send

Making the following change to ush/load_ufsda_modules.sh

@@ -41,6 +41,7 @@ case "${MACHINE_ID}" in
       # TODO: Add path to GDASApp libraries and cray-mpich as temporary patches
       # TODO: Remove LD_LIBRARY_PATH lines as soon as permanent solutions are available
       export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:${HOMEgfs}/sorc/gdas.cd/build/lib"
+      export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/opt/cray/pe/mpich/8.1.19/ofi/intel/19.0/lib"
     fi
     module load "${MODS}/${MACHINE_ID}"
     ncdump=$( command -v ncdump )

allows both failed jobs to run to completion.

202112210000            gdas_snowanl                   181638536           SUCCEEDED                   0         1         189.0
202112210000       enkfgdas_esnowanl                   181638535           SUCCEEDED                   0         1         170.0

Reached out to NCO, GDIT, and the library team to see if they know why we need to add the line

export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/opt/cray/pe/mpich/8.1.19/ofi/intel/19.0/lib"

back to load_ufsda_modules.sh. We were able, at least temporarily, to remove this line after RFC 13644 was implemented 5-6 February 2025.

@emcbot emcbot added CI-Hera-Running **Bot use only** CI testing on Hera for this PR is in-progress and removed CI-Hera-Building **Bot use only** CI testing is cloning/building on Hera labels Feb 25, 2025
@emcbot
Copy link

emcbot commented Feb 26, 2025

Experiment C96C48_hybatmaerosnowDA FAILED on Hera in Build# 1 with error logs:

/scratch1/NCEPDEV/global/CI/3295/RUNTESTS/COMROOT/C96C48_hybatmaerosnowDA_4fd0ec17/logs/2021122018/gfs_metpg2o1.log

Follow link here to view the contents of the above file(s): (link)

@emcbot emcbot added CI-Hera-Failed **Bot use only** CI testing on Hera for this PR has failed and removed CI-Hera-Running **Bot use only** CI testing on Hera for this PR is in-progress labels Feb 26, 2025
@emcbot
Copy link

emcbot commented Feb 26, 2025

Experiment C96C48_hybatmaerosnowDA FAILED on Hera in Build# 1 in
/scratch1/NCEPDEV/global/CI/3295/RUNTESTS/EXPDIR/C96C48_hybatmaerosnowDA_4fd0ec17

@emcbot emcbot added CI-Hera-Failed **Bot use only** CI testing on Hera for this PR has failed and removed CI-Hera-Failed **Bot use only** CI testing on Hera for this PR has failed labels Feb 26, 2025
@emcbot
Copy link

emcbot commented Feb 26, 2025

CI Failed on Hera in Build# 1
Built and ran in directory /scratch1/NCEPDEV/global/CI/3295


Experiment C96mx100_S2S_4fd0ec17 Completed 1 Cycles: *SUCCESS* at Tue Feb 25 22:41:42 UTC 2025
Experiment C48_ATM_4fd0ec17 Completed 1 Cycles: *SUCCESS* at Tue Feb 25 23:18:09 UTC 2025
Experiment C48mx500_hybAOWCDA_4fd0ec17 Completed 2 Cycles: *SUCCESS* at Tue Feb 25 23:36:26 UTC 2025
Experiment C96C48_hybatmDA_4fd0ec17 Completed 3 Cycles: *SUCCESS* at Wed Feb 26 00:38:06 UTC 2025
Experiment C96C48_hybatmaerosnowDA_4fd0ec17 Terminated with 0
FAIL
FAIL tasks failed and 1 dead at Wed Feb 26 00:44:13 UTC 2025
Experiment C96C48_hybatmaerosnowDA_4fd0ec17 Terminated: *FAIL*
Error logs:
/scratch1/NCEPDEV/global/CI/3295/RUNTESTS/COMROOT/C96C48_hybatmaerosnowDA_4fd0ec17/logs/2021122018/gfs_metpg2o1.log
Experiment C96_atm3DVar_4fd0ec17 Completed 3 Cycles: *SUCCESS* at Wed Feb 26 00:56:21 UTC 2025
Experiment C48_S2SW_4fd0ec17 Completed 1 Cycles: *SUCCESS* at Wed Feb 26 01:02:29 UTC 2025
Experiment C48_S2SWA_gefs_4fd0ec17 Completed 1 Cycles: *SUCCESS* at Wed Feb 26 01:09:19 UTC 2025
Experiment C96C48_ufs_hybatmDA_4fd0ec17 Completed 3 Cycles: *SUCCESS* at Wed Feb 26 01:45:25 UTC 2025
Experiment C48mx500_3DVarAOWCDA_4fd0ec17 Completed 2 Cycles: *SUCCESS* at Wed Feb 26 01:57:37 UTC 2025

@aerorahul
Copy link
Contributor

@ClaraDraper-NOAA
I discussed w/ @malloryprow and we found that the met datasets required for the failed job were deleted to free up space on Hera.
I think this PR is good to merge.

@emcbot emcbot added the CI-Wcoss2-Building **Bot use only** CI testing is cloning/building on WCOSS label Feb 26, 2025
@KateFriedman-NOAA KateFriedman-NOAA removed the CI-Wcoss2-Ready **CM use only** PR is ready for CI testing on WCOSS label Feb 26, 2025
@emcbot emcbot added CI-Wcoss2-Running **Bot use only** CI testing on WCOSS for this PR is in-progress and removed CI-Wcoss2-Building **Bot use only** CI testing is cloning/building on WCOSS labels Feb 26, 2025
@emcbot
Copy link

emcbot commented Feb 26, 2025

CI Tests set up to run in /lfs/h2/emc/ptmp/emc.global/PR/PR_3295/RUNTESTS on WCOSS

@WalterKolczynski-NOAA WalterKolczynski-NOAA added CI-Wcoss2-Passed **Bot use only** CI testing on WCOSS for this PR has completed successfully and removed CI-Wcoss2-Running **Bot use only** CI testing on WCOSS for this PR is in-progress labels Feb 27, 2025
@DavidHuber-NOAA
Copy link
Contributor

@aerorahul @ClaraDraper-NOAA There are some permissions issues with some of the prepbufr files used for verification in the metpg2o1 job. The RDHPCS issue RDHPCS#2025022654000078 was opened to sort this out. In the interim, should we temporarily disable the METplus jobs on Hera?

@aerorahul
Copy link
Contributor

@DavidHuber-NOAA
Yes. I think that would be fine.
Meanwhile, lets merge this PR now that WCOSS2 has passed and Hera issues are unrelated and known.

@aerorahul aerorahul merged commit 25778ff into NOAA-EMC:develop Feb 27, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI-Hera-Failed **Bot use only** CI testing on Hera for this PR has failed CI-Wcoss2-Passed **Bot use only** CI testing on WCOSS for this PR has completed successfully JEDI Feature development to support JEDI-based DA
Projects
None yet
Development

Successfully merging this pull request may close these issues.

C96C48_hybatmaerosnowDA fails with updated config.aeroanlgenb Add soil increments to deterministic member
9 participants