Increased run time for a benchmark ne120 F case on Perlmutter #7010
Comments
@jayeshkrishna @ndkeen please feel free to add any information that could help investigate this performance issue.
I would recommend discontinuing testing with 64 MPI tasks per node, as this is not the default for the machine.
OK, I will rerun this case with 120 or 128 MPI tasks per node to see whether the performance improves: [PE layout 1] [PE layout 2]
@ndkeen
[2025-02-13 case run] (64 MPI tasks per node)
case.run time: about 14 mins
"OverallIOStatistics":
[2025-02-13 case run] (120 MPI tasks per node)
case.run time: about 22 mins
"OverallIOStatistics":
[2025-02-13 case run] (128 MPI tasks per node)
case.run time: about 24 mins
"OverallIOStatistics":
@amametjanov does the PFS.ne120pg2_r025_RRSwISC6to18E3r5.WCYCL1850NS.pm-cpu_intel.bench-wcycl-hires test on pm-cpu show any change over this time period?
The test was added in August 2024, and the run-time increase predates that.
I didn't have hourly
We have been testing a benchmark ne120 F case on Perlmutter and Frontier.
Case settings
Compset: F2010
Resolution: ne120pg2_r05_oECv3
STOP_N=1
STOP_OPTION=ndays
History output frequency: every hour (nhtfrq = -1)
Optional: REST_OPTION=none (disables writing out restart files)
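As a rough sketch of how these settings could be applied to a CIME case, the hypothetical Python helper below only assembles command strings; it runs nothing. It assumes the standard CIME `create_newcase` and `xmlchange` scripts and the EAM namelist file `user_nl_eam`. The function name and case name are illustrative assumptions, the machine is left to CIME's auto-detection, and the PE layout is not set.

```python
# Hypothetical sketch: assembles (but does not run) CIME commands for the
# benchmark case settings listed above. Function and case names are
# assumptions; the PE layout is not configured here.
import shlex


def benchmark_commands(case_name: str, skip_restarts: bool = True) -> list[str]:
    """Return command strings for the ne120 F2010 benchmark case."""
    # Run from the CIME scripts directory; the machine is left to CIME's
    # auto-detection.
    create = shlex.join([
        "./create_newcase",
        "--case", case_name,
        "--compset", "F2010",
        "--res", "ne120pg2_r05_oECv3",
    ])
    # The remaining commands run inside the case directory.
    # One model day per run.
    settings = ["STOP_N=1", "STOP_OPTION=ndays"]
    if skip_restarts:
        # Optional: do not write restart files.
        settings.append("REST_OPTION=none")
    xmlchange = "./xmlchange " + ",".join(settings)
    # Hourly EAM history output: a negative nhtfrq is an interval in hours.
    namelist = "echo 'nhtfrq = -1' >> user_nl_eam"
    return [create, xmlchange, namelist]


if __name__ == "__main__":
    for cmd in benchmark_commands("ne120_F2010_bench"):
        print(cmd)
```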
PE layout
Performance degradation observed
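Additionally, the "Init time" reported on the PACE page increased to more than 16 minutes (974.670 sec) in the February 2025 run.
The IO statistics also show that tot_rtime has increased substantially compared to previous runs, from about 152 sec (July 2023) to about 453 sec (February 2025), while tot_rb stayed roughly the same.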
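To put these observations in numbers, the short Python sketch below compares the February 2025 run with the July 2023 run, using values copied from the performance logs in this issue. It is only arithmetic on the reported figures; the dictionary keys and function name are illustrative, not PACE or PIO field names.

```python
# Values copied from the 2023-07-25 and 2025-02-12 performance logs.
# Dictionary keys are illustrative names, not PACE or PIO field names.
BASELINE_2023 = {"init_s": 189.878, "run_s": 47.217, "tot_rtime_s": 151.593427}
FEB_2025 = {"init_s": 974.670, "run_s": 91.013, "tot_rtime_s": 452.842151}


def slowdown(new: dict[str, float], old: dict[str, float]) -> dict[str, float]:
    """Ratio new/old for each metric; values above 1 mean slower."""
    return {key: new[key] / old[key] for key in old}


if __name__ == "__main__":
    print(f"Init time in Feb 2025: {FEB_2025['init_s'] / 60:.1f} min")
    for key, ratio in slowdown(FEB_2025, BASELINE_2023).items():
        print(f"{key}: {ratio:.2f}x the July 2023 value")
```

With these numbers, initialization grows about fivefold and total read time about threefold, while Run_time roughly doubles.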
Performance logs
[2023-07-25 case run] (REST_OPTION="none")
PACE Link: https://pace.ornl.gov/exp-details/154317
Run_time: 47.217 sec
Init time: 189.878 sec
case.run time: about 5 mins
2023-07-25 14:09:54: case.run starting 12314437
...
2023-07-25 14:14:24: case.run success 12314437
"OverallIOStatistics":
"avg_wtput(MB/s)" : 14341.042886
"avg_rtput(MB/s)" : 72076.466363
"tot_wb(bytes)" : 163062403693
"tot_rb(bytes)" : 11457075418660
"tot_wtime(s)" : 10.843593
"tot_rtime(s)" : 151.593427
"tot_time(s)" : 168.650298
[2024-12-04 case run]
PACE Link: https://pace.ornl.gov/exp-details/202812
Run_time: 107.786 sec
Init time: 462.357 sec
case.run time: about 10 mins
2024-12-04 11:34:25: case.run starting 33534166
...
2024-12-04 11:44:38: case.run success 33534166
"OverallIOStatistics":
"avg_wtput(MB/s)" : 11581.018884
"avg_rtput(MB/s)" : 56658.159505
"tot_wb(bytes)" : 437643466533
"tot_rb(bytes)" : 11654545209348
"tot_wtime(s)" : 36.039086
"tot_rtime(s)" : 196.170164
"tot_time(s)" : 260.371857
[2024-12-12 case run]
PACE Link: https://pace.ornl.gov/exp-details/203176
Run_time: 107.577 sec
Init time: 576.172 sec
case.run time: about 12 mins
2024-12-12 02:52:27: case.run starting 33812266
...
2024-12-12 03:04:05: case.run success 33812266
"OverallIOStatistics":
"avg_wtput(MB/s)" : 10940.771565
"avg_rtput(MB/s)" : 51752.134239
"tot_wb(bytes)" : 437643466533
"tot_rb(bytes)" : 11654545209348
"tot_wtime(s)" : 38.148071
"tot_rtime(s)" : 214.766803
"tot_time(s)" : 281.167514
[2025-02-12 case run] (REST_OPTION="none")
PACE Link: https://pace.ornl.gov/exp-details/209893
Run_time: 91.013 sec
Init time: 974.670 sec
case.run time: about 18 mins
2025-02-12 08:15:21: case.run starting 35762678
...
2025-02-12 08:33:21: case.run success 35762678
"OverallIOStatistics":
"avg_wtput(MB/s)" : 8791.979204
"avg_rtput(MB/s)" : 24544.182600
"tot_wb(bytes)" : 318305933857
"tot_rb(bytes)" : 11654545209348
"tot_wtime(s)" : 34.526946
"tot_rtime(s)" : 452.842151
"tot_time(s)" : 481.267834
Summary of concerns