Add/Update submit scripts and configure files for Forerunner/TaiwaniaIII #342

@koarakawaii koarakawaii commented Jul 28, 2024

Caution

1. Remember to load the appropriate modules (e.g., FFTW, OpenMPI) before compiling, since the paths in the GAMER configure files are only recognized after the required modules are loaded.
2. Remember to load the appropriate modules when submitting jobs, as shown in submit_forerunnerI_*.job/submit_taiwania3_*.job, so that the shared libraries can be found at run time.
3. The modules under /home/d07222009/module_CALAB are only available if the user belongs to CALAB.

Note

1. The submit script submit_taiwania3_gnu.job is updated because gcc/9.4.0 is no longer available on TaiwaniaIII.
2. The submit script submit_taiwania3_intel.job is updated because intel/2018u4 is rather outdated.

Tip

1. After loading the modules in the terminal, one can run module save YOUR_MODULE_SET_NAME to save the whole set of modules as a user-defined collection, and restore it later with module r YOUR_MODULE_SET_NAME.

GAMER Modules for Taiwania III

  • Module files
    • Path: /home/d07222009/module_CALAB
    • Packages
      • FFTW 3.3.10
      • GSL 2.8.0
      • HDF5 1.14.4
      • OpenMPI 5.0.5
      • UCX 1.18.0
    • Usage
      • Intel
        module use /home/d07222009/module_CALAB
        module load intel/2024 intel_2024/fftw/3.3.10 intel_2024/gsl/2.8.0 intel_2024/hdf5/1.14.4 intel_2024/openmpi/5.0.5 intel_2024/openucx/1.18.0
        
      • GNU
        module use /home/d07222009/module_CALAB
        module load gcc/13.2.0 gnu_13.2.0/fftw/3.3.10 gnu_13.2.0/gsl/2.8.0 gnu_13.2.0/hdf5/1.14.4 gnu_13.2.0/openmpi/5.0.5 gnu_13.2.0/openucx/1.18.0
        
      • The same usage is also documented in configs/taiwania3_intel.config and configs/taiwania3_gnu.config.

GAMER Modules for Forerunner I

  • Module files
    • Path: /home/d07222009/module_CALAB
    • Packages
      • FFTW 3.3.10
      • GSL 2.8.0
      • HDF5 1.14.4
      • OpenMPI 5.0.0
      • UCX 1.18.0
    • Usage
      • Intel
        module use /home/d07222009/module_CALAB
        module load intel/2024_01_46 oneapi_2024/fftw/3.3.10 oneapi_2024/gsl/2.8.0 oneapi_2024/hdf5/1.14.4 oneapi_2024/openmpi/5.0.0 oneapi_2024/openucx/1.18.0
        
      • GNU
        module use /home/d07222009/module_CALAB
        module load gnu_13.2.0/gcc/13.2.0 gnu_13.2.0/fftw/3.3.10 gnu_13.2.0/gsl/2.8.0 gnu_13.2.0/hdf5/1.14.4 gnu_13.2.0/openmpi/5.0.0 gnu_13.2.0/openucx/1.18.0
        
      • The same usage is also documented in configs/forerunnerI_intel.config and configs/forerunnerI_gnu.config.

@hyschive hyschive added the enhancement and general (General issues and improvement) labels Sep 9, 2024

@hsinhaoHHuang hsinhaoHHuang left a comment

@koarakawaii Thanks a lot for preparing these scripts and installing the libraries for us.
I have tested them, and they work well.
I only have some minor comments for this PR.

@koarakawaii koarakawaii changed the base branch from main to stable December 24, 2024 09:35
@koarakawaii koarakawaii changed the base branch from stable to main December 24, 2024 09:35

@hsinhaoHHuang hsinhaoHHuang left a comment

@koarakawaii
Thanks for the update. It looks great.
I have tested it again and found no further issues.

For the record, here are some errors I encountered on these two systems:

  • For forerunnerI_gnu and forerunnerI_intel, a segmentation fault occurs at the end of the simulations (after ~GAMER OVER~):

    Error message
    [icpnp137:3840499:0:3840499] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x10)
    ==== backtrace (tid:3840503) ====
     0  /home/d07222009/openucx/ucx-1.18.0_with_mt_gnu_13.2.0/lib/libucs.so.0(ucs_handle_error+0x294) [0x14a9eb28b804]
     1  /home/d07222009/openucx/ucx-1.18.0_with_mt_gnu_13.2.0/lib/libucs.so.0(+0x339bc) [0x14a9eb28b9bc]
     2  /home/d07222009/openucx/ucx-1.18.0_with_mt_gnu_13.2.0/lib/libucs.so.0(+0x33be7) [0x14a9eb28bbe7]
     3  /lib64/libpthread.so.0(+0x12cf0) [0x14a9edecacf0]
     4  /opt/mellanox/hcoll/lib/libhcoll.so.1(hcoll_update_context_cache_on_group_destruction+0x9e) [0x14a9ed24f06e]
     5  /opt/mellanox/hcoll/lib/libhcoll.so.1(hcoll_context_free+0x1dd) [0x14a9ed24c3fd]
     6  /home/d07222009/openmpi/openmpi_gnu_13.2.0/lib/libmpi.so.40(+0x187bb3) [0x14a9eeeb2bb3]
     7  /home/d07222009/openmpi/openmpi_gnu_13.2.0/lib/libmpi.so.40(+0x6d823) [0x14a9eed98823]
     8  /home/d07222009/openmpi/openmpi_gnu_13.2.0/lib/libmpi.so.40(ompi_attr_delete_all+0x183) [0x14a9eed9ade3]
     9  /home/d07222009/openmpi/openmpi_gnu_13.2.0/lib/libmpi.so.40(ompi_comm_free+0x40) [0x14a9eed9f530]
    10  /home/d07222009/openmpi/openmpi_gnu_13.2.0/lib/libmpi.so.40(+0x1defa2) [0x14a9eef09fa2]
    11  /home/d07222009/openmpi/openmpi_gnu_13.2.0/lib/libmpi.so.40(+0x189fe9) [0x14a9eeeb4fe9]
    12  /home/d07222009/openmpi/openmpi_gnu_13.2.0/lib/libmpi.so.40(mca_coll_base_comm_unselect+0x2611) [0x14a9eee76fa1]
    13  /home/d07222009/openmpi/openmpi_gnu_13.2.0/lib/libmpi.so.40(+0x711da) [0x14a9eed9c1da]
    14  /home/d07222009/openmpi/openmpi_gnu_13.2.0/lib/libmpi.so.40(+0x70a69) [0x14a9eed9ba69]
    15  /home/d07222009/openmpi/openmpi_gnu_13.2.0/lib/libopen-pal.so.80(opal_finalize_cleanup_domain+0x37) [0x14a9ec5a88e7]
  • For forerunnerI_intel and taiwania3_intel, compilation fails when PARTICLE is enabled:

    Error message
    Particle/Par_EquilibriumIC.cpp:183:24: fatal error: object backing the pointer will be destroyed at the end of the full-expression [-Wdangling-gsl]
    183 |       const char * c = convertToString(params.ExtPot_Table_Name).c_str();
       |                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    1 error generated.
    make: *** [Makefile:616: Object/__cpu__Par_EquilibriumIC.o] Error 1
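The -Wdangling-gsl diagnostic above is Clang's check for a raw pointer that outlives the temporary object it points into; the LLVM-based Intel oneAPI compilers appear to apply the same check. A minimal sketch of the usual fix follows, assuming convertToString returns a std::string by value. The convertToString body, the copyTableName helper, and the file name in the usage are hypothetical illustrations, not GAMER's actual code.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <string>

// HYPOTHETICAL stand-in for the helper used in Par_EquilibriumIC.cpp:
// assumed to return a std::string by value (i.e., a temporary at the call site).
std::string convertToString(const char *name) { return std::string(name); }

// HYPOTHETICAL helper: copies the table name into a caller-provided C buffer.
// Returns the number of characters written, excluding the terminating '\0'.
std::size_t copyTableName(const char *raw_name, char *buf, std::size_t buf_size)
{
   assert(buf != nullptr && buf_size > 0);

   // BAD (rejected by -Wdangling-gsl, undefined behavior if used):
   //    const char *c = convertToString(raw_name).c_str();
   // The temporary std::string is destroyed at the ';', so c would dangle.

   // GOOD: bind the result to a named std::string so it lives until the end of this scope.
   const std::string name = convertToString(raw_name);
   const char *c = name.c_str();   // valid as long as `name` is alive

   std::snprintf(buf, buf_size, "%s", c);   // truncates safely if buf is too small
   return std::strlen(buf);
}
```

When the C string is only needed for a single call, passing convertToString(...).c_str() directly as a function argument is also safe, because the temporary lives until the end of that full-expression.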

@hyschive hyschive merged commit acd94c9 into gamer-project:main Dec 28, 2024