
Add install instructions for ALCF's Polaris #4636

Merged: 2 commits into BLAST-WarpX:development from docs/polaris on Feb 2, 2024

Conversation

roelof-groenewald (Member)

Adding install instructions for Polaris. These are mostly copied from the Perlmutter instructions. I'll make a couple of comments directly on the changes for some issues I'm still having.

I've only tested the GPU installation, which is why I left the CPU instructions as "Under construction".

roelof-groenewald added the component: documentation and machine / system labels on Jan 25, 2024
python3 -m pip install --upgrade virtualenv
python3 -m pip cache purge
rm -rf ${SW_DIR}/venvs/warpx
python3 -m venv --system-site-packages ${SW_DIR}/venvs/warpx
roelof-groenewald (Member Author)

I need to use --system-site-packages since I haven't managed to install mpi4py in a plain virtual environment.

ax3l (Member), Feb 1, 2024

Unusual. Sometimes this can happen if they have a pre-installed package of the latest version in either the base environment or cache, which we then just need to skip.

Technically, --no-cache-dir --no-binary does that, but maybe there is more to it here.
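
For reference, a minimal sketch of the invocation those flags imply, assuming the package in question is mpi4py (an illustration only; whether it helps on Polaris is exactly what is unclear here):

# force a from-source build, skipping both the pip cache and any binary wheel
python3 -m pip install --no-cache-dir --no-binary=mpi4py mpi4py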

Can you share more details on the issue?

roelof-groenewald (Member Author)

Here's what I sent to ALCF support:

I'm trying to install WarpX (https://github.com/ECP-WarpX/WarpX) with Python support on Polaris, and as part of this I would like to create a new Python virtual environment in which to install all the required Python libraries. However, I'm having trouble getting mpi4py installed in the virtual environment and would appreciate any help you can give me.
I’m currently performing the following steps:

1. Loading required modules:

module load cmake/3.23.2
module load cudatoolkit-standalone
module load cray-python/3.9.13.1

2. Creating a virtual environment:

python3 -m venv venvs/warpx
source venvs/warpx/bin/activate

3. Installing pre-requisite Python packages:

python3 -m pip install --upgrade pip
python3 -m pip install --upgrade build
python3 -m pip install --upgrade packaging
python3 -m pip install --upgrade wheel
python3 -m pip install --upgrade setuptools

4. Trying to install mpi4py:

python3 -m pip install mpi4py

The mpi4py installation fails when checking for required MPI libraries with messages of the form:
checking for MPI compile and link ...
/opt/cray/pe/mpich/8.1.25/ofi/nvidia/20.7/bin/mpicc -Wno-unused-result -Wsign-compare -DNDEBUG -fmessage-length=0 -grecord-gcc-switches -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -g -DOPENSSL_LOAD_CONF -fwrapv -fno-semantic-interposition -fmessage-length=0 -grecord-gcc-switches -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -g -fmessage-length=0 -grecord-gcc-switches -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -g -fPIC -I/home/groenewa/scratch/venvs/sandbox/include -I/usr/include/python3.6m -c _configtest.c -o _configtest.o
nvc-Error-Unknown switch: -Wno-unused-result
nvc-Error-Unknown switch: -fmessage-length=0
nvc-Error-Unknown switch: -grecord-gcc-switches
nvc-Error-Unknown switch: -fstack-protector-strong
nvc-Error-Unknown switch: -funwind-tables
nvc-Error-Unknown switch: -fasynchronous-unwind-tables
nvc-Error-Unknown switch: -fstack-clash-protection
nvc-Error-Unknown switch: -fwrapv
nvc-Error-Unknown switch: -fno-semantic-interposition
nvc-Error-Unknown switch: -fmessage-length=0
nvc-Error-Unknown switch: -grecord-gcc-switches
nvc-Error-Unknown switch: -fstack-protector-strong
nvc-Error-Unknown switch: -funwind-tables
nvc-Error-Unknown switch: -fasynchronous-unwind-tables
nvc-Error-Unknown switch: -fstack-clash-protection
nvc-Error-Unknown switch: -fmessage-length=0
nvc-Error-Unknown switch: -grecord-gcc-switches
nvc-Error-Unknown switch: -fstack-protector-strong
nvc-Error-Unknown switch: -funwind-tables
nvc-Error-Unknown switch: -fasynchronous-unwind-tables
nvc-Error-Unknown switch: -fstack-clash-protection
failure.
removing: _configtest.c _configtest.o
error: Cannot compile MPI programs. Check your configuration!!!
ERROR: Failed building wheel for mpi4py

They suggested:

Try using the default conda module (conda/2022-09-08) instead of the cudatoolkit

I've been able to install mpi4py when loading the conda module and creating a virtual environment from it (not actually a conda environment, just a venv created from the conda Python). I'm now checking whether the rest of the build works.
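
A minimal sketch of the sequence described above, assuming the module name suggested by ALCF support and the SW_DIR/venv layout used elsewhere in these instructions:

module load conda/2022-09-08                                  # conda-provided Python; no cudatoolkit-standalone
python3 -m venv --system-site-packages ${SW_DIR}/venvs/warpx  # plain venv on top of the conda Python
source ${SW_DIR}/venvs/warpx/bin/activate
python3 -m pip install --upgrade pip
python3 -m pip install mpi4py                                 # succeeds with the conda-provided toolchain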

roelof-groenewald (Member Author)

I can confirm now that the above problem goes away when not loading the cudatoolkit module.

Comment on lines +44 to +45
# export CXXFLAGS="-march=znver3"
# export CFLAGS="-march=znver3"
roelof-groenewald (Member Author)

These lines are commented out for now since the available nvhpc/21.9 compilers apparently do not accept these flags. See "Notes on Default Modules" at https://docs.alcf.anl.gov/polaris/compiling-and-linking/compiling-and-linking-overview/.

Member

@roelof-groenewald you might need to pass them as -XCompiler="-march=znver3", or they may have a different syntax (the NVHPC compilers are the former PGI compilers, after all).

roelof-groenewald (Member Author)

I should be able to switch these environment variables on again with the switch to GNU. I'll test to confirm.
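
If that test works out, the hints would simply go back to the lines currently commented out in the diff above, e.g.:

export CXXFLAGS="-march=znver3"  # AMD Milan CPUs on the Polaris nodes
export CFLAGS="-march=znver3"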

Comment on lines +118 to +122
python3 -m pip install cupy-cuda11x # CUDA 11.7 compatible wheel
# optional: for libEnsemble
python3 -m pip install -r $HOME/src/warpx/Tools/LibEnsemble/requirements.txt
# optional: for optimas (based on libEnsemble & ax->botorch->gpytorch->pytorch)
python3 -m pip install --upgrade torch # CUDA 11.7 compatible wheel
Member

Double-check whether this is CUDA 11.x or 12.x.

roelof-groenewald (Member Author)

It is 11.8.89. I'll update the comment.
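
The updated lines from the diff above would then read roughly as follows (only the comments change; the wheel names stay the same):

python3 -m pip install cupy-cuda11x      # CUDA 11.8 compatible wheel
python3 -m pip install --upgrade torch   # CUDA 11.8 compatible wheel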

ax3l self-assigned this on Jan 30, 2024
#
# This file is part of WarpX.
#
# Author: Axel Huebl (edited by Roelof Groenewald for Polaris)
Member

Suggested change
# Author: Axel Huebl (edited by Roelof Groenewald for Polaris)
# Authors: Axel Huebl, Roelof Groenewald

Comment on lines +48 to +51
export CC=nvc
export CXX=nvc++
export CUDACXX=nvcc
export CUDAHOSTCXX=nvc++
Member

Ah, that is unusual; we do not even use these on Perlmutter yet.
We have fixed a good number of bugs in this compiler combination, but I would avoid it if you can.

Does

Suggested change
export CC=nvc
export CXX=nvc++
export CUDACXX=nvcc
export CUDAHOSTCXX=nvc++
export CC=$(which gcc)
export CXX=$(which g++)
export CUDACXX=$(which nvcc)
export CUDAHOSTCXX=${CXX}

work as well on the system?

roelof-groenewald (Member Author)

Using the GNU compilers in the hints as you suggest causes a failure in AMReX configuration:

132 | #error -- unsupported GNU version! gcc versions later than 11 are not supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.

Are you aware of this issue with gcc 12 or later?

It looks like I can get it to compile by specifically pointing to an earlier GNU version:

Suggested change
export CC=nvc
export CXX=nvc++
export CUDACXX=nvcc
export CUDAHOSTCXX=nvc++
export CC=/opt/cray/pe/gcc/11.2.0/bin/gcc
export CXX=/opt/cray/pe/gcc/11.2.0/bin/g++
export CUDACXX=nvcc
export CUDAHOSTCXX=${CXX}
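
Pinning gcc 11.2.0 as above avoids the version check; the nvcc error message itself points at an alternative override, sketched here only for completeness (untested, and using an unsupported host compiler is at your own risk):

# CMake picks this up for the initial CUDA compile flags
export CUDAFLAGS="-allow-unsupported-compiler"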


# required dependencies
module load cmake/3.23.2
module load cudatoolkit-standalone
Member

Maybe add the version, for slightly better stability and self-documentation, if they encode it:

Suggested change
module load cudatoolkit-standalone
module load cudatoolkit-standalone/...
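
One way to find the exact string to pin (the version in the second line is only an illustration; use whatever the first command lists):

module avail cudatoolkit-standalone
module load cudatoolkit-standalone/11.8.0   # example version; replace with one from the listing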

Comment on lines +4 to +5
# swap to the Milan cray package
# module swap craype-x86-rome craype-x86-milan
Member

Since this is HPE/Cray: is there a PrgEnv-gnu or something similar that we can load, as on Perlmutter?

roelof-groenewald (Member Author)

Yes, there is, and with the module load conda... suggested for the mpi4py fix it automatically switches to the GNU programming environment. I'll update accordingly.
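
A hedged sketch of what the updated profile header could look like, with module names taken from this thread (the explicit PrgEnv-gnu load may be redundant if the conda module already switches the programming environment, as noted above):

module swap craype-x86-rome craype-x86-milan   # target the Milan CPUs
module load conda/2022-09-08                   # also switches to the GNU programming environment
# module load PrgEnv-gnu                       # explicit alternative, if needed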

ax3l (Member) commented on Feb 2, 2024

Please address the inline comments in a follow-up PR :)

ax3l merged commit 9d8ecf9 into BLAST-WarpX:development on Feb 2, 2024
39 checks passed
roelof-groenewald deleted the docs/polaris branch on February 2, 2024, 17:35