Add install instructions for ALCF's Polaris #4636
Conversation
python3 -m pip install --upgrade virtualenv
python3 -m pip cache purge
rm -rf ${SW_DIR}/venvs/warpx
python3 -m venv --system-site-packages ${SW_DIR}/venvs/warpx
I need to use --system-site-packages since I cannot manage to install mpi4py in a virtual environment.
Unusual. Sometimes this can happen if they have a pre-installed package of the latest version in either the base environment or cache, which we then just need to skip. Technically, --no-cache-dir --no-binary does that, but maybe there is more to it here. Can you share more details on the issue?
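For reference, a minimal sketch of what those flags would look like for mpi4py; the --no-binary target and the optional MPICC setting are assumptions, not something stated above:
# Force a source build of mpi4py, ignoring any cached or pre-built wheel.
# Optionally point the build at a specific MPI compiler wrapper first, e.g.
#   export MPICC=$(which mpicc)   # hypothetical; the wrapper name is system-dependent
python3 -m pip install --no-cache-dir --no-binary mpi4py mpi4py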
Here's what I sent to ALCF support:
I’m trying to install WarpX (https://github.com/ECP-WarpX/WarpX) with Python support on Polaris and as part of this I would like to create a new Python virtual environment in which to install all the required Python libraries. However, I’m having trouble getting mpi4py installed in the virtual environment and would appreciate any help you can give me.
I’m currently performing the following steps:
- Loading required modules:
module load cmake/3.23.2
module load cudatoolkit-standalone
module load cray-python/3.9.13.1
- Creating a virtual environment:
python3 -m venv venvs/warpx
source venvs/warpx/bin/activate
- Installing prerequisite Python packages:
python3 -m pip install --upgrade pip
python3 -m pip install --upgrade build
python3 -m pip install --upgrade packaging
python3 -m pip install --upgrade wheel
python3 -m pip install --upgrade setuptools
- Trying to install mpi4py:
python3 -m pip install mpi4py
The mpi4py installation fails when checking for required MPI libraries with messages of the form:
checking for MPI compile and link ...
/opt/cray/pe/mpich/8.1.25/ofi/nvidia/20.7/bin/mpicc -Wno-unused-result -Wsign-compare -DNDEBUG -fmessage-length=0 -grecord-gcc-switches -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -g -DOPENSSL_LOAD_CONF -fwrapv -fno-semantic-interposition -fmessage-length=0 -grecord-gcc-switches -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -g -fmessage-length=0 -grecord-gcc-switches -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -g -fPIC -I/home/groenewa/scratch/venvs/sandbox/include -I/usr/include/python3.6m -c _configtest.c -o _configtest.o
nvc-Error-Unknown switch: -Wno-unused-result
nvc-Error-Unknown switch: -fmessage-length=0
nvc-Error-Unknown switch: -grecord-gcc-switches
nvc-Error-Unknown switch: -fstack-protector-strong
nvc-Error-Unknown switch: -funwind-tables
nvc-Error-Unknown switch: -fasynchronous-unwind-tables
nvc-Error-Unknown switch: -fstack-clash-protection
nvc-Error-Unknown switch: -fwrapv
nvc-Error-Unknown switch: -fno-semantic-interposition
nvc-Error-Unknown switch: -fmessage-length=0
nvc-Error-Unknown switch: -grecord-gcc-switches
nvc-Error-Unknown switch: -fstack-protector-strong
nvc-Error-Unknown switch: -funwind-tables
nvc-Error-Unknown switch: -fasynchronous-unwind-tables
nvc-Error-Unknown switch: -fstack-clash-protection
nvc-Error-Unknown switch: -fmessage-length=0
nvc-Error-Unknown switch: -grecord-gcc-switches
nvc-Error-Unknown switch: -fstack-protector-strong
nvc-Error-Unknown switch: -funwind-tables
nvc-Error-Unknown switch: -fasynchronous-unwind-tables
nvc-Error-Unknown switch: -fstack-clash-protection
failure.
removing: _configtest.c _configtest.o
error: Cannot compile MPI programs. Check your configuration!!!
ERROR: Failed building wheel for mpi4py
They suggested:
Try using the default conda module (conda/2022-09-08) instead of the cudatoolkit
I've been able to install mpi4py when loading the conda module and creating a virtual environment from it (not actually a conda virtual environment, just a venv virtual environment created from the conda Python). I'm now just seeing if the rest of the build works fine.
I can confirm now that the above problem goes away when not loading the cudatoolkit module.
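A rough sketch of the working sequence described above, assuming the module name and conda tag from this thread are still what the site provides:
# Use the conda-provided Python instead of cray-python (and skip cudatoolkit-standalone
# while installing mpi4py); loading conda also switches to the GNU programming environment
module load conda/2022-09-08

# Create a plain venv on top of the conda Python and install mpi4py into it
python3 -m venv --system-site-packages venvs/warpx
source venvs/warpx/bin/activate
python3 -m pip install --upgrade pip
python3 -m pip install mpi4py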
# export CXXFLAGS="-march=znver3"
# export CFLAGS="-march=znver3"
These lines are commented out for now since apparently the available nvhpc/21.9 compilers do not accept these flags. See "Notes on Default Modules" at https://docs.alcf.anl.gov/polaris/compiling-and-linking/compiling-and-linking-overview/.
@roelof-groenewald you might need to pass them as -XCompiler="-march=znver3"
or they have a different syntax (NVHPC are the prior PGI compilers after all).
I should be able to switch these environment variables on again with the switch to GNU. I'll test to confirm.
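A sketch of what that could look like once the GNU compilers are used; untested here, and both the nvcc -Xcompiler passthrough and the use of CUDAFLAGS are assumptions about what this build would pick up:
# With GNU host compilers the architecture flags can be set directly again
export CXXFLAGS="-march=znver3"
export CFLAGS="-march=znver3"
# If the same flag should also reach nvcc's host compiler, nvcc's passthrough
# syntax is one option, e.g.:
#   export CUDAFLAGS="-Xcompiler=-march=znver3"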
python3 -m pip install cupy-cuda11x # CUDA 11.7 compatible wheel
# optional: for libEnsemble
python3 -m pip install -r $HOME/src/warpx/Tools/LibEnsemble/requirements.txt
# optional: for optimas (based on libEnsemble & ax->botorch->gpytorch->pytorch)
python3 -m pip install --upgrade torch # CUDA 11.7 compatible wheel
Double check whether this is CUDA 11.X or 12.X.
It is 11.8.89. I'll update the comment.
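One quick way to double-check the toolkit version behind the loaded module (the wheel names then follow the major version, e.g. cupy-cuda11x for any 11.x toolkit):
# Report the CUDA toolkit release picked up from the environment, e.g. 11.8
nvcc --version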
#
# This file is part of WarpX.
#
# Author: Axel Huebl (edited by Roelof Groenewald for Polaris)
Suggested change: replace
# Author: Axel Huebl (edited by Roelof Groenewald for Polaris)
with
# Authors: Axel Huebl, Roelof Groenewald
export CC=nvc
export CXX=nvc++
export CUDACXX=nvcc
export CUDAHOSTCXX=nvc++
Ah, that is unusual and we do not even use these on Perlmutter yet.
We fixed a good number of bugs in this compiler combination, but I would avoid it if you can.
Does replacing

export CC=nvc
export CXX=nvc++
export CUDACXX=nvcc
export CUDAHOSTCXX=nvc++

with

export CC=$(which gcc)
export CXX=$(which g++)
export CUDACXX=$(which nvcc)
export CUDAHOSTCXX=${CXX}
work as well on the system?
Using the GNU compilers in the hints as you suggest causes a failure in AMReX configuration:
132 | #error -- unsupported GNU version! gcc versions later than 11 are not supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.
Are you aware of this issue with gcc 12 or later?
It looks like I can get it to compile by specifically pointing to an earlier GNU version:
export CC=/opt/cray/pe/gcc/11.2.0/bin/gcc
export CXX=/opt/cray/pe/gcc/11.2.0/bin/g++
export CUDACXX=nvcc
export CUDAHOSTCXX=${CXX}
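If pinning GCC 11 turned out to be inconvenient, the override named in the error message above could in principle be passed to nvcc instead; a sketch, assuming the build picks up CUDAFLAGS (pinning GCC 11 as above remains the safer route):
# Let nvcc accept a host GCC newer than it officially supports
# (explicitly unsupported; may cause compile or runtime problems)
export CUDAFLAGS="-allow-unsupported-compiler"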
# required dependencies
module load cmake/3.23.2
module load cudatoolkit-standalone
Maybe add the version for slightly more explicit stability and self-documentation, if they encode it:
module load cudatoolkit-standalone/...
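A quick way to see which version tags the site encodes (the specific tag below is hypothetical):
# List the available variants, then pin one explicitly
module avail cudatoolkit-standalone
# e.g. module load cudatoolkit-standalone/11.8.0   # hypothetical version tag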
# swap to the Milan cray package
# module swap craype-x86-rome craype-x86-milan
Since this is HPE/Cray: Is there a PrgEnv-gnu or something that we can load, similar to Perlmutter?
Yes there is, and the module load conda... suggested for the mpi4py fix actually switches to the GNU programming environment automatically. I'll update accordingly.
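For completeness, a sketch of what selecting it by hand would look like, assuming PrgEnv-nvhpc is the default environment loaded at login (here it is handled automatically by the conda module):
# Swap to the GNU programming environment explicitly
module swap PrgEnv-nvhpc PrgEnv-gnu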
Please address the inline comments in a follow-up PR :)
Adding instructions for Polaris. These instructions are mostly copied from the Perlmutter instructions.
I'll make a couple of comments directly on the changes for some issues I'm still having.
I've only tested the GPU installation, which is why I left the CPU instructions as "Under construction".