Particle Container to Pure SoA Again #4653
Conversation
idcpu_data.push_back(0);
amrex::ParticleIDWrapper{idcpu_data.back()} = ParticleType::NextID();
amrex::ParticleCPUWrapper(idcpu_data.back()) = ParallelDescriptor::MyProc();
Let's use AMReX-Codes/amrex#3733
## Summary

Update `ParticleCopyPlan::build` for pure SoA particle layout.

## Additional background

- [x] testing on GPU in BLAST-WarpX/warpx#4653

## Checklist

The proposed changes:
- [x] fix a bug or incorrect behavior in AMReX
- [ ] add new capabilities to AMReX
- [ ] changes answers in the test suite to more than roundoff level
- [ ] are likely to significantly affect the results of downstream AMReX users
- [ ] include documentation in the code and/or rst files, if appropriate

Co-authored-by: Andrew Myers <atmyers2@gmail.com>
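For context, here is a minimal, self-contained sketch of the pattern in the snippet above: each pure-SoA particle carries a single packed 64-bit idcpu word, and the AMReX wrappers assign the id and the owning rank into it separately. `append_idcpu` and `next_id` are illustrative names, not WarpX code; `next_id` stands in for `ParticleType::NextID()`.

```cpp
// Sketch only: mirrors the review snippet above with illustrative names.
#include <AMReX_Particle.H>
#include <AMReX_ParallelDescriptor.H>

#include <cstdint>
#include <vector>

void append_idcpu (std::vector<std::uint64_t>& idcpu_data, amrex::Long next_id)
{
    // one packed id/cpu word per particle in the pure-SoA layout
    idcpu_data.push_back(0);

    // each wrapper takes a reference to the packed word and, on assignment,
    // only touches its own bit field
    amrex::ParticleIDWrapper{idcpu_data.back()}  = next_id;
    amrex::ParticleCPUWrapper{idcpu_data.back()} = amrex::ParallelDescriptor::MyProc();
}
```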
Source/Particles/Collision/BinaryCollision/DSMC/SplitAndScatterFunc.H (outdated, resolved)
Source/Particles/Collision/BinaryCollision/ParticleCreationFunc.H (outdated, resolved)
Force-pushed from 1437d6e to 9897a23.
Force-pushed from cf9dd03 to 2ac1993.
More pure SoA and id handling goodness.
Transition to new, purely SoA particle containers. This was originally merged in BLAST-WarpX#3850 and reverted in BLAST-WarpX#4652, since we discovered issues losing particles & laser particles on GPU.
Force-pushed from 2ac1993 to 593221d.
- faster: fewer emitted operations, no jumps (see the sketch below)
- cheaper: fewer registers used
- safer: no read-before-write warnings
- cooler: no explanation needed
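A hedged sketch of what these bullets refer to, assuming the `amrex::SetParticleIDandCPU` helper is available in the AMReX version in use: composing the id and rank into the packed word in one expression lets the compiler emit a single store instead of a zero-fill followed by two read-modify-write wrapper assignments.

```cpp
// Sketch only: assumes amrex::SetParticleIDandCPU is available; names are illustrative.
#include <AMReX_Particle.H>
#include <AMReX_ParallelDescriptor.H>

#include <cstdint>
#include <vector>

void append_idcpu_packed (std::vector<std::uint64_t>& idcpu_data, amrex::Long next_id)
{
    // build the full 64-bit id/cpu word once and store it in a single write:
    // no placeholder zero, no partial bit-field updates, no branches
    idcpu_data.push_back(
        amrex::SetParticleIDandCPU(next_id, amrex::ParallelDescriptor::MyProc()));
}
```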
Force-pushed from 593221d to 3c6cbfd.
GPU Tests (CUDA, A100 on Perlmutter)

diff --git a/Examples/analysis_default_openpmd_regression.py b/Examples/analysis_default_openpmd_regression.py
index 3aadc49ac5..3e9fb98789 100755
--- a/Examples/analysis_default_openpmd_regression.py
+++ b/Examples/analysis_default_openpmd_regression.py
@@ -15,6 +15,6 @@ test_name = os.path.split(os.getcwd())[1]
# Run checksum regression test
if re.search( 'single_precision', fn ):
- checksumAPI.evaluate_checksum(test_name, fn, output_format='openpmd', rtol=2.e-6)
+ checksumAPI.evaluate_checksum(test_name, fn, output_format='openpmd', rtol=4.)
else:
- checksumAPI.evaluate_checksum(test_name, fn, output_format='openpmd')
+ checksumAPI.evaluate_checksum(test_name, fn, output_format='openpmd', rtol=4.)
diff --git a/Examples/analysis_default_regression.py b/Examples/analysis_default_regression.py
index 453f650be0..6fa855df3d 100755
--- a/Examples/analysis_default_regression.py
+++ b/Examples/analysis_default_regression.py
@@ -15,6 +15,6 @@ test_name = os.path.split(os.getcwd())[1]
# Run checksum regression test
if re.search( 'single_precision', fn ):
- checksumAPI.evaluate_checksum(test_name, fn, rtol=2.e-6)
+ checksumAPI.evaluate_checksum(test_name, fn, rtol=4.)
else:
- checksumAPI.evaluate_checksum(test_name, fn)
+ checksumAPI.evaluate_checksum(test_name, fn, rtol=4.)
diff --git a/Regression/WarpX-tests.ini b/Regression/WarpX-tests.ini
index 3310e642dd..84133add09 100644
--- a/Regression/WarpX-tests.ini
+++ b/Regression/WarpX-tests.ini
@@ -40,7 +40,7 @@ use_ctools = 0
# sections.
#MPIcommand = mpiexec -host @host@ -n @nprocs@ @command@
-MPIcommand = mpiexec -n @nprocs@ @command@
+MPIcommand = srun -n @nprocs@ @command@
MPIhost =
reportActiveTestsOnly = 1
@@ -64,7 +64,7 @@ branch = 24.02
[source]
dir = /home/regtester/AMReX_RegTesting/warpx
branch = development
-cmakeSetupOpts = -DAMReX_ASSERTIONS=ON -DAMReX_TESTING=ON -DWarpX_PYTHON_IPO=OFF -DpyAMReX_IPO=OFF
+cmakeSetupOpts = -DAMReX_ASSERTIONS=ON -DAMReX_TESTING=ON -DWarpX_PYTHON_IPO=OFF -DpyAMReX_IPO=OFF -DWarpX_COMPUTE=CUDA
# -DPYINSTALLOPTIONS="--disable-pip-version-check"
 # individual problems follow

cat test.sbatch

Tests that pass within a 10hr walltime in
currSpecies["position"]["z"].storeChunk(z, {offset}, {numParticleOnTile64}); | ||
} | ||
|
||
// reconstruct x and y from polar coordinates r, theta |
Oopsi, reconstruction re-added in #4686
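For reference, a minimal sketch of what that comment does, assuming the WarpX RZ convention that positions are stored as (r, z) with theta carried as a separate real component: Cartesian x and y are rebuilt for output as r*cos(theta) and r*sin(theta). Plain C++ with illustrative names, not the actual WarpX openPMD plot code.

```cpp
// Sketch only: reconstruct Cartesian x, y from cylindrical (r, theta) for output.
#include <cmath>
#include <cstddef>
#include <vector>

void reconstruct_xy (std::vector<double> const& r,
                     std::vector<double> const& theta,
                     std::vector<double>& x,
                     std::vector<double>& y)
{
    x.resize(r.size());
    y.resize(r.size());
    for (std::size_t ip = 0; ip < r.size(); ++ip) {
        x[ip] = r[ip] * std::cos(theta[ip]);   // x = r cos(theta)
        y[ip] = r[ip] * std::sin(theta[ip]);   // y = r sin(theta)
    }
}
```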
@@ -1084,7 +1083,7 @@ PhysicalParticleContainer::AddPlasma (PlasmaInjector const& plasma_injector, int
const int max_new_particles = Scan::ExclusiveSum(counts.size(), counts.data(), offset.data());

// Update NextID to include particles created in this function
Long pid;
int pid;
Long!
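The reasoning behind "Long!": particle ids are global counters that can exceed the 2^31-1 range of `int` in large runs, so `pid` needs to stay `amrex::Long` (64-bit). A self-contained toy (not WarpX code) showing what the narrowing does:

```cpp
// Toy demonstration of the narrowing problem; not WarpX code.
#include <cstdint>
#include <iostream>

int main ()
{
    // a plausible next particle id after ~3e9 particles have been created
    std::int64_t const next_id = 3'000'000'000LL;

    // storing it in an int wraps around (well past INT_MAX = 2147483647)
    int const truncated = static_cast<int>(next_id);

    std::cout << "64-bit id: " << next_id << "\n"
              << "as int:    " << truncated << "\n";   // prints a negative value
    return 0;
}
```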
auto& p = pp[ip];
p.id() = pid+ip;
p.cpu() = cpuid;
auto const new_id = ip + old_size;
Long!
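Same point here: the id arithmetic should be carried out in 64 bits before being written through the id wrapper. A hedged sketch with illustrative names (`idcpu` stands for the tile's packed id/cpu array, `first_id` for the reserved starting id):

```cpp
// Sketch only: 64-bit id arithmetic for the pure-SoA assignment; names illustrative.
#include <AMReX_Particle.H>

#include <cstdint>

void stamp_ids (std::uint64_t* idcpu, amrex::Long first_id, int cpuid, amrex::Long np)
{
    for (amrex::Long ip = 0; ip < np; ++ip) {
        // keep the sum in amrex::Long: first_id + ip may not fit in an int
        amrex::ParticleIDWrapper{idcpu[ip]}  = first_id + ip;
        amrex::ParticleCPUWrapper{idcpu[ip]} = cpuid;
    }
}
```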
Transition to new, purely SoA particle containers.
This was originally merged in #3850 and reverted in #4652, since we discovered issues losing particles & laser particles on GPU.
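For readers new to the terminology, a conceptual sketch (plain C++, not the actual AMReX classes) of what the pure-SoA transition changes: instead of an array of per-particle structs, every component, including the packed id/cpu word, lives in its own contiguous array, which is what makes memory access coalesced on GPU.

```cpp
// Conceptual layouts only; the real containers are AMReX templates.
#include <cstdint>
#include <vector>

// Legacy layout: one struct per particle (array of structs, AoS).
struct ParticleAoS {
    double x, y, z;
    std::uint64_t idcpu;
};
using LegacyTile = std::vector<ParticleAoS>;

// Pure SoA layout: one contiguous array per component.
struct TileSoA {
    std::vector<double> x, y, z;
    std::vector<std::uint64_t> idcpu;   // packed id + owning rank per particle
};
```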
#4654
ParticleIDWrapper::make_invalid() AMReX-Codes/amrex#3735

Fun Mini-Benchmarks on CPU, DP
Hardware: 12th Gen Intel(R) Core(TM) i9-12900H
cpu_legacy.txt, cpu_soa.txt
Overall speed: the difference is within the noise level of repeated runs (as expected).
A few noteworthy details in the top 10 functions by exclusive runtime:
- ParticleContainer::RedistributeCPU: 8% slower 👀 -> ParticleContainer::RedistributeCPU for Pure SoA AMReX-Codes/amrex#3744
- WarpXParticleContainer::ApplyBoundaryConditions: 5% faster
- WarpX::OneStep_nosub: 2% slower 👀
: 2% slower 👀Fun Mini-Benchmarks on A100 GPU, DP
Hardware: Perlmutter (NERSC) A100 GPU
Overall speed: 1.4% faster
A few noteworthy details in the top 10 functions by exclusive runtime:
- GatherAndPush: 1.2% faster
- Redistribute_partition: 4% faster
- AddPlasma: 2.6% faster
- ApplyBoundaryConditions: 1% faster
- SortParticlesForDeposition: 231% faster 🚀 🚀 ✨
- PermutationForDeposition: 3% faster
- InitData: 15% faster 🚀