torchani faster than nnpops for larger systems? #85
In your code, AFAIK, you are using the CPU implementation. Is this intended?

It's not intended to use the CPU implementation...
Take this minimum example, which is similar to the one provided by @wiederm:

```python
import sys

from openmm import LangevinIntegrator, unit, Platform
from openmm.app import Simulation, StateDataReporter
from openmmml import MLPotential
from openmmtools.testsystems import WaterBox

# constants which might be modified by the user
step = 1000

waterbox = WaterBox(box_edge=15 * unit.angstrom)
nnp = MLPotential("ani2x")
platform = Platform.getPlatformByName("CUDA")
prop = dict(CudaPrecision="mixed")

for implementation in ("nnpops", "torchani"):
    print(f"Implementation: {implementation}")
    ml_system = nnp.createSystem(waterbox.topology, implementation=implementation)
    simulation = Simulation(
        waterbox.topology,
        ml_system,
        LangevinIntegrator(300 * unit.kelvin, 1 / unit.picosecond, 1 * unit.femtosecond),
        platform,
        prop,
    )
    simulation.context.setPositions(waterbox.positions)

    # Production
    if step > 0:
        print("\nMD run: %s steps" % step)
        simulation.reporters.append(
            StateDataReporter(
                sys.stdout,
                reportInterval=100,
                step=True,
                time=True,
                potentialEnergy=True,
                speed=True,
                separator="\t",
            )
        )
        simulation.step(step)
```

On my GPU, an RTX 2080 Ti, I get this:

```
Implementation: nnpops

MD run: 1000 steps
#"Step"	"Time (ps)"	"Potential Energy (kJ/mole)"	"Speed (ns/day)"
100	0.10000000000000007	-20461978.001629103	0
200	0.20000000000000015	-20462133.848855495	6.8
300	0.3000000000000002	-20462153.789688706	6.79
400	0.4000000000000003	-20462202.823631693	6.79
500	0.5000000000000003	-20462257.760451913	6.79
600	0.6000000000000004	-20462329.421256337	6.79
700	0.7000000000000005	-20462362.9969222	6.8
800	0.8000000000000006	-20462488.402703974	6.8
900	0.9000000000000007	-20462532.231097963	6.8
1000	1.0000000000000007	-20462481.48763666	6.8
Implementation: torchani

MD run: 1000 steps
#"Step"	"Time (ps)"	"Potential Energy (kJ/mole)"	"Speed (ns/day)"
100	0.10000000000000007	-20456285.324413814	0
200	0.20000000000000015	-20451616.878087416	2.48
300	0.3000000000000002	-20445519.385244645	2.49
400	0.4000000000000003	-20438851.384950936	2.19
500	0.5000000000000003	-20431004.40918439	2.13
600	0.6000000000000004	-20426584.870540198	2.18
700	0.7000000000000005	-20415840.214279402	2.22
800	0.8000000000000006	-20411478.48251822	2.24
900	0.9000000000000007	-20409822.772401713	2.26
1000	1.0000000000000007	-20402172.29296462	2.27
```

Note, however, that the GPU utilization I am seeing for the torchani implementation is low (under 30%), whereas NNPOps is using 100%.
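For a sense of scale, the system size implied by a given box edge can be estimated with a quick back-of-the-envelope calculation. This sketch is not from the thread; the water number density of roughly 0.0334 molecules per cubic angstrom is an assumed textbook value:

```python
def waterbox_atoms(box_edge_angstrom, number_density=0.0334):
    """Rough atom count for a cubic waterbox: number density of liquid
    water (~0.0334 molecules per cubic angstrom) times the box volume,
    times 3 atoms per water molecule."""
    n_waters = number_density * box_edge_angstrom ** 3
    return int(round(3 * n_waters))

# The 15 A box used in the script above is quite small; the atom count
# grows with the cube of the edge length.
print(waterbox_atoms(15))  # -> 338
print(waterbox_atoms(30))  # -> 2705
```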
I'm using an RTX 2060. If I use your script, I get this output:

[output omitted]
This is what I get on an RTX 3090. NNPOps is faster, as expected, but torchani is slower than all of the above:

[output omitted]
On my ancient GTX 1080 Ti:

[output omitted]
I have tested your script with two modifications (5K steps, write frequency set to 200 steps) on an RTX 3070 (not the same machine I posted my initial data from), and I get the following:

[output omitted]
Given torchani's low GPU utilization, CPU performance may be playing a role here. Perhaps your original RTX 2060 machine has a particularly powerful CPU.
Yes, it seems to be very GPU-dependent: the higher-end cards with more CUDA cores see a much larger speedup from NNPOps.
I think it would be useful to collate performance benchmarks like the above across different hardware and system sizes, so people can tell whether their systems are running at the expected speed, similar to https://openmm.org/benchmarks.
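To help collate such numbers, a small helper along these lines (hypothetical, not part of any script posted in this thread) could parse the tab-separated StateDataReporter output and reduce each run to a single mean speed:

```python
def mean_speed(report_text):
    """Average the "Speed (ns/day)" column of tab-separated
    StateDataReporter output, skipping the header line and the first
    interval (which always reports 0 ns/day)."""
    rows = [line for line in report_text.strip().splitlines()
            if line and not line.startswith("#")]
    speeds = [float(row.split("\t")[-1]) for row in rows]
    speeds = [s for s in speeds if s > 0]
    return sum(speeds) / len(speeds)

# Synthetic example in the same format as the reporter output above:
report = "\n".join([
    '#"Step"\t"Time (ps)"\t"Potential Energy (kJ/mole)"\t"Speed (ns/day)"',
    "100\t0.1\t-20461978.0\t0",
    "200\t0.2\t-20462133.8\t6.8",
    "300\t0.3\t-20462153.8\t6.8",
])
print(round(mean_speed(report), 2))  # -> 6.8
```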
Did we ever figure out what was happening here?
Hi,

When I simulate a 15 Å waterbox with the `torchani` and `nnpops` implementations, the `torchani` implementation is slightly faster. Does `nnpops` only outperform `torchani` at small system sizes? I have attached a minimal example to reproduce the shown output: min.py.zip