GPUs? #190

Open
ajwheeler opened this issue Jun 9, 2023 · 3 comments

Comments

@ajwheeler
Owner

ajwheeler commented Jun 9, 2023

It has been suggested before that Korg might be sped up by putting the hot loop (line opacity calculation) on the GPU.

This paper presents an interesting implementation of that idea: https://ui.adsabs.harvard.edu/abs/2015ApJ...808..182G/abstract

Comparing their performance to ours is not straightforward. They take ~1 s to calculate opacity from 10^6 lines in their performance tests. If I do a synthesis for a cool star with the Pokazatel water linelist, there are ~2*10^6 lines across ~50 layers -> 10^8 line opacity calculations. This takes ~40 s. So naively, we are already doing great (0.4 µs vs their 1 µs per line). BUT:

  • I am running on my laptop, on a single M2 performance core. They are using an NVIDIA Tesla K20 GPU. What we really want is something like [time / line / $ of compute], but I have no idea what the appropriate numbers are for this hardware.
  • They are calculating their lines out to 100 cm^-1 (they also do 10 and 1000). If I am not bungling the math, this corresponds to ~250 angstroms, which is orders of magnitude bigger than my line windows. They discuss the fact that Voigt profiles probably aren't really correct that far out, but the far wings do make a difference to the wavelength-integrated opacity. This may be a problem with Korg and other stellar synthesis codes.
  • I'm not sure what temperature and pressure they computed their opacity at. This matters a lot, but not as much as you would think given that their line windows are fixed.
  • I'm not sure how their Voigt approximation compares to ours in speed or accuracy. (Though I noted in "voigt could be more accurate" (#43) that ours could be improved.)
  • Finally, conventional wisdom is that pressure broadening of molecular lines is negligible in stars. I still haven't gotten around to verifying this, but assuming it's true, we can always pre-tabulate molecular opacity as a function of temperature. This will be extremely fast. (edit: this is implemented and it is indeed very fast)

The code for this (and for GPU-accelerated RT!) is on their github: https://github.com/exoclime.
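As a sanity check on the arithmetic above, here is a minimal Python sketch of the per-line timing and of the wavenumber-to-wavelength conversion. It uses the first-order relation Δλ ≈ λ² Δν̃, which is only approximate for wide windows. The function name `window_angstrom` and the two evaluation wavelengths (5000 Å and 16000 Å) are my own assumptions for illustration; the estimate above does not say what wavelength it was made at. The width depends strongly on wavelength: a 100 cm^-1 cutoff is about 25 Å at 5000 Å, and the ~250 Å figure is recovered near 1.6 µm.

```python
# Per-line cost implied by the numbers quoted in this comment.
korg_lines = 2e6       # ~2*10^6 lines in the water linelist
korg_layers = 50       # ~50 atmospheric layers
korg_seconds = 40.0    # ~40 s on a single M2 performance core

gpu_lines = 1e6        # 10^6 lines in the paper's performance test
gpu_seconds = 1.0      # ~1 s on a Tesla K20

# Convert seconds per (line, layer) evaluation to microseconds.
korg_us_per_line = korg_seconds / (korg_lines * korg_layers) * 1e6
gpu_us_per_line = gpu_seconds / gpu_lines * 1e6


def window_angstrom(wavelength_angstrom, delta_wavenumber_cm):
    """Wavelength width (angstroms) of a wavenumber width (cm^-1).

    Uses the first-order relation delta_lambda ~= lambda^2 * delta_nu,
    with lambda expressed in cm (1 angstrom = 1e-8 cm).
    """
    wavelength_cm = wavelength_angstrom * 1e-8
    return wavelength_cm**2 * delta_wavenumber_cm * 1e8


if __name__ == "__main__":
    print(f"Korg: {korg_us_per_line:.2f} us per line")
    print(f"GPU:  {gpu_us_per_line:.2f} us per line")
    # Illustrative wavelengths only (assumed, not from the paper).
    for lam in (5000.0, 16000.0):
        print(f"100 cm^-1 at {lam:.0f} A -> {window_angstrom(lam, 100.0):.0f} A")
```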

@andrew-saydjari
Collaborator

Benchmarking on M2s makes things complicated. I would benchmark on Intel/AMD. My 2 cents would be that the metric you might actually want would require normalizing by the number of cores and the fraction of the GPU memory being used (given that GPUs can be partitioned via MIG configurations now). Then, I always have in my mind the order-of-magnitude estimate that a modern GPU costs 100x what a modern CPU costs. An example of this is the NSF ACCESS calculator. For example, on DARWIN, a GPU-h costs 69x as much as a CPU-h.

@andrew-saydjari
Collaborator

Just because I saw this today, at Harvard's Cannon cluster, an hour on an NVIDIA A100 is 209.4x an hour on an Intel Cascade Lake core. https://docs.rc.fas.harvard.edu/kb/fairshare/
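To make the [time / line / $ of compute] idea concrete, here is a rough Python sketch that weights the per-line timings quoted earlier in the thread (0.4 µs on one CPU core, 1 µs on the GPU) by the relative hourly prices mentioned above (69x on DARWIN, 209.4x on Cannon). The function name `cost_weighted_time` is hypothetical. The timings come from an M2 core and a Tesla K20, while the price ratios are for different hardware, so this only illustrates the metric and is not a real benchmark.

```python
def cost_weighted_time(us_per_line, relative_hourly_cost):
    """Per-line time scaled by hourly price, in units of one CPU-core-hour.

    The result is in "microseconds of CPU-core-equivalent cost" per line.
    """
    return us_per_line * relative_hourly_cost


# Per-line timings quoted in the thread (different hardware generations).
cpu_us_per_line = 0.4   # single M2 performance core
gpu_us_per_line = 1.0   # NVIDIA Tesla K20

# Relative GPU-hour prices quoted in the thread (CPU core-hour = 1).
price_ratios = {"DARWIN": 69.0, "Cannon": 209.4}

cpu_cost = cost_weighted_time(cpu_us_per_line, 1.0)
gpu_costs = {
    site: cost_weighted_time(gpu_us_per_line, ratio)
    for site, ratio in price_ratios.items()
}
# How many times more the GPU costs per line under these assumptions.
gpu_over_cpu = {site: cost / cpu_cost for site, cost in gpu_costs.items()}

if __name__ == "__main__":
    for site, factor in gpu_over_cpu.items():
        print(f"{site}: GPU is ~{factor:.0f}x the CPU cost per line")
```

Under these mismatched numbers the GPU would need to be a few hundred times faster per line than the quoted figure to break even on cost, which is why benchmarking both codes on comparable, current hardware matters.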

@ajwheeler
Owner Author

#368
