GPUs? #190

ajwheeler · 2023-06-09T15:03:33Z

It has been suggested before that Korg might be sped up by putting the hot loop (line opacity calculation) on the GPU.

This paper presents an interesting implementation of that idea: https://ui.adsabs.harvard.edu/abs/2015ApJ...808..182G/abstract

Comparing their performance to ours is not straightforward. They take ~1 s to calculate opacity from 10^6 lines in their performance tests. If I do a synthesis for a cool star with the Pokazatel water linelist, there are ~2*10^6 lines across ~50 layers -> 10^8 line opacity calculations. This takes ~40 s. So naively, we are already doing great (0.4 µs vs their 1 µs per line). BUT:

I am running on my laptop, on a single M2 performance core. They are using an NVIDIA Tesla K20 GPU. What we really want is something like [time / line / $ of compute], but I have no idea what the appropriate numbers are for this hardware.
They are calculating their lines out to 100 cm^-1 (they also do 10 and 1000). If I am not bungling the math, this corresponds to ~250 angstroms, which is orders of magnitude bigger than my line windows. They discuss the fact that Voigt profiles probably aren't really correct that far out, but they do make a difference to the wavelength-integrated opacity. This may be a problem with Korg and other stellar synthesis codes.
I'm not sure what temperature and pressure they computed their opacity at. This matters a lot, but not as much as you would think given that their line windows are fixed.
I'm not sure how their Voigt approximation compares to ours in speed or accuracy. (Though I noted it in "voigt could be more accurate", #43.)
Finally, conventional wisdom is that pressure broadening of molecular lines is negligible in stars. I still haven't gotten around to verifying this, but assuming it's true, we can always pre-tabulate molecular opacity as a function of temperature. This will be extremely fast. (edit: this is implemented and it is indeed very fast)
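
For anyone who wants to check the cm^-1 to angstrom conversion above, here is a minimal sketch. It assumes the first-order relation |Δλ| ≈ λ² Δν̃ (reasonable when the window is much smaller than the wavelength). The function name and the sample wavelengths are mine, picked for illustration; they are not from the paper or from Korg.

```python
def window_in_angstroms(delta_wavenumber_cm, wavelength_angstrom):
    """Approximate width in angstroms of a wavenumber interval (in cm^-1)
    centered at the given wavelength, using |dlambda| ~ lambda^2 * dnu."""
    wavelength_cm = wavelength_angstrom * 1e-8  # 1 angstrom = 1e-8 cm
    delta_lambda_cm = wavelength_cm**2 * delta_wavenumber_cm
    return delta_lambda_cm * 1e8  # back to angstroms


if __name__ == "__main__":
    # Illustrative wavelengths only (optical through H band).
    for wl in (5000.0, 10000.0, 16000.0):
        width = window_in_angstroms(100.0, wl)
        print(f"100 cm^-1 at {wl:.0f} A is about {width:.0f} A wide")
```

The answer depends strongly on wavelength: 100 cm^-1 is about 25 angstroms at 5000 angstroms and roughly 250 angstroms near 1.6 µm.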

The code for this (and for GPU-accelerated RT!) is on their GitHub: https://github.com/exoclime.

andrew-saydjari · 2023-06-09T16:07:40Z

Benchmarking on M2s makes things complicated. I would benchmark on Intel/AMD. My 2 cents would be that the metric you actually want requires normalizing by the number of cores and the fraction of the GPU memory being used (given that GPUs can be partitioned via MIG configurations now). Beyond that, I always keep in mind the order-of-magnitude estimate that a modern GPU costs 100x a modern CPU. One example of this is the NSF ACCESS calculator: on DARWIN, a GPU-h costs 69x a CPU-h.

andrew-saydjari · 2023-06-18T16:09:39Z

Just because I saw this today, at Harvard's Cannon cluster, an hour on an NVIDIA A100 is 209.4x an hour on an Intel Cascade Lake core. https://docs.rc.fas.harvard.edu/kb/fairshare/
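
A rough sketch of the [time / line / $ of compute] comparison, using only numbers quoted in this thread. Big assumption: the 69x and 209.4x cost ratios are for other hardware (DARWIN GPU-h vs CPU-h, A100 vs a Cascade Lake core), not a Tesla K20 vs an M2 core, so they are stand-ins here. It also ignores the differences in line windows and Voigt approximations noted above. The function names are hypothetical.

```python
def time_per_line_us(total_seconds, n_line_calcs):
    """Wall time per line-opacity calculation, in microseconds."""
    return total_seconds / n_line_calcs * 1e6


def gpu_to_cpu_cost_per_line(gpu_us, cpu_us, gpu_hour_over_cpu_hour):
    """Ratio of GPU cost per line to CPU cost per line.
    Values above 1 mean the CPU run is cheaper per line."""
    return (gpu_us * gpu_hour_over_cpu_hour) / cpu_us


if __name__ == "__main__":
    # Numbers quoted in the thread.
    korg_us = time_per_line_us(40.0, 2e6 * 50)  # ~40 s for ~10^8 calculations
    gpu_us = time_per_line_us(1.0, 1e6)         # ~1 s for 10^6 lines

    # Cost ratios quoted for other hardware, used as stand-ins (assumption).
    for label, ratio in (("DARWIN", 69.0), ("Cannon A100", 209.4)):
        rel = gpu_to_cpu_cost_per_line(gpu_us, korg_us, ratio)
        print(f"{label}: GPU cost per line is ~{rel:.0f}x the CPU cost per line")
```

Taken at face value, these ratios would put the GPU at a few hundred times the cost per line, but that is only as good as the stand-in prices and the unequal benchmarks.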

ajwheeler · 2024-11-29T20:33:00Z

#368
