Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move from Henkelman Bader to PyBader #711

Open
4 tasks
SWeav02 opened this issue Jan 23, 2025 · 3 comments
Open
4 tasks

Move from Henkelman Bader to PyBader #711

SWeav02 opened this issue Jan 23, 2025 · 3 comments
Labels
enhancement New feature or request

Comments

@SWeav02
Copy link
Contributor

SWeav02 commented Jan 23, 2025

Describe the desired feature

Rather than using the henkelman group's bader program which requires either docker or local install, it might be useful to switch to pybader.

Additional context

Currently, the bader workflows we have rely on the Henkelman groups fortran implementation. This means users need to use either docker or a local install. The pybader package uses essentially the same implementation as the Henkelman group, but exclusively using python packages (numpy, numba, tqdm, and pandas). This also means that results can be easily read from the resulting python object rather than needing to read results from output files.

I've already implemented pybader in the Grid class I made for BadELF:

def to_pybader(self):
"""
Returns a Bader object from pybader.
"""
atoms = self.structure.cart_coords
lattice = self.matrix
density = {"charge": self.total}
file_info = {"voxel_offset": np.array([0, 0, 0])}
bader = Bader(density, lattice, atoms, file_info)
return bader
def run_pybader(self, threads: int = 1):
"""
Convenience class for running zero-flux voxel assignment using pybader.
Returns a pybader Bader class object with the assigned voxels
Args:
cores (int):
The number of threads to use when running the Bader algorithm
"""
logging.info("Running Bader")
bader = self.to_pybader()
bader.load_config("speed")
bader.threads = threads
bader.spin_flag = True # loading speed config resets all config vars
bader.volumes_init()
bader.bader_calc()
bader.bader_to_atom_distance()
bader.refine_volumes(bader.atoms_volumes)
bader.threads = 1
bader.min_surface_distance()
return bader

The Grid class would also make it very easy to create a python only implementation of the combine_chgars workflow so that we can remove the extra workflow.

The main issues I foresee are that pybader is not longer actively maintained and does not by default use a reference file like the henkelman group's implementation. I think the first of these is likely to be ok unless one of the four required packages changes drastically. The second is relatively easy to manage because the voxel assignments are stored in a pybader object (bader.atom_volumes). This would make it easy to run the algorithm on one Grid object and then sum the voxels in another Grid object.

@jacksund let me know if you have any issues with this. Otherwise I can work on it when I have some free time.

To-do items

As I see it, to implement this we would need to:

  • Move the Grid class to a more accessible location (toolkit.base_data_types?)
  • Add convenience functions to the Grid class for summing files and running bader
  • Adjust existing workflows to use the Grid class
  • Remove code related to reading the ACF.dat
@SWeav02 SWeav02 added the enhancement New feature or request label Jan 23, 2025
@jacksund
Copy link
Owner

jacksund commented Jan 23, 2025

You might want to consider a couple of things:

The main issues I foresee are that pybader is not longer actively maintained ...
... I think the first of these is likely to be ok unless one of the four required packages changes drastically

Even a super tiny change to a single dep will cause problems. It would mean you're now pinned to an old pandas/numpy/numba version forever unless you fork the repo or get ahold of the maintainer to fix it. There is also a high chance of a future python version breaking a feature. You can see in the pybade repo they've only tested up to Python 3.8, which has already hit its end of life.

So if you do decide to go with it, I won't be adding pybader as a default dependency of simmate. It instead will be a dependency specifically for your app simmate.apps.badelf. This means users will still have to manually install pybader via pip or conda when setting up the app, and you'll add this step to the app docs.

While simmate does have a ton of dependencies already, I've slowly been cutting these down. In fact I just removed django_pandas last week #709 for similar reasons as to why pybader isn't the best idea. I'll be removing django-crispy-forms, django-contrib-comments, dj-database-url, djangorestframework, and more at some point too.

In extreme cases like yours, you can fork pybader and manage it yourself like I am now doing with django_unicorn #702 -- but that is a big commitment. Make sure you fully understand their source code before considering that. This could also wait until pybader "breaks" for you

Remove code related to reading the ACF.dat

please don't do this! Adding support for pybader does not mean you should remove support for the Henkelman code. Leave those features in place and build a config/setting that lets you choose the backend -- or just make a new app entirely if they don't mesh well.

I'm good example of someone that's used to the Henkelman code and would like to use those workflows, even if your badelf + new bader workflows go another route

The Grid class would also make it very easy to create a python only implementation of the combine_chgars workflow so that we can remove the extra workflow.

Sounds good to me. Also Pybader is one option, but you can do this with pymatgen too. We talked about a similar switch a while back in #565. And I actually made a note for myself to depreciate the combine chgcars workflow here:

# TODO: The chgsum.pl script will be replaced with a simple python function
# that just sums the two files. It might not be as fast but it removes one
# executable file from having to be in the user's path. So in the future, this
# Task will be depreciated/removed into the BaderAnalysis.setup method.

Move the Grid class to a more accessible location (toolkit.base_data_types?)

Considering everything else above, it'll only make it to the toolkit if its based on pymatgen. There's nothing wrong with keeping it in the bader app though -- I'd vote to keep it there and then move it to the toolkit if other apps want to make use of it as well.

@jacksund
Copy link
Owner

Even with all those points above! badelf is still your app -- so you can go full steam ahead on this. I'll only push back on things if you start removing features from the bader app and/or start modifying code outside of the badelf app. In terms of me asking to support both Henk bader + pybader within badelf, that's effectively me making a request as a user of your badelf app (and you can ultimately decide not to support it)

@SWeav02
Copy link
Contributor Author

SWeav02 commented Jan 24, 2025

Some good points here! I didn't realize just how outdated pybader was. Unfortunately for me, it's practically essential for the BadELF app because I need the voxel assignments from the bader method to be stored in an array that I use for other stuff down the line. As far as I've been able to figure out, the Henkelman code can't print this information or I would've stuck with it 😢. If pybader ever does break, I might just write my own version of the code that's trimmed down since a lot of what they do isn't necessary for what I need.

Considering everything else above, it'll only make it to the toolkit if its based on pymatgen.

This is on me for not mentioning this, but I actually moved the Grid class to be based on pymatgen's VolumetricData class a while ago. Mine basically just adds a bunch of convenience functions and I use it all the time outside of BadELF. My one gripe, and the reason I didn't inherit from pymatgen initially, is that loading larger files takes an absurd amount of time (like minutes compared to seconds from my previous method). I want to try and add a new loading method at some point, but I'll need whatever method I write to still populate the important stuff so that the pymatgen methods don't break.

The Grid class would also make it very easy to create a python only implementation of the combine_chgars workflow so that we can remove the extra workflow.

Based on this thread, I won't bother moving anything over to pybader. I will probably still replace the combine_chgcars workflow with a simple static function utility in the Grid class, probably in a PR soon. So if you don't mind, lets leave this issue open just until I finish that as a reminder for me!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants