This repository contains a simple AutoML tool based on TabPFN to quickly perform binary classification tasks.
Please follow these steps to install mini-automl:
# create a conda environment
conda create -n miniautoml python=3.10
conda activate miniautoml
# install TabPFN for CPU usage
pip install torch --index-url
git clone
pip install TabPFN/.
rm -rf TabPFN
# download fragment embeddings
git clone
pip install -e fragment-embedding/.
# clone the current repository
git clone
cd mini-automl
python -m pip install -e .
from fragmentembedding import FragmentEmbedder
from miniautoml import train_binary_classifier, get_example
df = get_example()
smiles_list = df["smiles"].tolist()
y = df["signature_2"].tolist()
X = FragmentEmbedder().transform(smiles_list)
mdl = train_binary_classifier(X[:-10], y[:-10], n_splits=5)
The code above automatically performs stratified shuffle splits to estimate model performance. If you only want to train the model, set n_splits=None.
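For readers unfamiliar with the term, the snippet below is a minimal, standard-library-only sketch of what a stratified shuffle split does: each split shuffles the samples and holds out roughly the same fraction of every class. It is an illustration only and not necessarily how mini-automl implements it internally. The function name `stratified_shuffle_splits`, its `test_size` and `seed` parameters, and the toy labels are all hypothetical. In practice, scikit-learn's `StratifiedShuffleSplit` provides the same idea.

```python
import random
from collections import defaultdict


def stratified_shuffle_splits(y, n_splits=5, test_size=0.2, seed=0):
    """Yield (train_idx, test_idx) pairs that preserve class proportions.

    Hypothetical helper for illustration; not part of mini-automl.
    """
    rng = random.Random(seed)
    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)
    for _ in range(n_splits):
        train_idx, test_idx = [], []
        for indices in by_class.values():
            shuffled = indices[:]
            rng.shuffle(shuffled)
            # Hold out roughly test_size of each class, at least one sample.
            n_test = max(1, round(len(shuffled) * test_size))
            test_idx.extend(shuffled[:n_test])
            train_idx.extend(shuffled[n_test:])
        yield sorted(train_idx), sorted(test_idx)


# Toy labels: 8 negatives and 2 positives (made-up data).
y = [0] * 8 + [1] * 2
splits = list(stratified_shuffle_splits(y, n_splits=5, test_size=0.5))
for train_idx, test_idx in splits:
    # Print train size, test size, and number of positives in the test set.
    print(len(train_idx), len(test_idx), sum(y[i] for i in test_idx))
```

Because the hold-out is drawn per class, every test set in this toy example contains exactly one of the two positives, which is the property that makes the performance estimate stable on imbalanced labels.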
To pretrain promiscuity and signature models, execute the following:
python scripts/
python scripts/
python scripts/
python scripts/
python scripts/
python scripts/
This work was done by the Georg Winter Lab at CeMM, Vienna.