Privacy Preserving AI/ML for Drug Discovery
ChemXor is an open source library that provides a set of pre-tuned model architectures for evaluating inputs encrypted with fully homomorphic encryption (FHE). These models can be trained as normal PyTorch models. ChemXor also provides convenience functions to quickly host and query these models as a service with strong privacy guarantees for the end user. It is built on top of TenSEAL and PyTorch.
A cryptosystem that supports arbitrary computation on ciphertexts is known as fully homomorphic encryption (FHE). Such a scheme enables the construction of programs for any desirable functionality, which can be run on encrypted inputs to produce an encryption of the result. Since such a program need never decrypt its inputs, it can be run by an untrusted party without revealing its inputs and internal state. Fully homomorphic cryptosystems have great practical implications in the outsourcing of private computations. (Wikipedia)
Using FHE, one can compute on encrypted data, without learning anything about the data. This enables novel privacy preserving interactions between actors in the context of machine learning.
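To make the idea of computing on ciphertexts concrete, here is a minimal sketch built on a toy Paillier cryptosystem. Several assumptions apply: Paillier is only additively homomorphic (it is not FHE), the primes are far too small to be secure, and none of the function names below come from ChemXor or TenSEAL. The sketch only illustrates that a party holding ciphertexts can compute a sum without ever seeing the plaintexts.

```python
import math
import secrets


def keygen(p=1009, q=1013):
    """Toy key generation with tiny, insecure primes (illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse, valid because g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) under the public key n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    """Recover the plaintext from ciphertext c with the private key."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)


n, priv = keygen()
c_sum = add_encrypted(n, encrypt(n, 12), encrypt(n, 30))  # no decryption here
print(decrypt(n, priv, c_sum))  # 42
```

An FHE scheme such as the ones exposed by TenSEAL extends this idea so that multiplications can also be carried out on ciphertexts, which is what makes neural network inference on encrypted inputs possible.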
See the notebook directory for tutorials.
Create a conda environment
conda create -n chemxor python=3.9
conda activate chemxor
Clone the ChemXor repository and install it
git clone
cd chemxor
python -m pip install -e .
At the moment, one can choose from 3 pre-tuned models.
- OlindaNetZero: Slimmest model with one convolution and 3 linear layers
- OlindaNet: Model with two convolutions and 4 linear layers
- OlindaNetOne: Model with four convolutions and 4 linear layers
These models accept a 32 x 32 input and can be configured to produce a single output or multiple outputs.
from chemxor.models import OlindaNetZero, OlindaNetOne, OlindaNet
# model for regression
model = OlindaNetZero(output = 1)
# model for classification
model = OlindaNetZero(output = 2)
The model is a normal PyTorch Lightning module, which is compatible with PyTorch's nn.Module.
ChemXor provides two generic PyTorch Lightning DataModules (regression and classification) that can be used to train and evaluate the models. These DataModules expect raw data as CSV files with two columns in the following order: value, SMILES.
from import OlindaCDataModule, OlindaRDataModule
dm_regression = OlindaRDataModule(csv_path="path/to/csv")
# Use the threshold value to automatically create categorical
# classes from the value column of the CSV
dm_classification = OlindaCDataModule(csv_path="path/to/csv", threshold=[0.5])
The DataModules take care of converting the SMILES input to 32 x 32 images.
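As an illustration of the thresholding step described above, the following sketch turns a continuous value column into class labels. It is not ChemXor's implementation: the helper `to_class`, the header row in the CSV, and the choice that a value equal to a threshold falls into the upper class are all assumptions made for this example.

```python
import csv
import io
from bisect import bisect_right


def to_class(value, thresholds):
    """Return the index of the bin that value falls into (hypothetical helper)."""
    return bisect_right(sorted(thresholds), value)


# Hypothetical CSV content with the two expected columns: value, SMILES
raw_csv = """value,SMILES
0.12,CCO
0.50,c1ccccc1
0.93,CC(=O)O
"""

rows = list(csv.DictReader(io.StringIO(raw_csv)))
labels = [to_class(float(row["value"]), [0.5]) for row in rows]
print(labels)  # [0, 1, 1]
```

With a single threshold this yields two classes, which matches a model created with two outputs. Passing more thresholds, for example `[0.3, 0.7]`, would yield three classes under the same assumptions.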
It is recommended to use a PyTorch Lightning trainer to train the models, although a normal PyTorch training loop can also be used.
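import pytorch_lightning as pl
# Save the best 3 checkpoints based on validation loss
# (assumption: the model logs its validation loss under the name "val_loss")
checkpoint_callback = pl.callbacks.ModelCheckpoint(save_top_k=3, monitor="val_loss", mode="min")
trainer = pl.Trainer(callbacks=[checkpoint_callback], accelerator="auto")
trainer.fit(model, datamodule=data_module)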
Model performance can be monitored using TensorBoard.
# on the CLI, point towards the logs folder
tensorboard --logdir lightning_logs/version_1
After training, the models can be wrapped using their specific FHE wrappers to process FHE inputs. The FHE wrappers take care of TenSEAL context parameters and key management.
from chemxor.models import OlindaNetZero, OlindaNetOne, OlindaNet
from chemxor.models import FHEOlindaNetZero, FHEOlindaNetOne, FHEOlindaNet
model = OlindaNetZero(output = 1)
fhe_model = FHEOlindaNetZero(model=model)
The DataModules can generate PyTorch dataloaders that produce encrypted inputs for the model.
from import OlindaCDataModule, OlindaRDataModule
dm_regression = OlindaRDataModule(csv_path="path/to/csv")
enc_data_loader = dm_regression.enc_dataloader(context=fhe_model.enc_context)
enc_sample = next(iter(enc_data_loader))
The FHE models are also partitioned to control multiplicative depth, so the forward function is modified to accept a step parameter. For testing, the FHE model can be evaluated locally as follows:
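from chemxor.utils import prepare_fhe_input
output = enc_sample[0]
for step in range(fhe_model.steps):
    # evaluate one partition of the model on the encrypted input
    output = fhe_model(output, step)
    # decrypt the intermediate result and prepare it as input for the next step
    dec_out = output.decrypt()
    output = prepare_fhe_input(...)  # arguments omitted here
# final decrypted output
decrypted_output = output.decrypt()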
This process can be automated using a utility function provided by ChemXor:
from chemxor.utils import evaluate_fhe_model
decrypted_output = evaluate_fhe_model(fhe_model, enc_sample[0], decrypt = True)
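The reason for evaluating the model step by step can be pictured with a small simulation. This is a sketch under stated assumptions, not TenSEAL or ChemXor code: `MockCiphertext` is a stand-in that stores the value in the clear and only counts multiplications, and `MAX_DEPTH` is an arbitrary budget. In a real scheme the available depth is bounded by the encryption parameters, and refreshing a ciphertext requires the key holder to decrypt and re-encrypt it.

```python
from dataclasses import dataclass

MAX_DEPTH = 2  # assumed number of multiplications a fresh ciphertext supports


@dataclass
class MockCiphertext:
    """Stand-in for an encrypted value that tracks consumed depth."""
    value: float
    depth_used: int = 0

    def mul(self, factor):
        if self.depth_used >= MAX_DEPTH:
            raise RuntimeError("multiplicative depth exhausted")
        return MockCiphertext(self.value * factor, self.depth_used + 1)


def encrypt(value):
    return MockCiphertext(value)  # fresh ciphertext, full depth budget


def decrypt(ct):
    return ct.value


# A toy "model" split into two partitions, each within the depth budget
steps = [
    lambda ct: ct.mul(2.0).mul(3.0),
    lambda ct: ct.mul(0.5).mul(4.0),
]


def evaluate_partitioned(x, steps):
    ct = encrypt(x)
    for step in steps:
        ct = step(ct)              # model owner: works on ciphertext only
        ct = encrypt(decrypt(ct))  # key holder: refresh between steps
    return decrypt(ct)


def evaluate_unpartitioned(x, steps):
    ct = encrypt(x)
    for step in steps:
        ct = step(ct)  # no refresh, so the budget runs out
    return decrypt(ct)


print(evaluate_partitioned(1.5, steps))  # 18.0
```

Calling `evaluate_unpartitioned(1.5, steps)` raises a `RuntimeError` on the third multiplication, which is the situation that partitioning the forward pass is meant to avoid.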
FHE models can be served in the form of a Flask app as follows:
from chemxor.service import PartitionNetServer
fhe_model_server = PartitionNetServer(fhe_model)
if __name__ == "__main__":
ChemXor's predefined models can also be served using the CLI:
chemxor serve olinda|olinda_zero|olinda_one
chemxor query [model url] [molecule SMILES]
We use Poetry to manage project dependencies. Use Poetry to install the project in editable mode.
poetry install
This project has been supported by a Biopharma Speed Grant from Merck KGaA.
The Ersilia Open Source Initiative is a non-profit organization (1192266) whose mission is to equip labs, universities and clinics in low- and middle-income countries (LMIC) with AI/ML tools for infectious disease research.
Help us achieve our mission!
This project is licensed under the GNU AFFERO GENERAL PUBLIC LICENSE Version 3. The direct and indirect dependencies of the project are licensed as follows:
