Accompanying code for the approach presented in *Prompt Tuning for Parameter-efficient Medical Image Segmentation*.
In this work, we
- introduce a deeply prompt-able encoder-decoder architecture (prompt-able UNet, PUNet) that can incorporate additional class-dependent prompt tokens to achieve dense binary and multi-class segmentation (see the sketch after this list)
- contribute architectural components comprising prompt-able shifted window (PSWin) blocks, a heterogeneous bias score generation within the attention scheme, and a weighted similarity aggregation to enable token-dependent class predictions
- propose a contrastive pre-training scheme specifically designed for dense self-supervision via soft assignments to online generated prototypes, establishing anatomical representations while circumventing a hard separation of contrastive attraction and repulsion
- show that "prompting" of the pre-trained and frozen architecture by non-frozen (learned) prompt tokens is sufficient for adaptation to a segmentation downstream task on medical imaging data
- leverage our assignment-based self-supervision scheme to enable the concurrent application of a prompt-dependent segmentation supervision in the pre-training phase, further reducing the performance gap between fully fine-tuned and efficiently adapted variants
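As a rough illustration of the prompting mechanism above, the hedged sketch below prepends learnable, class-dependent prompt tokens to the patch tokens of a window-attention block. The class name `PromptedWindowAttention` and the exact token routing are illustrative assumptions, not the repository's actual PSWin implementation:

```python
import torch
import torch.nn as nn

class PromptedWindowAttention(nn.Module):
    """Hedged sketch: window attention over [prompt tokens || patch tokens].
    Illustrative only -- not the repository's PSWin block."""

    def __init__(self, dim: int, num_heads: int, num_prompts: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Learnable class-dependent prompt tokens (assumed shape: prompts x dim).
        self.prompts = nn.Parameter(torch.zeros(num_prompts, dim))
        nn.init.trunc_normal_(self.prompts, std=0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_windows * batch, tokens_per_window, dim)
        b = x.shape[0]
        p = self.prompts.unsqueeze(0).expand(b, -1, -1)
        z = torch.cat([p, x], dim=1)      # prepend prompts to every window
        z, _ = self.attn(z, z, z)
        return z[:, p.shape[1]:]          # drop prompt outputs, keep patch tokens
```

During downstream adaptation, only `self.prompts` would remain trainable while the rest of the network stays frozen (cf. the phase 2 instructions below).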
The published code contains
- the prompt-able UNet (PUNet) architecture and underlying PSWin blocks (see Figure 1)
- the proposed dense self-supervision scheme based on contrastive prototype assignments (see Figure 2 and the sketch after this list)
- the training routines, including the use of various prompt-dependent predictions in a single batch
- the ability to process 2D as well as 3D imaging data (tested FOVs are included in the config file)
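To give a feel for the self-supervision scheme listed above, here is a minimal, hedged sketch of a dense loss based on soft assignments to online prototypes; the temperatures, the softmax-based soft targets, and the function name are assumptions and simplifications of the actual scheme:

```python
import torch
import torch.nn.functional as F

def soft_prototype_loss(student: torch.Tensor,
                        teacher: torch.Tensor,
                        prototypes: torch.Tensor,
                        tau_s: float = 0.1,
                        tau_t: float = 0.05) -> torch.Tensor:
    """Hedged sketch of a dense prototype-assignment loss.

    student, teacher: (N, D) dense embeddings, one row per voxel/pixel
    prototypes:       (K, D) online generated prototype vectors
    """
    student = F.normalize(student, dim=-1)
    teacher = F.normalize(teacher, dim=-1)
    protos = F.normalize(prototypes, dim=-1)

    s_scores = student @ protos.T / tau_s    # (N, K) cosine scores
    t_scores = teacher @ protos.T / tau_t    # (N, K)

    # Soft teacher targets avoid a hard positive/negative split:
    # every prototype receives a graded (soft) assignment.
    targets = t_scores.softmax(dim=-1).detach()
    return -(targets * s_scores.log_softmax(dim=-1)).sum(dim=-1).mean()
```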
This code is provided as is. It builds upon the PyTorch Lightning framework; where possible, MONAI functionality has been used.
See the data pre-processing and data gathering steps for how to prepare data, e.g. from TCIA.
To start pre-training (phase 1), run:

```bash
python3 ./src/main.py --gpus 1 --batch_size 8 --architecture wip --dataset tcia_btcv --dir_images /path/to/my/data --dir_masks /path/to/my/labels
```
Valid configuration variants are included in the config file, which is used by the phase 1 shell script.
For the loss configuration, use
- `self` for self-supervision,
- `meta` for segmentation (semi-)supervision,
- `meta_self` for joint supervision (see the sketch below),
- and `_noninstructed` for non-prompt-based architecture variants.
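Purely as an illustration, the variants could be related as follows; `lambda_seg` and the function itself are hypothetical, not taken from the configuration:

```python
# Hypothetical sketch of how the loss variants might relate (not the repo's code):
# 'self'      -> L = L_self                        (dense self-supervision)
# 'meta'      -> L = L_seg                         (prompt-dependent segmentation)
# 'meta_self' -> L = L_self + lambda_seg * L_seg   (joint supervision)
def total_loss(l_self, l_seg, variant: str, lambda_seg: float = 1.0):
    if variant == "self":
        return l_self
    if variant == "meta":
        return l_seg
    if variant == "meta_self":
        return l_self + lambda_seg * l_seg
    raise ValueError(f"unknown loss variant: {variant}")
```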
Have a look at the flags of the main module for more details.
To adapt a pre-trained checkpoint to a downstream segmentation task by prompting (phase 2), run:

```bash
python3 ./src/main.py --gpus 1 --batch_size 8 --architecture wip --dataset tcia_btcv --dir_images /path/to/my/data --dir_masks /path/to/my/labels --ckpt /path/to/my/ckpt --no_overwrite --cold_start --downstream --adaptation_variant prompting --selective_freezing --label_indices_base 1 --label_indices_downstream_active 2 --max_epochs 100
```
Valid configuration variants are included in the config file, which is used by the phase 2 shell script.
New classes can be provided via their class index, e.g. `--label_indices_downstream_active 2`.
Have a look at the flags of the main module for more details.
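Conceptually, `--adaptation_variant prompting --selective_freezing` amounts to freezing the pre-trained weights and leaving only the prompt tokens trainable. A hedged sketch (the `"prompt"` name filter is an assumed naming convention, not the repository's):

```python
def freeze_all_but_prompts(model):
    """Hedged sketch of prompt tuning: freeze everything except prompt tokens."""
    for name, param in model.named_parameters():
        # Assumed convention: prompt parameters carry "prompt" in their name.
        param.requires_grad = "prompt" in name
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"trainable parameters: {trainable}")
```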
To test a trained checkpoint, run:

```bash
python3 ./src/main.py --gpus 1 --mode test --architecture wip --dataset tcia_btcv --dir_images /path/to/my/data --dir_masks /path/to/my/labels --ckpt /path/to/my/ckpt --no_overwrite --cold_start
```
Figure 1: Schematic illustration of the proposed prompt-able UNet (PUNet). The network consists of an encoder with down-convolutions and a decoder with linear upsampling layers. A depth of 5 levels is chosen, with 32, 64, 128, 256, and 384 hidden channels.
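The layout stated in the Figure 1 caption could be skeletonized roughly as follows; the stem, kernel sizes, and single-channel input are assumptions, and the PSWin blocks at each level are omitted:

```python
import torch.nn as nn

channels = [32, 64, 128, 256, 384]          # 5 levels, as stated in Figure 1

stem = nn.Conv3d(1, channels[0], kernel_size=3, padding=1)  # assumed 1-channel input
# Down-convolutions between levels (PSWin blocks at each level are omitted here).
downs = nn.ModuleList(
    nn.Conv3d(c_in, c_out, kernel_size=2, stride=2)
    for c_in, c_out in zip(channels[:-1], channels[1:])
)
# Decoder: linear (trilinear in 3D) upsampling back up the pyramid.
up = nn.Upsample(scale_factor=2, mode="trilinear", align_corners=False)
```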
Figure 2: Input image slices
Figure 3: a/f) Exemplary slices of the TCIA/BTCV and CT-ORG datasets, with annotated masks shown in shades of blue; b-e) augmented student views with masked regions or strong contrast adjustments; g-j) respective teacher views with overlays of the cosine similarity of the predicted teacher embedding.
Figure 4: Visualization of cosine similarities between predicted teacher embeddings
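Overlays like those in Figures 3 and 4 can be reproduced with a few lines of PyTorch; the query-location convention below is an assumption:

```python
import torch
import torch.nn.functional as F

def cosine_similarity_map(embeddings: torch.Tensor, query_yx: tuple) -> torch.Tensor:
    """Hedged sketch: cosine similarity of every spatial embedding to one query location.

    embeddings: (D, H, W) dense embedding map (e.g. a predicted teacher embedding slice)
    query_yx:   (y, x) location whose embedding serves as the query
    """
    emb = F.normalize(embeddings, dim=0)          # unit-norm channel vectors
    q = emb[:, query_yx[0], query_yx[1]]          # (D,) query embedding
    return torch.einsum("dhw,d->hw", emb, q)      # (H, W) similarities in [-1, 1]
```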