This project fine-tunes the BLIP (Bootstrapping Language-Image Pre-training) model for pathological Visual Question Answering (VQA), with a focus on improving accuracy on pathology yes/no questions. The model was trained on the PathVQA dataset (32,799 question-answer pairs from 4,998 pathology images) using the AdamW optimizer, learning-rate scheduling, and mixed-precision training. Hyperparameter optimization with Optuna yielded substantial gains: accuracy rose from 0.5164 to 0.8554, precision from 0.5344 to 0.8560, recall from 0.8122 to 0.8805, and F1 score from 0.6447 to 0.8681.
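
To make the recipe concrete, below is a minimal sketch of how BLIP fine-tuning with AdamW, a linear learning-rate schedule, mixed precision, and an Optuna search could be wired up with PyTorch and Hugging Face Transformers. The `Salesforce/blip-vqa-base` checkpoint, the `flaviagiammarino/path-vqa` dataset mirror, the batch size, warmup steps, and the Optuna search ranges are illustrative assumptions, not values taken from this repository.

```python
# Hedged sketch: BLIP VQA fine-tuning on PathVQA with AdamW, linear LR schedule,
# mixed precision, and an Optuna hyperparameter search. Checkpoint/dataset names
# and search ranges are assumptions, not the repository's exact configuration.
import torch
import optuna
from torch.utils.data import DataLoader
from datasets import load_dataset
from transformers import (
    BlipProcessor,
    BlipForQuestionAnswering,
    get_linear_schedule_with_warmup,
)

device = "cuda" if torch.cuda.is_available() else "cpu"
processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
dataset = load_dataset("flaviagiammarino/path-vqa")  # assumed HF mirror of PathVQA


def collate(examples):
    """Encode images + questions; tokenized answers become the labels."""
    enc = processor(
        images=[ex["image"].convert("RGB") for ex in examples],
        text=[ex["question"] for ex in examples],
        padding=True,
        return_tensors="pt",
    )
    enc["labels"] = processor(
        text=[ex["answer"] for ex in examples], padding=True, return_tensors="pt"
    ).input_ids
    return enc


train_loader = DataLoader(dataset["train"], batch_size=16, shuffle=True, collate_fn=collate)
val_loader = DataLoader(dataset["validation"], batch_size=16, collate_fn=collate)


def run_trial(lr, weight_decay, epochs=1):
    """Fine-tune one model with the sampled hyperparameters; return val accuracy."""
    model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base").to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=weight_decay)
    scheduler = get_linear_schedule_with_warmup(
        optimizer, num_warmup_steps=100, num_training_steps=epochs * len(train_loader)
    )
    scaler = torch.cuda.amp.GradScaler()

    model.train()
    for _ in range(epochs):
        for batch in train_loader:
            batch = {k: v.to(device) for k, v in batch.items()}
            optimizer.zero_grad()
            with torch.cuda.amp.autocast():  # mixed-precision forward/backward
                loss = model(**batch).loss
            scaler.scale(loss).backward()
            scaler.step(optimizer)
            scaler.update()
            scheduler.step()

    # Exact-match accuracy on generated answers (mostly "yes"/"no" in PathVQA).
    model.eval()
    correct = total = 0
    for batch in val_loader:
        labels = batch.pop("labels")
        batch = {k: v.to(device) for k, v in batch.items()}
        with torch.no_grad():
            generated = model.generate(**batch)
        preds = processor.batch_decode(generated, skip_special_tokens=True)
        gold = processor.batch_decode(labels, skip_special_tokens=True)
        correct += sum(p.strip() == g.strip() for p, g in zip(preds, gold))
        total += len(gold)
    return correct / total


def objective(trial):
    lr = trial.suggest_float("lr", 1e-6, 5e-5, log=True)  # assumed search range
    wd = trial.suggest_float("weight_decay", 0.0, 0.1)    # assumed search range
    return run_trial(lr, wd)


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=10)
print(study.best_params, study.best_value)
```

Precision, recall, and F1 on the yes/no subset can be computed from the same decoded predictions, mirroring the metrics reported above.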