slinusc/path-vqa-blip


Abstract

This project fine-tunes the BLIP (Bootstrapping Language-Image Pre-training) model for pathological Visual Question Answering (VQA), with a focus on improving accuracy on yes/no questions. The model was trained on the PathVQA dataset (32,799 question-answer pairs from 4,998 pathology images) using the AdamW optimizer, learning-rate scheduling, and mixed-precision training. Hyperparameter optimization with Optuna led to substantial performance improvements: accuracy rose from 0.5164 to 0.8554, precision from 0.5344 to 0.8560, recall from 0.8122 to 0.8805, and the F1 score from 0.6447 to 0.8681.
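
The training setup described above (AdamW, learning-rate scheduling, mixed precision) can be sketched roughly as shown below. This is a minimal illustration, not the repository's actual training script: it assumes the Hugging Face `Salesforce/blip-vqa-base` checkpoint, a single dummy image/question pair in place of a PathVQA data loader, and placeholder hyperparameter values (in the actual project these were chosen via Optuna).

```python
# Minimal sketch of one fine-tuning step for BLIP on a yes/no VQA pair.
# Assumptions: "Salesforce/blip-vqa-base" as the base checkpoint, a dummy
# image/question in place of a PathVQA sample, and placeholder hyperparameters.
import torch
from PIL import Image
from transformers import (
    BlipProcessor,
    BlipForQuestionAnswering,
    get_linear_schedule_with_warmup,
)

device = "cuda" if torch.cuda.is_available() else "cpu"
processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base").to(device)

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)            # placeholder lr
scheduler = get_linear_schedule_with_warmup(optimizer, 100, 10_000)   # placeholder schedule
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))        # mixed precision

# Dummy stand-in for one PathVQA sample (image, question, yes/no answer).
image = Image.new("RGB", (384, 384))
inputs = processor(images=image, text="Is inflammation present?", return_tensors="pt").to(device)
labels = processor(text="yes", return_tensors="pt").input_ids.to(device)

model.train()
with torch.autocast(device_type=device, enabled=(device == "cuda")):
    # BLIP computes a language-modeling loss over the answer tokens.
    loss = model(**inputs, labels=labels).loss
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
scheduler.step()
optimizer.zero_grad()
print(f"loss: {loss.item():.4f}")
```

In a full run, the same step would be repeated over batches of PathVQA question-answer pairs, with Optuna proposing hyperparameter values (such as the learning rate and warmup schedule) for each trial.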

About

Fine-tuning BLIP for pathological visual question answering.
