tangfy97/Fine-grained-Image-Text-Alignment

This repo is based on the SCAN (Stacked Cross Attention Network) model proposed by Kuang-Huei Lee, Xi Chen, Gang Hua, Houdong Hu and Xiaodong He.

The bottom-up feature repo was used to extract image features, and its pre-trained models were used for the extraction tasks.

This repo does NOT contain the model files or the extracted features; if you want to download them, please use this link to download the full (ready-to-run) version of the project.

Structure of the project:

  • From the captions and their corresponding image labels, we can extract the corresponding image names. This extraction is done with image-text/egyptian_convert.py, which creates one .txt file holding the captions for each image. For the testing data, these .txt files are saved in the phrase directory and the corresponding image features are saved under the features directory, which yields egyptian-test.npy. In the same way, the training data are saved under the data/phrase_train and features_train directories, and data/train_phrase.npy is obtained for training. (A minimal sketch of this conversion step is the first example after this list.)

  • Use image-text/preprocess.ipynb to obtain data/vocab.json. We use the API proposed by Handler et al. to extract noun phrases. (A stand-alone sketch of the vocabulary-building step is the second example after this list.)

  • For the training process, we can simply modify the PrecompDataset class in image-text/data.py to change the related processing methods. (A sketch of such a dataset class is the third example after this list.)

  • For the testing process, we just need to load the saved model and then run the image-text/evaluation.py script. (See the fourth example after this list.)

  • The original dataset of Egyptian art images is saved under artworks.

  • For training and testing parameters, check the descriptions and instructions in evaluation.py for testing and train.py for training.
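
A minimal sketch of the caption-to-phrase conversion described in the first bullet, assuming a hypothetical data/captions.json that maps image file names to captions and per-image feature files under data/features; the exact file names and layout used by image-text/egyptian_convert.py may differ:

    # Sketch only: captions.json, the output paths and the per-image feature
    # files are assumptions, not the exact layout of egyptian_convert.py.
    import json
    import os
    import numpy as np

    captions = json.load(open("data/captions.json"))   # {"img_001.jpg": "a caption ...", ...}
    os.makedirs("data/phrase", exist_ok=True)

    features = []
    for image_name, caption in captions.items():
        stem = os.path.splitext(image_name)[0]
        # one .txt file per image, holding its caption
        with open(os.path.join("data/phrase", stem + ".txt"), "w") as f:
            f.write(caption.strip() + "\n")
        # precomputed region features for this image (e.g. 36 x 2048 from bottom-up attention)
        features.append(np.load(os.path.join("data/features", stem + ".npy")))

    # stack everything into one array, analogous to data/egyptian-test.npy
    np.save("data/egyptian-test.npy", np.stack(features))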
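
A stand-alone sketch of the vocabulary-building step behind data/vocab.json. Whitespace tokenization and the frequency threshold below are assumptions; the notebook uses the noun-phrase API of Handler et al. instead of a plain split():

    # Sketch only: simple whitespace tokens stand in for the extracted noun phrases.
    import glob
    import json
    from collections import Counter

    counter = Counter()
    for path in glob.glob("data/phrase/*.txt"):
        with open(path) as f:
            counter.update(f.read().lower().split())

    # reserve the usual special tokens, then index the frequent words
    vocab = {"<pad>": 0, "<start>": 1, "<end>": 2, "<unk>": 3}
    for word, count in counter.most_common():
        if count >= 2:
            vocab[word] = len(vocab)

    with open("data/vocab.json", "w") as f:
        json.dump(vocab, f)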
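
A sketch of the kind of PrecompDataset change referred to above: a dataset that reads the stacked feature .npy file and the per-image caption .txt files. The file layout and the vocab lookup are assumptions; check image-text/data.py for the actual class:

    # Sketch only: the file layout and vocab handling are assumptions made for
    # illustration; the real PrecompDataset lives in image-text/data.py.
    import glob
    import json
    import numpy as np
    import torch
    from torch.utils.data import Dataset

    class PrecompDataset(Dataset):
        def __init__(self, feature_path, phrase_dir, vocab_path):
            self.images = np.load(feature_path)          # (n_images, n_regions, 2048)
            self.captions = [open(p).read().split()
                             for p in sorted(glob.glob(phrase_dir + "/*.txt"))]
            self.vocab = json.load(open(vocab_path))

        def __len__(self):
            return len(self.captions)

        def __getitem__(self, index):
            image = torch.tensor(self.images[index], dtype=torch.float32)
            tokens = [self.vocab.get(w, self.vocab["<unk>"]) for w in self.captions[index]]
            caption = torch.tensor([self.vocab["<start>"]] + tokens + [self.vocab["<end>"]])
            return image, caption, index

    # e.g. PrecompDataset("data/train_phrase.npy", "data/phrase_train", "data/vocab.json")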
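
For testing, a minimal sketch assuming a SCAN-style evaluation.evalrank(model_path, data_path, split) helper; the checkpoint and data paths below are placeholders:

    # Sketch only: checkpoint name and data path are placeholders.
    from evaluation import evalrank

    evalrank("runs/chinese_artworks_scan/log/model_best.pth.tar",
             data_path="data/", split="test")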

Feature extraction command:

python bottom-up-features/extract_features.py --image_dir artworks/test --out_dir artworks/features --cfg bottom-up-features/cfgs/faster_rcnn_resnet101.yml --model bottom-up-features/models/bottomup_pretrained_10_100.pth

Some remarks:

  1. The original paper used 5 captions per image; I changed this to 1 in this case.
  2. Run the following command to train on the dataset:
python train.py --data_name chinese_artworks --logger_name runs/chinese_artworks_scan/log --model_name runs/chinese_artworks_scan/log --max_violation --bi_gru --img_dim 2048
  3. Pre-trained models are stored under the ./runs/ folder.

About

Repo for fine-grained multimodal retrieval for artworks
