This repo is based on the SCAN model proposed by Kuang-Huei Lee, Xi Chen, Gang Hua, Houdong Hu, and Xiaodong He.
The bottom-up feature repo was used to extract image features, and its pre-trained models were used for the extraction tasks.
This repo does NOT contain the model files or the extracted features; if you want to download them, please use this link to download the full (ready-to-run) version of the project.
- According to the captions and the corresponding image labels, we can extract the corresponding image names. This extraction is done with `image-text/egyptian_convert.py`, which creates a `.txt` file of captions for each image. For the testing data, these `.txt` files are saved in the `phrase` directory, their corresponding image features are saved under the `features` directory, and `egptian-test.npy` is obtained. In the same way, the training data are saved under the `data/phrase_train` and `features_train` directories, and `data/train_phrase.npy` is obtained for training.
- Use `image-text/preprocess.ipynb` to obtain `data/vocab.json`. We use the API proposed by Handler et al. to extract noun phrases.
- For the training process, we can simply modify the `PrecompDataset` class in `image-text/data.py` to change the related processing methods; a minimal dataset sketch is given after this list.
- For the testing process, we just need to load our saved model and then run the `image-text/evaluation.py` script.
- The original dataset of Egyptian art images is saved under `artworks`.
- For the training and testing parameters, check the descriptions and instructions in `evaluation.py` for testing and `train.py` for training.
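As a reference for the `PrecompDataset` modification mentioned above, here is a minimal sketch of a dataset that pairs precomputed region features with phrase captions. The file layout (a `.npy` of image-name/phrase pairs, one feature `.npy` per image) and the tokenization are assumptions based on the paths listed above, not the exact code in `image-text/data.py`.

```python
# Minimal sketch of a PrecompDataset-style loader.
# Assumptions: one .npy of (image_name, phrase) pairs and one directory of
# precomputed image features; the real class in image-text/data.py may differ.
import os
import numpy as np
import torch
from torch.utils.data import Dataset


class PrecompPhraseDataset(Dataset):
    """Pairs precomputed image region features with tokenized phrases."""

    def __init__(self, feature_dir, phrase_npy, vocab):
        # phrases: assumed to be an array of (image_name, phrase) pairs
        self.phrases = np.load(phrase_npy, allow_pickle=True)
        self.feature_dir = feature_dir
        self.vocab = vocab  # assumed dict: token -> index

    def __len__(self):
        return len(self.phrases)

    def __getitem__(self, index):
        image_name, phrase = self.phrases[index]
        # Each image is assumed to have a (num_regions, 2048) feature array.
        feats = np.load(os.path.join(self.feature_dir, image_name + ".npy"))
        tokens = [self.vocab.get(w, self.vocab.get("<unk>", 0))
                  for w in phrase.lower().split()]
        return torch.from_numpy(feats).float(), torch.tensor(tokens), index
```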
To extract image features for the test images, run:

```
python bottom-up-features/extract_features.py --image_dir artworks/test --out_dir artworks/features --cfg bottom-up-features/cfgs/faster_rcnn_resnet101.yml --model bottom-up-features/models/bottomup_pretrained_10_100.pth
```
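To sanity-check the extraction output, the snippet below loads a few of the per-image feature files. It assumes the extractor writes one `.npy` array per image into `artworks/features` with 2048-dimensional region features, which would match the `--img_dim 2048` flag used for training below.

```python
# Quick sanity check of the extracted features (assumes one .npy per image
# in artworks/features with 2048-dimensional region features).
import glob
import numpy as np

for path in sorted(glob.glob("artworks/features/*.npy"))[:5]:
    feats = np.load(path)
    print(path, feats.shape)          # expected: (num_regions, 2048)
    assert feats.shape[-1] == 2048, "feature dim should match --img_dim 2048"
```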
Some remarks:
- The original paper used 5 captions per image, which I changed to 1 in this case.
- Run the following command to train on the dataset:
```
python train.py --data_name chinese_artworks --logger_name runs/chinese_artworks_scan/log --model_name runs/chinese_artworks_scan/log --max_violation --bi_gru --img_dim 2048
```
- Pre-trained models are stored under the `./runs/` folder; a sketch of loading one such checkpoint for evaluation follows below.
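For testing, a checkpoint from `./runs/` can be handed to the evaluation script. The sketch below assumes `image-text/evaluation.py` keeps the original SCAN repo's `evalrank(model_path, data_path, split)` entry point and that the best checkpoint is saved as `model_best.pth.tar`; adjust the names if this fork changed them.

```python
# Sketch of evaluating a saved checkpoint (run from the image-text/ directory).
# Assumes evaluation.py still exposes evalrank() as in the original SCAN repo
# and that the checkpoint/data paths below match your local layout.
import evaluation

evaluation.evalrank("runs/chinese_artworks_scan/log/model_best.pth.tar",
                    data_path="data/", split="test")
```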