
Commit e973d4c

update README.md, minor
1 parent 7abd524 commit e973d4c

3 files changed: +95 -27 lines changed

README.md (+35 -16)
@@ -1,30 +1,49 @@
 # [FOTS: Fast Oriented Text Spotting with a Unified Network](https://arxiv.org/abs/1801.01671) text detection branch reimplementation ([PyTorch](https://pytorch.org/))
+
 ## Train
 1. Train with SynthText for 9 epochs
 ```sh
 time python3 train.py --train-folder SynthText/ --batch-size 21 --batches-before-train 2
 ```
-At this point the result was `Epoch 8: 100%|█████████████| 390/390 [08:20<00:00, 1.01it/s, Mean loss=0.49507]`.
+At this point the result was `Epoch 8: 100%|█████████████| 390/390 [08:28<00:00, 1.00it/s, Mean loss=0.98050]`.
 2. Train with ICDAR15
-Change data set in train.py and run
+
+Replace `datasets.SynthText` with `datasets.ICDAR2015` in the line `data_set = datasets.SynthText(args.train_folder, datasets.transform)` in [`train.py`](./train.py) and run
 ```sh
-time python3 train.py --train-folder icdar15/ --batch-size 21 --batches-before-train 2 --continue-training
+time python3 train.py --train-folder icdar15/ --continue-training --batch-size 21 --batches-before-train 2
 ```
-It is expected that the provided `--train-folder` contains unzipped `ch4_training_images` and `ch4_training_localization_transcription_gt`.
-The result was `Epoch 600: 100%|█████████████| 48/48 [01:06<00:00, 1.04s/it, Mean loss=0.07742]`.
+It is expected that the provided `--train-folder` contains unzipped `ch4_training_images` and `ch4_training_localization_transcription_gt`. To avoid saving the model at each epoch, the line `if True:` in [`train.py`](./train.py) can be replaced with `if epoch > 60 and epoch % 6 == 0:`.
+
+The result was `Epoch 582: 100%|█████████████| 48/48 [01:05<00:00, 1.04s/it, Mean loss=0.11290]`.
+
 ### Learning rate decay history:
-Epoch 185: reducing learning rate of group 0 to 5.0000e-04.
-Epoch 274: reducing learning rate of group 0 to 2.5000e-04.
-Epoch 325: reducing learning rate of group 0 to 1.2500e-04.
-Epoch 370: reducing learning rate of group 0 to 6.2500e-05.
-Epoch 410: reducing learning rate of group 0 to 3.1250e-05.
-Epoch 484: reducing learning rate of group 0 to 1.5625e-05.
-Epoch 517: reducing learning rate of group 0 to 7.8125e-06.
-Epoch 550: reducing learning rate of group 0 to 3.9063e-06.
+Epoch 175: reducing learning rate of group 0 to 5.0000e-04.
+
+Epoch 264: reducing learning rate of group 0 to 2.5000e-04.
+
+Epoch 347: reducing learning rate of group 0 to 1.2500e-04.
+
+Epoch 412: reducing learning rate of group 0 to 6.2500e-05.
+
+Epoch 469: reducing learning rate of group 0 to 3.1250e-05.
+
+Epoch 525: reducing learning rate of group 0 to 1.5625e-05.
+
+Epoch 581: reducing learning rate of group 0 to 7.8125e-06.
+
 ## Test
 ```sh
-python3 test.py --images-folder ch4_test_images/ --output-folder res/ --checkpoint epoch_600_checkpoint.pt && zip -jmq runs/u.zip res/* && python2 script.py -g=gt.zip -s=runs/u.zip
-
+python3 test.py --images-folder ch4_test_images/ --output-folder res/ --checkpoint epoch_582_checkpoint.pt && zip -jmq runs/u.zip res/* && python2 script.py -g=gt.zip -s=runs/u.zip
 ```
 `ch4_training_images` and `ch4_training_localization_transcription_gt` are available in [Task 4.4: End to End (2015 edition)](http://rrc.cvc.uab.es/?ch=4&com=downloads). `script.py` and `ch4_test_images` can be found in [My Methods](https://rrc.cvc.uab.es/?ch=4&com=mymethods&task=1) (`Script: IoU` and `test set samples`).
-It gives `Calculated!{"precision": 0.8700890518596124, "recall": 0.7997111218103033, "hmean": 0.8334169593577522, "AP": 0}`. The pretrained models are here: https://drive.google.com/open?id=1xaVshLRrMEkb9LA46IJAZhlapQr3vyY2
+
+It gives `Calculated!{"precision": 0.8694968553459119, "recall": 0.7987481945113144, "hmean": 0.8326223337515684, "AP": 0}`.
+
+The pretrained models are here: https://drive.google.com/open?id=1xaVshLRrMEkb9LA46IJAZhlapQr3vyY2
+
+[`test.py`](./test.py) contains commented-out code to visualize results.
+
+## Differences from the paper
+1. The model differs from the one the paper describes. An explanation is in [`model.py`](./model.py).
+2. The authors of FOTS could not train on clipped words because they also have a recognition branch: the whole word has to be present in an image to be recognized correctly. This reimplementation has only the detection branch, which allows training on crops of words.
+3. The paper suggests using some other data sets in addition. Training on SynthText is simplified in this reimplementation.
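
Since the visualization in `test.py` ships commented out, here is a minimal standalone sketch of the same idea. It assumes the detector writes ICDAR-style result files where each line holds eight comma-separated quadrilateral coordinates; the function name and file paths are illustrative, not the repo's API:

```python
# Hedged sketch, not the repo's code: overlay ICDAR-format detections on an image.
import cv2
import numpy as np

def draw_detections(image_path, boxes_path, out_path):
    image = cv2.imread(image_path)
    with open(boxes_path) as f:
        for line in f:
            # Assumed line format: x1,y1,x2,y2,x3,y3,x4,y4 (four quadrilateral corners)
            coords = [int(float(v)) for v in line.strip().split(',')[:8]]
            quad = np.array(coords, dtype=np.int32).reshape(4, 2)
            cv2.polylines(image, [quad], isClosed=True, color=(0, 255, 0), thickness=2)
    cv2.imwrite(out_path, image)

draw_detections('ch4_test_images/img_1.jpg', 'res/res_img_1.txt', 'vis_img_1.jpg')
```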

model.py (+46)
@@ -82,3 +82,49 @@ def forward(self, x):
         angle = torch.sigmoid(angle) * np.pi / 2

         return confidence, distances, angle
+
+
+# class FOTSModel(nn.Module):
+#     """This model is described in the paper, but it trains slower and gives slightly worse results"""
+#     def __init__(self, crop_height=640):
+#         super().__init__()
+#         self.crop_height = crop_height
+#         self.resnet = torchvision.models.resnet50(pretrained=True)
+#         self.conv1 = nn.Sequential(
+#             self.resnet.conv1,
+#             self.resnet.bn1,
+#             self.resnet.relu,
+#         )  # 64 * 4
+#         self.encoder1 = self.resnet.layer1  # 64 * 4
+#         self.encoder2 = self.resnet.layer2  # 128 * 4
+#         self.encoder3 = self.resnet.layer3  # 256 * 4
+#         self.encoder4 = self.resnet.layer4  # 512 * 4
+
+#         self.decoder3 = Decoder(512 * 4, 256 * 4)
+#         self.decoder2 = Decoder(256 * 4 * 2, 128 * 4)
+#         self.decoder1 = Decoder(128 * 4 * 2, 64 * 4)
+
+#         self.confidence = conv(64 * 4 * 2, 1, kernel_size=1, padding=0, bn=False, relu=False)
+#         self.distances = conv(64 * 4 * 2, 4, kernel_size=1, padding=0, bn=False, relu=False)
+#         self.angle = conv(64 * 4 * 2, 1, kernel_size=1, padding=0, bn=False, relu=False)
+
+#     def forward(self, x):
+#         x = self.conv1(x)
+#         x = F.max_pool2d(x, kernel_size=2, stride=2)
+
+#         e1 = self.encoder1(x)
+#         e2 = self.encoder2(e1)
+#         e3 = self.encoder3(e2)
+#         e4 = self.encoder4(e3)
+
+#         d3 = self.decoder3(e4, e3)
+#         d2 = self.decoder2(d3, e2)
+#         d1 = self.decoder1(d2, e1)
+
+#         confidence = self.confidence(d1)
+#         distances = self.distances(d1)
+#         distances = torch.sigmoid(distances) * self.crop_height
+#         angle = self.angle(d1)
+#         angle = torch.sigmoid(angle) * np.pi / 2
+
+#         return confidence, distances, angle
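
For orientation, a hypothetical sketch of the `Decoder` and `conv` helpers this commented-out class relies on. The repo's actual definitions live elsewhere in `model.py` and may differ; what the sketch preserves is the channel arithmetic implied by the constructor above, where each decoder concatenates its output with the skip connection and so doubles the input width of the next stage:

```python
# Hypothetical helpers matching the channel counts used by the commented-out FOTSModel.
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv(in_channels, out_channels, kernel_size=3, padding=1, bn=True, relu=True):
    """Conv2d optionally followed by BatchNorm and ReLU."""
    layers = [nn.Conv2d(in_channels, out_channels, kernel_size, padding=padding, bias=not bn)]
    if bn:
        layers.append(nn.BatchNorm2d(out_channels))
    if relu:
        layers.append(nn.ReLU(inplace=True))
    return nn.Sequential(*layers)

class Decoder(nn.Module):
    """Upsample the deeper feature map, convolve it, and concatenate the skip connection."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv = conv(in_channels, out_channels)

    def forward(self, deeper, skip):
        upsampled = F.interpolate(deeper, size=skip.shape[2:], mode='bilinear', align_corners=False)
        # Output width = out_channels + skip channels, hence the "* 2" in the next stage's input.
        return torch.cat((self.conv(upsampled), skip), dim=1)
```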

train.py (+14 -11)
@@ -19,7 +19,7 @@ def restore_checkpoint(folder, continue_training):
     optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-5)
     lr_scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.5, patience=32, verbose=True, threshold=0.05, threshold_mode='rel')

-    checkpoint_name = os.path.join(folder, 'last_checkpoint.pt')
+    checkpoint_name = os.path.join(folder, 'epoch_8_checkpoint.pt')
     if os.path.isfile(checkpoint_name) and continue_training:
         checkpoint = torch.load(checkpoint_name)
         model.load_state_dict(checkpoint['model_state_dict'])
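
The restore path above implies a matching save side. A minimal sketch, assuming the checkpoint dictionary uses `model_state_dict` (visible in this hunk) plus analogous keys for the other objects; the extra keys and the function name are assumptions, not the repo's API:

```python
import os
import torch

def save_checkpoint(epoch, model, optimizer, lr_scheduler, best_score, folder):
    # 'model_state_dict' is the key restore_checkpoint reads above; the rest are assumed by analogy.
    torch.save({
        'epoch': epoch,
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
        'lr_scheduler_state_dict': lr_scheduler.state_dict(),
        'best_score': best_score,
    }, os.path.join(folder, f'epoch_{epoch}_checkpoint.pt'))
```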
@@ -265,15 +265,15 @@ def fit(start_epoch, model, loss_func, opt, lr_scheduler, best_score, max_batche

             loss = loss_func(prediction, (classification, regression, thetas, training_mask)) / max_batches_per_iter_cnt
             train_loss_stats += loss.item()
-            loss_count_stats += 1
             loss.backward()
             batch_per_iter_cnt += 1
             if batch_per_iter_cnt == max_batches_per_iter_cnt:
                 opt.step()
                 batch_per_iter_cnt = 0
-                pbar.set_postfix({'Mean loss': f'{train_loss_stats / loss_count_stats:.5f}'}, refresh=False)
-        lr_scheduler.step(train_loss_stats / loss_count_stats, epoch)
-        # lr_scheduler.step()
+                loss_count_stats += 1
+                mean_loss = train_loss_stats / loss_count_stats
+                pbar.set_postfix({'Mean loss': f'{mean_loss:.5f}'}, refresh=False)
+        lr_scheduler.step(mean_loss, epoch)

         if valid_dl is None:
             val_loss = train_loss_stats / loss_count_stats
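
The moved `loss_count_stats += 1` fixes the running mean: the counter now advances only when an optimizer step completes, so the divisor matches the number of accumulation windows rather than the number of batches. Stripped of the FOTS specifics, the gradient-accumulation pattern looks like this (a toy sketch; all names are illustrative):

```python
import torch
import torch.nn as nn

# Toy stand-ins for the FOTS model and data loader.
model = nn.Linear(10, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
data_loader = [(torch.randn(4, 10), torch.randn(4, 1)) for _ in range(8)]

accumulation_steps = 2  # plays the role of max_batches_per_iter_cnt (--batches-before-train)
running_loss, steps_done = 0.0, 0
optimizer.zero_grad()
for step, (inputs, targets) in enumerate(data_loader, start=1):
    # Scale the loss so the accumulated gradient matches one large-batch step.
    loss = criterion(model(inputs), targets) / accumulation_steps
    running_loss += loss.item()
    loss.backward()  # gradients accumulate in .grad between optimizer steps
    if step % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
        steps_done += 1  # count completed windows, mirroring the fixed loss_count_stats
        print(f'Mean loss: {running_loss / steps_done:.5f}')
```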
@@ -301,17 +301,20 @@ def fit(start_epoch, model, loss_func, opt, lr_scheduler, best_score, max_batche
 if __name__ == '__main__':
     parser = argparse.ArgumentParser()
     parser.add_argument('--train-folder', type=str, required=True, help='Path to folder with train images and labels')
-    parser.add_argument('--batch-size', type=int, default=8, help='Batch size')
-    parser.add_argument('--batches-before-train', type=int, default=4, help='Number of batches to process before train step')
+    parser.add_argument('--batch-size', type=int, default=21, help='Batch size')
+    parser.add_argument('--batches-before-train', type=int, default=2, help='Number of batches to process before train step')
     parser.add_argument('--num-workers', type=int, default=8, help='Number of data loader workers')
     parser.add_argument('--continue-training', action='store_true', help='continue training')
     args = parser.parse_args()

-    synth = datasets.SynthText(args.train_folder, datasets.transform)
-    # icdar = datasets.ICDAR2015(args.train_folder, datasets.transform)
-    # concat_dataset = torch.utils.data.ConcatDataset((synth, icdar))  # the paper doesn't do that so me neither
+    data_set = datasets.SynthText(args.train_folder, datasets.transform)
+    # data_set = datasets.ICDAR2015(args.train_folder, datasets.transform)

-    dl = torch.utils.data.DataLoader(icdar, batch_size=args.batch_size, shuffle=True,
+    # SynthText and ICDAR2015 have different layouts. One will probably need to provide two different paths
+    # to train on a concatenation of these two data sets. But the paper doesn't concatenate them, so neither does this code:
+    # data_set = torch.utils.data.ConcatDataset((synth, icdar))
+
+    dl = torch.utils.data.DataLoader(data_set, batch_size=args.batch_size, shuffle=True,
                                      sampler=None, batch_sampler=None, num_workers=args.num_workers)
     checkpoint_dir = 'runs'
     epoch, model, optimizer, lr_scheduler, best_score = restore_checkpoint(checkpoint_dir, args.continue_training)
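
The commit still selects the data set by editing the source, as the README diff instructs. A hypothetical alternative (not part of this commit) would be a flag; this sketch would slot into the `__main__` block above, and the `--dataset` argument and its choices are invented here:

```python
# Hypothetical flag; the rest of the __main__ block stays as in the diff above.
parser.add_argument('--dataset', choices=['SynthText', 'ICDAR2015'], default='SynthText',
                    help='Which data set class from the datasets module to instantiate')
args = parser.parse_args()

data_set_cls = {'SynthText': datasets.SynthText, 'ICDAR2015': datasets.ICDAR2015}[args.dataset]
data_set = data_set_cls(args.train_folder, datasets.transform)
```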
