# [FOTS: Fast Oriented Text Spotting with a Unified Network](https://arxiv.org/abs/1801.01671) text detection branch reimplementation ([PyTorch](https://pytorch.org/))
## Train
1. Train with SynthText for 9 epochs
```sh
time python3 train.py --train-folder SynthText/ --batch-size 21 --batches-before-train 2
```
At this point the result was `Epoch 8: 100%|█████████████| 390/390 [08:28<00:00, 1.00it/s, Mean loss=0.98050]`.
2. Train with ICDAR15
Replace the data set in `data_set = datasets.SynthText(args.train_folder, datasets.transform)` with `datasets.ICDAR2015` in [`train.py`](./train.py) and run
```sh
time python3 train.py --train-folder icdar15/ --continue-training --batch-size 21 --batches-before-train 2
```
It is expected that the provided `--train-folder` contains unzipped `ch4_training_images` and `ch4_training_localization_transcription_gt`. To avoid saving the model at each epoch, the line `if True:` in [`train.py`](./train.py) can be replaced with `if epoch > 60 and epoch % 6 == 0:`.
The result was `Epoch 582: 100%|█████████████| 48/48 [01:05<00:00, 1.04s/it, Mean loss=0.11290]`.
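The checkpoint-saving condition suggested above can be sketched as a small predicate. The function name is hypothetical and the surrounding structure of `train.py` is an assumption:

```python
# Hypothetical helper illustrating the suggested replacement for `if True:`
# in train.py; the actual structure of the saving code is an assumption.
def should_save_checkpoint(epoch: int) -> bool:
    """Save a checkpoint only every 6th epoch once training is past epoch 60."""
    return epoch > 60 and epoch % 6 == 0
```

With this condition, epoch 582 (the last one reported above) is still saved, since 582 is greater than 60 and divisible by 6.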
### Learning rate decay history:

Epoch 175: reducing learning rate of group 0 to 5.0000e-04.

Epoch 264: reducing learning rate of group 0 to 2.5000e-04.

Epoch 347: reducing learning rate of group 0 to 1.2500e-04.

Epoch 412: reducing learning rate of group 0 to 6.2500e-05.

Epoch 469: reducing learning rate of group 0 to 3.1250e-05.

Epoch 525: reducing learning rate of group 0 to 1.5625e-05.

Epoch 581: reducing learning rate of group 0 to 7.8125e-06.
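Each logged reduction halves the previous value, which is consistent with a `ReduceLROnPlateau`-style scheduler with `factor=0.5` and an initial learning rate of 1e-3 (both inferred from the log, not stated here). A pure-Python sketch reproducing the sequence:

```python
# Reproduce the learning-rate sequence implied by the log above, assuming
# an initial LR of 1e-3 and a halving factor of 0.5 (both inferred).
initial_lr = 1e-3
reduction_epochs = [175, 264, 347, 412, 469, 525, 581]  # from the log

lr = initial_lr
schedule = []
for epoch in reduction_epochs:
    lr *= 0.5  # each logged line halves the current learning rate
    schedule.append((epoch, lr))

for epoch, value in schedule:
    print(f"Epoch {epoch}: reducing learning rate of group 0 to {value:.4e}.")
```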
## Test

`ch4_training_images` and `ch4_training_localization_transcription_gt` are available in [Task 4.4: End to End (2015 edition)](http://rrc.cvc.uab.es/?ch=4&com=downloads). `script.py` and `ch4_test_images` can be found in [My Methods](https://rrc.cvc.uab.es/?ch=4&com=mymethods&task=1) (`Script: IoU` and `test set samples`).
It gives `Calculated!{"precision": 0.8694968553459119, "recall": 0.7987481945113144, "hmean": 0.8326223337515684, "AP": 0}`.
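For reference, the reported `hmean` is the harmonic mean (F1 score) of precision and recall, which can be verified directly from the numbers above:

```python
# hmean is the harmonic mean (F1 score) of precision and recall.
precision = 0.8694968553459119  # from the evaluation output above
recall = 0.7987481945113144

hmean = 2 * precision * recall / (precision + recall)
print(hmean)  # close to the reported hmean of ~0.83262
```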
The pretrained models are here: https://drive.google.com/open?id=1xaVshLRrMEkb9LA46IJAZhlapQr3vyY2
[`test.py`](./test.py) contains commented-out code to visualize results.
## Differences from the paper
1. The model differs from the one described in the paper. An explanation is in [`model.py`](./model.py).
2. The authors of FOTS could not train on clipped words because their network also has a recognition branch: the whole word has to be present in an image to be recognized correctly. This reimplementation has only the detection branch, which makes it possible to train on crops of words.
3. The paper suggests using some other data sets in addition. Training on SynthText is simplified in this reimplementation.
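To illustrate point 2: for a detection-only branch, a ground-truth word box can simply be clipped to a random crop window. This is a toy sketch, not the repository's actual cropping code:

```python
# Illustrative only: clip a ground-truth word box to a crop window for
# detection-only training. Boxes and crops are (x_min, y_min, x_max, y_max).
def clip_box_to_crop(box, crop):
    x0 = max(box[0], crop[0])
    y0 = max(box[1], crop[1])
    x1 = min(box[2], crop[2])
    y1 = min(box[3], crop[3])
    if x0 >= x1 or y0 >= y1:
        return None  # the word lies entirely outside the crop
    # Shift the clipped box into the crop's coordinate frame.
    return (x0 - crop[0], y0 - crop[1], x1 - crop[0], y1 - crop[1])
```

A recognition branch could not use the clipped half of a word as a training target, but a detector can still learn from the visible part.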