# [FOTS: Fast Oriented Text Spotting with a Unified Network](https://arxiv.org/abs/1801.01671) text detection branch reimplementation ([PyTorch](https://pytorch.org/))
## Train
1. Train with SynthText for 9 epochs
```sh
time python3 train.py --train-folder SynthText/ --batch-size 21 --batches-before-train 2
```
At this point the result was `Epoch 8: 100%|█████████████| 390/390 [08:28<00:00, 1.00it/s, Mean loss=0.98050]`.
2. Train with ICDAR15
Replace the data set in `data_set = datasets.SynthText(args.train_folder, datasets.transform)` with `datasets.ICDAR2015` in [`train.py`](./train.py) and run
```sh
time python3 train.py --train-folder icdar15/ --continue-training --batch-size 21 --batches-before-train 2
```
It is expected that the provided `--train-folder` contains unzipped `ch4_training_images` and `ch4_training_localization_transcription_gt`. To avoid saving the model at each epoch, the line `if True:` in [`train.py`](./train.py) can be replaced with `if epoch > 60 and epoch % 6 == 0:`.
The result was `Epoch 582: 100%|█████████████| 48/48 [01:05<00:00, 1.04s/it, Mean loss=0.11290]`.
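The checkpoint-saving condition suggested above can be sketched as a small predicate. The function name is hypothetical and the surrounding structure of `train.py` is an assumption:

```python
# Hypothetical helper illustrating the suggested replacement for `if True:`
# in train.py; the actual structure of the saving code is an assumption.
def should_save_checkpoint(epoch: int) -> bool:
    """Save a checkpoint only every 6th epoch once training is past epoch 60."""
    return epoch > 60 and epoch % 6 == 0
```

With this condition, epoch 582 (the last one reported above) is still saved, since 582 is greater than 60 and divisible by 6.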
### Learning rate decay history:

Epoch 175: reducing learning rate of group 0 to 5.0000e-04.

Epoch 264: reducing learning rate of group 0 to 2.5000e-04.

Epoch 347: reducing learning rate of group 0 to 1.2500e-04.

Epoch 412: reducing learning rate of group 0 to 6.2500e-05.

Epoch 469: reducing learning rate of group 0 to 3.1250e-05.

Epoch 525: reducing learning rate of group 0 to 1.5625e-05.

Epoch 581: reducing learning rate of group 0 to 7.8125e-06.
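Each logged reduction halves the previous value, which is consistent with a `ReduceLROnPlateau`-style scheduler with `factor=0.5` and an initial learning rate of 1e-3 (both inferred from the log, not stated here). A pure-Python sketch reproducing the sequence:

```python
# Reproduce the learning-rate sequence implied by the log above, assuming
# an initial LR of 1e-3 and a halving factor of 0.5 (both inferred).
initial_lr = 1e-3
reduction_epochs = [175, 264, 347, 412, 469, 525, 581]  # from the log

lr = initial_lr
schedule = []
for epoch in reduction_epochs:
    lr *= 0.5  # each logged line halves the current learning rate
    schedule.append((epoch, lr))

for epoch, value in schedule:
    print(f"Epoch {epoch}: reducing learning rate of group 0 to {value:.4e}.")
```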
## Test

`ch4_training_images` and `ch4_training_localization_transcription_gt` are available in [Task 4.4: End to End (2015 edition)](http://rrc.cvc.uab.es/?ch=4&com=downloads). `script.py` and `ch4_test_images` can be found in [My Methods](https://rrc.cvc.uab.es/?ch=4&com=mymethods&task=1) (`Script: IoU` and `test set samples`).
It gives `Calculated!{"precision": 0.8694968553459119, "recall": 0.7987481945113144, "hmean": 0.8326223337515684, "AP": 0}`.
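For reference, the reported `hmean` is the harmonic mean (F1 score) of precision and recall, which can be verified directly from the numbers above:

```python
# hmean is the harmonic mean (F1 score) of precision and recall.
precision = 0.8694968553459119  # from the evaluation output above
recall = 0.7987481945113144

hmean = 2 * precision * recall / (precision + recall)
print(hmean)  # close to the reported hmean of ~0.83262
```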
The pretrained models are here: https://drive.google.com/open?id=1xaVshLRrMEkb9LA46IJAZhlapQr3vyY2
[`test.py`](./test.py) contains commented-out code to visualize results.
## Differences from the paper
1. The model differs from the one described in the paper. An explanation is in [`model.py`](./model.py).
2. The authors of FOTS could not train on clipped words because their network also has a recognition branch: the whole word has to be present in an image to be recognized correctly. This reimplementation has only the detection branch, which makes it possible to train on crops of words.
3. The paper suggests using some other data sets in addition. Training on SynthText is simplified in this reimplementation.
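To illustrate point 2: for a detection-only branch, a ground-truth word box can simply be clipped to a random crop window. This is a toy sketch, not the repository's actual cropping code:

```python
# Illustrative only: clip a ground-truth word box to a crop window for
# detection-only training. Boxes and crops are (x_min, y_min, x_max, y_max).
def clip_box_to_crop(box, crop):
    x0 = max(box[0], crop[0])
    y0 = max(box[1], crop[1])
    x1 = min(box[2], crop[2])
    y1 = min(box[3], crop[3])
    if x0 >= x1 or y0 >= y1:
        return None  # the word lies entirely outside the crop
    # Shift the clipped box into the crop's coordinate frame.
    return (x0 - crop[0], y0 - crop[1], x1 - crop[0], y1 - crop[1])
```

A recognition branch could not use the clipped half of a word as a training target, but a detector can still learn from the visible part.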