The dataset is split into train and validation sets by book title.
Two patterns of training datasets are generated:
- validation: book_title=200015779, train: others
- validation: book_title=200003076, train: others
Characters with few occurrences are oversampled at this stage.
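Below is a minimal sketch of the book-title split and rare-character oversampling described above. It assumes each cropped character is a dict with hypothetical `book_title` and `unicode` keys. The oversampling threshold `min_count` is also hypothetical, because this README does not state the value used.

```python
import random
from collections import defaultdict


def split_by_book(samples, val_book):
    """Hold out one book title for validation and train on all other books."""
    train = [s for s in samples if s["book_title"] != val_book]
    val = [s for s in samples if s["book_title"] == val_book]
    return train, val


def oversample_rare(samples, min_count, seed=0):
    """Duplicate random samples of each rare character until it has min_count samples.

    min_count is a hypothetical threshold; the README does not give one.
    """
    rng = random.Random(seed)
    by_char = defaultdict(list)
    for s in samples:
        by_char[s["unicode"]].append(s)
    out = list(samples)
    for group in by_char.values():
        if len(group) < min_count:
            # sample with replacement from the existing crops of this character
            out.extend(rng.choices(group, k=min_count - len(group)))
    return out


# The two training patterns (dict keys are hypothetical):
# for val_book in ("200015779", "200003076"):
#     train, val = split_by_book(samples, val_book)
#     train = oversample_rare(train, min_count=...)
```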
For details, see
- Preprocessing
  - Denoising and Ben's preprocessing for train and test images, then crop the characters
  - When cropping each character, enlarge the area by 5% vertically and horizontally
  - Resize each character to a square, ignoring the aspect ratio
  - To reduce computational cost, some models are trained with grayscale images
  - To reduce computational cost, characters that appear more than 2000 times are undersampled to 2000 occurrences
- Data augmentation
  - brightness, contrast, saturation, hue, random grayscale, rotation, random resized crop
  - mixup + RandomErasing or ICAP + RandomErasing (ICAP is described later)
  - no vertical or horizontal flips
- Others
  - L2-constrained Softmax loss for all models
  - Learning-rate warmup (5 epochs)
  - SGD with momentum
  - MultiStep LR or Cosine Annealing LR
  - Test-time augmentation (TTA) with 7 crops
For more details, see the options of the model training scripts.
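L2-constrained Softmax loss normalizes the feature vector to unit L2 norm, rescales it to a fixed radius `alpha`, and then applies an ordinary linear layer and softmax cross-entropy. The pure-Python sketch below shows only that computation. The value of `alpha`, the weights, and the function names are illustrative assumptions, not values or code from the training scripts.

```python
import math


def l2_constrained_logits(feature, weights, bias, alpha):
    """Rescale the feature to L2 norm alpha, then apply a linear classifier layer."""
    norm = math.sqrt(sum(x * x for x in feature)) or 1.0  # guard against a zero vector
    scaled = [alpha * x / norm for x in feature]
    return [sum(w * x for w, x in zip(row, scaled)) + b for row, b in zip(weights, bias)]


def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample, using the log-sum-exp trick for stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


# Example with a hypothetical alpha of 16 and a 2-class identity layer:
# loss = cross_entropy(l2_constrained_logits([3.0, 4.0], [[1, 0], [0, 1]], [0, 0], 16.0), 1)
```

Because the feature is rescaled before the linear layer, the logits no longer depend on the feature's original magnitude, only on its direction.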
| model arch | channel | input size | data augmentation | validation book_title |
|---|---|---|---|---|
| EfficientNet-B4 | Grayscale | 190x190 | mixup + RandomErasing | 200015779 |
| ResNet152 | Grayscale | 112x112 | mixup + RandomErasing | 200015779 |
| SE-ResNeXt101 | RGB | 112x112 | mixup + RandomErasing | 200015779 |
| SE-ResNeXt101 | RGB | 112x112 | ICAP + RandomErasing | 200003076 |
| ResNet152 | RGB | 112x112 | ICAP + RandomErasing | 200003076 |
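Here is a minimal pure-Python sketch of the mixup + RandomErasing combination in the table, for single-channel images stored as lists of rows. The RandomErasing defaults (`p`, `sl`, `sh`, `r1`) follow the Random Erasing paper, and the mixup `alpha=0.2` is a placeholder. Neither is necessarily what the training scripts use, and applying erasing after mixing is an assumed order.

```python
import math
import random


def mixup(img_a, img_b, onehot_a, onehot_b, alpha=0.2, rng=random):
    """Blend two images and their one-hot labels with a Beta(alpha, alpha) weight."""
    lam = rng.betavariate(alpha, alpha)
    img = [[lam * a + (1 - lam) * b for a, b in zip(ra, rb)] for ra, rb in zip(img_a, img_b)]
    label = [lam * a + (1 - lam) * b for a, b in zip(onehot_a, onehot_b)]
    return img, label


def random_erasing(img, p=0.5, sl=0.02, sh=0.4, r1=0.3, rng=random):
    """With probability p, fill one random rectangle of the image with random values."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]  # copy so the input image is left untouched
    if rng.random() >= p:
        return out
    for _ in range(100):  # retry until a sampled rectangle fits inside the image
        area = rng.uniform(sl, sh) * h * w
        ratio = rng.uniform(r1, 1 / r1)
        eh = int(round(math.sqrt(area * ratio)))
        ew = int(round(math.sqrt(area / ratio)))
        if 0 < eh < h and 0 < ew < w:
            y = rng.randint(0, h - eh)
            x = rng.randint(0, w - ew)
            for i in range(y, y + eh):
                for j in range(x, x + ew):
                    out[i][j] = rng.random()  # random fill in [0, 1)
            return out
    return out


# Assumed usage: mix first, then erase.
# img, label = mixup(img_a, img_b, onehot_a, onehot_b)
# img = random_erasing(img)
```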
ICAP is a data augmentation method I implemented. Patches cut from four images are pasted into one image, with each patch kept at its original position in its source image.
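Below is one possible reading of ICAP as a position-preserving variant of RICAP (Random Image Cropping and Patching). A random boundary point splits the canvas into four regions. Each region is filled with the pixels at the same coordinates in one of four images, and the one-hot labels are mixed in proportion to each region's area. Drawing the boundary from a Beta distribution is borrowed from RICAP and is an assumption, as is the placeholder `beta=0.3`. This is a sketch, not the author's implementation.

```python
import random


def icap(images, onehots, beta=0.3, rng=random):
    """Combine four same-sized images at a random boundary point, keeping pixel positions.

    Unlike RICAP, each region is copied from the same coordinates of its source image.
    """
    assert len(images) == 4 and len(onehots) == 4
    h, w = len(images[0]), len(images[0][0])
    # boundary point (Beta-distributed as in RICAP; an assumption here)
    bx = int(round(w * rng.betavariate(beta, beta)))
    by = int(round(h * rng.betavariate(beta, beta)))
    out = [[0.0] * w for _ in range(h)]
    areas = [0, 0, 0, 0]  # pixel count per region: top-left, top-right, bottom-left, bottom-right
    for i in range(h):
        for j in range(w):
            k = (0 if i < by else 2) + (0 if j < bx else 1)
            out[i][j] = images[k][i][j]
            areas[k] += 1
    total = h * w
    # labels are weighted by the share of the canvas each source image occupies
    label = [
        sum(areas[k] / total * onehots[k][c] for k in range(4))
        for c in range(len(onehots[0]))
    ]
    return out, label
```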
- First training
  - Train 5 models on the training dataset only (80 to 90 epochs)
- Pseudo labelling
  - Generate pseudo labels and pseudo-labelled cropped images using the 5 models with the best validation accuracy from the first training
- Second training
  - Train the 5 models again with the train and pseudo images (80 to 90 epochs)
- Refinement
  - Take the models with the best validation accuracy from the second training and resume training
  - Use a smoother target (0.9) for pseudo images, but 1.0 for original train images
  - Train each model for 3 epochs, including pseudo images, without data augmentation
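The refinement targets can be sketched as follows: a hard 1.0 target for original training crops and a 0.9 target for pseudo-labelled crops, trained with a soft-target cross-entropy. The README does not say how the remaining 0.1 is distributed. Spreading it uniformly over the other classes is an assumption, and the function names are hypothetical.

```python
import math


def make_target(label, num_classes, is_pseudo, pseudo_conf=0.9):
    """Return 1.0 on the label for real crops and pseudo_conf for pseudo-labelled crops.

    Assumption: the leftover mass (1 - pseudo_conf) is spread uniformly over other classes.
    """
    if not is_pseudo:
        return [1.0 if c == label else 0.0 for c in range(num_classes)]
    rest = (1.0 - pseudo_conf) / (num_classes - 1)
    return [pseudo_conf if c == label else rest for c in range(num_classes)]


def soft_cross_entropy(logits, target):
    """Cross-entropy between a target distribution and softmax(logits)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(t * (z - log_z) for t, z in zip(target, logits))
```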
You can download the refined models from here.
- 01_refine_efficientnet_b4_l2softmax_gray190-0060.model
- 02_refine_resnet152_l2softmax_gray112-0069.model
- 03_refine_seresnext101_l2softmax_rgb112-0080.model
- 04_refine_seresnext101_l2softmax_rgb112-0082.model
- 05_refine_resnet152_l2softmax_rgb112-0090.model
Train the models inside the Docker container. Start the Docker container with
Then train using the shell scripts under train_scripts:
# bash train_scripts/
... wait many hours
# bash train_scripts/
... wait many hours
# bash train_scripts/
... wait many hours
# bash train_scripts/
... wait many hours
# bash train_scripts/
... wait many hours
The five models require the following training times:
- 01: about 44 hours on GCP (V100 x2)
- 02: about 29 hours on GCP (V100 x2)
- 03: about 32 hours on GCP (V100 x2)
- 04: about 30 hours on GCP (V100 x2)
- 05: about 27 hours on GCP (V100 x2)
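The training options listed earlier include a 5-epoch learning-rate warmup followed by MultiStep or Cosine Annealing decay. Below is a per-epoch sketch of both schedules. The linear warmup shape, `milestones`, `gamma`, and `min_lr` are assumptions, not values read from the training scripts.

```python
import math


def warmup_multistep_lr(epoch, base_lr, milestones, gamma=0.1, warmup_epochs=5):
    """Linear warmup (0-based epochs), then multiply by gamma at each milestone passed."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    return base_lr * gamma ** sum(1 for m in milestones if epoch >= m)


def warmup_cosine_lr(epoch, base_lr, total_epochs, warmup_epochs=5, min_lr=0.0):
    """Linear warmup, then cosine annealing from base_lr down to min_lr at total_epochs."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```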
Generate pseudo labels (pseudo images) with the 5 models from the first training.
When generating pseudo labels, the model ensemble uses soft voting instead of hard voting. The two generated results are then merged with NMS.
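To illustrate the difference between the two voting schemes, the sketch below compares them on per-model class probabilities. The function names are hypothetical, and the ensemble code in the scripts may differ.

```python
from collections import Counter


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


def hard_vote(prob_lists):
    """Each model votes for its top class; the most frequent class wins."""
    votes = [argmax(p) for p in prob_lists]
    return Counter(votes).most_common(1)[0][0]


def soft_vote(prob_lists):
    """Average class probabilities over models, then pick the top class."""
    n = len(prob_lists)
    avg = [sum(col) / n for col in zip(*prob_lists)]
    return argmax(avg)


# Two models weakly prefer class 1, one model strongly prefers class 0:
probs = [[0.4, 0.6], [0.4, 0.6], [0.9, 0.1]]
print(hard_vote(probs), soft_vote(probs))  # prints "1 0"
```

Soft voting lets a confident model outweigh several uncertain ones, which hard voting cannot do.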
$ bash scripts/
generates 4 results
$ scripts/
$ scripts/
generates 2 results
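Here is a minimal greedy NMS that merges two detection lists. It assumes each detection is a hypothetical `(x1, y1, x2, y2, score, label)` tuple and that suppression is class-agnostic. The IoU threshold is left as a parameter because its value is not given in this README.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2, ...) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms_merge(results_a, results_b, iou_threshold):
    """Pool both result lists and keep the highest-scoring box among overlapping ones."""
    candidates = sorted(results_a + results_b, key=lambda d: d[4], reverse=True)
    kept = []
    for det in candidates:
        if all(iou(det, k) < iou_threshold for k in kept):
            kept.append(det)
    return kept
```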
Generate a false-positive predictor using the validation results.
Search hyperparameters with Optuna:
$ scripts/
{'lambda_l1': 1.5464112458912599e-06, 'lambda_l2': 5.346737781503549e-06, 'num_leaves': 140, 'feature_fraction': 0.8534057661739842, 'bagging_fraction': 0.9376615592819334, 'bagging_freq': 1, 'min_child_samples': 72}
Generate the false-positive predictor for the first-stage models:
$ scripts/
saved: models/booster_for_val_nms030_tta7_first_5models_soft_prob.pkl
$ python scripts/
saved pseudo images under input/pseudo_images
In addition, the submission file for this stage is generated as first_submission.csv.
The scores at this stage were:
- Private Score:
- Public Score:
After generating the pseudo images, run the following command to generate CSV files of training data with the pseudo images added.
$ bash scripts/
generates 2 csv files
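Below is a sketch of appending pseudo-labelled rows to a training CSV with an `is_pseudo` flag. A flag like this is one way the refinement step could tell pseudo images apart for the 0.9 target. The column names `image_id`, `unicode`, and `is_pseudo` are hypothetical. The real CSV layout is defined by the scripts, which are not shown here.

```python
import csv
import io


def merge_train_and_pseudo(train_text, pseudo_text):
    """Concatenate original and pseudo-labelled CSV rows, tagging each with is_pseudo."""
    rows = []
    for text, flag in ((train_text, "0"), (pseudo_text, "1")):
        for row in csv.DictReader(io.StringIO(text)):
            row["is_pseudo"] = flag
            rows.append(row)
    # union of columns in first-seen order; missing values are written as ""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```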
Train the 5 models again with the train and pseudo images (80 to 90 epochs):
$ bash train_scripts/pseudo_labeling/
... wait many hours
$ bash train_scripts/pseudo_labeling/
... wait many hours
$ bash train_scripts/pseudo_labeling/
... wait many hours
$ bash train_scripts/pseudo_labeling/
... wait many hours
$ bash train_scripts/pseudo_labeling/
... wait many hours
The five models require the following training times:
- 01: about 44 hours on GCP (V100 x2)
- 02: about 29 hours on GCP (V100 x2)
- 03: about 32 hours on GCP (V100 x2)
- 04: about 30 hours on GCP (V100 x2)
- 05: about 27 hours on GCP (V100 x2)
- Take the models with the best validation accuracy from the second training and resume training.
- Use a smoother target (0.9) for pseudo images, but 1.0 for original train images.
- Train each model for 3 epochs, including pseudo images, without data augmentation.
$ bash train_scripts/pseudo_refine/
... wait a few hours
$ bash train_scripts/pseudo_refine/
... wait a few hours
$ bash train_scripts/pseudo_refine/
... wait a few hours
$ bash train_scripts/pseudo_refine/
... wait a few hours
$ bash train_scripts/pseudo_refine/
... wait a few hours
Each of the five models requires 2 to 3 hours of training.
- mixup: Beyond Empirical Risk Minimization
- Random Erasing Data Augmentation
- Data Augmentation using Random Image Cropping and Patching for Deep CNNs
- CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features
- Data Augmentation Revisited: Rethinking the Distribution Gap between Clean and Augmented Data