Xin Xiao, Daiguo Zhou, Jiagao Hu, Yi Hu and Yongchao Xu
Abstract: Semantic segmentation has recently witnessed great progress. Despite the impressive overall results, the segmentation performance in some hard areas (e.g., small objects or thin parts) is still not promising. A straightforward solution is hard sample mining, which is widely used in object detection. Yet, most existing hard pixel mining strategies for semantic segmentation rely on a pixel's loss value, which tends to decrease during training. Intuitively, the pixel hardness for segmentation mainly depends on image structure and is expected to be stable. In this paper, we propose to learn pixel hardness for semantic segmentation, leveraging the hardness information contained in global and historical loss values. More precisely, we add a gradient-independent branch that learns a hardness level (HL) map by maximizing the hardness-weighted segmentation loss, which is minimized for the segmentation head. This encourages large hardness values in difficult areas, leading to an appropriate and stable HL map. Despite its simplicity, the proposed method can be applied to most segmentation methods with no extra cost during inference and only marginal extra cost during training. Without bells and whistles, the proposed method achieves consistent and significant improvement (1.37% mIoU on average) over most popular semantic segmentation methods on the Cityscapes dataset, and demonstrates good generalization ability across domains.
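To make the training objective concrete, below is a minimal PyTorch sketch of the hardness-weighted loss idea described above. It assumes per-pixel cross-entropy and a sigmoid-activated HL branch; the function and tensor names are hypothetical, and any normalization or regularization of the HL map used in the paper is omitted.

```python
import torch
import torch.nn.functional as F

def hardness_weighted_losses(seg_logits, hl_logits, target, ignore_index=255):
    """seg_logits: (N, C, H, W), hl_logits: (N, 1, H, W), target: (N, H, W)."""
    # Per-pixel cross-entropy, kept unreduced so it can be reweighted per pixel.
    ce = F.cross_entropy(seg_logits, target,
                         ignore_index=ignore_index, reduction="none")
    # Hardness level map in (0, 1); the sigmoid activation is an assumption.
    hl = torch.sigmoid(hl_logits.squeeze(1))
    # Segmentation head minimizes the hardness-weighted loss; the HL map is
    # detached so the HL branch receives no gradient from this term.
    seg_loss = (hl.detach() * ce).mean()
    # HL branch maximizes the same weighted loss (written as minimizing its
    # negative); CE is detached so no gradient reaches the segmentation head.
    hl_loss = -(hl * ce.detach()).mean()
    return seg_loss, hl_loss
```

In this sketch the two losses can be summed and backpropagated once: because of the detach calls, seg_loss only updates the segmentation network and hl_loss only updates the HL branch, giving the min-max behavior described in the abstract.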
To reproduce the results in the paper, we recommend following the instructions below. Other versions of PyTorch and mmcv are not tested, but they may work.
- PyTorch == 1.8.2
- mmcv-full == 1.4.5
Step 1: Create a conda environment and activate it.
conda create -n HardnessLevel python=3.7
conda activate HardnessLevel
Step 2: Install PyTorch and torchvision
pip3 install torch==1.8.2 torchvision==0.9.2 --extra-index-url https://download.pytorch.org/whl/lts/1.8/cu111
Step 3: Install mmcv-full
pip install -U openmim
mim install mmcv-full==1.4.5
pip3 install matplotlib numpy packaging prettytable cityscapesscripts
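After installation, a quick sanity check (a small sketch; run it inside the activated environment) can confirm that the pinned versions are picked up and CUDA is visible:

```python
# Verify the pinned versions and CUDA availability.
import torch, torchvision, mmcv

print(torch.__version__)          # expected: 1.8.2 (+cu111)
print(torchvision.__version__)    # expected: 0.9.2
print(mmcv.__version__)           # expected: 1.4.5
print(torch.cuda.is_available())  # expected: True on a GPU machine
```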
cd mmsegmentation
mkdir data
Please follow the instructions of mmsegmentation for data preparation.
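As a quick check that the data is where mmsegmentation expects it (the standard layout is data/cityscapes with leftImg8bit/ and gtFine/ split into train/val; adjust the paths below if your setup differs), something like this can be run from the mmsegmentation directory:

```python
# Hypothetical check for the standard mmsegmentation Cityscapes layout.
import os

for sub in ("leftImg8bit/train", "leftImg8bit/val", "gtFine/train", "gtFine/val"):
    path = os.path.join("data", "cityscapes", sub)
    print(path, "OK" if os.path.isdir(path) else "MISSING")
```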
For instance, train PSPNet-ResNet101 with HL on Cityscapes using 4 GPUs:
bash ./tools/dist_train.sh configs/pspnet_hl/pspnet_r101-d8_769x769_40k_cityscapes_hl.py 4
For instance, test PSPNet-ResNet101 with HL on Cityscapes using 4 GPUs:
bash ./tools/dist_test.sh configs/pspnet_hl/pspnet_r101-d8_769x769_40k_cityscapes_hl.py /path/pspnet.pth 4 --eval mIoU
Pretrained checkpoint: Baidu Netdisk (extraction code: jahh) | Google Drive.
Note that you should replace /path/pspnet.pth with the path where you store the checkpoint (.pth) file. You should get 80.65 mIoU on the val set.
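Besides the distributed test script above, single-image inference can be run with the mmsegmentation 0.x Python API. The sketch below assumes the same config and a downloaded checkpoint; the image path is a placeholder:

```python
# Single-image inference sketch using the mmsegmentation 0.x API.
from mmseg.apis import init_segmentor, inference_segmentor

config = "configs/pspnet_hl/pspnet_r101-d8_769x769_40k_cityscapes_hl.py"
checkpoint = "/path/pspnet.pth"  # replace with your downloaded checkpoint

model = init_segmentor(config, checkpoint, device="cuda:0")
# Returns a list with one (H, W) array of per-pixel class ids.
result = inference_segmentor(model, "path/to/image.png")
```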
Training logs can be found here. Experiments are run on a machine with 8 A100-40GB GPUs. Using the HL map collected from PSPNet, we also achieve consistent improvement over Mask2Former, a new paradigm of semantic segmentation.
Evaluate the performance for GTAV -> Cityscapes domain generalization by:
bash ./tools/dist_test.sh configs/gta_hl/deeplab_gta2city_res101_hl.py /path/gta_hl.pth 4 --eval mIoU
Pretrained checkpoint: Baidu Netdisk (extraction code: ujra) | Google Drive.
Note that you should replace /path/gta_hl.pth with the path where you store the checkpoint (.pth) file. You should get 43.06 mIoU on the val set.
Please refer to DAFormer for more details.
Cityscapes: results are obtained with DeepLabv3+ and a ResNet-101 backbone under different labeled-data partitions.
| ResNet-101 | 1/16 | 1/8 | 1/4 | 1/2 |
|---|---|---|---|---|
| SupOnly | 65.7 | 72.5 | 74.4 | 77.8 |
| U2PL (paper) | 70.3 | 74.4 | 76.5 | 79.1 |
| U2PL (reproduced) | 71.1 | 75.2 | 75.9 | 78.4 |
| U2PL + HL | 72.6 | 76.0 | 76.6 | 79.6 |
| UniMatch (paper) | 75.7 | 77.3 | 78.7 | - |
| UniMatch + HL | 76.2 | 78.2 | 78.9 | - |
Note: The UniMatch results are obtained with the ORIGINAL version (not the CVPR 2023 version).
This project is released under the Apache 2.0 license.
This code is built on the mmsegmentation repository. Thanks a lot for their great work!
@misc{xiao2023pixels,
title={Not All Pixels Are Equal: Learning Pixel Hardness for Semantic Segmentation},
author={Xin Xiao and Daiguo Zhou and Jiagao Hu and Yi Hu and Yongchao Xu},
year={2023},
eprint={2305.08462},
archivePrefix={arXiv},
primaryClass={cs.CV}
}