Improving Test-Time Efficiency in Source-Free Semantic Segmentation via Multi-Stage Self-Training
Source-free domain adaptive semantic segmentation aims to adapt a model trained on the source domain to the target domain without access to the source data. Self-training has emerged as a leading approach to this challenging problem. However, without robust denoising mechanisms to reduce the noise in pseudo labels, it still easily falls into biased estimates. Most existing methods address this issue by introducing novel architectures, but often at the cost of increased model complexity or reliance on additional input modalities. Unlike previous studies, this article introduces UniSFDA, a unified multi-stage self-training framework that integrates cross-model transfer learning, uncertainty-aware pseudo-label fusion, and intra-domain style augmentation, thereby enhancing both segmentation accuracy and test-time efficiency. The proposed framework offers exceptional flexibility: each component is independent and can be integrated into any existing self-training framework. Additionally, we investigate the performance of various representative segmentation models, including DeepLabv2, SegFormer, DFormer, and ViT-Adapter, within our framework. Notably, UniSFDA is model-agnostic, allowing both source and target networks to be instantiated with arbitrary segmentation architectures, and thus readily benefits from future advances in segmentation models. Experiments on the GTA5 $\rightarrow$ Cityscapes and SYNTHIA $\rightarrow$ Cityscapes benchmarks demonstrate the effectiveness of our framework. With DeepLabv2 (SegFormer) as the source model, UniSFDA establishes new state-of-the-art performance, achieving mIoU scores of 61.8\% (65.4\%) and 57.9\% (59.6\%) on the two benchmarks, respectively.
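To make the idea of uncertainty-aware pseudo-label fusion concrete, the sketch below shows one common way such fusion is realized in self-training pipelines: per pixel, keep the prediction of whichever model is more confident, and mark pixels where even the better model falls below a confidence threshold as "ignore" so they contribute no supervision. This is a generic illustration under assumed conventions (two softmax outputs, a fixed threshold, the Cityscapes-style ignore index 255), not the exact UniSFDA formulation.

```python
import numpy as np

IGNORE_INDEX = 255  # conventional "ignore" label in Cityscapes-style training

def fuse_pseudo_labels(probs_a, probs_b, threshold=0.9):
    """Fuse per-pixel pseudo labels from two segmentation models.

    probs_a, probs_b: (C, H, W) softmax outputs of the two models.
    Returns an (H, W) label map; pixels where neither chosen prediction
    is confident enough are set to IGNORE_INDEX.
    """
    conf_a = probs_a.max(axis=0)        # (H, W) max class probability, model A
    conf_b = probs_b.max(axis=0)
    labels_a = probs_a.argmax(axis=0)   # (H, W) predicted class, model A
    labels_b = probs_b.argmax(axis=0)

    # Per pixel, trust whichever model is more confident.
    use_a = conf_a >= conf_b
    fused = np.where(use_a, labels_a, labels_b)
    fused_conf = np.where(use_a, conf_a, conf_b)

    # Drop uncertain pixels so they do not inject noisy supervision.
    fused[fused_conf < threshold] = IGNORE_INDEX
    return fused
```

The resulting map can be used directly as the target in a cross-entropy loss with `ignore_index=255`, which is how such fused pseudo labels typically enter the self-training objective.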