Dual Sparse Long-Short Term Transformer for Video Shadow Detection
Video Shadow Detection (VSD) is critical yet challenging, primarily due to ambiguous shadow boundaries and the presence of confusing shadow-like non-shadow regions, which existing methods struggle to resolve effectively due to limited temporal modeling. We propose the Dual Sparse Long-Short Term Transformer Network (DSLSTT-Net), a novel framework designed to enhance feature learning by integrating robust temporal consistency with detailed local context. DSLSTT-Net employs a dual-stream architecture to concurrently process global temporal information and refine local shadow features, enabling effective discrimination between true shadows and confusing shadow-like areas. At its core, the Sparse Long-Short Term Attention Module (Sparse LSTAM) propagates only high-confidence shadow features from memory, significantly improving both feature discriminability and computational efficiency. Furthermore, an Adaptive Fusion Module (AFM) dynamically merges the purified long-term features with short-term details to optimize the final segmentation. Experimental results confirm that DSLSTT-Net significantly outperforms state-of-the-art methods on VSD benchmarks, validating our dual-stream architecture and sparse temporal modeling. The source code is available at .
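The two core operations the abstract describes can be sketched at a high level: sparse attention keeps only the highest-confidence memory features before attending, and adaptive fusion blends the resulting long-term features with short-term ones. The sketch below is a minimal NumPy illustration of those ideas, not the authors' implementation; all function names, the top-k confidence selection, and the scalar fusion gate `alpha` are assumptions (in the full model the gate would be predicted by a learned network, and attention would be multi-head over spatial feature maps).

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sparse_long_short_attention(query, memory, memory_conf, top_k=4):
    """Attend only to the top_k highest-confidence memory features.

    query:       (Nq, d) current-frame features (short-term stream)
    memory:      (Nm, d) long-term memory features from past frames
    memory_conf: (Nm,)   per-feature shadow-confidence scores (assumed given)
    """
    # "Sparse" selection: drop low-confidence memory entries entirely,
    # so attention cost scales with top_k instead of the full memory size.
    idx = np.argsort(memory_conf)[-top_k:]
    mem = memory[idx]                                     # (top_k, d)
    # Standard scaled dot-product attention over the pruned memory.
    d = query.shape[-1]
    attn = softmax(query @ mem.T / np.sqrt(d), axis=-1)   # (Nq, top_k)
    return attn @ mem                                     # (Nq, d)

def adaptive_fusion(long_feat, short_feat, alpha):
    """Convex blend of long-term and short-term features.

    alpha in [0, 1] is a hypothetical stand-in for the gating weights an
    Adaptive Fusion Module would predict per location.
    """
    return alpha * long_feat + (1.0 - alpha) * short_feat
```

A usage pass would first purify long-term context with `sparse_long_short_attention`, then merge it with the current frame's features via `adaptive_fusion` before the segmentation head.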
Added 2026-04-21