Probing Deep into Temporal Profile Makes the Infrared Small Target Detector Much Better
Ruojing Li, Wei An, Yingqian Wang, Xinyi Ying, Yimian Dai, Longguang Wang, Miao Li, Yulan Guo, Li Liu
Abstract
Infrared small target (IRST) detection struggles to be precise, robust, and efficient at the same time because targets are extremely dim and interference is strong. Current learning-based methods attempt to leverage "more" information from both the spatial and the short-term temporal domains, but they perform unreliably under complex conditions and incur computational redundancy. In this paper, we instead explore the "more essential" information from a more crucial domain for detection. Through theoretical analysis, we reveal that the global temporal saliency and correlation information in the temporal profile is significantly better at distinguishing target signals from other signals. To investigate whether well-trained networks preferentially leverage this advantage, we built the first prediction attribution tool in this field and verified the importance of the temporal profile information. Inspired by these conclusions, we remodel the IRST detection task as a one-dimensional signal anomaly detection task and propose an efficient deep temporal probe network (DeepPro) that performs calculations only in the time dimension. We conducted extensive experiments to validate the effectiveness of our method. DeepPro outperforms existing state-of-the-art IRST detection methods on widely used benchmarks with extremely high efficiency, and achieves a significant improvement on dim targets and in complex scenarios. We provide a new modeling domain, a new insight, a new method, and a new level of performance, which can promote the development of IRST detection. Code is available at https://github.com/TinaLRJ/DeepPro.
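The reformulation as one-dimensional signal anomaly detection can be illustrated with a classical, non-learned baseline. The sketch below is not DeepPro: it scores each sample of a single pixel's temporal profile with a robust z-score (median and median absolute deviation). The function name `robust_zscores`, the threshold of 6, and the synthetic profile are all illustrative assumptions.

```python
from statistics import median


def robust_zscores(profile):
    """Score each sample by its distance from the median, in robust sigma units."""
    center = median(profile)
    mad = median([abs(value - center) for value in profile])
    # 1.4826 rescales the MAD so it is comparable to a Gaussian standard deviation.
    scale = 1.4826 * mad
    if scale == 0.0:
        scale = 1e-12  # avoid division by zero on a perfectly flat profile
    return [abs(value - center) / scale for value in profile]


# Hypothetical temporal profile: a small periodic background with a
# target passing through the pixel during frames 20 to 24.
background = [0.02, -0.01, 0.0, 0.01, -0.02]
profile = [background[t % 5] for t in range(60)]
for t in range(20, 25):
    profile[t] += 1.0

scores = robust_zscores(profile)
anomalous_frames = [t for t, score in enumerate(scores) if score > 6.0]
print(anomalous_frames)
```

A fixed threshold like this is only a baseline; it uses temporal saliency but none of the learned correlation information the abstract describes.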
Temporal Profile Information
Fig. 1: Visualizations of IRSTs in different scenarios and typical detection difficulties in different domains. (a, b) Distinct targets in different scenarios (Cases 1, 2, and 3): in appearance, targets and clutter show small inter-class differences, while different targets exhibit marked intra-class variations. (c1) Infrared image. (c2) Spatial domain: small dim targets are barely observable. (c3) Short-term spatial-temporal (ST) domain: targets are indistinguishable from interference. (c4) Temporal profile: records the dynamic statistical details of all signals at a fixed spatial location, where small dim target signals are complete and more prominent. The temporal profile contains more essential information. (d, e) More comparisons of the three cases.
Case 1
Case 2
Case 3
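As a concrete reading of the temporal profile described in Fig. 1 (c4), the following minimal sketch collects the intensity of one fixed pixel across all frames of a sequence. The frame layout (a list of 2D lists), the helper names, and the synthetic target passing through pixel (4, 4) are assumptions made for illustration, not taken from the released code.

```python
import random


def temporal_profile(frames, row, col):
    """Collect one pixel's intensity over time from a list of 2D frames."""
    return [frame[row][col] for frame in frames]


def synthetic_sequence(num_frames=50, size=8, noise_std=0.1, seed=0):
    """Noisy frames in which a one-pixel target occupies pixel (4, 4) in frames 20-24."""
    rng = random.Random(seed)
    frames = [
        [[rng.gauss(0.0, noise_std) for _ in range(size)] for _ in range(size)]
        for _ in range(num_frames)
    ]
    for t in range(20, 25):
        frames[t][4][4] += 1.0  # hypothetical target intensity
    return frames


frames = synthetic_sequence()
profile = temporal_profile(frames, 4, 4)
# Frame index at which the profile reaches its maximum.
peak_frame = max(range(len(profile)), key=profile.__getitem__)
print(len(profile), peak_frame)
```

In any single frame the target is one bright pixel among noise, whereas in the profile it appears as a complete pulse over several consecutive samples.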
What is Temporal Profile Information?
Fig. 2: Spatial and temporal-profile visualizations of targets in real scenes with noise and clutter. The red annotations mark the targets.
Fig. 3: Visualizations of the temporal profiles of target-in-noise and target-in-clutter signals, and correlation analyses between noise, clutter, and target in the temporal profile. The target signals are identical, with a fixed intensity (maximum of 1), and are added to noise and clutter signals with different standard deviations.
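The setup in Fig. 3 can be approximated with a small simulation: a fixed target pulse with peak intensity 1 is added to noise of increasing standard deviation, and the Pearson correlation between the clean target and the observed profile is reported. This is only a sketch under stated assumptions; the Gaussian pulse shape, the white Gaussian noise model, and the function names are hypothetical and do not reproduce the paper's analysis.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def target_signal(length=200, center=100, width=3.0):
    """Gaussian-shaped pulse with peak intensity 1 (assumed target shape)."""
    return [math.exp(-0.5 * ((t - center) / width) ** 2) for t in range(length)]


def target_in_noise(target, noise_std, seed=0):
    """Add zero-mean Gaussian noise of the given standard deviation."""
    rng = random.Random(seed)
    return [s + rng.gauss(0.0, noise_std) for s in target]


target = target_signal()
for noise_std in (0.1, 0.5, 2.0):
    observed = target_in_noise(target, noise_std)
    r = pearson(target, observed)
    print(f"noise std {noise_std}: correlation with target = {r:.2f}")
```

The correlation drops as the noise standard deviation grows, which is the regime where a per-sample threshold stops being reliable.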
How Critical is Temporal Profile Information?
Fig. 4: Attribution visualizations of reference frames at targets with different SNRs. For predictions of both salient and dim targets, the influential pixels are almost entirely concentrated on the temporal profile of the target.
Fig. 5: Temporal variations of the average influence and the influence range across all random seeds in different scenes. (a)-(f) show the temporal variations for targets with SNRs of 1.09, 1.32, 2.52, 4.04, 6.06, and 9.44, respectively. (g) shows the temporal variation over the whole test set.
Temporal Probe Mechanism
Fig. 6: Visualization of the structure of TPro. First, temporal probes extract the complete temporal features of a single pixel from the split input features. Then, multiple learnable SCorMs are applied to the extracted features to obtain temporal correlation features. This process is performed pixel-wise across all split input features. Finally, all temporal correlation features are concatenated and fused to generate the output features. Owing to the learned correlations among signals, target features are enhanced and background features are suppressed. Notably, the entire pipeline strictly excludes any computation in the spatial dimensions.
Network Architecture
Fig. 7: An overview of DeepPro, which consists of three similar levels. Note that this is the first IRST detection network whose calculations (e.g., multiplications and additions) are performed only in the time dimension.
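To make "calculations only in the time dimension" concrete, here is a minimal sketch of a time-only operation: a 1D filter slid along the temporal axis of every pixel independently, so the output at a pixel never depends on its spatial neighbours. This is a generic illustration, not the DeepPro architecture; the function name `temporal_filter`, the valid-mode boundary handling, and the difference kernel are assumptions.

```python
def temporal_filter(frames, kernel):
    """Slide `kernel` along the time axis of every pixel independently (valid mode)."""
    num_frames, height, width = len(frames), len(frames[0]), len(frames[0][0])
    taps = len(kernel)
    return [
        [
            [
                sum(kernel[k] * frames[t + k][r][c] for k in range(taps))
                for c in range(width)
            ]
            for r in range(height)
        ]
        for t in range(num_frames - taps + 1)
    ]


# Four frames of 1x2 pixels: the left pixel ramps up, the right pixel is constant.
frames = [[[1.0, 5.0]], [[2.0, 5.0]], [[4.0, 5.0]], [[7.0, 5.0]]]
diff = temporal_filter(frames, [-1.0, 1.0])  # frame-to-frame difference
print([d[0][0] for d in diff], [d[0][1] for d in diff])
```

Because every multiplication and addition combines samples of the same pixel, changing the values of a neighbouring pixel leaves this pixel's output unchanged.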
Fig. 8: The framework of DeepPro-Plus. Unlike DeepPro, which computes only in the time dimension, this version incorporates a few spatial computations to acquire a small amount of spatial information as a supplement.
Quantitative Results
Table 1: Detection results achieved by different state-of-the-art methods on the NUDT-MIRSDT dataset and the NUDT-MIRSDT-HiNo dataset. The best results are in bold, and the second-best results are underlined. SF and MF refer to single-frame and multi-frame methods, respectively.
Table 2: Detection results achieved by different versions of our DeepPro on the NUDT-MIRSDT dataset and the NUDT-MIRSDT-HiNo dataset.
Fig. 9: ROC performances of different methods on the NUDT-MIRSDT dataset and the NUDT-MIRSDT-HiNo dataset. (a) NUDT-MIRSDT (SNR≤3). (b) NUDT-MIRSDT. (c) NUDT-MIRSDT-HiNo.
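For readers unfamiliar with how an ROC curve such as Fig. 9 is traced, the sketch below sweeps a threshold over per-sample detection scores and records the false-alarm rate and detection rate at each step. It is a generic sample-level illustration; the exact detection and false-alarm definitions used for the reported curves are not stated here and may differ, and the function name `roc_points` is hypothetical.

```python
def roc_points(scores, labels, thresholds):
    """Return (false_alarm_rate, detection_rate) for each threshold.

    `labels` holds 1 for target samples and 0 for background samples.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for threshold in thresholds:
        detected = [score >= threshold for score in scores]
        true_pos = sum(d and label == 1 for d, label in zip(detected, labels))
        false_pos = sum(d and label == 0 for d, label in zip(detected, labels))
        points.append((false_pos / negatives, true_pos / positives))
    return points


# Toy scores: two target samples (label 1) and two background samples (label 0).
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
curve = roc_points(scores, labels, thresholds=[1.0, 0.5, 0.0])
print(curve)
```

Lowering the threshold moves along the curve from the origin toward the top-right corner; a better detector reaches a high detection rate at a lower false-alarm rate.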
Table 3: Detection results achieved by different state-of-the-art methods on the IRDST-simulation dataset. The best results are in bold, and the second-best results are underlined.
Table 4: Detection results achieved by different state-of-the-art methods on the real-world RGBT-Tiny dataset.
Table 5: Detection results achieved by different multi-frame state-of-the-art methods on the large-scale IRSatVideo-LEO dataset.
Qualitative Results
Fig. 10: Visual comparison on the NUDT-MIRSDT dataset. (a1-a3) Results on the NUDT-MIRSDT (SNR≤3) subset. (b1-b3) Results on the NUDT-MIRSDT (SNR>3) subset. The first column shows the temporal profile (TP) of the target central pixel. For better visualization, the target area is enlarged in the top-right corner and highlighted with a red circle. The false alarm area is marked with a yellow circle.
Fig. 11: Visual comparison on the real-world RGBT-Tiny dataset. For better visualization, the target area is enlarged in the top-left corner and highlighted with a red circle. The false alarm area is marked with a yellow circle.
Materials
Citation
@Article{li2025probing,
author = {Li, Ruojing and An, Wei and Wang, Yingqian and Ying, Xinyi and Dai, Yimian and Wang, Longguang and Li, Miao and Guo, Yulan and Liu, Li},
title = {Probing deep into temporal profile makes the infrared small target detector much better},
journal = {arXiv preprint arXiv:2506.12766},
year = {2025}
}