ProDehaze: Prompting Diffusion Models Toward Faithful Image Dehazing

International Conference on Multimedia & Expo (ICME), 2025

[Paper] [Code]

Framework of ProDehaze. We employ a two-phase fine-tuning strategy for faithful dehazing. In the first phase, we train the Structure-Prompted Restorer (SPR) in the latent space using a structural prompt generated by a Haar Feature Extractor (HFE) from the hazy input $x_{in}$. The prompt is concatenated with the latent representation of $x_{in}$ and injected into the trainable adapter $\mathcal{N}$ to provide structural guidance. In the second phase, we fine-tune the Haze-Aware Self-Correcting Refiner (HCR) in the decoding process. The haze-aware prompt, initialized by the Dark Channel Prior (DCP), produces a sparse mask $M_s$ that emphasizes the clearer areas of $x_{in}$ and is used to modulate the attention maps of the window-based Swin Transformer (WST) blocks in the decoder $\mathcal{D}$. Finally, $\mathcal{D}$ and the refinement network are trained jointly to better align the clearer regions of $x_{in}$ with the output $x_r$.

Abstract

We propose ProDehaze, a dehazing framework that leverages internal image priors to guide large-scale pretrained diffusion models. It introduces two selective priors: a Structure-Prompted Restorer that operates in the latent space and attends to structure-rich regions of the input, and a Haze-Aware Self-Correcting Refiner that, during decoding, aligns the output distribution with the clearer regions of the hazy input.

Structure-prompted Restorer (SPR)

The Structure-prompted Restorer (SPR) uses high-frequency internal priors to guide the dehazing process. We extract high-frequency components from the hazy input with a Haar Discrete Wavelet Transform (Haar DWT) and pass them through a learnable convolution kernel to obtain a high-frequency feature $x_{high}$. This feature is concatenated with the latent representation of $x_{in}$ to form the conditioning signal $c_f$, which is fed to the model for improved dehazing. The model is fine-tuned with the objective:

$\mathcal{L}_{SPR} = \mathbb{E}_{x_{in}, t, c_f, \epsilon \sim \mathcal{N}(0,1)} \left\| \epsilon - \epsilon_{\theta}\big(z_t, t, \mathcal{N}(c_f)\big) \right\|^2$

where $\epsilon_{\theta}$ is the pretrained denoising UNet and $\mathcal{N}$ is the trainable adapter.
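
Below is a minimal PyTorch sketch of the SPR conditioning path and the $\mathcal{L}_{SPR}$ objective. The names (`HaarFeatureExtractor`, `spr_loss`, `encode`, `adapter`, `unet`) are illustrative assumptions rather than the released implementation, the adapter-injection interface is shown schematically, and a diffusers-style `DDPMScheduler` is assumed for the forward noising step.

```python
# Sketch only: hypothetical module/variable names, not the official ProDehaze code.
import torch
import torch.nn as nn
import torch.nn.functional as F

def haar_dwt(x):
    """Single-level Haar DWT. Returns (low, high), where `high` stacks the
    LH/HL/HH sub-bands along the channel axis."""
    b, c, _, _ = x.shape
    ll = torch.tensor([[0.5, 0.5], [0.5, 0.5]])
    lh = torch.tensor([[-0.5, -0.5], [0.5, 0.5]])
    hl = torch.tensor([[-0.5, 0.5], [-0.5, 0.5]])
    hh = torch.tensor([[0.5, -0.5], [-0.5, 0.5]])
    kernels = torch.stack([ll, lh, hl, hh]).unsqueeze(1)        # (4, 1, 2, 2)
    kernels = kernels.repeat(c, 1, 1, 1).to(x.device, x.dtype)  # depthwise filters
    out = F.conv2d(x, kernels, stride=2, groups=c)              # (B, 4c, H/2, W/2)
    out = out.view(b, c, 4, out.shape[-2], out.shape[-1])
    low, high = out[:, :, 0], out[:, :, 1:].flatten(1, 2)       # (B, c, ...), (B, 3c, ...)
    return low, high

class HaarFeatureExtractor(nn.Module):
    """HFE: Haar DWT followed by a learnable convolution that maps the
    high-frequency sub-bands to the structural prompt x_high."""
    def __init__(self, in_ch=3, out_ch=4):
        super().__init__()
        self.proj = nn.Conv2d(3 * in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x_in):
        _, high = haar_dwt(x_in)
        return self.proj(high)

def spr_loss(unet, adapter, encode, hfe, scheduler, x_in, x_gt):
    """Epsilon-prediction objective L_SPR with a frozen UNet and trainable adapter N."""
    z0 = encode(x_gt)                                   # latent of the clean target
    z_in = encode(x_in)                                 # latent of the hazy input
    x_high = hfe(x_in)                                  # structural prompt from x_in
    x_high = F.interpolate(x_high, size=z_in.shape[-2:], mode="bilinear",
                           align_corners=False)         # assumption: match latent size
    c_f = torch.cat([z_in, x_high], dim=1)              # conditioning signal c_f
    t = torch.randint(0, scheduler.config.num_train_timesteps,
                      (z0.shape[0],), device=z0.device)
    eps = torch.randn_like(z0)
    z_t = scheduler.add_noise(z0, eps, t)               # forward diffusion q(z_t | z_0)
    eps_pred = unet(z_t, t, adapter(c_f))               # schematic adapter injection
    return F.mse_loss(eps_pred, eps)
```

In this sketch only the HFE and the adapter would receive gradients; the pretrained UNet and the latent encoder stay frozen, matching the first fine-tuning phase.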

Haze-aware Self-correcting Refiner (HCR)

The Haze-aware Self-correcting Refiner (HCR) introduces a self-correction mechanism that improves dehazing fidelity by leveraging haze-aware priors. We add self-attention (SA) blocks to the decoder and modulate them with priors based on haze density, so that attention focuses on regions with thinner haze while down-weighting dense-haze regions. The haze-aware prior is initialized with the Dark Channel Prior (DCP), from which we compute a correlation map $M_{corr}$ that captures the interaction between haze-affected and clearer regions. We then sparsify this map with a top-k selection to emphasize the clearer areas. The mask is generated as follows:

$M_s^{ij} = \begin{cases} -\infty, & (i, j) \in I \\ 1 - M_{corr}^{ij}, & (i, j) \notin I \end{cases}$

where $I$ is the set of indices corresponding to the top-k most haze-affected regions and $M_{corr}$ is the correlation map. The sparsified mask $M_s$ then modulates the self-attention in the decoder.
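
The following is a minimal sketch of how the haze-aware prompt could produce the sparse mask $M_s$ and modulate self-attention. All names (`dark_channel`, `haze_aware_mask`, `masked_self_attention`, `top_k_ratio`) are hypothetical, window partitioning and the joint decoder/refiner training are omitted, the index set $I$ is treated as the key tokens in the top-k haziest regions, and the definition of $M_{corr}$ here (a normalized outer product of the DCP prior) is a schematic stand-in for the paper's formulation.

```python
# Sketch only: hypothetical functions, not the official ProDehaze code.
import torch
import torch.nn.functional as F

def dark_channel(x, patch=15):
    """Dark Channel Prior: per-pixel min over RGB, then a local min filter.
    Higher values indicate denser haze."""
    dark = x.min(dim=1, keepdim=True).values                     # (B, 1, H, W)
    dark = -F.max_pool2d(-dark, patch, stride=1, padding=patch // 2)
    return dark

def haze_aware_mask(x_in, feat_hw, top_k_ratio=0.25):
    """Builds a sparse mask M_s from a DCP-initialized haze prior.

    Tokens in the top-k haziest set I receive -inf (suppressed in attention);
    the remaining, clearer tokens receive 1 - M_corr.
    """
    b = x_in.shape[0]
    h, w = feat_hw
    # 1. DCP-initialized haze prior, resized to the decoder feature resolution.
    prior = dark_channel(x_in)
    prior = F.interpolate(prior, size=(h, w), mode="bilinear", align_corners=False)
    p = prior.flatten(2).transpose(1, 2)                          # (B, N, 1), N = h*w
    # 2. Pairwise correlation between token haze levels (schematic choice of M_corr).
    m_corr = p @ p.transpose(1, 2)                                # (B, N, N)
    m_corr = m_corr / (m_corr.amax(dim=(-2, -1), keepdim=True) + 1e-6)
    # 3. Top-k selection of the haziest tokens -> index set I.
    n = h * w
    k = max(1, int(top_k_ratio * n))
    topk_idx = p.squeeze(-1).topk(k, dim=1).indices               # (B, k)
    in_i = torch.zeros(b, n, dtype=torch.bool, device=x_in.device)
    in_i.scatter_(1, topk_idx, True)
    # 4. Assemble the sparse mask M_s, added to the attention logits.
    m_s = 1.0 - m_corr
    m_s = m_s.masked_fill(in_i.unsqueeze(1), float("-inf"))       # suppress hazy keys
    return m_s

def masked_self_attention(q, k, v, m_s):
    """Self-attention whose logits are modulated by the haze-aware mask M_s."""
    logits = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return torch.softmax(logits + m_s, dim=-1) @ v
```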

Results

Qualitative results on I-Haze, O-Haze, Dense-Haze, and NH-Haze.
Qualitative results on RTTS.

Related Projects

Citation

@article{zhou2025prodehaze,
  title={ProDehaze: Prompting Diffusion Models Toward Faithful Image Dehazing},
  author={Zhou, Tianwen and Wang, Jing and Wu, Songtao and Xu, Kuanhong},
  journal={arXiv preprint arXiv:2503.17488},
  year={2025}
}