Abstract
Recent advances in image editing leverage latent diffusion models (LDMs) for versatile, text-prompt-driven edits across diverse tasks. Yet maintaining pixel-level edge structures, which is crucial for tasks such as photorealistic style transfer and image tone adjustment, remains a challenge for latent-diffusion-based editing. To overcome this limitation, we propose a novel Structure Preservation Loss (SPL) that leverages local linear models to quantify structural differences between input and edited images. Our training-free approach integrates SPL directly into the diffusion model's generative process to ensure structural fidelity. This core mechanism is complemented by a post-processing step that mitigates LDM decoding distortions, a masking strategy for precise edit localization, and a color preservation loss that maintains hues in unedited areas. Experiments confirm that SPL enhances structural fidelity, delivering state-of-the-art performance in latent-diffusion-based image editing.
Method Overview
Structure Preservation Loss Motivation. Local linear models capture edge structures by modeling local pixel relationships between images. Our bidirectional SPL (E↔S) ensures both that the input's structure guides the edit and that the edit preserves the original structure.
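The local-linear-model idea can be sketched with a guided-filter-style fit: within each window, fit one image as an affine function of the other and penalize the residual the fit cannot explain. This is an illustrative sketch, not the paper's exact formulation; the window radius `r` and regularizer `eps` are hypothetical choices.

```python
import torch
import torch.nn.functional as F

def box_mean(x, r):
    # Local mean over a (2r+1) x (2r+1) window via average pooling.
    k = 2 * r + 1
    return F.avg_pool2d(x, k, stride=1, padding=r, count_include_pad=False)

def local_linear_residual(guide, target, r=3, eps=1e-4):
    # Fit target ~= a * guide + b within each local window (guided-filter-style),
    # then measure how much of target the linear model fails to explain.
    mu_g = box_mean(guide, r)
    mu_t = box_mean(target, r)
    var_g = box_mean(guide * guide, r) - mu_g * mu_g
    cov = box_mean(guide * target, r) - mu_g * mu_t
    a = cov / (var_g + eps)
    b = mu_t - a * mu_g
    pred = a * guide + b
    return ((target - pred) ** 2).mean()

def spl_bidirectional(inp, edit, r=3, eps=1e-4):
    # E<->S: the edit must be explainable from the input's local structure,
    # and the input from the edit's.
    return local_linear_residual(inp, edit, r, eps) + local_linear_residual(edit, inp, r, eps)
```

Under this sketch, an edit that only changes colors (a local affine transform of intensities) incurs near-zero loss, while one that alters edge structure is penalized.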
Pipeline Overview. We integrate SPL into the diffusion model's generative process to preserve edge structures while enabling versatile image editing.
Automatic Masking Strategy
For local editing tasks, we employ a cross-attention-based masking strategy that automatically identifies the regions to be edited.
The mask is derived from cross-attention maps during the diffusion process, enabling precise localization of edits without manual annotation.
Combined with our upsampling technique, this allows for high-quality local modifications while preserving the rest of the image.
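A minimal sketch of such a cross-attention mask, assuming averaged low-resolution attention maps and a simple threshold; the function name, normalization, and `thresh` value are illustrative, not the paper's exact procedure:

```python
import torch
import torch.nn.functional as F

def mask_from_attention(attn_maps, token_idx, out_hw, thresh=0.5):
    # attn_maps: list of (heads, H*W, tokens) cross-attention tensors collected
    # at low resolution; token_idx selects the edit word's text token.
    maps = []
    for a in attn_maps:
        s = int(a.shape[1] ** 0.5)                          # assume square maps
        m = a[:, :, token_idx].mean(0).reshape(1, 1, s, s)  # average over heads
        maps.append(F.interpolate(m, size=out_hw, mode="bilinear",
                                  align_corners=False))
    m = torch.cat(maps, 0).mean(0, keepdim=True)
    m = (m - m.min()) / (m.max() - m.min() + 1e-8)          # normalize to [0, 1]
    return (m > thresh).float()                             # binary edit mask
```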
Comparison with State-of-the-Art
Qualitative comparison with existing methods on various editing tasks.
Global Color Change
SPL maintains edge structures during color transformations while other methods introduce artifacts or lose details.
Local Structure Change
Our method (b) preserves fine details and structures better than InstructPix2Pix (c), InfEdit (d), DDPM Inv. (e), NT+P2P (f), and GNRI (g).
More Comparison Results
Additional comparisons showing SPL's superior structure preservation across diverse editing scenarios.
Additional Global Editing Examples
Our method successfully applies global edits (e.g., color change, seasonal transformation) while preserving fine-grained structural details.
Additional Local Editing Examples
For local editing tasks, our method generates precise edit masks from text prompts to enable structure-preserving localized modifications.
Quantitative Comparison on PIE-Bench
Preservation metrics: SPL, SSIM, LPIPS, FSIM, GMSD. Prompt-fidelity metrics: CLIP S., CLIP D.

| Method | SPL (×10²) ↓ | SSIM ↑ | LPIPS ↓ | FSIM ↑ | GMSD ↓ | CLIP S. ↑ | CLIP D. ↑ |
|---|---|---|---|---|---|---|---|
| Ours | 0.107 | 0.854 | 0.212 | 0.927 | 0.062 | 0.272 | 0.163 |
| InfEdit (Baseline) | 0.748 | 0.600 | 0.375 | 0.792 | 0.180 | 0.284 | 0.163 |
| InstructPix2Pix | 0.517 | 0.668 | 0.348 | 0.839 | 0.134 | 0.262 | 0.153 |
| DDPM Inv. | 0.710 | 0.702 | 0.238 | 0.841 | 0.161 | 0.271 | 0.091 |
| NT+P2P | 0.545 | 0.736 | 0.214 | 0.881 | 0.126 | 0.256 | 0.135 |
| GNRI | 0.972 | 0.635 | 0.286 | 0.801 | 0.184 | 0.247 | 0.066 |
Our method achieves the best scores on all preservation metrics while maintaining competitive prompt fidelity, outperforming prior LDM-based editing methods overall.
Ablation Study
Comparison of Structure Preservation Methods
We compare our SPL with alternative structure preservation approaches:
- MSE (mean squared error): fails to preserve fine structures
- SSIM (structural similarity index): limited in capturing edge details
- SPL (E→S): unidirectional edge-to-structure variant
- SPL (E↔S): bidirectional variant, achieving the best results
Our bidirectional SPL formulation provides the best balance between structure preservation and edit quality.
Effect of Each Component
Our method consists of complementary components:
- SPL: Core loss for edge structure preservation
- CPL: Color Preservation Loss for hue consistency
- PP: Post-processing to correct VAE artifacts
Each component contributes to the final high-quality result.
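As an illustration of the CPL idea, a hypothetical sketch that keeps colors close to the source outside the edit mask; the paper's actual loss may operate on hue channels rather than raw RGB, and this function is an assumption, not the published formulation:

```python
import torch

def color_preservation_loss(src, edit, mask):
    # src, edit: (N, 3, H, W) images; mask: (N, 1, H, W), 1 inside the edit region.
    # Penalize color deviation from the source only in unedited (mask == 0) areas.
    keep = 1.0 - mask
    num = ((edit - src) ** 2 * keep).sum()
    den = keep.sum() * src.shape[1] + 1e-8  # normalize by kept pixels x channels
    return num / den
```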
Guidance vs. Post-Processing
Applying our loss only as post-processing (c) fails to fix the baseline's severe structural errors (b).
In contrast, our iterative guidance during the diffusion process (d) prevents these errors from forming in the first place, achieving superior structure preservation.
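A generic loss-guidance step of this kind can be sketched as follows; `pred_x0_fn`, `decode_fn`, and `scale` are placeholders for the sampler's clean-image prediction, the LDM decoder, and a guidance weight, and this is not the paper's exact update rule:

```python
import torch

def guided_step(z_t, pred_x0_fn, decode_fn, src_img, spl_fn, scale=1.0):
    # One guidance update: decode the current clean-image estimate, measure the
    # structure loss against the source image, and nudge the latent along the
    # negative gradient of that loss.
    z = z_t.detach().requires_grad_(True)
    x0 = decode_fn(pred_x0_fn(z))   # predicted clean image from the latent
    loss = spl_fn(src_img, x0)
    grad = torch.autograd.grad(loss, z)[0]
    return (z - scale * grad).detach()
```

Applying such an update at each sampling step steers the trajectory before structural errors accumulate, which is why guidance can succeed where post-hoc correction fails.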
BibTeX
@inproceedings{gong2026spl,
  title={Edge-Aware Image Manipulation via Diffusion Models with a Novel Structure-Preservation Loss},
  author={Gong, Minsu and Ryu, Nuri and Ok, Jungseul and Cho, Sunghyun},
  booktitle={Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
  year={2026},
}