Abstract
Recent advances in image editing leverage latent diffusion models (LDMs) for versatile, text-prompt-driven edits across diverse tasks. Yet maintaining pixel-level edge structures, which is crucial for tasks such as photorealistic style transfer and image tone adjustment, remains a challenge for latent-diffusion-based editing. To overcome this limitation, we propose a novel Structure Preservation Loss (SPL) that leverages local linear models to quantify structural differences between input and edited images. Our training-free approach integrates SPL directly into the diffusion model's generative process to ensure structural fidelity. This core mechanism is complemented by a post-processing step that mitigates LDM decoding distortions, a masking strategy for precise edit localization, and a color preservation loss that maintains hues in unedited areas. Experiments confirm that SPL enhances structural fidelity, delivering state-of-the-art performance in latent-diffusion-based image editing.
Method Overview
Structure Preservation Loss Motivation. Local linear models capture edge structures by modeling local pixel relationships between images. Our bidirectional SPL (E↔S) ensures both that the input's structure guides the edit and that the edit preserves the original structure.
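The local-linear-model idea can be sketched with a guided-filter-style fit: within each window, fit one image as an affine function of the other and penalize the residual the fit cannot explain. This is an illustrative sketch, not the paper's exact formulation; the window radius `r` and regularizer `eps` are hypothetical choices.

```python
import torch
import torch.nn.functional as F

def box_mean(x, r):
    # Local mean over a (2r+1) x (2r+1) window via average pooling.
    k = 2 * r + 1
    return F.avg_pool2d(x, k, stride=1, padding=r, count_include_pad=False)

def local_linear_residual(guide, target, r=3, eps=1e-4):
    # Fit target ~= a * guide + b within each local window (guided-filter-style),
    # then measure how much of target the linear model fails to explain.
    mu_g = box_mean(guide, r)
    mu_t = box_mean(target, r)
    var_g = box_mean(guide * guide, r) - mu_g * mu_g
    cov = box_mean(guide * target, r) - mu_g * mu_t
    a = cov / (var_g + eps)
    b = mu_t - a * mu_g
    pred = a * guide + b
    return ((target - pred) ** 2).mean()

def spl_bidirectional(inp, edit, r=3, eps=1e-4):
    # E<->S: the edit must be explainable from the input's local structure,
    # and the input from the edit's.
    return local_linear_residual(inp, edit, r, eps) + local_linear_residual(edit, inp, r, eps)
```

Under this sketch, an edit that only changes colors (a local affine transform of intensities) incurs near-zero loss, while one that alters edge structure is penalized.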
Pipeline Overview. We integrate SPL into the diffusion model's generative process to preserve edge structures while enabling versatile image editing.
Automatic Masking Strategy
For local editing tasks, we employ a cross-attention-based masking strategy that automatically identifies the regions to be edited.
The mask is derived from cross-attention maps during the diffusion process, enabling precise localization of edits without manual annotation.
Combined with our upsampling technique, this allows for high-quality local modifications while preserving the rest of the image.
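A minimal sketch of such a cross-attention mask, assuming averaged low-resolution attention maps and a simple threshold; the function name, normalization, and `thresh` value are illustrative, not the paper's exact procedure:

```python
import torch
import torch.nn.functional as F

def mask_from_attention(attn_maps, token_idx, out_hw, thresh=0.5):
    # attn_maps: list of (heads, H*W, tokens) cross-attention tensors collected
    # at low resolution; token_idx selects the edit word's text token.
    maps = []
    for a in attn_maps:
        s = int(a.shape[1] ** 0.5)                          # assume square maps
        m = a[:, :, token_idx].mean(0).reshape(1, 1, s, s)  # average over heads
        maps.append(F.interpolate(m, size=out_hw, mode="bilinear",
                                  align_corners=False))
    m = torch.cat(maps, 0).mean(0, keepdim=True)
    m = (m - m.min()) / (m.max() - m.min() + 1e-8)          # normalize to [0, 1]
    return (m > thresh).float()                             # binary edit mask
```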
Comparison with State-of-the-Art
Qualitative comparison with existing methods on various editing tasks.
Global Color Change
SPL maintains edge structures during color transformations while other methods introduce artifacts or lose details.
Local Structure Change
Our method (b) preserves fine details and structures better than InstructPix2Pix (c), InfEdit (d), DDPM Inv. (e), NT+P2P (f), and GNRI (g).
More Comparison Results
Additional comparisons showing SPL's superior structure preservation across diverse editing scenarios.
Additional Global Editing Examples
Our method successfully applies global edits (e.g., color change, seasonal transformation) while preserving fine-grained structural details.
Additional Local Editing Examples
For local editing tasks, our method generates precise edit masks from text prompts to enable structure-preserving localized modifications.
Quantitative Comparison on PIE-Bench
Preservation metrics: SPL, SSIM, LPIPS, FSIM, GMSD. Prompt-fidelity metrics: CLIP S., CLIP D.

| Method | SPL (×10²) ↓ | SSIM ↑ | LPIPS ↓ | FSIM ↑ | GMSD ↓ | CLIP S. ↑ | CLIP D. ↑ |
|---|---|---|---|---|---|---|---|
| Ours | 0.107 | 0.854 | 0.212 | 0.927 | 0.062 | 0.272 | 0.163 |
| InfEdit (Baseline) | 0.748 | 0.600 | 0.375 | 0.792 | 0.180 | 0.284 | 0.163 |
| InstructPix2Pix | 0.517 | 0.668 | 0.348 | 0.839 | 0.134 | 0.262 | 0.153 |
| DDPM Inv. | 0.710 | 0.702 | 0.238 | 0.841 | 0.161 | 0.271 | 0.091 |
| NT+P2P | 0.545 | 0.736 | 0.214 | 0.881 | 0.126 | 0.256 | 0.135 |
| GNRI | 0.972 | 0.635 | 0.286 | 0.801 | 0.184 | 0.247 | 0.066 |
Our method achieves the best scores on all preservation metrics while maintaining competitive prompt fidelity, outperforming prior LDM-based editing methods overall.
Ablation Study
Comparison of Structure Preservation Methods
We compare our SPL with alternative structure preservation approaches:
- MSE (mean squared error): fails to preserve fine structures
- SSIM (structural similarity index): limited in capturing edge details
- SPL (E→S): unidirectional edge-to-structure variant
- SPL (E↔S): bidirectional variant, achieving the best results
Our bidirectional SPL formulation provides the best balance between structure preservation and edit quality.
Effect of Each Component
Our method consists of complementary components:
- SPL: Core loss for edge structure preservation
- CPL: Color Preservation Loss for hue consistency
- PP: Post-processing to correct VAE artifacts
Each component contributes to the final high-quality result.
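As an illustration of the CPL idea, a hypothetical sketch that keeps colors close to the source outside the edit mask; the paper's actual loss may operate on hue channels rather than raw RGB, and this function is an assumption, not the published formulation:

```python
import torch

def color_preservation_loss(src, edit, mask):
    # src, edit: (N, 3, H, W) images; mask: (N, 1, H, W), 1 inside the edit region.
    # Penalize color deviation from the source only in unedited (mask == 0) areas.
    keep = 1.0 - mask
    num = ((edit - src) ** 2 * keep).sum()
    den = keep.sum() * src.shape[1] + 1e-8  # normalize by kept pixels x channels
    return num / den
```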
Guidance vs. Post-Processing
Applying our loss only as post-processing (c) fails to fix the baseline's severe structural errors (b).
In contrast, our iterative guidance during the diffusion process (d) prevents these errors from forming in the first place, achieving superior structure preservation.
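A generic loss-guidance step of this kind can be sketched as follows; `pred_x0_fn`, `decode_fn`, and `scale` are placeholders for the sampler's clean-image prediction, the LDM decoder, and a guidance weight, and this is not the paper's exact update rule:

```python
import torch

def guided_step(z_t, pred_x0_fn, decode_fn, src_img, spl_fn, scale=1.0):
    # One guidance update: decode the current clean-image estimate, measure the
    # structure loss against the source image, and nudge the latent along the
    # negative gradient of that loss.
    z = z_t.detach().requires_grad_(True)
    x0 = decode_fn(pred_x0_fn(z))   # predicted clean image from the latent
    loss = spl_fn(src_img, x0)
    grad = torch.autograd.grad(loss, z)[0]
    return (z - scale * grad).detach()
```

Applying such an update at each sampling step steers the trajectory before structural errors accumulate, which is why guidance can succeed where post-hoc correction fails.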
BibTeX
@inproceedings{gong2026spl,
  title={Edge-Aware Image Manipulation via Diffusion Models with a Novel Structure-Preservation Loss},
  author={Gong, Minsu and Ryu, Nuri and Ok, Jungseul and Cho, Sunghyun},
  booktitle={Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
  year={2026},
}