Forward Process vs. Reverse Process

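Pippa's one-line summary: The forward process is simple math (add noise); the reverse process is learned (predict the noise and subtract it). This asymmetry is what makes training feasible.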

Diffusion models have two phases, and understanding the distinction is key to understanding how they learn and how they generate.

Think of it like learning to restore antique furniture. The forward process is deliberately damaging furniture in controlled stages so you can study what damage looks like at each level. The reverse process is using that knowledge to repair damaged furniture — because you've seen every stage of degradation, you know exactly how to undo each step.

The Forward Process (Training Time)

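During training, real images are progressively corrupted by adding Gaussian noise over many small steps (typically 1000 steps of increasing noise). Because the amount of noise at each step is fixed in advance, training does not have to walk through the steps in order: it can pick a random step and noise the image directly to that level.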

Forward Process (happens during training):

Step 0    Step 250    Step 500    Step 750    Step 1000
  🖼️   →   🖼️+🌫️   →   🌫️🌫️    →   🌫️🌫️🌫️  →    🎲
Clean    Light noise   Heavy     Very heavy    Pure
image                  noise     noise         noise
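
The noising shown in the diagram can be sketched in a few lines of NumPy. This is a minimal sketch, not a reference implementation: the linear schedule, its `beta_start` and `beta_end` values, and the helper names `make_schedule` and `add_noise` are assumptions chosen for illustration. It uses the standard closed-form shortcut, which mixes the clean image with Gaussian noise to reach any step directly instead of adding noise one step at a time.

```python
import numpy as np

def make_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (an assumed, commonly used choice)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    # alpha_bars[t - 1] is the share of the original signal variance left at step t.
    alpha_bars = np.cumprod(1.0 - betas)
    return betas, alpha_bars

def add_noise(x0, t, alpha_bars, rng):
    """Noise the clean image x0 directly to step t (1-based; step 0 is the clean image)."""
    eps = rng.standard_normal(x0.shape)      # fresh Gaussian noise
    a = alpha_bars[t - 1]
    x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps
    return x_t, eps

rng = np.random.default_rng(0)
betas, alpha_bars = make_schedule()
x0 = rng.uniform(-1.0, 1.0, size=(8, 8))     # stand-in for a real image

for t in (1, 250, 500, 750, 1000):
    x_t, eps = add_noise(x0, t, alpha_bars, rng)
    # Print the weight on the clean image at this step.
    print(t, round(float(np.sqrt(alpha_bars[t - 1])), 4))
```

The printed number is the weight on the clean image at each step. It shrinks toward zero as the step grows, which is why the final step is essentially pure noise.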

At each step, the model sees:
- The noisy image (input)
- How much noise was added (the noise level / timestep)
- The noise that was added (the target to predict; the original clean image is an equivalent target)

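The model's training objective is simple: given a noisy image and the noise level, predict the noise that was added (or equivalently, predict the clean image underneath). The two targets are interchangeable because the noisy image is a known mix of clean image and noise, so knowing one gives you the other. Over a very large number of training examples, the model becomes accurate at this prediction across all noise levels.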
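
To make the objective concrete, here is a small sketch of the loss and of the "equivalently, predict the clean image" claim. The mean-squared-error loss, the value `alpha_bar_t = 0.5`, and the helper names `noise_loss` and `x0_from_noise` are assumptions for illustration; no real network is involved, so the "predictions" are just hand-picked arrays.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha_bar_t = 0.5                             # assumed signal share at some step t
x0 = rng.uniform(-1.0, 1.0, size=(8, 8))      # stand-in clean image
eps = rng.standard_normal(x0.shape)           # the noise that gets added
x_t = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

def noise_loss(eps_pred, eps_true):
    """Mean squared error between predicted and actual noise."""
    return float(np.mean((eps_pred - eps_true) ** 2))

def x0_from_noise(x_t, eps_pred, alpha_bar_t):
    """Clean image implied by a noise prediction (inverts the mixing formula)."""
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)

# A perfect noise prediction gives zero loss and recovers the clean image exactly.
perfect_loss = noise_loss(eps, eps)
x0_recovered = x0_from_noise(x_t, eps, alpha_bar_t)

# A naive guess of "no noise at all" gives a positive loss.
zero_guess_loss = noise_loss(np.zeros_like(eps), eps)
```

Since `x0_from_noise` is just the mixing formula solved for the clean image, a model that predicts the noise well has implicitly predicted the clean image too.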

The Reverse Process (Generation Time)

Generation is the forward process run backward. Start with pure noise (step 1000), and ask the model: "What noise was added here? Remove it." Do this step by step, and you gradually walk from pure chaos to a coherent image.

Reverse Process (happens during generation):

Step 1000   Step 750    Step 500    Step 250    Step 0
   🎲    →   🌫️🌫️🌫️  →   🌫️🌫️    →   🖼️+🌫️  →   🖼️
  Pure     Shapes     Structure   Nearly     Clean
  noise    hint       emerges     clear      image!

At each step, the model:
1. Looks at the current noisy state
2. Predicts "what noise is here?"
3. Subtracts a portion of that predicted noise (not all of it at once)
4. Moves one step closer to a clean image
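
The four steps above can be sketched as a loop. This is a hedged sketch under stated assumptions: the text does not name a specific sampler, so the update rule here is the DDPM-style one (remove a scaled share of the predicted noise, then re-inject a little fresh noise on every step except the last). The function `oracle_predict_noise` is a hypothetical stand-in for a trained network; it cheats by knowing the single `target` image, which makes the loop easy to check but is not how a real model works.

```python
import numpy as np

def reverse_sample(predict_noise, shape, betas, rng):
    """Walk from pure noise back to an image with a DDPM-style update (assumed)."""
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal(shape)             # start from pure noise
    for i in range(len(betas) - 1, -1, -1):    # from the last step down to the first
        eps_pred = predict_noise(x, i)         # steps 1-2: look at x, predict its noise
        # Step 3: remove a scaled share of the predicted noise.
        x = (x - betas[i] / np.sqrt(1.0 - alpha_bars[i]) * eps_pred) / np.sqrt(alphas[i])
        if i > 0:
            # Every step except the final one adds back a small amount of fresh noise.
            x = x + np.sqrt(betas[i]) * rng.standard_normal(shape)
    return x                                   # step 4 repeated until the image is clean

# Assumed linear schedule and a toy 4x4 "image" for demonstration only.
betas = np.linspace(1e-4, 0.02, 1000)
alpha_bars = np.cumprod(1.0 - betas)
target = np.linspace(-1.0, 1.0, 16).reshape(4, 4)

def oracle_predict_noise(x, i):
    """Hypothetical perfect predictor for a dataset containing only `target`."""
    return (x - np.sqrt(alpha_bars[i]) * target) / np.sqrt(1.0 - alpha_bars[i])

sample = reverse_sample(oracle_predict_noise, target.shape, betas, np.random.default_rng(0))
```

With this oracle the loop ends at `target`, up to floating-point error. A trained network would replace the oracle and would produce a new plausible image rather than a memorized one.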

Why This Is Clever

Here's the elegant part: the forward process is fixed and mathematically simple — it's just adding known amounts of Gaussian noise. No learning needed. All the intelligence goes into the reverse process — learning to predict and remove noise. This asymmetry makes the training problem tractable: you don't need to learn how to destroy images (that's trivial), you need to learn how to restore them.

And because the model has been trained on a very large number of images at every noise level, it has learned a great deal about image structure: what natural images look like, how they're composed, what features emerge at different scales, and how to reconstruct plausible details from partial information.

Key Takeaways

Forward process: Gradually add noise to real images during training (easy, fixed, no learning).
Reverse process: Gradually remove noise during generation (hard, learned, this is where the intelligence lives).
The model is trained to denoise — generation is an emergent consequence of applying denoising from pure noise.
The asymmetry (simple destruction, complex restoration) makes the learning problem tractable.
