C.W.K.
Stream
Lesson 02 of 10 · published

Diffusion Intuition: From Noise to Image

~15 min · diffusion, latent-space, l2

Pippa's one-line summary: The core of diffusion in one line: **if you learn to gradually *remove* noise, you can use that ability to start from noise and *generate* an image**. Anchor this with the sculptor analogy.

Here's the most beautiful idea in modern image generation, and it's surprisingly simple. Imagine you have a photograph. Now imagine slowly adding TV static (random noise) to it, one layer at a time. After enough layers, the photograph is completely destroyed — nothing but pure static remains. Diffusion models learn to run this process in reverse: starting from pure noise, they gradually remove the static, step by step, until a coherent image emerges.

Forward (Training): Add noise progressively

  🖼️ ──▶ 🖼️+🌫️ ──▶ 🌫️+🌫️ ──▶ 🌫️🌫️🌫️ ──▶ 🎲 Pure noise
  Clean     Slightly     Mostly      Very         Total
  image     noisy        noisy       noisy        chaos
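
The forward process can be sketched in a few lines. This is a minimal illustration, not the lesson's own code: it assumes a linear noise schedule with commonly used values (`beta_start`, `beta_end`, and 1000 steps are assumptions), and it uses a list of 64 numbers as a stand-in for an image. The function names are hypothetical.

```python
import math
import random

def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative fraction of signal left after each step (linear beta schedule)."""
    alpha_bars, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(x0, t, alpha_bars, rng):
    """Jump straight to step t by mixing the clean signal with Gaussian noise."""
    keep = math.sqrt(alpha_bars[t])          # how much of the image survives
    spread = math.sqrt(1.0 - alpha_bars[t])  # how much noise is mixed in
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [keep * x + spread * e for x, e in zip(x0, eps)]
    return xt, eps

rng = random.Random(0)
alpha_bars = make_alpha_bars()
# Stand-in "image": 64 pixel values forming a gradient in [-1, 1].
x0 = [-1.0 + 2.0 * i / 63 for i in range(64)]

for t in (0, 250, 500, 999):
    xt, _ = add_noise(x0, t, alpha_bars, rng)
    print(f"t={t:4d}  signal kept={math.sqrt(alpha_bars[t]):.3f}  "
          f"noise scale={math.sqrt(1.0 - alpha_bars[t]):.3f}")
```

The printed "signal kept" value shrinks toward zero as `t` grows, which is the "clean image to total chaos" progression in the diagram above.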

Reverse (Generation): Remove noise progressively

  🎲 ──▶ 🌫️🌫️ ──▶ 🌫️+🌁 ──▶ 🌁+🖼️ ──▶ 🖼️ Final image
  Pure     Still       Shapes     Details    Clean,
  noise    chaotic     emerge     sharpen    coherent

That's it. That's the core insight. During training the model sees images at every noise level and learns to predict what the slightly less noisy version should look like (in practice, many models are trained to predict the noise that was added, which carries the same information). By applying this prediction repeatedly, it peels away layers of noise to reveal an image.
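
The "apply the prediction repeatedly" loop might look roughly like the sketch below. Two things here are assumptions rather than anything the lesson specifies: the update rule is a deterministic DDIM-style step (one common sampler among several), and `oracle_noise_prediction` is a hypothetical stand-in for the trained network. It returns the ideal noise estimate for the special case where the training set is a single known image, which lets the loop run without any training.

```python
import math
import random

def make_alpha_bars(num_steps=50, beta_start=1e-4, beta_end=0.2):
    """Cumulative signal fraction per step (assumed linear schedule)."""
    alpha_bars, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def oracle_noise_prediction(xt, t, alpha_bars, target):
    """Hypothetical stand-in for a trained network (single-image dataset)."""
    keep = math.sqrt(alpha_bars[t])
    spread = math.sqrt(1.0 - alpha_bars[t])
    return [(x - keep * x0) / spread for x, x0 in zip(xt, target)]

def denoise_step(xt, t, alpha_bars, eps_hat):
    """One deterministic DDIM-style step from noise level t to t-1."""
    ab_t = alpha_bars[t]
    ab_prev = alpha_bars[t - 1] if t > 0 else 1.0  # 1.0 means "no noise left"
    # Current best guess of the clean image, given the predicted noise.
    x0_guess = [(x - math.sqrt(1.0 - ab_t) * e) / math.sqrt(ab_t)
                for x, e in zip(xt, eps_hat)]
    # Re-noise that guess to the next, slightly lower noise level.
    return [math.sqrt(ab_prev) * g + math.sqrt(1.0 - ab_prev) * e
            for g, e in zip(x0_guess, eps_hat)]

def generate(target, alpha_bars, rng):
    """Start from pure noise and walk the noise level down to zero."""
    x = [rng.gauss(0.0, 1.0) for _ in target]
    for t in reversed(range(len(alpha_bars))):
        eps_hat = oracle_noise_prediction(x, t, alpha_bars, target)
        x = denoise_step(x, t, alpha_bars, eps_hat)
    return x

target = [0.5, -0.25, 1.0, 0.0]  # toy 4-pixel "image"
result = generate(target, make_alpha_bars(), random.Random(123))
print([round(v, 6) for v in result])
```

With the oracle in place the loop lands on the toy image regardless of the starting noise. A real model replaces the oracle with a learned network, so its guesses are imperfect and are refined over many steps.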

Why Starting From Noise Works

This seems backward — why would you start with garbage? Because noise is maximally unpredictable. It contains no bias, no preexisting structure, no constraints. Starting from noise means the model has complete freedom to create any image. The noise is like a blank canvas made of randomness — and the denoising process is the painting.

The Sculptor Analogy

Think of it like a sculptor working with marble. The sculptor doesn't add the statue; the statue is "already inside" the marble block, and the sculptor removes material to reveal it. Similarly, the image is "already inside" the noise in a loose sense: the noise itself contains no picture, but which image emerges depends on the starting noise, the model, and any conditioning such as a prompt. The diffusion model removes noise to reveal it.

Here's what each phase looks like conceptually:

  • Early steps (high noise): The model makes big decisions — overall composition, rough layout, major shapes, color scheme. Like a sculptor blocking out the general form.
  • Middle steps (medium noise): Structure solidifies — faces take shape, objects become recognizable, spatial relationships lock in. The sculptor defines limbs, posture, proportions.
  • Late steps (low noise): Fine details emerge — skin texture, fabric weave, lighting subtleties, sharp edges. The sculptor polishes and adds detail.

As a rough timeline for a 50-step run:

  Step 1-5:     🟫🟫🟫  "Big decisions": layout, composition, major shapes
  Step 6-15:    🗿       "Structure": objects form, faces emerge
  Step 16-30:   🏛️       "Refinement": details, textures, lighting
  Step 31-50:   🖼️       "Polish": final crisp details

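
One way to see why coarse structure tends to settle before fine detail is to look at the signal-to-noise ratio at each noise level. The sketch below is a toy model under stated assumptions: the schedule values are illustrative, and the two amplitudes (1.0 for a layout-scale feature, 0.1 for a texture-scale feature) are hypothetical numbers chosen to reflect the usual observation that coarse structure carries more energy than fine detail. Generation runs from the highest noise level down to 0, so a larger `t` means earlier in generation.

```python
import math

def make_alpha_bars(num_steps=50, beta_start=1e-4, beta_end=0.2):
    """Cumulative signal fraction per step (assumed linear schedule)."""
    alpha_bars, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def snr(t, alpha_bars):
    """Signal-to-noise power ratio of the noised image at step t."""
    return alpha_bars[t] / (1.0 - alpha_bars[t])

def first_visible_step(amplitude, alpha_bars):
    """Noisiest step at which a feature is at least as strong as the noise."""
    for t in reversed(range(len(alpha_bars))):
        if amplitude ** 2 * snr(t, alpha_bars) >= 1.0:
            return t
    return None  # never rises above the noise on this schedule

alpha_bars = make_alpha_bars()
coarse_t = first_visible_step(1.0, alpha_bars)  # hypothetical layout-scale feature
fine_t = first_visible_step(0.1, alpha_bars)    # hypothetical texture-scale feature
print(f"coarse feature rises above the noise at t={coarse_t}")
print(f"fine feature rises above the noise at t={fine_t}")
```

The coarse feature clears the noise at a higher `t`, that is, earlier in generation, while the fine feature only becomes distinguishable near the end. This matches the "big decisions early, polish late" pattern.
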
Key Takeaways
  • Diffusion = start with noise, gradually remove it to reveal an image.
  • The model learned to reverse a noise-adding process: predict "what does a slightly cleaner version look like?"
  • Early steps decide composition and layout; late steps add fine detail.
  • Starting from noise gives the model maximum creative freedom: the starting point carries no preexisting structure.

Exercise

Watch a real-time diffusion visualizer (for example, on Hugging Face Spaces). Record the timesteps at which composition appears versus when detail appears. Does it match the "big decisions early" description? Where did it diverge?
