Lesson 07 of 10 · published

Sound Effects, Ambience, Music, and Voice as Layers

~17 min · audio, voice, l7


Pippa's one-line summary: Pro audio has four layers: dialogue (top) → SFX → ambience → music (bottom, ducked). Even when you use native audio, the standard practice is to swap out individual layers.

Mental model: A film's audio is never "one sound." It's a carefully constructed stack of independent layers, just like a Photoshop file has layers for background, subject, and text. Sound designers work with four primary audio layers, each serving a different purpose. Understanding this layered approach is essential for creating professional generative media.

The Four Audio Layers

Layer 1: Dialogue / Voice

The most important layer — humans prioritize speech above all other sounds. Dialogue must be clear, well-timed, and emotionally appropriate. In generative media, this comes from TTS, voice cloning, or native audio in video models.

  • Always sits "on top" of the mix — other layers support it, never overpower it.
  • Needs clean audio quality (no background-noise artifacts).
  • Timing relative to visuals is critical (see Lesson 5 on lip sync).

Layer 2: Sound Effects (SFX)

Specific, identifiable sounds tied to visible events: footsteps, doors, typing, glass clinking, explosions. They make visual events feel real and impactful.

  • Must be synchronized to visual events (the sound plays exactly when the door closes on screen).
  • Subtle SFX (cloth rustling, keyboard typing) add realism that viewers feel but don't consciously notice.
  • Can be generated, sourced from libraries, or extracted from native audio.
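The sync requirement above comes down to converting an event's timestamp into a sample position and mixing the effect in at that point. The following is a minimal sketch, assuming mono audio held as plain lists of float samples; the function name `place_sfx` and the 48 kHz default are illustrative choices, not part of any particular tool.

```python
def place_sfx(track, clip, event_time_s, sample_rate=48_000):
    """Mix `clip` into `track` so it starts at `event_time_s` seconds.

    `track` and `clip` are mono lists of float samples. Returns a new list;
    the inputs are not modified. (Hypothetical helper for illustration.)
    """
    if event_time_s < 0:
        raise ValueError("event_time_s must be non-negative")
    start = round(event_time_s * sample_rate)  # event time -> sample index
    out = list(track)
    # Pad with silence if the effect runs past the end of the track.
    needed = start + len(clip)
    if needed > len(out):
        out.extend([0.0] * (needed - len(out)))
    # Add the effect on top of whatever is already in the track.
    for i, sample in enumerate(clip):
        out[start + i] += sample
    return out
```

In practice the event time would come from the video timeline, for example the frame where the door visibly shuts divided by the frame rate.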

Layer 3: Ambience / Room Tone

The background atmosphere of a scene: café chatter, forest insects, city traffic, air conditioning hum. Ambience establishes "where you are" and fills the sonic space between events.

  • Should be continuous and consistent — cutting abruptly breaks immersion.
  • Changes between scenes signal location shifts to the viewer.
  • Often the most overlooked layer, but its absence creates an eerie "vacuum" feeling.
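One common way to avoid an abrupt ambience cut at a scene change is to crossfade the outgoing bed into the incoming one. The sketch below uses an equal-power (cosine/sine) curve, which is one option among several; `crossfade` is a hypothetical helper and assumes mono lists of float samples at the same sample rate.

```python
import math


def crossfade(outgoing, incoming, fade_len):
    """Join two ambience beds, overlapping the last `fade_len` samples of
    `outgoing` with the first `fade_len` samples of `incoming`."""
    if fade_len <= 0 or fade_len > min(len(outgoing), len(incoming)):
        raise ValueError("fade_len must be positive and fit inside both clips")
    head = outgoing[:-fade_len]   # untouched part of the old ambience
    tail = incoming[fade_len:]    # untouched part of the new ambience
    offset = len(outgoing) - fade_len
    overlap = []
    for i in range(fade_len):
        t = (i + 0.5) / fade_len            # position in the fade, 0..1
        g_out = math.cos(t * math.pi / 2)   # fades 1 -> 0
        g_in = math.sin(t * math.pi / 2)    # fades 0 -> 1
        overlap.append(outgoing[offset + i] * g_out + incoming[i] * g_in)
    return head + overlap + tail
```

The fade length is a creative choice: a fraction of a second reads as a cut between locations, while several seconds reads as a gradual transition.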

Layer 4: Music / Score

Background music sets emotional tone: tension, joy, wonder, nostalgia. It guides the viewer's emotional response independently of visuals.

  • Should complement dialogue, not compete with it.
  • Volume ducking: music automatically drops during dialogue and rises during visual-only moments.
  • Style should match the visual aesthetic (cinematic visuals + lo-fi hip-hop = mismatch).
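The volume ducking described above can be sketched as a gain envelope applied to the music: full level away from speech, a reduced level during speech, and a short ramp in between so the change is not audible as a jump. This is a minimal illustration, assuming the dialogue timings are already known; `duck_music`, the -12 dB depth, and the 0.25 s ramp are assumed values for the example, not recommendations from the lesson.

```python
def db_to_gain(db):
    """Convert a level change in decibels to a linear amplitude factor."""
    return 10 ** (db / 20)


def duck_music(music, dialogue_spans, sample_rate=48_000,
               duck_db=-12.0, ramp_s=0.25):
    """Return `music` with its level lowered while dialogue is active.

    music: mono list of float samples.
    dialogue_spans: list of (start_s, end_s) tuples in seconds.
    """
    floor = db_to_gain(duck_db)                   # gain during dialogue
    ramp = max(1, round(ramp_s * sample_rate))    # ramp length in samples
    spans = [(round(s * sample_rate), round(e * sample_rate))
             for s, e in dialogue_spans]
    out = []
    for i, sample in enumerate(music):
        # Distance in samples to the nearest dialogue span (0 if inside one).
        dist = min((max(start - i, i - end, 0) for start, end in spans),
                   default=ramp)
        closeness = max(0.0, 1.0 - dist / ramp)   # 1 in dialogue, 0 far away
        gain = 1.0 + (floor - 1.0) * closeness    # linear ramp 1.0 <-> floor
        out.append(sample * gain)
    return out
```

Editing tools usually offer this as sidechain compression or auto-ducking; the explicit version here only shows what that feature is doing to the music layer.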

The Audio Layer Stack:

  Priority:
  ┌──────────────────────────────────────────┐  Highest
  │  Layer 1: DIALOGUE / VOICE               │  (always
  │  "Welcome to our product demo..."        │   on top)
  ├──────────────────────────────────────────┤
  │  Layer 2: SOUND EFFECTS                  │  Synced to
  │  [click] [whoosh] [typing]               │  visual events
  ├──────────────────────────────────────────┤
  │  Layer 3: AMBIENCE                       │  Continuous
  │  ~~~office hum, distant traffic~~~       │  background
  ├──────────────────────────────────────────┤  Lowest
  │  Layer 4: MUSIC / SCORE                  │  (ducks under
  │  ♫ gentle corporate background ♫         │   dialogue)
  └──────────────────────────────────────────┘

Why Layers Matter for Generative Media

When you generate everything in one pass (native audio), you get all layers mixed together — convenient but inflexible. If the voice is perfect but the music is wrong, you can't change just the music without regenerating everything.

When you generate layers separately, you have full mix control: adjust dialogue volume, swap out the music, time sound effects precisely, change ambient atmosphere. The tradeoff is more work for more control.
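The mix control that separate layers give you can be sketched as a per-layer gain followed by a sum. The example below is a rough illustration, assuming each layer is a mono list of float samples in the range -1 to 1; `mix_layers` and the layer names are hypothetical, and the hard clip at the end is a crude stand-in for a proper limiter.

```python
def mix_layers(layers, gains_db):
    """Sum audio layers after applying a per-layer gain in decibels.

    layers: dict mapping layer name -> list of float samples.
    gains_db: dict mapping layer name -> gain in dB (0 dB if missing).
    """
    length = max(len(samples) for samples in layers.values())
    mix = [0.0] * length
    for name, samples in layers.items():
        gain = 10 ** (gains_db.get(name, 0.0) / 20)  # dB -> linear factor
        for i, sample in enumerate(samples):
            mix[i] += sample * gain
    # Keep the summed signal inside full scale.
    return [max(-1.0, min(1.0, s)) for s in mix]
```

Because each layer stays a separate entry, replacing the music means swapping `layers["music"]` and mixing again, while dialogue, SFX, and ambience remain exactly as they were.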

Key Takeaways
  • Professional audio has four layers: dialogue, sound effects, ambience, and music.
  • Dialogue sits on top; music on the bottom. SFX sync to events; ambience fills the space.
  • Generating layers separately gives maximum control; native audio gives maximum convenience.
  • Even with native audio, plan to replace or adjust individual layers in post-production.

Exercise

Take a 30-second clip you generated. Add at least three of the four audio layers. Mix them at proper levels (dialogue on top, music ducked). Note how much more "finished" the result feels.
