C.W.K.
Stream
Lesson 08 of 10 · published

Pose and Composition Control: ControlNet Concepts

~14 min · control, editing, l8


Pipa's one-line summary: ControlNet adds *structural conditioning* (edges, depth, pose, sketches) on top of the prompt. It gives you precise spatial control through a visual input, which text alone can't provide.

Mental model: Imagine you're a film director working with a body double. You position the double exactly how you want — arms up, head tilted, stepping forward — then photograph them as a silhouette. Now you hand that silhouette to an artist and say "paint this person in this exact pose, but make them a medieval knight in a forest." The silhouette controls the structure; the text controls the content. That's ControlNet.

What ControlNet Does

ControlNet adds a structural conditioning signal to the diffusion process. Instead of relying only on text (which is imprecise about spatial details), you provide an explicit visual guide that tells the model where things should be and how they should be shaped. The text then tells the model what those things should look like.

Types of Control Signals

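Each control type encodes a different kind of structural information. Most control images are extracted from a reference image by a preprocessor (an edge detector, depth estimator, or pose estimator). Scribbles can also be drawn by hand.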

  Control Type     What It Captures              Best For
  ──────────────────────────────────────────────────────────────
  Canny Edge       Edges and outlines            Sharp structural guidance
  Depth Map        Distance from camera          3D spatial arrangement
  Pose (OpenPose)  Skeleton/joint positions      Human body positioning
  Normal Map       Surface orientation           Lighting-consistent surfaces
  Segmentation     Semantic regions              Scene layout (sky/ground/building)
  Scribble/Sketch  Rough hand-drawn guides       Quick compositional ideas
  Lineart          Clean line drawings           Illustration and manga
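
Here is a rough illustration of what the Canny Edge row means in practice: a minimal pure-Python sketch that marks pixels where brightness changes sharply. It is a simplified stand-in, not Canny itself. It uses a Sobel gradient and a single threshold, and it skips Canny's blur, non-maximum suppression, and hysteresis steps. The function name `edge_map` and the default threshold are illustrative assumptions.

```python
# Simplified edge-map preprocessor: Sobel gradient magnitude + one threshold.
# Assumption: this is NOT full Canny (no Gaussian blur, no non-maximum
# suppression, no hysteresis); it only keeps the "where does brightness jump" idea.

Image = list[list[float]]  # grayscale, values in [0, 1], indexed [row][col]

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal change
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical change


def edge_map(img: Image, threshold: float = 1.0) -> list[list[int]]:
    """Return a binary control image: 1 where the gradient is strong, else 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    # Skip the 1-pixel border so every 3x3 window stays inside the image.
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = gy = 0.0
            for dr in range(3):
                for dc in range(3):
                    px = img[r + dr - 1][c + dc - 1]
                    gx += SOBEL_X[dr][dc] * px
                    gy += SOBEL_Y[dr][dc] * px
            if (gx * gx + gy * gy) ** 0.5 >= threshold:
                out[r][c] = 1
    return out


if __name__ == "__main__":
    # 8x8 black image with a bright 4x4 square in the middle.
    img = [[1.0 if 2 <= r < 6 and 2 <= c < 6 else 0.0 for c in range(8)]
           for r in range(8)]
    for row in edge_map(img):
        print("".join("#" if v else "." for v in row))
```

The printout is a ring of `#` around the square with an empty centre. The flat regions produce nothing, and only the outline would reach the model as structural guidance.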

How It Works (High Level)

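ControlNet operates as a parallel neural network that "shadows" the main diffusion model. It is a trainable copy of the model's encoder blocks, while the original weights stay frozen. At each denoising step, this parallel network processes the control image together with the current noisy latent. It produces feature maps that are added into the main model's intermediate features. This means the model is simultaneously guided by: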

  1. Text prompt (semantic content: what the scene is about)
  2. Control signal (structural content: where things are positioned)
  3. Random noise (variation: the specific creative interpretation)
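
The "shadowing" idea can be made concrete. The ControlNet paper (Zhang et al., 2023) writes each controlled block's output as y = F(x; frozen) + Z(F(x + Z(c); trainable copy)). Here Z is a "zero convolution": a 1x1 convolution whose weights and bias both start at zero.

The sketch below is a toy, pure-Python version of that equation. It assumes feature maps shrunk to a single pixel (a vector of channels) and blocks shrunk to one linear layer. The class names are hypothetical. It demonstrates the property the design relies on: before training, the control branch adds exactly nothing, so the base model's behavior is untouched.

```python
import copy

Vector = list[float]
Matrix = list[list[float]]


class Linear:
    """y = W x + b over channels; at a single pixel this is what a 1x1 conv computes."""

    def __init__(self, weight: Matrix, bias: Vector) -> None:
        self.weight = weight
        self.bias = bias

    def __call__(self, x: Vector) -> Vector:
        return [sum(w * v for w, v in zip(row, x)) + b
                for row, b in zip(self.weight, self.bias)]


def zero_conv(channels: int) -> Linear:
    """A 'zero convolution': weights and bias initialized to 0."""
    return Linear([[0.0] * channels for _ in range(channels)], [0.0] * channels)


class ControlledBlock:
    """Toy block: frozen original + trainable copy gated by two zero convs."""

    def __init__(self, frozen: Linear) -> None:
        channels = len(frozen.bias)
        self.frozen = frozen                    # original weights, never trained
        self.trainable = copy.deepcopy(frozen)  # starts as an exact copy
        self.z_in = zero_conv(channels)         # Z(c): feeds the control signal in
        self.z_out = zero_conv(channels)        # outer Z: gates the copy's output

    def __call__(self, x: Vector, c: Vector) -> Vector:
        base = self.frozen(x)
        copy_in = [xi + ci for xi, ci in zip(x, self.z_in(c))]
        residual = self.z_out(self.trainable(copy_in))
        return [b + r for b, r in zip(base, residual)]


if __name__ == "__main__":
    frozen = Linear([[1.0, 0.5], [0.0, 2.0]], [0.1, -0.2])
    block = ControlledBlock(frozen)
    x, c = [1.0, 2.0], [5.0, -3.0]
    # Freshly initialized: the control branch contributes 0,
    # so the output equals the frozen block's output.
    print(block(x, c) == frozen(x))  # prints True
```

Starting from zero does not block learning. A zero convolution's weight gradient depends on its input, which is not zero, so training can move the weights away from zero and gradually let the control signal in.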

Control Strength

Like reference weight, control strength is a slider:

  • Low (0.2–0.4): The control image is a suggestion. The model may deviate from it, especially where the prompt pulls in a different direction.
  • Medium (0.5–0.7): Strong guidance. Structure is clearly followed but the model has room for natural interpretation.
  • High (0.8–1.0): Strict adherence. The output closely follows the control signal. Can sometimes look stiff or unnatural if the control image itself is imperfect.
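
Mechanically, control strength is a multiplier on the ControlNet's output before that output is added to the main model's features. In Hugging Face diffusers, for example, the ControlNet pipelines expose this as `controlnet_conditioning_scale`, which takes one value per ControlNet when several are stacked. Stacking control signals then amounts to a sum of separately scaled residuals.

The pure-Python sketch below shows that arithmetic under simplifying assumptions. Features are plain vectors, and `inject` is a hypothetical helper, not a library function.

```python
Vector = list[float]


def inject(base: Vector, residuals: list[Vector], strengths: list[float]) -> Vector:
    """Add each control branch's residual, scaled by its own strength, to the base.

    base:      features from the main (frozen) model at one layer
    residuals: one feature vector per ControlNet (pose, depth, ...)
    strengths: one control strength per ControlNet
    """
    if len(residuals) != len(strengths):
        raise ValueError("need exactly one strength per control signal")
    out = list(base)
    for residual, strength in zip(residuals, strengths):
        out = [o + strength * r for o, r in zip(out, residual)]
    return out


if __name__ == "__main__":
    base = [1.0, 1.0]
    pose = [2.0, 0.0]   # hypothetical residual from a pose ControlNet
    depth = [0.0, 4.0]  # hypothetical residual from a depth ControlNet
    print(inject(base, [pose], [0.0]))              # strength 0: no effect
    print(inject(base, [pose], [0.3]))              # low: a nudge
    print(inject(base, [pose], [0.9]))              # high: a strong pull
    print(inject(base, [pose, depth], [0.8, 0.5]))  # two stacked controls
```

In a real pipeline, this addition happens at several layers on every denoising step. That is why one slider can shift the whole balance between structure and freedom.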

Key Takeaways
  • ControlNet adds structural conditioning (edges, depth, pose) alongside text conditioning.
  • Different control types guide different aspects: edges for shape, depth for 3D, pose for body.
  • Control strength balances structural fidelity vs. creative freedom.
  • Multiple control signals can be stacked for multi-dimensional guidance.
  • A rough control image often gives more spatial precision than dozens of prompt iterations.

Code

Example code (plain text)
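# Conceptual pipeline (pseudocode, not a runnable program)
Input:
  text_prompt      = "a medieval knight standing in a misty forest"
  control_image    = pose_skeleton.png  (OpenPose format)
  control_strength = 0.8

Process:
  1. Encode text → text embeddings
  2. Load control image (resized to the output resolution)
  3. Start from random noise in latent space (the seed fixes this noise)
  4. At each denoising step:
     - ControlNet reads the noisy latent + control image and produces
       structural feature maps, scaled by control_strength (where to put it)
     - Main model combines those maps with text guidance (what to generate)
       and predicts the noise to remove
  5. Decode the final latent → image (via the VAE)
  6. Output: knight in the skeleton's pose, in a forest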

Result: Exact pose control + creative freedom for style and content
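
The `pose_skeleton.png` input is simply an image of joints connected by lines. Below is a minimal pure-Python sketch of how such an image could be rasterized from joint coordinates.

The joint names, the reduced limb list (a single "hip" joint, no ankles), and the binary 0/1 canvas are simplifying assumptions. Real OpenPose-style pose maps typically use an 18-keypoint body layout drawn with a distinct color per limb on a black background. Limbs are drawn with Bresenham's line algorithm. A limb is skipped when either of its joints is missing, for example an occluded wrist.

```python
Joint = tuple[int, int]  # (x, y) pixel position, x to the right, y downward

# Hypothetical, reduced limb list (pairs of joints to connect).
LIMBS = [
    ("head", "neck"),
    ("neck", "r_shoulder"), ("r_shoulder", "r_elbow"), ("r_elbow", "r_wrist"),
    ("neck", "l_shoulder"), ("l_shoulder", "l_elbow"), ("l_elbow", "l_wrist"),
    ("neck", "hip"), ("hip", "r_knee"), ("hip", "l_knee"),
]


def bresenham(a: Joint, b: Joint) -> list[Joint]:
    """Integer points on the segment from a to b (Bresenham's line algorithm)."""
    (x0, y0), (x1, y1) = a, b
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    points = []
    while True:
        points.append((x0, y0))
        if (x0, y0) == (x1, y1):
            return points
        e2 = 2 * err
        if e2 >= dy:  # step in x
            err += dy
            x0 += sx
        if e2 <= dx:  # step in y
            err += dx
            y0 += sy


def render_pose(joints: dict[str, Joint], width: int, height: int) -> list[list[int]]:
    """Binary control image: 1 on limb pixels, 0 on background."""
    canvas = [[0] * width for _ in range(height)]
    for a, b in LIMBS:
        if a in joints and b in joints:  # skip limbs with a missing joint
            for x, y in bresenham(joints[a], joints[b]):
                if 0 <= x < width and 0 <= y < height:
                    canvas[y][x] = 1
    return canvas


if __name__ == "__main__":
    # Hypothetical "arms raised" pose on a 16x12 grid.
    pose = {
        "head": (8, 1), "neck": (8, 3),
        "r_shoulder": (5, 4), "r_elbow": (3, 2), "r_wrist": (3, 0),
        "l_shoulder": (11, 4), "l_elbow": (13, 2), "l_wrist": (13, 0),
        "hip": (8, 8), "r_knee": (6, 11), "l_knee": (10, 11),
    }
    for row in render_pose(pose, 16, 12):
        print("".join("#" if v else "." for v in row))
```

Moving a single joint coordinate and re-rendering is the programmatic equivalent of repositioning the body double in the mental model above.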


Exercise

If your stack supports ControlNet (ComfyUI, SD WebUI), generate the same prompt four ways: with no control, with a Canny edge map, with an OpenPose skeleton, and with a depth map. For your specific scene, which control type best matches your intent?
