Lesson 03 of 10 · published

Why Hands, Fingers, and Complex Objects Break

~18 min · failures, diagnosis, l3

Pippa's one-line summary: Hands are hard because of high articulation × occlusion × small pixel budget × no anatomical model. Jewelry, teeth, and glass share the same failure profile.

Mental model: Many professional artists consider hands among the hardest subjects to draw. There's a reason art students spend weeks on hand studies: each hand has 27 bones, an enormous range of possible poses, constant self-occlusion, and a dramatically different appearance from every angle. Now imagine learning to draw hands only by looking at millions of photographs in which hands appear in every conceivable position, size, lighting condition, and level of visibility. That is what an image model does, and the results are predictably messy.

Why Hands Are Especially Hard

Hands fail for a collision of reasons that don't affect simpler body parts:

  • High articulation: Five fingers, each with three joints, plus a complex wrist. The combinatorial space of valid hand poses is enormous.
  • Frequent occlusion: Fingers overlap each other, wrap around objects, hide behind palms. The model often sees partial hands in training data.
  • Small pixel budget: In most photographs, hands occupy a tiny fraction of the image. The model has fewer pixels to work with and less training signal per hand.
  • No anatomical model: The system doesn't know humans have five fingers. It knows that hand-shaped regions tend to have elongated protrusions, but the exact count is a soft statistical pattern, not a hard rule.
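
To put rough numbers on the "small pixel budget" and "high articulation" points, here is a back-of-the-envelope sketch. The image size, the bounding boxes, and the three bend states per joint are all assumptions chosen for illustration, not measurements from any dataset or model.

```python
# Back-of-the-envelope numbers for two of the factors listed above.
# All sizes and state counts are hypothetical illustrations, not measurements.

def pixel_share(region_w: int, region_h: int, image_w: int, image_h: int) -> float:
    """Fraction of the image's pixels covered by a rectangular region."""
    return (region_w * region_h) / (image_w * image_h)


def pose_count(joints: int, states_per_joint: int) -> int:
    """Joint configurations if every joint is quantized into a few bend states."""
    return states_per_joint ** joints


# Assumed example: a 1024x1024 portrait where one hand fits in a 96x96 box
# and the face fits in a 320x320 box.
hand_share = pixel_share(96, 96, 1024, 1024)
face_share = pixel_share(320, 320, 1024, 1024)

# 5 fingers x 3 joints (from the text), coarsely quantized to 3 bend states each.
configs = pose_count(joints=5 * 3, states_per_joint=3)

print(f"hand covers {hand_share:.2%} of the image")   # 0.88%
print(f"face covers {face_share:.2%} of the image")   # 9.77%
print(f"coarse finger configurations: {configs:,}")   # 14,348,907
```

The count is only a loose upper bound: many of those configurations are not anatomically reachable, and it ignores wrist rotation and camera angle. Even so, it suggests why a region covering under 1% of the image, with millions of coarse configurations, gets little training signal per pose.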

Jewelry, Teeth, and Small Details

The same class of problems affects any small, detailed, structurally specific element:

  • Jewelry: Rings duplicate, necklace chains break mid-air, earrings mismatch between ears. These are tiny, precise structures that the model treats as decorative texture rather than structured objects.
  • Teeth: Too many teeth, uneven sizes, teeth that blur into each other. This happens because the model learned "mouth region has white shapes" rather than "a fixed set of teeth (up to 32 in an adult) arranged in two arches."
  • Glasses: Frames may connect inconsistently, lenses may differ in shape, or the glasses might partially merge with the face.

Transparent and Reflective Objects

Glass, water, mirrors, and chrome surfaces create another category of failure. These objects don't have a single stable appearance — they look different depending on what's behind or around them. The model must simultaneously generate the object and a plausible refraction or reflection, which requires implicit scene understanding that pixel-pattern matching doesn't provide.

Practical Workarounds

  • Frame out the problem: Crop at the wrist, use close-up faces without hands, or put hands in pockets/behind back.
  • Specify the pose: "Arms crossed," "holding a coffee mug with both hands" — constrained poses fail less.
  • Inpaint after generation: Generate the overall image, then fix hands with a targeted edit pass.
  • Use reference images: Provide a hand pose reference to guide the model (covered in Track 5).
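
For the inpainting workaround, the targeted edit pass usually needs a mask that marks which pixels to regenerate. The following is a minimal pure-Python sketch of building a padded rectangular mask around a hand. The box coordinates, the 25% padding, and the convention that 255 means "regenerate" are assumptions; check what your inpainting tool expects.

```python
# Build a rectangular inpainting mask around a hand, with padding.
# Pure-Python sketch; the box coordinates below are hypothetical.

def padded_box(box, pad_frac, width, height):
    """Expand (left, top, right, bottom) by pad_frac of its size, clamped to the image."""
    left, top, right, bottom = box
    pad_x = int((right - left) * pad_frac)
    pad_y = int((bottom - top) * pad_frac)
    return (
        max(0, left - pad_x),
        max(0, top - pad_y),
        min(width, right + pad_x),
        min(height, bottom + pad_y),
    )


def make_mask(width, height, box):
    """Return rows of 0/255 values: 255 marks pixels to regenerate, 0 pixels to keep."""
    left, top, right, bottom = box
    return [
        [255 if left <= x < right and top <= y < bottom else 0 for x in range(width)]
        for y in range(height)
    ]


# Assumed 64x64 image with a hand near the bottom-right corner.
WIDTH, HEIGHT = 64, 64
hand_box = (40, 44, 56, 60)
edit_box = padded_box(hand_box, pad_frac=0.25, width=WIDTH, height=HEIGHT)
mask = make_mask(WIDTH, HEIGHT, edit_box)
masked_pixels = sum(value == 255 for row in mask for value in row)

print(edit_box)        # (36, 40, 60, 64)
print(masked_pixels)   # 576
```

Padding the box gives the model some wrist and background context to blend into, while keeping the edit small enough that the rest of the image stays untouched.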

Key Takeaways

  • Hands fail because of high articulation, frequent occlusion, small pixel area, and no anatomical knowledge.
  • Jewelry, teeth, glasses, and transparent objects fail for similar reasons: structural precision + high variability.
  • The model predicts plausible visual patterns, not physically correct structures.
  • Workarounds: constrain the pose, crop out problem areas, or fix with inpainting.

Exercise

Generate a portrait that includes hands. Fix the hand errors with inpainting. Document your results: how many inpaint passes did it take? Which pose worked best?
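
One possible way to keep the documentation for this exercise is a small tally of inpaint passes per pose. This is only a sketch: the rows in `trials` are made-up placeholders (including the "open palm waving" pose) and should be replaced with your own observations.

```python
from collections import defaultdict
from statistics import mean

# Each entry: (pose prompt, inpaint passes needed until the hands looked right).
# These rows are placeholders, not real results; replace them with your own.
trials = [
    ("arms crossed", 1),
    ("arms crossed", 2),
    ("holding a coffee mug with both hands", 2),
    ("holding a coffee mug with both hands", 3),
    ("open palm waving", 4),
]


def average_passes(rows):
    """Group trials by pose and return {pose: mean number of passes}."""
    by_pose = defaultdict(list)
    for pose, passes in rows:
        by_pose[pose].append(passes)
    return {pose: mean(counts) for pose, counts in by_pose.items()}


summary = average_passes(trials)
best_pose = min(summary, key=summary.get)

# Print poses from fewest to most passes on average.
for pose, avg in sorted(summary.items(), key=lambda item: item[1]):
    print(f"{avg:.1f} passes  {pose}")
print("fewest passes on average:", best_pose)
```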
