C.W.K.
Stream
Lesson 09 of 10 · published

How Generators "Know" Style, Lighting, and Composition

~16 min · diffusion, latent-space, l9

Pippa's one-line summary: the single phrase 'oil painting' calls up brush texture, canvas, color palette, and even composition because of statistical associations in the training data.

Type "Renaissance oil painting, chiaroscuro lighting, three-quarter portrait" and the model produces something that genuinely looks like a Renaissance painting — with appropriate brush texture, dramatic shadow, muted earth tones, and classical composition. How? Where did this knowledge come from?

Here's the analogy: imagine a child who grew up in the world's greatest art museum, spending every day surrounded by millions of paintings, photographs, and videos. She was never taught art theory: no one explained the rule of thirds, complementary colors, or Rembrandt lighting. But through sheer exposure, she absorbed these patterns. She can't articulate the rules, but she can feel when something looks "right." A generative model is in the same position: nobody handed it the rules, only an enormous number of examples.

What the Training Data Teaches

During training, the model processes billions of image-text pairs. These pairs come from the internet — a mix of professional photography, fine art, stock images, screenshots, product shots, film stills, and everything in between. From this corpus, the model implicitly learns:

Style as a Pattern Language:

  • "Oil painting" = visible brush strokes + canvas texture + specific color palettes + particular ways light falls
  • "35mm film photography" = specific grain patterns + lens characteristics + color science + common compositions
  • "Anime" = specific proportions + line work + color blocking + eye styles + shading conventions
  • "Cyberpunk" = neon colors + rain + dark environments + chrome surfaces + Asian typography
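The idea that a style phrase "equals" a bundle of visual traits can be made concrete with a toy co-occurrence count. The sketch below is only an illustration: `TOY_PAIRS` and `attribute_rates` are hypothetical names, the corpus is invented, and real training data carries no such attribute tags (the model picks up these regularities implicitly from pixels and captions).

```python
from collections import Counter

# Hypothetical toy corpus: each caption is paired with visual attributes
# that a human might notice in the image. This tagging is an assumption
# made for illustration; real image-text pairs are not labeled this way.
TOY_PAIRS = [
    ("oil painting of a merchant", {"brush strokes", "canvas texture", "earth tones"}),
    ("oil painting landscape", {"brush strokes", "canvas texture", "soft light"}),
    ("oil painting still life", {"brush strokes", "earth tones", "soft light"}),
    ("35mm film photography street", {"film grain", "shallow depth of field"}),
    ("35mm film photography portrait", {"film grain", "warm color cast"}),
    ("cyberpunk city at night", {"neon colors", "rain", "dark environment"}),
]


def attribute_rates(style_phrase, pairs):
    """Fraction of captions containing style_phrase that show each attribute."""
    matching = [attrs for caption, attrs in pairs if style_phrase in caption]
    if not matching:
        return {}
    counts = Counter(attr for attrs in matching for attr in attrs)
    return {attr: n / len(matching) for attr, n in counts.items()}


rates = attribute_rates("oil painting", TOY_PAIRS)
# Print attributes from most to least strongly associated with the phrase.
for attr, rate in sorted(rates.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{attr}: {rate:.2f}")
```

In this toy corpus every "oil painting" caption comes with brush strokes, so that trait is the most tightly bound to the phrase, while film grain never appears alongside it.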

Lighting as Physics Approximation:

  • How shadows fall based on light direction
  • How different light sources (sun, neon, candle, flash) produce different color temperatures and shadow qualities
  • How materials interact with light (metal reflects, skin scatters, glass refracts)
  • How atmospheric effects (fog, dust, rain) scatter light
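For a sense of what kind of regularity "how shadows fall based on light direction" refers to, here is a minimal sketch of Lambert's cosine law for diffuse surfaces. This is an assumption chosen for illustration: the generator does not run this formula. It has only seen many images whose brightness patterns are consistent with rules like it. The function name `lambert_intensity` is hypothetical.

```python
import math


def lambert_intensity(normal, light_dir):
    """Diffuse brightness of a surface: max(0, n . l) using unit vectors.

    normal    -- direction the surface faces
    light_dir -- direction from the surface toward the light
    """
    def unit(v):
        length = math.sqrt(sum(c * c for c in v))
        return tuple(c / length for c in v)

    n, l = unit(normal), unit(light_dir)
    # Surfaces facing away from the light receive no direct light.
    return max(0.0, sum(a * b for a, b in zip(n, l)))


up = (0.0, 0.0, 1.0)
overhead = lambert_intensity(up, (0.0, 0.0, 1.0))   # light directly above
angled = lambert_intensity(up, (1.0, 0.0, 1.0))     # light at 45 degrees
behind = lambert_intensity(up, (0.0, 0.0, -1.0))    # light below the surface
print(overhead, round(angled, 3), behind)
```

The brightness drops smoothly as the light moves away from the surface normal and reaches zero once the surface faces away, which is the pattern a viewer reads as shading and shadow.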

Composition as Visual Grammar:

  • Where subjects are typically placed in professional photographs
  • How foreground, midground, and background create depth
  • Leading lines, framing, negative space, visual weight
  • Camera angle conventions: bird's eye, worm's eye, eye level, Dutch angle
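One of these placement conventions, the rule of thirds, is simple enough to write down explicitly. The sketch below uses hypothetical helper names (`thirds_points`, `distance_to_nearest_third`) and assumes pixel coordinates with the origin at the top-left corner. It shows the rule a photographer might state. The model never sees it in this form and only learns that subjects tend to sit near these points.

```python
import math


def thirds_points(width, height):
    """Return the four rule-of-thirds intersections as (x, y) pixel positions."""
    xs = (width / 3, 2 * width / 3)
    ys = (height / 3, 2 * height / 3)
    return [(x, y) for y in ys for x in xs]


def distance_to_nearest_third(subject_xy, width, height):
    """Pixel distance from a subject position to the closest intersection."""
    sx, sy = subject_xy
    return min(math.hypot(sx - x, sy - y) for x, y in thirds_points(width, height))


points = thirds_points(1200, 900)
on_third = distance_to_nearest_third((400, 300), 1200, 900)
centered = distance_to_nearest_third((600, 450), 1200, 900)
print(points)
print(on_third, centered)
```

A subject placed dead center in a 1200x900 frame is 250 pixels from the nearest intersection, while one placed at (400, 300) sits exactly on it.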

Why This Creates Bias

The training data isn't a neutral sample of all possible images. It's a sample of what's on the internet, which is heavily biased toward:

  • Professional, well-composed photographs (highly shared/liked content)
  • Western aesthetic conventions (dominant in large-scale datasets)
  • Popular, polished styles (commercial photography, trending digital art)
  • Conventional beauty standards and common demographics

This is why default generations (without strong style direction) tend to look polished, conventional, and sometimes generic. The model has learned that "most images the internet calls good look like this," so it gravitates toward those patterns.
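The pull toward the default can be illustrated with a deliberately skewed toy sample. Everything here is an assumption made for illustration: the labels, the proportions, and the names `TOY_DATASET` and `style_distribution` are invented, and a real model does not filter a dataset at generation time. The filtering step is only an analogy for conditioning on a more explicit prompt.

```python
from collections import Counter

# Hypothetical style labels for a toy "internet" sample, skewed on purpose
# toward polished and popular content.
TOY_DATASET = (
    ["polished commercial photo"] * 60
    + ["trending digital art"] * 25
    + ["amateur snapshot"] * 10
    + ["woodblock print"] * 5
)


def style_distribution(labels):
    """Return the share of each style label in the sample."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {style: n / total for style, n in counts.items()}


# With no style direction, the most common style dominates.
dist = style_distribution(TOY_DATASET)
default_style = max(dist, key=dist.get)
print(default_style, dist[default_style])

# An explicit request narrows the pool, which stands in for steering
# with a specific style term in the prompt.
steered = style_distribution([s for s in TOY_DATASET if "print" in s])
print(steered)
```

With no direction, the toy "default" is the 60% majority style. Naming the rare style explicitly is what makes it reachable at all, which is why explicit style direction matters when you want something off the beaten path.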

Key Takeaways
  • Style, lighting, and composition knowledge comes from statistical patterns learned across billions of image-text pairs.
  • The model doesn't "understand" art theory — it has absorbed correlations between text descriptions and visual patterns.
  • Training data bias means defaults lean toward polished, conventional, Western aesthetics.
  • You can steer away from defaults, but you're working against strong learned priors — be explicit about what you want.

Exercise

Pick a style word ('cyberpunk', 'Renaissance', or 'cottagecore'). Generate one image using only that word. Then write down 10 things the model added on its own from its learned style associations.
