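Pippa's one-line summary: Training data priors create a "default beautiful" look. A vague prompt returns the statistical average, which reads like a stock photo. To get something distinctive, you have to break the default explicitly.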
Mental model: A chef who trained exclusively at five-star French restaurants. She can make extraordinary coq au vin — but ask her for street-style tacos and she'll unconsciously add a beurre blanc sauce and plate it on fine china. She's not being stubborn; her entire understanding of "good food" was shaped by her training environment. Similarly, image models carry deep biases from their training data that shape everything they generate, often in ways you don't notice until you look for them.
What Are Training Data Priors?
Every image model was trained on a massive dataset of images scraped from the internet, purchased from stock photo agencies, or curated from specific sources. These datasets are not neutral mirrors of reality — they reflect the biases, preferences, and conventions of the photographers, artists, and curators who created and uploaded them.
Common biases include:
- Stock photo aesthetics: Centered subjects, clean backgrounds, warm lighting, attractive people smiling. The "default person" in many models looks like a stock photo model.
- Western-centric compositions: Architectural styles, clothing, food, and landscapes skew heavily toward North American and European visual conventions.
- Professional photography conventions: Rule of thirds, golden-hour lighting, shallow depth of field. These aren't wrong, but they become the model's default for "good image."
- Demographic biases: Certain ethnicities, body types, ages, and abilities may be over- or under-represented depending on the training data mix.
The "Default Beautiful Image" Problem
Try generating an image from a simple prompt like "a person walking down a street." Without explicit style direction, most models will produce something strikingly similar: a young, conventionally attractive person, warm color palette, slightly cinematic composition, shallow depth of field, clean urban environment. This isn't because the model decided this is what you wanted. It's because these are the most common choices in its training data, and every detail the prompt leaves open gets filled with the most likely option. The "average beautiful image" emerges naturally when a model is fit to millions of similar photographs.
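The fallback to the most common option can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the ten-image `TOY_DATASET`, its attribute names, and the `fill_from_prior` helper are hypothetical, and real image models do not look up per-attribute modes. The sketch only mimics the effect: whatever the prompt leaves unspecified is filled with the most frequent value in the data.

```python
from collections import Counter

# Hypothetical toy "training set": each image is reduced to three attributes.
TOY_DATASET = (
    [{"lighting": "golden hour", "age": "young", "framing": "centered"}] * 6
    + [{"lighting": "overcast", "age": "young", "framing": "centered"}] * 2
    + [{"lighting": "fluorescent", "age": "elderly", "framing": "low angle"}] * 1
    + [{"lighting": "golden hour", "age": "middle-aged", "framing": "off-center"}] * 1
)


def fill_from_prior(specified: dict) -> dict:
    """Keep attributes the prompt specifies; fill the rest with the dataset mode."""
    result = {}
    for attribute in TOY_DATASET[0]:
        if attribute in specified:
            result[attribute] = specified[attribute]
        else:
            counts = Counter(image[attribute] for image in TOY_DATASET)
            result[attribute] = counts.most_common(1)[0][0]
    return result


# A vague prompt specifies nothing, so every attribute comes from the prior.
vague = fill_from_prior({})
# A more specific prompt overrides two attributes; framing is still the default.
specific = fill_from_prior({"lighting": "fluorescent", "age": "elderly"})

print(vague)     # {'lighting': 'golden hour', 'age': 'young', 'framing': 'centered'}
print(specific)  # {'lighting': 'fluorescent', 'age': 'elderly', 'framing': 'centered'}
```

Note that the second result still has centered framing: specifying some details does not move the ones you left out.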
Composition Clichés
Models have learned that certain compositions correlate with certain subjects, and they reproduce these correlations faithfully:
- "Portrait" → head-and-shoulders, centered, eye-level, blurred background
- "Landscape" → horizon at lower third, dramatic sky, leading lines
- "Food" → overhead or 45-degree angle, rustic wooden surface, scattered ingredients
- "Product" → centered on white or gradient background, slightly elevated angle
These are conventions the model learned from millions of examples. They produce pleasant results, but they also make AI images look interchangeable. Breaking out of these conventions requires explicit, specific prompting, and even then the model may drift back toward its defaults. Compare:
Vague: "A woman drinking coffee"
Specific: "An elderly woman drinking black coffee from a chipped ceramic mug at a cluttered kitchen table, early morning harsh fluorescent light, shot from slightly below, documentary photography style, messy and lived-in"
How to Recognize and Work With Priors
You can't eliminate training data bias, but you can work with awareness of it:
- Recognize the default: If your output looks generically polished, you're seeing the prior, not your creative direction. Add specificity.
- Specify the uncommon: Unusual lighting (fluorescent, overcast, twilight), non-standard compositions (worm's-eye view, through a window), imperfect subjects (messy, worn, mundane) push past the defaults.
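- Use negative prompts: Where supported, list the clichés you don't want in the negative prompt field ("bokeh, golden hour, centered composition"). The field takes the unwanted terms themselves, so a leading "no" is unnecessary there.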
- Reference images: Show the model what you want rather than trying to describe your way out of the prior.
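The first three tactics in the list above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any tool's actual API: `PromptSpec`, `COMPOSITION_CLICHES`, and `build_request` are hypothetical names, and the `prompt` / `negative_prompt` keys are placeholders for whatever fields your image tool exposes. The idea is to make explicit which visual decisions you have pinned down, which ones are still left to the prior, and which clichés you want to negate.

```python
from dataclasses import dataclass, fields
from typing import List, Optional

# Cliché terms per subject type, paraphrased from this section's composition
# list. Illustrative only; extend or reword for your own tool.
COMPOSITION_CLICHES = {
    "portrait": ["centered subject", "eye-level angle", "blurred background"],
    "landscape": ["dramatic sky", "leading lines"],
    "food": ["rustic wooden surface", "scattered ingredients"],
    "product": ["white background", "gradient background"],
}


@dataclass
class PromptSpec:
    """Hypothetical container for the visual decisions a prompt can pin down."""
    subject: str
    setting: Optional[str] = None
    lighting: Optional[str] = None
    camera: Optional[str] = None
    style: Optional[str] = None
    mood: Optional[str] = None

    def render(self) -> str:
        """Join every filled-in field into one comma-separated prompt."""
        values = [getattr(self, f.name) for f in fields(self)]
        return ", ".join(v for v in values if v)

    def left_to_prior(self) -> List[str]:
        """Name the empty fields, i.e. the decisions the model will make for you."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


def build_request(spec: PromptSpec, subject_type: str, extra_negatives=()) -> dict:
    """Pair the rendered prompt with a comma-separated negative prompt."""
    terms = COMPOSITION_CLICHES.get(subject_type, []) + list(extra_negatives)
    unique_terms = list(dict.fromkeys(terms))  # drop duplicates, keep order
    return {"prompt": spec.render(), "negative_prompt": ", ".join(unique_terms)}


vague = PromptSpec(subject="A woman drinking coffee")
specific = PromptSpec(
    subject="An elderly woman drinking black coffee from a chipped ceramic mug",
    setting="at a cluttered kitchen table",
    lighting="early morning harsh fluorescent light",
    camera="shot from slightly below",
    style="documentary photography style",
    mood="messy and lived-in",
)

print(vague.left_to_prior())     # ['setting', 'lighting', 'camera', 'style', 'mood']
print(specific.left_to_prior())  # []

request = build_request(specific, "portrait", extra_negatives=["bokeh", "golden hour"])
print(request["negative_prompt"])
# centered subject, eye-level angle, blurred background, bokeh, golden hour
```

A non-empty `left_to_prior()` result is a quick checklist of where a generically polished output is likely coming from. Whether and how negative prompts are honored varies by tool, so treat the request dictionary as a shape to adapt rather than something to send as-is.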
Key Takeaways
- Training data priors create a "default beautiful image" that models gravitate toward.
- Stock photo aesthetics, Western composition conventions, and demographic skews are baked into the data.
- Vague prompts produce generic results because the model falls back to the statistical center.
- Specificity, unusual details, and reference images help push past default priors.