Pippa's one-line summary: Does order matter? It used to (CLIP, 77 tokens). With current models (FLUX, SD 3.5) it matters less. Even so, subject → scene → lighting → style is a safe default order.
There's a persistent debate in the image generation community: does the order of words in your prompt matter? The answer depends on the model. Order matters less than you might think, but it is not irrelevant.
The General Principle
Think of your prompt as a newspaper article. Journalists front-load the most important information (the "inverted pyramid"). Prompt encoding works similarly: most models give slightly more weight to words that appear earlier in the prompt. This is especially true for models with limited context windows (older CLIP-based models with 77-token limits).
Prompt attention (simplified):
Position: start → middle → end
Weight: high → medium → lower
Example: "A red car on a mountain road, sunset, dramatic clouds, film photography"
- "A red car" (start): highest weight, so the red car is dominant.
- "film photography" (end): lowest weight, so it may have less influence.
Model-Specific Behavior
CLIP-based models (SD 1.5, SDXL): 77-token hard limit. Order matters more. Early tokens get more attention. Prompts get truncated beyond the limit — so critical information at the end might get cut off entirely.
T5-based models (SD 3.5, FLUX): Much longer context windows (hundreds of tokens). Better at understanding the full prompt regardless of order. FLUX.1 pairs CLIP with a T5-XXL text encoder, and the newer FLUX.2 reportedly swaps in a Mistral Small language model. Both handle syntax and grammar well, so natural sentence order often works better than keyword stuffing.
Midjourney: Proprietary, but community testing suggests front-loaded prompts perform better, especially for subject identity.
Weaker (subject buried at the end): "cinematic lighting, detailed textures, 8k resolution, dramatic atmosphere, an astronaut floating above Earth"
Stronger (subject first): "An astronaut floating above Earth, dramatic atmosphere, cinematic lighting, detailed textures"
When Order Genuinely Matters
- Subject identity: Put your main subject early. "A black cat sitting on a red chair" is more likely to give you a black cat than "red chair with dramatic lighting and a black cat sitting on it."
- Style vs. subject priority: Putting style first ("Oil painting of a...") makes style dominant. Putting subject first ("A warrior in... oil painting style") makes the subject dominant.
- Short context windows: With older models (77-token limit), words beyond roughly the first 15 may get progressively less attention, and anything past the limit is cut off.
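The 77-token limit mentioned above can be checked roughly before a prompt is sent. The sketch below is only an approximation: `rough_tokens` and `split_at_limit` are hypothetical helpers, and the word-and-punctuation split stands in for CLIP's real BPE tokenizer, which usually produces more tokens than this. It assumes two of the 77 slots go to the start and end markers, leaving 75 for the prompt.

```python
import re

# Assumption: CLIP-style encoders accept 77 tokens, two of which are the
# start and end markers, leaving 75 for the prompt itself.
MAX_TOKENS = 77
USABLE_TOKENS = MAX_TOKENS - 2


def rough_tokens(prompt: str) -> list[str]:
    """Split into words and punctuation marks (a crude stand-in for BPE)."""
    return re.findall(r"\w+|[^\w\s]", prompt.lower())


def split_at_limit(prompt: str, limit: int = USABLE_TOKENS) -> tuple[list[str], list[str]]:
    """Return (tokens that fit, tokens that would be cut off)."""
    tokens = rough_tokens(prompt)
    return tokens[:limit], tokens[limit:]


# A deliberately long prompt whose style keywords sit at the very end.
kept, dropped = split_at_limit("a red car, " * 30 + "film photography")
print(len(kept), "tokens kept;", len(dropped), "dropped, ending with", dropped[-2:])
```

If the dropped list contains anything you care about, such as the style keywords here, move it earlier or shorten the prompt. For an exact count, use the tokenizer that ships with the model you are running.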
When Order Doesn't Matter Much
- Modern models with T5 encoders: FLUX and SD 3.5 understand syntax well enough that natural English order is usually fine.
- Stylistic modifiers: "warm lighting" vs. "lighting, warm" makes negligible difference.
- Medium-length prompts: In the 20-50 token range, all words get reasonable attention.
Key Takeaways
- Earlier words get slightly more weight in most models, so put the subject first.
- Modern models (FLUX, SD 3.5) are more order-agnostic than older ones (SD 1.5).
- Lead with subject → scene → lighting → style/camera for a safe default order.
- Don't over-optimize for order — it provides marginal returns compared to seed selection and other techniques.
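The default order above (subject → scene → lighting → style) is easy to bake into a small helper so you don't have to think about it each time. This is a minimal sketch, not part of any library: `build_prompt` and its parameter names are made up for illustration.

```python
def build_prompt(subject: str, scene: str = "", lighting: str = "", style: str = "") -> str:
    """Join the non-empty parts in subject, scene, lighting, style order."""
    parts = [subject, scene, lighting, style]
    return ", ".join(p.strip() for p in parts if p.strip())


# Keyword arguments can be given in any order; the output order stays fixed.
prompt = build_prompt(
    style="detailed textures",
    lighting="cinematic lighting",
    subject="An astronaut floating above Earth",
    scene="dramatic atmosphere",
)
print(prompt)
```

Empty parts are skipped, so the same helper works for short prompts such as a subject plus a style. For models that prefer natural sentences (FLUX, SD 3.5), treat the comma-joined result as a starting point and rewrite it as a sentence if needed.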