- Why structure beats keyword stacking
- The 8-part prompt formula
- Visual: what a structured prompt looks like
- Wrong vs. right examples
- 5 real prompts (portrait / product / illustration / architecture / video)
- 5 common pitfalls when going structured
- Parameter cheat sheet for Midjourney, SDXL, Flux, Niji
Why structure beats keyword stacking
Most beginners write prompts by piling up adjectives: cinematic, stunning, ultra detailed, 8k, masterpiece, amazing, award-winning, professional. Sometimes the result is good. Most of the time, three problems show up. First, you cannot tell which word did the work, so the next attempt has to copy the entire string. Second, words fight each other: dreamy and photorealistic pull the model in opposite directions. Third, you have no surgical control: when the face is off, your only move is to roll again.
Structured prompts split the description into functionally distinct blocks. Subject says what is in the picture. Scene says where and when. Style says what visual language it belongs to. Light says the time of day and the direction of illumination. Composition says how the camera frames it. Material says the close-up texture that makes it feel real. Quality says how clean and finished the image should look. Parameters say the technical details: aspect ratio, version, seed, steps, guidance.
The biggest win is that each block can be edited without breaking the others. If you have a great portrait prompt and you only want to move from a studio to a cafe, you change the scene and light blocks. If the subject keeps drifting, you fix the subject block instead of layering more adjectives. Adjectives are emergency tools; structure is the system.
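A minimal sketch of that block-level edit, assuming the prompt is stored as a Python dict keyed by block name. The block texts, the cafe wording and the `render` helper are illustrative assumptions, not part of any generator's API.

```python
# Hypothetical portrait prompt stored as named blocks (illustrative text).
studio_portrait = {
    "subject": "a young ceramic artist in a linen apron",
    "scene": "standing beside a walnut workbench in a sunlit studio",
    "style": "photorealistic portrait",
    "light": "soft window light from the left",
    "composition": "medium shot, shallow depth of field",
}


def render(blocks: dict) -> str:
    """Join the non-empty blocks, in insertion order, into one prompt string."""
    return ", ".join(text for text in blocks.values() if text)


# Move from a studio to a cafe: only the scene and light blocks are replaced.
cafe_portrait = {
    **studio_portrait,
    "scene": "sitting at a corner table in a small cafe",
    "light": "warm pendant lamp light from above",
}

# List the blocks that differ between the two versions.
changed = [name for name in studio_portrait if studio_portrait[name] != cafe_portrait[name]]

print(render(cafe_portrait))
print("changed blocks:", changed)
```

Subject, style and composition carry over untouched, so any drift in the result can be traced to the two blocks that changed.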
The 8-part prompt formula
[subject] + [scene] + [style] + [light] + [composition] + [material details] + [quality cues] + [parameters]
For a perfume bottle product poster, each block fills in roughly like this. Subject: transparent glass perfume bottle, amber liquid, brushed gold cap. Scene: cream linen surface, soft drop shadow. Style: minimal commercial photography, luxury advertising aesthetic. Light: top-left soft box, low contrast, single controlled highlight. Composition: centered, bottom-third negative space. Material: glass refraction, liquid swirl residue, brushed metal grain. Quality: razor-sharp edges, color accurate, clean background. Parameters: --ar 4:5 --v 6 --s 250 in Midjourney, or SDXL at 1024×1280 with CFG 6.5, 30 steps, Euler a.
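The perfume bottle example can be assembled mechanically. The sketch below assumes a hypothetical `build_prompt` helper and a fixed `BLOCK_ORDER` list. It joins the seven descriptive blocks with commas and appends the parameter string after a single space.

```python
# Order of the descriptive blocks in the formula; parameters are handled separately.
BLOCK_ORDER = ["subject", "scene", "style", "light", "composition", "material", "quality"]

perfume = {
    "subject": "transparent glass perfume bottle, amber liquid, brushed gold cap",
    "scene": "cream linen surface, soft drop shadow",
    "style": "minimal commercial photography, luxury advertising aesthetic",
    "light": "top-left soft box, low contrast, single controlled highlight",
    "composition": "centered, bottom-third negative space",
    "material": "glass refraction, liquid swirl residue, brushed metal grain",
    "quality": "razor-sharp edges, color accurate, clean background",
}


def build_prompt(blocks: dict, parameters: str = "") -> str:
    """Join the filled blocks in formula order, then append parameters at the end."""
    description = ", ".join(blocks[name] for name in BLOCK_ORDER if blocks.get(name))
    return f"{description} {parameters}".strip()


midjourney_prompt = build_prompt(perfume, "--ar 4:5 --v 6 --s 250")
print(midjourney_prompt)
```

For SDXL the same description would be used without the `--` flags, with the resolution, CFG, steps and sampler set in the interface instead.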
You do not have to fill all eight blocks. Subject, style, light, composition and quality are the must-haves. Material, scene and parameters are add-ons. A frequent beginner mistake is stacking quality words ("8k, masterpiece, best quality") before nailing subject and scene, which is like polishing the lens before pointing the camera at anything.
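As a rough check of the must-have list, the sketch below reports which required blocks are still empty. `MUST_HAVE` mirrors the five blocks named above; the `missing_must_haves` helper and the draft contents are hypothetical.

```python
# The five blocks the article treats as required.
MUST_HAVE = ("subject", "style", "light", "composition", "quality")


def missing_must_haves(blocks: dict) -> list:
    """Return the required block names that are absent or blank."""
    return [name for name in MUST_HAVE if not blocks.get(name, "").strip()]


# A typical beginner draft: quality words first, nothing to point the camera at.
draft = {
    "quality": "8k, masterpiece, best quality",
    "style": "cinematic",
}

print(missing_must_haves(draft))  # ['subject', 'light', 'composition']
```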
Visual: what a structured prompt looks like
Laying the prompt out vertically, one block per line, gives you a quick diagnostic: which block is empty and which is overstuffed. A useful habit is to scan that layout and add detail to the two thinnest blocks instead of bolting more words onto the heaviest one.
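The vertical layout can also be produced in code. This is a rough sketch that uses word count as a stand-in for how "full" a block is; the `layout` and `thinnest` helpers and the draft text are assumptions made for illustration.

```python
def layout(blocks: dict) -> str:
    """Render one line per block: name, word count, and text."""
    width = max(len(name) for name in blocks)
    lines = []
    for name, text in blocks.items():
        words = len(text.split())
        lines.append(f"{name.ljust(width)} | {words:>2} words | {text or '(empty)'}")
    return "\n".join(lines)


def thinnest(blocks: dict, count: int = 2) -> list:
    """Return the names of the blocks with the fewest words (ties keep block order)."""
    return sorted(blocks, key=lambda name: len(blocks[name].split()))[:count]


draft = {
    "subject": "a young ceramic artist in a linen apron",
    "scene": "",
    "style": "photorealistic portrait",
    "light": "soft window light from the left",
    "composition": "medium shot",
    "quality": "masterpiece, best quality, ultra detailed, 8k, hdr, sharp focus",
}

print(layout(draft))
print("add detail to:", thinnest(draft))
```

Here the empty scene block and the two-word style block surface first, while the quality block is clearly the heaviest.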
Wrong vs. right examples
✗ Wrong
beautiful girl, masterpiece, best quality, 8k, ultra detailed, cinematic, stunning, amazing, awesome, perfect
No subject identity, no scene, no light, no composition. Nine of the ten words are empty adjectives. The model falls back to training-set averages, so each generation differs wildly.
✓ Right
a young ceramic artist in a linen apron, standing beside a walnut workbench in a sunlit studio, soft window light from the left, shallow depth of field, medium shot, natural skin texture, photorealistic portrait, sharp focus
Subject (ceramic artist), scene (studio), light (soft window light), composition (medium shot + shallow DoF), material (natural skin texture) and style (photorealistic portrait) are all set. To change profession, only the subject block moves.
5 real prompts (portrait / product / illustration / architecture / video)
portrait of a 35-year-old female architect in a dark wool coat, walking through an empty plaza in Lisbon at golden hour, warm side light from the west, medium-long shot, photorealistic, 50mm lens, shallow depth of field, muted earth tones --ar 3:4 --v 6 --s 200
Profession, age and clothing give the subject identity. The scene is a specific city at a specific time. Light has direction and color temperature. The lens hint tells the model what depth-of-field to expect.
transparent perfume bottle with amber liquid and brushed gold cap, centered on a cream linen surface with soft shadow, minimal commercial photography, top-left soft box light, low contrast, centered composition with bottom third negative space, glass refraction and liquid swirl, ultra sharp edges, color graded --ar 4:5
"Luxury feel" is translated into executable elements: negative space, low contrast, soft box light, restrained palette, glass refraction. Strip those and the image collapses to a generic product shot.
a young woman in late Song-dynasty robes, standing under an old plum tree in light snow, narrow palette of indigo, ivory and faded crimson, ink wash illustration with light watercolor texture, three-quarter view, soft diffused light, traditional composition with empty top half --ar 2:3 --niji 6
Stylized illustration is easy to derail. A narrow-palette clause and an explicit brush language (ink wash + light watercolor) stop the model from drifting into generic anime or Western digital painting.
brutalist concrete cultural center, sharp geometric facade with deep shadow recesses, overcast afternoon light, wide-angle architectural photo, low ground-level perspective, two-point perspective composition, raw concrete texture with visible formwork, museum scale, no people --ar 16:9
Architecture lives or dies by weather, vantage point and lens. Overcast light softens the shadows so the geometry reads clearly; "no people" stops the model from defaulting to tourists.
a 5-second cinematic shot, close-up of a single raindrop falling onto a glass window at night, neon city lights blurred in background, slow motion 120fps look, camera holds static, gentle vertical impact ripple, shallow depth of field, cinematic color grading, no camera shake
Video adds three blocks: duration (5 seconds), camera motion (holds static) and a motion constraint (gentle vertical impact ripple). Leave out "camera holds static" and many models default to a slow push-in, which kills the still-life mood.
5 common pitfalls when going structured
"Luxury / dreamy / cinematic" force the model to guess. Translate them: luxury becomes negative space, restrained palette and clean composition; dreamy becomes soft light, faint mist, shallow DoF and a pastel palette.
"Photorealistic + oil painting texture" or "cyberpunk neon + natural window light" are common contradictions. Pick one dominant style and let the rest serve it.
Layering ten quality words ("masterpiece, best quality, ultra detailed, 8k, hdr, raw, professional") dilutes the subject block. Midjourney v6 and Flux barely need quality cues; SDXL is happy with two or three.
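A simple way to catch this pitfall is to count quality terms before sending the prompt. The sketch below assumes a hand-made `QUALITY_WORDS` set taken from the examples in this article and a limit of three, following the "two or three" guideline for SDXL; both are illustrative choices.

```python
# Quality cues taken from this article's examples; extend as needed.
QUALITY_WORDS = {
    "masterpiece", "best quality", "ultra detailed", "8k", "hdr", "raw", "professional",
}
MAX_QUALITY_TERMS = 3  # assumed limit, based on "two or three" for SDXL


def quality_terms(prompt: str) -> list:
    """Return the comma-separated terms of the prompt that are known quality cues."""
    terms = [term.strip().lower() for term in prompt.split(",")]
    return [term for term in terms if term in QUALITY_WORDS]


prompt = "beautiful girl, masterpiece, best quality, 8k, ultra detailed, cinematic, stunning"
found = quality_terms(prompt)

print(found)
if len(found) > MAX_QUALITY_TERMS:
    print(f"{len(found)} quality terms: trim to {MAX_QUALITY_TERMS} or fewer")
```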
"A woman", "a man", "a product" are non-subjects. The model fills in with training-set averages. Always give at least category + identity + one distinguishing detail.
Parameters such as --ar, --s, --seed and --v belong at the end of the prompt, separated by spaces. When they are mixed into the description, some models read them as junk tokens.
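If a prompt already has flags scattered through it, they can be moved to the end automatically. The sketch below is a simplification: it assumes every `--flag` is followed by exactly one value token, so a multi-word `--no` list would need extra handling. The `flags_to_end` helper is hypothetical.

```python
import re

# Matches a double-dash flag plus its single value, e.g. "--ar 4:5" or "--s 250".
FLAG = re.compile(r"--[a-z]+\s+[^\s,]+")


def flags_to_end(prompt: str) -> str:
    """Move every --flag value pair to the end of the prompt, keeping their order."""
    flags = FLAG.findall(prompt)
    description = FLAG.sub("", prompt)
    # Tidy the stray commas and spaces left behind by the removed flags.
    parts = [" ".join(part.split()) for part in description.split(",")]
    description = ", ".join(part for part in parts if part)
    return " ".join([description, *flags]).strip()


messy = "perfume bottle --ar 4:5, cream linen surface, --s 250, soft box light"
print(flags_to_end(messy))
# perfume bottle, cream linen surface, soft box light --ar 4:5 --s 250
```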
Parameter cheat sheet
| Platform | Aspect ratio | Stylize / creativity | Seed | Negative prompt | Version |
|---|---|---|---|---|---|
| Midjourney v6 | --ar 16:9 | --s 0–1000 (default 100) | --seed 12345 | --no people | --v 6 / --niji 6 |
| SDXL / SD 1.5 | 1024×1024 (SDXL), 512×512 (SD 1.5) etc. | CFG 5–9 | seed 12345 | separate negative field | checkpoint name |
| Flux Dev / Schnell | set in UI | guidance 2.5–5 | seed | weak / not used | dev / schnell |
| Niji 6 | --ar 2:3 / 3:4 | --s 100–400 | --seed | --no realistic | --niji 6 |
| Video models | 16:9 / 9:16 / 1:1 | guidance / motion strength | partially supported | often ignored | per model |
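The Midjourney row of the cheat sheet can be turned into a small helper that builds the parameter suffix and rejects an out-of-range stylize value. This is a sketch: the `midjourney_suffix` function and its argument names are hypothetical, and only the flags and the 0–1000 range listed in the table are used.

```python
def midjourney_suffix(ar, stylize=100, seed=None, no=None, version="--v 6"):
    """Build a Midjourney-style parameter suffix from the cheat-sheet columns."""
    if not 0 <= stylize <= 1000:
        raise ValueError("--s must be between 0 and 1000")
    parts = [f"--ar {ar}", f"--s {stylize}"]
    if seed is not None:
        parts.append(f"--seed {seed}")
    if no:
        parts.append(f"--no {no}")
    parts.append(version)  # "--v 6" or "--niji 6"
    return " ".join(parts)


print(midjourney_suffix("16:9", seed=12345, no="people"))
# --ar 16:9 --s 100 --seed 12345 --no people --v 6
```

The returned string is meant to be appended to the description after a single space, so the flags always sit at the end of the prompt.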