How to Write AI Prompts That Actually Work (Complete Guide for Beginners, 2026)
You typed "beautiful woman portrait" into Midjourney. What came back looked like a department store mannequin having a crisis. You tried "make it more realistic" — same thing, slightly worse. Then you Googled "how to write AI prompts" and landed on a listicle with 12 vague tips and zero actual examples. That's the exact gap this guide closes. Here's the real formula: five parts, in order, with bad-vs-good comparisons for every piece, so you stop guessing and start directing.
Why Most First Prompts Fail
The core problem is that AI image models aren't search engines. When you type "sunset" into Google, you get pictures of sunsets. When you type "sunset" into Midjourney or DALL-E 3, the model has to fill in dozens of blanks: What time of day exactly? What colors? What's in the foreground? What's the mood? What lens? It fills every blank you left — with statistical averages, not your vision.
"A dog" produces a medium-sized brown dog in profile, sitting on grass, soft neutral light. Every single time. Not because the model is bad, but because that's literally the average of all the dog images it trained on. "Beautiful woman" gets you a symmetrical 25-year-old with a soft-focus smile on a clean white background. The model didn't fail. It did exactly what an averager does when given nothing specific to aim at.
Short prompts don't fail because they're short. They fail because they leave the wrong decisions to the model. The fix isn't to write more words — it's to fill the right five blanks deliberately. Here's what those five blanks are.
The 5-Part Prompt Formula
Subject + Style + Lighting + Composition + Quality
You don't need all five for every prompt. But knowing these five dimensions — and making conscious choices about each one — is what separates images that land from images that look like stock photos from 2015. Think of it as a checklist: before you hit generate, ask yourself whether you've covered the subject, the visual language, the light source, the camera position, and the finish quality. Let's go through each part with real examples.
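One way to treat the formula as a literal checklist is a small Python sketch like the one below. The `PromptParts` class and its field names are hypothetical, made up here for illustration; they are not part of any image tool's API. It reports which of the five parts are still blank and joins the filled ones in formula order.

```python
from dataclasses import dataclass, fields


@dataclass
class PromptParts:
    """The five formula parts; an empty string means 'left to the model'."""
    subject: str = ""
    style: str = ""
    lighting: str = ""
    composition: str = ""
    quality: str = ""

    def missing(self) -> list[str]:
        # Names of the parts that have not been filled in yet.
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def build(self) -> str:
        # Join the filled parts in formula order, skipping blanks.
        values = (getattr(self, f.name).strip() for f in fields(self))
        return ", ".join(v for v in values if v)


parts = PromptParts(
    subject="a crow landing on an abandoned piano",
    style="vintage film photography with grain",
    lighting="diffused overcast",
)
print(parts.missing())  # ['composition', 'quality']
print(parts.build())
```

Leaving a part blank is a legitimate choice; the point of `missing()` is only to make sure it was a choice and not an oversight.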
Part 1: Subject — Specific Enough to Be Surprising
The subject is the foundation of every prompt. But generic subjects produce generic images. You're not describing a concept to the model — you're describing a specific moment, object, or person with enough detail that the model can't fall back on its defaults.
Bad prompt: "a woman"
Why it fails: Age? Race? Expression? Setting? Action? The model picks safe defaults for everything. You'll get a generic 25-year-old, soft light, neutral background, slight smile.
Good prompt: "a 60-year-old Japanese ceramicist with clay-stained hands, examining a bowl she just pulled from the kiln, expression of intense concentration"
Why it works: Age, occupation, physical detail, action, and emotion are all specified. The model has a real target, not an average to fall back on.
A strong subject has three layers. First, a primary noun — one main thing, not five. Second, a defining detail — one or two specific adjectives that aren't "beautiful" or "amazing" (those are instructions about quality, not visual specifics). Third, an action or relationship — what is it doing, or what is it near? The action layer forces the model to make compositional decisions. Usually good ones. "A crow landing on an abandoned piano" is more interesting than "a crow near a piano" for exactly that reason.
Part 2: Style — Pick One and Commit
Style tells the model which visual language to work in. "Digital art style" tells it almost nothing — it's like telling a chef to cook "food style." The gap between specific style choices is enormous in the final output.
Bad prompt: "digital art style, beautiful, stunning"
Why it fails: "Digital art" covers everything from Pixar renders to pixel art to hyperrealism. "Beautiful" and "stunning" are quality adjectives — they don't improve quality, they just tell the model you want something good. It already assumes that.
Good prompt: "cinematic photography, anamorphic lens, ARRI Alexa color grade, shallow depth of field"
Why it works: These are specific technical references. The model knows what ARRI Alexa color science looks like. It knows what anamorphic lens characteristics produce. You're speaking the same language as its training data.
Style choices that produce reliable results: cinematic photography, editorial fashion photography, oil painting with impasto texture, Studio Ghibli animation, matte painting concept art, vintage film photography with grain, product photography with clean commercial lighting. Pick one per prompt. Mixing two styles usually produces a muddy compromise — unless you specifically want the tension between them, like "oil painting rendered with hyperrealistic photographic detail." Browse the PromptSpace image gallery to see what different style combinations actually produce before you commit to one.
Part 3: Lighting — The Fastest Way to Change the Emotion
Lighting is where experienced photographers earn their rates. Midjourney and DALL-E 3 both respond to lighting descriptors more reliably than almost any other component. The same subject under golden hour light, harsh overhead noon sun, and neon glow produces three completely different emotional experiences, and each one needs its own prompt.
Bad prompt: "good lighting, well lit"
Why it fails: "Good lighting" is meaningless — it's like asking for "nice weather." The model defaults to flat, even, generic studio light that makes everything look like a mid-tier LinkedIn headshot.
Good prompt: "Rembrandt lighting, single warm key light at 45 degrees above left, triangle of light under the eye, deep shadow on the right side of face"
Why it works: Specific, technical, and the model has seen thousands of images tagged exactly this way. It knows what to do.
Lighting descriptors worth learning: golden hour (warm amber, long shadows, nostalgic); Rembrandt lighting (one key light, dramatic portrait feel); rim lighting / backlit (glowing subject edges, cinematic silhouette); neon glow (colored shadows, rain-wet reflections, cyberpunk); diffused overcast (soft even shadows, intimate or melancholic); bioluminescence (internal teal light source, otherworldly). The one rule: match lighting to mood. "Golden hour" paired with "dark and unsettling atmosphere" is a contradiction that weakens both descriptors.
Part 4: Composition — Tell the Camera Where to Stand
Composition controls framing, perspective, and spatial relationships within the frame. Leave it unspecified and the model defaults to a medium shot with the subject centered. Fine for snapshots. For anything intentional, you need to direct the camera explicitly.
Bad prompt: "a mountain landscape"
Why it fails: You get medium distance, centered mountains, flat horizon. A postcard. Competent, generic, forgettable.
Good prompt: "extreme wide angle view looking up at jagged peaks from a tiny rocky ledge, cracked stone foreground in sharp focus, peaks disappearing into storm clouds, vertiginous perspective, atmospheric depth"
Why it works: Camera position, focal distance, foreground element, and atmosphere are all specified. The model is working from a blueprint, not an average.
Useful composition terms: close-up portrait (face fills frame), Dutch angle (tilted camera suggesting unease), bird's-eye view, worm's-eye view, over-the-shoulder shot, extreme wide angle, macro shot. Lens choice matters too: 35mm wide, 85mm portrait, 200mm telephoto compression, fisheye distortion. These are technical terms from real photography — the model has seen them in millions of images and responds accurately. The cinematic prompts collection uses these composition terms extensively, and it's worth seeing the actual outputs to build intuition.
Part 5: Quality Modifiers — Specific References, Not Superlatives
Quality modifiers aren't magic words that make everything better. They're specific technical or platform references that pull the model toward a certain level of detail and finish. They work when they match your chosen style — and they backfire when they don't.
Bad prompt: "8K ultra HD hyper realistic stunning award winning masterpiece"
Why it fails: This quality stack is so overused it's nearly meaningless. Stacking six vague superlatives doesn't multiply quality — it dilutes the signal. The model has seen this phrase on average-quality images thousands of times.
Good prompt: "Hasselblad medium format, Kodak Portra 400 film grain, tack-sharp focus on eyes, f/2.8 shallow depth of field, skin texture visible"
Why it works: Specific camera system, specific film stock, specific technical characteristics. The model maps these to real visual qualities rather than abstract aspirations.
For photography: use specific camera brands (Sony A7R IV, Canon 5D Mark IV), lens specs (85mm f/1.4), and film stocks (Kodak Portra 400, Fuji Velvia). For illustration and concept art: "Artstation trending, detailed concept art, professional illustration." For cinematic work: "ARRI Alexa footage, anamorphic, color graded." Use the PromptSpace prompt generator to experiment with how quality modifiers combine with different style components and see which combinations produce the results you're after.
Advanced Techniques: Negative Prompts, Aspect Ratios, Model-Specific Tips
Negative Prompts
Negative prompts tell the model what to exclude. In Midjourney, append --no followed by a comma-separated list of terms at the end of the prompt. DALL-E 3 has no separate negative prompt setting, so include a short instruction like "avoid text, watermarks, and blurry backgrounds" in your main prompt. Stable Diffusion interfaces usually give you a dedicated negative prompt field.
Keep the negative prompt focused. A 20-term list beats a 200-term one. The 200-term negative prompts shared on forums are usually bloated with redundant or contradictory entries that dilute each other. Start with: --no text, watermark, blurry, overexposed, extra fingers, bad anatomy. Then add specific exclusions based on what's actually appearing wrong in your outputs. Don't preemptively ban things that aren't a problem for your particular prompt.
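If you keep your exclusions in a list and grow it as problems show up, a helper along these lines can keep the `--no` suffix tidy. This is a minimal sketch: `with_negatives` is a hypothetical name, and it only builds the Midjourney-style string described above, it does not talk to any tool.

```python
def with_negatives(prompt: str, exclusions: list[str]) -> str:
    """Append a Midjourney-style --no list, dropping blanks and duplicates."""
    seen = set()
    terms = []
    for term in exclusions:
        cleaned = term.strip().lower()
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            terms.append(cleaned)
    if not terms:
        # Nothing to exclude: leave the prompt untouched.
        return prompt
    return f"{prompt} --no {', '.join(terms)}"


starter = ["text", "watermark", "blurry", "overexposed", "extra fingers", "bad anatomy"]
# "lens flare" stands in for something you actually saw go wrong in an output.
print(with_negatives("a crow landing on an abandoned piano", starter + ["lens flare"]))
```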
Aspect Ratios
This is the most consistently overlooked parameter, and it's one of the few that matters on every single generation. Midjourney's default aspect ratio is 1:1 (square). If you're generating for a specific format, set the ratio explicitly before you start, not as an afterthought once you have a composition you like.
- --ar 16:9 — YouTube thumbnails, desktop wallpapers, landscape photography
- --ar 9:16 — Instagram Stories, TikTok videos, phone wallpapers
- --ar 4:5 — Instagram feed posts, portrait photography
- --ar 1:1 — Square social posts, profile pictures
- --ar 21:9 — Cinematic ultra-wide, website banner images
A great composition generated in the wrong ratio will be ruined when you try to crop it for your actual use case. Set aspect ratio first, then build your prompt.
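To put a number on how much a late crop costs, the arithmetic can be sketched in a few lines of Python. The function name `retained_after_crop` is hypothetical; it computes the fraction of the image area that survives the largest possible crop from one aspect ratio to another.

```python
from fractions import Fraction


def retained_after_crop(src: tuple[int, int], dst: tuple[int, int]) -> Fraction:
    """Fraction of the image area kept when cropping src ratio to dst ratio."""
    src_ratio = Fraction(src[0], src[1])
    dst_ratio = Fraction(dst[0], dst[1])
    if dst_ratio >= src_ratio:
        # Target is wider: keep the full width, lose height.
        return src_ratio / dst_ratio
    # Target is taller: keep the full height, lose width.
    return dst_ratio / src_ratio


print(float(retained_after_crop((1, 1), (16, 9))))   # 0.5625
print(float(retained_after_crop((16, 9), (9, 16))))  # 0.31640625
```

Cropping a square image to 16:9 throws away almost half of it, and cropping a 16:9 image to 9:16 keeps less than a third, which is why the ratio is worth setting before the first generation.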
Model-Specific Tips
Midjourney v6 and later respond strongly to descriptive natural language — write like a film director giving shot instructions. DALL-E 3 responds better to complete sentences and scene descriptions; it understands narrative context unusually well. Stable Diffusion uses comma-separated terms and parentheses for weighted emphasis like (term:1.3), with the negative prompt in a separate field. The five-part formula applies to all three models, but the syntax and emphasis patterns differ between them.
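As an illustration of the Stable Diffusion syntax, here is a small Python sketch that turns weighted terms into the comma-separated `(term:1.3)` form quoted above. The `sd_prompt` helper is hypothetical, and weighting syntax can differ between Stable Diffusion interfaces, so check what yours accepts before relying on it.

```python
def sd_prompt(terms: dict[str, float]) -> str:
    """Format terms as a comma-separated prompt with (term:weight) emphasis.

    Terms with weight 1.0 are left bare; any other weight is written out.
    """
    formatted = []
    for term, weight in terms.items():
        if weight == 1.0:
            formatted.append(term)
        else:
            # :g drops trailing zeros, so 1.30 is written as 1.3
            formatted.append(f"({term}:{weight:g})")
    return ", ".join(formatted)


print(sd_prompt({
    "60-year-old Japanese ceramicist": 1.0,
    "clay-stained hands": 1.3,
    "Rembrandt lighting": 1.2,
}))
# 60-year-old Japanese ceramicist, (clay-stained hands:1.3), (Rembrandt lighting:1.2)
```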
Prompt Formulas for Common Use Cases
Profile Picture / Headshot
Professional headshot of [person description], shallow depth of field,
clean neutral background, Rembrandt lighting, Sony A7R IV,
85mm portrait lens, skin detail, eyes sharp,
--ar 1:1 --no text, blurry, extra fingers
See tested headshot prompts in the realistic photo prompts collection — sorted by style and skin tone rendering approach.
YouTube Thumbnail
[Subject in dramatic pose or expression], high contrast,
bold saturated colors, cinematic lighting, dark vignette edges,
editorial photography style, shallow depth of field,
--ar 16:9 --no text, watermark, logo
The thumbnail prompt library has 50+ tested formulas organized by niche — tech, gaming, finance, lifestyle, and more.
Product Shot
[Product name and brief description] floating on [background color or texture],
dramatic side rim lighting, clean reflections on surface below,
commercial photography, tack-sharp focus, minimal shadows,
--ar 4:5 --no text, hands, clutter
Cinematic Landscape
[Location or setting], [time of day], [weather condition],
extreme wide angle, sharp foreground detail, atmospheric haze in distance,
ARRI Alexa color grade, anamorphic lens,
epic sense of scale, --ar 21:9 --no people, cars, power lines
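If you reuse these formulas often, the bracketed placeholders can be filled programmatically. The sketch below is one possible approach in Python; `fill_template` is a hypothetical helper, and the example person description is invented for illustration. It raises an error when a placeholder has no value, so a half-filled template never reaches the generator.

```python
import re

HEADSHOT = (
    "Professional headshot of [person description], shallow depth of field, "
    "clean neutral background, Rembrandt lighting, Sony A7R IV, "
    "85mm portrait lens, skin detail, eyes sharp, "
    "--ar 1:1 --no text, blurry, extra fingers"
)


def fill_template(template: str, values: dict[str, str]) -> str:
    """Replace each [placeholder] with its value; fail loudly if one is missing."""
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value supplied for placeholder [{key}]")
        return values[key]

    # Match anything between square brackets and look it up by name.
    return re.sub(r"\[([^\]]+)\]", substitute, template)


filled = fill_template(
    HEADSHOT,
    {"person description": "a 40-year-old architect in a navy blazer"},
)
print(filled)
```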
How to Iterate and Improve: The Real Workflow
Professional AI image creators don't nail it on the first generation. They run a focused iteration loop, and knowing the loop is as important as knowing the formula.
Step 1 — Start with Subject + Style only. Get the core concept right before adding complexity. If the subject isn't working at this stage, no amount of lighting detail will save it. Run two or three variations and see which direction has potential.
Step 2 — Add lighting. Once the subject and style are locked, run the prompt with two or three different lighting descriptors. Golden hour vs. Rembrandt vs. overcast produce dramatically different emotional results from the same subject. Pick the one that matches your intended feel.
Step 3 — Lock the direction and add Composition + Quality. Don't start over when you have a promising result. Refine the winning direction by adding the remaining formula components and see what changes.
Step 4 — Use Variations, not full regeneration. In Midjourney, use the V buttons on a result you like rather than writing a new prompt. You're iterating toward a specific image, not rolling dice each time. For DALL-E 3, edit the prompt slightly and explicitly ask to "keep the same composition but change the lighting."
Expect to run 5-10 generations per final image. That's not failure — that's the normal professional workflow. The people whose results you see on social media are showing you their best output, not their first.
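The loop above can be sketched as a few lines of Python that hold subject and style fixed while swapping the lighting, then extend whichever variant wins. The function name `lighting_variants` and the example prompt are hypothetical; picking the winner is still done by eye after generating.

```python
def lighting_variants(base: str, lightings: list[str]) -> list[str]:
    """Return one prompt per lighting option, holding subject and style fixed."""
    return [f"{base}, {lighting}" for lighting in lightings]


# Step 1: subject + style only.
base = "a crow landing on an abandoned piano, vintage film photography with grain"

# Step 2: same base, three lighting directions to compare side by side.
variants = lighting_variants(
    base, ["golden hour", "Rembrandt lighting", "diffused overcast"]
)
for prompt in variants:
    print(prompt)

# Step 3: keep the winner (chosen by looking at the outputs) and add
# composition and quality on top of it.
chosen = variants[2]
final = f"{chosen}, worm's-eye view, 35mm wide, tack-sharp focus on the crow"
print(final)
```

Changing one component per round makes it clear which change caused which difference in the output.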
Honest Limitations: What Prompts Can't Fix
A well-crafted prompt will not fix fundamental model limitations, and it's worth knowing these upfront so you don't spend an hour tweaking prompts when the problem is structural.
Hands are still a documented weak point for most image models in 2026. Add --no deformed hands, extra fingers, missing fingers to reduce occurrences, but you'll still get bad hands periodically. The practical fix is to use inpainting to correct the hand region rather than regenerating the entire image from scratch.
Legible text inside images is unreliable. Midjourney v6 can handle simple short words, but anything beyond a word or two will render as plausible-looking gibberish in most tools. Generate the image without text and add your actual text in Canva, Photoshop, or Figma afterward. Don't try to prompt your way around this limitation.
Specific real-person likenesses are hit or miss, and some will be refused by content filters regardless of how the prompt is written. The formula works within what the model can actually produce — it can't unlock capabilities the model doesn't have or bypass safety guardrails.
Finally, prompts don't compensate for an unclear concept. A technically perfect five-part prompt for a vague idea produces a technically polished vague image. Spend time deciding what you actually want to communicate before spending time on how to describe it. The formula is a precision tool — it needs a target.
Frequently Asked Questions
How long should my AI image prompt be?
Long enough to cover the five components — usually 30 to 60 words for a focused result. Longer is not better. A 150-word prompt stuffed with vague filler terms produces worse results than a 40-word prompt with precise references. Every word in your prompt competes for the model's attention. Make each one point to a specific visual characteristic, not an abstract quality like "amazing" or "epic."
Does the order of words in a prompt actually matter?
Yes. Many models tend to weight terms that appear earlier in the prompt more heavily. Put your subject and primary style first. Move lighting, mood, and quality modifiers toward the end. Midjourney parameters such as --ar and --no must go last, after all the descriptive text. If something specific isn't appearing in your output, try moving it earlier in the prompt rather than adding more words around it.
Why do I keep getting the same generic result no matter what I type?
You're using terms the model maps to its statistical average output. Words like "beautiful," "realistic," "cool," and "high quality" don't point to specific visual characteristics — they're abstract evaluations the model has seen attached to millions of average images. Replace them with precise technical references: camera models, specific lighting setups, artist names, film stocks, named color grades. Specificity is the only thing that breaks the generic default.
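A quick self-check for this can be sketched in Python: count the words and flag the abstract terms this guide warns about. `audit_prompt` is a hypothetical helper, and its word list is only the handful of examples named in this article, not an exhaustive or authoritative set.

```python
import re

# Illustrative list drawn from the examples in this guide.
VAGUE_TERMS = [
    "beautiful", "amazing", "stunning", "epic",
    "realistic", "cool", "high quality",
]


def audit_prompt(prompt: str) -> dict:
    """Report the word count and any vague terms found in the prompt."""
    lowered = prompt.lower()
    found = [
        term for term in VAGUE_TERMS
        # \b keeps "cool" from matching inside a longer word such as "coolant".
        if re.search(rf"\b{re.escape(term)}\b", lowered)
    ]
    return {"word_count": len(prompt.split()), "vague_terms": found}


print(audit_prompt("beautiful woman portrait, high quality"))
# {'word_count': 5, 'vague_terms': ['beautiful', 'high quality']}
```

A flagged term is a prompt to ask what visual characteristic you actually meant, then write that instead.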
Should I use ChatGPT to write my Midjourney or DALL-E prompts?
It can help, but only if you brief it properly. Asking ChatGPT to "write a Midjourney prompt for a portrait" produces the same generic output as your own generic prompt — for the same reasons. Instead, use the five-part formula as your brief: ask it to write a prompt "using Rembrandt lighting, cinematic photography style, 85mm lens, with the subject doing a specific action, and add quality modifiers for a Hasselblad look." That specificity in your request produces a usable result.
Do these techniques work for text-based AI tools like ChatGPT and Claude, not just image generators?
The specificity principle applies everywhere. For text AI, the equivalent of the five-part formula is: specify the format (bullet list, short paragraph, numbered steps), the tone (formal, conversational, technical), the target audience, the length, and one concrete example of what good looks like. The exact formula components don't translate directly from images to text, but the core idea does — fill the blanks deliberately rather than leaving the model to guess what you meant.
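For text tools, the same fill-the-blanks idea can be sketched as a small brief builder in Python. The `text_brief` function, its parameter names, and the label wording are all assumptions chosen for illustration; any layout that states each choice explicitly would serve the same purpose.

```python
def text_brief(task: str, fmt: str, tone: str, audience: str,
               length: str, example: str) -> str:
    """Assemble a text-model prompt that states each choice explicitly."""
    lines = [
        task,
        f"Format: {fmt}",
        f"Tone: {tone}",
        f"Audience: {audience}",
        f"Length: {length}",
        f"Example of what good looks like: {example}",
    ]
    return "\n".join(lines)


brief = text_brief(
    task="Explain what an aspect ratio is.",
    fmt="numbered steps",
    tone="conversational",
    audience="beginners who have never used an image generator",
    length="under 150 words",
    example="Step 1: Decide where the image will be posted.",
)
print(brief)
```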
The formula works because it forces deliberate choices along every dimension that actually drives the output. Once you've internalized Subject + Style + Lighting + Composition + Quality, prompting stops being a guessing game. You're directing, not hoping. Explore the full PromptSpace prompt library for hundreds of tested prompts built on this formula, organized by style and use case, ready to copy and customize for your own projects.
