Two engineers open the same chat window, with the same model, on the same Tuesday morning. One ships a feature with public database queries, hardcoded keys, no rate limit, and a UI that converts at half the rate it should. The other ships the same feature audited against OWASP, profiled for hot paths, copy-tested against high-converting landing pages, and fully instrumented. Same model, same week, two completely different products, separated entirely by what each engineer knew to ask the model about.

Two engineers at identical workstations side by side, one prompt window glowing with a single shallow question, the other branching into a constellation of specialist queries — security, performance, copy, architecture, observability

Two Engineers, Same Model

Engineer A is vibing. The prompt is “build me a dashboard that shows sales by region.” The model writes the dashboard. It uses the most common pattern in its training data, because that is what models default to when nothing else is specified. The default pattern has public read access on the table. The default pattern has no authorization check on the endpoint. The default pattern logs the API key into the browser console. It runs. It looks fine. It ships.

Engineer B opens the same window. Same first prompt. Then: “audit this for the OWASP top 10.” Then: “what would the senior security engineer at a fintech say about this access pattern?” Then: “is this query going to do a full scan when we hit a million rows?” Then: “rewrite the empty state copy as if it were a top-converting SaaS dashboard.” Then: “what’s missing in the observability story?” Then: “are there duplicate code paths I should consolidate?”

Same model. Same Tuesday. Different output ceiling by an order of magnitude.

Between Engineer A and Engineer B, the model itself has not changed. Engineer B is just reaching further into it — calling a security expert, a database planner, a marketing strategist, an SRE, and a refactoring reviewer, all of which already live in the weights, dormant, waiting for a vocabulary trigger.

The AI you experience is the AI you invoke.

Capability Elicitation Is the Real Game

There is a phrase for this in the research literature: capability elicitation. It is the practice of getting a model to actually use what it already knows. The UK AI Safety Institute has shown that elicitation techniques can lift model performance by amounts comparable to a 5–20x increase in training compute. Same weights. Same model card. Different prompting strategy. Five to twenty times the output quality.

Five to twenty times.

Read that number again. The practical AI gains in 2026 are coming out of that gap — the distance between what the model already knows and what your prompts manage to surface — not out of the next model release. Models have trained on every published security audit, database design review, copy test, architecture critique, and code review humans have ever written. All of it is in there. None of it shows up unless you ask.

The default behavior of every chat model is to produce the most common pattern in its training data conditioned on the surface form of the prompt. The most common pattern in training data is code that compiles and runs, not code that is secure, performant, accessible, instrumented, and maintainable. Those properties live in different neighborhoods of the model’s latent space, and they only get activated when the prompt rhymes with the language those neighborhoods were trained on.

This is why your agent’s IQ matches your context is only half the story. Context is what the model can see. Prompt vocabulary is which regions of the model you actually reach into. You need both. People obsess over the first and ignore the second.

What’s Missing, Not What’s Present

The vibe coder ships exploitable code because security is mostly made of absences — the auth check that isn’t there, the rate limit that isn’t there, the input validation, the row-level security. Default LLM output is generative; it shows you what’s on the page. Absence is invisible unless someone asks “what’s missing?”

The same logic applies on every other axis you care about:

Performance. The query works on the data you have today. It might thrash on the data you have in eighteen months. The model will not volunteer that.
Accessibility. The UI passes the sight check. A screen reader’s path through it is a separate question, and only gets answered if you raise it.
Conversion copy. “No data yet” is the easy empty state. The one that actually activates a user to populate their first row takes a different prompt entirely.
Observability. Endpoints that work in dev are not the same thing as endpoints you can diagnose at 3 a.m. when they don’t.
Modularity. The function works in isolation. Whether you have now written it four times across the codebase is something only a refactoring-eyed prompt will surface.

Each of those is a different neighborhood of the model. Each requires a different prompt phrasing to enter. A vibe coder with a hundred-billion-parameter model and one-line prompts is using a Ferrari to drive to the corner store. The summoning is where the gains live now.

This is also why a single eval harness outperforms a hundred ad-hoc prompt tweaks. The harness forces you to enumerate the dimensions you care about. The act of enumeration is what reaches into the model.

Infographic: One shallow prompt produces a single generic output, while a stack of specialist prompts (build, security audit, perf check, copy review, arch audit) routed through the same model produces five labeled outputs (security, performance, copy, architecture, observability). Headline: YOU GET THE AI YOU ASK FOR. Subtitle: PROMPT VOCABULARY = OUTPUT CEILING.

Your Vocabulary Is Your Ceiling

Here is the uncomfortable part. The list of things you know to ask about is the list of things you already understand. Never read about SQL injection? You won’t ask the AI to check for it. Never thought about page-load budgets? You won’t ask the AI to audit them. The AI answers questions; it doesn’t volunteer the curriculum you never learned.

That is the mirror. The AI doesn’t flatter you the way a sycophantic model agrees with you. It reflects the limit of your professional vocabulary — every dimension you cannot articulate, every trade-off you cannot name, every expert you have never been exposed to, all stay below the surface of the response.

Vibe coders ship bad software because their internal checklist is short. The model is happily, dutifully producing exactly what was asked.

The output ceiling of any chat with any model is bounded by the asker’s vocabulary.

How to Raise the Floor

“Be a better engineer before you use AI” is true and unhelpful. The practical fix is to externalize the checklist so the model never has to rely on you remembering, at 11 p.m. on a Friday, every axis that matters.

Three concrete moves:

1. Write a prompting checklist for every recurring task. A “ship a new endpoint” checklist contains the questions you would otherwise forget: security audit, rate-limit check, auth check, input validation, pagination plan, index plan, observability hook, error envelope, empty state copy, loading state copy, test coverage on the unhappy path. Make it a file. Run the file before merging.

2. Codify the checklist into project rules that have teeth. Rules files (.cursor/rules, AGENTS.md, repo-level guidance) raise the default set of brain regions the model activates without you needing to retype the checklist every prompt. The model’s vocabulary gets bigger because yours did, once, in a file.

3. Add a forced “what am I missing?” pass. After the model finishes the task, prompt: “Audit this work as five different senior reviewers — a security engineer, an SRE, a database architect, a UX writer, and a tech lead doing a code review. List what each one would flag.” That single pass routes the model through five neighborhoods it did not visit on the first draft. It is the cheapest 5x you will ever get.

These three moves are how you build exactly what you want, instead of accepting the nearest plausible thing the model produced. None of them make the model smarter. They widen the door.

The Model Is a Polymath in Hibernation

Your chat window already contains a security expert. It also contains a database engineer, a conversion copywriter, an SRE, and a principal architect. They are all trained in, all already on the payroll, all sitting dormant. The session pays for them whether you invite them or not.

They wake up when you address them by name.

Most prompts only wake the generalist. The generalist writes something plausible. The work ships. The bugs the rest of the polymath would have caught ship too, because the rest of the polymath was never invited to the meeting.

You get the AI you ask for.

So address it like the polymath it is.

You Get the AI You Ask For