Engineering · May 8, 2026 · 9 min read

How to choose the right model for the right task

A decision framework for picking an AI model without falling for benchmarks, vibes, or whichever lab shipped most recently.

By Joe Brown

Most model-selection decisions are made the same way: someone reads a launch post, someone else sees a benchmark on Twitter, a third person says "GPT just works," and a model is picked. Six months later the team discovers the model is wrong for the job - too slow, too expensive, too cautious, too chatty, or quietly bad at the one thing the product actually needs.

Choosing a model is a product decision, not a benchmark decision. Here is the framework we use.

Step 1: Write the job description

Before looking at any model, write down what the model has to do, the same way you'd write a job description for a human. What is the input? What is the output? What does "good" look like? What does "unacceptable" look like? How fast does the answer need to come back? Who is the user, and how old are they? Most failed model decisions skip this step.
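One lightweight way to make Step 1 stick is to write the job description as a typed record that lives in the repo next to your prompts. This is a sketch assuming Python 3.9+; the `JobSpec` class, its field names, and the `missing_fields` helper are hypothetical, and the bedtime-story values are illustrative rather than recommendations.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobSpec:
    """A model's job description, written before any model is evaluated."""
    task: str                    # what the model has to do
    input_format: str            # what comes in
    output_format: str           # what must come out
    good_looks_like: list[str]   # properties of a good answer
    unacceptable: list[str]      # hard failures
    max_latency_ms: int          # how fast the answer must come back
    user_profile: str            # who the user is
    min_user_age: Optional[int] = None  # set when the audience includes children

    def missing_fields(self) -> list[str]:
        """Return the questions this spec has not answered yet."""
        missing = [name for name in ("task", "input_format", "output_format", "user_profile")
                   if not getattr(self, name).strip()]
        if not self.good_looks_like:
            missing.append("good_looks_like")
        if not self.unacceptable:
            missing.append("unacceptable")
        if self.max_latency_ms <= 0:
            missing.append("max_latency_ms")
        return missing


# Illustrative values only.
bedtime_stories = JobSpec(
    task="Generate a short bedtime story from a parent's one-line prompt",
    input_format="One sentence, e.g. 'a dragon who is afraid of the dark'",
    output_format="300-500 words of plain text",
    good_looks_like=["calm ending", "age-appropriate vocabulary"],
    unacceptable=["violence", "scary imagery"],
    max_latency_ms=5000,
    user_profile="Parent reading aloud to a young child",
    min_user_age=4,
)

print(bedtime_stories.missing_fields())  # → []
```

A spec with an empty `missing_fields()` list is the entry ticket to Step 2; anything still missing is a question to answer before any model gets tried.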

Step 2: Pick the constraints that actually matter

Every model is a trade-off across roughly seven axes:

Rank these for your job. Almost every team finds that only two or three matter, and the rest are noise. A bedtime-story generator does not care about reasoning depth. A code assistant does not care about audio. A safety-critical kids' chatbot trades a little reasoning for a lot of refusal discipline.
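The ranking exercise can be sketched as a weighted map that keeps only the top few axes and drops the rest. The axis names below are placeholders we picked for illustration, not a canonical list of the seven axes, and `top_constraints` is a hypothetical helper; the weights are whatever your team agrees on.

```python
# Placeholder axis names for illustration; substitute your own list.
AXES = ("cost", "latency", "reasoning", "modalities", "safety", "context_length", "tone")


def top_constraints(weights: dict[str, float], keep: int = 3) -> list[str]:
    """Return up to `keep` axes with the highest non-zero weight, highest first.

    Ties keep the order in which the axes were listed in `weights`.
    """
    unknown = set(weights) - set(AXES)
    if unknown:
        raise ValueError(f"unknown axes: {sorted(unknown)}")
    ranked = sorted(weights, key=weights.get, reverse=True)
    return [axis for axis in ranked[:keep] if weights[axis] > 0]


# A kids' chatbot: refusal discipline first, reasoning depth further down.
kids_chatbot = {"safety": 5, "latency": 3, "reasoning": 2, "cost": 2, "modalities": 0}
print(top_constraints(kids_chatbot))  # → ['safety', 'latency', 'reasoning']
```

Capping the output at two or three axes is deliberate: it forces the team to say out loud which constraints are noise.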

Step 3: Map task to tier - not to lab

Stop thinking in lab names. Start thinking in tiers:

Once you know the tier, pick the cheapest model in that tier that passes your evaluations. That model is usually good enough, and using it leaves budget headroom for the requests that genuinely need frontier reasoning.
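The selection rule is simple enough to write down. In this sketch the model names, tier labels, prices, and pass rates are all made up; `Candidate` and `pick_cheapest_passing` are hypothetical names, and "passes" is assumed to mean an eval pass rate at or above a threshold you choose.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    name: str              # hypothetical model name
    tier: str              # placeholder tier label, e.g. "small", "mid", "frontier"
    usd_per_mtok: float    # made-up blended price per million tokens
    eval_pass_rate: float  # fraction of your eval set passed, 0.0 to 1.0


def pick_cheapest_passing(candidates: list[Candidate], tier: str,
                          threshold: float) -> Optional[Candidate]:
    """Cheapest candidate in `tier` that clears `threshold`, or None if none do."""
    passing = [c for c in candidates if c.tier == tier and c.eval_pass_rate >= threshold]
    return min(passing, key=lambda c: c.usd_per_mtok, default=None)


CANDIDATES = [
    Candidate("model-a-small", "small", 0.40, 0.81),
    Candidate("model-b-small", "small", 0.25, 0.74),
    Candidate("model-c-mid", "mid", 3.00, 0.93),
    Candidate("model-d-mid", "mid", 1.20, 0.91),
    Candidate("model-e-frontier", "frontier", 15.00, 0.97),
]

print(pick_cheapest_passing(CANDIDATES, "mid", 0.90).name)    # → model-d-mid
print(pick_cheapest_passing(CANDIDATES, "small", 0.80).name)  # → model-a-small
```

In the small tier, `model-b-small` is the cheapest option but misses the 0.80 bar, so the rule skips it: price only breaks ties among models that already pass.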

Step 4: Evaluate on your own data

Public benchmarks measure averages on tasks that aren't yours. Build an evaluation set of 50 to 200 examples drawn from your actual product. Score each candidate model against it: automatically if you can, by hand if you can't. The model that wins on your eval set is the right model. Nothing else matters.
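A minimal automatic harness needs three pieces: labeled examples, a grader, and a loop that runs every candidate on the same set. This sketch assumes each model is wrapped as a plain prompt-in, completion-out function and grades by exact match; `terse_model` and `chatty_model` are hypothetical local stand-ins for real API calls, so the example runs offline.

```python
import re
from dataclasses import dataclass
from typing import Callable

Model = Callable[[str], str]  # prompt in, completion out


@dataclass(frozen=True)
class Example:
    prompt: str
    expected: str


def exact_match(output: str, expected: str) -> bool:
    return output.strip() == expected


def score(model: Model, examples: list[Example], grade=exact_match) -> float:
    """Fraction of eval examples the model passes (0.0 to 1.0)."""
    if not examples:
        raise ValueError("eval set is empty")
    passed = sum(grade(model(ex.prompt), ex.expected) for ex in examples)
    return passed / len(examples)


def rank(models: dict[str, Model], examples: list[Example]) -> list[tuple[str, float]]:
    """Score every candidate on the same eval set, best first."""
    scores = [(name, score(fn, examples)) for name, fn in models.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical stand-ins for real API calls; the job is "extract the order ID".
def terse_model(prompt: str) -> str:
    match = re.search(r"#(\d+)", prompt)
    return match.group(1) if match else "none"


def chatty_model(prompt: str) -> str:
    return "Sure! The order ID is " + terse_model(prompt)


EVAL_SET = [
    Example("Where is order #1234?", "1234"),
    Example("Cancel #88 please", "88"),
    Example("I never got a confirmation email", "none"),
]

print(rank({"terse": terse_model, "chatty": chatty_model}, EVAL_SET))
# → [('terse', 1.0), ('chatty', 0.0)]
```

The chatty stand-in finds every ID but still scores zero, which is the "quietly bad at the one thing the product needs" failure in miniature. Exact match is the strictest grader; for free-form output you would pass a different `grade` function and keep the loop unchanged.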

Step 5: Plan to swap it

The right model in May 2026 will not be the right model in May 2027. New releases will be cheaper, faster, or smarter - usually all three. Build your app so that swapping models is a one-line change behind a routing abstraction. Run your evaluation set on every new candidate. Switch when the numbers say so, not when the launch post says so.
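One way to get the one-line swap is a single routing table that maps jobs to model IDs, so application code asks for a route rather than naming a model. Everything here (`ROUTES`, `PROVIDERS`, `complete`, `maybe_switch`, the provider prefixes, and the model IDs) is hypothetical, and the provider functions are echo stubs standing in for real SDK calls.

```python
# Hypothetical routing table: the only place that names a model.
# Swapping the model behind a job is a one-line edit to this dict.
ROUTES: dict[str, str] = {
    "bedtime_story": "provider-x/small-2026-05",
    "code_review": "provider-y/frontier-2026-03",
}

# Echo stubs standing in for real vendor SDK calls, keyed by provider prefix.
PROVIDERS = {
    "provider-x": lambda model_id, prompt: f"[{model_id}] {prompt}",
    "provider-y": lambda model_id, prompt: f"[{model_id}] {prompt}",
}


def complete(route: str, prompt: str) -> str:
    """Send `prompt` to whichever model currently serves `route`."""
    model_id = ROUTES[route]
    provider = model_id.split("/", 1)[0]
    return PROVIDERS[provider](model_id, prompt)


def maybe_switch(route: str, candidate_id: str,
                 candidate_score: float, current_score: float) -> bool:
    """Point `route` at `candidate_id` only if it beats the incumbent's eval score."""
    if candidate_score > current_score:
        ROUTES[route] = candidate_id
        return True
    return False


print(complete("bedtime_story", "a dragon who is afraid of the dark"))
# → [provider-x/small-2026-05] a dragon who is afraid of the dark
print(maybe_switch("bedtime_story", "provider-y/small-2026-09", 0.92, 0.90))  # → True
print(complete("bedtime_story", "a dragon who is afraid of the dark"))
# → [provider-y/small-2026-09] a dragon who is afraid of the dark
```

`maybe_switch` encodes the simplest possible policy, switching only when the candidate scores higher on your eval set; a real policy would likely also weigh the cost and latency constraints from Step 2.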

Common mistakes

"The best model is the cheapest one that passes your eval set. Everything else is marketing."