Why I Stopped Chasing the Latest Model

Every few weeks, a new model drops. The benchmarks look impressive. Twitter erupts. The temptation to upgrade is immediate. For a while, I gave in to that temptation. I was always running the latest version, always chasing the frontier.

I stopped. Here is why.

The Diminishing Returns Curve

The difference between no LLM and a basic LLM is transformative. The difference between a good LLM and a slightly better LLM is often negligible for real-world tasks.

I ran an experiment, taking our top 20 production use cases and testing them against models from several generations. The results surprised me: for the large majority of those tasks, the older generations performed essentially as well as the newest one.
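
A minimal sketch of that comparison, with hypothetical model names and a placeholder grader standing in for each use case's real scoring logic:

```python
from statistics import mean

# Hypothetical dated versions spanning several generations.
MODELS = ["model-2023-06", "model-2024-01", "model-2024-09"]

def score_case(model: str, case: dict) -> float:
    """Placeholder grader: in the real harness this calls `model` on
    case["input"] and scores the output. Fixed value so the sketch runs."""
    return 1.0

def compare(cases: list[dict]) -> dict[str, float]:
    # Hold the use cases fixed; the only variable is the model generation.
    return {m: mean(score_case(m, c) for c in cases) for m in MODELS}

if __name__ == "__main__":
    cases = [{"input": f"use case {i}"} for i in range(20)]
    print(compare(cases))  # same cases, one average score per generation
```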

The benchmarks that drive model releases measure capabilities at the frontier—the hardest 5% of tasks. Most enterprise work lives in the other 95%, where diminishing returns hit hard.

The Hidden Costs of Staying Current

Upgrading to the latest model is not free. Every migration means re-running the full evaluation suite, re-tuning prompts that were refined against the old model's quirks, chasing behavioral regressions in production, and absorbing a new cost and latency profile.

The Case for Version Pinning

I now pin model versions in production and upgrade deliberately, not reflexively.

How I pin: I reference dated model snapshots in configuration, never floating aliases, and every environment reads the same pin, so model behavior changes only when we change it deliberately.
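
A minimal sketch, assuming an OpenAI-style provider that publishes dated snapshots (the ids and the LLM_MODEL variable are illustrative):

```python
import os

# Pin a dated snapshot, never a floating alias. "gpt-4o-2024-08-06" is an
# OpenAI-style snapshot id used for illustration; substitute your provider's.
# An alias like "gpt-4o" can change underneath you; a snapshot cannot.
PINNED_MODEL = os.environ.get("LLM_MODEL", "gpt-4o-2024-08-06")

# Pin sampling parameters alongside the model so evaluation runs stay reproducible.
GENERATION_CONFIG = {"model": PINNED_MODEL, "temperature": 0.2, "max_tokens": 1024}
```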

Benefits I have observed: reproducible evaluations, predictable costs, and no surprise regressions when a provider silently updates what an alias points to.

When to Actually Upgrade

Pinning does not mean never upgrading. It means upgrading intentionally: a new release has to clear the decision framework below before we migrate.

What I do not upgrade for: marginal benchmark improvements, Twitter hype, or FOMO.

The Decision Framework

When a new model releases, I ask:

  1. Does it solve a problem we actually have? Not a theoretical problem. Not a problem we might have someday. A real problem causing real pain today.
  2. Is the improvement large enough to justify migration costs? I use a 20% threshold. If the improvement is not at least 20% on metrics that matter, the disruption is not worth it (see the sketch after this list).
  3. Have early adopters validated stability? I let others find the edge cases. Waiting 4-6 weeks after release costs little and saves significant debugging time.
  4. Do we have capacity for proper evaluation? If we cannot run comprehensive tests, we cannot upgrade safely. Wait until we can.
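
A minimal sketch of the gate in step 2, assuming higher-is-better metrics produced by the same evaluation suite for both models (all names are illustrative):

```python
THRESHOLD = 0.20  # the new model must beat the pinned one by at least 20%

def clears_threshold(baseline: dict[str, float], candidate: dict[str, float]) -> bool:
    """True if every tracked metric improves by >= THRESHOLD relative to baseline.
    Assumes higher-is-better metrics; invert latency/cost before passing them in."""
    return all(
        (candidate[m] - baseline[m]) / baseline[m] >= THRESHOLD
        for m in baseline
    )

# A 12% accuracy gain does not clear the bar:
print(clears_threshold({"accuracy": 0.80}, {"accuracy": 0.896}))  # False
```

Whether every metric must clear the bar or only the headline ones is a judgment call; what matters is that the bar is explicit and checked before anyone migrates.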

What This Looks Like in Practice

Our production systems typically run models 6-12 months behind the frontier. We evaluate new releases quarterly and upgrade annually unless there is a compelling reason to move faster.

This feels slow in an industry obsessed with the latest. But our systems are stable, our costs are predictable, and our engineers spend time building features instead of chasing model regressions.

The teams that outperform us are not the ones running the newest models. They are the ones with the best evaluations, the cleanest data, and the most refined prompts. Those advantages compound. Model version does not.

The Uncomfortable Truth

The model is rarely the bottleneck. In almost every underperforming LLM system I have audited, the problems were the same: no real evaluation suite, messy data, and prompts that had never been systematically refined.

Upgrading the model would not fix any of these. It would just make them faster and more expensive.

Stop chasing models. Start improving systems.
