
GPT-5 Is Here — A Developer's First Look at What Actually Changed

Osmond van Hemert
AI Models & Releases - This article is part of a series.

OpenAI released GPT-5 today, and the Hacker News thread has already blown past 2,000 comments. After spending a few hours with the model and reading through Simon Willison’s excellent breakdown of the system card and pricing, I have some initial thoughts on what this means for those of us who actually build things with these models.

Let me be upfront: I’m going to focus on what’s practically different for developers rather than getting caught up in benchmark numbers. We’ve all seen enough cherry-picked demos to know that the real test is sustained, production-level performance on the messy, ambiguous tasks that make up actual software development.

What’s Genuinely New

GPT-5 represents a meaningful step forward in several areas that matter for production use:

Context handling has improved substantially. The model is better at maintaining coherence across long conversations and large codebases. If you’ve been frustrated by GPT-4’s tendency to “forget” earlier context in long sessions, GPT-5 handles this noticeably better. For code review and refactoring tasks that require understanding an entire module or service, this is a real improvement.

Instruction following is tighter. The gap between what you ask for and what you get has narrowed. This might sound minor, but if you’re building automated pipelines that depend on consistent LLM output formatting, fewer parsing failures and edge cases translate directly into more reliable systems.

Reasoning about code shows improvement on complex, multi-step problems. I threw some of my standard test cases at it — debugging race conditions, explaining complex type system interactions, analyzing security implications of design choices — and the results were consistently better than GPT-4, particularly on problems that require holding multiple concerns in mind simultaneously.

The Pricing Question

The pricing structure deserves attention because it affects architectural decisions. Based on the initial numbers, GPT-5 is more expensive per token than GPT-4, which was already not cheap for high-volume applications. This reinforces a pattern I’ve been advocating for: use the right model for the right task.

For many production applications, the smart move isn’t to upgrade everything to GPT-5. It’s to use GPT-5 for the tasks where it genuinely outperforms cheaper models — complex reasoning, nuanced code generation, difficult debugging — and keep using smaller, faster, cheaper models for classification, simple extraction, and routine tasks.

If you’re not already implementing model routing in your AI-powered applications, now’s the time. A simple dispatcher that sends easy queries to a small model and complex queries to GPT-5 can cut your API costs dramatically while actually improving latency for the majority of requests.
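A dispatcher like that can be very simple. Here's a minimal sketch in Python; the complexity heuristics and the model names are illustrative assumptions on my part, not anything OpenAI recommends — in production you'd tune the signal (or use a small classifier model) against your own traffic.

```python
# Minimal model-routing sketch: cheap queries go to a small model, hard
# queries go to the flagship. Heuristics and model names are placeholders.

def estimate_complexity(prompt: str) -> float:
    """Crude complexity score in [0, 1] from length and task keywords."""
    hard_keywords = ("debug", "refactor", "race condition", "architecture", "security")
    score = min(len(prompt) / 2000, 1.0)  # longer prompts lean complex
    if any(k in prompt.lower() for k in hard_keywords):
        score += 0.5
    return min(score, 1.0)

def route_model(prompt: str, threshold: float = 0.4) -> str:
    """Pick a model name for this request. Names are hypothetical."""
    return "gpt-5" if estimate_complexity(prompt) >= threshold else "small-fast-model"
```

The threshold is the knob: start conservative (routing more to the big model), log which routes you took, and ratchet it down as your evaluations show the small model holding up.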

The Developer Experience Angle

OpenAI also published a GPT-5 for Developers guide, which signals they’re taking the developer experience seriously. The API improvements include better streaming support, more granular control over response formatting, and improved function calling reliability.

The function calling improvements are particularly interesting for anyone building agents or tool-using systems. GPT-4’s function calling was good but had a frustrating failure mode where it would sometimes call functions with subtly wrong parameter types or make unnecessary calls. Early testing suggests GPT-5 is more disciplined here.
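Whatever the model's discipline, I'd still gate tool calls behind a type check before executing them. Here's one hedged way to do it — the schema format below is a simplified assumption for illustration, not the actual OpenAI function-calling spec, which uses JSON Schema.

```python
# Guard tool calls: check argument names and types against a declared
# schema before executing, since models sometimes pass subtly wrong types.
# The {name: type} schema shape is a simplification for this sketch.

def check_args(schema: dict[str, type], args: dict) -> list[str]:
    """Return a list of problems; an empty list means the call looks safe."""
    problems = [f"unexpected arg: {k}" for k in args if k not in schema]
    for name, expected in schema.items():
        if name not in args:
            problems.append(f"missing arg: {name}")
        elif not isinstance(args[name], expected):
            problems.append(
                f"{name}: expected {expected.__name__}, got {type(args[name]).__name__}"
            )
    return problems
```

When the check fails, you can feed the problem list back to the model and ask for a corrected call instead of crashing the agent loop.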

For those of us building development tools that integrate LLMs, the improved consistency means less defensive coding around LLM outputs. You still need error handling — these are probabilistic systems — but the error rate seems genuinely lower.
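That defensive layer doesn't need to be elaborate. A sketch of the shape I mean, assuming a `call_model` callable standing in for whatever client you actually use:

```python
# Validate LLM output against expected structure and retry a bounded
# number of times. `call_model` is a stand-in for your real client.
import json

def parse_llm_json(raw: str, required_keys: set[str]) -> dict:
    """Parse a JSON response and verify the keys we depend on are present."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed output
    missing = required_keys - data.keys()
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data

def call_with_retries(call_model, prompt: str, required_keys: set[str],
                      attempts: int = 3) -> dict:
    """Retry on bad output — these are probabilistic systems — then fail loudly."""
    last_err = None
    for _ in range(attempts):
        try:
            return parse_llm_json(call_model(prompt), required_keys)
        except (ValueError, json.JSONDecodeError) as err:
            last_err = err
    raise RuntimeError(f"output failed validation after {attempts} attempts") from last_err
```

The point of the bounded retry is that a lower error rate shrinks how often this loop fires, but it never justifies removing it.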

What Hasn’t Changed

Let me be the pragmatist in the room: GPT-5 doesn’t solve the fundamental limitations of LLMs for software development.

It still hallucinates. Less frequently, perhaps, but it still confidently generates code that references APIs that don’t exist or uses library features from the wrong version. You still need tests, code review, and verification for anything it produces.

It still struggles with novel architectures. If you’re working with a framework or pattern that isn’t well-represented in the training data, GPT-5 will still give you plausible-looking but incorrect solutions. The model is better, not magical.

It doesn’t replace understanding. I’ve seen a growing trend of developers treating LLMs as oracles rather than tools, accepting generated code without understanding it. GPT-5 being better at generating correct code might actually make this worse, because the failure modes become subtler and harder to catch.

The Open Source Response

Every major OpenAI release accelerates the open-source AI community. I expect we’ll see a flurry of activity in the coming weeks as researchers analyze GPT-5’s capabilities and work to replicate them in open models. The gap between closed and open models has been narrowing steadily, and GPT-5 will set new targets for projects like Llama, Mistral, and others to aim for.

For teams that need to run models on-premises — whether for data privacy, latency, or cost reasons — the open-source trajectory remains encouraging even as GPT-5 raises the bar.

My Take

GPT-5 is a genuine improvement, not just an incremental version bump with better marketing. The improvements in context handling and instruction following address real pain points that I’ve hit repeatedly in production systems.

But I want to push back on the narrative that each new model release fundamentally changes what’s possible. The jump from GPT-3.5 to GPT-4 was transformative — it crossed a threshold where LLMs became genuinely useful for professional software development. GPT-5 makes that experience better and more reliable, but it’s an evolution, not a revolution.

The most important thing you can do today isn't to rush to upgrade everything to GPT-5. It's to think carefully about where LLMs add value in your workflow, build robust evaluation frameworks, and make sure you can swap models easily as the landscape continues to evolve. The model you should use six months from now might not be from OpenAI at all.

That said, if you haven’t already, go try it. Form your own opinions. The best way to understand what a new model can do is to throw your hardest problems at it and see what comes back.
