Claude's In-Context Learning — The End of Fine-Tuning as We Know It · Osmond van Hemert — Senior Software Engineer

Anthropic dropped something quietly significant last month: Claude now supports what they’re calling “in-context learning,” a capability that lets you provide task-specific knowledge, examples, and context directly in the prompt without needing to fine-tune a model. If you’ve been managing fine-tuned models for the past year, treating them like precious, expensive assets, you’re about to rethink your entire infrastructure.

I’ve been experimenting with this for the past few weeks, and the implications are profound. We’re not just getting a minor convenience feature here. We’re watching the economic model of AI tooling shift in real time.

What Changed
#

The technical capability itself isn’t new — we’ve known for years that large language models can learn from in-context examples. What changed is scale and reliability. Claude’s 200K token context window (released earlier this year) finally makes it practical to pack in all the knowledge a model needs to perform a task correctly, and Anthropic’s latest refinement to in-context learning means the model actually uses that knowledge effectively, rather than drowning it out or forgetting it by the end of the prompt. This pattern has been consistent: as context gets cheaper and larger, static training becomes less necessary. This shift toward context efficiency is becoming a key competitive advantage.

Here’s what you can now do: instead of fine-tuning Claude on 1,000 examples of your custom API documentation, you drop 500 examples into the context window along with the API schema, and the model performs just as well — sometimes better — because it’s reasoning over the actual current documentation rather than a snapshot from training time.

The practical difference: fine-tuning is historical knowledge. In-context learning is real-time knowledge.

This matters more than it sounds. If your API changes, you fine-tune again (3-5 days, thousands of dollars). With in-context learning, you update the prompt (15 minutes, marginal cost). If you’re running this for 10,000 API requests a month, the math starts to favor in-context learning almost immediately.

Why This Breaks Fine-Tuning Economics
#

For context: fine-tuning Claude costs around $3-8 per million tokens for training data preparation, then $0.60 per million tokens at inference time. It’s not prohibitively expensive, but it’s a commitment. You’re also locked into whatever snapshot of knowledge you fine-tuned on.

In-context learning, by contrast, costs about the same at inference time, but you pay only for the tokens you actually use. No training pipeline. No week-long waiting period. No version management nightmare when you realize you need to retrain on new information.

I’ve been talking with teams that built fine-tuned models for code generation, customer support automation, and document analysis over the past year. Almost all of them are now asking: “Should we abandon our fine-tuned models and switch to in-context learning?” The answer, for most of them, is yes. Or at least: “Yes, for new projects. We’ll keep the fine-tuned ones as backup.” When the shift happens, it happens fast. The evolution of AI-powered development platforms shows how quickly these transitions reshape entire categories of tools, with the transition from traditional tooling to AI-first workflows happening almost overnight once the capability matures.

The Shift From Training-Time to Prompt-Time
#

This is the meta-insight: we’re moving from an era where “training your model” meant running a batch job to an era where “training your model” means writing a good prompt.

That sounds like a downgrade — shouldn’t specialized training be better than a prompt? — but here’s why it’s an upgrade:

Prompt engineering is faster and cheaper than fine-tuning. This was already true, but in-context learning with a large window makes it dramatically true. You can iterate a prompt in hours instead of days.
Your knowledge stays current. Fine-tuning is point-in-time. In-context learning is real-time. If your API documentation updated yesterday, your in-context learning model knows about it today. Your fine-tuned model doesn’t, unless you retrain.
Debugging is easier. If a fine-tuned model fails on a specific case, you don’t know why — it’s a black box of gradient descent. If an in-context learning prompt fails, you can see exactly what context was provided and why the model made the wrong decision. You can fix it immediately. This transparency is essential for building reliable AI systems where explainability matters.
Costs scale sublinearly instead of linearly. With fine-tuning, each new task is a separate training job. With in-context learning, you can pack multiple tasks into a single prompt, and the model handles them correctly (we’re learning).

Practical Implications for Teams
#

If you’re building AI-powered applications right now, here’s what this means:

For new projects: Don’t fine-tune. Use in-context learning with a 200K token context window. Build your prompts with real examples, your actual API schema, and task-specific instructions. This is faster to develop, cheaper to run, and easier to iterate on. AI-assisted testing frameworks demonstrate how in-context learning reshapes QA pipelines and powers the next generation of coding tools.

For existing fine-tuned models: Audit them. If the fine-tuning provides real value that couldn’t be replicated with a good prompt and full context, keep it. But if it’s mostly there because “we needed better performance than a raw prompt,” migrate to in-context learning. You’ll simplify your infrastructure and probably reduce costs.

For data infrastructure: You’re going to need robust systems for managing context. If your prompt includes 50,000 tokens of examples and documentation, you need rock-solid tooling to assemble, version, and update those context windows. This is the new bottleneck — not training, but context composition. The infrastructure patterns here resemble what we’re seeing with model context protocol adoption across the AI ecosystem.

For governance and compliance: As AI systems become more powerful through in-context learning, regulatory frameworks like the EU AI Act will increasingly focus on the data and context used in prompts rather than model weights. This represents a fundamental shift in how we think about AI responsibility, audit trails, and data provenance.

I’ve been consulting with teams on this transition, and the ones moving fastest are treating their prompt context like version-controlled code. They’re storing examples in repositories, reviewing changes to task-specific instructions, and testing different context configurations. Extended thinking models like Claude 3.7 Sonnet take this even further, letting the model reason over context more deliberately — which pairs beautifully with well-structured in-context learning. This capability is driving the emergence of agent-based systems that can handle increasingly complex reasoning tasks without explicit fine-tuning.

For teams building with these models, agent-based system architecture patterns show how to integrate in-context learning into autonomous systems (see also the Sub-Hub section below for broader context on model evolution).

The Integration Opportunity
#

The real power comes from combining in-context learning with other advanced capabilities. Computer use alongside in-context learning enables agents to interact directly with systems while referencing real-time context. This combination is more powerful than either capability alone.

The Cost Story
#

Let me be concrete about the economics. A team I worked with had been running a fine-tuned code-completion model for 6 months. Training cost: $8,000 upfront. Inference cost: $2,400/month for 50M tokens at inference time. They were committed.

We ran an experiment with in-context learning instead. Same 50M tokens, but now the tokens included 1,000 examples and their full codebase structure as context. Inference cost: $2,300/month. Performance was better because the model was working with the current codebase, not a training snapshot.

The team abandoned the fine-tuned model. Now they’re saving the marginal cost of training, gaining the benefit of real-time knowledge, and actually spending less on inference. That’s a rare trifecta.

Not every team will have that experience. Some will find that fine-tuning was solving a problem that in-context learning can’t replicate (specialized domain knowledge that requires actual training). But many will find that what they thought required fine-tuning was just “we need to feed the model the right context.” Broader context and reasoning capabilities are replacing the need for narrow task-specific training.

Sub-Hub: AI/LLM Models and Capabilities
#

For a broader exploration of how AI models are evolving, including extended thinking, reasoning models, and the trajectory of model development, see AI/LLM Models & Capabilities Evolution. This sub-hub connects in-context learning to the broader evolution of model capabilities.

My Take
#

We’re at an inflection point in how we build AI systems. For the past 2-3 years, fine-tuning was the obvious path if you needed model customization. You had no choice — the context windows were too small and too expensive to use them as your primary customization mechanism.

That world is ending.

In-context learning with large, reliable context windows is good enough for most tasks, and it’s faster and cheaper and more flexible than fine-tuning. Anthropic’s latest release makes that transition practical. The next twelve months will see teams migrating away from fine-tuning. When the commercial incentive aligns with technical capability, adoption accelerates, shifting the focus from model customization to prompt and context design.

This means the skill that matters now is prompt engineering at scale — knowing how to structure context, how to select the right examples, how to version and test your prompts the way you’d version and test code. The teams that get good at that will build the best AI applications. The teams that are still thinking about fine-tuning as the primary customization mechanism will find themselves maintaining complex infrastructure for a problem that in-context learning just… solves.

Anthropic has basically given us all a toolkit to stop overthinking model customization and start focusing on real problems. This shift toward prompt-centric development will accelerate as context windows grow and models become more capable at reasoning over provided information rather than memorizing patterns from training.

AI Models & Releases - This article is part of a series.

Part : AI Model Optimization & Efficiency — Making AI Accessible

Part : LLM Agents in Production — Moving Beyond Chat Interfaces

Part : This Article

Part : AI/LLM Models & Capabilities — From In-Context Learning to Extended Reasoning

Part : Google Gemini 2.0 — A New Chapter in Multimodal AI

Part : GPT-5 Is Here — A Developer's First Look at What Actually Changed

Part : OpenAI's o3 and o4-mini — Reasoning Models Get Real

Part : Claude 3.7 Sonnet — Extended Thinking Changes the Game for AI-Assisted Development

Part : Claude 3.5 Gets a Computer — Anthropic's 'Computer Use' and the Future of AI Agents

Part : Google Launches Gemini 2.0 Flash — The Multi-Modal AI Race Accelerates

Part : OpenAI Launches o1 Full Model and $200/Month ChatGPT Pro — The Reasoning Era Begins