Anthropic dropped something quietly significant last month: Claude now supports what they’re calling “in-context learning,” a capability that lets you provide task-specific knowledge, examples, and context directly in the prompt without needing to fine-tune a model. If you’ve been managing fine-tuned models for the past year, treating them like precious, expensive assets, you’re about to rethink your entire infrastructure.
I’ve been experimenting with this for the past few weeks, and the implications are profound. We’re not just getting a minor convenience feature here. We’re watching the economic model of AI tooling shift in real time.
What Changed
The technical capability itself isn’t new — we’ve known for years that large language models can learn from in-context examples. What changed is scale and reliability. Claude’s 200K token context window (released earlier this year) finally makes it practical to pack in all the knowledge a model needs to perform a task correctly, and Anthropic’s latest refinement to in-context learning means the model actually uses that knowledge effectively, rather than drowning it out or forgetting it by the end of the prompt.
Here’s what you can now do: instead of fine-tuning Claude on 1,000 examples of your custom API documentation, you drop 500 examples into the context window along with the API schema, and the model performs just as well — sometimes better — because it’s reasoning over the actual current documentation rather than a snapshot from training time.
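To make that concrete, here is a minimal sketch of assembling such a prompt. Everything in it is an assumption for illustration: the `build_prompt` helper, the tag names, and the toy schema and examples are mine, not anything Anthropic prescribes. It only builds the prompt text; sending it to the model is left out.

```python
import json


def build_prompt(api_schema: dict, examples: list[dict], question: str) -> str:
    """Assemble one prompt from the current schema, worked examples, and a question."""
    parts = [
        "You answer questions about the API described below.",
        "<api_schema>",
        # Serialize the live schema at request time, so the prompt is never stale.
        json.dumps(api_schema, indent=2, sort_keys=True),
        "</api_schema>",
        "<examples>",
    ]
    for ex in examples:
        parts.append(f"Q: {ex['question']}\nA: {ex['answer']}")
    parts.append("</examples>")
    # End with the new question so the model continues with its answer.
    parts.append(f"Q: {question}\nA:")
    return "\n".join(parts)


# Hypothetical schema and examples, for illustration only.
schema = {"GET /users/{id}": {"returns": "User"}}
examples = [{"question": "How do I fetch user 42?", "answer": "GET /users/42"}]
prompt = build_prompt(schema, examples, "How do I fetch user 7?")
```

When the API changes, you change `schema` and `examples` and rebuild the prompt. Nothing gets retrained.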
The practical difference: fine-tuning is historical knowledge. In-context learning is real-time knowledge.
This matters more than it sounds. If your API changes, you fine-tune again (3-5 days, thousands of dollars). With in-context learning, you update the prompt (15 minutes, marginal cost). If you’re running this for 10,000 API requests a month, the math starts to favor in-context learning almost immediately.
Why This Breaks Fine-Tuning Economics
For context: fine-tuning Claude costs around $3-8 per million tokens for training data preparation, then $0.60 per million tokens at inference time. It’s not prohibitively expensive, but it’s a commitment. You’re also locked into whatever snapshot of knowledge you fine-tuned on.
In-context learning, by contrast, costs about the same per token at inference time, but there is no upfront training cost: you pay only for the tokens you actually send. No training pipeline. No multi-day waiting period. No version management nightmare when you realize you need to retrain on new information.
I’ve been talking with teams that built fine-tuned models for code generation, customer support automation, and document analysis over the past year. Almost all of them are now asking: “Should we abandon our fine-tuned models and switch to in-context learning?” The answer, for most of them, is yes. Or at least: “Yes, for new projects. We’ll keep the fine-tuned ones as backup.”
The Shift From Training-Time to Prompt-Time
This is the meta-insight: we’re moving from an era where “training your model” meant running a batch job to an era where “training your model” means writing a good prompt.
That sounds like a downgrade — shouldn’t specialized training be better than a prompt? — but here’s why it’s an upgrade:
Prompt engineering is faster and cheaper than fine-tuning. This was already true, but in-context learning with a large window makes it dramatically true. You can iterate a prompt in hours instead of days.
Your knowledge stays current. Fine-tuning is point-in-time. In-context learning is real-time. If your API documentation updated yesterday, your in-context learning model knows about it today. Your fine-tuned model doesn’t, unless you retrain.
Debugging is easier. If a fine-tuned model fails on a specific case, you don’t know why — it’s a black box of gradient descent. If an in-context learning prompt fails, you can see exactly what context was provided and why the model made the wrong decision. You can fix it immediately.
Costs scale sublinearly instead of linearly. With fine-tuning, each new task is a separate training job. With in-context learning, you can pack multiple tasks into a single prompt and share the same context across them, though how reliably the model handles every task in a packed prompt is something we’re still learning.
Practical Implications for Teams
If you’re building AI-powered applications right now, here’s what this means:
For new projects: Don’t fine-tune. Use in-context learning with a 200K token context window. Build your prompts with real examples, your actual API schema, and task-specific instructions. This is faster to develop, cheaper to run, and easier to iterate on.
For existing fine-tuned models: Audit them. If the fine-tuning provides real value that couldn’t be replicated with a good prompt and full context, keep it. But if it’s mostly there because “we needed better performance than a raw prompt,” migrate to in-context learning. You’ll simplify your infrastructure and probably reduce costs.
For data infrastructure: You’re going to need robust systems for managing context. If your prompt includes 50,000 tokens of examples and documentation, you need rock-solid tooling to assemble, version, and update those context windows. This is the new bottleneck — not training, but context composition.
I’ve been consulting with teams on this transition, and the ones moving fastest are treating their prompt context like version-controlled code. They’re storing examples in repositories, reviewing changes to task-specific instructions, and testing different context configurations. It’s not that different from the discipline of good prompt engineering, but scaled up.
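Here is a rough sketch of what treating context like code can look like. The `ContextBundle` class and its methods are hypothetical names of my own, and the four-characters-per-token estimate is a crude assumption; a real setup would use an actual tokenizer. The idea is simply that every context gets a content-derived version ID and a budget check before it ships.

```python
import hashlib
import json
from dataclasses import dataclass, field

CHARS_PER_TOKEN = 4  # crude assumption; swap in a real tokenizer for budgeting


@dataclass
class ContextBundle:
    """Instructions plus examples that together form one prompt context."""

    instructions: str
    examples: list[str] = field(default_factory=list)

    def render(self) -> str:
        # Join instructions and examples into the text placed in the prompt.
        return "\n\n".join([self.instructions, *self.examples])

    def version(self) -> str:
        # Content hash: any edit to instructions or examples yields a new ID.
        payload = json.dumps(
            {"instructions": self.instructions, "examples": self.examples},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]

    def estimated_tokens(self) -> int:
        return len(self.render()) // CHARS_PER_TOKEN

    def check_budget(self, max_tokens: int) -> None:
        # Fail loudly instead of silently overflowing the context window.
        used = self.estimated_tokens()
        if used > max_tokens:
            raise ValueError(f"context uses ~{used} tokens, budget is {max_tokens}")


bundle = ContextBundle("Answer using the examples.", ["Q: a\nA: b"])
edited = ContextBundle("Answer using the examples.", ["Q: a\nA: c"])
bundle.check_budget(max_tokens=50_000)
```

Logging `bundle.version()` alongside each request lets you tie a bad output back to the exact context that produced it, the same way you would tie a bug to a commit.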
The Cost Story
Let me be concrete about the economics. A team I worked with had been running a fine-tuned code-completion model for 6 months. Training cost: $8,000 upfront. Inference cost: $2,400/month for 50M tokens. They were committed.
We ran an experiment with in-context learning instead. Same 50M tokens, but now the tokens included 1,000 examples and their full codebase structure as context. Inference cost: $2,300/month. Performance was better because the model was working with the current codebase, not a training snapshot.
The team abandoned the fine-tuned model. Now they’re avoiding the cost of retraining, gaining the benefit of real-time knowledge, and actually spending less on inference. That’s a rare trifecta.
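For anyone who wants to check the arithmetic, here is the comparison over the six months using only the figures above. The one assumption I am adding is that the fine-tuned model was never retrained in that period; any retraining would widen the gap. Note that the per-million-token rates below are simply derived from the monthly totals, and they come out well above the $0.60 per million quoted earlier, so read them as all-in effective costs rather than a list price.

```python
# Figures taken from the example above.
TRAINING_COST = 8_000            # one-time fine-tuning cost, USD
FT_MONTHLY = 2_400               # fine-tuned inference, USD per month
ICL_MONTHLY = 2_300              # in-context inference, USD per month
MILLION_TOKENS_PER_MONTH = 50.0  # 50M tokens per month
MONTHS = 6                       # assumes no retraining during this period

ft_total = TRAINING_COST + FT_MONTHLY * MONTHS   # 22,400
icl_total = ICL_MONTHLY * MONTHS                 # 13,800
savings = ft_total - icl_total                   # 8,600

# Effective cost per million tokens implied by the monthly bills.
ft_rate = FT_MONTHLY / MILLION_TOKENS_PER_MONTH    # 48.0
icl_rate = ICL_MONTHLY / MILLION_TOKENS_PER_MONTH  # 46.0

print(f"fine-tuned: ${ft_total:,}  in-context: ${icl_total:,}  difference: ${savings:,}")
```

Most of the difference is the upfront training cost. The monthly inference saving on its own is only $100.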
Not every team will have that experience. Some will find that fine-tuning was solving a problem that in-context learning can’t replicate (specialized domain knowledge that requires actual training). But many will find that what they thought required fine-tuning was just “we need to feed the model the right context.”
My Take
We’re at an inflection point in how we build AI systems. For the past 2-3 years, fine-tuning was the obvious path if you needed model customization. You had no choice — the context windows were too small and too expensive to use them as your primary customization mechanism.
That world is ending.
In-context learning with large, reliable context windows is good enough for most tasks, and it’s faster, cheaper, and more flexible than fine-tuning. Anthropic’s latest release makes that transition practical. Over the next twelve months, teams will migrate away from fine-tuning, not because there’s anything wrong with fine-tuning, but because in-context learning works and is dramatically simpler.
This means the skill that matters now is prompt engineering at scale — knowing how to structure context, how to select the right examples, how to version and test your prompts the way you’d version and test code. The teams that get good at that will build the best AI applications. The teams that are still thinking about fine-tuning as the primary customization mechanism will find themselves maintaining complex infrastructure for a problem that in-context learning just… solves.
Anthropic has basically given us all a toolkit to stop overthinking model customization and start focusing on real problems.



