Advisor Strategy in Agents

Reduce token costs and improve performance...and how to use it with Claude!

Apr 10, 2026

Fine-tune Google Gemma 4 completely free!

Unsloth Studio is a local, browser-based GUI for fine-tuning LLMs without writing any code.

It wraps the training pipeline in a clean interface that handles model loading, dataset formatting, hyperparameter configuration, and live training monitoring.

The process to fine-tune the latest Gemma 4 is simple:

Open the Unsloth Colab notebook (available here).
Pick your model and dataset
Hit start training

You can find the notebook here →

Advisor strategy in LLMs to optimize token costs

Yesterday, Anthropic shipped an “advisor tool” in the Claude API that lets Sonnet or Haiku consult Opus mid-task, only when the executor needs help.

The benefit is that you get near Opus-level intelligence on the hard decisions while paying Sonnet or Haiku rates for everything else. So frontier reasoning only kicks in when it’s actually needed, not on every token.

Back in February, UC Berkeley published a paper called “Advisor Models” that trains a small 7B model with RL to generate per-instance advice for a frozen black-box model.

The paper’s approach was to take Qwen2.5 7B, train it with GRPO to generate natural language advice, and inject that advice into the prompt of a black-box model.

The black-box model never changes, and the advisor learns what to say to make it perform better.

To test it, they found that GPT-5 scored 31.2% on a tax-filing benchmark. But adding the trained advisor took that to 53.6%.

Moreover, on SWE agent tasks, a trained advisor cuts Gemini 3 Pro’s steps from 31.7 to 26.3 while keeping the same resolve rate.

Anthropic’s advisor tool takes a different path to the same idea. Sonnet runs as the executor to handle tools and iteration.

When it hits something it can’t resolve, it consults Opus, gets a plan or correction, and continues.

Sonnet with Opus as advisor gained 2.7 points on SWE-bench Multilingual over Sonnet alone, while costing 11.9% less per task.

Haiku with Opus scored 41.2% on BrowseComp. Haiku alone scored 19.7%.

Implementation-wise, it’s a one-line API change. The advisor tokens bill at Opus rates, and the advisor typically generates only 400-700 tokens per call.

response = client.messages.create(
    model="claude-sonnet-4-6",  # executor
    tools=[
        {
            "type": "advisor_20260301",
            "name": "advisor",
            "model": "claude-opus-4-6",
            "max_uses": 3,
        },
        # ... your other tools
    ],
    messages=[...]
)

So the combined cost stays well below running Opus end-to-end.

Both approaches point to the same thing that you don’t need the most powerful model on every token.

You need it at the right moments, for the right inputs.

Here’s the paper by UC Berkeley →

Thanks for reading!

P.S. For those wanting to develop “Industry ML” expertise:

At the end of the day, all businesses care about impact. That’s it!

Can you reduce costs?
Drive revenue?
Can you scale ML models?
Predict trends before they happen?

We have discussed several other topics (with implementations) that align with such topics.

Develop "Industry ML" Skills

Here are some of them:

Learn everything about MCPs in this crash course with 9 parts →
Learn how to build Agentic systems in a crash course with 14 parts.
Learn how to build real-world RAG apps and evaluate and scale them in this crash course.

Learn sophisticated graph architectures and how to train them on graph data.
So many real-world NLP systems rely on pairwise context scoring. Learn scalable approaches here.
Learn how to run large models on small devices using Quantization techniques.
Learn how to generate prediction intervals or sets with strong statistical guarantees for increasing trust using Conformal Predictions.
Learn how to identify causal relationships and answer business questions using causal inference in this crash course.
Learn how to scale and implement ML model training in this practical guide.
Learn techniques to reliably test new models in production.
Learn how to build privacy-first ML systems using Federated Learning.
Learn 6 techniques with implementation to compress ML models.

All these resources will help you cultivate key skills that businesses and companies care about the most.

Daily Dose of Data Science

Discussion about this post

Ready for more?