Build Agents That Can Learn Like Humans

[Hands-on] Fine-tuning LLMs with RL.

Avi Chawla

Jan 28, 2026

Reinforcement learning for LLMs has always had one major problem: someone has to define the reward function by hand.

You have to figure out how to score model outputs, handle edge cases, and tune everything until training works.

ART (Agent Reinforcement Trainer) is an open-source framework that solves this with a simple approach:

ART GitHub repo

1. Let the agent attempt the same task multiple times.
2. An LLM judge grades the attempts relative to one another.
3. The model learns from what worked vs. what didn’t.
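Here is a minimal sketch of the last two steps. It is not ART’s API. `toy_judge` is a hypothetical stand-in for an LLM judge that sees the whole group of attempts and scores them against each other. `group_relative_advantages` then turns those scores into a learning signal by normalizing each score against the group’s mean and standard deviation. Attempts that beat the group average get a positive advantage and are reinforced. Attempts below the average get a negative one and are discouraged.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(scores: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each attempt's score against the rest of its group."""
    mu = mean(scores)
    sigma = pstdev(scores)
    # eps avoids division by zero when every attempt gets the same score
    return [(s - mu) / (sigma + eps) for s in scores]


def toy_judge(attempts: List[str]) -> List[float]:
    """Hypothetical stand-in for an LLM judge.

    A real judge would compare the attempts with an LLM. This toy version
    simply prefers attempts that actually mention the invoice.
    """
    return [1.0 if "invoice" in a else 0.0 for a in attempts]


# Several attempts at the same task
attempts = [
    "Found the invoice in email #3",
    "I could not find anything",
    "The invoice total is $420",
    "Searched the inbox, no result",
]
scores = toy_judge(attempts)
advantages = group_relative_advantages(scores)
for attempt, adv in zip(attempts, advantages):
    print(f"{adv:+.2f}  {attempt}")
```

Only the *relative* ordering inside the group matters. If the judge had scored the attempts 10 and 0 instead of 1 and 0, the normalized advantages would be essentially the same.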

Notice that this needs no manual reward engineering. You don’t have to hand-score outputs or tune penalties. You only need to know which attempt was better, and LLM judges are generally good at that kind of comparison.

If this sounds familiar, that’s because it’s the core idea behind GRPO (Group Relative Policy Optimization), the RL algorithm DeepSeek used to train R1.
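For reference, this is the group-relative advantage as it appears in the outcome-reward form of GRPO. For a single prompt, the model samples a group of $G$ outputs with rewards $r_1, \dots, r_G$. Each output $i$ gets its advantage by normalizing its reward within the group, so no separate value model is needed to estimate a baseline. In a judge-based setup, $r_i$ would be the judge’s score for attempt $i$.

$$
\hat{A}_i = \frac{r_i - \operatorname{mean}(r_1, \dots, r_G)}{\operatorname{std}(r_1, \dots, r_G)}
$$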

But here’s what makes ART different.

Most RL frameworks are built for simple chatbot interactions.

One input, one output, and done. But real-world agents search through documents, invoke APIs, and reason across multiple steps before completing a task.
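To make the contrast concrete, here is a rough sketch of what a multi-step agent episode might look like as data. Everything here is hypothetical and is not ART’s API: the `Step` and `Trajectory` classes, the `search_emails` tool call, and the toy agent are all illustrative names. The key point is that the reward is assigned once, to the whole trajectory. That trajectory-level advantage is then shared by every assistant turn in it. Tool outputs are excluded from credit here, on the assumption that only model-generated turns are trained on.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Step:
    role: str      # "assistant" (model output) or "tool" (environment output)
    content: str


@dataclass
class Trajectory:
    task: str
    steps: List[Step] = field(default_factory=list)
    reward: float = 0.0  # assigned once, after the whole episode ends


def run_toy_agent(task: str, use_search: bool) -> Trajectory:
    """Hypothetical episode: optionally call a search tool, then answer."""
    traj = Trajectory(task=task)
    if use_search:
        traj.steps.append(Step("assistant", 'search_emails("invoice")'))
        traj.steps.append(Step("tool", "email #3: invoice total $420"))
        traj.steps.append(Step("assistant", "The invoice total is $420."))
    else:
        traj.steps.append(Step("assistant", "I don't know."))
    return traj


# A group of attempts at the same task
group = [run_toy_agent("What was the invoice total?", flag) for flag in (True, False, True)]

# A stand-in judge scores whole trajectories, not individual turns
for t in group:
    t.reward = 1.0 if "$420" in t.steps[-1].content else 0.0

# Every assistant turn inherits its trajectory's group-relative advantage
mu = sum(t.reward for t in group) / len(group)
credited: List[Tuple[str, float]] = [
    (s.content, t.reward - mu) for t in group for s in t.steps if s.role == "assistant"
]
for content, adv in credited:
    print(f"{adv:+.2f}  {content}")
```

Because the judge only sees finished trajectories, the intermediate tool call gets credit or blame based on whether the episode as a whole succeeded.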