Function Approximation in RL

The full RL nanodegree, covered with implementations.

May 24, 2026

Part 5 of the RL series is here.


Everything we built so far (value functions, Bellman equations, MC, TD, SARSA, Q-learning) assumed you could store one value per state (or per state-action pair) in a table.

This chapter covers what happens when that assumption breaks. It breaks in most real problems, where the state space is too large or too continuous to enumerate.

It covers:

Why lookup tables stop working for real-world problems
How to replace them with parameterized functions that generalize across similar states
The learning algorithms that make this work (gradient Monte Carlo and semi-gradient TD)
What happens when you combine function approximation with bootstrapping and off-policy learning
A full hands-on implementation that trains an agent to solve Mountain Car, a continuous-state control problem where an underpowered car has to learn to build momentum by swinging back and forth
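To make the list above a little more concrete, here is a minimal sketch of semi-gradient TD(0) with a linear value function. It rests on several assumptions. It uses a toy 5-state random walk, a standard textbook example and not the Mountain Car setup from the course. The policy is uniform random. The features are two hand-picked values: scaled position and a bias. The names `features`, `v_hat`, and `semi_gradient_td0` are hypothetical. The update is w ← w + α[R + γ·v̂(S′, w) − v̂(S, w)]·∇v̂(S, w). It is called "semi"-gradient because the bootstrapped target is treated as a constant.

```python
import random

N_STATES = 5            # non-terminal states 1..5; states 0 and 6 are terminal
TERMINALS = (0, N_STATES + 1)
START = 3               # every episode starts in the middle state


def features(s):
    """Hypothetical 2-d feature vector x(s): scaled position plus a bias.
    Terminal states map to the zero vector, so their value is always 0."""
    if s in TERMINALS:
        return [0.0, 0.0]
    return [s / (N_STATES + 1), 1.0]


def v_hat(w, s):
    """Linear value estimate v(s, w) = w . x(s)."""
    return sum(wi * xi for wi, xi in zip(w, features(s)))


def semi_gradient_td0(num_episodes, alpha, gamma=1.0, seed=0):
    """Semi-gradient TD(0) policy evaluation for a uniform random policy."""
    rng = random.Random(seed)
    w = [0.0, 0.0]
    for _ in range(num_episodes):
        s = START
        while s not in TERMINALS:
            s_next = s + rng.choice((-1, 1))                 # step left or right
            r = 1.0 if s_next == N_STATES + 1 else 0.0       # +1 only at the right end
            # The bootstrapped target uses the current weights but is treated
            # as a constant (no gradient flows through it): hence "semi"-gradient.
            target = r + gamma * v_hat(w, s_next)
            delta = target - v_hat(w, s)
            # For a linear v_hat, the gradient with respect to w is just x(s).
            w = [wi + alpha * delta * xi for wi, xi in zip(w, features(s))]
            s = s_next
    return w


if __name__ == "__main__":
    w = semi_gradient_td0(num_episodes=30000, alpha=0.002)
    for s in range(1, N_STATES + 1):
        print(f"state {s}: estimate {v_hat(w, s):.3f}, true {s / (N_STATES + 1):.3f}")
```

In this toy problem the true values (s/6) happen to be linear in position, so two weights cover all five states. That is the generalization a table cannot give you. With one-hot features the same code reduces to tabular TD(0). Replacing the bootstrapped target with the full return G_t would turn it into gradient Monte Carlo instead. The small constant step size keeps the noise in the final estimates low.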

Everything is covered from scratch, so no RL background is required.

You can read Part 5 of the course here →

Why care?

Look at what has happened in the past few years.

DeepSeek-R1 used GRPO for reasoning.
ChatGPT was shaped by RLHF.
Claude was trained with Constitutional AI, which uses RL from AI feedback.

Nearly every frontier LLM released recently has some form of reinforcement learning in its post-training pipeline.

RL is no longer a niche subfield for robotics and game-playing. It is a core component of how the most capable AI systems are built today.

Google Trends reflects this.

Search interest for “reinforcement learning” was nearly flat from 2004 to 2024. In the past year, it has gone vertical, hitting an all-time high.

The demand for RL expertise has followed.

If you look at ML engineering roles at labs like OpenAI, Anthropic, DeepMind, or any team working on post-training, alignment, or agentic systems, RL fluency consistently shows up as a requirement.

Understanding how reward signals shape model behavior, how policy optimization works, and how exploration interacts with credit assignment is becoming as fundamental as understanding backpropagation was five years ago.

This series is structured the same way as our MLOps/LLMOps course: concept by concept, with clear explanations, diagrams, math where it matters, and hands-on implementations you can run.

👉 Over to you: What topics would you like us to cover in this RL series?

Thanks for reading!

Daily Dose of Data Science
