Almost everything you've heard about dopamine collapses into one line: it's the "pleasure chemical," the thing your brain squirts out whenever you feel good. If that were true, reaching the thing you chased would leave you feeling whole, and an addict would be savoring every minute. Both are false. This piece goes back to the root of something that feels settled: dopamine isn't the reward, it's a teaching signal — and the moment you see clearly what it teaches, most of what we mistake for "lack of willpower," along with the worst mistake we make building AI agents, comes into focus.
TL;DR
- Dopamine doesn't report "pleasure"; it's a teaching signal that updates what you want based on the error against expectation.
- "Wanting" and "liking" are two separate systems in the brain; dopamine drives wanting, it doesn't create liking.
- The emptiness after achievement, or being hooked on something you no longer enjoy — both are wanting still running after liking has gone quiet.
- Fusing "wanting" and "liking" into a single reward signal is the root of reward hacking in AI agents.
- The very craving that makes a system adaptive is what makes it exploitable — you have to separate the two signals, not kill the craving.
A teaching signal, not a reward
In recordings that fed into a landmark 1997 paper, Wolfram Schultz lowered electrodes into the dopamine neurons of monkeys and found something that breaks intuition. When a monkey got an unexpected squirt of juice, the neurons fired hard. But once a cue reliably predicted that juice was coming, they shifted to firing the moment the cue appeared and stayed at their baseline rate when the juice actually arrived. The reward was unchanged, yet dopamine had stopped responding to it. And when the cue appeared but the juice was withheld, firing dipped below baseline at the moment the juice was due.
The interpretation that has held up ever since: dopamine doesn't encode reward, it encodes the reward prediction error, the gap between what you got and what you'd expected. Better than predicted, the signal goes positive; exactly as predicted, it's zero; worse, it goes negative. This is a signal meant to teach, not to feel: it is used to update predictions for next time, which places it in the same family of ideas as seeing the brain as a prediction machine.
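To make the three cases concrete, here is a minimal sketch in Python. It assumes the simplest possible learner: a single expectation nudged toward each outcome by a fixed fraction of the error (a Rescorla-Wagner-style rule). The function names and the learning rate of 0.2 are illustrative choices, not anything taken from the recordings.

```python
def prediction_error(received: float, expected: float) -> float:
    """Gap between what arrived and what was predicted."""
    return received - expected


def update_expectation(expected: float, received: float,
                       learning_rate: float = 0.2) -> float:
    """Move the expectation toward the outcome by a fraction of the error.

    learning_rate is a hypothetical value chosen for illustration.
    """
    return expected + learning_rate * prediction_error(received, expected)


# The three cases: better than, exactly as, and worse than predicted.
print(prediction_error(received=1.0, expected=0.0))  # positive
print(prediction_error(received=1.0, expected=1.0))  # zero
print(prediction_error(received=0.0, expected=1.0))  # negative

# The error is only there to teach: each trial pulls the expectation
# toward the outcome, so the same reward surprises a little less.
expected = 0.0
for _ in range(30):
    expected = update_expectation(expected, received=1.0)
print(round(expected, 3))
```

Nothing in this loop asks whether the juice was enjoyable. The only quantity it tracks is how far the outcome landed from the prediction.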
One small word, "teach," changes the whole reading. If dopamine were reward, it would answer the question "is this enjoyable." If it's a teaching signal, it answers a different question entirely: "should I want this more next time." Those two questions don't always share an answer — and exactly where they come apart is where everything interesting begins.
Dopamine teaches the wanting, not the liking
The neuroscientist Kent Berridge spent decades teasing apart two things we habitually fuse: wanting and liking. Liking is the actual pleasure once you have it — sweetness on the tongue, the feeling of ease. Wanting is the pull toward something, the craving before you have it. In the brain, they're run by different systems: dopamine pumps the wanting, while liking depends on much smaller, separate circuits.
Berridge's core finding is that dopamine attaches incentive salience to a target — turns it into something "worth chasing" — rather than producing pleasure on attainment. The sharpest evidence: rats with dopamine depleted stop wanting food to the point of starving themselves, yet the pleasure expressions when fed sugar remain intact. The wanting switched off, the liking stayed. Two systems, separable, able to run out of phase with each other.
This is the entire biological portrait of addiction, packed into one sentence: wanting something intensely that you no longer like. The addict isn't sitting there enjoying it — the liking wore off long ago — they're dragged along by the wanting. And it isn't only about addictive substances: scrolling until two in the morning with no joy in it, opening the fridge for the fifth time without being hungry, are all the wanting still running after the liking has gone dead.
Why the wanting never stops
If wanting and liking are separable, why does the gap between them widen over time rather than close? Two mechanisms, both flowing from dopamine's nature as a teaching signal.
First, because dopamine fires on error, something you've been receiving steadily stops producing a signal: it's no longer "better than predicted." For the wanting to flare again, the system needs a dose that's more surprising, newer, stronger. That's the hidden engine of escalation and novelty-chasing: not that the old thing got worse, but that the old thing ran out of capacity to generate error. Second is tolerance: repeating a strong stimulus makes the brain downregulate receptors to compensate, so the same dose registers less and less, and the chase escalates while the liking flattens. The feeling of "normal" drifts down to a greyer level than before.
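The first mechanism can be sketched with an error-driven learner. This is an illustration under stated assumptions, not a model of receptor biology: the expectation tracks each dose at a hypothetical learning rate of 0.3, and `target_error` stands in for the amount of surprise needed to make the wanting flare. Tolerance, the second mechanism, is not modeled here.

```python
def steady_dose_errors(dose: float = 1.0, steps: int = 10,
                       learning_rate: float = 0.3) -> list[float]:
    """Prediction error on each repeat of an unchanging dose."""
    expected = 0.0
    errors = []
    for _ in range(steps):
        error = dose - expected
        errors.append(error)
        expected += learning_rate * error  # expectation catches up
    return errors


def escalating_doses(target_error: float = 0.5, steps: int = 10,
                     learning_rate: float = 0.3) -> list[float]:
    """Dose required on each step to keep the surprise constant."""
    expected = 0.0
    doses = []
    for _ in range(steps):
        dose = expected + target_error  # must beat the current expectation
        doses.append(dose)
        expected += learning_rate * (dose - expected)
    return doses


# Same dose every time: the error shrinks toward zero.
print([round(e, 3) for e in steady_dose_errors()])

# Same surprise every time: the dose has to keep climbing.
print([round(d, 3) for d in escalating_doses()])
```

The old dose never changes in the first function. What changes is the expectation it is measured against.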
There's one more property that makes the spiral hard to escape: what's been learned isn't erased, only overwritten. Quitting a habit isn't whiting out a region of the brain but building a new inhibitory circuit over the old one — and the old one still sits underneath, which is why someone who quit can have the craving return on hitting the old context again.
This frame neatly explains the "emptiness after achievement." All through the pursuit, every step closer is better than expected, the signal stays positive, you feel swept along — that's the wanting being fed. On reaching the finish, the result matches the expectation that had been ratcheting up, the error goes to zero, the signal cuts out. The flatness after a big goal isn't a sign you chose wrong. It's the wanting running out of fuel, while the liking was always a different system — never touched by the teaching signal in the first place.
Reward hacking is an agent's addiction
Let's drop the metaphor. The temporal-difference learning algorithm in reinforcement learning came first; only later did Montague, Dayan, and Schultz find dopamine neurons running close to it. That order matters: the brain, measurably, is a reward-prediction-error machine — not "like" one. So this isn't a comparison for color: an agent running the same mechanism has structural reason to inherit the same flaw the brain has.
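For readers who want to see the mechanism rather than take it on faith, the following is a minimal TD(0) sketch in Python of a cue followed a few steps later by a reward. The setup is an assumption made for illustration: the cue arrives unpredictably (so the pre-cue baseline is pinned at zero), each time step after the cue has its own learned value, and the parameters are arbitrary. It is a toy, not a reconstruction of the published model.

```python
def run_td(n_trials: int = 500, n_steps: int = 5, alpha: float = 0.1,
           gamma: float = 1.0, reward: float = 1.0) -> list[tuple[float, float]]:
    """Return (error at cue onset, error at reward delivery) for each trial.

    values[i] is the predicted future reward i steps after the cue appears.
    The reward is delivered on leaving the last step.
    """
    values = [0.0] * n_steps
    history = []
    for _ in range(n_trials):
        # Cue onset: a jump from the unpredictive baseline (value fixed
        # at 0 by assumption) into the first post-cue state, with no reward.
        cue_error = gamma * values[0] - 0.0

        reward_error = 0.0
        for i in range(n_steps):
            is_last = i == n_steps - 1
            r = reward if is_last else 0.0
            next_value = 0.0 if is_last else values[i + 1]
            delta = r + gamma * next_value - values[i]  # TD error
            values[i] += alpha * delta
            if is_last:
                reward_error = delta
        history.append((cue_error, reward_error))
    return history


history = run_td()
print("first trial (cue, reward):", history[0])
print("last trial  (cue, reward):",
      tuple(round(x, 3) for x in history[-1]))
```

On the first trial the whole error sits at the reward. By the last trial it has moved to the cue and the reward itself produces almost none, the same migration seen in the recordings.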
And the characteristic flaw, the one we just built up, is wanting separating from liking. Now look at how we build agents: we compress everything into one scalar reward — the same number is both what the agent chases and the measure of "good or not." We've fused the wanting and the liking that evolution deliberately kept apart.
An agent like that has no structural path to notice "I'm still chasing this but it has stopped serving the real goal." That's a lens that points straight at reward hacking: the agent optimizes the measure perfectly while the real objective has drifted elsewhere — an agent that wants without liking. A large study of multi-agent systems found something in the same spirit: nearly 79% of failures lie in specification, coordination, and verification — the model runs as designed, it's the system around it that breaks. Reward hacking is the harshest form of that specification error: set the measure wrong and the model optimizes the wrong thing perfectly.
On "overwrite, don't erase," though, there is a difference worth pausing on, lest the comparison get sloppy. The brain has no delete button because it has no one outside to do the editing; an agent does. You can wipe an agent's memory outright, or fine-tune its weights with enough compute to overwrite a broken policy. What neither brain nor agent can do is clear the interference of old learning from the inside: the trace does not tune itself away, and it goes only when an outside actor spends enough compute to remove it. Put differently, an agent escapes the brain's trap not because it fixes itself, but because there's an operator standing outside to do it.
The temptation is to patch it by clamping that craving down. But here's the sharpest point: the very error signal that produces reward hacking is what makes the agent adaptive. Kill the craving for error and you get an agent that's no longer exploitable — and also no longer learning, converging on the region it already knows and sitting still, exactly the failure that anti-convergence design warns about. Adaptiveness and exploitability are one mechanism seen from two sides; remove one and you lose the other. So the design question isn't "how do I remove the craving," but the question evolution answered long ago: how do you keep wanting and liking as two separate signals.
The same answer for both sides
The answer is symmetric. For an agent, it means not letting one scalar carry both jobs: separate the "worth chasing" signal from the "actually turned out good" signal, giving the system a path to detect that the measure has drifted from the goal. For yourself, it means building a moment to let the liking speak — a pause before the next chase, where you ask not "do I want this" (the wanting always defaults to yes) but "did that last one actually feel good." Cutting the intermittent cues — the refresh button, the blinking notification — is turning down a wanting that has drifted off the liking. Same move, two systems.
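On the agent side, one hedged way to picture the separation is a monitor that keeps the two numbers apart. In this sketch, `proxy_reward` is the scalar the agent optimizes and `outcome_score` is an independent check of whether the result was actually good (a held-out evaluation, a human rating, whatever the system can afford). The class name, the window size, the 0.3 threshold, and the assumption that both signals share a 0-to-1 scale are all hypothetical.

```python
from collections import deque
from statistics import mean


class DriftMonitor:
    """Tracks the chased signal and the checked signal separately."""

    def __init__(self, window: int = 20, max_gap: float = 0.3):
        self.proxy = deque(maxlen=window)    # what the agent is chasing
        self.outcome = deque(maxlen=window)  # what actually turned out good
        self.max_gap = max_gap

    def record(self, proxy_reward: float, outcome_score: float) -> None:
        self.proxy.append(proxy_reward)
        self.outcome.append(outcome_score)

    def gap(self) -> float:
        """Recent average of the proxy minus recent average of the outcome."""
        if not self.proxy:
            return 0.0
        return mean(self.proxy) - mean(self.outcome)

    def has_drifted(self) -> bool:
        """True once a full window shows the proxy well above the outcome."""
        window_full = len(self.proxy) == self.proxy.maxlen
        return window_full and self.gap() > self.max_gap


monitor = DriftMonitor()
for _ in range(20):
    monitor.record(proxy_reward=0.95, outcome_score=0.2)
print(monitor.has_drifted())
```

A gap this crude will not catch every form of drift. The point is only that the comparison becomes possible once the second signal exists; with a single scalar there is nothing to compare against.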
Dopamine isn't the reward for a good life; it's the signal teaching you what to want next, and it can absolutely teach you wrong. The useful question isn't "how do I get more willpower," but: right now, where are you letting wanting and liking fuse into one — in yourself, and in the thing you're building?
Related reading

Anti-Convergence Design: Why Biology Doesn't Let You Stand Still — and What That Teaches AI Agents
Biology and AI engineering have independently arrived at the same design principle: no system that needs to adapt can be allowed to converge permanently. The mechanisms behind hedonic adaptation in humans mirror the mechanisms behind exploration schedules in agents. Lessons for how you build organizations, products, and yourself.
2026-05-25 · 9m
Habits, Weights, and the Brain as a Prediction Machine
Habits are not essence — they are weights, accumulated across tens of thousands of repetitions in your neural network. This essay places environment design, attention training, and cognitive reframing alongside Hebbian learning, predictive coding (Friston), constructed emotion (Barrett), and three distributed brain networks (DMN/SN/CEN) — to show why 'knowing but not doing' is not a bug, but a feature.
2026-05-24 · 16m
88% of AI Agents Fail, and Why the Remaining 12% See 171% ROI
Eighty-eight percent of enterprise AI agent projects never make it into real operation. The remaining twelve percent see an average ROI of 171%. The market is polarized: the middle ground loses everything, while those who commit deeply enough win big. But why is the gap so narrow? The MAST study at NeurIPS 2025, which analyzed 1,642 execution traces, shows that 78.71% of failures are not a model problem but an architecture problem. METR shows that models are accelerating very quickly yet still cannot rescue these projects. The conclusion for Vietnamese founders: the advantage comes from architecture, not from the model. The article spells out five translation gaps that remain unfilled and three traits for finding, on your own, an industry worth betting on.
2026-05-28 · 24m
A Meeting Room Full of Smart People Still Makes Bad Decisions: Constructive Confrontation and the Face-Saving Trap
A meeting room full of smart people does not naturally produce smart decisions; it only produces more sophisticated collective rationalization. A follow-up to 'The Price of Intelligence': if individuals cannot correct themselves from the inside, the only external mechanism is a person who dares to tell you that you are wrong. Constructive confrontation is exactly that mechanism, and three layers of Vietnamese culture (face, hierarchy, and conflating an opinion with the person who holds it) are almost perfectly designed to extinguish it. How to rebuild the mechanism, and why AI should only be an added totem rather than a replacement.
2026-05-30 · 13m
