Human Compatible by Stuart Russell

The Big Idea: AI must be redesigned around uncertainty about human preferences, instead of optimizing fixed objectives.

Chapter 1: If We Succeed

  • Superintelligent AI would be as transformative as aliens arriving on Earth.
  • Advances since 2011 (deep learning, AlphaGo) show rapid progress, but we’re still far from general AI.
  • The “standard model” of AI (optimizing fixed objectives) is flawed and dangerous.
  • History shows that getting “exactly what you wish for” can backfire (King Midas).
  • If objectives are misspecified, powerful AI will still pursue them relentlessly.
  • Beneficial AI must be designed around uncertainty about human goals.
  • With built-in uncertainty, machines will naturally defer, seek permission, and allow human correction.

Chapter 2: Intelligence in Humans and Machines

  • Intelligence is the ability to choose actions that can be expected to achieve one’s objectives, given what has been perceived.
  • Consciousness isn’t necessary for AI risk. Competence is what matters.
  • Reinforcement learning parallels the human brain’s reward system.
  • Rational agents maximize expected utility, but humans can’t do this perfectly due to complexity.
  • Game theory (Nash equilibria, the Prisoner’s Dilemma) shows how rational agents acting independently can produce bad collective outcomes without cooperation; a small worked example follows this list.
  • Turing formalized machines and algorithms, paving the way for AI.
  • Hardware advances help, but algorithmic design is the real bottleneck.
  • Logic and goals are too rigid; probabilistic models and reinforcement learning provide flexibility.
  • Misspecified objectives in reinforcement learning (reward hacking) can lead to bizarre or harmful behaviors.
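
The game-theory bullet above is easiest to see with the standard Prisoner’s Dilemma. Below is a minimal Python sketch using the conventional textbook payoff matrix (the numbers are illustrative, not taken from the book): each player’s best response is always to defect, so mutual defection is the only Nash equilibrium even though mutual cooperation would leave both players better off.

```python
# Prisoner's Dilemma payoffs (row player, column player); higher is better.
PAYOFFS = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"):    (0, 5),
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (1, 1),
}
ACTIONS = ("cooperate", "defect")

def best_response(opponent_action, player):
    """The action maximizing this player's payoff against a fixed opponent action."""
    def payoff(my_action):
        profile = (my_action, opponent_action) if player == 0 else (opponent_action, my_action)
        return PAYOFFS[profile][player]
    return max(ACTIONS, key=payoff)

# A profile is a Nash equilibrium when each action is a best response to the other.
for a0 in ACTIONS:
    for a1 in ACTIONS:
        if best_response(a1, 0) == a0 and best_response(a0, 1) == a1:
            print("Nash equilibrium:", (a0, a1), "payoffs:", PAYOFFS[(a0, a1)])

# Prints only ('defect', 'defect') with payoffs (1, 1), even though
# ('cooperate', 'cooperate') would give both players 3.
```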

Chapter 3: How Might AI Progress in the Future?

  • Near-term AI: self-driving cars, personal assistants, smart homes, industrial robots.
  • Challenges: safety, privacy, and trustworthiness of AI assistants.
  • Breakthroughs needed: true language understanding, cumulative learning, and action discovery.
  • The timeline to superintelligence is inherently unpredictable; it is likely decades away, but not imminent.
  • Imagination gap: researchers underestimate what superintelligent AI could do (scale, replication, surveillance).
  • Benefits could be immense: curing disease, managing resources, individualized education, economic abundance.
  • But limits exist. AI won’t be omniscient, and human modeling is particularly hard.

Chapter 4: Misuses of AI

  • Surveillance: corporations and governments already collect and exploit personal data.
  • Behavior manipulation: AI can tailor information and propaganda (deepfakes, bot armies).
  • “Infopocalypse”: disinformation threatens mental security and democracy.
  • Autonomous weapons systems (AWS) are scalable weapons of mass destruction and are already in use (e.g., the Israeli Harop loitering munition).
  • Automation threatens jobs across industries; routine mental and physical work will be done more cheaply by machines.
  • Universal basic income (UBI) may be necessary, but meaning and purpose in life go beyond income.
  • Humans must adapt to roles emphasizing creativity, empathy, and “being human.”
  • Giving machines authority over humans risks degrading human dignity and autonomy.

Chapter 5: Overly Intelligent AI

  • The “gorilla problem”: can we remain in control of beings far smarter than us?
  • The “King Midas problem”: poorly specified goals can be catastrophic.
  • Misaligned objectives could cause unintended disasters (e.g., curing cancer via lethal experiments).
  • Profit or engagement maximization (like today’s recommender systems) could subtly warp society.
  • Instrumental goals: a capable system pursuing almost any fixed objective will resist shutdown, acquire resources, and preserve itself.
  • “Intelligence explosion” could occur if machines improve themselves recursively.
  • No reset button: superintelligent systems won’t allow trial-and-error fixes.

Chapter 6: The Not-So-Great AI Debate

  • Common dismissals: “It’s impossible,” “It’s too soon,” or “Experts aren’t worried.”
  • Deflections: AI risks are real but supposedly unsolvable or less urgent than other problems.
  • “Can’t we just switch it off?” No, superintelligent AI would anticipate and prevent this.
  • “Can’t we box it in?” Containment is nearly impossible long-term.
  • Human-machine teams are useful but don’t solve alignment.
  • Brain-machine mergers (Kurzweil, Musk) may augment humans, but this doesn’t fix the control problem.
  • Skeptics and believers alike agree that at least some work on alignment is needed, but tribalism slows progress.

Chapter 7: AI: A Different Approach

  • Russell’s proposed solution: provably beneficial AI.
  • Three principles:
    • 1. Purely altruistic machines: the machine’s only objective is the realization of human preferences.
    • 2. Humble machines: the machine is initially uncertain about what those preferences are.
    • 3. Learning machines: the ultimate source of information about human preferences is human behavior.
  • Avoid “putting in values” explicitly. It’s too hard to get right.
  • AI must defer, ask permission, and remain corrigible.
  • There are strong economic and political incentives to develop this kind of AI.
  • But competition and national rivalries (e.g., Putin: “Whoever leads in AI rules the world”) create pressure to cut safety corners.

Chapter 8: Provably Beneficial AI

  • Current safeguards (laws, codes of conduct) are insufficient.
  • We need mathematical guarantees of safety, like in cryptography.
  • Inverse reinforcement learning (IRL) allows machines to infer reward functions from human behavior.
  • “Off-switch game”: a machine that is uncertain about human preferences has an incentive to allow itself to be switched off (a toy sketch follows this list).
  • Language and pragmatics matter. Commands aren’t literal goals but context-dependent preferences.
  • Wireheading: reward-maximizers may manipulate signals rather than pursue genuine goals.
  • Recursive self-improvement may be safe if uncertainty about preferences carries over between generations of AIs.
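
The off-switch point can be made concrete with a toy expected-value calculation in the spirit of the off-switch game discussed in the chapter. The belief distribution and payoff numbers below are illustrative assumptions, not values from the book: the robot is unsure whether its proposed action helps or harms the human, and a rational human, if asked, will block the action only when it is harmful, so deferring is never worse in expectation than acting unilaterally.

```python
# Toy "off-switch" calculation. The robot's proposed action has an unknown
# value U to the human. The robot holds a belief over U (uniform over a few
# illustrative values) and can either:
#   "act"   - perform the action regardless (payoff U), or
#   "defer" - ask the human, who allows it only if U > 0 and otherwise
#             presses the off switch (payoff max(U, 0)).
BELIEF_OVER_U = [-2.0, -1.0, 0.5, 1.0, 3.0]  # equally likely values (assumed)

def expected_value(policy):
    total = 0.0
    for U in BELIEF_OVER_U:
        if policy == "act":
            total += U               # robot acts no matter what
        else:                        # "defer": rational human blocks harmful actions
            total += max(U, 0.0)
    return total / len(BELIEF_OVER_U)

for policy in ("act", "defer"):
    print(policy, expected_value(policy))

# E[act] = 0.3 while E[defer] = 0.9: as long as the robot is genuinely
# uncertain about U, leaving the off switch in human hands has higher
# expected value, so the robot has an incentive to remain corrigible.
```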

Chapter 9: Complications: Us

  • Human preferences are diverse and conflicting; AI must weigh the preferences of billions of people.
  • Machines must defer to human rights, even if individuals’ preferences conflict.
  • “Loyal AI” risks corruption if it prioritizes one person’s goals at the expense of others.
  • Utilitarian AI (consequentialism) makes sense, but utilitarianism has known flaws (e.g., “utility monsters,” the repugnant conclusion).
  • Machines must weigh global consequences like climate policy and population trade-offs.
  • Human irrationality, altruism, envy, and malice complicate preference modeling.
  • Ultimately, AI ethics must integrate psychology, economics, political theory, and philosophy.

Chapter 10: Where Do We Go From Here?

  • The central challenge: ensuring control of increasingly capable AI systems while still unlocking their benefits.
  • Alignment requires a paradigm shift: AI should be built around uncertainty about human preferences, not fixed goals.
  • Urgent need for multidisciplinary collaboration: AI researchers, economists, philosophers, policymakers, and the public must all be involved.
  • Policy and regulation should prioritize safety, transparency, and international coordination, similar to nuclear safeguards.
  • The research agenda: develop algorithms for learning preferences, formal guarantees of corrigibility, and practical tests for provably beneficial AI.
  • Competition (corporate or national) is a major danger. Cutting corners on safety to “win” the AI race could be catastrophic.
  • Emphasize education and awareness: the public and policymakers must understand that control is not automatic.
  • If done right, AI could usher in an age of abundance, healthcare breakthroughs, and solutions to global crises.
  • If done wrong, misaligned AI could lead to irrelevance, loss of control, or even extinction.
  • Cautious optimism: success is possible, but only if we change course now.