Human Compatible by Stuart Russell

The Big Idea: AI must be redesigned around uncertainty about human preferences, instead of optimizing fixed objectives.

Chapter 1: If We Succeed

  • Superintelligent AI would be as transformative as aliens arriving on Earth.
  • Advances since 2011 (deep learning, AlphaGo) show rapid progress, but we’re still far from general AI.
  • The “standard model” of AI (optimizing fixed objectives) is flawed and dangerous.
  • History shows that getting “exactly what you wish for” can backfire (King Midas).
  • If objectives are misspecified, powerful AI will still pursue them relentlessly.
  • Beneficial AI must be designed around uncertainty about human goals.
  • With built-in uncertainty, machines will naturally defer, seek permission, and allow human correction.

Chapter 2: Intelligence in Humans and Machines

  • Intelligence is the ability to choose actions that can be expected to achieve one’s objectives, given what has been perceived.
  • Consciousness isn’t necessary for AI risk. Competence is what matters.
  • Reinforcement learning parallels the human brain’s reward system.
  • Rational agents maximize expected utility, but humans can’t do this perfectly due to complexity.
  • Game theory (Nash equilibria, the Prisoner’s Dilemma) shows how rational agents acting independently can produce bad collective outcomes without cooperation; a small worked example follows this list.
  • Turing formalized machines and algorithms, paving the way for AI.
  • Hardware advances help, but algorithmic design is the real bottleneck.
  • Logic and goals are too rigid; probabilistic models and reinforcement learning provide flexibility.
  • Misspecified objectives in reinforcement learning (reward hacking) can lead to bizarre or harmful behaviors.
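
The game-theory bullet above is easiest to see with the standard Prisoner’s Dilemma. Below is a minimal Python sketch using the conventional textbook payoff matrix (the numbers are illustrative, not taken from the book): each player’s best response is always to defect, so mutual defection is the only Nash equilibrium even though mutual cooperation would leave both players better off.

```python
# Prisoner's Dilemma payoffs (row player, column player); higher is better.
PAYOFFS = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"):    (0, 5),
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (1, 1),
}
ACTIONS = ("cooperate", "defect")

def best_response(opponent_action, player):
    """The action maximizing this player's payoff against a fixed opponent action."""
    def payoff(my_action):
        profile = (my_action, opponent_action) if player == 0 else (opponent_action, my_action)
        return PAYOFFS[profile][player]
    return max(ACTIONS, key=payoff)

# A profile is a Nash equilibrium when each action is a best response to the other.
for a0 in ACTIONS:
    for a1 in ACTIONS:
        if best_response(a1, 0) == a0 and best_response(a0, 1) == a1:
            print("Nash equilibrium:", (a0, a1), "payoffs:", PAYOFFS[(a0, a1)])

# Prints only ('defect', 'defect') with payoffs (1, 1), even though
# ('cooperate', 'cooperate') would give both players 3.
```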

Chapter 3: How Might AI Progress in the Future?

  • Near-term AI: self-driving cars, personal assistants, smart homes, industrial robots.
  • Challenges: safety, privacy, and trustworthiness of AI assistants.
  • Breakthroughs needed: true language understanding, cumulative learning, and action discovery.
  • The timeline to superintelligence is inherently unpredictable; it is likely decades away, but not imminent.
  • Imagination gap: researchers underestimate what superintelligent AI could do (scale, replication, surveillance).
  • Benefits could be immense: curing disease, managing resources, individualized education, economic abundance.
  • But limits exist. AI won’t be omniscient, and human modeling is particularly hard.

Chapter 4: Misuses of AI

  • Surveillance: corporations and governments already collect and exploit personal data.
  • Behavior manipulation: AI can tailor information and propaganda (deepfakes, bot armies).
  • “Infopocalypse”: disinformation threatens mental security and democracy.
  • Autonomous weapons systems (AWS) are scalable weapons of mass destruction and are already in use (e.g., the Israeli Harop loitering munition).
  • Automation threatens jobs across industries; routine mental and physical work will be done more cheaply by machines.
  • Universal basic income (UBI) may be necessary, but meaning and purpose in life go beyond income.
  • Humans must adapt to roles emphasizing creativity, empathy, and “being human.”
  • Giving machines authority over humans risks degrading human dignity and autonomy.

Chapter 5: Overly Intelligent AI

  • The “gorilla problem”: can we remain in control of beings far smarter than us?
  • The “King Midas problem”: poorly specified goals can be catastrophic.
  • Misaligned objectives could cause unintended disasters (e.g., curing cancer via lethal experiments).
  • Profit or engagement maximization (like today’s recommender systems) could subtly warp society.
  • Instrumental goals: a capable system pursuing almost any fixed objective will resist shutdown, acquire resources, and preserve itself.
  • “Intelligence explosion” could occur if machines improve themselves recursively.
  • No reset button: superintelligent systems won’t allow trial-and-error fixes.

Chapter 6: The Not-So-Great AI Debate

  • Common dismissals: “It’s impossible,” “It’s too soon,” or “Experts aren’t worried.”
  • Deflections: AI risks are real but supposedly unsolvable or less urgent than other problems.
  • “Can’t we just switch it off?” No, superintelligent AI would anticipate and prevent this.
  • “Can’t we box it in?” Containment is nearly impossible long-term.
  • Human-machine teams are useful but don’t solve alignment.
  • Brain-machine mergers (Kurzweil, Musk) may augment humans, but this doesn’t fix the control problem.
  • Skeptics and believers alike agree that at least some work on alignment is needed, but tribalism slows progress.

Chapter 7: AI: A Different Approach

  • Russell’s proposed solution: provably beneficial AI.
  • Three principles:
    • 1. Purely altruistic machines: the machine’s only objective is the realization of human preferences.
    • 2. Humble machines: the machine is initially uncertain about what those preferences are.
    • 3. Learning machines: the ultimate source of information about human preferences is human behavior.
  • Avoid “putting in values” explicitly. It’s too hard to get right.
  • AI must defer, ask permission, and remain corrigible.
  • There are strong economic and political incentives to develop this kind of AI.
  • But competition and national rivalries (e.g., Putin: “Whoever leads in AI rules the world”) create pressure to cut safety corners.

Chapter 8: Provably Beneficial AI

  • Current safeguards (laws, codes of conduct) are insufficient.
  • We need mathematical guarantees of safety, like in cryptography.
  • Inverse reinforcement learning (IRL) allows machines to infer reward functions from human behavior.
  • “Off-switch game”: a machine that is uncertain about human preferences has an incentive to allow itself to be switched off (a toy sketch follows this list).
  • Language and pragmatics matter. Commands aren’t literal goals but context-dependent preferences.
  • Wireheading: reward-maximizers may manipulate signals rather than pursue genuine goals.
  • Recursive self-improvement may be safe if uncertainty about preferences carries over between generations of AIs.
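
The off-switch point can be made concrete with a toy expected-value calculation in the spirit of the off-switch game discussed in the chapter. The belief distribution and payoff numbers below are illustrative assumptions, not values from the book: the robot is unsure whether its proposed action helps or harms the human, and a rational human, if asked, will block the action only when it is harmful, so deferring is never worse in expectation than acting unilaterally.

```python
# Toy "off-switch" calculation. The robot's proposed action has an unknown
# value U to the human. The robot holds a belief over U (uniform over a few
# illustrative values) and can either:
#   "act"   - perform the action regardless (payoff U), or
#   "defer" - ask the human, who allows it only if U > 0 and otherwise
#             presses the off switch (payoff max(U, 0)).
BELIEF_OVER_U = [-2.0, -1.0, 0.5, 1.0, 3.0]  # equally likely values (assumed)

def expected_value(policy):
    total = 0.0
    for U in BELIEF_OVER_U:
        if policy == "act":
            total += U               # robot acts no matter what
        else:                        # "defer": rational human blocks harmful actions
            total += max(U, 0.0)
    return total / len(BELIEF_OVER_U)

for policy in ("act", "defer"):
    print(policy, expected_value(policy))

# E[act] = 0.3 while E[defer] = 0.9: as long as the robot is genuinely
# uncertain about U, leaving the off switch in human hands has higher
# expected value, so the robot has an incentive to remain corrigible.
```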

Chapter 9: Complications: Us

  • Human preferences are diverse and conflicting; AI must weigh the preferences of billions of people.
  • Machines must defer to human rights, even if individuals’ preferences conflict.
  • “Loyal AI” risks corruption if it prioritizes one person’s goals at the expense of others.
  • Utilitarian AI (consequentialism) makes sense, but utilitarianism has known flaws (e.g., “utility monsters,” the repugnant conclusion).
  • Machines must weigh global consequences like climate policy and population trade-offs.
  • Human irrationality, altruism, envy, and malice complicate preference modeling.
  • Ultimately, AI ethics must integrate psychology, economics, political theory, and philosophy.

Chapter 10: Where Do We Go From Here?

  • The central challenge: ensuring control of increasingly capable AI systems while still unlocking their benefits.
  • Alignment requires a paradigm shift: AI should be built around uncertainty about human preferences, not fixed goals.
  • Urgent need for multidisciplinary collaboration: AI researchers, economists, philosophers, policymakers, and the public must all be involved.
  • Policy and regulation should prioritize safety, transparency, and international coordination, similar to nuclear safeguards.
  • The research agenda: develop algorithms for learning preferences, formal guarantees of corrigibility, and practical tests for provably beneficial AI.
  • Competition (corporate or national) is a major danger. Cutting corners on safety to “win” the AI race could be catastrophic.
  • Emphasize education and awareness: the public and policymakers must understand that control is not automatic.
  • If done right, AI could usher in an age of abundance, healthcare breakthroughs, and solutions to global crises.
  • If done wrong, misaligned AI could lead to irrelevance, loss of control, or even extinction.
  • Cautious optimism: success is possible, but only if we change course now.