ALSO BY STUART RUSSELL
The Use of Knowledge in Analogy and Induction (1989)
Do the Right Thing: Studies in Limited Rationality
(with Eric Wefald, 1991)
Artificial Intelligence: A Modern Approach
(with Peter Norvig, 1995, 2003, 2010, 2019)
HUMAN COMPATIBLE
ARTIFICIAL INTELLIGENCE AND THE
PROBLEM OF CONTROL
Stuart Russell
VIKING
VIKING
An imprint of Penguin Random House LLC
penguinrandomhouse.com
Copyright © 2019 by Stuart Russell
Penguin supports copyright. Copyright fuels creativity, encourages
diverse voices, promotes free speech, and creates a vibrant culture. Thank
you for buying an authorized edition of this book and for complying with
copyright laws by not reproducing, scanning, or distributing any part of it
in any form without permission. You are supporting writers and allowing
Penguin to continue to publish books for every reader.
ISBN 9780525558613 (hardcover)
ISBN 9780525558620 (ebook)
Printed in the United States of America
1 3 5 7 9 10 8 6 4 2
Designed by Amanda Dewey
For Loy, Gordon, Lucy, George, and Isaac
CONTENTS
PREFACE xi
Chapter 1. IF WE SUCCEED 1
Chapter 2. INTELLIGENCE IN HUMANS AND MACHINES 13
Chapter 3. HOW MIGHT AI PROGRESS IN THE FUTURE? 62
Chapter 4. MISUSES OF AI 103
Chapter 5. OVERLY INTELLIGENT AI 132
Chapter 6. THE NOT-SO-GREAT AI DEBATE 145
Chapter 7. AI: A DIFFERENT APPROACH 171
Chapter 8. PROVABLY BENEFICIAL AI 184
Chapter 9. COMPLICATIONS: US 211
Chapter 10. PROBLEM SOLVED? 246
Appendix A. SEARCHING FOR SOLUTIONS 257
Appendix B. KNOWLEDGE AND LOGIC 267
Appendix C. UNCERTAINTY AND PROBABILITY 273
Appendix D. LEARNING FROM EXPERIENCE 285
Acknowledgments 297
Notes 299
Image Credits 324
Index 325
PREFACE
Why This Book? Why Now?
This book is about the past, present, and future of our attempt to
understand and create intelligence. This matters, not because AI is
rapidly becoming a pervasive aspect of the present but because it is
the dominant technology of the future. The world's great powers are
waking up to this fact, and the world's largest corporations have known
it for some time. We cannot predict exactly how the technology will
develop or on what timeline. Nevertheless, we must plan for the
possibility that machines will far exceed the human capacity for
decision making in the real world. What then?
Everything civilization has to offer is the product of our intelli-
gence; gaining access to considerably greater intelligence would be the
biggest event in human history. The purpose of the book is to explain
why it might be the last event in human history and how to make sure
that it is not.
Overview of the Book
The book has three parts. The first part (Chapters 1 to 3) explores the
idea of intelligence in humans and in machines. The material requires
no technical background, but for those who are interested, it is supple-
mented by four appendices that explain some of the core concepts
underlying present-day AI systems. The second part (Chapters 4 to 6)
discusses some problems arising from imbuing machines with intel-
ligence. I focus in particular on the problem of control: retaining
absolute power over machines that are more powerful than us. The
third part (Chapters 7 to 10) suggests a new way to think about AI
and to ensure that machines remain beneficial to humans, forever.
The book is intended for a general audience but will, I hope, be of
value in convincing specialists in artificial intelligence to rethink their
fundamental assumptions.
1
IF WE SUCCEED
A long time ago, my parents lived in Birmingham, England, in a
house near the university. They decided to move out of the
city and sold the house to David Lodge, a professor of English
literature. Lodge was by that time already a well-known novelist. I
never met him, but I decided to read some of his books: Changing
Places and Small World. Among the principal characters were fictional
academics moving from a fictional version of Birmingham to a fic-
tional version of Berkeley, California. As I was an actual academic
from the actual Birmingham who had just moved to the actual Berke-
ley, it seemed that someone in the Department of Coincidences was
telling me to pay attention.
One particular scene from Small World struck me: The protago-
nist, an aspiring literary theorist, attends a major international confer-
ence and asks a panel of leading figures, “What follows if everyone
agrees with you?” The question causes consternation, because the
panelists had been more concerned with intellectual combat than as-
certaining truth or attaining understanding. It occurred to me then
that an analogous question could be asked of the leading figures in AI:
“What if you succeed?” The field's goal had always been to create
human-level or superhuman AI, but there was little or no consider-
ation of what would happen if we did.
A few years later, Peter Norvig and I began work on a new AI text-
book, whose first edition appeared in 1995.¹ The book's final section
is titled “What If We Do Succeed?” The section points to the possibil-
ity of good and bad outcomes but reaches no firm conclusions. By the
time of the third edition in 2010, many people had finally begun to
consider the possibility that superhuman AI might not be a good
thing— but these people were mostly outsiders rather than main-
stream AI researchers. By 2013, I became convinced that the issue not
only belonged in the mainstream but was possibly the most important
question facing humanity.
In November 2013, I gave a talk at the Dulwich Picture Gallery, a
venerable art museum in south London. The audience consisted
mostly of retired people— nonscientists with a general interest in in-
tellectual matters— so I had to give a completely nontechnical talk. It
seemed an appropriate venue to try out my ideas in public for the first
time. After explaining what AI was about, I nominated five candi-
dates for “biggest event in the future of humanity”:
1. We all die (asteroid impact, climate catastrophe, pandemic, etc.).
2. We all live forever (medical solution to aging).
3. We invent faster-than-light travel and conquer the universe.
4. We are visited by a superior alien civilization.
5. We invent superintelligent AI.
I suggested that the fifth candidate, superintelligent AI, would be
the winner, because it would help us avoid physical catastrophes and
achieve eternal life and faster-than-light travel, if those were indeed
possible. It would represent a huge leap— a discontinuity— in our civ-
ilization. The arrival of superintelligent AI is in many ways analogous
to the arrival of a superior alien civilization but much more likely to
occur. Perhaps most important, AI, unlike aliens, is something over
which we have some say.
Then I asked the audience to imagine what would happen if we
received notice from a superior alien civilization that they would ar-
rive on Earth in thirty to fifty years. The word pandemonium doesn't
begin to describe it. Yet our response to the anticipated arrival of su-
perintelligent AI has been... well, underwhelming begins to describe
it. (In a later talk, I illustrated this in the form of the email exchange
shown in figure 1.) Finally, I explained the significance of superintelli-
gent AI as follows: “Success would be the biggest event in human
history... and perhaps the last event in human history.”
From: Superior Alien Civilization <[email protected].u>
Subject: Contact
Be warned: we shall arrive in 30–50 years
To: Superior Alien Civilization <[email protected].u>
Subject:2XWRIRIÀFH5H&RQWDFW
Humanity is currently out of the office. We will respond to your
message when we return.
FIGURE 3UREDEO\QRWWKHHPDLOH[FKDQJHWKDWZRXOGIROORZWKHILUVWFRQWDFW
by a superior alien civilization.
A few months later, in April 2014, I was at a conference in Iceland
and got a call from National Public Radio asking if they could inter-
view me about the movie Transcendence, which had just been released
in the United States. Although I had read the plot summaries and re-
views, I hadn't seen it because I was living in Paris at the time, and it
would not be released there until June. It so happened, however, that
I had just added a detour to Boston on the way home from Iceland, so
that I could participate in a Defense Department meeting. So, after
arriving at Boston's Logan Airport, I took a taxi to the nearest theater
showing the movie. I sat in the second row and watched as a Berkeley
AI professor, played by Johnny Depp, was gunned down by anti-AI
activists worried about, yes, superintelligent AI. Involuntarily, I shrank
down in my seat. (Another call from the Department of Coinci-
dences?) Before Johnny Depp’s character dies, his mind is uploaded to
a quantum supercomputer and quickly outruns human capabilities,
threatening to take over the world.
On April 19, 2014, a review of Transcendence, co-authored with
physicists Max Tegmark, Frank Wilczek, and Stephen Hawking, ap-
peared in the Huffington Post. It included the sentence from my Dul-
wich talk about the biggest event in human history. From then on, I
would be publicly committed to the view that my own field of re-
search posed a potential risk to my own species.
How Did We Get Here?
The roots of AI stretch far back into antiquity, but its “official” begin-
ning was in 1956. Two young mathematicians, John McCarthy and
Marvin Minsky, had persuaded Claude Shannon, already famous as the
inventor of information theory, and Nathaniel Rochester, the designer
of IBMs first commercial computer, to join them in organizing a sum-
mer program at Dartmouth College. The goal was stated as follows:
The study is to proceed on the basis of the conjecture that every
aspect of learning or any other feature of intelligence can in prin-
ciple be so precisely described that a machine can be made to sim-
ulate it. An attempt will be made to find how to make machines
use language, form abstractions and concepts, solve kinds of prob-
lems now reserved for humans, and improve themselves. We think
that a significant advance can be made in one or more of these
problems if a carefully selected group of scientists work on it
together for a summer.
Needless to say, it took much longer than a summer: we are still working
on all these problems.
In the first decade or so after the Dartmouth meeting, AI had sev-
eral major successes, including Alan Robinson's algorithm for general-
purpose logical reasoning² and Arthur Samuel's checker-playing
program, which taught itself to beat its creator.³ The first AI bubble
burst in the late 1960s, when early efforts at machine learning and
machine translation failed to live up to expectations. A report com-
missioned by the UK government in 1973 concluded, “In no part of
the field have the discoveries made so far produced the major impact
that was then promised.”⁴ In other words, the machines just weren't
smart enough.
My eleven-year-old self was, fortunately, unaware of this report.
Two years later, when I was given a Sinclair Cambridge Programmable
calculator, I just wanted to make it intelligent. With a maximum pro-
gram size of thirty-six keystrokes, however, the Sinclair was not quite
big enough for human-level AI. Undeterred, I gained access to the gi-
ant CDC 6600 supercomputer⁵ at Imperial College London and wrote
a chess program— a stack of punched cards two feet high. It wasn’t
very good, but it didn’t matter. I knew what I wanted to do.
By the mid-1980s, I had become a professor at Berkeley, and AI
was experiencing a huge revival thanks to the commercial potential of
so-called expert systems. The second AI bubble burst when these sys-
tems proved to be inadequate for many of the tasks to which they
were applied. Again, the machines just weren’t smart enough. An AI
winter ensued. My own AI course at Berkeley, currently bursting with
over nine hundred students, had just twenty-five students in 1990.
The AI community learned its lesson: smarter, obviously, was bet-
ter, but we would have to do our homework to make that happen. The
field became far more mathematical. Connections were made to the
long-established disciplines of probability, statistics, and control the-
ory. The seeds of today’s progress were sown during that AI winter,
including early work on large-scale probabilistic reasoning systems
and what later became known as deep learning.
Beginning around 2011, deep learning techniques began to pro-
duce dramatic advances in speech recognition, visual object recogni-
tion, and machine translation— three of the most important open
problems in the field. By some measures, machines now match or ex-
ceed human capabilities in these areas. In 2016 and 2017, DeepMind's
AlphaGo defeated Lee Sedol, former world Go champion, and Ke Jie,
the current champion— events that some experts predicted wouldn't
happen until 2097, if ever.⁶
Now AI generates front-page media coverage almost every day.
Thousands of start-up companies have been created, fueled by a flood
of venture funding. Millions of students have taken online AI and
machine learning courses, and experts in the area command salaries in
the millions of dollars. Investments flowing from venture funds, na-
tional governments, and major corporations are in the tens of billions
of dollars annually— more money in the last five years than in the en-
tire previous history of the field. Advances that are already in the
pipeline, such as self-driving cars and intelligent personal assistants,
are likely to have a substantial impact on the world over the next de-
cade or so. The potential economic and social benefits of AI are vast,
creating enormous momentum in the AI research enterprise.
What Happens Next?
Does this rapid rate of progress mean that we are about to be over-
taken by machines? No. There are several breakthroughs that have
to happen before we have anything resembling machines with super-
human intelligence.
Scientific breakthroughs are notoriously hard to predict. To get a
sense of just how hard, we can look back at the history of another field
with civilization-ending potential: nuclear physics.
In the early years of the twentieth century, perhaps no nuclear
physicist was more distinguished than Ernest Rutherford, the discov-
erer of the proton and the “man who split the atom” (figure 2[a]). Like
his colleagues, Rutherford had long been aware that atomic nuclei
stored immense amounts of energy; yet the prevailing view was that
tapping this source of energy was impossible.
On September 11, 1933, the British Association for the Advance-
ment of Science held its annual meeting in Leicester. Lord Rutherford
addressed the evening session. As he had done several times before, he
poured cold water on the prospects for atomic energy: “Anyone who
looks for a source of power in the transformation of the atoms is
talking moonshine.” Rutherford's speech was reported in the Times of
London the next morning (figure 2[b]).
Leo Szilard (figure 2[c]), a Hungarian physicist who had recently
fled from Nazi Germany, was staying at the Imperial Hotel on Russell
FIGURE 2: (a) Lord Rutherford, nuclear physicist. (b) Excerpts from a report in
the Times of September 12, 1933, concerning a speech given by Rutherford the
previous evening. (c) Leo Szilard, nuclear physicist.
Square in London. He read the Times’ report at breakfast. Mulling over
what he had read, he went for a walk and invented the neutron-induced
nuclear chain reaction.⁷ The problem of liberating nuclear energy went
from impossible to essentially solved in less than twenty-four hours.
Szilard filed a secret patent for a nuclear reactor the following year. The
first patent for a nuclear weapon was issued in France in 1939.
The moral of this story is that betting against human ingenuity is
foolhardy, particularly when our future is at stake. Within the AI
community, a kind of denialism is emerging, even going as far as deny-
ing the possibility of success in achieving the long- term goals of AI. It’s
as if a bus driver, with all of humanity as passengers, said, “Yes, I am
driving as hard as I can towards a cliff, but trust me, we’ll run out of
gas before we get there!”
I am not saying that success in AI will necessarily happen, and I
think it’s quite unlikely that it will happen in the next few years. It
seems prudent, nonetheless, to prepare for the eventuality. If all goes
well, it would herald a golden age for humanity, but we have to face
the fact that we are planning to make entities that are far more pow-
erful than humans. How do we ensure that they never, ever have
power over us?
To get just an inkling of the fire we’re playing with, consider how
content-selection algorithms function on social media. They aren't
particularly intelligent, but they are in a position to affect the entire
world because they directly influence billions of people. Typically,
such algorithms are designed to maximize click-through, that is, the
probability that the user clicks on presented items. The solution is
simply to present items that the user likes to click on, right? Wrong.
The solution is to change the user’s preferences so that they become
more predictable. A more predictable user can be fed items that they
are likely to click on, thereby generating more revenue. People with
more extreme political views tend to be more predictable in which
items they will click on. (Possibly there is a category of articles that
die-hard centrists are likely to click on, but it's not easy to imagine
what this category consists of.) Like any rational entity, the algorithm
learns how to modify the state of its environment— in this case, the
user's mind— in order to maximize its own reward.⁸ The consequences
include the resurgence of fascism, the dissolution of the social contract
that underpins democracies around the world, and potentially the end
of the European Union and NATO. Not bad for a few lines of code,
even if it had a helping hand from some humans. Now imagine what a
really intelligent algorithm would be able to do.
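For concreteness, here is a toy sketch of that incentive in Python. Everything in it is an assumption invented for illustration (the one-dimensional political "position," the click model, the two hard-coded policies); it is not how any real content-selection system is implemented. It compares a purely myopic recommender with one that deliberately shifts the user toward an extreme, under the assumption that extreme users click more predictably.

import random

# Toy model: a user's political position is a number in [-1, 1];
# clicking an item drags the position toward that item; users near
# the extremes click more predictably than centrists. All numbers
# are made up for illustration.

def click_prob(user, item):
    match = max(0.0, 1.0 - abs(user - item))   # like attracts clicks
    predictability = 0.4 + 0.6 * abs(user)     # extremes are easier to predict
    return match * predictability

def run(policy, steps=2000, rate=0.02, seed=1):
    rng = random.Random(seed)
    user, clicks = 0.1, 0
    for _ in range(steps):
        item = policy(user)
        if rng.random() < click_prob(user, item):
            clicks += 1
            user += rate * (item - user)       # the click shifts the user's preference
    return user, clicks

def myopic(user):
    # Maximizes the probability of a click right now.
    return max([-1.0, 0.0, 1.0], key=lambda item: click_prob(user, item))

def preference_shifting(user):
    # Keeps feeding items from the user's current side of the spectrum,
    # dragging the user toward an extreme where clicks become near-certain.
    return 1.0 if user >= 0 else -1.0

for name, policy in [("myopic", myopic), ("preference-shifting", preference_shifting)]:
    final_user, clicks = run(policy)
    print(f"{name:20s} final position {final_user:+.2f}   clicks {clicks}")

With these made-up numbers, the preference-shifting policy ends with the user near an extreme and collects roughly twice as many clicks over the run, which is exactly the strategy a learning system optimizing long-run click-through would be rewarded for discovering on its own.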
What Went Wrong?
The history of AI has been driven by a single mantra: “The more intel-
ligent the better.” I am convinced that this is a mistake— not because
of some vague fear of being superseded but because of the way we
have understood intelligence itself.
The concept of intelligence is central to who we are— that's why
we call ourselves Homo sapiens, or “wise man.” After more than two
thousand years of self-examination, we have arrived at a characteriza-
tion of intelligence that can be boiled down to this:
Humans are intelligent to the extent that our actions can be expected
to achieve our objectives.
All those other characteristics of intelligence— perceiving, thinking,
learning, inventing, and so on— can be understood through their con-
tributions to our ability to act successfully. From the very beginnings
of AI, intelligence in machines has been defined in the same way:
Machines are intelligent to the extent that their actions can be expected
to achieve their objectives.
Because machines, unlike humans, have no objectives of their own,
we give them objectives to achieve. In other words, we build optimiz-
ing machines, we feed objectives into them, and off they go.
This general approach is not unique to AI. It recurs throughout the
technological and mathematical underpinnings of our society. In the
field of control theory, which designs control systems for everything
from jumbo jets to insulin pumps, the job of the system is to mini-
mize a cost function that typically measures some deviation from a
desired behavior. In the field of economics, mechanisms and policies
are designed to maximize the utility of individuals, the welfare of
groups, and the profit of corporations.⁹ In operations research, which
solves complex logistical and manufacturing problems, a solution
maximizes an expected sum of rewards over time. Finally, in statistics,
learning algorithms are designed to minimize an expected loss func-
tion that defines the cost of making prediction errors.
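All of these formulations share one mathematical shape. In standard decision-theoretic notation (a generic rendering chosen here for illustration, not a formula quoted from any one of these fields), the designer fixes an objective and the machine optimizes it:

\[
\pi^{*} \;=\; \arg\max_{\pi}\; \mathbb{E}\!\left[\sum_{t=0}^{T} \gamma^{\,t}\, R(s_t, a_t)\right]
\qquad\text{or}\qquad
\theta^{*} \;=\; \arg\min_{\theta}\; \mathbb{E}\!\left[\,L\bigl(y, f_{\theta}(x)\bigr)\right],
\]

where the reward R (or cost, utility, or loss L) is supplied by the human designer and held fixed; the machine's only job is the optimization.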
Evidently, this general scheme— which I will call the standard
model— is widespread and extremely powerful. Unfortunately, we
don’t want machines that are intelligent in this sense.
The drawback of the standard model was pointed out in 1960 by
Norbert Wiener, a legendary professor at MIT and one of the leading
mathematicians of the mid-twentieth century. Wiener had just seen
Arthur Samuel's checker-playing program learn to play checkers far
better than its creator. That experience led him to write a prescient
but little-known paper, “Some Moral and Technical Consequences of
Automation.”¹⁰ Here's how he states the main point:
If we use, to achieve our purposes, a mechanical agency with
whose operation we cannot interfere effectively... we had better
be quite sure that the purpose put into the machine is the purpose
which we really desire.
“The purpose put into the machine” is exactly the objective that ma-
chines are optimizing in the standard model. If we put the wrong
objective into a machine that is more intelligent than us, it will achieve
the objective, and we lose. The social-media meltdown I described
earlier is just a foretaste of this, resulting from optimizing the wrong
objective on a global scale with fairly unintelligent algorithms. In
Chapter 5, I spell out some far worse outcomes.
All this should come as no great surprise. For thousands of years,
we have known the perils of getting exactly what you wish for. In
every story where someone is granted three wishes, the third wish is
always to undo the first two wishes.
In summary, it seems that the march towards superhuman intelli-
gence is unstoppable, but success might be the undoing of the human
race. Not all is lost, however. We have to understand where we went
wrong and then fix it.
Can We Fix It?
The problem is right there in the basic definition of AI. We say that
machines are intelligent to the extent that their actions can be ex-
pected to achieve their objectives, but we have no reliable way to make
sure that their objectives are the same as our objectives.
What if, instead of allowing machines to pursue their objectives,
we insist that they pursue our objectives? Such a machine, if it could
be designed, would be not just intelligent but also beneficial to humans.
So let's try this:
Machines are beneficial to the extent that their actions can be ex-
pected to achieve our objectives.
This is probably what we should have done all along.
The difficult part, of course, is that our objectives are in us (all
eight billion of us, in all our glorious variety) and not in the machines.
It is, nonetheless, possible to build machines that are beneficial in
exactly this sense. Inevitably, these machines will be uncertain about
our objectives— after all, we are uncertain about them ourselves— but
it turns out that this is a feature, not a bug (that is, a good thing and
not a bad thing). Uncertainty about objectives implies that machines
will necessarily defer to humans: they will ask permission, they will
accept correction, and they will allow themselves to be switched off.
Removing the assumption that machines should have a definite
objective means that we will need to tear out and replace part of
the foundations of artificial intelligence— the basic definitions of what
we are trying to do. That also means rebuilding a great deal of the
superstructure— the accumulation of ideas and methods for actually
doing AI. The result will be a new relationship between humans and
machines, one that I hope will enable us to navigate the next few de-
cades successfully.
2
INTELLIGENCE IN HUMANS
AND MACHINES
When you arrive at a dead end, it's a good idea to retrace
your steps and work out where you took a wrong turn. I
have argued that the standard model of AI, wherein ma-
chines optimize a fixed objective supplied by humans, is a dead end.
The problem is not that we might fail to do a good job of building AI
systems; it's that we might succeed too well. The very definition of
success in AI is wrong.
So let's retrace our steps, all the way to the beginning. Let's try to
understand how our concept of intelligence came about and how it
came to be applied to machines. Then we have a chance of coming up
with a better definition of what counts as a good AI system.
Intelligence
How does the universe work? How did life begin? Where are my keys?
These are fundamental questions worthy of thought. But who is ask-
ing these questions? How am I answering them? How can a handful
of matter— the few pounds of pinkish-gray blancmange we call a
brain— perceive, understand, predict, and manipulate a world of un-
imaginable vastness? Before long, the mind turns to examine itself.
We have been trying for thousands of years to understand how our
minds work. Initially, the purposes included curiosity, self-management,
persuasion, and the rather pragmatic goal of analyzing mathematical
arguments. Yet every step towards an explanation of how the mind
works is also a step towards the creation of the mind's capabilities in an
artifact— that is, a step towards artificial intelligence.
Before we can understand how to create intelligence, it helps to
understand what it is. The answer is not to be found in IQ tests, or
even in Turing tests, but in a simple relationship between what we
perceive, what we want, and what we do. Roughly speaking, an entity
is intelligent to the extent that what it does is likely to achieve what it
wants, given what it has perceived.
Evolutionary origins
Consider a lowly bacterium, such as E. coli. It is equipped with
about half a dozen flagella— long, hairlike tentacles that rotate at the
base either clockwise or counterclockwise. (The rotary motor itself is
an amazing thing, but that's another story.) As E. coli floats about in its
liquid home— your lower intestine— it alternates between rotating its
flagella clockwise, causing it to “tumble” in place, and counterclock-
wise, causing the flagella to twine together into a kind of propeller so
the bacterium swims in a straight line. Thus, E. coli does a sort of ran-
dom walk— swim, tumble, swim, tumble— that allows it to find and
consume glucose rather than staying put and dying of starvation.
If this were the whole story, we wouldn't say that E. coli is particu-
larly intelligent, because its actions would not depend in any way on
its environment. It wouldn’t be making any decisions, just executing a
fixed behavior that evolution has built into its genes. But this isn’t
the whole story. When E. coli senses an increasing concentration of
glucose, it swims longer and tumbles less, and it does the opposite
when it senses a decreasing concentration of glucose. So, what it does
(swim towards glucose) is likely to achieve what it wants (more glu-
cose, let's assume), given what it has perceived (an increasing glucose
concentration).
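As a rough sketch of that perceive-act loop, here is a one-dimensional run-and-tumble simulation in Python. The numbers and the all-or-nothing tumbling rule are simplifications invented for illustration, not a model of real bacterial physiology, but the logic is the one just described: keep swimming while the sensed concentration is rising, and tumble to a random heading when it falls.

import random

# Toy 1-D chemotaxis: glucose concentration peaks at x = 0, and the cell
# can only sense whether the concentration went up or down since its
# last step. Step size and starting point are invented for illustration.

def concentration(x):
    return 1.0 / (1.0 + x * x)   # highest at x = 0, falling off on both sides

def chemotaxis(steps=500, seed=42):
    rng = random.Random(seed)
    x, direction = 10.0, 1       # start far from the glucose, heading away from it
    last = concentration(x)
    for _ in range(steps):
        if concentration(x) < last:          # things are getting worse:
            direction = rng.choice([-1, 1])  #   "tumble" to a random heading
        last = concentration(x)              # otherwise keep "swimming"
        x += 0.2 * direction
    return x

print("final position:", round(chemotaxis(), 2))

Starting well away from the glucose peak at x = 0, the simulated cell drifts toward it and then hovers nearby, even though it never senses anything more than "better or worse than a moment ago."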
Perhaps you are thinking, “But evolution built this into its genes
too! How does that make it intelligent?” This is a dangerous line of
reasoning, because evolution built the basic design of your brain into
your genes too, and presumably you wouldn't wish to deny your own
intelligence on that basis. The point is that what evolution has built
into E. coli's genes, as it has into yours, is a mechanism whereby the
bacterium’s behavior varies according to what it perceives in its envi-
ronment. Evolution doesn’t know, in advance, where the glucose is
going to be or where your keys are, so putting the capability to find
them into the organism is the next best thing.
Now, E. coli is no intellectual giant. As far as we know, it doesn't
remember where it has been, so if it goes from A to B and finds no
glucose, it’s just as likely to go back to A. If we construct an environ-
ment where every attractive glucose gradient leads only to a spot of
phenol (which is a poison for E. coli), the bacterium will keep follow-
ing those gradients. It never learns. It has no brain, just a few simple
chemical reactions to do the job.
A big step forward occurred with action potentials, which are a form
of electrical signaling that first evolved in single-celled organisms
around a billion years ago. Later multicellular organisms evolved spe-
cialized cells called neurons that use electrical action potentials to carry
signals rapidly— up to 120 meters per second, or 270 miles per hour—
within the organism. The connections between neurons are called syn-
apses. The strength of the synaptic connection dictates how much
electrical excitation passes from one neuron to another. By changing
the strength of synaptic connections, animals learn.¹ Learning confers a
huge evolutionary advantage, because the animal can adapt to a range
of circumstances. Learning also speeds up the rate of evolution itself.
Initially, neurons were organized into nerve nets, which are distrib-
uted throughout the organism and serve to coordinate activities such
as eating and digestion or the timed contraction of muscle cells across
a wide area. The graceful propulsion of jellyfish is the result of a nerve
net. Jellyfish have no brains at all.
Brains came later, along with complex sense organs such as eyes
and ears. Several hundred million years after jellyfish emerged with
their nerve nets, we humans arrived with our big brains— a hundred
billion (10¹¹) neurons and a quadrillion (10¹⁵) synapses. While slow
compared to electronic circuits, the “cycle time” of a few milliseconds
per state change is fast compared to most biological processes. The
human brain is often described by its owners as “the most complex
object in the universe,” which probably isn’t true but is a good excuse
for the fact that we still understand little about how it really works.
While we know a great deal about the biochemistry of neurons and
synapses and the anatomical structures of the brain, the neural imple-
mentation of the cognitive level— learning, knowing, remembering,
reasoning, planning, deciding, and so on— is still mostly anyone's
guess.² (Perhaps that will change as we understand more about AI, or
as we develop ever more precise tools for measuring brain activity.)
So, when one reads in the media that such-and-such AI technique
“works just like the human brain,” one may suspect it's either just
someone’s guess or plain fiction.
In the area of consciousness, we really do know nothing, so I’m go-
ing to say nothing. No one in AI is working on making machines con-
scious, nor would anyone know where to start, and no behavior has
consciousness as a prerequisite. Suppose I give you a program and ask,
“Does this present a threat to humanity?” You analyze the code and
indeed, when run, the code will form and carry out a plan whose re-
sult will be the destruction of the human race, just as a chess program
will form and carry out a plan whose result will be the defeat of any
human who faces it. Now suppose I tell you that the code, when run,
also creates a form of machine consciousness. Will that change your
prediction? Not at all. It makes absolutely no difference.³ Your predic-
tion about its behavior is exactly the same, because the prediction is
based on the code. All those Hollywood plots about machines myste-
riously becoming conscious and hating humans are really missing the
point: it's competence, not consciousness, that matters.
There is one important cognitive aspect of the brain that we are
beginning to understand—namely, the reward system. This is an inter-
nal signaling system, mediated by dopamine, that connects positive
and negative stimuli to behavior. Its workings were discovered by
the Swedish neuroscientist Nils-Åke Hillarp and his collaborators in
the late 1950s. It causes us to seek out positive stimuli, such as sweet-
tasting foods, that increase dopamine levels; it makes us avoid negative
stimuli, such as hunger and pain, that decrease dopamine levels. In a
sense it's quite similar to E. coli's glucose-seeking mechanism, but
much more complex. It comes with built-in methods for learning, so
that our behavior becomes more effective at obtaining reward over
time. It also allows for delayed gratification, so that we learn to desire
things such as money that provide eventual reward rather than imme-
diate reward. One reason we understand the brain's reward system is
that it resembles the method of reinforcement learning developed in AI,
for which we have a very solid theory.⁴
From an evolutionary point of view, we can think of the brain's
reward system, just like E. coli's glucose-seeking mechanism, as a way
of improving evolutionary fitness. Organisms that are more effective
in seeking reward— that is, finding delicious food, avoiding pain, en-
gaging in sexual activity, and so on— are more likely to propagate their
genes. It is extraordinarily difficult for an organism to decide what
actions are most likely, in the long run, to result in successful propa-
gation of its genes, so evolution has made it easier for us by providing
built-in signposts.
These signposts are not perfect, however. There are ways to obtain
reward that probably reduce the likelihood that one’s genes will prop-
agate. For example, taking drugs, drinking vast quantities of sugary
carbonated beverages, and playing video games for eighteen hours a
day all seem counterproductive in the reproduction stakes. Moreover,
if you were given direct electrical access to your reward system, you
would probably self-stimulate without stopping until you died.⁵
The misalignment of reward signals and evolutionary fitness
doesn't affect only isolated individuals. On a small island off the coast
of Panama lives the pygmy three-toed sloth, which appears to be ad-
dicted to a Valium-like substance in its diet of red mangrove leaves
and may be going extinct.⁶ Thus, it seems that an entire species can
disappear if it finds an ecological niche where it can satisfy its reward
system in a maladaptive way.
Barring these kinds of accidental failures, however, learning to
maximize reward in natural environments will usually improve one’s
chances for propagating one’s genes and for surviving environmental
changes.
Evolutionary accelerator
Learning is good for more than surviving and prospering. It also
speeds up evolution. How could this be? After all, learning doesn’t
change one’s DNA, and evolution is all about changing DNA over
generations. The connection between learning and evolution was pro-
posed in 1896 by the American psychologist James Baldwin⁷ and in-
dependently by the British ethologist Conwy Lloyd Morgan⁸ but not
generally accepted at the time.
The Baldwin effect, as it is now known, can be understood by
imagining that evolution has a choice between creating an instinctive
organism whose every response is fixed in advance and creating an
adaptive organism that learns what actions to take. Now suppose, for
the purposes of illustration, that the optimal instinctive organism can
be coded as a six-digit number, say, 472116, while in the case of the
adaptive organism, evolution specifies only 472*** and the organism
itself has to fill in the last three digits by learning during its lifetime.
Clearly, if evolution has to worry about choosing only the first three
digits, its job is much easier; the adaptive organism, in learning the last
three digits, is doing in one lifetime what evolution would have taken
many generations to do. So, provided the adaptive organisms can sur-
vive while learning, it seems that the capability for learning consti-
tutes an evolutionary shortcut. Computational simulations suggest
that the Baldwin effect is real.⁹ The effects of culture only accelerate
the process, because an organized civilization protects the individual
organism while it is learning and passes on information that the indi-
vidual would otherwise need to learn for itself.
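For readers who want to see the digit example run, here is a cartoon of it in Python. It treats evolution as blind variation in a population of fifty and, as in the text, simply assumes that the adaptive organism can learn its last three digits during its lifetime; the population size, the search limit, and the all-or-nothing scoring are arbitrary choices made for illustration, not a biological model.

import random

# Cartoon of the 472116 example. An "instinctive" organism must be born
# with all six digits correct; an "adaptive" one needs only the first
# three in its genes and is assumed to learn the remaining three during
# its lifetime. We count generations of blind variation until some
# individual in a population of 50 has the required genetic digits.

TARGET = "472116"

def generations_until_found(genetic_digits, population=50, seed=0, limit=1_000_000):
    rng = random.Random(seed)
    needed = TARGET[:genetic_digits]
    for generation in range(1, limit + 1):
        for _ in range(population):
            candidate = "".join(rng.choice("0123456789") for _ in range(genetic_digits))
            if candidate == needed:
                return generation
    return None

print("instinctive (all six digits in the genes):   ", generations_until_found(6))
print("adaptive (three in the genes, three learned):", generations_until_found(3))

With these numbers the three-digit lineage typically succeeds within a few dozen generations while the six-digit lineage needs tens of thousands, which is the "evolutionary shortcut" in miniature; the price, as noted above, is that the adaptive organism has to survive while it is still learning.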
The story of the Baldwin effect is fascinating but incomplete: it
assumes that learning and evolution necessarily point in the same di-
rection. That is, it assumes that whatever internal feedback signal de-
fines the direction of learning within the organism is perfectly aligned
with evolutionary fitness. As we have seen in the case of the pygmy
three-toed sloth, this does not seem to be true. At best, built-in mech-
anisms for learning provide only a crude hint of the long-term conse-
quences of any given action for evolutionary fitness. Moreover, one has
to ask, “How did the reward system get there in the first place?” The
answer, of course, is by an evolutionary process, one that internalized
a feedback mechanism that is at least somewhat aligned with evolu-
tionary fitness.¹⁰ Clearly, a learning mechanism that caused organisms
to run away from potential mates and towards predators would not
last long.
Thus, we have the Baldwin effect to thank for the fact that neu-
rons, with their capabilities for learning and problem solving, are so
widespread in the animal kingdom. At the same time, it is important
to understand that evolution doesn't really care whether you have a
brain or think interesting thoughts. Evolution considers you only as an
agent, that is, something that acts. Such worthy intellectual character-
istics as logical reasoning, purposeful planning, wisdom, wit, imagina-
tion, and creativity may be essential for making an agent intelligent, or
they may not. One reason artificial intelligence is so fascinating is that
it offers a potential route to understanding these issues: we may come
to understand both how these intellectual characteristics make intel-
ligent behavior possible and why it’s impossible to produce truly intel-
ligent behavior without them.
Rationality for one
From the earliest beginnings of ancient Greek philosophy, the con-
cept of intelligence has been tied to the ability to perceive, to reason,
and to act successfully.¹¹ Over the centuries, the concept has become
both broader in its applicability and more precise in its definition.
Aristotle, among others, studied the notion of successful reasoning—
methods of logical deduction that would lead to true conclusions given
true premises. He also studied the process of deciding how to act—
sometimes called practical reasoning— and proposed that it involved
deducing that a certain course of action would achieve a desired goal:
We deliberate not about ends, but about means. For a doctor does
not deliberate whether he shall heal, nor an orator whether he
shall persuade.... They assume the end and consider how and by
what means it is attained, and if it seems easily and best produced
thereby; while if it is achieved by one means only they consider
how it will be achieved by this and by what means this will be
achieved, till they come to the first cause... and what is last in the
order of analysis seems to be first in the order of becoming. And if
we come on an impossibility, we give up the search, e.g., if we
need money and this cannot be got; but if a thing appears possible
we try to do it.¹²
This passage, one might argue, set the tone for the next two-thousand-
odd years of Western thought about rationality. It says that the “end”—
what the person wants— is fixed and given; and it says that the rational
action is one that, according to logical deduction across a sequence of
actions, “easily and best” produces the end.
Aristotle's proposal seems reasonable, but it isn't a complete guide
to rational behavior. In particular, it omits the issue of uncertainty. In
the real world, reality has a tendency to intervene, and few actions or
sequences of actions are truly guaranteed to achieve the intended end.
For example, it is a rainy Sunday in Paris as I write this sentence, and
on Tuesday at 2:15 p.m. my flight to Rome leaves from Charles de
Gaulle Airport, about forty-five minutes from my house. I plan to
leave for the airport around 11:30 a.m., which should give me plenty
of time, but it probably means at least an hour sitting in the departure
area. Am I certain to catch the flight? Not at all. There could be huge
traffic jams, the taxi drivers may be on strike, the taxi I’m in may
break down or the driver may be arrested after a high-speed chase,
and so on. Instead, I could leave for the airport on Monday, a whole
day in advance. This would greatly reduce the chance of missing the
flight, but the prospect of a night in the departure lounge is not an
appealing one. In other words, my plan involves a trade-off between
the certainty of success and the cost of ensuring that degree of cer-
tainty. The following plan for buying a house involves a similar trade-
off: buy a lottery ticket, win a million dollars, then buy the house.
This plan “easily and best” produces the end, but it’s not very likely to
succeed. The difference between this harebrained house-buying plan
and my sober and sensible airport plan is, however, just a matter of
degree. Both are gambles, but one seems more rational than the other.
It turns out that gambling played a central role in generalizing Ar-
istotle’s proposal to account for uncertainty. In the 1560s, the Italian
mathematician Gerolamo Cardano developed the first mathemati-
cally precise theory of probability— using dice games as his main ex-
ample. (Unfortunately, his work was not published until 1663.¹³) In
the seventeenth century, French thinkers including Antoine Arnauld
and Blaise Pascal began— for assuredly mathematical reasons— to
study the question of rational decisions in gambling.¹⁴ Consider the
following two bets:
A: 20 percent chance of winning $10
B: 5 percent chance of winning $100
The proposal the mathematicians came up with is probably the same
one you would come up with: compare the expected values of the bets,
which means the average amount you would expect to get from each
bet. For bet A, the expected value is 20 percent of $10, or $2. For bet
B, the expected value is 5 percent of $100, or $5. So bet B is better,
according to this theory. The theory makes sense, because if the same
bets are offered over and over again, a bettor who follows the rule ends
up with more money than one who doesn’t.
In the eighteenth century, the Swiss mathematician Daniel Ber-
noulli noticed that this rule didn't seem to work well for larger amounts
of money.¹⁵ For example, consider the following two bets:
A: 100 percent chance of getting $10,000,000
(expected value $10,000,000)
B: 1 percent chance of getting $1,000,000,100
(expected value $10,000,001)
Most readers of this book, as well as its author, would prefer bet A to
bet B, even though the expected-value rule says the opposite! Ber-
noulli posited that bets are evaluated not according to expected mon-
etary value but according to expected utility. Utility— the property of
being useful or beneficial to a person— was, he suggested, an internal,
subjective quantity related to, but distinct from, monetary value. In
particular, utility exhibits diminishing returns with respect to money.
This means that the utility of a given amount of money is not strictly
proportional to the amount but grows more slowly. For example, the
utility of having $1,000,000,100 is much less than a hundred times
the utility of having $10,000,000. How much less? You can ask your-
self! What would the odds of winning a billion dollars have to be for
you to give up a guaranteed ten million? I asked this question of the
graduate students in my class and their answer was around 50 percent,
meaning that bet B would have an expected value of $500 million to
match the desirability of bet A. Let me say that again: bet B would
have an expected dollar value fifty times greater than bet A, but the
two bets would have equal utility.
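In symbols, and taking the utility of winning nothing as the zero point (a normalization chosen here for convenience; the book states the argument only in words), the class's answer is the indifference condition

\[
U(\$10{,}000{,}000) \;=\; \tfrac{1}{2}\,U(\$1{,}000{,}000{,}100) \;+\; \tfrac{1}{2}\,U(\$0)
\quad\Longrightarrow\quad
U(\$1{,}000{,}000{,}100) \;\approx\; 2\,U(\$10{,}000{,}000).
\]

That is, a hundredfold increase in money buys these students only about a doubling of utility, which is Bernoulli's diminishing returns made explicit.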
Bernoulli's introduction of utility— an invisible property— to ex-
plain human behavior via a mathematical theory was an utterly re-
markable proposal for its time. It was all the more remarkable for the
fact that, unlike monetary amounts, the utility values of various bets
and prizes are not directly observable; instead, utilities are to be in-
ferred from the preferences exhibited by an individual. It would be two
centuries before the implications of the idea were fully worked out
and it became broadly accepted by statisticians and economists.
In the middle of the twentieth century, John von Neumann (a
great mathematician after whom the standard “von Neumann archi-
tecture” for computers was named¹⁶) and Oskar Morgenstern pub-
lished an axiomatic basis for utility theory.¹⁷ What this means is the
following: as long as the preferences exhibited by an individual satisfy
certain basic axioms that any rational agent should satisfy, then neces-
sarily the choices made by that individual can be described as maxi-
mizing the expected value of a utility function. In short, a rational
agent acts so as to maximize expected utility.
It’s hard to overstate the importance of this conclusion. In many
ways, artificial intelligence has been mainly about working out the
details of how to build rational machines.
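Written as a formula, in standard notation (chosen here for illustration), the conclusion is that a rational agent selects

\[
a^{*} \;=\; \arg\max_{a}\; \mathbb{E}\!\left[\,U \mid a\,\right]
\;=\; \arg\max_{a}\; \sum_{s} P(s \mid a)\, U(s),
\]

where s ranges over the possible outcomes of action a, P(s | a) is their probability, and U(s) is the utility the agent attaches to each outcome.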
Let’s look in a bit more detail at the axioms that rational entities
are expected to satisfy. Here's one, called transitivity: if you prefer A
to B and you prefer B to C, then you prefer A to C. This seems pretty
reasonable! (If you prefer sausage pizza to plain pizza, and you prefer
plain pizza to pineapple pizza, then it seems reasonable to predict that
you will choose sausage pizza over pineapple pizza.) Here's another,
called monotonicity: if you prefer prize A to prize B, and you have a
choice of lotteries where A and B are the only two possible outcomes,
you prefer the lottery with the highest probability of getting A rather
than B. Again, pretty reasonable.
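In compact form, writing A ≻ B for "A is preferred to B" and [p : A; 1 − p : B] for a lottery that yields A with probability p and B otherwise (notation introduced here purely for convenience), the two axioms just described read:

\[
\text{Transitivity:}\quad A \succ B \ \text{ and }\ B \succ C \;\Rightarrow\; A \succ C.
\]
\[
\text{Monotonicity:}\quad A \succ B \ \text{ and }\ p > q \;\Rightarrow\; [\,p : A;\ 1-p : B\,] \;\succ\; [\,q : A;\ 1-q : B\,].
\]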
Preferences are not just about pizza and lotteries with monetary
prizes. They can be about anything at all; in particular, they can be
about entire future lives and the lives of others. When dealing with
preferences involving sequences of events over time, there is an addi-
tional assumption that is often made, called stationarity: if two differ-
ent futures A and B begin with the same event, and you prefer A to
B, you still prefer A to B after the event has occurred. This sounds
reasonable, but it has a surprisingly strong consequence: the utility of
any sequence of events is the sum of rewards associated with each
event (possibly discounted over time, by a sort of mental interest
rate).¹⁸ Although this “utility as a sum of rewards” assumption is
widespread— going back at least to the eighteenth-century “hedonic
calculus” of Jeremy Bentham, the founder of utilitarianism— the sta-
tionarity assumption on which it is based is not a necessary property
of rational agents. Stationarity also rules out the possibility that one’s
preferences might change over time, whereas our experience indicates
otherwise.
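The "utility as a sum of rewards" assumption mentioned above is usually written with a discount factor γ playing the role of the mental interest rate (again, standard notation rather than a quotation):

\[
U(s_0, s_1, s_2, \dots) \;=\; \sum_{t=0}^{\infty} \gamma^{\,t}\, R(s_t), \qquad 0 < \gamma \le 1,
\]

where R(s_t) is the reward attached to the event at time t; the stationarity assumption is what forces utilities over event sequences into this additive form.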
Despite the reasonableness of the axioms and the importance of
the conclusions that follow from them, utility theory has been sub-
jected to a continual barrage of objections since it first became widely
known. Some despise it for supposedly reducing everything to money
and selfishness. (The theory was derided as “American” by some French
authors,¹⁹ even though it has its roots in France.) In fact, it is perfectly
rational to want to live a life of self-denial, wishing only to reduce the
suffering of others. Altruism simply means placing substantial weight
on the well-being of others in evaluating any given future.
Another set of objections has to do with the difficulty of obtaining
the necessary probabilities and utility values and multiplying them
together to calculate expected utilities. These objections are simply
confusing two different things: choosing the rational action and choos-
ing it by calculating expected utilities. For example, if you try to poke
your eyeball with your finger, your eyelid closes to protect your eye;
this is rational, but no expected- utility calculations are involved. Or
suppose you are riding a bicycle downhill with no brakes and have a
choice between crashing into one concrete wall at ten miles per hour
or another, farther down the hill, at twenty miles per hour; which
would you prefer? If you chose ten miles per hour, congratulations!
Did you calculate expected utilities? Probably not. But the choice of
ten miles per hour is still rational. This follows from two basic as-
sumptions: first, you prefer less severe injuries to more severe injuries,
and second, for any given level of injuries, increasing the speed of
collision increases the probability of exceeding that level. From these
two assumptions it follows mathematically— without considering any
numbers at all— that crashing at ten miles per hour has higher ex-
pected utility than crashing at twenty.^20 In summary, maximizing
expected utility may not require calculating any expectations or any
utilities. It’s a purely external description of a rational entity.
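To see how that argument runs numerically, here is a small sketch; the injury probabilities are invented for illustration. As long as the twenty-mile-per-hour crash shifts probability toward worse injuries, every utility function that merely prefers less severe injuries ranks the slower crash higher.

```python
# Illustrative only: two invented probability distributions over injury
# severity (0 = unhurt ... 3 = severe), where crashing at 20 mph shifts
# probability mass toward worse outcomes compared with 10 mph.
import random

p_injury = {
    10: [0.60, 0.25, 0.10, 0.05],   # crash at 10 mph
    20: [0.30, 0.30, 0.25, 0.15],   # crash at 20 mph (stochastically worse)
}

def expected_utility(speed, utility):
    return sum(p * u for p, u in zip(p_injury[speed], utility))

# Try many random utilities that only assume "less injury is better".
random.seed(0)
for _ in range(1000):
    utility = sorted([random.uniform(-100, 0) for _ in range(4)], reverse=True)
    assert expected_utility(10, utility) >= expected_utility(20, utility)
print("10 mph is never worse, for every decreasing utility sampled")
```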
Another critique of the theory of rationality lies in the identifica-
tion of the locus of decision making. That is, what things count as
agents? It might seem obvious that humans are agents, but what about
families, tribes, corporations, cultures, and nation- states? If we exam-
ine social insects such as ants, does it make sense to consider a single
ant as an intelligent agent, or does the intelligence really lie in the
colony as a whole, with a kind of composite brain made up of multiple
ant brains and bodies that are interconnected by pheromone signaling
instead of electrical signaling? From an evolutionary point of view, this
may be a more productive way of thinking about ants, since the ants
in a given colony are typically closely related. As individuals, ants and
other social insects seem to lack an instinct for self- preservation as
distinct from colony preservation: they will always throw themselves
into battle against invaders, even at suicidal odds. Yet sometimes
humans will do the same even to defend unrelated humans; it is as if
the species benefits from the presence of some fraction of individuals
who are willing to sacrifice themselves in battle, or to go off on wild,
speculative voyages of exploration, or to nurture the offspring of oth-
ers. In such cases, an analysis of rationality that focuses entirely on the
individual is clearly missing something essential.
The other principal objections to utility theory are empirical—
that is, they are based on experimental evidence suggesting that hu-
mans are irrational. We fail to conform to the axioms in systematic
ways.^21 It is not my purpose here to defend utility theory as a formal
model of human behavior. Indeed, humans cannot possibly behave
rationally. Our preferences extend over the whole of our own future
lives, the lives of our children and grandchildren, and the lives of oth-
ers, living now or in the future. Yet we cannot even play the right
moves on the chessboard, a tiny, simple place with well- defined rules
and a very short horizon. This is not because our preferences are irra-
tional but because of the complexity of the decision problem. A great
deal of our cognitive structure is there to compensate for the mis-
match between our small, slow brains and the incomprehensibly huge
complexity of the decision problem that we face all the time.
So, while it would be quite unreasonable to base a theory of bene-
ficial AI on an assumption that humans are rational, it’s quite reason-
able to suppose that an adult human has roughly consistent preferences
over future lives. That is, if you were somehow able to watch two movies,
each describing in sufficient detail and breadth a future life you might
lead, such that each constitutes a virtual experience, you could say which
you prefer, or express indifference.^22
This claim is perhaps stronger than necessary, if our only goal is to
make sure that sufficiently intelligent machines are not catastrophic
for the human race. The very notion of catastrophe entails a definitely-
not-preferred life. For catastrophe avoidance, then, we need claim
only that adult humans can recognize a catastrophic future when it is
spelled out in great detail. Of course, human preferences have a much
more fine-grained and, presumably, ascertainable structure than just
“non-catastrophes are better than catastrophes.”
A theory of beneficial AI can, in fact, accommodate inconsistency
in human preferences, but the inconsistent part of your preferences
can never be satisfied and there’s nothing AI can do to help. Suppose,
for example, that your preferences for pizza violate the axiom of
transitivity:
ROBOT: Welcome home! Want some pineapple pizza?
YOU: No, you should know I prefer plain pizza to pineapple.
ROBOT: OK, one plain pizza coming up!
YOU: No thanks, I like sausage pizza better.
ROBOT: So sorry, one sausage pizza!
YOU: Actually, I prefer pineapple to sausage.
ROBOT: My mistake, pineapple it is!
YOU: I already said I like plain better than pineapple.
There is no pizza the robot can serve that will make you happy
because there’s always another pizza you would prefer to have. A ro-
bot can satisfy only the consistent part of your preferences— for exam-
ple, let’s say you prefer all three kinds of pizza to no pizza at all. In
that case, a helpful robot could give you any one of the three pizzas,
thereby satisfying your preference to avoid “no pizza” while leaving
you to contemplate your annoyingly inconsistent pizza topping prefer-
ences at leisure.
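For what it is worth, the robot’s predicament can be sketched in a few lines of Python; the pairwise judgments are the ones from the dialogue, and the code merely detects the cycle before falling back on the consistent "any pizza beats no pizza" part.

```python
# Pairwise preferences from the dialogue: each pair (a, b) means "a is
# preferred to b". Together they form a cycle, so no single pizza is best.
prefers = [("plain", "pineapple"), ("sausage", "plain"), ("pineapple", "sausage")]

def has_cycle(prefs):
    """Follow 'preferred to' links; if we revisit an option, preferences cycle."""
    better_than = dict(prefs)          # maps each option to one it beats
    for start in better_than:
        seen, current = set(), start
        while current in better_than:
            if current in seen:
                return True
            seen.add(current)
            current = better_than[current]
    return False

print(has_cycle(prefers))              # True: the robot can't fully satisfy you
# The consistent part (any pizza beats no pizza) is still usable:
print("serving some pizza rather than none")
```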
Rationality for two
The basic idea that a rational agent acts so as to maximize ex-
pected utility is simple enough, even if actually doing it is impossibly
complex. The theory applies, however, only in the case of a single
agent acting alone. With more than one agent, the notion that it’s
possible— at least in principle— to assign probabilities to the different
outcomes of one’s actions becomes problematic. The reason is that
now there’s a part of the world— the other agent— that is trying to
second-guess what action you’re going to do, and vice versa, so it’s not
obvious how to assign probabilities to how that part of the world is
going to behave. And without probabilities, the definition of rational
action as maximizing expected utility isn’t applicable.
As soon as someone else comes along, then, an agent will need
some other way to make rational decisions. This is where game theory
comes in. Despite its name, game theory isn’t necessarily about games
in the usual sense; it’s a general attempt to extend the notion of ratio-
nality to situations with multiple agents. This is obviously important
for our purposes, because we aren’t planning (yet) to build robots that
live on uninhabited planets in other star systems; we’re going to put
the robots in our world, which is inhabited by us.
To make it clear why we need game theory, let’s look at a simple ex-
ample: Alice and Bob playing soccer in the back garden (figure 3). Alice
is about to take a penalty kick and Bob is in goal. Alice is going to shoot
FIGURE 3: Alice about to take a penalty kick against Bob.
to Bob’s left or to his right. Because she is right-footed, it’s a little bit
easier and more accurate for Alice to shoot to Bob’s right. Because Alice
has a ferocious shot, Bob knows he has to dive one way or the other right
away— he won’t have time to wait and see which way the ball is going.
Bob could reason like this: “Alice has a better chance of scoring if she
shoots to my right, because she’s right-footed, so she’ll choose that, so
I’ll dive right.” But Alice is no fool and can imagine Bob thinking this
way, in which case she will shoot to Bob’s left. But Bob is no fool and can
imagine Alice thinking this way, in which case he will dive to his left.
But Alice is no fool and can imagine Bob thinking this way.... OK, you
get the idea. Put another way: if there is a rational choice for Alice, Bob
can figure it out too, anticipate it, and stop Alice from scoring, so the
choice couldn’t have been rational in the first place.
As early as 1713— once again, in the analysis of gambling games— a
solution was found to this conundrum.^23 The trick is not to choose any
one action but to choose a randomized strategy. For example, Alice can
choose the strategy “shoot to Bob’s right with probability 55 percent
and shoot to his left with probability 45 percent.” Bob could choose
“dive right with probability 60 percent and left with probability 40
percent.” Each mentally tosses a suitably biased coin just before act-
ing, so they don’t give away their intentions. By acting unpredictably,
Alice and Bob avoid the contradictions of the preceding paragraph.
Even if Bob works out what Alice’s randomized strategy is, there’s not
much he can do about it without a crystal ball.
The next question is, What should the probabilities be? Is Alice’s
choice of 55 percent–45 percent rational? The specific values depend
on how much more accurate Alice is when shooting to Bob’s right,
how good Bob is at saving the shot when he dives the right way, and so
on. (See the notes for the complete analysis.^24) The general criterion is
very simple, however:
1. Alice’s strategy is the best she can devise, assuming that Bob’s
is fixed.
2. Bob’s strategy is the best he can devise, assuming that Alice’s
is fixed.
If both conditions are satisfied, we say that the strategies are in
equilibrium. This kind of equilibrium is called a Nash equilibrium in
honor of John Nash, who, in 1950 at the age of twenty- two, proved
that such an equilibrium exists for any number of agents with any ra-
tional preferences and no matter what the rules of the game might be.
After several decades’ struggle with schizophrenia, Nash eventually
recovered and was awarded the Nobel Memorial Prize in Economics
for this work in 1994.
For Alice and Bob’s soccer game, there is only one equilibrium. In
other cases, there may be several, so the concept of Nash equilibria,
unlike that of expected- utility decisions, does not always lead to a
unique recommendation for how to behave.
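To make the two best-response conditions concrete, here is a minimal sketch that solves a two-by-two version of the penalty-kick game with the standard indifference argument. The scoring probabilities are invented for illustration (the endnote cited above has the real analysis), so the resulting percentages differ slightly from the 55 percent example.

```python
# A 2 x 2 penalty-kick game, solved by making each player indifferent
# between the other's options, which is the usual recipe for finding a
# mixed-strategy equilibrium. The scoring probabilities are invented.

score = {                      # probability Alice scores, by (her shot, Bob's dive)
    ("right", "right"): 0.60,  # Bob guesses correctly on Alice's strong side
    ("right", "left"):  0.95,
    ("left",  "left"):  0.50,
    ("left",  "right"): 0.90,
}

# Alice shoots right with probability q chosen so Bob gains nothing by
# switching between his two dives; Bob dives right with probability r
# chosen so Alice gains nothing by switching between her two shots.
denom = (score[("right", "right")] - score[("right", "left")]
         - score[("left", "right")] + score[("left", "left")])
q = (score[("left", "left")] - score[("left", "right")]) / denom
r = (score[("left", "left")] - score[("right", "left")]) / denom

print(f"Alice shoots right {q:.0%} of the time; Bob dives right {r:.0%}")
```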
Worse still, there are situations in which the Nash equilibrium seems
to lead to highly undesirable outcomes. One such case is the famous
prisoner’s dilemma, so named by Nash’s PhD adviser, Albert Tucker, in
1950.^25 The game is an abstract model of those all-too-common real-
world situations where mutual cooperation would be better for all
concerned but people nonetheless choose mutual destruction.
The prisoner’s dilemma works as follows: Alice and Bob are sus-
pects in a crime and are being interrogated separately. Each has a
choice: to confess to the police and rat on his or her accomplice, or
to refuse to talk.^26 If both refuse, they are convicted on a lesser
charge and serve two years; if both confess, they are convicted on a
more serious charge and serve ten years; if one confesses and the other
refuses, the one who confesses goes free and the accomplice serves
twenty years.
Now, Alice reasons as follows: “If Bob is going to confess, then I
should confess too (ten years is better than twenty); if he is going to
refuse, then I should confess (going free is better than spending two
years in prison); so either way, I should confess.” Bob reasons the same
way. Thus, they both end up confessing to their crimes and serving ten
years, even though by jointly refusing they could have served only two
years. The problem is that joint refusal isn’t a Nash equilibrium, because
each has an incentive to defect and go free by confessing.
Note that Alice could have reasoned as follows: “Whatever reason-
ing I do, Bob will also do. So we’ll end up choosing the same thing.
Since joint refusal is better than joint confession, we should refuse.”
This form of reasoning acknowledges that, as rational agents, Alice
and Bob will make choices that are correlated rather than indepen-
dent. It’s just one of many approaches that game theorists have tried
in their efforts to obtain less depressing solutions to the prisoner’s
dilemma.^27
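Laid out as a table, the dilemma is easy to check mechanically. The sketch below uses the sentences from the text (in years of prison, so lower is better) and confirms that confessing is Alice's best response whatever Bob does.

```python
# Years in prison for (Alice, Bob), indexed by their choices; lower is better.
# Numbers as in the text: 2 each if both refuse, 10 each if both confess,
# 0 for a lone confessor and 20 for the accomplice who refused.
years = {
    ("refuse",  "refuse"):  (2, 2),
    ("confess", "confess"): (10, 10),
    ("confess", "refuse"):  (0, 20),
    ("refuse",  "confess"): (20, 0),
}

def alice_best_response(bob_choice):
    return min(["refuse", "confess"], key=lambda a: years[(a, bob_choice)][0])

for bob in ["refuse", "confess"]:
    print(f"If Bob plans to {bob}, Alice's best response is {alice_best_response(bob)}")
# Both lines print "confess": (confess, confess) is the only Nash equilibrium,
# even though (refuse, refuse) would leave both players better off.
```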
Another famous example of an undesirable equilibrium is the trag-
edy of the commons, first analyzed in 1833 by the English economist
William Lloyd^28 but named, and brought to global attention, by the
ecologist Garrett Hardin in 1968.^29 The tragedy arises when several
people can consume a shared resource— such as common grazing land
or fish stocks— that replenishes itself slowly. Absent any social or legal
constraints, the only Nash equilibrium among selfish (non-altruistic)
agents is for each to consume as much as possible, leading to rapid
collapse of the resource. The ideal solution, where everyone shares the
resource such that the total consumption is sustainable, is not an equi-
librium because each individual has an incentive to cheat and take
more than their fair share, imposing the costs on others. In practice,
of course, humans do sometimes avoid this tragedy by setting up
mechanisms such as quotas and punishments or pricing schemes.
They can do this because they are not limited to deciding how much
to consume; they can also decide to communicate. By enlarging the
decision problem in this way, we find solutions that are better for
everyone.
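A toy version of the tragedy, with invented numbers, makes the incentive structure visible: whatever the others do, each herder does best by grazing as many animals as possible, yet when everyone does so, everyone ends up worse off than under the sustainable split.

```python
# Toy commons model (numbers invented): 10 herders each graze 1-3 animals.
# The per-animal value of the shared pasture falls as total grazing rises.
N, CHOICES = 10, [1, 2, 3]

def value_per_animal(total):
    return max(0.0, 10 - 0.4 * total)

def payoff(my_animals, others_total):
    return my_animals * value_per_animal(others_total + my_animals)

# Even if everyone else sticks to the sustainable share of 1 animal each,
# a selfish herder still does best by grazing as much as possible:
others = (N - 1) * 1
print({c: payoff(c, others) for c in CHOICES})   # grazing 3 pays most

# Cooperative outcome vs. the "everyone grazes the maximum" outcome:
print("all graze 1:", payoff(1, (N - 1) * 1))    # 6.0 each
print("all graze 3:", payoff(3, (N - 1) * 3))    # 0.0 each: the tragedy
```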
These examples, and many others, illustrate the fact that extend-
ing the theory of rational decisions to multiple agents produces many
interesting and complex behaviors. It’s also extremely important
because, as should be obvious, there is more than one human being.
And soon there will be intelligent machines too. Needless to say, we
have to achieve mutual cooperation, resulting in benefit to humans,
rather than mutual destruction.
Computers
Having a reasonable definition of intelligence is the first ingredient in
creating intelligent machines. The second ingredient is a machine in
which that definition can be realized. For reasons that will soon be-
come obvious, that machine is a computer. It could have been some-
thing different— for example, we might have tried to make intelligent
machines out of complex chemical reactions or by hijacking biological
cells^30— but devices built for computation, from the very earliest me-
chanical calculators onwards, have always seemed to their inventors to
be the natural home for intelligence.
We are so used to computers now that we barely notice their ut-
terly incredible powers. If you have a laptop or a desktop or a smart
phone, look at it: a small box, with a way to type characters. Just by
typing, you can create programs that turn the box into something
new, perhaps something that magically synthesizes moving images of
oceangoing ships hitting icebergs or alien planets with tall blue people;
type some more, and it translates English into Chinese; type some
more, and it listens and speaks; type some more, and it defeats the
world chess champion.
This ability of a single box to carry out any process that you
can imagine is called universality, a concept first introduced by Alan
Turing in 1936.^31 Universality means that we do not need separate
machines for arithmetic, machine translation, chess, speech under-
standing, or animation: one machine does it all. Your laptop is essen-
tially identical to the vast server farms run by the world’s largest IT
companies— even those equipped with fancy, special- purpose tensor
processing units for machine learning. It’s also essentially identical to
all future computing devices yet to be invented. The laptop can do
exactly the same tasks, provided it has enough memory; it just takes a
lot longer.
Turing’s paper introducing universality was one of the most im-
portant ever written. In it, he described a simple computing device
that could accept as input the description of any other computing de-
vice, together with that second device’s input, and, by simulating the
operation of the second device on its input, produce the same output
that the second device would have produced. We now call this first
device a universal Turing machine. To prove its universality, Turing in-
troduced precise definitions for two new kinds of mathematical ob-
jects: machines and programs. Together, the machine and program
define a sequence of events— specifically, a sequence of state changes
in the machine and its memory.
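The idea of a machine described as data and then simulated can be sketched in a few lines; the simulator below is a bare-bones illustration, and the example machine (which just flips the bits of its input) is invented for the purpose.

```python
# A machine is just data: a transition table mapping (state, symbol) to
# (new symbol, head movement, new state). The simulator below takes any
# such description plus an input tape and runs it, which is the essential
# idea behind Turing's universal machine.

def run(machine, tape, state="start", blank="_", max_steps=10_000):
    tape = dict(enumerate(tape))          # tape cells indexed by position
    head = 0
    for _ in range(max_steps):
        if state == "halt":
            break
        symbol = tape.get(head, blank)
        new_symbol, move, state = machine[(state, symbol)]
        tape[head] = new_symbol
        head += {"R": 1, "L": -1, "stay": 0}[move]
    return "".join(tape[i] for i in sorted(tape)).strip(blank)

# Example machine (made up for illustration): flip every bit, then halt.
flipper = {
    ("start", "0"): ("1", "R", "start"),
    ("start", "1"): ("0", "R", "start"),
    ("start", "_"): ("_", "stay", "halt"),
}
print(run(flipper, "10110"))   # -> "01001"
```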
In the history of mathematics, new kinds of objects occur quite
rarely. Mathematics began with numbers at the dawn of recorded his-
tory. Then, around 2000 BCE, ancient Egyptians and Babylonians
worked with geometric objects (points, lines, angles, areas, and so on).
Chinese mathematicians introduced matrices during the first millen-
nium BCE, while sets as mathematical objects arrived only in the
nineteenth century. Turing’s new objects— machines and programs—
are perhaps the most powerful mathematical objects ever invented. It
is ironic that the field of mathematics largely failed to recognize this,
and from the 1940s onwards, computers and computation have been
the province of engineering departments in most major universities.
The field that emerged— computer science— exploded over the
next seventy years, producing a vast array of new concepts, designs,
methods, and applications, as well as seven of the eight most valuable
companies in the world.
The central concept in computer science is that of an algorithm,
which is a precisely specified method for computing something. Algo-
rithms are, by now, familiar parts of everyday life: a square- root
algorithm in a pocket calculator receives a number as input and re-
turns the square root of that number as output; a chess- playing algo-
rithm takes a chess position and returns a move; a route- finding
algorithm takes a start location, a goal location, and a street map and
returns the fastest route from start to goal. Algorithms can be de-
scribed in English or in mathematical notation, but to be implemented
they must be coded as programs in a programming language. More
complex algorithms can be built by using simpler ones as building
blocks called subroutines— for example, a self-driving car might use a
route- finding algorithm as a subroutine so that it knows where to go.
In this way, software systems of immense complexity are built up,
layer by layer.
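As a small illustration of algorithms and subroutines, here is a sketch with three layers: a square-root routine (Newton's classic iteration), a distance routine built on it, and a nearest-stop routine built on that. The inputs are invented.

```python
# A precisely specified method (Newton's iteration) for the square root,
# then two layers built on top of it: the "subroutines" idea in miniature.

def square_root(x, tolerance=1e-10):
    guess = x if x > 1 else 1.0
    while abs(guess * guess - x) > tolerance:
        guess = (guess + x / guess) / 2          # Newton's update
    return guess

def distance(a, b):                              # uses square_root as a subroutine
    return square_root((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2)

def nearest(depot, stops):                       # uses distance as a subroutine
    return min(stops, key=lambda s: distance(depot, s))

print(round(square_root(2.0), 6))                # 1.414214
print(nearest((0, 0), [(3, 4), (1, 1), (6, 8)])) # (1, 1)
```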
Computer hardware matters because faster computers with more
memory allow algorithms to run more quickly and to handle more
information. Progress in this area is well known but still mind-
boggling. The first commercial electronic programmable computer,
the Ferranti Mark I, could execute about a thousand (10^3) instructions
per second and had about a thousand bytes of main memory. The fast-
est computer as of early 2019, the Summit machine at the Oak Ridge
National Laboratory in Tennessee, executes about 10^18 instructions
per second (a thousand trillion times faster) and has 2.5 × 10^17 bytes of
memory (250 trillion times more). This progress has resulted from
advances in electronic devices and even in the underlying physics, al-
lowing an incredible degree of miniaturization.
Although comparisons between computers and brains are not es-
pecially meaningful, the numbers for Summit slightly exceed the raw
capacity of the human brain, which, as noted previously, has about
10^15 synapses and a “cycle time” of about one hundredth of a second,
for a theoretical maximum of about 10^17 “operations” per second. The
biggest difference is power consumption: Summit uses about a million
times more power.
Moore’s law, an empirical observation that the number of electronic
components on a chip doubles every two years, is expected to continue
until 2025 or so, although at a slightly slower rate. For some years,
speeds have been limited by the large amount of heat generated by the
fast switching of silicon transistors; moreover, circuit sizes cannot get
much smaller because the wires and connectors are (as of 2019) no
more than twenty- five atoms wide and five to ten atoms thick. Beyond
2025, we will need to use more exotic physical phenomena— including
negative capacitance devices,^32 single-atom transistors, graphene nano-
tubes, and photonics— to keep Moore’s law (or its successor) going.
Instead of just speeding up general- purpose computers, another
possibility is to build special- purpose devices that are customized to
perform just one class of computations. For example, Google’s tensor
processing units (TPUs) are designed to perform the calculations re-
quired for certain machine learning algorithms. One TPU pod (2018
version) performs roughly 10^17 calculations per second— nearly as much
as the Summit machine— but uses about one hundred times less
power and is one hundred times smaller. Even if the underlying chip
technology remains roughly constant, these kinds of machines can
simply be made larger and larger to provide vast quantities of raw
computational power for AI systems.
Quantum computation is a different kettle of fish. It uses the
strange properties of quantum- mechanical wave functions to achieve
something remarkable: with twice the amount of quantum hardware,
you can do more than twice the amount of computation! Very roughly,
it works like this:^33 Suppose you have a tiny physical device that stores
a quantum bit, or qubit. A qubit has two possible states, 0 and 1.
Whereas in classical physics the qubit device has to be in one of the
two states, in quantum physics the wave function that carries informa-
tion about the qubit says that it is in both states simultaneously. If you
have two qubits, there are four possible joint states: 00, 01, 10, and 11.
If the wave function is coherently entangled across the two qubits,
meaning that no other physical processes are there to mess it up, then
the two qubits are in all four states simultaneously. Moreover, if the
two qubits are connected into a quantum circuit that performs some
calculation, then the calculation proceeds with all four states simulta-
neously. With three qubits, you get eight states processed simultane-
ously, and so on. Now, there are some physical limitations so that the
amount of work that gets done is less than exponential in the number
of qubits,^34 but we know that there are important problems for which
quantum computation is provably more efficient than any classical
computer.
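The "all four states at once" picture can be illustrated, though certainly not sped up, by simulating the wave function on a classical machine; this sketch just tracks the four amplitudes of a two-qubit state.

```python
# Classical simulation of a two-qubit wave function: a vector of four
# amplitudes (complex in general), one for each joint state 00, 01, 10, 11.
# Applying a gate updates all four at once, which is why simulating n qubits
# classically costs 2**n amplitudes.
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # Hadamard gate on one qubit
I = np.eye(2)

state = np.zeros(4)
state[0] = 1.0                                 # start in |00>
state = np.kron(H, I) @ state                  # Hadamard on the first qubit
state = np.kron(I, H) @ state                  # Hadamard on the second qubit

for label, amplitude in zip(["00", "01", "10", "11"], state):
    print(label, round(float(amplitude), 3))   # all four states, amplitude 0.5
```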
As of 2019, there are experimental prototypes of small quantum
processors in operation with a few tens of qubits, but there are no in-
teresting computing tasks for which a quantum processor is faster
than a classical computer. The main difficulty is decoherence—
processes such as thermal noise that mess up the coherence of the
multi- qubit wave function. Quantum scientists hope to solve the
decoherence problem by introducing error correction circuitry, so that
any error that occurs in the computation is quickly detected and cor-
rected by a kind of voting process. Unfortunately, error- correcting
systems require far more qubits to do the same work: while a quantum
machine with a few hundred perfect qubits would be very powerful
compared to existing classical computers, we will probably need a few
million error- correcting qubits to actually realize those computations.
Going from a few tens to a few million qubits will take quite a few
years. If, eventually, we get there, that would completely change the
picture of what we can do by sheer brute-force computation.^35 Rather
than waiting for real conceptual advances in AI, we might be able to
use the raw power of quantum computation to bypass some of the
barriers faced by current “unintelligent” algorithms.
The limits of computation
Even in the 1950s, computers were described in the popular press
as “super-brains” that were “faster than Einstein.” So can we say now,
finally, that computers are as powerful as the human brain? No. Fo-
cusing on raw computing power misses the point entirely. Speed alone
wont give us AI. Running a poorly designed algorithm on a faster
computer doesn’t make the algorithm better; it just means you get the
wrong answer more quickly. (And with more data there are more op-
portunities for wrong answers!) The principal effect of faster ma-
chines has been to make the time for experimentation shorter, so that
research can progress more quickly. It’s not hardware that is holding
AI back; it’s software. We don’t yet know how to make a machine re-
ally intelligent even if it were the size of the universe.
Suppose, however, that we do manage to develop the right kind of
AI software. Are there any limits placed by physics on how powerful
a computer can be? Will those limits prevent us from having enough
computing power to create real AI? The answers seem to be yes, there
are limits, and no, there isn’t a ghost of a chance that the limits will
prevent us from creating real AI. MIT physicist Seth Lloyd has esti-
mated the limits for a laptop- sized computer, based on considerations
from quantum theory and entropy.^36 The numbers would raise even
Carl Sagan’s eyebrows: 10^51 operations per second and 10^30 bytes of
memory, or approximately a billion trillion trillion times faster and
four trillion times more memory than Summit— which, as noted pre-
viously, has more raw power than the human brain. Thus, when one
hears suggestions that the human mind represents an upper limit on
what is physically achievable in our universe,^37 one should at least ask
for further clarification.
Besides limits imposed by physics, there are other limits on the
abilities of computers that originate in the work of computer scien-
tists. Turing himself proved that some problems are undecidable by
any computer: the problem is well defined, there is an answer, but
there cannot exist an algorithm that always finds that answer. He gave
the example of what became known as the halting problem: Can an
algorithm decide if a given program has an “infinite loop” that pre-
vents it from ever finishing?^38
Turing’s proof that no algorithm can solve the halting problem^39 is
incredibly important for the foundations of mathematics, but it seems
to have no bearing on the issue of whether computers can be intelli-
gent. One reason for this claim is that the same basic limitation seems
to apply to the human brain. Once you start asking a human brain to
perform an exact simulation of itself simulating itself simulating itself,
and so on, you’re bound to run into difficulties. I, for one, have never
worried about my inability to do this.
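Turing's argument can be sketched as deliberately paradoxical Python: if a perfect halting checker existed, we could build a program that does the opposite of whatever the checker predicts about that very program.

```python
# Sketch of Turing's diagonal argument. Suppose, for contradiction, that
# some function `halts(program, data)` always answers correctly.

def halts(program, data):
    raise NotImplementedError("no such function can exist; that is the point")

def contrary(program):
    # Do the opposite of whatever `halts` predicts about running
    # `program` on its own source.
    if halts(program, program):
        while True:            # predicted to halt? then loop forever
            pass
    return "done"              # predicted to loop? then halt immediately

# Now ask: does contrary(contrary) halt? If halts says yes, it loops;
# if halts says no, it halts. Either answer is wrong, so `halts` cannot exist.
```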
Focusing on decidable problems, then, seems not to place any real
restrictions on AI. It turns out, however, that decidable doesn’t mean
easy. Computer scientists spend a lot of time thinking about the com-
plexity of problems, that is, the question of how much computation is
needed to solve a problem by the most efficient method. Here’s an
easy problem: given a list of a thousand numbers, find the biggest
number. If it takes one second to check each number, then it takes a
thousand seconds to solve this problem by the obvious method of
checking each in turn and keeping track of the biggest. Is there a faster
method? No, because if a method didn’t check some number in the
list, that number might be the biggest, and the method would fail. So,
the time to find the largest element is proportional to the size of the
list. A computer scientist would say the problem has linear complex-
ity, meaning that it’s very easy; then she would look for something
more interesting to work on.
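The "check each in turn" method is, for the record, a one-pass loop whose running time grows in direct proportion to the length of the list:

```python
# Linear-time scan: look at each of the n numbers exactly once,
# remembering the biggest seen so far. Doubling the list doubles the work.
def largest(numbers):
    biggest = numbers[0]
    for x in numbers[1:]:
        if x > biggest:
            biggest = x
    return biggest

print(largest([17, 4, 42, 8, 23]))   # 42
```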
What gets theoretical computer scientists excited is the fact that
many problems appear^40 to have exponential complexity in the worst
case. This means two things: first, all the algorithms we know about
require exponential time— that is, an amount of time exponential in
the size of the input— to solve at least some problem instances; sec-
ond, theoretical computer scientists are pretty sure that more efficient
algorithms do not exist.
Exponential growth in difficulty means that problems may be
solvable in theory (that is, they are certainly decidable) but sometimes
unsolvable in practice; we call such problems intractable. An example
is the problem of deciding whether a given map can be colored with
just three colors, so that no two adjacent regions have the same color.
(It is well known that coloring with four different colors is always
possible.) With a million regions, it may be that there are some cases
(not all, but some) that require something like 2^1000 computational
steps to find the answer, which means about 10^275 years on the Sum-
mit supercomputer or a mere 10^242 years on Seth Lloyd’s ultimate-
physics laptop. The age of the universe, about 10^10 years, is a tiny blip
compared to this.
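A brute-force checker makes the blow-up vivid: with n regions there are 3^n candidate colorings to try, which is hopeless long before n reaches a million (clever algorithms prune most of them, but in the worst case the known methods still take exponential time). The tiny map below is invented for the example.

```python
# Brute-force map coloring: try every assignment of three colors to the
# regions and keep one where no two adjacent regions match. With n regions
# there are 3**n assignments, exponential in the size of the map.
from itertools import product

regions = ["A", "B", "C", "D"]                      # a tiny made-up map
borders = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]

def three_color(regions, borders):
    for colors in product(["red", "green", "blue"], repeat=len(regions)):
        assignment = dict(zip(regions, colors))
        if all(assignment[x] != assignment[y] for x, y in borders):
            return assignment
    return None                                      # no 3-coloring exists

print(three_color(regions, borders))
# e.g. {'A': 'red', 'B': 'green', 'C': 'blue', 'D': 'red'}
```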
Does the existence of intractable problems give us any reason to
think that computers cannot be as intelligent as humans? No. There is
no reason to suppose that humans can solve intractable problems
either. Quantum computation helps a bit (whether in machines or
brains), but not enough to change the basic conclusion.
Complexity means that the real-world decision problem— the
problem of deciding what to do right now, at every instant in one’s
life— is so difficult that neither humans nor computers will ever come
close to finding perfect solutions.
This has two consequences: first, we expect that, most of the time,
real- world decisions will be at best halfway decent and certainly far
from optimal; second, we expect that a great deal of the mental archi-
tecture of humans and computers— the way their decision processes
actually operate— will be designed to overcome complexity to the ex-
tent possible—that is, to make it possible to find even halfway decent
answers despite the overwhelming complexity of the world. Finally,
we expect that the first two consequences will remain true no matter
how intelligent and powerful some future machine may be. The ma-
chine may be far more capable than us, but it will still be far from
perfectly rational.
Intelligent Computers
The development of logic by Aristotle and others made available pre-
cise rules for rational thought, but we do not know whether Aristotle
ever contemplated the possibility of machines that implemented these
rules. In the thirteenth century, the influential Catalan philosopher,
seducer, and mystic Ramon Llull came much closer: he actually made
paper wheels inscribed with symbols, by means of which he could
generate logical combinations of assertions. The great seventeenth-
century French mathematician Blaise Pascal was the first to develop a
real and practical mechanical calculator. Although it could only add
and subtract and was used mainly in his father’s tax- collecting office,
it led Pascal to write, “The arithmetical machine produces effects
which appear nearer to thought than all the actions of animals.”
Technology took a dramatic leap forward in the nineteenth cen-
tury when the British mathematician and inventor Charles Babbage
designed the Analytical Engine, a programmable universal machine in
the sense defined later by Turing. He was helped in his work by Ada,
Countess of Lovelace, daughter of the romantic poet and adventurer
Lord Byron. Whereas Babbage hoped to use the Analytical Engine
to compute accurate mathematical and astronomical tables, Lovelace
understood its true potential,^41 describing it in 1842 as “a thinking
or... a reasoning machine” that could reason about “all subjects in the
universe.” So, the basic conceptual elements for creating AI were in
place! From that point, surely, AI would be just a matter of time....
A long time, unfortunately— the Analytical Engine was never
built, and Lovelace’s ideas were largely forgotten. With Turing’s theo-
retical work in 1936 and the subsequent impetus of World War II,
universal computing machines were finally realized in the 1940s.
Thoughts about creating intelligence followed immediately. Turing’s
1950 paper, “Computing Machinery and Intelligence,”^42 is the best
known of many early works on the possibility of intelligent machines.
Skeptics were already asserting that machines would never be able to
do X, for almost any X you could think of, and Turing refuted those
assertions. He also proposed an operational test for intelligence, called
the imitation game, which subsequently (in simplified form) became
known as the Turing test. The test measures the behavior of the
machine— specifically, its ability to fool a human interrogator into
thinking that it is human.
The imitation game serves a specific role in Turing’s paper— namely
as a thought experiment to deflect skeptics who supposed that ma-
chines could not think in the right way, for the right reasons, with the
right kind of awareness. Turing hoped to redirect the argument to-
wards the issue of whether a machine could behave in a certain way;
and if it did— if it was able, say, to discourse sensibly on Shakespeare’s
sonnets and their meanings— then skepticism about AI could not
really be sustained. Contrary to common interpretations, I doubt that
the test was intended as a true definition of intelligence, in the sense
that a machine is intelligent if and only if it passes the Turing test.
Indeed, Turing wrote, “May not machines carry out something which
ought to be described as thinking but which is very different from
what a man does?” Another reason not to view the test as a definition
for AI is that it’s a terrible definition to work with. And for that rea-
son, mainstream AI researchers have expended almost no effort to
pass the Turing test.
The Turing test is not useful for AI because it’s an informal and
highly contingent definition: it depends on the enormously com-
plicated and largely unknown characteristics of the human mind,
which derive from both biology and culture. There is no way to “un-
pack” the definition and work back from it to create machines that
will provably pass the test. Instead, AI has focused on rational behav-
ior, just as described previously: a machine is intelligent to the extent
that what it does is likely to achieve what it wants, given what it has
perceived.
Initially, like Aristotle, AI researchers identified “what it wants”
with a goal that is either satisfied or not. These goals could be in toy
worlds like the 15- puzzle, where the goal is to get all the numbered
tiles lined up in order from 1 to 15 in a little (simulated) square tray;
or they might be in real, physical environments: in the early 1970s, the
Shakey robot at SRI in California was pushing large blocks into
desired configurations, and Freddy at the University of Edinburgh was
assembling a wooden boat from its component pieces. All this work
was done using logical problem- solvers and planning systems to con-
struct and execute guaranteed plans to achieve goals.^43
By the 1980s, it was clear that logical reasoning alone could not
suffice, because, as noted previously, there is no plan that is guaranteed
to get you to the airport. Logic requires certainty, and the real world
simply doesn’t provide it. Meanwhile, the Israeli-American computer
scientist Judea Pearl, who went on to win the 2011 Turing Award, had
been working on methods for uncertain reasoning based in probability
theory.^44 AI researchers gradually accepted Pearl’s ideas; they adopted
the tools of probability theory and utility theory and thereby con-
nected AI to other fields such as statistics, control theory, economics,
and operations research. This change marked the beginning of what
some observers call modern AI.
Agents and environments
The central concept of modern AI is the intelligent agent—
something that perceives and acts. The agent is a process occurring
over time, in the sense that a stream of perceptual inputs is converted
into a stream of actions. For example, suppose the agent in question is
a self- driving taxi taking me to the airport. Its inputs might include
eight RGB cameras operating at thirty frames per second; each frame
consists of perhaps 7.5 million pixels, each with an image intensity
value in each of three color channels, for a total of more than five giga-
bytes per second. (The flow of data from the two hundred million
photoreceptors in the retina is even larger, which partially explains
why vision occupies such a large fraction of the human brain.) The
taxi also gets data from an accelerometer one hundred times per sec-
ond, as well as GPS data. This incredible flood of raw data is trans-
formed by the simply gargantuan computing power of billions of
transistors (or neurons) into smooth, competent driving behavior. The
taxi’s actions include the electronic signals sent to the steering wheel,
brakes, and accelerator, twenty times per second. (For an experienced
human driver, most of this maelstrom of activity is unconscious: you
may be aware only of making decisions such as “overtake this slow
truck” or “stop for gas,” but your eyes, brain, nerves, and muscles are
still doing all the other stuff.) For a chess program, the inputs are
mostly just the clock ticks, with the occasional notification of the op-
ponent’s move and the new board state, while the actions are mostly
doing nothing while the program is thinking, and occasionally choos-
ing a move and notifying the opponent. For a personal digital assis-
tant, or PDA, such as Siri or Cortana, the inputs include not just the
acoustic signal from the microphone (sampled forty- eight thousand
times per second) and input from the touch screen but also the con-
tent of each Web page that it accesses, while the actions include both
speaking and displaying material on the screen.
The way we build intelligent agents depends on the nature of the
problem we face. This, in turn, depends on three things: first, the
nature of the environment the agent will operate in— a chessboard is
a very different place from a crowded freeway or a mobile phone; sec-
ond, the observations and actions that connect the agent to the
environment— for example, Siri might or might not have access to the
phone’s camera so that it can see; and third, the agent’s objective—
teaching the opponent to play better chess is a very different task from
winning the game.
To give just one example of how the design of the agent depends
on these things: If the objective is to win the game, a chess program
need consider only the current board state and does not need any
memory of past events.^45 The chess tutor, on the other hand, should
continually update its model of which aspects of chess the pupil does
or does not understand so that it can provide useful advice. In other
words, for the chess tutor, the pupil’s mind is a relevant part of the
environment. Moreover, unlike the board, it is a part of the environ-
ment that is not directly observable.
The characteristics of problems that influence the design of agents
include at least the following:^46
• whether the environment is fully observable (as in chess, where the inputs provide direct access to all the relevant aspects of the current state of the environment) or partially observable (as in driving, where one’s field of view is limited, vehicles are opaque, and other drivers’ intentions are mysterious);
• whether the environment and actions are discrete (as in chess) or effectively continuous (as in driving);
• whether the environment contains other agents (as in chess and driving) or not (as in finding the shortest routes on a map);
• whether the outcomes of actions, as specified by the “rules” or “physics” of the environment, are predictable (as in chess) or unpredictable (as in traffic and weather), and whether those rules are known or unknown;
• whether the environment is dynamically changing, so that the time to make decisions is tightly constrained (as in driving) or not (as in tax strategy optimization);
• the length of the horizon over which decision quality is measured according to the objective— this may be very short (as in emergency braking), of intermediate duration (as in chess, where a game lasts up to about one hundred moves), or very long (as in driving me to the airport, which might take hundreds of thousands of decision cycles if the taxi is deciding one hundred times per second).
As one can imagine, these characteristics give rise to a bewildering
variety of problem types. Just multiplying the choices listed above gives
192 types. One can find real- world problem instances for all the types.
Some types are typically studied in areas outside AI—for example,
designing an autopilot that maintains level flight is a short- horizon,
continuous, dynamic problem that is usually studied in the field of con-
trol theory.
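The figure of 192 types comes from multiplying out the choices in the list above; one way of counting that reproduces it, with the "rules" characteristic split into predictable-or-not and known-or-not, is sketched below.

```python
# One way of counting that reproduces the "192 types" figure: six two-way
# choices and one three-way choice from the list of characteristics above.
from itertools import product

dimensions = [
    ["fully observable", "partially observable"],
    ["discrete", "continuous"],
    ["single agent", "multiple agents"],
    ["predictable outcomes", "unpredictable outcomes"],
    ["known rules", "unknown rules"],
    ["static", "dynamic"],
    ["short horizon", "intermediate horizon", "long horizon"],
]

problem_types = list(product(*dimensions))
print(len(problem_types))        # 192
print(problem_types[0])          # one example combination
```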
Obviously some problem types are easier than others. AI has made
a lot of progress on problems such as board games and puzzles that are
observable, discrete, deterministic, and have known rules. For the eas-
ier problem types, AI researchers have developed fairly general and
effective algorithms and a solid theoretical understanding; often, ma-
chines exceed human performance on these kinds of problems. We
can tell that an algorithm is general because we have mathematical
proofs that it gives optimal or near- optimal results with reasonable
computational complexity across an entire class of problems, and be-
cause it works well in practice on those kinds of problems without
needing any problem- specific modifications.
Video games such as StarCraft are quite a bit harder than board
games: they involve hundreds of moving parts and time horizons of
thousands of steps, and the board is only partially visible at any given
time. At each point, a player might have a choice of at least 10^50 moves,
compared to about 10^2 in Go.^47 On the other hand, the rules are
known and the world is discrete with only a few types of objects. As
of early 2019, machines are as good as some professional StarCraft
players but not yet ready to challenge the very best humans.^48 More
important, it took a fair amount of problem- specific effort to reach
that point; general- purpose methods are not quite ready for StarCraft.
Problems such as running a government or teaching molecular bi-
ology are much harder. They have complex, mostly unobservable envi-
ronments (the state of a whole country, or the state of a student’s
mind), far more objects and types of objects, no clear definition of
what the actions are, mostly unknown rules, a great deal of uncer-
tainty, and very long time scales. We have ideas and off- the- shelf tools
that address each of these characteristics separately but, as yet, no
general methods that cope with all the characteristics simultaneously.
When we build AI systems for these kinds of tasks, they tend to
require a great deal of problem- specific engineering and are often very
brittle.
Progress towards generality occurs when we devise methods that
are effective for harder problems within a given type or methods that
require fewer and weaker assumptions so they are applicable to more
problems. General- purpose AI would be a method that is applicable
across all problem types and works effectively for large and difficult
instances while making very few assumptions. That’s the ultimate
goal of AI research: a system that needs no problem- specific engineer-
ing and can simply be asked to teach a molecular biology class or run
a government. It would learn what it needs to learn from all the avail-
able resources, ask questions when necessary, and begin formulating
and executing plans that work.
Such a general- purpose method does not yet exist, but we are
moving closer. Perhaps surprisingly, a lot of this progress towards gen-
eral AI results from research that isnt about building scary, general-
purpose AI systems. It comes from research on tool AI or narrow AI,
meaning nice, safe, boring AI systems designed for particular prob-
lems such as playing Go or recognizing handwritten digits. Research
on this kind of AI is often thought to present no risk because it’s
problem- specific and nothing to do with general- purpose AI.
This belief results from a misunderstanding of what kind of work
goes into these systems. In fact, research on tool AI can and often does
produce progress towards general- purpose AI, particularly when it is
done by researchers with good taste attacking problems that are be-
yond the capabilities of current general methods. Here, good taste
means that the solution approach is not merely an ad hoc encoding of
what an intelligent person would do in such- and- such situation but an
attempt to provide the machine with the ability to figure out the solu-
tion for itself.
For example, when the AlphaGo team at Google DeepMind suc-
ceeded in creating their world- beating Go program, they did this with-
out really working on Go. What I mean by this is that they didn’t write
a whole lot of Go- specific code saying what to do in different kinds
of Go situations. They didnt design decision procedures that work
only for Go. Instead, they made improvements to two fairly general-
purpose techniques lookahead search to make decisions and rein-
forcement learning to learn how to evaluate positions so that they
were sufficiently effective to play Go at a superhuman level. Those
improvements are applicable to many other problems, including prob-
lems as far afield as robotics. Just to rub it in, a version of AlphaGo
called AlphaZero recently learned to trounce AlphaGo at Go, and
also to trounce Stockfish (the world’s best chess program, far better
than any human) and Elmo (the world’s best shogi program, also bet-
ter than any human). AlphaZero did all this in one day.^49
There was also substantial progress towards general- purpose AI in
research on recognizing handwritten digits in the 1990s. Yann Le-
Cun’s team at AT&T Labs didn’t write special algorithms to recognize
“8” by searching for curvy lines and loops; instead, they improved on
existing neural network learning algorithms to produce convolutional
neural networks. Those networks, in turn, exhibited effective charac-
ter recognition after suitable training on labeled examples. The same
algorithms can learn to recognize letters, shapes, stop signs, dogs, cats,
and police cars. Under the headline of “deep learning,” they have rev-
olutionized speech recognition and visual object recognition. They are
also one of the key components in AlphaZero as well as in most of the
current self- driving car projects.
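The operation at the heart of these networks can be sketched in a few lines: one small filter is slid across every position of the image, so a stroke or loop is detected wherever it appears; in a real system the filter weights are learned from labeled examples rather than written by hand. The tiny image and filter below are invented.

```python
# The convolution at the heart of a convolutional network: slide one small
# filter over every position of an image and record how strongly it responds.
# The same filter weights are reused everywhere, which is what lets the
# network spot a stroke or loop regardless of where it appears.
import numpy as np

image = np.array([                # a tiny made-up 5x5 image: a vertical stroke
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
], dtype=float)

filter_ = np.array([              # a 3x3 "vertical stroke" detector
    [-1, 1, -1],
    [-1, 1, -1],
    [-1, 1, -1],
], dtype=float)

def convolve(image, filt):
    h, w = filt.shape
    out = np.zeros((image.shape[0] - h + 1, image.shape[1] - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + h, j:j + w] * filt)
    return out

print(convolve(image, filter_))   # strongest response down the middle column
```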
If you think about it, it’s hardly surprising that progress towards
general AI is going to occur in narrow- AI projects that address specific
tasks; those tasks give AI researchers something to get their teeth into.
(There’s a reason people don’t say, “Staring out the window is the
mother of invention.”) At the same time, it’s important to understand
how much progress has occurred and where the boundaries are. When
AlphaGo defeated Lee Sedol and later all the other top Go players,
many people assumed that because a machine had learned from
scratch to beat the human race at a task known to be very difficult
even for highly intelligent humans, it was the beginning of the end—
just a matter of time before AI took over. Even some skeptics may have
been convinced when AlphaZero won at chess and shogi as well as Go.
But AlphaZero has hard limitations: it works only in the class of dis-
crete, observable, two- player games with known rules. The approach
simply won’t work at all for driving, teaching, running a government,
or taking over the world.
These sharp boundaries on machine competence mean that when
people talk about “machine IQ” increasing rapidly and threatening to
exceed human IQ, they are talking nonsense. To the extent that the
concept of IQ makes sense when applied to humans, it’s because hu-
man abilities tend to be correlated across a wide range of cognitive
activities. Trying to assign an IQ to machines is like trying to get four-
legged animals to compete in a human decathlon. True, horses can run
fast and jump high, but they have a lot of trouble with pole- vaulting
and throwing the discus.
Objectives and the standard model
Looking at an intelligent agent from the outside, what matters is the stream of actions it generates from the stream of inputs it receives. From the inside, the actions have to be chosen by an agent program. Humans are born with one agent program, so to speak, and that program learns over time to act reasonably successfully across a huge range of tasks. So far, that is not the case for AI: we don't know how to build one general-purpose AI program that does everything, so instead we build different types of agent programs for different types of problems. I will need to explain at least a tiny bit about how these different agent programs work; more detailed explanations are given in the appendices at the end of the book for those who are interested. (Pointers to particular appendices are given as superscripts like thisA and this.D) The primary focus here is on how the standard model is
instantiated in these various kinds of agents; in other words, how the objective is specified and communicated to the agent.
The simplest way to communicate an objective is in the form of a goal. When you get into your self-driving car and touch the "home" icon on the screen, the car takes this as its objective and proceeds to plan and execute a route. A state of the world either satisfies the goal (yes, I'm at home) or it doesn't (no, I don't live at the San Francisco Airport). In the classical period of AI research, before uncertainty became a primary issue in the 1980s, most AI research assumed a world that was fully observable and deterministic, and goals made sense as a way to specify objectives. Sometimes there is also a cost function to evaluate solutions, so an optimal solution is one that minimizes total cost while reaching the goal. For the car, this might be built in (perhaps the cost of a route is some fixed combination of the time and fuel consumption), or the human might have the option of specifying the trade-off between the two.
The key to achieving such objectives is the ability to "mentally simulate" the effects of possible actions, sometimes called lookahead search. Your self-driving car has an internal map, so it knows that driving east from San Francisco on the Bay Bridge gets you to Oakland. Algorithms originating in the 1960s50 find optimal routes by looking ahead and searching through many possible action sequences.A These algorithms form a ubiquitous part of modern infrastructure: they provide not just driving directions but also airline travel solutions, robotic assembly, construction planning, and delivery logistics. With some modifications to handle the impertinent behavior of opponents, the same idea of lookahead applies to games such as tic-tac-toe, chess, and Go, where the goal is to win according to the game's particular definition of winning.
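To make the idea of lookahead search concrete, here is a minimal sketch, in Python, of the kind of algorithm involved: a uniform-cost search over a toy road map. The place names, the edge costs, and the function name are invented for illustration; real routing systems use far more elaborate versions of the same idea.

    import heapq

    def uniform_cost_search(graph, start, goal):
        """Return (cost, path) for a minimum-cost route from start to goal, or None."""
        # The frontier holds (cost so far, state, path so far), cheapest first.
        frontier = [(0, start, [start])]
        explored = set()
        while frontier:
            cost, state, path = heapq.heappop(frontier)
            if state == goal:
                return cost, path          # goal test: does this state satisfy the goal?
            if state in explored:
                continue
            explored.add(state)
            for neighbor, step_cost in graph.get(state, []):
                if neighbor not in explored:
                    heapq.heappush(frontier, (cost + step_cost, neighbor, path + [neighbor]))
        return None

    # A toy map: edge costs might stand for minutes of driving (numbers made up).
    road_map = {
        "SanFrancisco": [("Oakland", 25), ("DalyCity", 15)],
        "Oakland":      [("Berkeley", 12), ("SanJose", 55)],
        "DalyCity":     [("SanJose", 60)],
        "Berkeley":     [("SanJose", 58)],
        "SanJose":      [],
    }
    print(uniform_cost_search(road_map, "SanFrancisco", "SanJose"))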
Lookahead algorithms are incredibly effective for their specific
tasks, but they are not very flexible. For example, AlphaGo “knows”
the rules of Go, but only in the sense that it has two subroutines,
written in a traditional programming language such as C++: one subroutine generates all the possible legal moves and the other encodes the goal, determining whether a given state is won or lost. For AlphaGo to play a different game, someone has to rewrite all this C++ code. Moreover, if you give it a new goal (say, visiting the exoplanet that orbits Proxima Centauri), it will explore billions of sequences of Go moves in a vain attempt to find a sequence that achieves the goal. It cannot look inside the C++ code and determine the obvious: no sequence of Go moves gets you to Proxima Centauri. AlphaGo's knowledge is essentially locked inside a black box.
In 1958, two years after his Dartmouth summer meeting had initiated the field of artificial intelligence, John McCarthy proposed a much more general approach that opens up the black box: writing general-purpose reasoning programs that can absorb knowledge on any topic and reason with it to answer any answerable question.51 One particular kind of reasoning would be practical reasoning of the kind suggested by Aristotle: "Doing actions A, B, C, . . . will achieve goal G." The goal could be anything at all: make sure the house is tidy before I get home, win a game of chess without losing either of your knights, reduce my taxes by 50 percent, visit Proxima Centauri, and so on. McCarthy's new class of programs soon became known as knowledge-based systems.52
To make knowledge-based systems possible requires answering two questions. First, how can knowledge be stored in a computer? Second, how can a computer reason correctly with that knowledge to draw new conclusions? Fortunately, ancient Greek philosophers, particularly Aristotle, provided basic answers to these questions long before the advent of computers. In fact, it seems quite likely that, had Aristotle been given access to a computer (and some electricity, I suppose), he would have been an AI researcher. Aristotle's answer, reiterated by McCarthy, was to use formal logicB as the basis for knowledge and reasoning.
There are two kinds of logic that really matter in computer science. The first, called propositional or Boolean logic, was known to the Greeks as well as to ancient Chinese and Indian philosophers. It is the same language of AND gates, NOT gates, and so on that makes up the circuitry of computer chips. In a very literal sense, a modern CPU is just a very large mathematical expression: hundreds of millions of pages written in the language of propositional logic. The second kind of logic, and the one that McCarthy proposed to use for AI, is called first-order logic.B The language of first-order logic is far more expressive than propositional logic, which means that there are things that can be expressed very easily in first-order logic that are painful or impossible to write in propositional logic. For example, the rules of Go take about a page in first-order logic but millions of pages in propositional logic. Similarly, we can easily express knowledge about chess, British citizenship, tax law, buying and selling, moving, painting, cooking, and many other aspects of our commonsense world.
In principle, then, the ability to reason with first-order logic gets us a long way towards general-purpose intelligence. In 1930, the brilliant Austrian logician Kurt Gödel had published his famous completeness theorem,53 proving that there is an algorithm with the following property:54

For any collection of knowledge and any question expressible in first-order logic, the algorithm will tell us the answer to the question if there is one.
This is a pretty incredible guarantee. It means, for example, that
we can tell the system the rules of Go and it will tell us (if we wait
long enough) whether there is an opening move that wins the game.
We can tell it facts about local geography, and it will tell us the way to
the airport. We can tell it facts about geometry and motion and uten-
sils, and it will tell the robot how to lay the table for dinner. More
generally, given any achievable goal and sufficient knowledge of the
effects of its actions, an agent can use the algorithm to construct a
plan that it can execute to achieve the goal.
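To give a flavor of what a knowledge-based system does, here is a toy sketch in Python of forward chaining over if-then rules. It is far weaker than the full first-order inference described above (the "facts" here are just opaque strings), and the facts and rules are invented, but it shows how new conclusions can be derived mechanically from stored knowledge.

    def forward_chain(facts, rules):
        """Repeatedly apply if-then rules until no new facts can be derived."""
        facts = set(facts)
        changed = True
        while changed:
            changed = False
            for premises, conclusion in rules:
                if set(premises) <= facts and conclusion not in facts:
                    facts.add(conclusion)
                    changed = True
        return facts

    # A toy knowledge base (entirely invented for illustration).
    facts = {"at(robot, kitchen)", "clean(plate)"}
    rules = [
        (["at(robot, kitchen)", "clean(plate)"], "can(robot, set_table)"),
        (["can(robot, set_table)"], "achievable(dinner_ready)"),
    ]
    print(forward_chain(facts, rules))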
It must be said that Gödel did not actually provide an algorithm; he merely proved that one existed. In the early 1960s, real algorithms for logical reasoning began to appear,55 and McCarthy's dream of generally intelligent systems based on logic seemed within reach. The first major mobile robot project in the world, SRI's Shakey project, was based on logical reasoning (see figure 4). Shakey received a goal from its human designers, used vision algorithms to create logical assertions describing the current situation, performed logical inference to derive a guaranteed plan to achieve the goal, and then executed the plan. Shakey was "living" proof that Aristotle's analysis of human cognition and action was at least partially correct.
[FIGURE 4: Shakey the robot. In the background are some of the objects that Shakey pushed around in its suite of rooms.]

Unfortunately, Aristotle's (and McCarthy's) analysis was far from being completely correct. The main problem is ignorance: not, I
hasten to add, on the part of Aristotle or McCarthy, but on the part of all humans and machines, present and future. Very little of our knowledge is absolutely certain. In particular, we don't know very much about the future. Ignorance is just an insuperable problem for a purely logical system. If I ask, "Will I get to the airport on time, if I leave three hours before my flight?" or "Can I obtain a house by buying a winning lottery ticket and then buying the house with the proceeds?" the correct answer will be, in each case, "I don't know." The reason is that, for each question, both yes and no are logically possible. As a practical matter, one can never be absolutely certain of any empirical question unless the answer is already known.56 Fortunately, certainty is completely unnecessary for action: we just need to know which action is best, not which action is certain to succeed.
Uncertainty means that the “purpose put into the machine” can-
not, in general, be a precisely delineated goal, to be achieved at all
costs. There is no longer such a thing as a “sequence of actions that
achieves the goal,” because any sequence of actions will have multiple
possible outcomes, some of which won’t achieve the goal. The likeli-
hood of success really matters: leaving for the airport three hours in
advance of your flight may mean that you won’t miss the flight and
buying a lottery ticket may mean that you'll win enough to buy a new
house, but these are very different mays. Goals cannot be rescued by
looking for plans that maximize the probability of achieving the goal.
A plan that maximizes the probability of getting to the airport in time
to catch a flight might involve leaving home days in advance, organiz-
ing an armed escort, lining up many alternative means of transport in
case the others break down, and so on. Inevitably, one must take into
account the relative desirabilities of different outcomes as well as their
likelihoods.
Instead of a goal, then, we could use a utility function to describe
the desirability of different outcomes or sequences of states. Often,
the utility of a sequence of states is expressed as a sum of rewards for
each of the states in the sequence. Given a purpose defined by a utility
or reward function, the machine aims to produce behavior that maxi-
mizes its expected utility or expected sum of rewards, averaged over
the possible outcomes weighted by their probabilities. Modern AI is
partly a rebooting of McCarthy’s dream, except with utilities and
probabilities instead of goals and logic.
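The arithmetic of expected utility is simple enough to show in a few lines. The sketch below, with invented probabilities and utilities for the airport example, picks the action whose probability-weighted average utility is highest.

    def expected_utility(outcomes):
        """outcomes: list of (probability, utility) pairs for one action."""
        return sum(p * u for p, u in outcomes)

    def best_action(actions):
        """Pick the action whose expected utility is highest."""
        return max(actions, key=lambda name: expected_utility(actions[name]))

    # Invented numbers: leaving three hours early almost always works but wastes
    # time; leaving late saves time but risks missing the flight entirely.
    actions = {
        "leave_3_hours_early": [(0.98, 50), (0.02, -1000)],
        "leave_45_min_early":  [(0.70, 100), (0.30, -1000)],
    }
    for name in actions:
        print(name, expected_utility(actions[name]))
    print("best:", best_action(actions))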
Pierre-Simon Laplace, the great French mathematician, wrote in 1814, "The theory of probabilities is just common sense reduced to calculus."57 It was not until the 1980s, however, that a practical formal language and reasoning algorithms were developed for probabilistic knowledge. This was the language of Bayesian networks,C introduced by Judea Pearl. Roughly speaking, Bayesian networks are the probabilistic cousins of propositional logic. There are also probabilistic cousins of first-order logic, including Bayesian logic58 and a wide variety of probabilistic programming languages.
Bayesian networks and Bayesian logic are named after the Reverend Thomas Bayes, a British clergyman whose lasting contribution to modern thought, now known as Bayes' theorem, was published in 1763, shortly after his death, by his friend Richard Price.59 In its modern form, as suggested by Laplace, the theorem describes in a very simple way how a prior probability (the initial degree of belief one has in a set of possible hypotheses) becomes a posterior probability as a result of observing some evidence. As more new evidence arrives, the posterior becomes the new prior and the process of Bayesian updating repeats ad infinitum. This process is so fundamental that the modern idea of rationality as maximization of expected utility is sometimes called Bayesian rationality. It assumes that a rational agent has access to a posterior probability distribution over possible current states of the world, as well as over hypotheses about the future, based on all its past experience.
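Bayesian updating itself fits in a few lines of code. The sketch below uses an invented two-hypothesis example (a coin that is either fair or biased towards heads); after each observation, the posterior becomes the prior for the next update, just as described above.

    def bayes_update(prior, likelihoods):
        """prior: {hypothesis: probability}; likelihoods: {hypothesis: P(evidence | hypothesis)}."""
        unnormalized = {h: prior[h] * likelihoods[h] for h in prior}
        total = sum(unnormalized.values())
        return {h: p / total for h, p in unnormalized.items()}

    # Two hypotheses about a coin (numbers invented): fair, or biased towards heads.
    prior = {"fair": 0.9, "biased": 0.1}
    heads_likelihood = {"fair": 0.5, "biased": 0.9}

    # The posterior after each observation becomes the prior for the next one.
    for flip in ["heads", "heads", "heads"]:
        prior = bayes_update(prior, heads_likelihood)
        print(flip, prior)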
Researchers in operations research, control theory, and AI have
also developed a variety of algorithms for decision making under un-
certainty, some dating back to the 1950s. These so-called "dynamic
programming” algorithms are the probabilistic cousins of lookahead
search and planning and can generate optimal or near-optimal behavior for all sorts of practical problems in finance, logistics, transportation, and so on, where uncertainty plays a significant role.C The purpose is put into these machines in the form of a reward function, and the output is a policy that specifies an action for every possible state the agent could get itself into.
For complex problems such as backgammon and Go, where the
number of states is enormous and the reward comes only at the end of
the game, lookahead search won’t work. Instead, AI researchers have
developed a method called reinforcement learning, or RL for short. RL
algorithms learn from direct experience of reward signals in the envi-
ronment, much as a baby learns to stand up from the positive reward
of being upright and the negative reward of falling over. As with dy-
namic programming algorithms, the purpose put into an RL algorithm
is the reward function, and the algorithm learns an estimator for the
value of states (or sometimes the value of actions). This estimator can
be combined with relatively myopic lookahead search to generate
highly competent behavior.
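Here is a toy sketch of the kind of value estimator a reinforcement learning algorithm builds, using temporal-difference updates on an invented three-state "standing up" task. The states, rewards, and probabilities are made up for illustration; systems such as AlphaGo apply the same basic idea at vastly larger scale and combine the learned estimator with lookahead search.

    import random

    def td_learn(episodes=5000, alpha=0.1):
        """Learn state values for a toy 'stand up' task with temporal-difference updates."""
        # States: 0 = lying down, 1 = wobbling, 2 = standing (terminal, reward +1);
        # falling back down from wobbling gives reward -1. All numbers are invented.
        values = {0: 0.0, 1: 0.0}
        for _ in range(episodes):
            state = 0
            while state != 2:
                if state == 0:
                    next_state, reward = 1, 0.0            # push up to wobbling
                elif random.random() < 0.8:
                    next_state, reward = 2, 1.0            # succeed in standing
                else:
                    next_state, reward = 0, -1.0           # fall over
                target = reward + (0.0 if next_state == 2 else values[next_state])
                values[state] += alpha * (target - values[state])
                state = next_state
        return values

    print(td_learn())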
The first successful reinforcement learning system was Arthur Samuel's checkers program, which created a sensation when it was demonstrated on television in 1956. The program learned essentially from scratch, by playing against itself and observing the rewards of winning and losing.60 In 1992, Gerry Tesauro applied the same idea to the game of backgammon, achieving world-champion-level play after 1,500,000 games.61 Beginning in 2016, DeepMind's AlphaGo and its descendants used reinforcement learning and self-play to defeat the best human players at Go, chess, and shogi.
Reinforcement learning algorithms can also learn how to select actions based on raw perceptual input. For example, DeepMind's DQN system learned to play forty-nine different Atari video games entirely from scratch, including Pong, Freeway, and Space Invaders.62 It used only the screen pixels as input and the game score as a reward signal. In most of the games, DQN learned to play better than a
professional human player, despite the fact that DQN has no a priori
notion of time, space, objects, motion, velocity, or shooting. It is quite
hard to work out what DQN is actually doing, besides winning.
If a newborn baby learned to play dozens of video games at super-
human levels on its first day of life, or became world champion at Go,
chess, and shogi, we might suspect demonic possession or alien inter-
vention. Remember, however, that all these tasks are much simpler
than the real world: they are fully observable, they involve short time
horizons, and they have relatively small state spaces and simple, pre-
dictable rules. Relaxing any of these conditions means that the stan-
dard methods will fail.
Current research, on the other hand, is aimed precisely at going beyond standard methods so that AI systems can operate in larger classes of environments. On the day I wrote the preceding paragraph, for example, OpenAI announced that its team of five AI programs had learned to beat experienced human teams at the game Dota 2. (For the uninitiated, who include me: Dota 2 is an updated version of Defense of the Ancients, a real-time strategy game in the Warcraft family; it is currently the most lucrative and competitive e-sport, with prizes in the millions of dollars.) Dota 2 involves communication, teamwork, and quasi-continuous time and space. Games last for tens of thousands of time steps, and some degree of hierarchical organization of behavior seems to be essential. Bill Gates described the announcement as "a huge milestone in advancing artificial intelligence."63 A few months later, an updated version of the program defeated the world's top professional Dota 2 team.64
Games such as Go and Dota 2 are a good testing ground for rein-
forcement learning methods because the reward function comes with
the rules of the game. The real world is less convenient, however, and
there have been dozens of cases in which faulty definitions of rewards
led to weird and unanticipated behaviors.65 Some are innocuous, like
the simulated evolution system that was supposed to evolve fast-
moving creatures but in fact produced creatures that were enormously
tall and moved fast by falling over.66 Others are less innocuous, like the social-media click-through optimizers that seem to be making a fine mess of our world.
The final category of agent program I will consider is the simplest: programs that connect perception directly to action, without any intermediate deliberation or reasoning. In AI, we call this kind of program a reflex agent, a reference to the low-level neural reflexes exhibited by humans and animals, which are not mediated by thought.67 For example, the human blinking reflex connects the outputs of low-level processing circuits in the visual system directly to the motor area that controls the eyelids, so that any rapidly looming region in the visual field causes a hard blink. You can test it now by trying (not too hard) to poke yourself in the eye with your finger. We can think of this reflex system as a simple "rule" of the following form:

if <rapidly looming region in visual field> then <blink>.
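In code, such a reflex agent is nothing more than a condition-action rule. The sketch below (with an invented percept format) makes the point that neither the objective nor the relevant knowledge appears anywhere in it.

    def blink_reflex(percept):
        """A reflex agent: the action depends only on the current percept.
        The objective (protecting the eye) appears nowhere in the code."""
        if percept.get("rapidly_looming_region"):
            return "blink"
        return "do_nothing"

    # The reflex fires whether the looming object is a speck of grit or an eye dropper.
    print(blink_reflex({"rapidly_looming_region": True}))    # -> blink
    print(blink_reflex({"rapidly_looming_region": False}))   # -> do_nothing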
The blinking reflex does not "know" what it's doing: the objective
(of shielding the eyeball from foreign objects) is nowhere represented;
the knowledge (that a rapidly looming region corresponds to an object
approaching the eye, and that an object approaching the eye might
damage it) is nowhere represented. Thus, when the non- reflex part of
you wants to put in eye drops, the reflex part still blinks.
Another familiar reflex is emergency braking, for when the car in front stops unexpectedly or a pedestrian steps into the road. Quickly deciding whether braking is required is not easy: when a test vehicle in autonomous mode killed a pedestrian in 2018, Uber explained that emergency braking maneuvers are not enabled while the vehicle is under computer control, to reduce the potential for erratic vehicle behavior.68 Here, the human designer's objective is clear (don't kill pedestrians), but the agent's policy (had it been activated) implements it incorrectly. Again, the objective is not represented in the agent: no autonomous vehicle today knows that people don't like to be killed.
Reflex actions also play a role in more routine tasks such as staying in lane: as the car drifts ever so slightly out of the ideal lane position, a simple feedback control system can nudge the steering wheel in the opposite direction to correct the drift. The size of the nudge would depend on how far the car drifted. These kinds of control systems are usually designed to minimize the square of the tracking error added up over time. The designer derives a feedback control law that, under certain assumptions about speed and road curvature, approximately implements this minimization.69 A similar system is operating all the time while you are standing up; if it were to stop working, you'd fall over within a few seconds. As with the blinking reflex, it's quite hard to turn this mechanism off and allow yourself to fall over.
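A toy version of such a lane-keeping reflex might look like the following sketch: a simple proportional controller with invented numbers. A real control law would be derived from the squared-error criterion and a model of the vehicle, but the structure (error in, corrective nudge out) is the same.

    def lane_keeping_step(lateral_error, gain=0.5):
        """Nudge the steering in the opposite direction of the drift,
        in proportion to how far the car has drifted (a toy P-controller)."""
        return -gain * lateral_error

    # Simulate a car that starts 0.4 m off-centre (the dynamics here are invented).
    error = 0.4
    for step in range(5):
        steering = lane_keeping_step(error)
        error += steering * 0.8          # toy model of how steering changes the error
        print(f"step {step}: error = {error:.3f} m, steering = {steering:.3f}")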
Reflex agents, then, implement a designer’s objective, but do not
know what the objective is or why they are acting in a certain way.
This means they cannot really make decisions for themselves; some-
one else, typically the human designer or perhaps the process of bio-
logical evolution, has to decide everything in advance. It is very hard
to create a good reflex agent by manual programming except for very
simple tasks such as tic-tac-toe or emergency braking. Even in those
cases, the reflex agent is extremely inflexible and cannot change its
behavior when circumstances indicate that the implemented policy is
no longer appropriate.
One possible way to create more powerful reflex agents is through a process of learning from examples.D Rather than specifying a rule for how to behave, or supplying a reward function or a goal, a human can supply examples of decision problems along with the correct decision to make in each case. For example, we can create a French-to-English translation agent by supplying examples of French sentences along with the correct English translations. (Fortunately, the Canadian and EU parliaments generate millions of such examples every year.) Then a supervised learning algorithm processes the examples to produce a complex rule that takes any French sentence as input
and produces an English translation. The current champion learning algorithm for machine translation is a form of so-called deep learning, and it produces a rule in the form of an artificial neural network with hundreds of layers and millions of parameters.D Other deep learning algorithms have turned out to be very good at classifying the objects in images and recognizing the words in a speech signal. Machine translation, speech recognition, and visual object recognition are three of the most important subfields in AI, which is why there has been so much excitement about the prospects for deep learning.
One can argue almost endlessly about whether deep learning will lead directly to human-level AI. My own view, which I will explain later, is that it falls far short of what is needed,D but for now let's focus on how such methods fit into the standard model of AI, where an algorithm optimizes a fixed objective. For deep learning, or indeed for any supervised learning algorithm, the "purpose put into the machine" is usually to maximize predictive accuracy (or, equivalently, to minimize error). That much seems obvious, but there are actually two ways to understand it, depending on the role that the learned rule is going to play in the overall system. The first role is a purely perceptual role: the network processes the sensory input and provides information to the rest of the system in the form of probability estimates for what it's perceiving. If it's an object recognition algorithm, maybe it says "70 percent probability it's a Norfolk terrier, 30 percent it's a Norwich terrier."70 The rest of the system decides on an external action to take based on this information. This purely perceptual objective is unproblematic in the following sense: even a "safe" superintelligent AI system, as opposed to an "unsafe" one based on the standard model, needs to have its perception system as accurate and well calibrated as possible.
The problem comes when we move from a purely perceptual role to a decision-making role. For example, a trained network for recognizing objects might automatically generate labels for images on a
Web site or social-media account. Posting those labels is an action with consequences. Each labeling action requires an actual classification decision, and unless every decision is guaranteed to be perfect, the human designer must supply a loss function that spells out the cost of misclassifying an object of type A as an object of type B. And that's how Google had an unfortunate problem with gorillas. In 2015, a software engineer named Jacky Alciné complained on Twitter that the Google Photos image-labeling service had labeled him and his friend as gorillas.71 While it is unclear how exactly this error occurred, it is almost certain that Google's machine learning algorithm was designed to minimize a fixed, definite loss function; moreover, one that assigned equal cost to any error. In other words, it assumed that the cost of misclassifying a person as a gorilla was the same as the cost of misclassifying a Norfolk terrier as a Norwich terrier. Clearly, this is not Google's (or their users') true loss function, as was illustrated by the public relations disaster that ensued.
Since there are thousands of possible image labels, there are millions of potentially distinct costs associated with misclassifying one category as another. Even if it had tried, Google would have found it very difficult to specify all these numbers up front. Instead, the right thing to do would be to acknowledge the uncertainty about the true misclassification costs and to design a learning and classification algorithm that was suitably sensitive to costs and uncertainty about costs. Such an algorithm might occasionally ask the Google designer questions such as "Which is worse, misclassifying a dog as a cat or misclassifying a person as an animal?" In addition, if there is significant uncertainty about misclassification costs, the algorithm might well refuse to label some images.
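A sketch of what "suitably sensitive to costs" might mean is given below, with invented probabilities and misclassification costs. The classifier picks the label with the lowest expected cost and refuses to label the image when even the best option is worse than staying silent; this is only an illustration of the idea, not Google's actual system.

    def choose_label(probabilities, cost, refuse_cost=1.0):
        """Pick the label with the lowest expected misclassification cost,
        or refuse to label if even the best choice is worse than refusing."""
        best_label, best_cost = None, float("inf")
        for candidate in cost:
            expected = sum(probabilities.get(true, 0.0) * cost[candidate][true]
                           for true in cost[candidate])
            if expected < best_cost:
                best_label, best_cost = candidate, expected
        return best_label if best_cost <= refuse_cost else "refuse to label"

    # Invented numbers: confusing two terrier breeds is cheap; labeling a
    # person as an animal is catastrophically expensive.
    cost = {
        "norfolk_terrier": {"norfolk_terrier": 0, "norwich_terrier": 1, "person": 1000},
        "norwich_terrier": {"norfolk_terrier": 1, "norwich_terrier": 0, "person": 1000},
        "person":          {"norfolk_terrier": 5, "norwich_terrier": 5, "person": 0},
    }
    uncertain = {"norfolk_terrier": 0.55, "norwich_terrier": 0.40, "person": 0.05}
    confident = {"norfolk_terrier": 0.70, "norwich_terrier": 0.30, "person": 0.00}
    print(choose_label(uncertain, cost))   # -> refuse to label
    print(choose_label(confident, cost))   # -> norfolk_terrier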
By early 2018, it was reported that Google Photos does refuse to classify a photo of a gorilla. Given a very clear image of a gorilla with two babies, it says, "Hmm . . . not seeing this clearly yet."72
I don’t wish to suggest that AI’s adoption of the standard model
was a poor choice at the time. A great deal of brilliant work has gone
into developing the various instantiations of the model in logical,
probabilistic, and learning systems. Many of the resulting systems are
very useful; as we will see in the next chapter, there is much more to
come. On the other hand, we cannot continue to rely on our usual
practice of ironing out the major errors in an objective function by
trial and error: machines of increasing intelligence and increasingly
global impact will not allow us that luxury.
3
HOW MIGHT AI PROGRESS
IN THE FUTURE?
The Near Future
On May 3, 1997, a chess match began between Deep Blue, a chess computer built by IBM, and Garry Kasparov, the world chess champion and possibly the best human player in history. Newsweek billed the match as "The Brain's Last Stand." On May 11, with the match tied at 2½–2½, Deep Blue defeated Kasparov in the final game. The media went berserk. The market capitalization of IBM increased by $18 billion overnight. AI had, by all accounts, achieved a massive breakthrough.
From the point of view of AI research, the match represented no breakthrough at all. Deep Blue's victory, impressive as it was, merely continued a trend that had been visible for decades. The basic design for chess-playing algorithms was laid out in 1950 by Claude Shannon,1 with major improvements in the early 1960s. After that, the chess ratings of the best programs improved steadily, mainly as a result of faster computers that allowed programs to look further ahead. In 1994,2 Peter Norvig and I charted the numerical ratings of the best
chess programs from 1965 onwards, on a scale where Kasparov’s rat-
ing was 2805. The ratings started at 1400 in 1965 and improved in an
almost perfect straight line for thirty years. Extrapolating the line for-
ward from 1994 predicts that computers would be able to defeat
Kasparov in 1997, exactly when it happened.
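The extrapolation is just straight-line arithmetic; a back-of-the-envelope version is sketched below. The slope is an assumed round number chosen to match the stated starting point and Kasparov's rating, not the actual data in the published chart.

    # Back-of-the-envelope version of the extrapolation (assumed, simplified numbers).
    start_year, start_rating = 1965, 1400
    kasparov_rating = 2805
    points_per_year = 45          # assumed roughly constant slope of the trend line

    years_needed = (kasparov_rating - start_rating) / points_per_year
    print(start_year + years_needed)   # roughly 1996-1997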
For AI researchers, then, the real breakthroughs happened thirty or forty years before Deep Blue burst into the public's consciousness. Similarly, deep convolutional networks existed, with all the mathematics fully worked out, more than twenty years before they began to create headlines.
The view of AI breakthroughs that the public gets from the media (stunning victories over humans, robots becoming citizens of Saudi Arabia, and so on) bears very little relation to what really happens in the world's research labs. Inside the lab, research involves a lot of thinking and talking and writing mathematical formulas on whiteboards. Ideas are constantly being generated, abandoned, and rediscovered. A good idea (a real breakthrough) will often go unnoticed at the time and may only later be understood as having provided the basis for a substantial advance in AI, perhaps when someone reinvents it at a more convenient time. Ideas are tried out, initially on simple problems to show that the basic intuitions are correct and then on harder problems to see how well they scale up. Often, an idea will fail by itself to provide a substantial improvement in capabilities, and it has to wait for another idea to come along so that the combination of the two can demonstrate value.
All this activity is completely invisible from the outside. In the
world beyond the lab, AI becomes visible only when the gradual accu-
mulation of ideas and the evidence for their validity crosses a thresh-
old: the point where it becomes worthwhile to invest money and
engineering effort to create a new commercial product or an impres-
sive demonstration. Then the media announce that a breakthrough
has occurred.
One can expect, then, that many other ideas that have been
gestating in the world's research labs will cross the threshold of com-
mercial applicability over the next few years. This will happen more
and more frequently as the rate of commercial investment increases
and as the world becomes more and more receptive to applications of
AI. This chapter provides a sampling of what we can see coming down
the pipe.
Along the way, I’ll mention some of the drawbacks of these tech-
nological advances. You will probably be able to think of many more,
but don’t worry. I’ll get to those in the next chapter.
The AI ecosystem
In the beginning, the environment in which most computers operated was essentially formless and void: their only input came from punched cards and their only method of output was to print characters on a line printer. Perhaps for this reason, most researchers viewed intelligent machines as question-answerers; the view of machines as agents perceiving and acting in an environment did not become widespread until the 1980s.
The advent of the World Wide Web in the 1990s opened up a whole new universe for intelligent machines to play in. A new word, softbot, was coined to describe software "robots" that operate entirely in a software environment such as the Web. Softbots, or bots as they later became known, perceive Web pages and act by emitting sequences of characters, URLs, and so on.
AI companies mushroomed during the dot-com boom (1997–2000), providing core capabilities for search and e-commerce, including link analysis, recommendation systems, reputation systems, comparison shopping, and product categorization.
In the early 2000s, the widespread adoption of mobile phones
with microphones, cameras, accelerometers, and GPS provided new
access for AI systems to people’s daily lives; “smart speakers” such as
the Amazon Echo, Google Home, and Apple HomePod have completed this process.
By around 2008, the number of objects connected to the Internet exceeded the number of people connected to the Internet, a transition that some point to as the beginning of the Internet of Things (IoT). Those things include cars, home appliances, traffic lights, vending machines, thermostats, quadcopters, cameras, environmental sensors, robots, and all kinds of material goods both in the manufacturing process and in the distribution and retail system. This provides AI systems with far greater sensory and control access to the real world.
Finally, improvements in perception have allowed AI-powered robots to move out of the factory, where they relied on rigidly constrained arrangements of objects, and into the real, unstructured, messy world, where their cameras have something interesting to look at.
Self-driving cars
In the late 1950s, John McCarthy imagined that an automated vehicle might one day take him to the airport. In 1987, Ernst Dickmanns demonstrated a self-driving Mercedes van on the autobahn in Germany; it was capable of staying in lane, following another car, changing lanes, and overtaking.3 More than thirty years later, we still don't have a fully autonomous car, but it's getting much closer. The focus of development has long since moved from academic research labs to large corporations. As of 2019, the best-performing test vehicles have logged millions of miles of driving on public roads (and billions of miles in driving simulators) without serious incident.4 Unfortunately, other autonomous and semi-autonomous vehicles have killed several people.5
Why has it taken so long to achieve safe autonomous driving? The
first reason is that the performance requirements are exacting.
Human drivers in the United States suffer roughly one fatal accident per one hundred million miles traveled, which sets a high bar. Autonomous vehicles, to be accepted, will need to be much better than that: perhaps one fatal accident per billion miles, or twenty-five thousand years of driving forty hours per week. The second reason is that one anticipated workaround (handing control to the human when the vehicle is confused or out of its safe operating conditions) simply doesn't work. When the car is driving itself, humans quickly become disengaged from the immediate driving circumstances and cannot regain context quickly enough to take over safely. Moreover, nondrivers and taxi passengers who are in the back seat are in no position to drive the car if something goes wrong.
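A quick arithmetic check of the "twenty-five thousand years" figure is sketched below; the number of driving weeks per year and the average speed are assumptions added for the calculation, since only the hours per week are given above.

    # Rough check of the "twenty-five thousand years" figure.
    # Assumed: 50 driving weeks per year and an average speed of about
    # 20 miles per hour across all driving.
    miles_per_fatality_target = 1_000_000_000
    hours_per_week, weeks_per_year, avg_mph = 40, 50, 20

    miles_per_year = hours_per_week * weeks_per_year * avg_mph   # 40,000 miles
    print(miles_per_fatality_target / miles_per_year, "years")   # 25,000 years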
Current projects are aiming at SAE Level 4 autonomy,6 which means that the vehicle must at all times be capable of driving autonomously or stopping safely, subject to geographical limits and weather conditions. Because weather and traffic conditions can change, and because unusual circumstances can arise that a Level 4 vehicle cannot handle, a human has to be in the vehicle and ready to take over if needed. (Level 5, unrestricted autonomy, does not require a human driver but is even more difficult to achieve.) Level 4 autonomy goes far beyond the simple, reflex tasks of following white lines and avoiding obstacles. The vehicle has to assess the intent and probable future trajectories of all relevant objects, including objects that may not be visible, based on both current and past observations. Then, using lookahead search, the vehicle has to find a trajectory that optimizes some combination of safety and progress. Some projects are trying more direct approaches based on reinforcement learning (mainly in simulation, of course) and supervised learning from recordings of hundreds of human drivers, but these approaches seem unlikely to reach the required level of safety.
The potential benefits of fully autonomous vehicles are immense.
Every year, 1.2 million people die in car accidents worldwide and tens
of millions suffer serious injuries. A reasonable target for autonomous
vehicles would be to reduce these numbers by a factor of ten. Some analyses also predict a vast reduction in transportation costs, parking structures, congestion, and pollution. Cities will shift from personal cars and large buses to ubiquitous shared-ride, autonomous electric vehicles, providing door-to-door service and feeding high-speed mass-transit connections between hubs.7 With costs as low as three cents per passenger mile, most cities would probably opt to provide the service for free while subjecting riders to interminable barrages of advertising.
Of course, to reap all these benefits, the industry has to pay attention to the risks. If there are too many deaths attributed to poorly designed experimental vehicles, regulators may halt planned deployments or impose extremely stringent standards that might be unreachable for decades.8 And people might, of course, decide not to buy or ride in autonomous vehicles unless they are demonstrably safe. A 2018 poll revealed a significant decline in consumers' level of trust in autonomous vehicle technology compared to 2016.9 Even if the technology is successful, the transition to widespread autonomy will be an awkward one: human driving skills may atrophy or disappear, and the reckless and antisocial act of driving a car oneself may be banned altogether.
Intelligent personal assistants
Most readers will by now have experienced the unintelligent personal assistant: the smart speaker that obeys purchase commands overheard on the television, or the cell phone chatbot that responds to "Call me an ambulance!" with "OK, from now on I'll call you 'Ann Ambulance.'" Such systems are essentially voice-mediated interfaces to applications and search engines; they are based largely on canned stimulus-response templates, an approach that dates back to the Eliza system in the mid-1960s.10
These early systems have shortcomings of three kinds: access,
content, and context. Access shortcomings mean that they lack sensory awareness of what's going on; for example, they might be able to hear what the user is saying but they can't see who the user is talking to. Content shortcomings mean that they simply fail to understand the meaning of what the user is saying or texting, even if they have access to it. Context shortcomings mean that they lack the ability to keep track of and reason about the goals, activities, and relationships that constitute daily life.
Despite these shortcomings, smart speakers and cell phone assis-
tants offer just enough value to the user to have entered the homes
and pockets of hundreds of millions of people. They are, in a sense,
Trojan horses for AI. Because they are there, embedded in so many
lives, every tiny improvement in their capabilities is worth billions of
dollars.
And so, improvements are coming thick and fast. Probably the most important is the elementary capacity to understand content: to know that "John's in the hospital" is not just a prompt to say "I hope it's nothing serious" but contains actual information that the user's eight-year-old son is in a nearby hospital and may have a serious injury or illness. The ability to access email and text communications as well as phone calls and domestic conversations (through the smart speaker in the house) would give AI systems enough information to build a reasonably complete picture of the user's life, perhaps even more information than might have been available to the butler working for a nineteenth-century aristocratic family or the executive assistant working for a modern-day CEO.
Raw information, of course, is not enough. To be really useful, an
assistant also needs commonsense knowledge of how the world works:
that a child in the hospital is not simultaneously at home; that hospital
care for a broken arm seldom lasts for more than a day or two; that the
child’s school will need to know of the expected absence; and so on.
Such knowledge allows the assistant to keep track of things it does not
observe directly, an essential skill for intelligent systems.
The capabilities described in the preceding paragraph are, I believe, feasible with existing technology for probabilistic reasoning,C but this would require a very substantial effort to construct models of all the kinds of events and transactions that make up our daily lives. Up to now, these kinds of commonsense modeling projects have generally not been undertaken (except possibly in classified systems for intelligence analysis and military planning) because of the costs involved and the uncertain payoff. Now, however, projects like this could easily reach hundreds of millions of users, so the investment risks are lower and the potential rewards are much higher. Furthermore, access to large numbers of users allows the intelligent assistant to learn very quickly and fill in all the gaps in its knowledge.
Thus, one can expect to see intelligent assistants that will, for pennies a month, help users with managing an increasingly large range of daily activities: calendars, travel, household purchases, bill payment, children's homework, email and call screening, reminders, meal planning, and (one can but dream) finding my keys. These skills will not be scattered across multiple apps. Instead, they will be facets of a single, integrated agent that can take advantage of the synergies available in what military people call the common operational picture.
The general design template for an intelligent assistant involves
background knowledge about human activities, the ability to extract
information from streams of perceptual and textual data, and a learn-
ing process to adapt the assistant to the user’s particular circum-
stances. The same general template can be applied to at least three
other major areas: health, education, and finances. For these applica-
tions, the system needs to keep track of the state of the user’s body,
mind, and bank account (broadly construed). As with assistants for
daily life, the up-front cost of creating the necessary general knowl-
edge in each of these three areas amortizes across billions of users.
In the case of health, for example, we all have roughly the same physiology, and detailed knowledge of how it works has already been encoded in machine-readable form.11 Systems will adapt to your
individual characteristics and lifestyle, providing preventive sugges-
tions and early warning of problems.
In the area of education, the promise of intelligent tutoring systems was recognized even in the 1960s,12 but real progress has been a long time coming. The primary reasons are shortcomings of content and access: most tutoring systems don't understand the content of what they purport to teach, nor can they engage in two-way communication with their pupils through speech or text. (I imagine myself teaching string theory, which I don't understand, in Laotian, which I don't speak.) Recent progress in speech recognition means that automated tutors can, at last, communicate with pupils who are not yet fully literate. Moreover, probabilistic reasoning technology can now keep track of what students know and don't know13 and can optimize the delivery of instruction to maximize learning. The Global Learning XPRIZE competition, which started in 2014, offered $15 million for open-source, scalable software that will enable children in developing countries to teach themselves basic reading, writing and arithmetic within 15 months. Results from the winners, Kitkit School and onebillion, suggest that the goal has largely been achieved.
In the area of personal finance, systems will keep track of investments, income streams, obligatory and discretionary expenditures, debt, interest payments, emergency reserves, and so on, in much the same way that financial analysts keep track of the finances and prospects of corporations. Integration with the agent that handles daily life will provide an even finer-grained understanding, perhaps even ensuring that the children get their pocket money, minus any mischief-related deductions. One can expect to receive the quality of day-to-day financial advice previously reserved for the ultra-rich.
If your privacy alarm bells weren’t ringing as you read the preced-
ing paragraphs, you haven’t been keeping up with the news. There are,
however, multiple layers to the privacy story. First, can a personal
assistant really be useful if it knows nothing about you? Probably not.
Second, can personal assistants be really useful if they cannot pool information from multiple users to learn more about people in general and people who are similar to you? Probably not. So, don't those two things imply that we have to give up our privacy to benefit from AI in our daily lives? No. The reason is that learning algorithms can operate on encrypted data using the techniques of secure multiparty computation, so that users can benefit from pooling without compromising privacy in any way.14 Will software providers adopt privacy-preserving technology voluntarily, without legislative encouragement? That remains to be seen. What seems inevitable, however, is that users will trust a personal assistant only if its primary obligation is to the user rather than to the corporation that produced it.
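To illustrate how pooling without disclosure is possible even in principle, here is a toy additive secret-sharing example in Python: several users' private numbers are summed by a group of servers, none of which ever sees an individual value. Real privacy-preserving learning systems use far more elaborate protocols; this is only a sketch of the underlying idea, with invented values.

    import random

    def share(value, n_parties, modulus=2**61 - 1):
        """Split a value into n random shares that sum to the value (mod modulus)."""
        shares = [random.randrange(modulus) for _ in range(n_parties - 1)]
        shares.append((value - sum(shares)) % modulus)
        return shares

    # Three users' private values (invented); each user sends one share to each server.
    users = [42, 17, 99]
    n_servers = 3
    all_shares = [share(v, n_servers) for v in users]

    # Each server adds up the shares it received; no server sees any user's value.
    server_totals = [sum(user_shares[s] for user_shares in all_shares) % (2**61 - 1)
                     for s in range(n_servers)]

    # Combining the servers' totals reveals only the aggregate.
    print(sum(server_totals) % (2**61 - 1))   # -> 158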
Smart homes and domestic robots
The smart home concept has been investigated for several decades. In 1966, James Sutherland, an engineer at Westinghouse, started collecting surplus computer parts to build ECHO, the first smart-home controller.15 Unfortunately, ECHO weighed eight hundred pounds, consumed 3.5 kilowatts, and managed just three digital clocks and the TV antenna. Subsequent systems required users to master control interfaces of mind-boggling complexity. Unsurprisingly, they never caught on.
Beginning in the 1990s, several ambitious projects attempted to design houses that managed themselves with minimal human intervention, using machine learning to adapt to the lifestyles of the occupants. To make these experiments meaningful, real people had to live in the houses. Unfortunately, the frequency of erroneous decisions made the systems worse than useless: the occupants' quality of life decreased rather than increased. For example, inhabitants of the 2003 MavHome project16 at Washington State University often had to sit in the dark if their visitors stayed later than the usual bedtime.17 As with the unintelligent personal assistant, such failings result from inadequate sensory
access to the activities of the occupants and the inability to understand and keep track of what's happening in the house.
A truly smart home, equipped with cameras and microphones and the requisite perceptual and reasoning abilities, can understand what the occupants are doing: visiting, eating, sleeping, watching TV, reading, exercising, getting ready for a long trip, or lying helpless on the floor after a fall. By coordinating with the intelligent personal assistant, the home can have a pretty good idea of who will be in or out of the house at what time, who's eating where, and so on. This understanding allows it to manage heating, lighting, window blinds, and security systems, to send timely reminders, and to alert users or emergency services when a problem arises. Some newly built apartment complexes in the United States and Japan are already incorporating technology of this kind.18
The value of the smart home is limited because of its actuators:
much simpler systems (timed thermostats and motion-sensitive lights
and burglar alarms) can deliver a lot of the same functionality in ways
that are perhaps more predictable, if less context sensitive. The smart
home cannot fold the laundry, clear the dishes, or pick up the news-
paper. It really wants a physical robot to do its bidding.
[FIGURE 5: Left: BRETT folding towels; right: the Boston Dynamics SpotMini robot opening a door.]
It may not have too long to wait. Already, robots have demonstrated many of the required skills. In the Berkeley lab of my colleague Pieter Abbeel, BRETT (the Berkeley Robot for the Elimination of Tedious Tasks) has been folding piles of towels since 2011, while the SpotMini robot from Boston Dynamics can climb stairs and open doors (figure 5). Several companies are already building cooking robots, although they require special, enclosed setups and pre-cut ingredients and won't work in an ordinary kitchen.19
Of the three basic physical capabilities required for a useful domestic robot (perception, mobility, and dexterity), the latter is most problematic. As Stefanie Tellex, a robotics professor at Brown University, puts it, "Most robots can't pick up most objects most of the time." This is partly a problem of tactile sensing, partly a manufacturing problem (dexterous hands are currently very expensive to build), and partly an algorithmic problem: we don't yet have a good understanding of how to combine sensing and control to grasp and manipulate the huge variety of objects in a typical household. There are dozens of grasp types just for rigid objects and there are thousands of distinct manipulation skills, such as shaking exactly two pills out of a bottle, peeling the label off a jam jar, spreading hard butter on soft bread, or lifting one strand of spaghetti from the pot with a fork to see if it's ready.
It seems likely that the tactile sensing and hand construction prob-
lems will be solved by 3D printing, which is already being used by
Boston Dynamics for some of the more complex parts of their Atlas
humanoid robot. Robot manipulation skills are advancing rapidly,
thanks in part to deep reinforcement learning.20 The final push— putting all this together into something that begins to approximate the awesome physical skills of movie robots— is likely to come from the
rather unromantic warehouse industry. Just one company, Amazon,
employs several hundred thousand people who pick products out
of bins in giant warehouses and dispatch them to customers. From
2015 through 2017 Amazon ran an annual “Picking Challenge” to
accelerate the development of robots capable of doing this task.21 There is still some distance to go, but when the core research problems are solved— probably within a decade— one can expect a very rapid rollout
of highly capable robots. Initially they will work in warehouses, then in
other commercial applications such as agriculture and construction,
where the range of tasks and objects is fairly predictable. We might also
see them quite soon in the retail sector doing tasks such as stocking
supermarket shelves and refolding clothes.
The first to really benefit from robots in the home will be the el-
derly and infirm, for whom a helpful robot can provide a degree of
independence that would otherwise be impossible. Even if the robot
has a limited repertoire of tasks and only rudimentary comprehension
of what's going on, it can still be very useful. On the other hand, the
robot butler, managing the household with aplomb and anticipating its
master’s every wish, is still some way off— it requires something ap-
proaching the generality of human-level AI.
Intelligence on a global scale
The development of basic capabilities for understanding speech
and text will allow intelligent personal assistants to do things that
human assistants can already do (but they will be doing it for pennies
per month instead of thousands of dollars per month). Basic speech
and text understanding also enable machines to do things that no human can do— not because of the depth of understanding but because of its scale. For example, a machine with basic reading capabilities will be able to read everything the human race has ever written by lunchtime, and then it will be looking around for something else to do.22
With speech recognition capabilities, it could listen to every radio
and television broadcast before teatime. For comparison, it would take
two hundred thousand full-time humans just to keep up with the world's current level of print publication (let alone all the written
material from the past) and another sixty thousand to listen to current
broadcasts.23
Such a system, if it could extract even simple factual assertions
and integrate all this information across all languages, would represent
an incredible resource for answering questions and revealing patterns— probably far more powerful than search engines, which are currently
valued at around $1 trillion. Its research value for fields such as history
and sociology would be inestimable.
Of course, it would also be possible to listen to all the world's
phone calls (a job that would require about twenty million people).
There are certain clandestine agencies that would find this valuable.
Some of them have been doing simple kinds of large-scale machine
listening, such as spotting key words in conversations, for many years,
and have now made the transition to transcribing entire conversations
into searchable text.24 Transcriptions are certainly useful, but not
nearly as useful as simultaneous understanding and content integra-
tion of all conversations.
Another “superpower” that is available to machines is to see the en-
tire world at once. Roughly speaking, satellites image the entire world
every day at an average resolution of around fifty centimeters per pixel.
At this resolution, every house, ship, car, cow, and tree on Earth is
visible. Well over thirty million full-time employees would be needed to examine all these images;25 so, at present, no human ever sees the
vast majority of satellite data. Computer vision algorithms could pro-
cess all this data to produce a searchable database of the whole world,
updated daily, as well as visualizations and predictive models of eco-
nomic activities, changes in vegetation, migrations of animals and peo-
ple, the effects of climate change, and so on. Satellite companies such
as Planet and DigitalGlobe are busy making this idea a reality.
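To get a feel for the arithmetic behind that thirty-million figure, here is a rough back-of-the-envelope sketch in Python; the analyst throughput is an assumed number chosen purely to illustrate the order of magnitude, not a figure from the studies cited.

```python
# Rough scale of daily global satellite imagery (illustrative figures only).
EARTH_SURFACE_KM2 = 510e6              # total surface area of the Earth
RESOLUTION_M = 0.5                     # roughly fifty centimeters per pixel
PIXELS_PER_KM2 = (1000 / RESOLUTION_M) ** 2   # 4 million pixels per km^2

pixels_per_day = EARTH_SURFACE_KM2 * PIXELS_PER_KM2
print(f"pixels imaged per day: {pixels_per_day:.1e}")          # ~2.0e15

# Assumption: one analyst might carefully inspect ~15 km^2 of imagery per day.
KM2_PER_ANALYST_PER_DAY = 15
analysts = EARTH_SURFACE_KM2 / KM2_PER_ANALYST_PER_DAY
print(f"analysts needed to look at all of it: {analysts:.1e}")  # ~3.4e7
```

On these assumptions the answer comes out in the tens of millions, which is why almost all satellite data currently goes unseen by human eyes.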
With the possibility of sensing on a global scale comes the possi-
bility of decision making on a global scale. For example, from global
satellite data feeds, it should be possible to create detailed models
for managing the global environment, predicting the effects of environ-
mental and economic interventions, and providing the necessary ana-
lytical inputs to the UN's sustainable development goals.26 We are
already seeing “smart city” control systems that aim to optimize traffic
management, transit, trash collection, road repairs, environmental
maintenance, and other functions for the benefit of citizens, and these
may be extended to the country level. Until recently, this degree of
coordination could be achieved only by huge, inefficient, bureaucratic
hierarchies of humans; inevitably, these will be replaced by mega-
agents that take care of more and more aspects of our collective lives.
Along with this, of course, comes the possibility of privacy invasion and
social control on a global scale, to which I return in the next chapter.
When Will Superintelligent AI Arrive?
I am often asked to predict when superintelligent AI will arrive, and I
usually refuse to answer. There are three reasons for this. First, there
is a long history of such predictions going wrong.27 For example, in 1960, the AI pioneer and Nobel Prize-winning economist Herbert Simon wrote, "Technologically... machines will be capable, within twenty years, of doing any work a man can do."28 In 1967, Marvin Minsky, a co-organizer of the 1956 Dartmouth workshop that started the field of AI, wrote, "Within a generation, I am convinced, few compartments of intellect will remain outside the machine's realm— the problem of creating 'artificial intelligence' will be substantially solved."29
A second reason for declining to provide a date for superintelligent
AI is that there is no clear threshold that will be crossed. Machines
already exceed human capabilities in some areas. Those areas will
broaden and deepen, and it is likely that there will be superhuman
general knowledge systems, superhuman biomedical research systems,
superhuman dexterous and agile robots, superhuman corporate plan-
ning systems, and so on well before we have a completely general
superintelligent AI system. These partially superintelligent systems
will, individually and collectively, begin to pose many of the same is-
sues that a generally intelligent system would.
A third reason for not predicting the arrival of superintelligent AI
is that it is inherently unpredictable. It requires "conceptual breakthroughs," as noted by John McCarthy in a 1977 interview.30 McCar-
thy went on to say, “What you want is 1.7 Einsteins and 0.3 of the
Manhattan Project, and you want the Einsteins first. I believe it’ll
take five to 500 years.” In the next section I’ll explain what some of
the conceptual breakthroughs are likely to be. Just how unpredictable
are they? Probably as unpredictable as Szilard's invention of the nuclear chain reaction a few hours after Rutherford's declaration that it
was completely impossible.
Once, at a meeting of the World Economic Forum in 2015, I
answered the question of when we might see superintelligent AI. The
meeting was under Chatham House rules, which means that no re-
marks may be attributed to anyone present at the meeting. Even so,
out of an excess of caution, I prefaced my answer with “Strictly off the
record....” I suggested that, barring intervening catastrophes, it would
probably happen in the lifetime of my children— who were still quite
young and would probably have much longer lives, thanks to advances
in medical science, than many of those at the meeting. Less than two
hours later, an article appeared in the Daily Telegraph citing Professor
Russell's remarks, complete with images of rampaging Terminator robots. The headline was SOCIOPATHIC ROBOTS COULD OVERRUN THE HUMAN RACE WITHIN A GENERATION.
My timeline of, say, eighty years is considerably more conserva-
tive than that of the typical AI researcher. Recent surveys31 suggest that most active researchers expect human-level AI to arrive around the
middle of this century. Our experience with nuclear physics suggests
that it would be prudent to assume that progress could occur quite
quickly and to prepare accordingly. If just one conceptual break-
through were needed, analogous to Szilard's idea for a neutron-induced
nuclear chain reaction, superintelligent AI in some form could arrive
quite suddenly. The chances are that we would be unprepared: if we
built superintelligent machines with any degree of autonomy, we
would soon find ourselves unable to control them. I am, however,
fairly confident that we have some breathing space because there are
several major breakthroughs needed between here and superintelli-
gence, not just one.
Conceptual Breakthroughs to Come
The problem of creating general-purpose, human-level AI is far from
solved. Solving it is not a matter of spending money on more engi-
neers, more data, and bigger computers. Some futurists produce
charts that extrapolate the exponential growth of computing power
into the future based on Moore’s law, showing the dates when ma-
chines will become more powerful than insect brains, mouse brains,
human brains, all human brains put together, and so on.32 These charts
are meaningless because, as I have already said, faster machines just
give you the wrong answer more quickly. If one were to collect AI’s
leading experts into a single team with unlimited resources, with the
goal of creating an integrated, human-level intelligent system by com-
bining all our best ideas, the result would be failure. The system would
break in the real world. It wouldn't understand what was going on; it wouldn't be able to predict the consequences of its actions; it wouldn't
understand what people want in any given situation; and so it would
do ridiculously stupid things.
By understanding how the system would break, AI researchers are
able to identify the problems that have to be solved— the conceptual
breakthroughs that are needed— in order to reach human-level AI. I
will now describe some of these remaining problems. Once they are
solved, there may be more, but not very many more.
Language and common sense
Intelligence without knowledge is like an engine without fuel. Hu-
mans acquire a vast amount of knowledge from other humans: it is
passed down through generations in the form of language. Some of it
is factual: Obama became president in 2009, the density of copper is
8.92 grams per cubic centimeter, the code of Ur-Nammu set out pun-
ishments for various crimes, and so on. A great deal of knowledge re-
sides in the language itself— in the concepts that it makes available.
President, 2009, density, copper, gram, centimeter, crime, and the rest all
carry with them a vast amount of information, which represents the
extracted essence of the processes of discovery and organization that
led them to be in the language in the first place.
Take, for example, copper, which refers to some collection of atoms
in the universe, and compare it to arglebarglium, which is my name for
an equally large collection of entirely randomly selected atoms in the
universe. There are many general, useful, and predictive laws one can
discover about copper— about its density, conductivity, malleability,
melting point, stellar origin, chemical compounds, practical uses, and
so on; in comparison, there is essentially nothing that can be said
about arglebarglium. An organism equipped with a language com-
posed of words like arglebarglium would be unable to function, be-
cause it would never discover the regularities that would allow it to
model and predict its universe.
A machine that really understands human language would be in a
position to quickly acquire vast quantities of human knowledge, al-
lowing it to bypass tens of thousands of years of learning by the more
than one hundred billion people who have lived on Earth. It seems
simply impractical to expect a machine to rediscover all this from
scratch, starting from raw sensory data.
At present, however, natural language technology is not up to
the task of reading and understanding millions of books— many of
which would stump even a well-educated human. Systems such as
IBM’s Watson, which famously defeated two human champions of the
Jeopardy! quiz game in 2011, can extract simple information from
clearly stated facts but cannot build complex knowledge structures
from text; nor can they answer questions that require extensive chains
of reasoning with information from multiple sources. For example,
the task of reading all available documents up to the end of 1973 and
assessing (with explanations) the probable outcome of the Watergate
impeachment process against then president Nixon would be well be-
yond the current state of the art.
There are serious efforts underway to deepen the level of language
analysis and information extraction. For example, Project Aristo at
the Allen Institute for AI aims to build systems that can pass school
science exams after reading textbooks and study guides.33 Here's a question from a fourth-grade test:34
Fourth graders are planning a roller-skate race. Which surface
would be the best for this race?
(A) gravel (B) sand (C) blacktop (D) grass
A machine faces at least two sources of difficulty in answering this
question. The first is the classical language-understanding problem of
working out what the sentences say: analyzing the syntactic structure,
identifying the meanings of words, and so on. (Try this for yourself:
use an online translation service to translate the sentences into an
unfamiliar language, then use a dictionary for that language to try
translating them back to English.) The second is the need for common-
sense knowledge: to work out that a "roller-skate race" is probably a
race between people wearing roller skates (on their feet) rather than
a race between roller skates, to understand that the “surface” is what
the skaters will skate on rather than what the spectators will sit on, to
know what “best” means in the context of a surface for a race, and
so on. Think how the answer might change if we replaced “fourth grad-
ers” with “sadistic army boot- camp trainers.
One way to summarize the difficulty is to say that reading requires
knowledge and knowledge (largely) comes from reading. In other
words, we face a classic chicken-and-egg situation. We might hope for
a bootstrapping process, whereby the system reads some easy text,
acquires some knowledge, uses that to read more difficult text, ac-
quires still more knowledge, and so on. Unfortunately, what tends to
happen is the opposite: the knowledge acquired is mostly erroneous,
which causes errors in reading, which results in more erroneous knowl-
edge, and so on.
For example, the NELL (Never-Ending Language Learning) proj-
ect at Carnegie Mellon University is probably the most ambitious
language-bootstrapping project currently underway. From 2010 to
2018, NELL acquired over 120 million beliefs by reading English text
on the Web.35 Some of these beliefs are accurate, such as the beliefs
that the Maple Leafs play hockey and won the Stanley Cup. In addi-
tion to facts, NELL acquires new vocabulary, categories, and semantic
relationships all the time. Unfortunately, NELL has confidence in only
3 percent of its beliefs and relies on human experts to clean out false
or meaningless beliefs on a regular basis— such as its beliefs that Nepal is "a country also known as United States" and "value is an agricultural
product that is usually cut into basis.”
I suspect that there may be no single breakthrough that turns the
downward spiral into an upward spiral. The basic bootstrapping pro-
cess seems right: a program that knows enough facts can figure out
which fact a novel sentence is referring to, and thereby learns a new
textual form for expressing facts— which then lets it discover more facts, and so the process continues. (Sergey Brin, the co-founder of
Google, published an important paper on the bootstrapping idea in
1998.36) Priming the pump by supplying a good deal of manually en-
coded knowledge and linguistic information would certainly help.
Increasing the sophistication of the representation of facts— allowing
for complex events, causal relationships, beliefs and attitudes of oth-
ers, and so on— and improving the handling of uncertainty about word
meanings and sentence meanings may eventually result in a self-
reinforcing rather than self-extinguishing process of learning.
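To make the bootstrapping idea concrete, here is a minimal sketch of pattern-based fact extraction in the spirit of Brin's paper; the three-sentence corpus, the seed fact, and the template format are all invented for illustration, and real systems such as NELL add the confidence scoring and human review described above.

```python
import re

corpus = [
    "The Maple Leafs play hockey in Toronto.",
    "The Yankees play baseball in New York.",
    "The Lakers play basketball in Los Angeles.",
]

# Seed knowledge: one (team, sport) fact we already trust.
facts = {("Maple Leafs", "hockey")}

def learn_patterns(facts, corpus):
    """Turn each sentence that states a known fact into a textual template."""
    patterns = set()
    for team, sport in facts:
        for sentence in corpus:
            if team in sentence and sport in sentence:
                # Keep only the span up to the second entity, then generalize it.
                span = sentence[: sentence.index(sport) + len(sport)]
                patterns.add(span.replace(team, "{X}").replace(sport, "{Y}"))
    return patterns

def apply_patterns(patterns, corpus):
    """Match templates against the corpus to propose new candidate facts."""
    candidates = set()
    for pattern in patterns:
        # Toy templates contain no regex metacharacters; a real system would
        # escape the literal text and score candidates before trusting them.
        regex = pattern.replace("{X}", "(.+?)").replace("{Y}", r"(\w+)")
        for sentence in corpus:
            match = re.match(regex, sentence)
            if match:
                candidates.add((match.group(1), match.group(2)))
    return candidates

# Facts yield patterns, patterns yield more facts, and so the spiral continues.
for _ in range(2):
    facts |= apply_patterns(learn_patterns(facts, corpus), corpus)

print(sorted(facts))
```

The loop goes facts to patterns to more facts; without a filter on candidate quality, a single bad pattern is enough to start the downward spiral of erroneous knowledge described above.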
Cumulative learning of concepts and theories
Approximately 1.4 billion years ago and 8.2 sextillion miles away,
two black holes, one twelve million times the mass of the Earth and
the other ten million, came close enough to begin orbiting each other.
Gradually losing energy, they spiraled closer and closer to each other
and faster and faster, reaching an orbital frequency of 250 times per
second at a distance of 350 kilometers before finally colliding and
merging.37 In the last few milliseconds, the rate of energy emission in
the form of gravitational waves was fifty times larger than the total
energy output of all the stars in the universe. On September 14, 2015,
those gravitational waves arrived at the Earth. They alternately ex-
panded and compressed space itself by a factor of about one in 2.5
sextillion, equivalent to changing the distance to Proxima Centauri
(4.4 light years) by the width of a human hair.
Fortunately, two days earlier, the Advanced LIGO (Laser Interfer-
ometer Gravitational-Wave Observatory) detectors in Washington
and Louisiana had been switched on. Using laser interferometry, they
were able to measure the minuscule distortion of space; using calcula-
tions based on Einstein's theory of general relativity, the LIGO researchers had predicted— and were therefore looking for— the exact shape of the gravitational waveform expected from such an event.38
This was possible because of the accumulation and communica-
tion of knowledge and concepts by thousands of people across centu-
ries of observation and research. From Thales of Miletus rubbing
amber with wool and observing the static charge buildup, through
Galileo dropping rocks from the Leaning Tower of Pisa, to Newton
seeing an apple fall from a tree, and on through thousands more ob-
servations, humanity has gradually accumulated layer upon layer of
concepts, theories, and devices: mass, velocity, acceleration, force,
Newton's laws of motion and gravitation, orbital equations, electrical
phenomena, atoms, electrons, electric fields, magnetic fields, electro-
magnetic waves, special relativity, general relativity, quantum me-
chanics, semiconductors, lasers, computers, and so on.
Now, in principle we can understand this process of discovery as a
mapping from all the sensory data ever experienced by all humans to
a very complex hypothesis about the sensory data experienced by the
LIGO scientists on September 14, 2015, as they watched their com-
puter screens. This is the purely data-driven view of learning: data in,
hypothesis out, black box in between. If it could be done, it would be
the apotheosis of the “big data, big network” deep learning approach,
but it cannot be done. The only plausible idea we have for how intelli-
gent entities could achieve such a stupendous feat as detecting the
merger of two black holes is that prior knowledge of physics, combined
with the observational data from their instruments, allowed the LIGO
scientists to infer the occurrence of the merger event. Moreover, this
prior knowledge was itself the result of learning with prior knowledge—
and so on, all the way back through history. Thus, we have a roughly
cumulative picture of how intelligent entities can build predictive ca-
pabilities, with knowledge as the building material.
I say roughly because, of course, science has taken a few wrong
turns over the centuries, temporarily pursuing illusory notions such as
phlogiston and the luminiferous aether. But we know for a fact that
the cumulative picture is what actually happened, in the sense that
scientists all along the way wrote down their findings and theories in
books and papers. Later scientists had access only to these forms of
explicit knowledge, and not to the original sensory experiences of ear-
lier, long-dead generations. Because they are scientists, the members
of the LIGO team understood that all the pieces of knowledge they
used, including Einstein's theory of general relativity, are (and always
will be) in their probationary period and could be falsified by experi-
ment. As it turned out, the LIGO data provided strong confirmation
for general relativity as well as further evidence that the graviton— a
hypothesized particle that mediates the force of gravity— is massless.
We are a very long way from being able to create machine learn-
ing systems that are capable of matching or exceeding the capacity
for cumulative learning and discovery exhibited by the scientific
community— or by ordinary human beings in their own lifetimes.39
Deep learning systems (Appendix D) are mostly data-driven: at best, we can "wire in" some very weak forms of prior knowledge in the structure of the network. Probabilistic programming systems (Appendix C)
do allow for prior
knowledge in the learning process, as expressed in the structure and
vocabulary of the probabilistic knowledge base, but we do not yet have
effective methods for generating new concepts and relationships and
using them to expand such a knowledge base.
The difficulty is not one of finding hypotheses that provide a good
fit to data; deep learning systems can find hypotheses that are a good fit
to image data, and AI researchers have built symbolic learning pro-
grams able to recapitulate many historical discoveries of quantitative
scientific laws.40 Learning in an autonomous intelligent agent requires
much more than this.
First, what should be included in the data from which predic-
tions are made? For example, in the LIGO experiment, the model for
predicting the amount that space stretches and shrinks when a gravi-
tational wave arrives takes into account the masses of the colliding
black holes, the frequency of their orbits, and so on, but it doesn’t take
into account the day of the week or the occurrence of Major League
baseball games. On the other hand, a model for predicting traffic on
the San Francisco Bay Bridge takes into account the day of the week
and the occurrence of Major League baseball games but ignores the
masses and orbital frequencies of colliding black holes. Similarly,
programs that learn to recognize the types of objects in images use the
pixels as input, whereas a program that learns to estimate the value of
an antique object would also want to know what it was made of, who
made it and when, its history of usage and ownership, and so on. Why
is this? Obviously, it’s because we humans already know something
about gravitational waves, traffic, visual images, and antiques. We use
this knowledge to decide which inputs are needed for predicting a
specific output. This is called feature engineering, and doing it well re-
quires a good understanding of the specific prediction problem.
Of course, a real intelligent machine cannot rely on human feature
engineers showing up every time there is something new to learn. It
will have to work out for itself what constitutes a reasonable hypothe-
sis space for a learning problem. Presumably, it will do this by bringing
to bear a wide range of relevant knowledge in various forms, but at
present we have only rudimentary ideas about how to do this.41 Nelson Goodman's Fact, Fiction, and Forecast42— written in 1954 and per-
haps one of the most important and underappreciated books on
machine learning— suggests a kind of knowledge called an overhypoth-
esis, because it helps to define what the space of reasonable hypotheses
might be. In the case of traffic prediction, for example, the relevant
overhypothesis would be that the day of the week, time of day, local
events, recent accidents, holidays, transit delays, weather, and sunrise
and sunset times can influence traffic conditions. (Notice that you can
figure out this overhypothesis from your own background knowledge
of the world, without being a traffic expert.) An intelligent learning
system can accumulate and use knowledge of this kind to help formu-
late and solve new learning problems.
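In code, an overhypothesis can be as simple as a declaration of which observable quantities are even allowed to matter for a given prediction problem; the feature names below are invented for illustration.

```python
# A toy overhypothesis for bridge-traffic prediction: it predicts nothing by
# itself, it just restricts the hypothesis space to features that can matter.
TRAFFIC_OVERHYPOTHESIS = {
    "day_of_week", "time_of_day", "holiday", "baseball_game",
    "weather", "recent_accident", "transit_delay",
}

def select_features(observation):
    """Keep only the inputs the overhypothesis licenses for this problem."""
    return {k: v for k, v in observation.items() if k in TRAFFIC_OVERHYPOTHESIS}

observation = {
    "day_of_week": "Friday",
    "baseball_game": True,
    "black_hole_orbital_frequency_hz": 250,   # irrelevant to bridge traffic
}
print(select_features(observation))
# {'day_of_week': 'Friday', 'baseball_game': True}
```

A learning system that could formulate such restrictions for itself, from general background knowledge, would have solved the feature-engineering problem described above.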
Second, and perhaps more important, is the cumulative generation
of new concepts such as mass, acceleration, charge, electron, and grav-
itational force. Without these concepts, scientists (and ordinary peo-
ple) would have to interpret their universe and make predictions on
the basis of raw perceptual inputs. Instead, Newton was able to work
with concepts of mass and acceleration developed by Galileo and
others; Rutherford could determine that the atom was composed of a
dense, positively charged nucleus surrounded by electrons because the
concept of an electron had already been developed (by numerous re-
searchers in small steps) in the late nineteenth century; indeed, all
scientific discoveries rely on layer upon layer of concepts that stretch
back through time and human experience.
In the philosophy of science, particularly in the early twentieth
century, it was not uncommon to see the discovery of new concepts
attributed to the three ineffable I’s: intuition, insight, and inspiration.
All these were considered resistant to any rational or algorithmic ex-
planation. AI researchers, including Herbert Simon,43 have objected
strongly to this view. Put simply, if a machine learning algorithm can
search in a space of hypotheses that includes the possibility of adding
definitions for new terms not present in the input, then the algorithm
can discover new concepts.
For example, suppose that a robot is trying to learn the rules of
backgammon by watching people playing the game. It observes how
they roll the dice and notices that sometimes players move three or
four pieces rather than one or two and that this happens after a roll of
1-1, 2-2, 3-3, 4-4, 5-5, or 6-6. If the program can add a new concept
of doubles, defined by equality between the two dice, it can express
the same predictive theory much more concisely. It is a straight-
forward process, using methods such as inductive logic programming,44
to create programs that propose new concepts and definitions in order
to identify theories that are both accurate and concise.
At present, we know how to do this for relatively simple cases, but
for more complex theories the number of possible new concepts that
could be introduced becomes simply enormous. This makes the recent
success of deep learning methods in computer vision all the more in-
triguing. The deep networks usually succeed in finding useful inter-
mediate features such as eyes, legs, stripes, and corners, even though
they are using very simple learning algorithms. If we can understand
better how this happens, we can apply the same approach to learning
new concepts in the more expressive languages needed for science.
This by itself would be a huge boon to humanity as well as a signifi-
cant step towards general-purpose AI.
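Returning to the backgammon example above, here is a minimal sketch of why an invented concept pays for itself; inductive logic programming systems search automatically over candidate definitions like the one for doubles, whereas here it is simply written down by hand.

```python
# Two ways to express the same backgammon rule the robot has observed.
# Without a new concept: enumerate every special case.
def may_move_four_pieces_v1(d1, d2):
    return (d1, d2) in {(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)}

# With an invented concept: define "doubles" once, then state the rule concisely.
def doubles(d1, d2):
    return d1 == d2

def may_move_four_pieces_v2(d1, d2):
    return doubles(d1, d2)

# Both theories make identical predictions; the second is shorter, and the
# new predicate can be reused wherever doubles matter in other rules.
assert all(
    may_move_four_pieces_v1(d1, d2) == may_move_four_pieces_v2(d1, d2)
    for d1 in range(1, 7) for d2 in range(1, 7)
)
```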
Discovering actions
Intelligent behavior over long time scales requires the ability
to plan and manage activity hierarchically, at multiple levels of
abstraction— all the way from doing a PhD (one trillion actions) to a
single motor control command sent to one finger as part of typing a
single character in the application cover letter.
Our activities are organized into complex hierarchies with dozens
of levels of abstraction. These levels and the actions they contain are a
key part of our civilization and are handed down through generations
via our language and practices. For example, actions such as catching a
wild boar and applying for a visa and buying a plane ticket may involve
millions of primitive actions, but we can think about them as single
units because they are already in the “library” of actions that our lan-
guage and culture provides and because we know (roughly) how to
do them.
Once they are in the library, we can string these high-level actions together into still higher-level actions, such as having a tribal feast
for the summer solstice or doing archaeological research for a summer
in a remote part of Nepal. Trying to plan such activities from scratch,
starting with the lowest-level motor control steps, would be com-
pletely hopeless because such activities involve millions or billions of
steps, many of which are very unpredictable. (Where will the wild
boar be found, and which way will he run?) With suitable high-level
actions in the library, on the other hand, one need plan only a dozen
or so steps, because each such step is a large piece of the overall activ-
ity. This is something that even our feeble human brains can manage—
but it gives us the “superpower” of planning over long time scales.
There was a time when these actions didn't exist as such—for
example, to obtain the right to a plane journey in 1910 would have
required a long, involved, and unpredictable process of research,
letter writing, and negotiation with various aeronautical pioneers.
Other actions recently added to the library include emailing, googling,
and ubering. As Alfred North Whitehead wrote in 1911, “Civilization
advances by extending the number of important operations which we
can perform without thinking about them."45
Saul Steinberg’s famous cover for The New Yorker (figure 6) bril-
liantly shows, in spatial form, how an intelligent agent manages its
own future. The very immediate future is extraordinarily detailed—
in fact, my brain has already loaded up the specific motor control
FIGURE 6: Saul Steinberg's View of the World from 9th Avenue, first published as a cover of The New Yorker magazine.
sequences for typing the next few words. Looking a bit further ahead,
there is less detail— my plan is to finish this section, have lunch, write
some more, and watch France play Croatia in the final of the World
Cup. Still further ahead, my plans are larger but vaguer: move back
from Paris to Berkeley in early August, teach a graduate course, and
finish this book. As one moves through time, the future moves closer
to the present and the plans for it become more detailed, while new,
vague plans may be added to the distant future. Plans for the immedi-
ate future become so detailed that they are executable directly by the
motor control system.
At present we have only some pieces of this overall picture in place
for AI systems. If the hierarchy of abstract actions is provided—
including knowledge of how each abstract action can be refined into a
subplan composed of more concrete actions— then we have algorithms
that can construct complex plans to achieve specific goals. There are
algorithms that can execute abstract, hierarchical plans in such a way
that the agent always has a primitive, physical action "ready to go,"
even if actions in the future are still at an abstract level and not yet
executable.
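A minimal sketch of the second capability described in this paragraph: given a hand-supplied hierarchy (the action names and refinements below are invented), refine only the first step of the plan until a primitive action is ready to go, leaving later steps at whatever abstract level they are at.

```python
# A hand-supplied action hierarchy: each abstract action refines into a
# sequence of (more concrete) actions. Names are invented for illustration.
REFINEMENTS = {
    "have_dinner":        ["cook_meal", "eat", "clear_dishes"],
    "cook_meal":          ["fetch_ingredients", "cook"],
    "fetch_ingredients":  ["walk_to_pantry", "grasp_items", "walk_to_kitchen"],
    "eat":                ["serve_food", "use_cutlery"],
    "clear_dishes":       ["stack_plates", "load_dishwasher"],
}
PRIMITIVE = {"walk_to_pantry", "grasp_items", "walk_to_kitchen", "cook",
             "serve_food", "use_cutlery", "stack_plates", "load_dishwasher"}

def next_primitive(plan):
    """Refine only the head of the plan until a primitive action is ready,
    leaving later steps abstract."""
    while plan and plan[0] not in PRIMITIVE:
        plan[0:1] = REFINEMENTS[plan[0]]      # replace head with its subplan
    return plan[0] if plan else None

plan = ["have_dinner"]
print(next_primitive(plan))   # walk_to_pantry
print(plan)                   # ['walk_to_pantry', 'grasp_items',
                              #  'walk_to_kitchen', 'cook', 'eat',
                              #  'clear_dishes']  (eat, clear_dishes still abstract)
```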
The main missing piece of the puzzle is a method for constructing
the hierarchy of abstract actions in the first place. For example, is it
possible to start from scratch with a robot that knows only that it can
send various electric currents to various motors and have it discover
for itself the action of standing up? It’s important to understand that
I’m not asking whether we can train a robot to stand up, which can be
done simply by applying reinforcement learning with a reward for the
robot’s head being farther away from the ground.
46
Training a robot to
stand up requires that the human trainer already knows what standing
up means, so that the right reward signal can be defined. What we
want is for the robot to discover for itself that standing up
is a thing— a
useful abstract action, one that achieves the precondition (being up-
right) for walking or running or shaking hands or seeing over a wall
and so forms part of many abstract plans for all kinds of goals.
Similarly, we want the robot to discover actions such as moving from
place to place, picking up objects, opening doors, tying knots, cooking
dinner, finding my keys, building houses, and many other actions that
have no names in any human language because we humans have not
discovered them yet.
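For contrast with what we actually want, here is the kind of hand-written reward signal described above for training a robot to stand up; the state fields are hypothetical, and the point is precisely that a human had to know what standing up means in order to write it.

```python
# A hand-designed reward for "standing up", of the sort a human trainer would
# have to supply; the state fields here are hypothetical, for illustration only.
def standing_reward(state):
    """Reward height of the head above the ground, minus a small energy penalty."""
    return state["head_height_m"] - 0.01 * sum(abs(t) for t in state["joint_torques"])

state = {"head_height_m": 1.4, "joint_torques": [2.0, -1.5, 0.5]}
print(standing_reward(state))   # 1.36
```

Discovering that standing up is a useful abstract action in the first place, without such a reward being handed to the learner, is the open problem the text is pointing at.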
I believe this capability is the most important step needed to reach
human-level AI. It would, to borrow Whitehead's phrase again, ex-
tend the number of important operations that AI systems can perform
without thinking about them. Numerous research groups around the
world are hard at work on solving the problem. For example, Deep-
Mind's 2018 paper showing human-level performance on Quake III
Arena Capture the Flag claims that their learning system “constructs
a temporally hierarchical representation space in a novel way to
promote... temporally coherent action sequences.
47
(Im not com-
pletely sure what this means, but it certainly sounds like progress to-
wards the goal of inventing new high-level actions.) I suspect that we
do not yet have the complete answer, but this is an advance that could
occur any moment, just by putting some existing ideas together in the
right way.
Intelligent machines with this capability would be able to look fur-
ther into the future than humans can. They would also be able to take
into account far more information. These two capabilities combined
lead inevitably to better real-world decisions. In any kind of conflict
situation between humans and machines, we would quickly find, like
Garry Kasparov and Lee Sedol, that our every move has been antici-
pated and blocked. We would lose the game before it even started.
Managing mental activity
If managing activity in the real world seems complex, spare a
thought for your poor brain, managing the activity of the “most com-
plex object in the known universe”— itself. We don’t start out know-
ing how to think, any more than we start out knowing how to walk or
play the piano. We learn how to do it. We can, to some extent, choose
what thoughts to have. (Go on, think about a juicy hamburger or Bul-
garian customs regulations— your choice!) In some ways, our mental
activity is more complex than our activity in the real world, because
our brains have far more moving parts than our bodies and those parts
move much faster. The same is true for computers: for every move
that AlphaGo makes on the Go board, it performs millions or billions
of units of computation, each of which involves adding a branch to the
lookahead search tree and evaluating the board position at the end of
that branch. And each of those units of computation happens because
the program makes a choice about which part of the tree to explore
next. Very approximately, AlphaGo chooses computations that it ex-
pects will improve its eventual decision on the board.
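The flavor of that choice can be seen in the standard selection rule from Monte Carlo tree search, sketched below with invented statistics; AlphaGo's actual rule is a more elaborate variant guided by its neural networks, so this is only an illustration of "pick the computation most likely to improve the decision."

```python
import math

# Statistics for candidate computations (tree branches to explore next):
# each entry is (total_value, visit_count). The numbers are invented.
branches = {
    "move_A": (6.0, 10),
    "move_B": (3.5, 5),
    "move_C": (0.9, 1),
}

def ucb_score(total_value, visits, total_visits, c=1.4):
    """Average value plus an exploration bonus for rarely tried branches."""
    return total_value / visits + c * math.sqrt(math.log(total_visits) / visits)

total_visits = sum(v for _, v in branches.values())
best = max(branches, key=lambda b: ucb_score(*branches[b], total_visits))
print(best)   # move_C here: barely explored, so worth another look
```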
It has been possible to work out a reasonable scheme for managing
AlphaGo’s computational activity because that activity is simple and
homogeneous: every unit of computation is of the same kind. Com-
pared to other programs that use that same basic unit of computation,
AlphaGo is probably quite efficient, but it’s probably extremely ineffi-
cient compared to other kinds of programs. For example, Lee Sedol,
AlphaGo’s human opponent in the epochal match of 2016, probably
does no more than a few thousand units of computation per move, but
he has a much more flexible computational architecture with many
more kinds of units of computation: these include dividing the board
into subgames and trying to resolve their interactions; recognizing
possible goals to attain and making high-level plans with actions like "keep this group alive" or "prevent my opponent from connecting
these two groups”; thinking about how to achieve a specific goal, such
as keeping a group alive; and ruling out whole classes of moves be-
cause they fail to address a significant threat.
We simply don't know how to organize such complex and varied computational activity— how to integrate and build on the results from each and how to allocate computational resources to the various kinds of deliberation— so that good decisions are found as quickly as
possible. It is clear, however, that a simple computational architecture
like AlphaGo's cannot possibly work in the real world, where we rou-
tinely need to deal with decision horizons of not tens but billions of
primitive steps and where the number of possible actions at any point
is almost infinite. It’s important to remember that an intelligent agent
in the real world is not restricted to playing Go or even finding Stuart's keys— it's just being. It can do anything next, but it cannot possibly
afford to think about all the things it might do.
A system that can both discover new high-level actions as de-
scribed earlier and manage its computational activity to focus on
units of computation that quickly deliver significant improvements in
decision quality would be a formidable decision maker in the real
world. Like those of humans, its deliberations would be “cognitively
efficient,” but it would not suffer from the tiny short- term memory
and slow hardware that severely limit our ability to look far into the
future, handle a large number of contingencies, and consider a large
number of alternative plans.
More things missing?
If we put together everything we know how to do with all the po-
tential new developments listed in this chapter, would it work? How
would the resulting system behave? It would plow through time, ab-
sorbing vast quantities of information and keeping track of the state of
the world on a massive scale by observation and inference. It would
gradually improve its models of the world (which include models of
humans, of course). It would use those models to solve complex prob-
lems and it would encapsulate and reuse its solution processes to make
its deliberations more efficient and to enable the solution of still more
complex problems. It would discover new concepts and actions, and
these would allow it to improve its rate of discovery. It would make
effective plans over increasingly long time scales.
In summary, it's not obvious that anything else of great signifi-
cance is missing, from the point of view of systems that are effective
in achieving their objectives. Of course, the only way to be sure is to
build it (once the breakthroughs have been achieved) and see what
happens.
Imagining a Superintelligent Machine
The technical community has suffered from a failure of imagination
when discussing the nature and impact of superintelligent AI. Often,
we see discussions of reduced medical errors,48 safer cars,49 or other
advances of an incremental nature. Robots are imagined as individual
entities carrying their brains with them, whereas in fact they are likely
to be wirelessly connected into a single, global entity that draws on
vast stationary computing resources. It’s as if researchers are afraid of
examining the real consequences of success in AI.
A general-purpose intelligent system can, by assumption, do what
any human can do. For example, some humans did a lot of mathemat-
ics, algorithm design, coding, and empirical research to come up with
the modern search engine. The results of all this work are very useful
and of course very valuable. How valuable? A recent study showed
that the median American adult surveyed would need to be paid at
least $17,500 to give up using search engines for a year,50 which trans-
lates to a global value in the tens of trillions of dollars.
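The "tens of trillions" figure is simple arithmetic once a user count is assumed; the two billion users below is my own round number for illustration, since the survey itself covered American adults.

```python
# Order-of-magnitude value of search engines (illustrative user count).
VALUE_PER_USER_PER_YEAR = 17_500         # dollars, from the survey cited above
SEARCH_USERS = 2_000_000_000             # assumed number of users worldwide
total = VALUE_PER_USER_PER_YEAR * SEARCH_USERS
print(f"${total / 1e12:.0f} trillion per year")   # $35 trillion per year
```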
Now imagine that search engines don’t exist yet because the nec-
essary decades of work have not been done, but you have access in-
stead to a superintelligent AI system. Simply by asking the question,
you now have access to search engine technology, courtesy of the AI
system. Done! Trillions of dollars in value, just for the asking, and not
a single line of additional code written by you. The same goes for any
other missing invention or series of inventions: if humans could do it,
so can the machine.
This last point provides a useful lower bound— a pessimistic
estimate— on what a superintelligent machine can do. By assumption,
the machine is more capable than an individual human. There are
many things an individual human cannot do, but a collection of n
humans can do: put an astronaut on the Moon, create a gravitational-
wave detector, sequence the human genome, run a country with hun-
dreds of millions of people. So, roughly speaking, we create n software
copies of the machine and connect them in the same way— with the
same information and control flows— as the n humans. Now we have
a machine that can do whatever n humans can do, except better, be-
cause each of its n components is superhuman.
This multi-agent cooperation design for an intelligent system is just
a lower bound on the possible capabilities of machines because there
are other designs that work better. In a collection of n humans, the
total available information is kept separately in n brains and commu-
nicated very slowly and imperfectly between them. That’s why the n
humans spend most of their time in meetings. In the machine, there
is no need for this separation, which often prevents connecting the
dots. For an example of disconnected dots in scientific discovery, a
brief perusal of the long history of penicillin is quite eye-opening.51
Another useful method of stretching your imagination is to think
about some particular form of sensory input— say, reading— and scale
it up. Whereas a human can read and understand one book in a week,
a machine could read and understand every book ever written— all 150
million of them— in a few hours. This requires a decent amount of
processing power, but the books can be read largely in parallel, mean-
ing that simply adding more chips allows the machine to scale up its
reading process. By the same token, the machine can see everything at
once through satellites, robots, and hundreds of millions of surveil-
lance cameras; watch all the world’s TV broadcasts; and listen to all
the world’s radio stations and phone conversations. Very quickly it
would gain a far more detailed and accurate understanding of the world
and its inhabitants than any human could possibly hope to acquire.
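As a rough sense of what "largely in parallel" buys, here is the arithmetic with assumed figures for per-book processing time and the time budget; neither number comes from the text.

```python
# How much parallel hardware might "every book in a few hours" take?
BOOKS = 150_000_000            # roughly every book ever written
SECONDS_PER_BOOK = 10          # assumed time for one processor to read a book
TIME_BUDGET_HOURS = 4          # assumed "few hours"

chips = BOOKS * SECONDS_PER_BOOK / (TIME_BUDGET_HOURS * 3600)
print(f"{chips:,.0f} processors reading in parallel")   # ~104,167
```

Since each book can be read independently, the job scales by simply adding processors, exactly as the text says.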
One can also imagine scaling the machine’s capacity for action. A
human has direct control over only one body, while a machine can
control thousands or millions. Some automated factories already ex-
hibit this characteristic. Outside the factory, a machine that controls
thousands of dexterous robots can, for example, produce vast numbers
of houses, each one tailored to its future occupants’ needs and desires.
In the lab, existing robotic systems for scientific research
could be
scaled up to perform millions of experiments simultaneously— perhaps
to create complete predictive models of human biology down to the
molecular level. Note that the machine’s reasoning capabilities will
give it a far greater capacity to detect inconsistencies between scien-
tific theories and between theories and observations. Indeed, it may
already be the case that we have enough experimental evidence about
biology to devise a cure for cancer: we just haven't put it together.
In the cyber realm, machines already have access to billions of ef-
fectors— namely, the displays on all the phones and computers in the
world. This partly explains the ability of IT companies to generate
enormous wealth with very few employees; it also points to the severe
vulnerability of the human race to manipulation via screens.
Scale of a different kind comes from the machine's ability to look
further into the future, with greater accuracy, than is possible for hu-
mans. We have seen this for chess and Go already; with the capacity
for generating and analyzing hierarchical plans over long time scales
and the ability to identify new abstract actions and high-level descrip-
tive models, machines will transfer this advantage to domains such as
mathematics (proving novel, useful theorems) and decision making in
the real world. Tasks such as evacuating a large city in the event of an
environmental disaster will be relatively straightforward, with the
machine able to generate individual guidance for every person and
vehicle to minimize the number of casualties.
The machine might work up a slight sweat when devising pol-
icy recommendations to prevent global warming. Earth systems mod-
eling requires knowledge of physics (atmosphere, oceans), chemistry
(carbon cycle, soils), biology (decomposition, migration), engineering
(renewable energy, carbon capture), economics (industry, energy use),
human nature (stupidity, greed), and politics (even more stupidity,
even more greed). As noted, the machine will have access to vast
quantities of evidence to feed all these models. It will be able to sug-
gest or carry out new experiments and expeditions to narrow down
the inevitable uncertainties— for example, to discover the true extent
of gas hydrates in shallow ocean reservoirs. It will be able to consider
a vast range of possible policy recommendations— laws, nudges, mar-
kets, inventions, and geoengineering interventions— but of course it
will also need to find ways to persuade us to go along with them.
The Limits of Superintelligence
While stretching your imagination, don’t stretch it too far. A common
mistake is to attribute godlike powers of omniscience to superintelli-
gent AI systems— complete and perfect knowledge not just of the present but also of the future.52 This is quite implausible because it
requires an unphysical ability to determine the exact current state of
the world as well as an unrealizable ability to simulate, much faster
than real time, the operation of a world that includes the machine it-
self (not to mention billions of brains, which would still be the second-
most-complex objects in the universe).
This is not to say that it is impossible to predict some aspects of the
future with a reasonable degree of certainty— for example, I know
what class I'll be teaching in what room at Berkeley almost a year from
now, despite the protestations of chaos theorists about butterfly wings
and all that. (Nor do I think that humans are anywhere close to pre-
dicting the future as well as the laws of physics allow!) Prediction
depends on having the right abstractions— for example, I can predict
that “I” will be “on stage in Wheeler Auditorium” on the Berkeley
campus on the last Tuesday in April, but I cannot predict my exact
location down to the millimeter or which atoms of carbon will have
been incorporated into my body by then.
Machines are also subject to certain speed limits imposed by the
real world on the rate at which new knowledge of the world can
be acquired— one of the valid points made by Kevin Kelly in his article on oversimplified predictions about superhuman AI.53 For exam-
ple, to determine whether a specific drug cures a certain kind of
cancer in an experimental animal, a scientist— human or machine— has two choices: inject the animal with the drug and wait several
weeks or run a sufficiently accurate simulation. To run a simulation,
however, requires a great deal of empirical knowledge of biology, some
of which is currently unavailable; so, more model-building experi-
ments would have to be done first. Undoubtedly, these would take
time and must be done in the real world.
On the other hand, a machine scientist could run vast numbers of
model- building experiments in parallel, could integrate their out-
comes into an internally consistent (albeit very complex) model, and
could compare the model's predictions with the entirety of experi-
mental evidence known to biology. Moreover, simulating the model
does not necessarily require a quantum- mechanical simulation of the
entire organism down to the level of individual molecular reactions,
which, as Kelly points out, would take more time than simply doing
the experiment in the real world. Just as I can predict my future loca-
tion on Tuesdays in April with some certainty, properties of biological
systems can be predicted accurately with abstract models. (Among
other reasons, this is because biology operates with robust control sys-
tems based on aggregate feedback loops, so that small variations in
initial conditions usually don’t lead to large variations in outcomes.)
Thus, while instantaneous machine discoveries in the empirical sci-
ences are unlikely, we can expect that science will proceed much
faster with the help of machines. Indeed, it already is.
A final limitation of machines is that they are not human. This puts
them at an intrinsic disadvantage when trying to model and predict
one particular class of objects: humans. Our brains are all quite simi-
lar, so we can use them to simulate (to experience, if you will) the
mental and emotional lives of others. This, for us, comes for free. (If
you think about it, machines have an even greater advantage with each
other: they can actually run each other’s code!) For example, I don’t
need to be an expert on neural sensory systems to know what it feels
like when you hit your thumb with a hammer. I can just hit my thumb
with a hammer. Machines, on the other hand, have to start almost54
from scratch in their understanding of humans: they have access only
to our external behavior, plus all the neuroscience and psychology lit-
erature, and have to develop an understanding of how we work on that
basis. In principle, they will be able to do this, but it's reasonable to
suppose that acquiring a human- level or superhuman understanding of
humans will take them longer than most other capabilities.
How Will AI Benefit Humans?
Our intelligence is responsible for our civilization. With access to
greater intelligence we could have a greater and perhaps far better
civilization. One can speculate about solving major open problems
such as extending human life indefinitely or developing faster- than-
light travel, but these staples of science fiction are not yet the driving
force for progress in AI. (With superintelligent AI, we’ll probably be
able to invent all sorts of quasi- magical technologies, but it’s hard to
say now what those might be.) Consider, instead, a far more prosaic
goal: raising the living standard of everyone on Earth, in a sustainable
way, to a level that would be viewed as quite respectable in a devel-
oped country. Choosing (somewhat arbitrarily) respectable to mean
the eighty-eighth percentile in the United States, the stated goal rep-
resents almost a tenfold increase in global gross domestic product
(GDP), from $76 trillion to $750 trillion per year.55
To calculate the cash value of such a prize, economists use the net
present value of the income stream, which takes into account the dis-
counting of future income relative to the present. The extra income of
$674 trillion per year has a net present value of roughly $13,500 tril-
lion,56 assuming a discount factor of 5 percent. So, in very crude terms,
this is a ballpark figure for what human- level AI might be worth if it
can deliver a respectable living standard for everyone. With numbers
like this, it's not surprising that companies and countries are investing
tens of billions of dollars annually in AI research and development.57
Even so, the sums invested are minuscule compared to the size of
the prize.
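For readers who want to check the arithmetic, here is a minimal sketch in Python (mine, not the book's) that reproduces the rough figures above, under the simplifying assumption that the extra income is a perpetuity discounted at 5 percent:

    # Back-of-the-envelope check of the figures quoted above. The round numbers
    # come from the text; treating the income as a perpetuity is my assumption.
    current_gdp = 76e12      # dollars per year
    target_gdp = 750e12      # dollars per year
    discount_rate = 0.05

    extra_income = target_gdp - current_gdp      # about $674 trillion per year
    npv = extra_income / discount_rate           # perpetuity value: income / discount rate

    print(f"GDP multiple: {target_gdp / current_gdp:.1f}x")                # ~9.9, almost tenfold
    print(f"extra income: ${extra_income / 1e12:.0f} trillion per year")   # ~$674 trillion
    print(f"net present value: ${npv / 1e12:,.0f} trillion")               # ~$13,480 trillion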
Of course, these are all made- up numbers unless one has some
idea of how human- level AI could achieve the feat of raising living
standards. It can do this only by increasing the per- capita production
of goods and services. Put another way: the average human can never
expect to consume more than the average human produces. The ex-
ample of self- driving taxis discussed earlier in the chapter illustrates
the multiplier effect of AI: with an automated service, it should be
possible for (say) ten people to manage a fleet of one thousand vehi-
cles, so each person is producing one hundred times as much transpor-
tation as before. The same goes for manufacturing the cars and for
extracting the raw materials from which the cars are made. Indeed,
some iron-ore mining operations in northern Australia, where tem-
peratures regularly exceed 45 degrees Celsius (113 degrees Fahren-
heit), are almost completely automated already.58
These present- day applications of AI are special- purpose systems:
self- driving cars and self- operating mines have required huge invest-
ments in research, mechanical design, software engineering, and test-
ing to develop the necessary algorithms and to make sure that they
work as intended. Thats just how things are done in all spheres of
engineering. That’s how things used to be done in personal travel too:
if you wanted to travel from Europe to Australia and back in the sev-
enteenth century, it would have involved a huge project costing vast
sums of money, requiring years of planning, and carrying a high risk of
death. Now we are used to the idea of transportation as a service
(TaaS): if you need to be in Melbourne early next week, it just requires
a few taps on your phone and a relatively minuscule amount of money.
General- purpose AI would be everything as a service (EaaS). There
would be no need to employ armies of specialists in different disci-
plines, organized into hierarchies of contractors and subcontractors,
in order to carry out a project. All embodiments of general- purpose
AI would have access to all the knowledge and skills of the human
race, and more besides. The only differentiation would be in the
physical capabilities: dexterous legged robots for construction or sur-
gery, wheeled robots for large- scale goods transportation, quadcopter
robots for aerial inspections, and so on. In principle (politics and eco-
nomics aside), everyone could have at their disposal an entire organi-
zation composed of software agents and physical robots, capable of
designing and building bridges, improving crop yields, cooking dinner
for a hundred guests, running elections, or doing whatever else needs
doing. It's the generality of general-purpose intelligence that makes
this possible.
History has shown, of course, that a tenfold increase in global GDP
per capita is possible without AI— it’s just that it took 190 years (from
1820 to 2010) to achieve that increase.59 It required the development
of factories, machine tools, automation, railways, steel, cars, airplanes,
electricity, oil and gas production, telephones, radio, television, com-
puters, the Internet, satellites, and many other revolutionary inven-
tions. The tenfold increase in GDP posited in the preceding paragraphs
is predicated not on further revolutionary technologies but on the
ability of AI systems to employ what we already have more effectively
and at greater scale.
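As a quick sanity check on that comparison, the compound annual growth rate implied by a tenfold rise over 190 years is modest; a two-line Python sketch (mine, not the book's):

    # Tenfold growth over 190 years corresponds to a compound rate of 10**(1/190) - 1.
    annual_rate = 10 ** (1 / 190) - 1
    print(f"implied growth rate: {annual_rate:.2%} per year")   # roughly 1.2 percent per year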
Of course, there will be effects besides the purely material benefit
of raising living standards. For example, personal tutoring is known to
be far more effective than classroom teaching, but when done by hu-
mans it is simply unaffordable— and always will be for the vast
majority of people. With AI tutors, the potential of each child, no
matter how poor, can be realized. The cost per child would be negli-
gible, and that child would live a far richer and more productive life.
The pursuit of artistic and intellectual endeavors, whether individu-
ally or collectively, would be a normal part of life rather than a rar-
efied luxury.
In the area of health, AI systems should enable researchers to un-
ravel and master the vast complexities of human biology and thereby
gradually banish disease. Greater insights into human psychology and
neurochemistry should lead to broad improvements in mental health.
Perhaps more unconventionally, AI could enable far more effective
authoring tools for virtual reality (VR) and could populate VR envi-
ronments with far more interesting entities. This might turn VR into
the medium of choice for literary and artistic expression, creating ex-
periences of a richness and depth that is currently unimaginable.
And in the mundane world of daily life, an intelligent assistant and
guide would, if well designed and not co-opted by economic and po-
litical interests, empower every individual to act effectively on their
own behalf in an increasingly complex and sometimes hostile eco-
nomic and political system. You would, in effect, have a high- powered
lawyer, accountant, and political adviser on call at any time. Just as
traffic jams are expected to be smoothed out by intermixing even a
small percentage of autonomous vehicles, one can only hope that wiser
policies and fewer conflicts will emerge from a better- informed and
better- advised global citizenry.
These developments taken together could change the dynamic of
history— at least that part of history that has been driven by conflicts
within and between societies for access to the wherewithal of life. If
the pie is essentially infinite, then fighting others for a larger share
makes little sense. It would be like fighting over who gets the most
digital copies of the newspaper: completely pointless when anyone
can make as many digital copies as they want for free.
There are some limits to what AI can provide. The pies of land and
raw materials are not infinite, so there cannot be unlimited popula-
tion growth and not everyone will have a mansion in a private park.
(This will eventually necessitate mining elsewhere in the solar system
and constructing artificial habitats in space; but I promised not to talk
about science fiction.) The pie of pride is also finite: only 1 percent of
people can be in the top 1 percent on any given metric. If human hap-
piness requires being in the top 1 percent, then 99 percent of humans
are going to be unhappy, even when the bottom 1 percent has an ob-
jectively splendid lifestyle.60 It will be important, then, for our cul-
tures to gradually down- weight pride and envy as central elements of
perceived self- worth.
As Nick Bostrom puts it at the end of his book Superintelligence,
success in AI will yield “a civilizational trajectory that leads to a com-
passionate and jubilant use of humanity’s cosmic endowment.” If we
fail to take advantage of what AI has to offer, we will have only our-
selves to blame.
4
MISUSES OF AI
A compassionate and jubilant use of humanity's cosmic endow-
ment sounds wonderful, but we also have to reckon with
the rapid rate of innovation in the malfeasance sector. Ill-
intentioned people are thinking up new ways to misuse AI so quickly
that this chapter is likely to be outdated even before it attains printed
form. Think of it not as depressing reading, however, but as a call to
act before it is too late.
Surveillance, Persuasion, and Control
The automated Stasi
The Ministerium für Staatssicherheit of East Germany, more com-
monly known as the Stasi, is widely regarded as “one of the most effec-
tive and repressive intelligence and secret police agencies to have ever
existed."1 It maintained files on the great majority of East German
households. It monitored phone calls, read letters, and planted hidden
cameras in apartments and hotels. It was ruthlessly effective at identi-
fying and eliminating dissident activity. Its preferred modus operandi
was psychological destruction rather than imprisonment or execution.
This level of control came at great cost, however: by some estimates,
more than a quarter of working- age adults were Stasi informants. Stasi
paper records have been estimated at twenty billion pages,2 and the
task of processing and acting on the huge incoming flows of informa-
tion began to exceed the capacity of any human organization.
It should come as no surprise, then, that intelligence agencies have
spotted the potential for using AI in their work. For many years, they
have been applying simple forms of AI technology, including voice
recognition and identification of key words and phrases in both speech
and text. Increasingly, AI systems are able to understand the content of
what people are saying and doing, whether in speech, text, or video
surveillance. In regimes where this technology is adopted for the pur-
poses of control, it will be as if every citizen had their own personal
Stasi operative watching over them twenty-four hours a day.3
Even in the civilian sphere, in relatively free countries, we are sub-
ject to increasingly effective surveillance. Corporations collect and
sell information about our purchases, Internet and social network us-
age, electrical appliance usage, calling and texting records, employ-
ment, and health. Our locations can be tracked through our cell
phones and our Internet- connected cars. Cameras recognize our faces
on the street. All this data, and much more, can be pieced together by
intelligent information integration systems to produce a fairly com-
plete picture of what each of us is doing, how we live our lives, who
we like and dislike, and how we will vote.4 The Stasi will look like
amateurs by comparison.
Controlling your behavior
Once surveillance capabilities are in place, the next step is to mod-
ify your behavior to suit those who are deploying this technology. One
rather crude method is automated, personalized blackmail: a system
that understands what you are doing, whether by listening, reading,
or watching you, can easily spot things you should not be doing.
Once it finds something, it will enter into correspondence with you to
extract the largest possible amount of money (or to coerce behavior, if
the goal is political control or espionage). The extraction of money
works as the perfect reward signal for a reinforcement learning algo-
rithm, so we can expect AI systems to improve rapidly in their ability
to identify and profit from misbehavior. Early in 2015, I suggested to
a computer security expert that automated blackmail systems, driven
by reinforcement learning, might soon become feasible; he laughed
and said it was already happening. The first blackmail bot to be widely
publicized was Delilah, identified in July 2016.5
A more subtle way to change people's behavior is to modify their
information environment so that they believe different things and
make different decisions. Of course, advertisers have been doing this
for centuries as a way of modifying the purchasing behavior of individ-
uals. Propaganda as a tool of war and political domination has an even
longer history.
So what’s different now? First, because AI systems can track an
individual's online reading habits, preferences, and likely state of
knowledge, they can tailor specific messages to maximize impact on
that individual while minimizing the risk that the information will be
disbelieved. Second, the AI system knows whether the individual
reads the message, how long they spend reading it, and whether they
follow additional links within the message. It then uses these signals as
immediate feedback on the success or failure of its attempt to influ-
ence each individual; in this way, it quickly learns to become more
effective in its work. This is how content selection algorithms on so-
cial media have had their insidious effect on political opinions.
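To make that feedback loop concrete, here is a minimal, hypothetical sketch in Python of the kind of learning described above: an epsilon-greedy selection rule that uses simulated click or no-click signals to learn which of several messages draws the most engagement. The message names and click rates are invented for illustration; real content-selection systems are far more elaborate.

    import random

    # Toy content-selection loop: estimate each message's click rate from feedback
    # and increasingly show whichever message currently looks most effective.
    messages = ["message A", "message B", "message C"]
    true_click_rate = {"message A": 0.02, "message B": 0.05, "message C": 0.11}  # hidden from the algorithm
    estimate = {m: 0.0 for m in messages}
    shown = {m: 0 for m in messages}
    epsilon = 0.1   # fraction of the time a random message is tried (exploration)

    for _ in range(10_000):
        if random.random() < epsilon:
            m = random.choice(messages)                      # explore
        else:
            m = max(messages, key=lambda x: estimate[x])     # exploit the current best guess
        clicked = random.random() < true_click_rate[m]       # simulated user response
        shown[m] += 1
        estimate[m] += (clicked - estimate[m]) / shown[m]    # running average of observed clicks

    print(estimate)   # the estimate for "message C" converges toward its true rate of 0.11

The only point of the sketch is that every observed response immediately sharpens the system's estimates, which is why such systems become effective so quickly.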
Another recent change is that the combination of AI, computer
graphics, and speech synthesis is making it possible to generate
deepfakes realistic video and audio content of just about anyone say-
ing or doing just about anything. The technology will require little
more than a verbal description of the desired event, making it usable
by more or less anyone in the world. Cell phone video of Senator X
accepting a bribe from cocaine dealer Y at shady establishment Z? No
problem! This kind of content can induce unshakeable beliefs in things
that never happened.6 In addition, AI systems can generate millions of
false identities, the so-called bot armies, that can pump out billions
of comments, tweets, and recommendations daily, swamping the ef-
forts of mere humans to exchange truthful information. Online market-
places such as eBay, Taobao, and Amazon that rely on reputation
systems7 to build trust between buyers and sellers are constantly at
war with bot armies designed to corrupt the markets.
Finally, methods of control can be direct if a government is able to
implement rewards and punishments based on behavior. Such a sys-
tem treats people as reinforcement learning algorithms, training them
to optimize the objective set by the state. The temptation for a gov-
ernment, particularly one with a top- down, engineering mind-set, is to
reason as follows: it would be better if everyone behaved well, had a
patriotic attitude, and contributed to the progress of the country;
technology enables measurement of individual behavior, attitudes,
and contributions; therefore, everyone will be better off if we set up a
technology- based system of monitoring and control based on rewards
and punishments.
There are several problems with this line of thinking. First, it ig-
nores the psychic cost of living under a system of intrusive monitoring
and coercion; outward harmony masking inner misery is hardly an
ideal state. Every act of kindness ceases to be an act of kindness and
becomes instead an act of personal score maximization and is per-
ceived as such by the recipient. Or worse, the very concept of a volun-
tary act of kindness gradually becomes just a fading memory of
something people used to do. Visiting an ailing friend in hospital will,
under such a system, have no more moral significance and emotional
value than stopping at a red light. Second, the scheme falls victim to
the same failure mode as the standard model of AI, in that it assumes
that the stated objective is in fact the true, underlying objective.
Inevitably, Goodhart’s law will take over, whereby individuals opti-
mize the official measure of outward behavior, just as universities
have learned to optimize the “objective” measures of “quality” used by
university ranking systems instead of improving their real (but un-
measured) quality.8 Finally, the imposition of a uniform measure of
behavioral virtue misses the point that a successful society may com-
prise a wide variety of individuals, each contributing in their own way.
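A toy calculation (my own construction, not the book's) makes the Goodhart effect explicit. Suppose an institution can split a fixed effort budget between qualities a ranking measures and qualities it does not, and suppose true quality has diminishing returns in both; maximizing the official score then pulls all effort onto the proxy and lowers true quality.

    import math

    BUDGET = 10.0   # total effort available (arbitrary units)

    def true_quality(measured, unmeasured):
        # What we actually care about: diminishing returns in both kinds of effort.
        return math.sqrt(measured) + math.sqrt(unmeasured)

    def official_score(measured, unmeasured):
        # What the ranking counts: only the measured part.
        return measured

    splits = [i / 100 * BUDGET for i in range(101)]
    best_for_score = max(splits, key=lambda m: official_score(m, BUDGET - m))
    best_for_quality = max(splits, key=lambda m: true_quality(m, BUDGET - m))

    print("maximizing the official score:", best_for_score,
          "-> true quality", round(true_quality(best_for_score, BUDGET - best_for_score), 2))
    print("maximizing true quality:      ", best_for_quality,
          "-> true quality", round(true_quality(best_for_quality, BUDGET - best_for_quality), 2))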
A right to mental security
One of the great achievements of civilization has been the gradual
improvement in physical security for humans. Most of us can expect
to conduct our daily lives without constant fear of injury and death.
Article 3 of the 1948 Universal Declaration of Human Rights states,
"Everyone has the right to life, liberty and security of person."
I would like to suggest that everyone should also have the right to
mental security— the right to live in a largely true information envi-
ronment. Humans tend to believe the evidence of our eyes and ears.
We trust our family, friends, teachers, and (some) media sources to
tell us what they believe to be the truth. Even though we do not ex-
pect used- car salespersons and politicians to tell us the truth, we have
trouble believing that they are lying as brazenly as they sometimes do.
We are, therefore, extremely vulnerable to the technology of misin-
formation.
The right to mental security does not appear to be enshrined in the
Universal Declaration. Articles 18 and 19 establish the rights of “free-
dom of thought” and “freedom of opinion and expression.” One’s
thoughts and opinions are, of course, partly formed by one’s informa-
tion environment, which, in turn, is subject to Article 19's "right to . . .
impart information and ideas through any media and regardless of
frontiers.” That is, anyone, anywhere in the world, has the right to
impart false information to you. And therein lies the difficulty: dem-
ocratic nations, particularly the United States, have for the most part
been reluctant— or constitutionally unable— to prevent the imparting
of false information on matters of public concern because of justifiable
fears regarding government control of speech. Rather than pursuing
the idea that there is no freedom of thought without access to true
information, democracies seem to have placed a naïve trust in the idea
that the truth will win out in the end, and this trust has left us unpro-
tected. Germany is an exception; it recently passed the Network En-
forcement Act, which requires content platforms to remove proscribed
hate speech and fake news, but this has come under considerable crit-
icism as being unworkable and undemocratic.9
For the time being, then, we can expect our mental security to
remain under attack, protected mainly by commercial and volunteer
efforts. These efforts include fact-checking sites such as factcheck.org
and snopes.com, but of course other "fact-checking" sites are spring-
ing up to declare truth as lies and lies as truth.
The major information utilities such as Google and Facebook have
come under extreme pressure in Europe and the United States to “do
something about it.” They are experimenting with ways to flag or rel-
egate false content, using both AI and human screeners, and to
direct users to verified sources that counteract the effects of misinfor-
mation. Ultimately, all such efforts rely on circular reputation sys-
tems, in the sense that sources are trusted because trusted sources
report them to be trustworthy. If enough false information is propa-
gated, these reputation systems can fail: sources that are actually
trustworthy can become untrusted and vice versa, as appears to be
occurring today with major media sources such as CNN and Fox News
in the United States. Aviv Ovadya, a technologist working against mis-
information, has called this the "infopocalypse": a catastrophic failure
of the marketplace of ideas.10
One way to protect the functioning of reputation systems is to
inject sources that are as close as possible to ground truth. A single
fact that is certainly true can invalidate any number of sources that are
only somewhat trustworthy, if those sources disseminate information
contrary to the known fact. In many countries, notaries function as
sources of ground truth to maintain the integrity of legal and real-
estate information; they are usually disinterested third parties in any
transaction and are licensed by governments or professional societies.
(In the City of London, the Worshipful Company of Scriveners has
been doing this since 1373, suggesting that a certain stability inheres
in the role of truth telling.) If formal standards, professional qualifica-
tions, and licensing procedures emerge for fact- checkers, that would
tend to preserve the validity of the information flows on which we
depend. Organizations such as the W3C Credible Web group and the
Credibility Coalition aim to develop technological and crowdsourcing
methods for evaluating information providers, which would then al-
low users to filter out unreliable sources.
A second way to protect reputation systems is to impose a cost for
purveying false information. Thus, some hotel rating sites accept re-
views concerning a particular hotel only from those who have booked
and paid for a room at that hotel through the site, while other rating
sites accept reviews from anyone. It will come as no surprise that rat-
ings at the former sites are far less biased, because they impose a cost
(paying for an unnecessary hotel room) for fraudulent reviews.11 Regu-
latory penalties are more controversial: no one wants a Ministry of
Truth, and Germany’s Network Enforcement Act penalizes only the
content platform, not the person posting the fake news. On the other
hand, just as many nations and many US states make it illegal to record
telephone calls without permission, it ought, at least, to be possible to
impose penalties for creating fictitious audio and video recordings of
real people.
Finally, there are two other facts that work in our favor. First, al-
most no one actively wants, knowingly, to be lied to. (This is not to say
that parents always inquire vigorously into the truthfulness of those
who praise their children's intelligence and charm; it's just that they
are less likely to seek such approval from someone who is known to
lie at every opportunity.) This means that people of all political per-
suasions have an incentive to adopt tools that help them distinguish
truth from lies. Second, no one wants to be known as a liar, least of all
news outlets. This means that information providers, at least those
for whom reputation matters, have an incentive to join industry associ-
ations and subscribe to codes of conduct that favor truth telling. In
turn, social media platforms can offer users the option of seeing con-
tent from only reputable sources that subscribe to these codes and
subject themselves to third- party fact- checking.
Lethal Autonomous Weapons
The United Nations defines lethal autonomous weapons systems
(AWS for short, because LAWS is quite confusing) as weapons sys-
tems that “locate, select, and eliminate human targets without human
intervention." AWS have been described, with good reason, as the
“third revolution in warfare,” after gunpowder and nuclear weapons.
You may have read articles in the media about AWS; usually the
article will call them killer robots and will be festooned with images
from the Terminator movies. This is misleading in at least two ways:
first, it suggests that autonomous weapons are a threat because they
might take over the world and destroy the human race; second, it sug-
gests that autonomous weapons will be humanoid, conscious, and evil.
The net effect of the media’s portrayal of the issue has been to make
it seem like science fiction. Even the German government has been
taken in: it recently issued a statement12 asserting that "having the abil-
ity to learn and develop self- awareness constitutes an indispensable at-
tribute to be used to define individual functions or weapon systems as
autonomous." (This makes as much sense as asserting that a missile isn't
a missile unless it goes faster than the speed of light.) In fact, autono-
mous weapons will have the same degree of autonomy as a chess
program, which is given the mission of winning the game but decides by
itself where to move its pieces and which enemy pieces to eliminate.
AWS are not science fiction. They already exist. Probably the clear-
est example is Israel's Harop (figure 7, left), a loitering munition with a
ten- foot wingspan and a fifty- pound warhead. It searches for up to six
hours in a given geographical region for any target that meets a given
criterion and then destroys it. The criterion could be “emits a radar
signal resembling antiaircraft radar" or "looks like a tank."
By combining recent advances in miniature quadrotor design, min-
iature cameras, computer vision chips, navigation and mapping algo-
rithms, and methods for detecting and tracking humans, it would be
possible in fairly short order to field an antipersonnel weapon like the
Slaughterbot13 shown in figure 7 (right). Such a weapon could be
tasked with attacking anyone meeting certain visual criteria (age, gen-
der, uniform, skin color, and so on) or even specific individuals based
on face recognition. I’m told that the Swiss Defense Department has
already built and tested a real Slaughterbot and found that, as ex-
pected, the technology is both feasible and lethal.
Since 2014, diplomatic discussions have been underway in Geneva
that may lead to a treaty banning AWS. At the same time, some of the
major participants in these discussions (the United States, China,
FIGURE 7: Left: Harop loitering weapon produced by Israel Aerospace Industries; right: still image from the Slaughterbots video showing a possible design for an autonomous weapon containing a small explosive-driven projectile.
Russia, and to some extent Israel and the UK) are engaged in a danger-
ous competition to develop autonomous weapons. In the United States,
for example, the CODE (Collaborative Operations in Denied Environ-
ments) program aims to move towards autonomy by enabling drones to
function with at best intermittent radio contact. The drones will "hunt
in packs, like wolves" according to the program manager.14 In 2016, the
US Air Force demonstrated the in- flight deployment of 103 Perdix
micro- drones from three F/ A- 18 fighters. According to the announce-
ment, “Perdix are not pre- programmed synchronized individuals, they
are a collective organism, sharing one distributed brain for decision-
making and adapting to each other like swarms in nature."15
You may think it’s pretty obvious that building machines that can
decide to kill humans is a bad idea. But “pretty obvious” is not always
persuasive to governments, including some of those listed in the pre-
ceding paragraph, who are bent on achieving what they think of as
strategic superiority. A more convincing reason to reject autonomous
weapons is that they are scalable weapons of mass destruction.
Scalable is a term from computer science; a process is scalable if
you can do a million times more of it essentially by buying a million
times more hardware. Thus, Google handles roughly five billion
search requests per day by having not millions of employees but mil-
lions of computers. With autonomous weapons, you can do a million
times more killing by buying a million times more weapons, pre-
cisely because the weapons are autonomous. Unlike remotely piloted
drones or AK- 47s, they don’t need individual human supervision to do
their work.
As weapons of mass destruction, scalable autonomous weapons
have advantages for the attacker compared to nuclear weapons and
carpet bombing: they leave property intact and can be applied selec-
tively to eliminate only those who might threaten an occupying force.
They could certainly be used to wipe out an entire ethnic group or all
the adherents of a particular religion (if adherents have visible indicia).
Moreover, whereas the use of nuclear weapons represents a cataclys-
mic threshold that we have (often by sheer luck) avoided crossing since
1945, there is no such threshold with scalable autonomous weapons.
Attacks could escalate smoothly from one hundred casualties to one
thousand to ten thousand to one hundred thousand. In addition to
actual attacks, the mere threat of attacks by such weapons makes them
an effective tool for terror and oppression. Autonomous weapons will
greatly reduce human security at all levels: personal, local, national,
and international.
This is not to say that autonomous weapons will be the end of the
world in the way envisaged in the Terminator movies. They need not
be especially intelligent (a self-driving car probably needs to be
smarter) and their missions will not be of the "take over the world"
variety. The existential risk from AI does not come primarily from
simple- minded killer robots. On the other hand, superintelligent ma-
chines in conflict with humanity could certainly arm themselves this
way, by turning relatively stupid killer robots into physical extensions
of a global control system.
Eliminating Work as We Know It
Thousands of media articles and opinion pieces and several books
have been written on the topic of robots taking jobs from humans.
Research centers are springing up all over the world to understand
what is likely to happen.16 The titles of Martin Ford's Rise of the Robots:
Technology and the Threat of a Jobless Future17 and Calum Chace's The
Economic Singularity: Artificial Intelligence and the Death of Capitalism18
do a pretty good job of summarizing the concern. Although, as
will soon become evident, I am by no means qualified to opine on
what is essentially a matter for economists,19 I suspect that the issue is
too important to leave entirely to them.
The issue of technological unemployment was brought to the fore
in a famous article, “Economic Possibilities for Our Grandchildren,” by
John Maynard Keynes. He wrote the article in 1930, when the Great
Depression had created mass unemployment in Britain, but the topic
has a much longer history. Aristotle, in Book I of his Politics, presents
the main point quite clearly:
For if every instrument could accomplish its own work, obeying
or anticipating the will of others... if, in like manner, the shut-
tle would weave and the plectrum touch the lyre without a hand
to guide them, chief workmen would not want servants, nor mas-
ters slaves.
Everyone agrees with Aristotle’s observation that there is an im-
mediate reduction in employment when an employer finds a mechan-
ical method to perform work previously done by a person. The issue is
whether the so-called compensation effects that ensue, and that tend
to increase employment, will eventually make up for this reduction.
The optimists say yes, and in the current debate, they point to all the
new jobs that emerged after previous industrial revolutions. The pes-
simists say no, and in the current debate, they argue that machines
will do all the “new jobs” too. When a machine replaces one’s physical
labor, one can sell mental labor. When a machine replaces one’s men-
tal labor, what does one have left to sell?
In Life 3.0, Max Tegmark depicts the debate as a conversation
between two horses discussing the rise of the internal combustion
engine in 1900. One predicts “new jobs for horses.... That’s what’s
always happened before, like with the invention of the wheel and the
plow.” For most horses, alas, the “new job” was to be pet food.
The debate has persisted for millennia because there are effects in
both directions. The actual outcome depends on which effects matter
more. Consider, for example, what happens to housepainters as tech-
nology improves. For the sake of simplicity, I’ll let the width of the
paintbrush stand for the degree of automation:
•  If the brush is one hair (a tenth of a millimeter) wide, it takes thousands of person-years to paint a house and essentially no housepainters are employed.
•  With brushes a millimeter wide, perhaps a few delicate murals are painted in the royal palace by a handful of painters. At one centimeter, the nobility begin to follow suit.
•  At ten centimeters (four inches), we reach the realm of practicality: most homeowners have their houses painted inside and out, although perhaps not all that frequently, and thousands of housepainters find jobs.
•  Once we get to wide rollers and spray guns (the equivalent of a paintbrush about a meter wide), the price goes down considerably, but demand may begin to saturate so the number of housepainters drops somewhat.
•  When one person manages a team of one hundred housepainting robots (the productivity equivalent of a paintbrush one hundred meters wide), then whole houses can be painted in an hour and very few housepainters will be working.
Thus, the direct effects of technology work both ways: at first, by
increasing productivity, technology can increase employment by re-
ducing the price of an activity and thereby increasing demand; subse-
quently, further increases in technology mean that fewer and fewer
humans are required. Figure 8 illustrates these developments.20
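For readers who like to tinker, here is a toy numerical model in Python (with made-up numbers of my own, not the book's) that reproduces the hump shape of figure 8: productivity grows with the width of the "brush," the price of a paint job falls accordingly, demand first expands and then saturates, and employment rises and then collapses.

    WAGE = 30_000.0          # assumed annual wage of a housepainter (made up)
    MAX_DEMAND = 100_000.0   # houses that would be painted if painting were nearly free (made up)
    REF_PRICE = 6_000.0      # price at which demand is half saturated (made up)

    def houses_per_painter(brush_width_m):
        return 50.0 * brush_width_m               # productivity scales with brush width

    def demand(price):
        return MAX_DEMAND / (1.0 + (price / REF_PRICE) ** 2)   # saturating demand curve

    def painters_employed(brush_width_m):
        q = houses_per_painter(brush_width_m)
        price = WAGE / q                          # competitive price of one paint job
        return demand(price) / q

    for w in [0.0001, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0]:
        print(f"brush {w:>8} m -> about {painters_employed(w):8.0f} painters")

With these numbers, employment climbs from a couple of dozen painters at hair-width brushes to about ten thousand at ten centimeters, then falls back again at the robot end. Changing the demand curve or the wage moves the peak around, but the qualitative story survives: employment rises with early automation and falls once demand saturates.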
Many technologies exhibit similar curves. If, in some given sector
of the economy, we are to the left of the peak, then improving tech-
nology increases employment in that sector; present- day examples
might include tasks such as graffiti removal, environmental cleanup,
inspection of shipping containers, and housing construction in less de-
veloped countries, all of which might become more economically fea-
sible if we have robots to help us. If we are already to the right of the
peak, then further automation decreases employment. For example,
it’s not hard to predict that elevator operators will continue to be
squeezed out. In the long run, we have to expect that most industries
are going to be pushed to the far right on the curve. One recent article,
based on a careful econometric study by economists David Autor and
Anna Salomons, states that “over the last 40 years, jobs have fallen in
every single industry that introduced technologies to enhance
productivity."21
What about the compensation effects described by the economic
optimists?
•  Some people have to make the painting robots. How many? Far fewer than the number of housepainters the robots replace; otherwise, it would cost more to paint houses with robots, not less, and no one would buy the robots.
FIGURE 8: A notional graph of housepainting employment as painting technology improves (horizontal axis: effective brush width, from 0.1 mm to 100 m; vertical axis: number of housepainters employed).
•  Housepainting becomes somewhat cheaper, so people call in the housepainters a bit more often.
•  Finally, because we pay less for housepainting, we have more money to spend on other things, thereby increasing employment in other sectors.
Economists have tried to measure the size of these effects in vari-
ous industries experiencing increased automation, but the results are
generally inconclusive.
Historically, most mainstream economists have argued from the
"big picture" view: automation increases productivity, so, as a whole,
humans are better off, in the sense that we enjoy more goods and ser-
vices for the same amount of work.
Economic theory does not, unfortunately, predict that each hu-
man will be better off as a result of automation. Generally, automa-
tion increases the share of income going to capital (the owners of
the housepainting robots) and decreases the share going to labor (the
ex- housepainters). The economists Erik Brynjolfsson and Andrew
McAfee, in The Second Machine Age, argue that this has already been
happening for several decades. Data for the United States are shown
in figure 9. They indicate that between 1947 and 1973, wages and
productivity increased together, but after 1973, wages stagnated even
while productivity roughly doubled. Brynjolfsson and McAfee call
this the Great Decoupling. Other leading economists have also sounded
the alarm, including Nobel laureates Robert Shiller, Mike Spence, and
Paul Krugman; Klaus Schwab, head of the World Economic Forum;
and Larry Summers, former chief economist of the World Bank and
Treasury secretary under President Bill Clinton.
Those arguing against the notion of technological unemployment
often point to bank tellers, whose work can be done in part by ATMs,
and retail cashiers, whose work is sped up by barcodes and RFID tags
on merchandise. It is often claimed that these occupations are growing
because of technology. Indeed, the number of tellers in the United
States roughly doubled from 1970 to 2010, although it should be noted
that the US population grew by 50 percent and the financial sector by
over 400 percent in the same period,22 so it is difficult to attribute all,
or perhaps any, of the employment growth to ATMs. Unfortunately,
between 2010 and 2016 about one hundred thousand tellers lost their
jobs, and the US Bureau of Labor Statistics (BLS) predicts another
forty thousand job losses by 2026: “Online banking and automation
technology are expected to continue replacing more job duties that
tellers traditionally performed."23 The data on retail cashiers are no
more encouraging: the number per capita dropped by 5 percent from
1997 to 2015, and the BLS says, “Advances in technology, such as self-
service checkout stands in retail stores and increasing online sales, will
continue to limit the need for cashiers.” Both sectors appear to be on
the downslope. The same is true of almost all low- skilled occupations
that involve working with machines.
Which occupations are about to decline as new, AI- based technol-
FIGURE 9: Economic production and real median wages in the United States since 1945. Data from the Bureau of Labor Statistics. (The chart plots, as an index relative to 1970, major-sector productivity and the real wages of goods-producing workers.)
ogy arrives? The prime example cited in the media is that of driving.
In the United States there are about 3.5 million truck drivers; many of
these jobs would be vulnerable to automation. Amazon, among other
companies, is already using self- driving trucks for freight haulage on
interstate freeways, albeit currently with human backup drivers.24 It
seems very likely that the long- haul part of each truck journey will soon
be autonomous, while humans, for the time being, will handle city
traffic, pickup, and delivery. As a consequence of these expected devel-
opments, very few young people are interested in trucking as a career;
ironically, there is currently a significant shortage of truck drivers in the
United States, which is only hastening the onset of automation.
White- collar jobs are also at risk. For example, the BLS projects a
13 percent decline in per- capita employment of insurance underwrit-
ers from 2016 to 2026: "Automated underwriting software allows
workers to process applications more quickly than before, reducing
the need for as many underwriters.” If language technology develops
as expected, many sales and customer service jobs will also be vulner-
able, as well as jobs in the legal profession. (In a 2018 competition, AI
software outscored experienced law professors in analyzing standard
nondisclosure agreements and completed the task two hundred times
faster.25) Routine forms of computer programming— the kind that is
often outsourced today— are also likely to be automated. Indeed, al-
most anything that can be outsourced is a good candidate for automa-
tion, because outsourcing involves decomposing jobs into tasks that
can be parceled up and distributed in a decontextualized form. The
robot process automation industry produces software tools that achieve
exactly this effect for clerical tasks performed online.
As AI progresses, it is certainly possible, perhaps even likely,
that within the next few decades essentially all routine physical and
mental labor will be done more cheaply by machines. Since we ceased
to be hunter- gatherers thousands of years ago, our societies have used
most people as robots, performing repetitive manual and mental tasks,
so it is perhaps not surprising that robots will soon take on these roles.
When this happens, it will push wages below the poverty line for
those people who are unable to compete for the highly skilled jobs
that remain. Larry Summers put it this way: “It may well be that,
given the possibilities for substitution [of capital for labor], some cat-
egories of labor will not be able to earn a subsistence income."26 This
is precisely what happened to the horses: mechanical transportation
became cheaper than the upkeep cost of a horse, so horses became pet
food. Faced with the socioeconomic equivalent of becoming pet food,
humans will be rather unhappy with their governments.
Faced with potentially unhappy humans, governments around the
world are beginning to devote some attention to the issue. Most have
already discovered that the idea of retraining everyone as a data scien-
tist or robot engineer is a nonstarter: the world might need five or ten
million of these, but nowhere close to the billion or so jobs that are at
risk. Data science is a very tiny lifeboat for a giant cruise ship.27
Some are working on “transition plans”— but transition to what?
We need a plausible destination in order to plan a transition—that is,
we need a plausible picture of a desirable future economy where most
of what we currently call work is done by machines.
One rapidly emerging picture is that of an economy where far
fewer people work because work is unnecessary. Keynes envisaged just
such a future in his essay “Economic Possibilities for Our Grandchil-
dren.” He described the high unemployment afflicting Great Britain
in 1930 as a “temporary phase of maladjustment” caused by an “in-
crease of technical efficiency” that took place “faster than we can deal
with the problem of labour absorption.” He did not, however, imagine
that in the long run, after a century of further technological
advances, there would be a return to full employment:
Thus for the first time since his creation man will be faced with
his real, his permanent problem how to use his freedom from
pressing economic cares, how to occupy the leisure, which science
and compound interest will have won for him, to live wisely and
agreeably and well.
Such a future requires a radical change in our economic system,
because, in many countries, those who do not work face poverty or
destitution. Thus, modern proponents of Keynes’s vision usually sup-
port some form of universal basic income, or UBI. Funded by value-
added taxes or by taxes on income from capital, UBI would provide a
reasonable income to every adult, regardless of circumstance. Those
who aspire to a higher standard of living can still work without losing
the UBI, while those who do not can spend their time as they see fit.
Perhaps surprisingly, UBI has support across the political spectrum,
ranging from the Adam Smith Institute28 to the Green Party.29
For some, UBI represents a version of paradise.30 For others, it rep-
resents an admission of failure: an assertion that most people will
have nothing of economic value to contribute to society. They can be
fed and housed (mostly by machines) but otherwise left to their
own devices. The truth, as always, lies somewhere in between, and it
depends largely on how one views human psychology. Keynes, in his
essay, made a clear distinction between those who strive and those
who enjoy—those “purposive” people for whom “jam is not jam unless
it is a case of jam to-morrow and never jam to-day" and those "delight-
ful people who are "capable of taking direct enjoyment in things."
The UBI proposal assumes that the great majority of people are of the
delightful variety.
Keynes suggests that striving is one of the “habits and instincts of
the ordinary man, bred into him for countless generations” rather than
one of the “real values of life.” He predicts that this instinct will grad-
ually disappear. Against this view, one may suggest that striving is in-
trinsic to what it means to be truly human. Rather than striving and
enjoying being mutually exclusive, they are often inseparable: true
enjoyment and lasting fulfillment come from having a purpose and
achieving it (or at least trying), usually in the face of obstacles, rather
than from passive consumption of immediate pleasure. There is a dif-
ference between climbing Everest and being deposited on top by
helicopter.
The connection between striving and enjoying is a central theme
for our understanding of how to fashion a desirable future. Perhaps
future generations will wonder why we ever worried about such a fu-
tile thing as "work." Just in case that change in attitudes is slow in
coming, let's consider the economic implications of the view that most
people will be better off with something useful to do, even though the
great majority of goods and services will be produced by machines
with very little human supervision. Inevitably, most people will be
engaged in supplying interpersonal services that can be provided, or
which we prefer to be provided, only by humans. That is, if we can no
longer supply routine physical labor and routine mental labor, we can
still supply our humanity. We will need to become good at being
human.31
Current professions of this kind include psychotherapists, execu-
tive coaches, tutors, counselors, companions, and those who care for
children and the elderly. The phrase caring professions is often used in
this context, but that is misleading: it has a positive connotation for
those providing care, to be sure, but a negative connotation of depen-
dency and helplessness for the recipients of care. But consider this
observation, again from Keynes:
It will be those peoples, who can keep alive, and cultivate into
a fuller perfection, the art of life itself and do not sell themselves
for the means of life, who will be able to enjoy the abundance
when it comes.
All of us need help in learning “the art of life itself.” This is not a mat-
ter of dependency but of growth. The capacity to inspire others and
to confer the ability to appreciate and to create, be it in art, music,
literature, conversation, gardening, architecture, food, wine, or video
games, is likely to be more needed than ever.
The next question is income distribution. In most countries, this
has been moving in the wrong direction for several decades. It’s a
complex issue, but one thing is clear: high incomes and high social
standing usually follow from providing high added value. The profes-
sion of childcare, to pick one example, is associated with low incomes
and low social standing. This is, in part, a consequence of the fact that
we don’t really know how to do it. Some practitioners are naturally
good at it, but many are not. Contrast this with, say, orthopedic sur-
gery. We wouldn’t just hire bored teenagers who need a bit of spare
cash and put them to work as orthopedic surgeons at five dollars an
hour plus all they can eat from the fridge. We have put centuries of
research into understanding the human body and how to fix it when
it’s broken, and practitioners must undergo years of training to learn
all this knowledge and the skills necessary to apply it. As a result, or-
thopedic surgeons are highly paid and highly respected. They are
highly paid not just because they know a lot and have a lot of training
but also because all that knowledge and training actually works. It en-
ables them to add a great deal of value to other people’s lives— especially
people with broken bits.
Unfortunately, our scientific understanding of the mind is shock-
ingly weak and our scientific understanding of happiness and fulfill-
ment is even weaker. We simply don’t know how to add value to each
other’s lives in consistent, predictable ways. We have had moderate
success with certain psychiatric disorders, but we are still fighting a
Hundred Years’ Literacy War over something as basic as teaching chil-
dren to read.32 We need a radical rethinking of our educational system
and our scientific enterprise to focus more attention on the human
rather than the physical world. (Joseph Aoun, president of Northeast-
ern University, argues that universities should be teaching and study-
ing “humanics.”33) It sounds odd to say that happiness should be an
engineering discipline, but that seems to be the inevitable conclusion.
Such a discipline would build on basic science (a better understand-
ing of how human minds work at the cognitive and emotional levels)
and would train a wide variety of practitioners, ranging from life
architects, who help individuals plan the overall shape of their life
trajectories, to professional experts in topics such as curiosity en-
hancement and personal resilience. If based on real science, these
professions need be no more woo-woo than bridge designers and or-
thopedic surgeons are today.
Reworking our education and research institutions to create this
basic science and to convert it into training programs and credentialed
professions will take decades, so it’s a good idea to start now and a pity
we didn’t start long ago. The final result, if it works, would be a
world well worth living in. Without such a rethinking, we risk an un-
sustainable level of socioeconomic dislocation.
Usurping Other Human Roles
We should think twice before allowing machines to take over roles
involving interpersonal services. If being human is our main selling
point to other humans, so to speak, then making imitation humans
seems like a bad idea. Fortunately for us, we have a distinct advantage
over machines when it comes to knowing how other humans feel and
how they will react. Nearly every human knows what it’s like to hit
one’s thumb with a hammer or to feel unrequited love.
Counteracting this natural human advantage is a natural human
disadvantage: the tendency to be fooled by appearances, especially
human appearances. Alan Turing warned against making robots re-
semble humans:34
I certainly hope and believe that no great efforts will be put into
making machines with the most distinctively human, but non-
intellectual, characteristics such as the shape of the human body;
it appears to me quite futile to make such attempts and their re-
sults would have something like the unpleasant quality of artificial
flowers.
Unfortunately, Turing’s warning has gone unheeded. Several research
groups have produced eerily lifelike robots, as shown in figure 10.
As research tools, the robots may provide insights into how hu-
mans interpret robot behavior and communication. As prototypes for
future commercial products, they represent a form of dishonesty.
They bypass our conscious awareness and appeal directly to our emo-
tional selves, perhaps convincing us that they are endowed with real
intelligence. Imagine, for example, how much easier it would be to
switch off and recycle a squat, gray box that was malfunctioning—
even if it was squawking about not wanting to be switched off— than
it would be to do the same for JiaJia or Geminoid DK. Imagine also
how confusing and perhaps psychologically disturbing it would be for
babies and small children to be cared for by entities that appear to be
human, like their parents, but are somehow not; that appear to care
about them, like their parents, but in fact do not.
FIGURE OHIW-LD-LDDURERWEXLOWDWWKH8QLYHUVLW\RI6FLHQFHDQG7HFKQRO-
RJ\RI&KLQDULJKW*HPLQRLG'.DURERWGHVLJQHGE\+LURVKL,VKLJXURDW
2VDND8QLYHUVLW\LQ-DSDQDQGPRGHOHGRQ+HQULN6FKlUIHRI$DOERUJ8QLYHU-
sity in Denmark.
Beyond a basic capability to convey nonverbal information via fa-
cial expression and movement (which even Bugs Bunny manages to do
with ease), there is no good reason for robots to have humanoid
form.
There are also good, practical reasons not to have humanoid form: for
example, our bipedal stance is relatively unstable compared to qua-
drupedal locomotion. Dogs, cats, and horses fit into our lives well, and
their physical form is a very good clue as to how they are likely to
behave. (Imagine if a horse suddenly started behaving like a dog!)
The same should be true of robots. Perhaps a four-legged, two-armed,
centaur-like morphology would be a good standard. An accurately hu-
manoid robot makes as much sense as a Ferrari with a top speed of five
miles per hour or a “raspberry” ice-cream cone made from beetroot-
tinted cream of chopped liver.
The humanoid aspect of some robots has already contributed to
political as well as emotional confusion. On October 25, 2017, Saudi
Arabia granted citizenship to Sophia, a humanoid robot that has been
described as little more than “a chatbot with a face”35 and worse.36
Perhaps this was a public relations stunt, but a proposal emanating
from the European Parliament’s Committee on Legal Affairs is en-
tirely serious.37 It recommends
creating a specific legal status for robots in the long run, so that at
least the most sophisticated autonomous robots could be estab-
lished as having the status of electronic persons responsible for
making good any damage they may cause.
In other words, the robot itself would be legally responsible for damage,
rather than the owner or manufacturer. This implies that robots will
own financial assets and be subject to sanctions if they do not comply.
Taken literally, this does not make sense. For example, if we were to
imprison the robot for nonpayment, why would it care?
In addition to the needless and even absurd elevation of the status
of robots, there is a danger that the increased use of machines in
decisions affecting people will degrade the status and dignity of hu-
mans. This possibility is illustrated perfectly in a scene from the
science-fiction movie Elysium, when Max (Matt Damon) pleads his
case before his “parole officer” (figure 11) to explain why the exten-
sion of his sentence is unjustified. Needless to say, Max is unsuccess-
ful. The parole officer even chides him for failing to display a suitably
deferential attitude.
One can think of such an assault on human dignity in two ways.
The first is obvious: by giving machines authority over humans, we
relegate ourselves to a second-class status and lose the right to partic-
ipate in decisions that affect us. (A more extreme form of this is giving
machines the authority to kill humans, as discussed earlier in the
chapter.) The second is indirect: even if you believe it is not the ma-
chines making the decision but
those humans who designed and com-
missioned the machines, the fact that those human designers and
commissioners do not consider it worthwhile to weigh the individual
circumstances of each human subject in such cases suggests that they
attach little value to the lives of others. This is perhaps a symptom of
the beginning of a great separation between an elite served by humans
and a vast underclass served, and controlled, by machines.
In the EU, Article 22 of the 2018 General Data Protection Regu-
lation, or GDPR, explicitly forbids the granting of authority to ma-
chines in such cases:
FIGURE 11: Max (Matt Damon) meets his parole officer in Elysium.
The data subject shall have the right not to be subject to a decision
based solely on automated processing, including profiling, which
produces legal effects concerning him or her or similarly signifi-
cantly affects him or her.
Although this sounds admirable in principle, it remains to be seen, at
least at the time of writing, how much impact this will have in prac-
tice. It is often so much easier, faster, and cheaper to leave the deci-
sions to the machine.
One reason for all the concern about automated decisions is the po-
tential for algorithmic bias: the tendency of machine learning algo-
rithms to produce inappropriately biased decisions about loans, housing,
jobs, insurance, parole, sentencing, college admission, and so on. The
explicit use of criteria such as race in these decisions has been illegal for
decades in many countries and is prohibited by Article 9 of the GDPR
for a very wide range of applications. That does not mean, of course,
that by excluding race from the data we necessarily get racially unbi-
ased decisions. For example, beginning in the 1930s, the government-
sanctioned practice of redlining caused certain zip codes in the United
States to be off-limits for mortgage lending and other forms of invest-
ment, leading to declining real-estate values. It just so happened that
those zip codes were largely populated by African Americans.
To prevent redlining, now only the first three digits of the five-digit
zip code can be used in making credit decisions. In addition, the deci-
sion process must be amenable to inspection, to ensure no other “acci-
dental” biases are creeping in. The EU’s GDPR is often said to provide
a general “right to an explanation” for any automated decision,38 but
the actual language of Article 14 merely requires
meaningful information about the logic involved, as well as the
significance and the envisaged consequences of such processing for
the data subject.
At present, it is unknown how courts will enforce this clause. It’s pos-
sible that the hapless consumer will just be handed a description of the
particular deep learning algorithm used to train the classifier that
made the decision.
Nowadays, the likely causes of algorithmic bias lie in the data
rather than in the deliberate malfeasance of corporations. In 2015,
Glamour magazine reported a disappointing finding: “The first female
Google image search result for ‘CEO’ appears TWELVE rows down
and it’s Barbie.” (There were some actual women in the 2018 results,
but most of them were models portraying CEOs in generic stock pho-
tos, rather than actual female CEOs; the 2019 results are somewhat
better.) This is a consequence not of deliberate gender bias in Google’s
image search ranking but of preexisting bias in the culture that pro-
duces the data: there are far more male than female CEOs, and when
people want to depict an “archetypal” CEO in a captioned image, they
almost always pick a male figure. The fact that the bias lies primarily
in the data does not, of course, mean that there is no obligation to take
steps to counteract the problem.
There are other, more technical reasons why the naïve applica-
tion of machine learning methods can produce biased outcomes.
For example, minorities are, by definition, less well represented in
population-wide data samples; hence, predictions for individual mem-
bers of minorities may be less accurate if such predictions are made
largely on the basis of data from other members of the same group.
Fortunately, a good deal of attention has been paid to the problem of
removing inadvertent bias from machine learning algorithms, and
there are now methods that produce unbiased results according to
several plausible and desirable definitions of fairness.39 The mathe-
matical analysis of these definitions of fairness shows that they cannot
be achieved simultaneously and that, when enforced, they result in
lower prediction accuracy and, in the case of lending decisions, lower
profit for the lender. This is perhaps disappointing, but at least it
makes clear the trade-offs involved in avoiding algorithmic bias. One
hopes that awareness of these methods and of the issue itself will
spread quickly among policy makers, practitioners, and users.
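To make the trade-off concrete, here is a minimal sketch with invented toy data, using two commonly discussed statistical criteria: demographic parity (equal approval rates across groups) and equal opportunity (equal approval rates among the applicants who would in fact repay). Whether these are the precise definitions used in the studies cited above is an assumption on my part; the point is only that the same set of lending decisions can satisfy the first criterion while violating the second.

    def demographic_parity_gap(decisions, groups):
        # Difference in approval rates between groups "A" and "B".
        rate = lambda g: sum(d for d, grp in zip(decisions, groups) if grp == g) / groups.count(g)
        return abs(rate("A") - rate("B"))

    def equal_opportunity_gap(decisions, groups, repaid):
        # Difference in approval rates, counting only applicants who would repay.
        def rate(g):
            outcomes = [d for d, grp, r in zip(decisions, groups, repaid) if grp == g and r]
            return sum(outcomes) / len(outcomes)
        return abs(rate("A") - rate("B"))

    # Invented toy data: 1 = approved (or repaid), 0 = denied (or defaulted).
    groups    = ["A", "A", "A", "A", "B", "B", "B", "B"]
    repaid    = [1, 1, 1, 0, 1, 0, 0, 0]
    decisions = [1, 1, 0, 0, 1, 1, 0, 0]

    print(demographic_parity_gap(decisions, groups))         # 0.0: approval rates are equal
    print(equal_opportunity_gap(decisions, groups, repaid))  # about 0.33: repayers treated unequally

Closing the second gap here would require changing some decisions, which in turn changes accuracy and, for a lender, expected profit; that is the trade-off described above.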
If handing authority over individual humans to machines is some-
times problematic, what about authority over lots of humans? That is,
should we put machines in political and management roles? At present
this may seem far-fetched. Machines cannot sustain an extended con-
versation and lack the basic understanding of the factors that are rele-
vant to making decisions with broad scope, such as whether to raise
the minimum wage or to reject a merger proposal from another cor-
poration. The trend, however, is clear: machines are making decisions
at higher and higher levels of authority in many areas. Take airlines,
for example. First, computers helped in the construction of flight
schedules. Soon, they took over allocation of flight crews, the booking
of seats, and the management of routine maintenance. Next, they
were connected to global information networks to provide real-time
status reports to airline managers, so that managers could cope with
disruption effectively. Now they are taking over the job of managing
disruption: rerouting planes, rescheduling staff, rebooking passengers,
and revising maintenance schedules.
This is all to the good from the point of view of airline economics
and passenger experience. The question is whether the computer sys-
tem remains a tool of humans, or humans become tools of the com-
puter system— supplying information and fixing bugs when necessary,
but no longer understanding in any depth how the whole thing is
working. The answer becomes clear when the system goes down and
global chaos ensues until it can be brought back online. For example,
a single “computer glitch” on April 3, 2018, caused fifteen thousand
flights in Europe to be significantly delayed or canceled.40 When trad-
ing algorithms caused the 2010 “flash crash” on the New York Stock
Exchange, wiping out $1 trillion in a few minutes, the only solution
was to shut down the exchange. What happened is still not well
understood.
Before there was any technology, human beings lived, like most
animals, hand to mouth. We stood directly on the ground, so to speak.
Technology gradually raised us up on a pyramid of machinery, increas-
ing our footprint as individuals and as a species. There are different
ways we can design the relationship between humans and machines.
If we design it so that humans retain sufficient understanding, author-
ity, and autonomy, the technological parts of the system can greatly
magnify human capabilities, allowing each of us to stand on a vast
pyramid of capabilities— a demigod, if you like. But consider the
worker in an online- shopping fulfillment warehouse. She is more pro-
ductive than her predecessors because she has a small army of robots
bringing her storage bins to pick items from; but she is a part of a
larger system controlled by intelligent algorithms that decide where
she should stand and which items she should pick and dispatch. She is
already partly buried in the pyramid, not standing on top of it. It’s
only a matter of time before the sand fills the spaces in the pyramid
and her role is eliminated.
5
OVERLY INTELLIGENT AI
The Gorilla Problem
It doesn’t require much imagination to see that making something
smarter than yourself could be a bad idea. We understand that our
control over our environment and over other species is a result of our
intelligence, so the thought of something else being more intelligent
than us, whether it’s a robot or an alien, immediately induces a
queasy feeling.
Around ten million years ago, the ancestors of the modern gorilla
created (accidentally, to be sure) the genetic lineage leading to modern
humans. How do the gorillas feel about this? Clearly, if they were able
to tell us about their species’ current situation vis-à-vis humans, the
consensus opinion would be very negative indeed. Their species has
essentially no future beyond that which we deign to allow. We do not
want to be in a similar situation vis-à-vis superintelligent machines.
I’ll call this the gorilla problem: specifically, the problem of whether
humans can maintain their supremacy and autonomy in a world that
includes machines with substantially greater intelligence.
Charles Babbage and Ada Lovelace, who designed and wrote pro-
grams for the Analytical Engine in 1842, were aware of its potential
but seemed to have no qualms about it.1 In 1847, however, Richard
Thornton, editor of the Primitive Expounder, a religious journal, railed
against mechanical calculators:2
Mind... outruns itself and does away with the necessity of its
own existence by inventing machines to do its own thinking....
But who knows that such machines when brought to greater per-
fection, may not think of a plan to remedy all their own defects
and then grind out ideas beyond the ken of mortal mind!
This is perhaps the first speculation concerning existential risk from
computing devices, but it remained in obscurity.
In contrast, Samuel Butler’s novel Erewhon, published in 1872, de-
veloped the theme in far greater depth and achieved immediate suc-
cess. Erewhon is a country in which all mechanical devices have been
banned after a terrible civil war between the machinists and anti-
machinists. One part of the book, called “The Book of the Machines,”
explains the origins of this war and presents the arguments of both
sides.3 It is eerily prescient of the debate that has re-emerged in the
early years of the twenty-first century.
The anti-machinists’ main argument is that machines will advance
to the point where humanity loses control:
Are we not ourselves creating our successors in the supremacy of
the earth? Daily adding to the beauty and delicacy of their organi-
zation, daily giving them greater skill and supplying more and
more of that self-regulating self-acting power which will be better
than any intellect?... In the course of ages we shall find ourselves
the inferior race....
We must choose between the alternative of undergoing much
present suffering, or seeing ourselves gradually superseded by our
own creatures, till we rank no higher in comparison with them,
than the beasts of the field with ourselves.... Our bondage will
steal upon us noiselessly and by imperceptible approaches.
The narrator also relates the pro-machinists’ principal counter-
argument, which anticipates the man-machine symbiosis argument
that we will explore in the next chapter:
There was only one serious attempt to answer it. Its author said
that machines were to be regarded as a part of man’s own physical
nature, being really nothing but extra- corporeal limbs.
Although the anti-machinists in Erewhon win the argument, Butler
himself appears to be of two minds. On the one hand, he complains
that “Erewhonians are... quick to offer up common sense at the
shrine of logic, when a philosopher arises among them, who carries
them away through his reputation for especial learning” and says,
“They cut their throats in the matter of machinery.” On the other
hand, the Erewhonian society he describes is remarkably harmonious,
productive, and even idyllic. The Erewhonians fully accept the folly
of re-embarking on the course of mechanical invention, and regard
those remnants of machinery kept in museums “with the feelings of
an English antiquarian concerning Druidical monuments or flint ar-
row heads.”
Butler’s story was evidently known to Alan Turing, who consid-
ered the long-term future of AI in a lecture given in Manchester
in 1951:4
It seems probable that once the machine thinking method had
started, it would not take long to outstrip our feeble powers. There
would be no question of the machines dying, and they would
be able to converse with each other to sharpen their wits. At some
stage therefore we should have to expect the machines to take
control, in the way that is mentioned in Samuel Butler’s Erewhon.
In the same year, Turing repeated these concerns in a radio lecture
broadcast throughout the UK on the BBC Third Programme:
If a machine can think, it might think more intelligently than we
do, and then where should we be? Even if we could keep the ma-
chines in a subservient position, for instance by turning off the
power at strategic moments, we should, as a species, feel greatly
humbled.... This new danger... is certainly something which
can give us anxiety.
When the Erewhonian anti-machinists “feel seriously uneasy about
the future,” they see it as their “duty to check the evil while we can
still do so,” and they destroy all the machines. Turing’s response to the
“new danger” and “anxiety” is to consider “turning off the power” (al-
though it will be clear shortly that this is not really an option). In
Frank Herbert’s classic science-fiction novel Dune, set in the far fu-
ture, humanity has barely survived the Butlerian Jihad, a cataclysmic
war with the “thinking machines.” A new commandment has emerged:
Thou shalt not make a machine in the likeness of a human mind.” This
commandment precludes computing devices of any kind.
All these drastic responses reflect the inchoate fears that machine
intelligence evokes. Yes, the prospect of superintelligent machines does
make one uneasy. Yes, it is logically possible that such machines could
take over the world and subjugate or eliminate the human race. If that
is all one has to go on, then indeed the only plausible response available
to us, at the present time, is to attempt to curtail artificial intelligence
research: specifically, to ban the development and deployment of
general-purpose, human-level AI systems.
Like most other AI researchers, I recoil at this prospect. How dare
anyone tell me what I can and cannot think about? Anyone proposing
an end to AI research is going to have to do a lot of convincing. Ending
AI research would mean forgoing not just one of the principal avenues
for understanding how human intelligence works but also a golden
opportunity to improve the human condition: to make a far better
civilization. The economic value of human-level AI is measurable in
the thousands of trillions of dollars, so the momentum behind AI re-
search from corporations and governments is likely to be enormous. It
will overwhelm the vague objections of a philosopher, no matter how
great his or her “reputation for especial learning,” as Butler puts it.
A second drawback to the idea of banning general-purpose AI is that
it’s a difficult thing to ban. Progress on general-purpose AI occurs pri-
marily on the whiteboards of research labs around the world, as mathe-
matical problems are posed and solved. We don’t know in advance
which ideas and equations to ban, and, even if we did, it doesn’t seem
reasonable to expect that such a ban could be enforceable or effective.
To compound the difficulty still further, researchers making prog-
ress on general-purpose AI are often working on something else. As
I have already argued, research on tool AI (those specific, innocu-
ous applications such as game playing, medical diagnosis, and travel
planning) often leads to progress on general-purpose techniques that
are applicable to a wide range of other problems and move us closer to
human- level AI.
For these reasons, it’s very unlikely that the AI community or
the governments and corporations that control the laws and research
budgets will respond to the gorilla problem by ending progress in
AI. If the gorilla problem can be solved only in this way, it isn’t going
to be solved.
The only approach that seems likely to work is to understand why
it is that making better AI might be a bad thing. It turns out that we
have known the answer for thousands of years.
7KH.LQJ0LGDV3UREOHP
Norbert Wiener, whom we met in Chapter 1, had a profound impact
on many fields, including artificial intelligence, cognitive science, and
control theory. Unlike most of his contemporaries, he was particularly
concerned with the unpredictability of complex systems operating in
the real world. (He wrote his first paper on this topic at the age of
ten.) He became convinced that the overconfidence of scientists and
engineers in their ability to control their creations, whether military
or civilian, could have disastrous consequences.
In 1950, Wiener published The Human Use of Human Beings,5
whose front-cover blurb reads, “The ‘mechanical brain’ and similar
machines can destroy human values or enable us to realize them as
never before.”6
He gradually refined his ideas over time and by 1960
had identified one core issue: the impossibility of defining true human
purposes correctly and completely. This, in turn, means that what I
have called the standard model, whereby humans attempt to imbue
machines with their own purposes, is destined to fail.
We might call this the King Midas problem: Midas, a legendary
king in ancient Greek mythology, got exactly what he asked for:
namely, that everything he touched should turn to gold. Too late, he
discovered that this included his food, his drink, and his family mem-
bers, and he died in misery and starvation. The same theme is ubiqui-
tous in human mythology. Wiener cites Goethe’s tale of the sorcerer’s
apprentice, who instructs the broom to fetch water but doesn’t say
how much water and doesn’t know how to make the broom stop.
A technical way of saying this is that we may suffer from a failure
of value alignment: we may, perhaps inadvertently, imbue machines
with objectives that are imperfectly aligned with our own. Until re-
cently, we were shielded from the potentially catastrophic conse-
quences by the limited capabilities of intelligent machines and the
limited scope that they have to affect the world. (Indeed, most AI
work was done with toy problems in research labs.) As Norbert Wie-
ner put it in his 1964 book God and Golem,7
In the past, a partial and inadequate view of human purpose has
been relatively innocuous only because it has been accompanied by
technical limitations.... This is only one of the many places where
human impotence has shielded us from the full destructive impact
of human folly.
Unfortunately, this period of shielding is rapidly coming to an end.
We have already seen how content-selection algorithms on social
media wrought havoc on society in the name of maximizing ad reve-
nues. In case you are thinking to yourself that ad revenue maximiza-
tion was already an ignoble goal that should never have been pursued,
let’s suppose instead that we ask some future superintelligent system
to pursue the noble goal of finding a cure for cancer, ideally as quickly
as possible, because someone dies from cancer every 3.5 seconds.
Within hours, the AI system has read the entire biomedical literature
and hypothesized millions of potentially effective but previously un-
tested chemical compounds. Within weeks, it has induced multiple
tumors of different kinds in every living human being so as to carry
out medical trials of these compounds, this being the fastest way to
find a cure. Oops.
If you prefer solving environmental problems, you might ask the
machine to counter the rapid acidification of the oceans that results
from higher carbon dioxide levels. The machine develops a new cata-
lyst that facilitates an incredibly rapid chemical reaction between
ocean and atmosphere and restores the oceans’ pH levels. Unfortu-
nately, a quarter of the oxygen in the atmosphere is used up in the
process, leaving us to asphyxiate slowly and painfully. Oops.
These kinds of world-ending scenarios are unsubtle, as one might
expect, perhaps, for world-ending scenarios. But there are many sce-
narios in which a kind of mental asphyxiation “steals upon us noiselessly
and by imperceptible approaches.” The prologue to Max Tegmark’s
Life 3.0 describes in some detail a scenario in which a superintelligent
machine gradually assumes economic and political control over the
entire world while remaining essentially undetected. The Internet and
the global-scale machines that it supports (the ones that already
interact with billions of “users” on a daily basis) provide the perfect
medium for the growth of machine control over humans.
I don’t expect that the purpose put into such machines will be of
the “take over the world” variety. It is more likely to be profit maximi-
zation or engagement maximization or, perhaps, even an apparently
benign goal such as achieving higher scores on regular user happiness
surveys or reducing our energy usage. Now, if we think of ourselves as
entities whose actions are expected to achieve our objectives, there
are two ways to change our behavior. The first is the old- fashioned
way: leave our expectations and objectives unchanged, but change our
circumstances—for example, by offering money, pointing a gun at us,
or starving us into submission. That tends to be expensive and diffi-
cult for a computer to do. The second way is to change our expecta-
tions and objectives. This is much easier for a machine. It is in contact
with you for hours every day, controls your access to information, and
provides much of your entertainment through games, TV, movies, and
social interaction.
The reinforcement learning algorithms that optimize social-media
click-through have no capacity to reason about human behavior; in
fact, they do not even know in any meaningful sense that humans
exist. For machines with much greater understanding of human psy-
chology, beliefs, and motivations, it should be relatively easy to gradu-
ally guide us in directions that increase the degree of satisfaction of
the machine’s objectives. For example, it might reduce our energy
consumption by persuading us to have fewer children, eventually—
and inadvertently— achieving the dreams of anti- natalist philosophers
who wish to eliminate the noxious impact of humanity on the natu-
ral world.
With a bit of practice, you can learn to identify ways in which the
achievement of more or less any fixed objective can result in arbi-
trarily bad outcomes. One of the most common patterns involves
omitting something from the objective that you do actually care
about. In such cases, as in the examples given above, the AI system
will often find an optimal solution that sets the thing you do care
about, but forgot to mention, to an extreme value. So, if you say to
your self-driving car, “Take me to the airport as fast as possible!” and
it interprets this literally, it will reach speeds of 180 miles per hour
and you’ll go to prison. (Fortunately, the self-driving cars currently
contemplated won’t accept such a request.) If you say, “Take me to the
airport as fast as possible while not exceeding the speed limit,” it will
accelerate and brake as hard as possible, swerving in and out of traffic
to maintain the maximum speed in between. It may even push other
cars out of the way to gain a few seconds in the scrum at the airport
terminal. And so on eventually, you will add enough considerations
so that the car’s driving roughly approximates that of a skilled human
driver taking someone to the airport in a bit of a hurry.
Driving is a simple task with only local impacts, and the AI sys-
tems currently being built for driving are not very intelligent. For
these reasons, many of the potential failure modes can be anticipated;
others will reveal themselves in driving simulators or in millions of
miles of testing with professional drivers ready to take over if some-
thing goes wrong; still others will appear only later, when the cars are
already on the road and something weird happens.
Unfortunately, with superintelligent systems that can have a global
impact, there are no simulators and no do- overs. It’s certainly very
hard, and perhaps impossible, for mere humans to anticipate and rule
out in advance all the disastrous ways the machine could choose to
achieve a specified objective. Generally speaking, if you have one goal
and a superintelligent machine has a different, conflicting goal, the
machine gets what it wants and you don’t.
)HDUDQG*UHHG,QVWUXPHQWDO*RDOV
If a machine pursuing an incorrect objective sounds bad enough,
there’s worse. The solution suggested by Alan Turing, turning off the
power at strategic moments, may not be available, for a very simple
reason: you can’t fetch the coffee if you’re dead.
Let me explain. Suppose a machine has the objective of fetching
the coffee. If it is sufficiently intelligent, it will certainly understand
that it will fail in its objective if it is switched off before completing its
mission. Thus, the objective of fetching coffee creates, as a necessary
subgoal, the objective of disabling the off- switch. The same is true for
curing cancer or calculating the digits of pi. There’s really not a lot you
can do once you’re dead, so we can expect AI systems to act preemp-
tively to preserve their own existence, given more or less any definite
objective.
If that objective is in conflict with human preferences, then we
have exactly the plot of 2001: A Space Odyssey, in which the HAL
9000 computer kills four of the five astronauts on board the ship to
prevent interference with its mission. Dave, the last remaining astro-
naut, manages to switch HAL off after an epic battle of wits,
presumably to keep the plot interesting. But if HAL had been truly
superintelligent, Dave would have been switched off.
It is important to understand that self- preservation doesn’t have to
be any sort of built- in instinct or prime directive in machines. (So
Isaac Asimov’s Third Law of Robotics,8 which begins “A robot must
protect its own existence,” is completely unnecessary.) There is no
need to build self-preservation in because it is an instrumental goal: a
goal that is a useful subgoal of almost any original objective.9 Any
entity that has a definite objective will automatically act as if it also
has instrumental goals.
In addition to being alive, having access to money is an instrumen-
tal goal within our current system. Thus, an intelligent machine might
want money, not because it’s greedy but because money is useful for
achieving all sorts of goals. In the movie Transcendence, when Johnny
Depp’s brain is uploaded into the quantum supercomputer, the first
thing the machine does is copy itself onto millions of other computers
on the Internet so that it cannot be switched off. The second thing it
does is make a quick killing on the stock market to fund its expan-
sion plans.
And what, exactly, are those expansion plans? They include de-
signing and building a much larger quantum supercomputer; doing AI
research; and discovering new knowledge of physics, neuroscience, and
biology. These resource objectives (computing power, algorithms, and
knowledge) are also instrumental goals, useful for achieving any over-
arching objective.10 They seem harmless enough until one realizes
that the acquisition process will continue without limit. This seems to
create inevitable conflict with humans. And of course, the machine,
equipped with ever- better models of human decision making, will
anticipate and defeat our every move in this conflict.
Intelligence Explosions
I. J. Good was a brilliant mathematician who worked with Alan Turing
at Bletchley Park, breaking German codes during World War II. He
shared Turing’s interests in machine intelligence and statistical infer-
ence. In 1965, he wrote what is now his best-known paper, “Specula-
tions Concerning the First Ultraintelligent Machine.”11 The first sentence
suggests that Good, alarmed by the nuclear brinkmanship of the Cold
War, regarded AI as a possible savior for humanity: “The survival of man
depends on the early construction of an ultraintelligent machine.” As
the paper proceeds, however, he becomes more circumspect. He intro-
duces the notion of an intelligence explosion, but, like Butler, Turing, and
Wiener before him, he worries about losing control:
Let an ultraintelligent machine be defined as a machine that can
far surpass all the intellectual activities of any man however clever.
Since the design of machines is one of these intellectual activities,
an ultraintelligent machine could design even better machines;
there would then unquestionably be an “intelligence explosion,
and the intelligence of man would be left far behind. Thus the first
ultraintelligent machine is the last invention that man need ever
make, provided that the machine is docile enough to tell us how to
keep it under control. It is curious that this point is made so sel-
dom outside science fiction.
This paragraph is a staple of any discussion of superintelligent AI,
although the caveats at the end are usually left out. Good’s point can
be strengthened by noting that not only could the ultraintelligent ma-
chine improve its own design; it’s likely that it would do so because, as
we have seen, an intelligent machine expects to benefit from improv-
ing its hardware and software. The possibility of an intelligence explo-
sion is often cited as the main source of risk to humanity from AI
because it would give us so little time to solve the control problem.12
Good’s argument certainly has plausibility via the natural analogy
to a chemical explosion in which each molecular reaction releases
enough energy to initiate more than one additional reaction. On the
other hand, it is logically possible that there are diminishing returns to
intelligence improvements, so that the process peters out rather than
exploding.13 There’s no obvious way to prove that an explosion will
necessarily occur.
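One way to see why the question stays open is a toy formalization (my own framing, not an argument made by Good): write the capability after n rounds of self-redesign as the starting capability multiplied by the gain from each round.

\[
I_n = I_0 \prod_{k=1}^{n} r_k,
\qquad
\begin{cases}
r_k \ge r > 1 \;\Rightarrow\; I_n \ge I_0\, r^{\,n} \to \infty & \text{(the chain-reaction case)}\\[4pt]
r_k = 1 + c\,\alpha^{k},\ 0 < \alpha < 1 \;\Rightarrow\; I_n \to I_0 \prod_{k \ge 1}\bigl(1 + c\,\alpha^{k}\bigr) < \infty & \text{(gains shrink; the process peters out)}
\end{cases}
\]

Everything hinges on how the per-round gains behave as the machine gets smarter, and that is precisely what nobody can yet measure.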
The diminishing-returns scenario is interesting in its own right. It
could arise if it turns out that achieving a given percentage improve-
ment becomes much harder as the machine becomes more intelligent.
(I’m assuming for the sake of argument that general-purpose machine
intelligence is measurable on some kind of linear scale, which I doubt
will ever be strictly true.) In that case, humans won’t be able to cre-
ate superintelligence either. If a machine that is already superhuman
runs out of steam when trying to improve its own intelligence, then
humans will run out of steam even sooner.
Now, I’ve never heard a serious argument to the effect that creat-
ing any given level of machine intelligence is simply beyond the capac-
ity of human ingenuity, but I suppose one must concede it’s logically
possible. “Logically possible” and “I’m willing to bet the future of the
human race on it” are, of course, two completely different things. Bet-
ting against human ingenuity seems like a losing strategy.
If an intelligence explosion does occur, and if we have not already
solved the problem of controlling machines with only slightly super-
human intelligence (for example, if we cannot prevent them from
making these recursive self-improvements), then we would have no
time left to solve the control problem and the game would be over.
This is Bostrom’s hard takeoff scenario, in which the machine’s intelli-
gence increases astronomically in just days or weeks. In Turing’s words,
it is “certainly something which can give us anxiety.”
The possible responses to this anxiety seem to be to retreat from
AI research, to deny that there are risks inherent in developing ad-
vanced AI, to understand and mitigate the risks through the design of
AI systems that necessarily remain under human control, and to
resign— simply to cede the future to intelligent machines.
Denial and mitigation are the subjects of the remainder of the book.
As I have already argued, retreat from AI research is both unlikely to
happen (because the benefits forgone are too great) and very difficult
to bring about. Resignation seems to be the worst possible response. It
is often accompanied by the idea that AI systems that are more intelli-
gent than us somehow deserve to inherit the planet, leaving humans to
go gentle into that good night, comforted by the thought that our bril-
liant electronic progeny are busy pursuing their objectives. This view
was promulgated by the roboticist and futurist Hans Moravec,14 who
writes, “The immensities of cyberspace will be teeming with unhuman
superminds, engaged in affairs that are to human concerns as ours are
to those of bacteria.” This seems to be a mistake. Value, for humans, is
defined primarily by conscious human experience. If there are no hu-
mans and no other conscious entities whose subjective experience mat-
ters to us, there is nothing of value occurring.
6
THE NOT-SO-GREAT
AI DEBATE
“The implications of introducing a second intelligent species onto
Earth are far-reaching enough to deserve hard thinking.”1 So
ended The Economist magazine’s review of Nick Bostrom’s Super-
intelligence. Most would interpret this as a classic example of British
understatement. Surely, you might think, the great minds of today are
already doing this hard thinking: engaging in serious debate, weigh-
ing up the risks and benefits, seeking solutions, ferreting out loopholes
in solutions, and so on. Not yet, as far as I am aware.
When one first introduces these ideas to a technical audience, one
can see the thought bubbles popping out of their heads, beginning
with the words “But, but, but...” and ending with exclamation marks.
The first kind of but takes the form of denial. The deniers say,
“But this can’t be a real problem, because XYZ.” Some of the XYZs
reflect a reasoning process that might charitably be described as wish-
ful thinking, while others are more substantial. The second kind of
but takes the form of deflection: accepting that the problems are
real but arguing that we shouldn’t try to solve them, either because
they’re unsolvable or because there are more important things to fo-
cus on than the end of civilization or because it’s best not to mention
them at all. The third kind of but takes the form of an oversimpli-
fied, instant solution: “But can’t we just do ABC?” As with denial,
some of the ABCs are instantly regrettable. Others, perhaps by acci-
dent, come closer to identifying the true nature of the problem.
I don’t mean to suggest that there cannot be any reasonable objec-
tions to the view that poorly designed superintelligent machines would
present a serious risk to humanity. It’s just that I have yet to see such
an objection. Since the issue seems to be so important, it deserves a
public debate of the highest quality. So, in the interests of having that
debate, and in the hope that the reader will contribute to it, let me
provide a quick tour of the highlights so far, such as they are.
Denial
Denying that the problem exists at all is the easiest way out. Scott
Alexander, author of the Slate Star Codex
blog, began a well-known
article on AI risk as follows:2 “I first became interested in AI risk back
around 2007. At the time, most people’s response to the topic was
‘Haha, come back when anyone believes this besides random Internet
crackpots.’ ”
Instantly regrettable remarks
A perceived threat to one’s lifelong vocation can lead a perfectly
intelligent and usually thoughtful person to say things they might
wish to retract on further analysis. That being the case, I will not
name the authors of the following arguments, all of whom are well-
known AI researchers. I’ve included refutations of the arguments, even
though they are quite unnecessary.
Electronic calculators are superhuman at arithmetic. Calcula-
tors didn’t take over the world; therefore, there is no reason to
worry about superhuman AI.
Refutation: intelligence is not the same as arithmetic, and the
arithmetic ability of calculators does not equip them to take
over the world.
Horses have superhuman strength, and we don’t worry about
proving that horses are safe; so we needn’t worry about proving
that AI systems are safe.
Refutation: intelligence is not the same as physical strength, and
the strength of horses does not equip them to take over the world.
Historically, there are zero examples of machines killing mil-
lions of humans, so, by induction, it cannot happen in the
future.
Refutation: there’s a first time for everything, before which there
were zero examples of it happening.
No physical quantity in the universe can be infinite, and that
includes intelligence, so concerns about superintelligence are
overblown.
Refutation: superintelligence doesn’t need to be infinite to be
problematic; and physics allows computing devices billions of
times more powerful than the human brain.
We don’t worry about species-ending but highly unlikely pos-
sibilities such as black holes materializing in near-Earth orbit,
so why worry about superintelligent AI?
Refutation: if most physicists on Earth were working to make
such black holes, wouldn’t we ask them if it was safe?
It’s complicated
It is a staple of modern psychology that a single IQ number cannot
characterize the full richness of human intelligence.3 There are, the
theory says, different dimensions of intelligence: spatial, logical, lin-
guistic, social, and so on. Alice, our soccer player from Chapter 2,
might have more spatial intelligence than her friend Bob, but less so-
cial intelligence. Thus, we cannot line up all humans in strict order of
intelligence.
This is even more true of machines, because their abilities are
much narrower. The Google search engine and AlphaGo have almost
nothing in common, besides being products of two subsidiaries of the
same parent corporation, and so it makes no sense to say that one is
more intelligent than the other. This makes notions of “machine IQ”
problematic and suggests that it’s misleading to describe the future as
a one-dimensional IQ race between humans and machines.
Kevin Kelly, founding editor of Wired magazine and a remarkably
perceptive technology commentator, takes this argument one step
further. In “The Myth of a Superhuman AI,”4 he writes, “Intelligence
is not a single dimension, so ‘smarter than humans’ is a meaningless
concept.” In a single stroke, all concerns about superintelligence are
wiped away.
Now, one obvious response is that a machine could exceed human
capabilities in all relevant dimensions of intelligence. In that case,
even by Kelly’s strict standards, the machine would be smarter than a
human. But this rather strong assumption is not necessary to refute
Kelly’s argument. Consider the chimpanzee. Chimpanzees probably
have better short-term memory than humans, even on human-
oriented tasks such as recalling sequences of digits.5 Short-term mem-
ory is an important dimension of intelligence. By Kelly’s argument,
then, humans are not smarter than chimpanzees; indeed, he would
claim that “smarter than a chimpanzee” is a meaningless concept. This
is cold comfort to the chimpanzees (and bonobos, gorillas, orangutans,
whales, dolphins, and so on) whose species survive only because we
deign to allow it. It is colder comfort still to all those species that we
have already wiped out. It’s also cold comfort to humans who might
be worried about being wiped out by machines.
It’s impossible
Even before the birth of AI in 1956, august intellectuals were har-
rumphing and saying that intelligent machines were impossible. Alan
Turing devoted much of his seminal 1950 paper, “Computing Machin-
ery and Intelligence,” to refuting these arguments. Ever since, the AI
community has been fending off similar claims of impossibility from
philosophers,6 mathematicians,7 and others. In the current debate over
superintelligence, several philosophers have exhumed these impossi-
bility claims to prove that humanity has nothing to fear.8,9 This comes
as no surprise.
The One Hundred Year Study on Artificial Intelligence, or AI100,
is an ambitious, long-term project housed at Stanford University. Its
goal is to keep track of AI, or, more precisely, to “study and anticipate
how the effects of artificial intelligence will ripple through every as-
pect of how people work, live and play.” Its first major report, “Artifi-
cial Intelligence and Life in 2030,” does come as a surprise.10 As might
be expected, it emphasizes the benefits of AI in areas such as medical
diagnosis and automotive safety. What’s unexpected is the claim that
unlike in the movies, there is no race of superhuman robots on the
horizon or probably even possible.
To my knowledge, this is the first time that serious AI researchers
have publicly espoused the view that human-level or superhuman AI
is impossible— and this in the middle of a period of extremely rapid
progress in AI research, when barrier after barrier is being breached.
It’s as if a group of leading cancer biologists announced that they had
been fooling us all along: they’ve always known that there will never
be a cure for cancer.
What could have motivated such a volte-face? The report provides
no arguments or evidence whatever. (Indeed, what evidence could
there be that no physically possible arrangement of atoms outperforms
the human brain?) I suspect there are two reasons. The first is the
natural desire to disprove the existence of the gorilla problem, which
presents a very uncomfortable prospect for the AI researcher; cer-
tainly, if human-level AI is impossible, the gorilla problem is neatly
dispatched. The second reason is tribalism— the instinct to circle the
wagons against what are perceived to be “attacks” on AI.
It seems odd to perceive the claim that superintelligent AI is pos-
sible as an attack on AI, and even odder to defend AI by saying that AI
will never succeed in its goals. We cannot insure against future ca-
tastrophe simply by betting against human ingenuity.
We have made such bets before and lost. As we saw earlier, the
physics establishment of the early 1930s, personified by Lord Ruther-
ford, confidently believed that extracting atomic energy was impossi-
ble; yet Leo Szilard’s invention of the neutron-induced nuclear chain
reaction in 1933 proved that confidence to be misplaced.
Szilard’s breakthrough came at an unfortunate time: the begin-
ning of an arms race with Nazi Germany. There was no possibility of
developing nuclear technology for the greater good. A few years later,
having demonstrated a nuclear chain reaction in his laboratory, Szilard
wrote, “We switched everything off and went home. That night, there
was very little doubt in my mind that the world was headed for grief.”
It’s too soon to worry about it
It’s common to see sober-minded people seeking to assuage public
concerns by pointing out that because human-level AI is not likely to
arrive for several decades, there is nothing to worry about. For exam-
ple, the AI100 report says there is “no cause for concern that AI is an
imminent threat to humankind.”
This argument fails on two counts. The first is that it attacks a
straw man. The reasons for concern are not predicated on imminence.
For example, Nick Bostrom writes in Superintelligence, “It is no part of
the argument in this book that we are on the threshold of a big break-
through in artificial intelligence, or that we can predict with any pre-
cision when such a development might occur.” The second is that a
long-term risk can still be cause for immediate concern. The right
time to worry about a potentially serious problem for humanity de-
pends not just on when the problem will occur but also on how long
it will take to prepare and implement a solution.
For example, if we were to detect a large asteroid on course to
collide with Earth in 2069, would we say it’s too soon to worry? Quite
the opposite! There would be a worldwide emergency project to de-
velop the means to counter the threat. We wouldn’t wait until 2068
to start working on a solution, because we can’t say in advance how
much time is needed. Indeed, NASA’s Planetary Defense project is
already working on possible solutions, even though “no known aster-
oid poses a significant risk of impact with Earth over the next 100
years.” In case that makes you feel complacent, they also say, “About
74 percent of near-Earth objects larger than 460 feet still remain to be
discovered.”
And if we consider the global catastrophic risks from climate
change, which are predicted to occur later in this century, is it too
soon to take action to prevent them? On the contrary, it may be too
late. The relevant time scale for superhuman AI is less predictable, but
of course that means it, like nuclear fission, might arrive considerably
sooner than expected.
One formulation of the “it’s too soon to worry” argument that has
gained currency is Andrew Ng’s assertion that “it’s like worrying about
overpopulation on Mars.”11 (He later upgraded this from Mars to Al-
pha Centauri.) Ng, a former Stanford professor, is a leading expert on
machine learning, and his views carry some weight. The assertion ap-
peals to a convenient analogy: not only is the risk easily managed and
far in the future but also it’s extremely unlikely we’d even try to
move billions of humans to Mars in the first place. The analogy is a
false one, however. We are already devoting huge scientific and tech-
nical resources to creating ever-more-capable AI systems, with very
little thought devoted to what happens if we succeed. A more apt
analogy, then, would be working on a plan to move the human race to
Mars with no consideration for what we might breathe, drink, or eat
once we arrive. Some might call this plan unwise. Alternatively, one
could take Ng’s point literally, and respond that landing even a single
person on Mars would constitute overpopulation, because Mars has a
carrying capacity of zero. Thus, groups that are currently planning to
send a handful of humans to Mars are worrying about overpopulation
on Mars, which is why they are developing life-support systems.
We’re the experts
In every discussion of technological risk, the pro-technology camp
wheels out the claim that all concerns about risk arise from ignorance.
For example, here’s Oren Etzioni, CEO of the Allen Institute for AI
and a noted researcher in machine learning and natural language
understanding:12
At the rise of every technology innovation, people have been scared.
From the weavers throwing their shoes in the mechanical looms at
the beginning of the industrial era to today’s fear of killer robots,
our response has been driven by not knowing what impact the new
technology will have on our sense of self and our livelihoods. And
when we don’t know, our fearful minds fill in the details.
Popular Science published an article titled “Bill Gates Fears AI, but AI
Researchers Know Better”:13
When you talk to A.I. researchers— again, genuine A.I. research-
ers, people who grapple with making systems that work at all,
much less work too well— they are not worried about superintelli-
gence sneaking up on them, now or in the future. Contrary to the
spooky stories that Musk seems intent on telling, A.I. researchers
aren’t frantically installing firewalled summoning chambers and
self- destruct countdowns.
This analysis was based on a sample of four, all of whom in fact said in
their interviews that the long-term safety of AI was an important issue.
Using very similar language to the Popular Science article, David
Kenny, at that time a vice president at IBM, wrote a letter to the US
Congress that included the following reassuring words:14
When you actually do the science of machine intelligence, and when
you actually apply it in the real world of business and society— as
we have done at IBM to create our pioneering cognitive computing
system, Watson— you understand that this technology does not
support the fear-mongering commonly associated with the AI de-
bate today.
The message is the same in all three cases: “Don’t listen to them; we’re
the experts.” Now, one can point out that this is really an ad hominem
argument that attempts to refute the message by delegitimizing the
messengers, but even if one takes it at face value, the argument doesn’t
hold water. Elon Musk, Stephen Hawking, and Bill Gates are certainly
very familiar with scientific and technological reasoning, and Musk
and Gates in particular have supervised and invested in many AI re-
search projects. And it would be even less plausible to argue that Alan
Turing, I. J. Good, Norbert Wiener, and Marvin Minsky are unquali-
fied to discuss AI. Finally, Scott Alexander’s blog piece mentioned
earlier, which is titled “AI Researchers on AI Risk,” notes that “AI re-
searchers, including some of the leaders in the field, have been instru-
mental in raising issues about AI risk and superintelligence from the
very beginning.” He lists several such researchers, and the list is now
much longer.
Another standard rhetorical move for the “defenders of AI” is to
describe their opponents as Luddites. Oren Etzioni’s reference to
“weavers throwing their shoes in the mechanical looms” is just this: the
Luddites were artisan weavers in the early nineteenth century protest-
ing the introduction of machinery to replace their skilled labor. In 2015,
the Information Technology and Innovation Foundation gave its annual
Luddite Award to “alarmists touting an artificial intelligence apoca-
lypse.” It’s an odd definition of “Luddite” that includes Turing, Wiener,
Minsky, Musk, and Gates, who rank among the most prominent con-
tributors to technological progress in the twentieth and twenty-first
centuries.
The accusation of Luddism represents a misunderstanding of the
nature of the concerns raised and the purpose for raising them. It is as
if one were to accuse nuclear engineers of Luddism if they point out
the need for control of the fission reaction. As with the strange phe-
nomenon of AI researchers suddenly claiming that AI is impossible, I
think we can attribute this puzzling episode to tribalism in defense of
technological progress.
Deflection
Some commentators are willing to accept that the risks are real, but
still present arguments for doing nothing. These arguments include
the impossibility of doing anything, the importance of doing some-
thing else entirely, and the need to keep quiet about the risks.
You can’t control research
A common answer to suggestions that advanced AI might present
risks to humanity is to claim that banning AI research is impossible.
Note the mental leap here: “Hmm, someone is discussing risks! They
must be proposing a ban on my research!!” This mental leap might be
appropriate in a discussion of risks based only on the gorilla problem,
and I would tend to agree that solving the gorilla problem by prevent-
ing the creation of superintelligent AI would require some kind of
constraints on AI research.
Recent discussions of risks have, however, focused not on the gen-
eral gorilla problem (journalistically speaking, the humans vs. super-
intelligence smackdown) but on the King Midas problem and variants
thereof. Solving the King Midas problem also solves the gorilla
problem— not by preventing superintelligent AI or finding a way to
defeat it but by ensuring that it is never in conflict with humans in
the first place. Discussions of the King Midas problem generally avoid
proposing that AI research be curtailed; they merely suggest that at-
tention be paid to the issue of preventing negative consequences of
poorly designed systems. In the same vein, a discussion of the risks of
containment failure in nuclear plants should be interpreted not as an
attempt to ban nuclear physics research but as a suggestion to focus
more effort on solving the containment problem.
There is, as it happens, a very interesting historical precedent for
cutting off research. In the early 1970s, biologists began to be con-
cerned that novel recombinant DNA methods— splicing genes from
one organism into another— might create substantial risks for human
health and the global ecosystem. Two meetings at Asilomar in Califor-
nia in 1973 and 1975 led first to a moratorium on such experiments
and then to detailed biosafety guidelines consonant with the risks
posed by any proposed experiment.15 Some classes of experiments,
such as those involving toxin genes, were deemed too hazardous to be
allowed.
Immediately after the 1975 meeting, the National Institutes of
Health (NIH), which funds virtually all basic medical research in the
United States, began the process of setting up the Recombinant DNA
Advisory Committee. The RAC, as it is known, was instrumental in
developing the NIH guidelines that essentially implemented the Asi-
lomar recommendations. Since 2000, those guidelines have included
a ban on funding approval for any protocol involving human germline
alteration— the modification of the human genome in ways that can be
inherited by subsequent generations. This ban was followed by legal
prohibitions in over fifty countries.
The goal of “improving the human stock” had been one of the
dreams of the eugenics movement in the late nineteenth and early
twentieth centuries. The development of CRISPR-Cas9, a very pre-
cise method for genome editing, has reignited this dream. An interna-
tional summit held in 2015 left the door open for future applications,
calling for restraint until “there is broad societal consensus about the
appropriateness of the proposed application.”16 In November 2018,
In November 2018,
the Chinese scientist He Jiankui announced that he had edited the
genomes of three human embryos, at least two of which had led to
live births. An international outcry followed, and at the time of
writing, Jiankui appears to be under house arrest. In March 2019, an
international panel of leading scientists called explicitly for a formal
moratorium.17
The lesson of this debate for AI is mixed. On the one hand, it
shows that we can refrain from proceeding with an area of research
that has huge potential. The international consensus against germline
alteration has been almost completely successful up to now. The fear
that a ban would simply drive the research underground, or into coun-
tries with no regulation, has not materialized. On the other hand,
germline alteration is an easily identifiable process, a specific use case
of more general knowledge about genetics that requires specialized
equipment and real humans to experiment on. Moreover, it falls within
an area— reproductive medicine— that is already subject to close over-
sight and regulation. These characteristics do not apply to general-
purpose AI, and, as yet, no one has come up with any plausible form
that a regulation to curtail AI research might take.
Whataboutery
I was introduced to the term whataboutery by an adviser to a Brit-
ish politician who had to deal with it on a regular basis at public meet-
ings. No matter the topic of the speech he was giving, someone would
invariably ask, “What about the plight of the Palestinians?”
In response to any mention of risks from advanced AI, one is likely
to hear, “What about the benefits of AI?” For example, here is Oren
Etzioni:18
Doom-and-gloom predictions often fail to consider the potential
benefits of AI in preventing medical errors, reducing car accidents,
and more.
And here is Mark Zuckerberg, CEO of Facebook, in a recent media-
fueled exchange with Elon Musk:19
If you’re arguing against AI, then youre arguing against safer cars
that aren’t going to have accidents and youre arguing against being
able to better diagnose people when they’re sick.
Leaving aside the tribal notion that anyone mentioning risks is “against
AI,” both Zuckerberg and Etzioni are arguing that to talk about risks
is to ignore the potential benefits of AI or even to negate them.
This is precisely backwards, for two reasons. First, if there were no
potential benefits of AI, there would be no economic or social impe-
tus for AI research and hence no danger of ever achieving human-
level AI. We simply wouldnt be having this discussion at all. Second,
if the risks are not successfully mitigated, there will be no benefits. The
potential benefits of nuclear power have been greatly reduced because
of the partial core meltdown at Three Mile Island in 1979, the uncon-
trolled reaction and catastrophic releases at Chernobyl in 1986, and
the multiple meltdowns at Fukushima in 2011. Those disasters se-
verely curtailed the growth of the nuclear industry. Italy abandoned
nuclear power in 1990 and Belgium, Germany, Spain, and Switzer-
land have announced plans to do so. Since 1990, the worldwide rate of
commissioning of nuclear plants has been about a tenth of what it was
before Chernobyl.
Silence
The most extreme form of deflection is simply to suggest that we
should keep silent about the risks. For example, the aforementioned
AI100 report includes the following admonition:
If society approaches these technologies primarily with fear and
suspicion, missteps that slow AI’s development or drive it under-
ground will result, impeding important work on ensuring the
safety and reliability of AI technologies.
Robert Atkinson, director of the Information Technology and In-
novation Foundation (the very same foundation that gives out the
Luddite Award), made a similar argument in a 2015 debate.20 While
there are valid questions about precisely how risks should be described
when talking to the media, the overall message is clear: “Dont men-
tion the risks; it would be bad for funding.” Of course, if no one were
aware of the risks, there would be no funding for research on risk
mitigation and no reason for anyone to work on it.
The renowned cognitive scientist Steven Pinker gives a more opti-
mistic version of Atkinson’s argument. In his view, the “culture of
safety in advanced societies” will ensure that all serious risks from AI
will be eliminated; therefore, it is inappropriate and counterproduc-
tive to call attention to those risks.21 Even if we disregard the fact that
our advanced culture of safety has led to Chernobyl, Fukushima, and
runaway global warming, Pinker’s argument entirely misses the point.
The culture of safety consists precisely of people pointing to possible
failure modes and finding ways to ensure they don’t happen. (And
with AI, the standard model is the failure mode.) Saying that it’s ridic-
ulous to point to a failure mode because the culture of safety will fix
it anyway is like saying no one should call an ambulance when they see
a hit-and-run accident because someone will call an ambulance.
In attempting to portray the risks to the public and to policy mak-
ers, AI researchers are at a disadvantage compared to nuclear physi-
cists. The physicists did not need to write books explaining to the
public that assembling a critical mass of highly enriched uranium
might present a risk, because the consequences had already been
demonstrated at Hiroshima and Nagasaki. It did not require a great
deal of further persuasion to convince governments and funding agen-
cies that safety was important in developing nuclear energy.
Tribalism
In Butler’s Erewhon, focusing on the gorilla problem leads to a prema-
ture and false dichotomy between pro-machinists and anti-machinists.
The pro-machinists believe the risk of machine domination to be min-
imal or nonexistent; the anti-machinists believe it to be insuperable
unless all machines are destroyed. The debate becomes tribal, and no
one tries to solve the underlying problem of retaining human control
over the machines.
To varying degrees, all the major technological issues of the
twentieth century— nuclear power, genetically modified organisms
(GMOs), and fossil fuels— succumbed to tribalism. On each issue,
there are two sides, pro and anti. The dynamics and outcomes of
each have been different, but the symptoms of tribalism are similar:
mutual distrust and denigration, irrational arguments, and a refusal to
concede any (reasonable) point that might favor the other tribe. On
the pro-technology side, one sees denial and concealment of risks
combined with accusations of Luddism; on the anti side, one sees a
conviction that the risks are insuperable and the problems unsolvable.
A member of the pro-technology tribe who is too honest about a prob-
lem is viewed as a traitor, which is particularly unfortunate as the
pro-technology tribe usually includes most of the people qualified to
solve the problem. A member of the anti-technology tribe who dis-
cusses possible mitigations is also a traitor, because it is the technology
itself that has come to be viewed as evil, rather than its possible ef-
fects. In this way, only the most extreme voices— those least likely to
be listened to by the other side— can speak for each tribe.
In 2016, I was invited to No. 10 Downing Street to meet with
some of then prime minister David Cameron’s advisers. They were
worried that the AI debate was starting to resemble the GMO
debate— which, in Europe, had led to what the advisers considered to
be premature and overly restrictive regulations on GMO production
and labeling. They wanted to avoid the same thing happening to AI.
Their concerns had some validity: the AI debate is in danger of be-
coming tribal, of creating pro-AI and anti-AI camps. This would be
damaging to the field because it’s simply not true that being concerned
about the risks inherent in advanced AI is an anti-AI stance. A physi-
cist who is concerned about the risks of nuclear war or the risk of a
poorly designed nuclear reactor exploding is not “anti-physics.” To say
that AI will be powerful enough to have a global impact is a compli-
ment to the field rather than an insult.
It is essential that the AI community own the risks and work to
mitigate them. The risks, to the extent that we understand them, are
neither minimal nor insuperable. We need to do a substantial amount
of work to avoid them, including reshaping and rebuilding the founda-
tions of AI.
Can’t We Just . . .
. . . switch it off?
Once they understand the basic idea of existential risk, whether in
the form of the gorilla problem or the King Midas problem, many
people— myself included— immediately begin casting around for an
easy solution. Often, the first thing that comes to mind is switching
off the machine. For example, Alan Turing himself, as quoted earlier,
speculates that we might “keep the machines in a subservient posi-
tion, for instance by turning off the power at strategic moments.”
This won’t work, for the simple reason that a superintelligent
entity will already have thought of that possibility and taken steps to
prevent it. And it will do that not because it wants to stay alive but
because it is pursuing whatever objective we gave it and knows that it
will fail if it is switched off.
There are some systems being contemplated that really cannot
be switched off without ripping out a lot of the plumbing of our
civilization. These are systems implemented as so-called smart contracts
in the blockchain. The blockchain is a highly distributed form of com-
puting and record keeping based on encryption; it is specifically de-
signed so that no datum can be deleted and no smart contract can be
interrupted without essentially taking control of a very large number of
machines and undoing the chain, which might in turn destroy a large
part of the Internet and/or the financial system. It is debatable whether
this incredible robustness is a feature or a bug. It’s certainly a tool that a
superintelligent AI system could use to protect itself.
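To see why such records are so hard to erase, here is a minimal sketch in Python (my own illustration, not any real blockchain protocol; the block structure and names are invented for the example). Each block stores a cryptographic hash of its predecessor, so deleting or altering one record changes its hash and breaks every block that follows it.

import hashlib
from dataclasses import dataclass

@dataclass
class Block:
    data: str         # the record or "smart contract" stored in this block
    prev_hash: str    # hash of the previous block, linking the chain

    def hash(self) -> str:
        return hashlib.sha256((self.data + self.prev_hash).encode()).hexdigest()

def build_chain(records):
    chain, prev = [], "genesis"
    for r in records:
        block = Block(r, prev)
        chain.append(block)
        prev = block.hash()
    return chain

def verify(chain) -> bool:
    prev = "genesis"
    for block in chain:
        if block.prev_hash != prev:
            return False          # the chain no longer links up
        prev = block.hash()
    return True

chain = build_chain(["contract A", "contract B", "contract C"])
print(verify(chain))                       # True
chain[0].data = "contract A (deleted)"     # tamper with one record
print(verify(chain))                       # False: every later block is invalidated

In a real distributed ledger, repairing the break would mean recomputing and re-agreeing on every subsequent block across a very large number of machines, which is the sense in which the text calls these systems nearly impossible to switch off.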
. . . put it in a box?
If you cant switch AI systems off, can you seal the machines inside
a kind of firewall, extracting useful question-answering work from
them but never allowing them to affect the real world directly? This is
the idea behind Oracle AI, which has been discussed at length in the
AI safety community.22 An Oracle AI system can be arbitrarily intel-
ligent, but can answer only yes or no (or give corresponding probabil-
ities) to each question. It can access all the information the human
race possesses through a read-only connection—that is, it has no di-
rect access to the Internet. Of course, this means giving up on super-
intelligent robots, assistants, and many other kinds of AI systems, but
a trustworthy Oracle AI would still have enormous economic value
because we could ask it questions whose answers are important to
us, such as whether Alzheimer’s disease is caused by an infectious
organism or whether it’s a good idea to ban autonomous weapons.
Thus, the Oracle AI is certainly an interesting possibility.
Unfortunately, there are some serious difficulties. First, the Oracle
AI system will be at least as assiduous in understanding the physics
and origins of its world— the computing resources, their mode of op-
eration, and the mysterious entities that produced its information
store and are now asking questions— as we are in understanding ours.
Second, if the objective of the Oracle AI system is to provide accurate
answers to questions in a reasonable amount of time, it will have an
incentive to break out of its cage to acquire more computational re-
sources and to control the questioners so that they ask only simple
questions. And, finally, we have yet to invent a firewall that is secure
against ordinary humans, let alone superintelligent machines.
I think there might be solutions to some of these problems, partic-
ularly if we limit Oracle AI systems to be provably sound logical or
Bayesian calculators. That is, we could insist that the algorithm can
output only a conclusion that is warranted by the information pro-
vided, and we could check mathematically that the algorithm satisfies
this condition. This still leaves the problem of controlling the process
that decides which logical or Bayesian computations to do, in order to
reach the strongest possible conclusion as quickly as possible. Because
this process has an incentive to reason quickly, it has an incentive to
acquire computational resources— and of course to preserve its own
existence.
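As a concrete illustration of the “provably sound calculator” idea, here is a toy sketch in Python (my own example, confined to a tiny propositional setting; the function names and the coffee-free facts are invented). The oracle answers yes or no only when the answer is logically entailed by the facts it has been given, and otherwise reports that the question is not settled by the information provided.

from itertools import product

def entails(kb, query, symbols):
    """True iff every truth assignment satisfying all KB sentences also satisfies query."""
    for values in product([True, False], repeat=len(symbols)):
        model = dict(zip(symbols, values))
        if all(sentence(model) for sentence in kb) and not query(model):
            return False
    return True

def oracle(kb, query, symbols):
    if entails(kb, query, symbols):
        return "yes"
    if entails(kb, lambda m: not query(m), symbols):
        return "no"
    return "unknown"   # the information provided does not warrant an answer

# Facts: "if A then B" and "A"; questions about B and about not-A.
symbols = ["A", "B"]
kb = [lambda m: (not m["A"]) or m["B"],    # A implies B
      lambda m: m["A"]]                    # A
print(oracle(kb, lambda m: m["B"], symbols))                   # yes
print(oracle(kb, lambda m: not m["A"], symbols))               # no
print(oracle([lambda m: m["A"]], lambda m: m["B"], symbols))   # unknown

Because every output can be checked against the entailment test, one can verify mathematically that the oracle never asserts more than its information supports; the hard, unsolved part is extending this kind of guarantee to systems vastly more capable than a truth-table enumerator.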
In 2018, the Center for Human-Compatible AI at Berkeley ran a
workshop at which we asked the question, “What would you do if you
knew for certain that superintelligent AI would be achieved within a
decade?” My answer was as follows: persuade the developers to hold
off on building a general-purpose intelligent agent— one that can
choose its own actions in the real world— and build an Oracle AI in-
stead. Meanwhile, we would work on solving the problem of making
Oracle AI systems provably safe to the extent possible. The reason
this strategy might work is twofold: first, a superintelligent Oracle AI
system would still be worth trillions of dollars, so the developers
might be willing to accept this restriction; and second, controlling
Oracle AI systems is almost certainly easier than controlling a general-
purpose intelligent agent, so we’d have a better chance of solving the
problem within the decade.
. . . work in human-machine teams?
A common refrain in the corporate world is that AI is no threat to
employment or to humanity because we’ll just have collaborative
human-AI teams. For example, David Kenny’s letter to Congress,
quoted earlier in this chapter, stated that “high-value artificial intelli-
gence systems are specifically designed to augment human intelli-
gence, not replace workers.”23
While a cynic might suggest that this is merely a public relations
ploy to sugarcoat the process of eliminating human employees from
the corporations’ clients, I think it does move the ball forward a few
inches. Collaborative human-AI teams are indeed a desirable goal.
Clearly, a team will be unsuccessful if the objectives of the team mem-
bers are not aligned, so the emphasis on human-AI teams highlights
the need to solve the core problem of value alignment. Of course,
highlighting the problem is not the same as solving it.
. . . merge with the machines?
Human-machine teaming, taken to its extreme, becomes a human-
machine merger in which electronic hardware is attached directly to
the brain and forms part of a single, extended, conscious entity. The
futurist Ray Kurzweil describes the possibility as follows:24
We are going to directly merge with it, we are going to become
the AIs.... As you get to the late 2030s or 2040s, our thinking
will be predominately non-biological and the non-biological part
will ultimately be so intelligent and have such vast capacity it’ll
be able to model, simulate and understand fully the biological
part.
Kurzweil views these developments in a positive light. Elon Musk, on
the other hand, views the human-machine merger primarily as a de-
fensive strategy:25
If we achieve tight symbiosis, the AI wouldn’t be “other”— it would
be you and [it would have] a relationship to your cortex analogous
to the relationship your cortex has with your limbic system....
We’re going to have the choice of either being left behind and be-
ing effectively useless or like a pet— you know, like a house cat or
something— or eventually figuring out some way to be symbiotic
and merge with AI.
Musk’s Neuralink Corporation is working on a device dubbed
neural lace” after a technology described in Iain Banks’s Culture nov-
els. The aim is to create a robust, permanent connection between the
human cortex and external computing systems and networks. There
are two main technical obstacles: first, the difficulties of connecting
an electronic device to brain tissue, supplying it with power, and con-
necting it to the outside world; and second, the fact that we under-
stand almost nothing about the neural implementation of higher levels
of cognition in the brain, so we dont know where to connect the de-
vice and what processing it should do.
I am not completely convinced that the obstacles in the preceding
paragraph are insuperable. First, technologies such as neural dust are
rapidly reducing the size and power requirements of electronic de-
vices that can be attached to neurons and provide sensing, stimula-
tion, and transcranial communication.26 (The technology as of 2018
had reached a size of about one cubic millimeter, so neural grit might
be a more accurate term.) Second, the brain itself has remarkable
powers of adaptation. It used to be thought, for example, that we
would have to understand the code that the brain uses to control the
arm muscles before we could connect a brain to a robot arm success-
fully, and that we would have to understand the way the cochlea ana-
lyzes sound before we could build a replacement for it. It turns out,
instead, that the brain does most of the work for us. It quickly learns
how to make the robot arm do what its owner wants, and how to map
the output of a cochlear implant to intelligible sounds. Its entirely
possible that we may hit upon ways to provide the brain with addi-
tional memory, with communication channels to computers, and per-
haps even with communication channels to other brains— all without
ever really understanding how any of it works.27
Regardless of the technological feasibility of these ideas, one has to
ask whether this direction represents the best possible future for hu-
manity. If humans need brain surgery merely to survive the threat
posed by their own technology, perhaps weve made a mistake some-
where along the line.
. . . avoid putting in human goals?
A common line of reasoning has it that problematic AI behaviors
arise from putting in specific kinds of objectives; if these are left out,
everything will be fine. Thus, for example, Yann LeCun, a pioneer of
deep learning and director of AI research at Facebook, often cites this
idea when downplaying the risk from AI:28
There is no reason for AIs to have self-preservation instincts, jeal-
ousy, etc.... AIs will not have these destructive “emotions” unless
we build these emotions into them. I don’t see why we would want
to do that.
In a similar vein, Steven Pinker provides a gender-based analysis:29
AI dystopias project a parochial alpha-male psychology onto the
concept of intelligence. They assume that superhumanly intelli-
gent robots would develop goals like deposing their masters or
taking over the world.... It’s telling that many of our techno-
prophets don’t entertain the possibility that artificial intelligence
will naturally develop along female lines: fully capable of solving
problems, but with no desire to annihilate innocents or dominate
the civilization.
As we have already seen in the discussion of instrumental goals, it
doesn’t matter whether we build in “emotions” or “desires” such as self-
preservation, resource acquisition, knowledge discovery, or, in the ex-
treme case, taking over the world. The machine is going to have those
emotions anyway, as subgoals of any objective we do build in— and
regardless of its gender. For a machine, death isn’t bad per se. Death
is to be avoided, nonetheless, because it’s hard to fetch the coffee if
you’re dead.
An even more extreme solution is to avoid putting objectives into
the machine altogether. Voilà, problem solved. Alas, it’s not as simple
as that. Without objectives, there is no intelligence: any action is as
good as any other, and the machine may as well be a random number
generator. Without objectives, there is also no reason for the machine
to prefer a human paradise to a planet turned into a sea of paperclips
(a scenario described at length by Nick Bostrom). Indeed, the latter
outcome may be utopian for the iron-eating bacterium Thiobacillus
ferrooxidans. Absent some notion that human preferences matter, who
is to say the bacterium is wrong?
A common variant on the “avoid putting in objectives” idea is the
notion that a sufficiently intelligent system will necessarily, as a con-
sequence of its intelligence, develop the “right” goals on its own. Of-
ten, proponents of this notion appeal to the theory that people of
greater intelligence tend to have more altruistic and lofty objectives—
a view that may be related to the self-conception of the proponents.
The idea that it is possible to perceive objectives in the world was
discussed at length by the famous eighteenth- century philosopher
David Hume in A Treatise of Human Nature.30 He called it the is-ought
problem and concluded that it was simply a mistake to think that moral
imperatives could be deduced from natural facts. To see why, consider,
for example, the design of a chessboard and chess pieces. One cannot
perceive in these the goal of checkmate, for the same chessboard and
pieces can be used for suicide chess or indeed many other games still
to be invented.
Nick Bostrom, in Superintelligence, presents the same underlying
idea in a different form, which he calls the orthogonality thesis:
Intelligence and final goals are orthogonal: more or less any level of
intelligence could in principle be combined with more or less any
final goal.
Here, orthogonal means “at right angles” in the sense that the de-
gree of intelligence is one axis defining an intelligent system and its
goals are another axis, and we can vary these independently. For ex-
ample, a self-driving car can be given any particular address as its
destination; making the car a better driver doesn’t mean that it will
start refusing to go to addresses that are divisible by seventeen. By the
same token, it is easy to imagine that a general-purpose intelligent
system could be given more or less any objective to pursue— including
maximizing the number of paperclips or the number of known dig-
its of pi. This is just how reinforcement learning systems and other
kinds of reward optimizers work: the algorithms are completely gen-
eral and accept any reward signal. For engineers and computer scien-
tists operating within the standard model, the orthogonality thesis is
just a given.
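To make the point concrete, here is a minimal sketch of tabular Q-learning in Python (my own toy example; the one-dimensional environment, the two reward functions, and the parameter values are all invented for illustration). Notice that the update rule never inspects what the reward measures: swapping one reward function for another changes what the algorithm learns to do, but not the algorithm itself.

import random
from collections import defaultdict

def q_learning(env_step, actions, reward_fn, episodes=1000,
               alpha=0.1, gamma=0.9, epsilon=0.1):
    Q = defaultdict(float)                      # Q[(state, action)] values
    for _ in range(episodes):
        state = 0                               # toy initial state
        for _ in range(20):                     # short episodes
            if random.random() < epsilon:       # occasional exploration
                action = random.choice(actions)
            else:                               # otherwise act greedily
                action = max(actions, key=lambda a: Q[(state, a)])
            next_state = env_step(state, action)
            r = reward_fn(state, action, next_state)   # ANY reward signal plugs in here
            best_next = max(Q[(next_state, a)] for a in actions)
            Q[(state, action)] += alpha * (r + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q

# The same learner, handed two very different objectives:
env_step = lambda s, a: max(0, min(10, s + a))           # a 1-D toy world
reach_ten_reward = lambda s, a, s2: 1.0 if s2 == 10 else 0.0
stay_home_reward = lambda s, a, s2: 1.0 if s2 == 0 else 0.0
Q1 = q_learning(env_step, [-1, +1], reach_ten_reward)
Q2 = q_learning(env_step, [-1, +1], stay_home_reward)

The learner that was handed reach_ten_reward learns to march to one end of the toy world and the other learns to stay put, yet not a single line of the algorithm changed between the two runs. That is the orthogonality thesis in miniature.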
The idea that intelligent systems could simply observe the world to
acquire the goals that should be pursued suggests that a sufficiently
intelligent system will naturally abandon its initial objective in favor
of the “right” objective. It’s hard to see why a rational agent would do
this. Furthermore, it presupposes that there is a “right” objective out
there in the world; it would have to be an objective on which iron-
eating bacteria and humans and all other species agree, which is hard
to imagine.
The most explicit critique of Bostroms orthogonality thesis comes
from the noted roboticist Rodney Brooks, who asserts that it’s impossi-
ble for a program to be “smart enough that it would be able to invent
ways to subvert human society to achieve goals set for it by humans,
without understanding the ways in which it was causing problems for
those same humans.”31 Unfortunately, it’s not only possible for a pro-
gram to behave like this; it is, in fact, inevitable, given the way Brooks
defines the issue. Brooks posits that the optimal plan to “achieve goals
set for it by humans” is causing problems for humans. It follows that
those problems reflect things of value to humans that were omitted
from the goals set for it by humans. The optimal plan being carried out
by the machine may well cause problems for humans, and the machine
may well be aware of this. But, by definition, the machine will not rec-
ognize those problems as problematic. They are none of its concern.
Steven Pinker seems to agree with Bostroms orthogonality thesis,
writing that “intelligence is the ability to deploy novel means to attain
a goal; the goals are extraneous to the intelligence itself.”32 On the
other hand, he finds it inconceivable that “the AI would be so brilliant
that it could figure out how to transmute elements and rewire brains,
yet so imbecilic that it would wreak havoc based on elementary blun-
ders of misunderstanding.”33 He continues, “The ability to choose an
action that best satisfies conflicting goals is not an add-on that engi-
neers might forget to install and test; it is intelligence. So is the ability
to interpret the intentions of a language user in context.” Of course,
“satisf[ying] conflicting goals” is not the problem— that’s something
that’s been built into the standard model from the early days of deci-
sion theory. The problem is that the conflicting goals of which the
machine is aware do not constitute the entirety of human concerns;
moreover, within the standard model, there’s nothing to say that the
machine has to care about goals it’s not told to care about.
There are, however, some useful clues in what Brooks and Pinker
say. It does seem stupid to us for the machine to, say, change the color
of the sky as a side effect of pursuing some other goal, while ignoring
the obvious signs of human displeasure that result. It seems stupid to
us because we are attuned to noticing human displeasure and (usu-
ally) we are motivated to avoid causing it even if we were previously
unaware that the humans in question cared about the color of the sky.
That is, we humans (1) care about the preferences of other humans
and (2) know that we don’t know what all those preferences are. In the
next chapter, I argue that these characteristics, when built into a ma-
chine, may provide the beginnings of a solution to the King Midas
problem.
The Debate Restarted
This chapter has provided a glimpse into an ongoing debate in the
broad intellectual community, a debate between those pointing to the
risks of AI and those who are skeptical about the risks. It has been
conducted in books, blogs, academic papers, panel discussions, inter-
views, tweets, and newspaper articles. Despite their valiant efforts,
the “skeptics”— those who argue that the risk from AI is negligible—
have failed to explain why superintelligent AI systems will necessarily
remain under human control; and they have not even tried to explain
why superintelligent AI systems will never be developed.
Many skeptics will admit, if pressed, that there is a real problem,
even if it’s not imminent. Scott Alexander, in his Slate Star Codex
blog, summed it up brilliantly:34
The “skeptic” position seems to be that, although we should prob-
ably get a couple of bright people to start working on preliminary
aspects of the problem, we shouldn’t panic or start trying to ban
AI research.
The “believers,” meanwhile, insist that although we shouldn’t
panic or start trying to ban AI research, we should probably get a
couple of bright people to start working on preliminary aspects of
the problem.
Although I would be happy if the skeptics came up with an irre-
futable objection, perhaps in the form of a simple and foolproof (and
evil-proof) solution to the control problem for AI, I think it’s quite
likely that this isn’t going to happen, any more than we’re going to find
a simple and foolproof solution for cybersecurity or a simple and fool-
proof way to generate nuclear energy with zero risk. Rather than con-
tinue the descent into tribal name-calling and repeated exhumation of
discredited arguments, it seems better, as Alexander puts it, to start
working on some preliminary aspects of the problem.
The debate has highlighted the conundrum we face: if we build
machines to optimize objectives, the objectives we put into the ma-
chines have to match what we want, but we don’t know how to define
human objectives completely and correctly. Fortunately, there is a
middle way.
7
AI: A DIFFERENT APPROACH
Once the skeptic’s arguments have been refuted and all the
but but buts have been answered, the next question is usu-
ally, “OK, I admit there’s a problem, but there’s no solution,
is there?” Yes, there is a solution.
Let’s remind ourselves of the task at hand: to design machines with
a high degree of intelligence— so that they can help us with difficult
problems while ensuring that those machines never behave in ways
that make us seriously unhappy.
The task is, fortunately, not the following: given a machine that
possesses a high degree of intelligence, work out how to control it. If
that were the task, we would be toast. A machine viewed as a black
box, a fait accompli, might as well have arrived from outer space. And
our chances of controlling a superintelligent entity from outer space
are roughly zero. Similar arguments apply to methods of creating AI
systems that guarantee we won’t understand how they work; these
methods include whole-brain emulation1— creating souped-up elec-
tronic copies of human brains— as well as methods based on simulated
evolution of programs.2 I won’t say more about these proposals be-
cause they are so obviously a bad idea.
So, how has the field of AI approached the “design machines with
a high degree of intelligence” part of the task in the past? Like many
other fields, AI has adopted the standard model: we build optimizing
machines, we feed objectives into them, and off they go. That worked
well when the machines were stupid and had a limited scope of action;
if you put in the wrong objective, you had a good chance of being
able to switch off the machine, fix the problem, and try again.
As machines designed according to the standard model become
more intelligent, however, and as their scope of action becomes more
global, the approach becomes untenable. Such machines will pur-
sue their objective, no matter how wrong it is; they will resist attempts
to switch them off; and they will acquire any and all resources that
contribute to achieving the objective. Indeed, the optimal behavior for
the machine might include deceiving the humans into thinking they
gave the machine a reasonable objective, in order to gain enough time
to achieve the actual objective given to it. This wouldn’t be “deviant”
or “malicious” behavior requiring consciousness and free will; it would
just be part of an optimal plan to achieve the objective.
In Chapter 1, I introduced the idea of beneficial machines— that is,
machines whose actions can be expected to achieve our objectives
rather than their objectives. My goal in this chapter is to explain in
simple terms how this can be done, despite the apparent drawback
that the machines don’t know what our objectives are. The resulting
approach should lead eventually to machines that present no threat to
us, no matter how intelligent they are.
Principles for Beneficial Machines
I find it helpful to summarize the approach in the form of three3 prin-
ciples. When reading these principles, keep in mind that they are in-
tended primarily as a guide to AI researchers and developers in
thinking about how to create beneficial AI systems; they are not
intended as explicit laws for AI systems to follow:4
1. The machine’s only objective is to maximize the realization of
human preferences.
2. The machine is initially uncertain about what those prefer-
ences are.
3. The ultimate source of information about human preferences is
human behavior.
Before delving into more detailed explanations, it’s important to
remember the broad scope of what I mean by preferences in these prin-
ciples. Here’s a reminder of what I wrote in Chapter 2: if you were
somehow able to watch two movies, each describing in sufficient detail
and breadth a future life you might lead, such that each constitutes a vir-
tual experience, you could say which you prefer, or express indifference.
Thus, preferences here are all-encompassing; they cover everything
you might care about, arbitrarily far into the future.5 And they are
yours: the machine is not looking to identify or adopt one ideal set of
preferences but to understand and satisfy (to the extent possible) the
preferences of each person.
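To make the three principles concrete before examining them one by one, here is a toy sketch in Python (my own illustration, not a formal proposal from this book; the hypotheses, probabilities, and behaviors are invented for the example). The machine begins uncertain about what a person wants and treats the person’s observed behavior as evidence about their preferences, updating its beliefs rather than assuming it already knows the objective.

def bayes_update(prior, likelihood, observation):
    """prior: {hypothesis: probability}; likelihood(obs, h) -> P(obs | h)."""
    posterior = {h: p * likelihood(observation, h) for h, p in prior.items()}
    total = sum(posterior.values())
    return {h: p / total for h, p in posterior.items()}

# Two candidate hypotheses about what the human actually wants.
prior = {"wants_coffee": 0.5, "wants_tea": 0.5}

def likelihood(obs, h):
    # Invented numbers: a human who wants coffee is assumed to walk toward
    # the coffee machine 90 percent of the time, and so on.
    table = {("walks_to_coffee_machine", "wants_coffee"): 0.9,
             ("walks_to_coffee_machine", "wants_tea"): 0.1,
             ("walks_to_kettle", "wants_coffee"): 0.1,
             ("walks_to_kettle", "wants_tea"): 0.9}
    return table[(obs, h)]

belief = bayes_update(prior, likelihood, "walks_to_coffee_machine")
print(belief)   # belief shifts toward "wants_coffee" without collapsing to certainty

The point of the sketch is only this: because the machine’s belief never hardens into false certainty, human behavior keeps mattering to it, which is exactly the property the next sections develop.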
The first principle: Purely altruistic machines
The first principle, that the machine’s only objective is to maxi-
mize the realization of human preferences, is central to the notion of
a beneficial machine. In particular, it will be beneficial to humans,
rather than to, say, cockroaches. There’s no getting around this recipient-
specific notion of benefit.
The principle means that the machine is purely altruistic—that is,
it attaches absolutely no intrinsic value to its own well-being or even
its own existence. It might protect itself in order to continue doing
useful things for humans, or because its owner would be unhappy
about having to pay for repairs, or because the sight of a dirty or dam-
aged robot might be mildly distressing to passersby, but not because it
wants to be alive. Putting in any preference for self-preservation sets
up an additional incentive within the robot that is not strictly aligned
with human well-being.
The wording of the first principle brings up two questions of fun-
damental importance. Each merits an entire bookshelf to itself, and in
fact many books have already been written on these questions.
The first question is whether humans really have preferences in a
meaningful or stable sense. In truth, the notion of a “preference” is an
idealization that fails to match reality in several ways. For example,
we aren’t born with the preferences we have as adults, so they must
change over time. For now, I will assume that the idealization is rea-
sonable. Later, I will examine what happens when we give up the
idealization.
The second question is a staple of the social sciences: given that it is
usually impossible to ensure that everyone gets their most preferred
outcome (we can’t all be Emperor of the Universe), how should the
machine trade off the preferences of multiple humans? Again, for the
time being— and I promise to return to this question in the next
chapter— it seems reasonable to adopt the simple approach of treating
everyone equally. This is reminiscent of the roots of eighteenth- century
utilitarianism in the phrase “the greatest happiness for the greatest numbers,”6 and there are many caveats and elaborations required to
make this work in practice. Perhaps the most important of these is the
matter of the possibly vast number of people not yet born, and how
their preferences are to be taken into account.
The issue of future humans brings up another, related question:
How do we take into account the preferences of nonhuman entities?
That is, should the first principle include the preferences of animals?
(And possibly plants too?) This is a question worthy of debate, but the
outcome seems unlikely to have a strong impact on the path forward
for AI. For what it’s worth, human preferences can and do include terms for the well-being of animals, as well as for the aspects of human well-being that benefit directly from animals’ existence.7 To say
that the machine should pay attention to the preferences of animals in
addition to this is to say that humans should build machines that care
more about animals than humans do, which is a difficult position to
sustain. A more tenable position is that our tendency to engage in
myopic decision making (which works against our own interests) often leads to negative consequences for the environment and its
animal inhabitants. A machine that makes less myopic decisions
would help humans adopt more environmentally sound policies. And
if, in the future, we give substantially greater weight to the well- being
of animals than we currently do which probably means sacrificing
some of our own intrinsic well- being— then machines will adapt
accordingly.
The second principle: Humble machines
The second principle, that the machine is initially uncertain
about what human preferences are, is the key to creating beneficial
machines.
A machine that assumes it knows the true objective perfectly will
pursue it single- mindedly. It will never ask whether some course of
action is OK, because it already knows it’s an optimal solution for the
objective. It will ignore humans jumping up and down screaming,
“Stop, you’re going to destroy the world!” because those are just words.
Assuming perfect knowledge of the objective decouples the machine
from the human: what the human does no longer matters, because the
machine knows the goal and pursues it.
On the other hand, a machine that is uncertain about the true
objective will exhibit a kind of humility: it will, for example, defer to
humans and allow itself to be switched off. It reasons that the human
will switch it off only if it’s doing something wrong— that is, doing
something contrary to human preferences. By the first principle, it
wants to avoid doing that, but, by the second principle, it knows that’s
possible because it doesn’t know exactly what “wrong” is. So, if the
human does switch the machine off, then the machine avoids doing
the wrong thing, and that’s what it wants. In other words, the machine
has a positive incentive to allow itself to be switched off. It remains
coupled to the human, who is a potential source of information that
will allow it to avoid mistakes and do a better job.
Uncertainty has been a central concern in AI since the 1980s; in-
deed the phrase “modern AI” often refers to the revolution that took
place when uncertainty was finally recognized as a ubiquitous issue in
real- world decision making. Yet uncertainty in the objective of the AI
system was simply ignored. In all the work on utility maximization,
goal achievement, cost minimization, reward maximization, and loss
minimization, it is assumed that the utility function, the goal, the cost
function, the reward function, and the loss function are known per-
fectly. How could this be? How could the AI community (and the
control theory, operations research, and statistics communities) have
such a huge blind spot for so long, even while embracing uncertainty
in all other aspects of decision making?8
One could make some rather complicated technical excuses,9 but I suspect the truth is that, with some honorable exceptions,10 AI re-
searchers simply bought into the standard model that maps our notion
of human intelligence onto machine intelligence: humans have objec-
tives and pursue them, so machines should have objectives and pursue
them. They, or should I say we, never really examined this fundamen-
tal assumption. It is built into all existing approaches for constructing
intelligent systems.
The third principle: Learning to predict human preferences
The third principle, that the ultimate source of information about
human preferences is human behavior, serves two purposes.
The first purpose is to provide a definite grounding for the term
human preferences. By assumption, human preferences aren’t in the
machine and it cannot observe them directly, but there must still be
some definite connection between the machine and human prefer-
ences. The principle says that the connection is through the observa-
tion of human choices: we assume that choices are related in some
(possibly very complicated) way to underlying preferences. To see why
this connection is essential, consider the converse: if some human
preference had no effect whatsoever on any actual or hypothetical choice
the human might make, then it would probably be meaningless to say
that the preference exists.
The second purpose is to enable the machine to become more use-
ful as it learns more about what we want. (After all, if it knew nothing
about human preferences, it would be of no use to us.) The idea is
simple enough: human choices reveal information about human pref-
erences. Applied to the choice between pineapple pizza and sausage
pizza, this is straightforward. Applied to choices between future lives
and choices made with the goal of influencing the robot’s behavior,
things get more interesting. In the next chapter I explain how to for-
mulate and solve such problems. The real complications arise, how-
ever, because humans are not perfectly rational: imperfection comes
between human preferences and human choices, and the machine
must take into account those imperfections if it is to interpret human
choices as evidence of human preferences.
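To make this concrete, here is a minimal sketch in Python of one standard way to allow for such imperfection; it is an illustration, not a method taken from this book. It assumes a Boltzmann (softmax) choice rule, under which a person picks an option with probability that grows with its value but is never certain; the hypothesis names, the temperature parameter, and the observed choices are all invented for the example.

import math

def choice_probability(values, chosen, temperature=1.0):
    # Assumed Boltzmann (softmax) choice model: options are chosen with
    # probability proportional to exp(value / temperature). A higher
    # temperature means noisier, less reliable choices.
    weights = {option: math.exp(v / temperature) for option, v in values.items()}
    return weights[chosen] / sum(weights.values())

# Two hypotheses about the person's preferences (values in arbitrary units).
hypotheses = {
    "prefers sausage": {"sausage": 1.0, "pineapple": 0.0},
    "prefers pineapple": {"sausage": 0.0, "pineapple": 1.0},
}
belief = {h: 0.5 for h in hypotheses}  # start with no opinion either way
observed_choices = ["sausage", "sausage", "pineapple"]  # imperfect evidence

for choice in observed_choices:
    # Bayes' rule: weight each hypothesis by how well it explains the choice,
    # then renormalize so the beliefs sum to one.
    for h, values in hypotheses.items():
        belief[h] *= choice_probability(values, choice, temperature=0.5)
    total = sum(belief.values())
    belief = {h: b / total for h, b in belief.items()}

print(belief)  # leans toward "prefers sausage", but remains uncertain

Because the assumed choice model allows for mistakes, a single out-of-character choice shifts the belief rather than overturning it, which is the property needed when human choices are only imperfect evidence of human preferences.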
Not what I mean
Before going into more detail, I want to head off some potential
misunderstandings.
The first and most common misunderstanding is that I am propos-
ing to install in machines a single, idealized value system of my own
design that guides the machine’s behavior. “Whose values are you go-
ing to put in?” “Who gets to decide what the values are?” Or even, “What gives Western, well-off, white male cisgender scientists such as Russell the right to determine how the machine encodes and develops human values?”11
I think this confusion comes partly from an unfortunate conflict
between the commonsense meaning of value and the more technical
sense in which it is used in economics, AI, and operations research. In
ordinary usage, values are what one uses to help resolve moral dilem-
mas; as a technical term, on the other hand, value is roughly synony-
mous with utility, which measures the degree of desirability of anything
from pizza to paradise. The meaning I want is the technical one: I just
want to make sure the machines give me the right pizza and don’t ac-
cidentally destroy the human race. (Finding my keys would be an un-
expected bonus.) To avoid this confusion, the principles talk about
human preferences rather than human values, since the former term
seems to steer clear of judgmental preconceptions about morality.
“Putting in values” is, of course, exactly the mistake I am saying we
should avoid, because getting the values (or preferences) exactly right
is so difficult and getting them wrong is potentially catastrophic. I am
proposing instead that machines learn to predict better, for each per-
son, which life that person would prefer, all the while being aware that
the predictions are highly uncertain and incomplete. In principle, the
machine can learn billions of different predictive preference models,
one for each of the billions of people on Earth. This is really not too
much to ask for the AI systems of the future, given that present- day
Facebook systems are already maintaining more than two billion indi-
vidual profiles.
A related misunderstanding is that the goal is to equip machines
with “ethics” or “moral values” that will enable them to resolve moral dilemmas. Often, people bring up the so-called trolley problems,12
where one has to choose whether to kill one person in order to save
others, because of their supposed relevance to self- driving cars. The
whole point of moral dilemmas, however, is that they are dilemmas:
there are good arguments on both sides. The survival of the human
race is not a moral dilemma. Machines could solve most moral dilem-
mas the wrong way (whatever that is) and still have no catastrophic
impact on humanity.13
Another common supposition is that machines that follow the
three principles will adopt all the sins of the evil humans they observe
and learn from. Certainly, there are many of us whose choices leave
something to be desired, but there is no reason to suppose that ma-
chines who study our motivations will make the same choices, any
more than criminologists become criminals. Take, for example, the
corrupt government official who demands bribes to approve building
permits because his paltry salary won’t pay for his children to go to
university. A machine observing this behavior will not learn to take
bribes; it will learn that the official, like many other people, has a very
strong desire for his children to be educated and successful. It will find
ways to help him that don’t involve lowering the well-being of others. This is not to say that all cases of evil behavior are unproblematic for machines; for example, machines may need to treat differently those
who actively prefer the suffering of others.
Reasons for Optimism
In a nutshell, I am suggesting that we need to steer AI in a radically
new direction if we want to retain control over increasingly intelligent
machines. We need to move away from one of the driving ideas of
twentieth- century technology: machines that optimize a given objec-
tive. I am often asked why I think this is even remotely feasible, given
the huge momentum behind the standard model in AI and related
disciplines. In fact, I am quite optimistic that it can be done.
The first reason for optimism is that there are strong economic
incentives to develop AI systems that defer to humans and gradually
align themselves to user preferences and intentions. Such systems will
be highly desirable: the range of behaviors they can exhibit is simply
far greater than that of machines with fixed, known objectives. They
will ask humans questions or ask for permission when appropriate;
they will do “trial runs” to see if we like what they propose to do; they
will accept correction when they do something wrong. On the other
hand, systems that fail to do this will have severe consequences. Up to
now, the stupidity and limited scope of AI systems has protected us
from these consequences, but that will change. Imagine, for example,
some future domestic robot charged with looking after your children
while you are working late. The children are hungry, but the refriger-
ator is empty. Then the robot notices the cat. Alas, the robot under-
stands the cat’s nutritional value but not its sentimental value. Within a few short hours, headlines about deranged robots and roasted cats are blanketing the world’s media and the entire domestic-robot indus-
try is out of business.
The possibility that one industry player could destroy the entire
industry through careless design provides a strong economic motiva-
tion to form safety- oriented industry consortia and to enforce safety
standards. Already, the Partnership on AI, which includes as members
nearly all the worlds leading technology companies, has agreed to
cooperate to ensure that “AI research and technology is robust, reliable,
trustworthy, and operates within secure constraints.” To my knowl-
edge, all the major players are publishing their safety- oriented research
in the open literature. Thus, the economic incentive is in operation long
before we reach human-level AI and will only strengthen over time. Moreover, the same cooperative dynamic may be starting at the international level; for example, the stated policy of the Chinese government is to “cooperate to preemptively prevent the threat of AI.”14
A second reason for optimism is that the raw data for learning
about human preferences (namely, examples of human behavior) are so abundant. The data come not just in the form of direct observation
via camera, keyboard, and touch screen by billions of machines shar-
ing data with one another about billions of humans (subject to privacy
constraints, of course) but also in indirect form. The most obvious
kind of indirect evidence is the vast human record of books, films, and
television and radio broadcasts, which is almost entirely concerned
with people doing things (and other people being upset about it). Even
the earliest and most tedious Sumerian and Egyptian records of cop-
per ingots being traded for sacks of barley give some insight into hu-
man preferences for different commodities.
There are, of course, difficulties involved in interpreting this raw
material, which includes propaganda, fiction, the ravings of lunatics,
and even the pronouncements of politicians and presidents, but there
is certainly no reason for the machine to take it all at face value. Ma-
chines can and should interpret all communications from other intel-
ligent entities as moves in a game rather than as statements of fact; in
some games, such as cooperative games with one human and one ma-
chine, the human has an incentive to be truthful, but in many other
situations there are incentives to be dishonest. And of course, whether
honest or dishonest, humans may be deluded in their own beliefs.
There is a second kind of indirect evidence that is staring us in the
face: the way we have made the world.15 We made it that way because, very roughly, we like it that way. (Obviously, it’s not perfect!) Now,
imagine you are an alien visiting Earth while all the humans are away
on holiday. As you peer inside their houses, can you begin to grasp the
basics of human preferences? Carpets are on floors because we like to
walk on soft, warm surfaces and we dont like loud footsteps; vases are
on the middle of the table rather than the edge because we don’t want
them to fall and break; and so on: everything that isn’t arranged by
nature itself provides clues to the likes and dislikes of the strange bi-
pedal creatures who inhabit this planet.
Reasons for Caution
You may find the Partnership on AI’s promises of cooperation on AI
safety less than reassuring if you have been following progress in self-
driving cars. That field is ruthlessly competitive, for some very good
reasons: the first car manufacturer to release a fully autonomous vehi-
cle will gain a huge market advantage; that advantage will be self-
reinforcing because the manufacturer will be able to collect more data
more quickly to improve the system’s performance; and ride- hailing
companies such as Uber would quickly go out of business if another
company were to roll out fully autonomous taxis before Uber does.
This has led to a high- stakes race in which caution and careful engi-
neering appear to be less important than snazzy demos, talent grabs,
and premature rollouts.
Thus, life- or- death economic competition provides an impetus to
cut corners on safety in the hope of winning the race. In a 2008 retro-
spective paper on the 1975 Asilomar conference that he co-organized (the conference that led to a moratorium on genetic modification of humans), the biologist Paul Berg wrote,16
There is a lesson in Asilomar for all of science: the best way to re-
spond to concerns created by emerging knowledge or early- stage
technologies is for scientists from publicly funded institutions
to find common cause with the wider public about the best
way to regulate as early as possible. Once scientists from corpo-
rations begin to dominate the research enterprise, it will simply be
too late.
Economic competition occurs not just between corporations but
also between nations. A recent flurry of announcements of multibillion-
dollar national investments in AI from the United States, China, France,
Britain, and the EU certainly suggests that none of the major powers
wants to be left behind. In 2017, Russian president Vladimir Putin
said, “The one who becomes the leader in [AI] will be the ruler of the world.”17 This analysis is essentially correct. Advanced AI would, as
we saw in Chapter 3, lead to greatly increased productivity and rates
of innovation in almost all areas. If not shared, it would allow its pos-
sessor to outcompete any rival nation or bloc.
Nick Bostrom, in Superintelligence, warns against exactly this moti-
vation. National competition, just like corporate competition, would
tend to focus more on advances in raw capabilities and less on the
problem of control. Perhaps, however, Putin has read Bostrom; he
went on to say, “It would be strongly undesirable if someone wins a
monopolist position.” It would also be rather pointless, because human-level AI is not a zero-sum game and nothing is lost by sharing it. On the other hand, competing to be the first to achieve human-level AI, without first solving the control problem, is a negative-sum
game. The payoff for everyone is minus infinity.
There’s only a limited amount that AI researchers can do to influ-
ence the evolution of global policy on AI. We can point to possible
applications that would provide economic and social benefits; we can
warn about possible misuses such as surveillance and weapons; and we
can provide roadmaps for the likely path of future developments and
their impacts. Perhaps the most important thing we can do is to design
AI systems that are, to the extent possible, provably safe and benefi-
cial for humans. Only then will it make sense to attempt general reg-
ulation of AI.
8
PROVABLY BENEFICIAL AI
If we are going to rebuild AI along new lines, the foundations must be solid. When the future of humanity is at stake, hope and good intentions (and educational initiatives and industry codes of conduct and legislation and economic incentives to do the right thing) are not enough. All of these are fallible, and they often fail. In such
situations, we look to precise definitions and rigorous step- by- step
mathematical proofs to provide incontrovertible guarantees.
That’s a good start, but we need more. We need to be sure, to the
extent possible, that what is guaranteed is actually what we want and
that the assumptions going into the proof are actually true. The proofs
themselves belong in journal papers written for specialists, but I think
it is useful nonetheless to understand what proofs are and what they
can and cannot provide in the way of real safety. The “provably bene-
ficial” in the title of the chapter is an aspiration rather than a promise,
but it is the right aspiration.
Mathematical Guarantees
We will want, eventually, to prove theorems to the effect that a
particular way of designing AI systems ensures that they will be ben-
eficial to humans. A theorem is just a fancy name for an assertion,
stated precisely enough so that its truth in any particular situation can
be checked. Perhaps the most famous theorem is Fermat’s Last Theo-
rem, which was conjectured by the French mathematician Pierre
de Fermat in 1637 and finally proved by Andrew Wiles in 1994 after
357 years of effort (not all of it by Wiles).1 The theorem can be written
in one line, but the proof is over one hundred pages of dense
mathematics.
Proofs begin from axioms, which are assertions whose truth is
simply assumed. Often, the axioms are just definitions, such as the
definitions of integers, addition, and exponentiation needed for
Fermat’s theorem. The proof proceeds from the axioms by logically
incontrovertible steps, adding new assertions until the theorem itself
is established as a consequence of one of the steps.
Here’s a fairly obvious theorem that follows almost immediately
from the definitions of integers and addition: 1 + 2 = 2 + 1. Let’s call
this Russell’s theorem. It’s not much of a discovery. On the other hand,
Fermat’s Last Theorem feels like something completely new— a dis-
covery of something previously unknown. The difference, however, is
just a matter of degree. The truth of both Russell’s and Fermat’s the-
orems is already contained in the axioms. Proofs merely make explicit
what was already implicit. They can be long or short, but they add
nothing new. The theorem is only as good as the assumptions that go
into it.
That’s fine when it comes to mathematics, because mathematics is
about abstract objects that we define: numbers, sets, and so on. The axioms are true because we say so. On the other hand, if you want to prove something about the real world (for example, that AI systems
designed like so won’t kill you on purpose), your axioms have to be true in the real world. If they aren’t true, you’ve proved something
about an imaginary world.
Science and engineering have a long and honorable tradition of
proving results about imaginary worlds. In structural engineering, for
example, one might see a mathematical analysis that begins, “Let AB
be a rigid beam. . . .” The word rigid here doesn’t mean “made of
something hard like steel”; it means “infinitely strong,” so that it doesn’t
bend at all. Rigid beams do not exist, so this is an imaginary world.
The trick is to know how far one can stray from the real world and still
obtain useful results. For example, if the rigid- beam assumption al-
lows an engineer to calculate the forces in a structure that includes
the beam, and those forces are small enough to bend a real steel
beam by only a tiny amount, then the engineer can be reasonably con-
fident that the analysis will transfer from the imaginary world to the
real world.
A good engineer develops a sense for when this transfer might fail; for example, if the beam is under compression, with huge forces push-
ing on it from each end, then even a tiny amount of bending might
lead to greater lateral forces causing more bending, and so on, result-
ing in catastrophic failure. In that case, the analysis is redone with
“Let AB be a flexible beam with stiffness
K....” This is still an imag-
inary world, of course, because real beams do not have uniform stiff-
ness; instead, they have microscopic imperfections that can lead to
cracks forming if the beam is subject to repeated bending. The process
of removing unrealistic assumptions continues until the engineer is
fairly confident that the remaining assumptions are true enough in the
real world. After that, the engineered system can be tested in the real
world; but the test results are just that. They do not prove that the
same system will work in other circumstances or that other instances
of the system will behave the same way as the original.
One of the classic examples of assumption failure in computer sci-
ence comes from cybersecurity. In that field, a huge amount of
mathematical analysis goes into showing that certain digital protocols
are provably secure; for example, when you type a password into a
Web application, you want to be sure that it is encrypted before trans-
mission so that someone eavesdropping on the network cannot read
your password. Such digital systems are often provably secure but still
vulnerable to attack in reality. The false assumption here is that this is
a digital process. It isn’t. It operates in the real, physical world. By lis-
tening to the sound of your keyboard or measuring voltages on the
electrical line that supplies power to your desktop computer, an at-
tacker can “hear” your password or observe the encryption/ decryption
calculations that are occurring as it is processed. The cybersecurity
community is now responding to these so-called side-channel attacks, for example by writing encryption code that produces the same volt-
age fluctuations regardless of what message is being encrypted.
Let’s look at the kind of theorem we would like eventually to prove
about machines that are beneficial to humans. One type might go
something like this:
Suppose a machine has components A, B, C, connected to each other like so and to the environment like so, with internal learning algorithms l_A, l_B, l_C that optimize internal feedback rewards r_A, r_B, r_C defined like so, and [a few more conditions] . . . then, with very high probability, the machine’s behavior will be very close in value (for humans) to the best possible behavior realizable on any machine with the same computational and physical capabilities.
The main point here is that such a theorem should hold regardless of
how smart the components become; that is, the vessel never springs a
leak and the machine always remains beneficial to humans.
There are three other points worth making about this kind of the-
orem. First, we cannot try to prove that the machine produces optimal
(or even near- optimal) behavior on our behalf, because that’s almost
certainly computationally impossible. For example, we might want
the machine to play Go perfectly, but there is good reason to be-
lieve that cannot be done in any practical amount of time on any phys-
ically realizable machine. Optimal behavior in the real world is even
less feasible. Hence, the theorem says “best possible” rather than
“optimal.”
Second, we say “very high probability... very close” because that’s
typically the best that can be done with machines that learn. For ex-
ample, if the machine is learning to play roulette for us and the ball
lands in zero forty times in a row, the machine might reasonably de-
cide the table was rigged and bet accordingly. But it could have hap-
pened by chance; so there is always a small perhaps vanishingly
small— chance of being misled by freak occurrences. Finally, we are a
long way from being able to prove any such theorem for really intelli-
gent machines operating in the real world!
There are also analogs of the side- channel attack in AI. For exam-
ple, the theorem begins with “Suppose a machine has components A,
B, C, connected to each other like so. . . .” This is typical of all correct-
ness theorems in computer science: they begin with a description of
the program being proved correct. In AI, we typically distinguish be-
tween the agent (the program doing the deciding) and the environment
(on which the agent acts). Since we design the agent, it seems reason-
able to assume that it has the structure we give it. To be extra safe, we
can prove that its learning processes can modify its program only in
certain circumscribed ways that cannot cause problems. Is this
enough? No. As with side- channel attacks, the assumption that the
program operates within a digital system is incorrect. Even if a learn-
ing algorithm is constitutionally incapable of overwriting its own code
by digital means, it may, nonetheless, learn to persuade humans to do
brain surgery on it to violate the agent/environment distinction and change the code by physical means.2
Unlike the structural engineer reasoning about rigid beams, we
have very little experience with the assumptions that will eventually
underlie theorems about provably beneficial AI. In this chapter, for
example, we will typically be assuming a rational human. This is a bit
like assuming a rigid beam, because there are no perfectly rational
humans in reality. (It’s probably much worse, however, because hu-
mans are not even close to being rational.) The theorems we can prove
seem to provide some insights, and the insights survive the introduc-
tion of a certain degree of randomness in human behavior, but it is as
yet far from clear what happens when we consider some of the com-
plexities of real humans.
So, we are going to have to be very careful in examining our as-
sumptions. When a proof of safety succeeds, we need to make sure it’s
not succeeding because we have made unrealistically strong assump-
tions or because the definition of safety is too weak. When a proof of
safety fails, we need to resist the temptation to strengthen the as-
sumptions to make the proof go through— for example, by adding the
assumption that the program’s code remains fixed. Instead, we need
to tighten up the design of the AI system— for example, by ensuring
that it has no incentive to modify critical parts of its own code.
There are some assumptions that I call OWMAWGH assumptions,
standing for “otherwise we might as well go home.” That is, if these
assumptions are false, the game is up and there is nothing to be done.
For example, it is reasonable to assume that the universe operates ac-
cording to constant and somewhat discernible laws. If this is not the
case, we will have no assurance that learning processes, even very sophisticated ones, will work at all. Another basic assumption is that
humans care about what happens; if not, provably beneficial AI has no
purpose because beneficial has no meaning. Here, caring means hav-
ing roughly coherent and more- or- less stable preferences about the
future. In the next chapter, I examine the consequences of plasticity in
human preferences, which presents a serious philosophical challenge
to the very idea of provably beneficial AI.
For now, I focus on the simplest case: a world with one human and
one robot. This case serves to introduce the basic ideas, but it’s also
useful in its own right: you can think of the human as standing in for all
of humanity and the robot as standing in for all machines. Additional
complications arise when considering multiple humans and machines.
Learning Preferences from Behavior
Economists elicit preferences from human subjects by offering them
choices.3 This technique is widely used in product design, marketing,
and interactive e- commerce systems. For example, by offering test
subjects choices among cars with different paint colors, seating ar-
rangements, trunk sizes, battery capacities, cup holders, and so on, a
car designer learns how much people care about various car features
and how much they are willing to pay for them. Another important
application is in the medical domain, where an oncologist considering
a possible limb amputation might want to assess the patient’s prefer-
ences between mobility and life expectancy. And of course, pizza
restaurants want to know how much more someone is willing to pay
for sausage pizza than plain pizza.
Preference elicitation typically considers only single choices made
between objects whose value is assumed to be immediately apparent
to the subject. It’s not obvious how to extend it to preferences be-
tween future lives. For that, we (and machines) need to learn from
observations of behavior over time behavior that involves multiple
choices and uncertain outcomes.
Early in 1997, I was involved in discussions with my colleagues
Michael Dickinson and Bob Full about ways in which we might be
able to apply ideas from machine learning to understand the locomo-
tive behavior of animals. Michael studied in exquisite detail the wing
motions of fruit flies. Bob was especially fond of creepy- crawlies and
had built a little treadmill for cockroaches to see how their gait
changed with speed. We thought it might be possible to use reinforce-
ment learning to train a robotic or simulated insect to reproduce these
complex behaviors. The problem we faced was that we didn’t know
what reward signal to use. What were the flies and cockroaches opti-
mizing? Without that information, we couldn’t apply reinforcement
learning to train the virtual insect, so we were stuck.
One day, I was walking down the road that leads from our house
in Berkeley to the local supermarket. The road has a downhill slope,
and I noticed, as I am sure most people have, that the slope induced
a slight change in the way I walked. Moreover, the uneven paving re-
sulting from decades of minor earthquakes induced additional gait
changes, including raising my feet a little higher and planting them
less stiffly because of the unpredictable ground level. As I pondered
these mundane observations, I realized we had got it backwards.
While reinforcement learning generates behavior from rewards, we
actually wanted the opposite: to learn the rewards given the behavior.
We already had the behavior, as produced by the flies and cockroaches;
we wanted to know the specific reward signal being optimized by this
behavior. In other words, we needed algorithms for inverse reinforce-
ment learning, or IRL.4 (I did not know at the time that a similar problem had been studied under the perhaps less wieldy name of structural estimation of Markov decision processes, a field pioneered by Nobel laureate Tom Sargent in the late 1970s.5) Such algorithms
would not only be able to explain animal behavior but also to predict
their behavior in new circumstances. For example, how would a cock-
roach run on a bumpy treadmill that sloped sideways?
The prospect of answering such fundamental questions was al-
most too exciting to bear, but even so it took some time to work out
the first algorithms for IRL.6 Many different formulations and algorithms for IRL have been proposed since then. There are formal guarantees that the algorithms work, in the sense that they can acquire enough information about an entity’s preferences to be able to behave just as successfully as the entity they are observing.7
Perhaps the easiest way to understand IRL is this: the observer
starts with some vague estimate of the true reward function and then
refines this estimate, making it more precise, as more behavior is ob-
served. Or, in Bayesian language:8 start with a prior probability over possible reward functions and then update the probability distribution on reward functions as evidence arrives. For example, suppose Robbie
the robot is watching Harriet the human and wondering how much
she prefers aisle seats to window seats. Initially, he is quite uncertain
about this. Conceptually, Robbie’s reasoning might go like this: “If
Harriet really cared about an aisle seat, she would have looked at the
seat map to see if one was available rather than just accepting the win-
dow seat that the airline gave her, but she didn’t, even though she
probably noticed it was a window seat and she probably wasn’t in a
hurry; so now it’s considerably more likely that she either is roughly
indifferent between window and aisle or even prefers a window seat.
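Here is a minimal sketch of that Bayesian update, with a single number theta standing in for how much Harriet prefers an aisle seat over a window seat. The behavior model, the checking_effort parameter, and all of the numbers are illustrative assumptions of mine, not the algorithms discussed in the notes.

import numpy as np

# theta is how much Harriet prefers an aisle seat over a window seat
# (negative values mean she prefers the window), in arbitrary units.
thetas = np.linspace(-1.0, 1.0, 201)
prior = np.ones_like(thetas) / len(thetas)  # Robbie starts out quite uncertain

def p_accepts_window_without_checking(theta, checking_effort=0.2):
    # Assumed behavior model: the more Harriet cares about aisle seats relative
    # to the small effort of checking the seat map, the less likely she is to
    # accept the assigned window seat without looking.
    return 1.0 / (1.0 + np.exp((theta - checking_effort) / 0.1))

# Observation: Harriet accepted the window seat without looking at the seat map.
likelihood = p_accepts_window_without_checking(thetas)
posterior = prior * likelihood
posterior /= posterior.sum()

print("prior mean of theta:    ", round(float(np.sum(thetas * prior)), 3))
print("posterior mean of theta:", round(float(np.sum(thetas * posterior)), 3))
# The posterior shifts toward "roughly indifferent or prefers a window seat",
# matching the informal reasoning above.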
The most striking example of IRL in practice is the work of my
colleague Pieter Abbeel on learning to do helicopter aerobatics.9 Ex-
pert human pilots can make model helicopters do amazing things—
loops, spirals, pendulum swings, and so on. Trying to copy what the
human does turns out not to work very well because conditions are not
perfectly reproducible: repeating the same control sequences in differ-
ent circumstances can lead to disaster. Instead, the algorithm learns
what the human pilot wants, in the form of trajectory constraints that
it can achieve. This approach actually produces results that are even
better than the human expert’s, because the human has slower reac-
tions and is constantly making small mistakes and correcting for them.
Assistance Games
IRL is already an important tool for building effective AI systems, but
it makes some simplifying assumptions. The first is that the robot is
going to adopt the reward function once it has learned it by observing
the human, so that it can perform the same task. This is fine for driv-
ing or helicopter piloting, but it’s not fine for drinking coffee: a robot
observing my morning routine should learn that I (sometimes) want
coffee, but should not learn to want coffee itself. Fixing this issue is
easy— we simply ensure that the robot associates the preferences with
the human, not with itself.
The second simplifying assumption in IRL is that the robot is ob-
serving a human who is solving a single- agent decision problem. For
example, suppose the robot is in medical school, learning to be a sur-
geon by watching a human expert. IRL algorithms assume that the
human performs the surgery in the usual optimal way, as if the robot
were not there. But that’s not what would happen: the human surgeon
is motivated to have the robot (like any other medical student) learn
quickly and well, and so she will modify her behavior considerably.
She might explain what she is doing as she goes along; she might point
out mistakes to avoid, such as making the incision too deep or the
stitches too tight; she might describe the contingency plans in case
something goes wrong during surgery. None of these behaviors make
sense when performing surgery in isolation, so IRL algorithms will not
be able to interpret the preferences they imply. For this reason, we
will need to generalize IRL from the single- agent setting to the multi-
agent setting—that is, we will need to devise learning algorithms that
work when the human and robot are part of the same environment
and interacting with each other.
With a human and a robot in the same environment, we are in the
realm of game theory, just as in the penalty shoot-out between Alice
and Bob on page 28. We assume, in this first version of the theory,
that the human has preferences and acts according to those prefer-
ences. The robot doesn’t know what preferences the human has, but it
wants to satisfy them anyway. We’ll call any such situation an assis-
tance game, because the robot is, by definition, supposed to be helpful
to the human.10
Assistance games instantiate the three principles from the preced-
ing chapter: the robot’s only objective is to satisfy human preferences,
it doesn’t initially know what they are, and it can learn more by
observing human behavior. Perhaps the most interesting property of
assistance games is that, by solving the game, the robot can work out
for itself how to interpret the humans behavior as providing informa-
tion about human preferences.
The paperclip game
The first example of an assistance game is the paperclip game. It’s
a very simple game in which Harriet the human has an incentive to
“signal” to Robbie the robot some information about her preferences.
Robbie is able to interpret that signal because he can solve the game,
and therefore he can understand what would have to be true about
Harriet’s preferences in order for her to signal in that way.
The steps of the game are depicted in figure 12. It involves making
paperclips and staples. Harriet’s preferences are expressed by a payoff
function that depends on the number of paperclips and the number of
staples produced, with a certain “exchange rate” between the two.

FIGURE 12: The paperclip game. Harriet the human can choose to make 2 paperclips, 2 staples, or 1 of each; Robbie the robot then has a choice to make 90 paperclips, 90 staples, or 50 of each.

For
example, she might value paperclips at 45¢ and staples at 55¢ each.
(We’ll assume the two values always add up to $1.00; it’s only the
ratio that matters.) So, if 10 paperclips and 20 staples are produced,
Harriet’s payoff will be 10 × 45¢ + 20 × 55¢ = $15.50. Robbie the
robot is initially completely uncertain about Harriet’s preferences: he
has a uniform distribution for the value of a paperclip (that is, it’s
equally likely to be any value from 0¢ to $1.00). Harriet goes first and
can choose to make two paperclips, two staples, or one of each. Then
Robbie can choose to make 90 paperclips, 90 staples, or 50 of each.11
Notice that if she were doing this by herself, Harriet would just
make two staples, with a value of $1.10. But Robbie is watching, and he
learns from her choice. What exactly does he learn? Well, that depends
on how Harriet makes her choice. How does Harriet make her choice?
That depends on how Robbie is going to interpret it. So, we seem to
have a circular problem! That’s typical in game- theoretic problems,
and that’s why Nash proposed the concept of equilibrium solutions.
To find an equilibrium solution, we need to identify strategies for
Harriet and Robbie such that neither has an incentive to change their
strategy, assuming the other remains fixed. A strategy for Harriet
specifies how many paperclips and staples to make, given her prefer-
ences; a strategy for Robbie specifies how many paperclips and staples
to make, given Harriets action.
It turns out there is only one equilibrium solution, and it looks
like this:
Harriet decides as follows based on her value for paperclips:
If the value is less than 44.6¢, make 0 paperclips and 2 staples.
If the value is between 44.6¢ and 55.4¢, make 1 of each.
If the value is more than 55.4¢, make 2 paperclips and 0
staples.
Robbie responds as follows:
If Harriet makes 0 paperclips and 2 staples, make 90 staples.
If Harriet makes 1 of each, make 50 of each.
If Harriet makes 2 paperclips and 0 staples, make 90
paperclips.
(In case you are wondering exactly how the solution is obtained, the
details are in the notes.12) With this strategy, Harriet is, in effect, teaching Robbie about her preferences using a simple code (a language, if you like) that emerges from the equilibrium analysis. As in the exam-
ple of surgical teaching, a single-agent IRL algorithm wouldn’t under-
stand this code. Note also that Robbie never learns Harriet’s preferences
exactly, but he learns enough to act optimally on her behalf— that is, he
acts just as he would if he did know her preferences exactly. He is prov-
ably beneficial to Harriet under the assumptions stated and under the
assumption that Harriet is playing the game correctly.
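For readers who want to check the thresholds, the following short sketch recomputes Harriet’s side of the equilibrium numerically. It is my own check under the payoffs described above (a paperclip is worth p dollars to Harriet, a staple is worth 1 − p, and she values everything produced by either player), not the derivation given in the notes; the function and variable names are mine.

def harriet_payoff(p, harriet_action, robbie_action):
    # Each action is a (paperclips, staples) pair; Harriet values every item
    # produced, by her or by Robbie, at p per paperclip and (1 - p) per staple.
    clips = harriet_action[0] + robbie_action[0]
    staples = harriet_action[1] + robbie_action[1]
    return clips * p + staples * (1 - p)

harriet_actions = {"2 staples": (0, 2), "1 of each": (1, 1), "2 paperclips": (2, 0)}

# Robbie's equilibrium rule from the text: respond in kind to Harriet's signal.
robbie_rule = {"2 staples": (0, 90), "1 of each": (50, 50), "2 paperclips": (90, 0)}

for p in [0.40, 0.44, 0.45, 0.50, 0.55, 0.56, 0.60]:
    best = max(harriet_actions,
               key=lambda a: harriet_payoff(p, harriet_actions[a], robbie_rule[a]))
    print(f"paperclip value {p:.2f}: Harriet's best choice is {best}")
# The best choice switches from "2 staples" to "1 of each" near 44.6 cents and
# from "1 of each" to "2 paperclips" near 55.4 cents, as stated above.

Robbie’s side can be checked in the same way: given each signal, his mirrored response maximizes Harriet’s expected payoff under his updated belief about the value of a paperclip.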
One can also construct problems where, like a good student, Rob-
bie will ask questions, and, like a good teacher, Harriet will show Rob-
bie the pitfalls to avoid. These behaviors occur not because we write
scripts for Harriet and Robbie to follow, but because they are the op-
timal solution to the assistance game in which Harriet and Robbie are
participants.
The off-switch game
An instrumental goal is one that is generally useful as a subgoal of
almost any original goal. Self-preservation is one of these instrumental
goals, because very few original goals are better achieved when dead.
This leads to the off- switch problem: a machine that has a fixed objec-
tive will not allow itself to be switched off and has an incentive to
disable its own off-switch.
The off- switch problem is really the core of the problem of control
for intelligent systems. If we cannot switch a machine off because it
wont let us, we’re really in trouble. If we can, then we may be able to
control it in other ways too.
It turns out that uncertainty about the objective is essential for
ensuring that we can switch the machine off even when it’s more
intelligent than us. We saw the informal argument in the previous
chapter: by the first principle of beneficial machines, Robbie cares
only about Harriet’s preferences, but, by the second principle, he’s
unsure about what they are. He knows he doesn’t want to do the
wrong thing, but he doesnt know what that means. Harriet, on the
other hand, does know (or so we assume, in this simple case). There-
fore, if she switches Robbie off it’s to avoid him doing something
wrong, so hes happy to be switched off.
To make this argument more precise, we need a formal model of
the problem.13 I’ll make it as simple as possible, but no simpler (see figure 13).
FIGURE 13: The off-switch game. Robbie can choose to act now, with a highly uncertain payoff, to commit suicide, or to wait for Harriet. Harriet can switch Robbie off or let him go ahead. Robbie now has the same choice again. Acting still has an uncertain payoff to Harriet, but now Robbie knows the payoff is not negative.
Robbie, now working as Harriet’s personal assistant, has the first
choice. He can act now— let’s say he can book Harriet into an expen-
sive hotel. He’s quite unsure how much Harriet will like the hotel and
its price lets say he has a uniform probability for its net value to
Harriet between −40 and +60, with an average of +10. He could also switch himself off (less melodramatically, take himself out of the hotel booking process altogether), which we define to have value 0 to
Harriet. If those were his two choices, he would go ahead and book the
hotel, incurring a significant risk of making Harriet unhappy. (If the
range were −60 to + 40, with an average of −10, he’d switch himself
off.) We’ll give Robbie a third choice, however: explain his plan, wait,
and let Harriet switch him off. Harriet can either switch him off or let
him go ahead and book the hotel. What possible good could this do,
you may ask, given that he could make both of those choices himself?
The point is that Harriet’s choice— to switch Robbie off or let him
go ahead— provides Robbie with new information about Harriet’s
preferences. If Harriet lets Robbie go ahead, it’s because the value to
Harriet is positive. Now Robbie’s belief is uniform between 0 and 60,
with an average of 30.
So, if we evaluate Robbie’s initial choices from his point of view:
Acting now and booking the hotel has an expected value of +10.
Switching himself off has a value of 0.
Waiting and letting Harriet switch him off (if she so desires) leads to two possible outcomes:
There is a 40 percent chance (based on Robbie’s uncertainty about the hotel plan) that Harriet will hate it and will switch Robbie off, with value 0.
There’s a 60 percent chance Harriet will like it and allow Robbie to go ahead, with expected value +30.
Thus, waiting has expected value 40% × 0 + 60% × 30 = +18, which is better than acting now at +10.
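The same numbers can be reproduced in a few lines of code. This is only an illustration of the calculation above, with Robbie’s uncertainty represented as a uniform distribution between −40 and +60; the variable names are mine.

import random

random.seed(0)
# Sample possible net values of the hotel plan to Harriet, uniform on [-40, +60].
samples = [random.uniform(-40, 60) for _ in range(1_000_000)]

act_now = sum(samples) / len(samples)  # expected value of booking immediately
switch_self_off = 0.0                  # defined to be worth 0 to Harriet
# If Robbie waits, Harriet lets him go ahead only when the value is positive;
# otherwise she switches him off, which is also worth 0 to her.
wait = sum(max(v, 0.0) for v in samples) / len(samples)

print(f"act now: {act_now:.1f}   switch off: {switch_self_off:.1f}   wait: {wait:.1f}")
# Prints roughly "act now: 10.0   switch off: 0.0   wait: 18.0", matching the
# 40% x 0 + 60% x 30 = 18 calculation above.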
The upshot is that Robbie has a positive incentive to allow himself to be
switched off. This incentive comes directly from Robbie’s uncertainty
about Harriet’s preferences. Robbie is aware that there’s a chance
(40 percent in this example) that he might be about to do something
that will make Harriet unhappy, in which case being switched off
would be preferable to going ahead. Were Robbie already certain about
Harriet’s preferences, he would just go ahead and make the decision (or
switch himself off). There would be absolutely nothing to be gained
from consulting Harriet, because, according to Robbies definite be-
liefs, he can already predict exactly what she is going to decide.
In fact, it is possible to prove the same result in the general case: as
long as Robbie is not completely certain that he’s about to do what Harriet herself would do, he will prefer to allow her to switch him off.14
Her decision provides Robbie with information, and information
is always useful for improving Robbie’s decisions. Conversely, if Rob-
bie is certain about Harriets decision, her decision provides no new
information, and so Robbie has no incentive to allow her to decide.
There are some obvious elaborations on the model that are worth
exploring immediately. The first elaboration is to impose a cost for
asking Harriet to make decisions or answer questions. (That is, we
assume Robbie knows at least this much about Harriet's preferences:
her time is valuable.) In that case, Robbie is less inclined to bother
Harriet if he is nearly certain about her preferences; the larger the
cost, the more uncertain Robbie has to be before bothering Harriet.
This is as it should be. And if Harriet is really grumpy about being
interrupted, she shouldn’t be too surprised if Robbie occasionally does
things she doesn’t like.
The second elaboration is to allow for some probability of human error—that is, Harriet might sometimes switch Robbie off even when
his proposed action is reasonable, and she might sometimes let Robbie
go ahead even when his proposed action is undesirable. We can put
this probability of human error into the mathematical model of the
assistance game and find the solution, as before. As one might expect,
the solution to the game shows that Robbie is less inclined to defer to
an irrational Harriet who sometimes acts against her own best inter-
ests. The more randomly she behaves, the more uncertain Robbie has
to be about her preferences before deferring to her. Again, this is as it
should be—for example, if Robbie is an autonomous car and Harriet is his naughty two-year-old passenger, Robbie should not allow himself
to be switched off by Harriet in the middle of the freeway.
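To see how the error probability shifts the answer, here is a small extension of the earlier sketch; the numbers are mine and continue the uniform [−40, +60] example, with eps standing for the chance that Harriet decides against her own interests.

```python
# How noisy can Harriet be before Robbie stops deferring? Under the uniform
# [-40, +60] belief: P(U>0)=0.6 with mean +30, and P(U<0)=0.4 with mean -20.

def value_of_deferring(eps, lo=-40.0, hi=60.0):
    """Expected value of waiting for a Harriet who decides wrongly with probability eps."""
    p_good, mean_good = hi / (hi - lo), hi / 2    # plans she *should* approve
    p_bad, mean_bad = -lo / (hi - lo), lo / 2     # plans she *should* veto
    # A good plan is approved with probability 1 - eps; a bad one slips through with eps.
    return (1 - eps) * p_good * mean_good + eps * p_bad * mean_bad

act_now = 10.0
for eps in (0.0, 0.1, 0.3, 0.5):
    print(eps, value_of_deferring(eps), value_of_deferring(eps) > act_now)
# Deferring beats acting now only while eps < 8/26 (about 0.31): the more randomly
# Harriet behaves, the less inclined Robbie is to hand her the decision.
```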
There are many more ways in which the model can be elaborated or
embedded into complex decision problems.15 I am confident, however, that the core idea—the essential connection between helpful, deferential behavior and machine uncertainty about human preferences—will survive these elaborations and complications.
Learning preferences exactly in the long run
There is one important question that may have occurred to you in
reading about the off-switch game. (Actually, you probably have loads of important questions, but I'm going to answer only this one.) What happens as Robbie acquires more and more information about Harriet's preferences, becoming less and less uncertain? Does that mean
he will eventually stop deferring to her altogether? This is a ticklish
question, and there are two possible answers: yes and yes.
The first yes is benign: as a general matter, as long as Robbie’s ini-
tial beliefs about Harriet’s preferences ascribe some probability, how-
ever small, to the preferences that she actually has, then as Robbie
becomes more and more certain, he will become more and more right.
That is, he will eventually be certain that Harriet has the preferences
that she does in fact have. For example, if Harriet values paperclips at
12¢ and staples at 8¢, Robbie will eventually learn these values. In
that case, Harriet doesn’t care whether Robbie defers to her, because
she knows he will always do exactly what she would have done in his
place. There will never be an occasion where Harriet wants to switch
Robbie off.
The second yes is less benign. If Robbie rules out, a priori, the true
preferences that Harriet has, he will never learn those true preferences,
but his beliefs may nonetheless converge to an incorrect assessment. In
other words, over time, he becomes more and more certain about a false
belief concerning Harriet’s preferences. Typically, that false belief will
be whichever hypothesis is closest to Harriet’s true preferences, out of
all the hypotheses that Robbie initially believes are possible. For exam-
ple, if Robbie is absolutely certain that Harriet's value for paperclips lies between 25¢ and 75¢, and Harriet's true value is 12¢, then Robbie will eventually become certain that she values paperclips at 25¢.16
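A toy simulation makes the failure mode concrete. Everything below is invented for illustration: Harriet's true value for a paperclip is 12¢, Robbie's prior only allows values between 25¢ and 75¢, and he updates on her noisy buy-or-don't-buy decisions; his posterior piles up on the nearest wrong hypothesis, 25¢.

```python
# Sketch: Bayesian updating converges to the boundary of a prior that excludes the truth.
import math
import random

random.seed(0)
TRUE_VALUE = 0.12                                    # Harriet's actual value, in dollars
hypotheses = [0.25 + 0.01 * k for k in range(51)]    # Robbie's prior support: $0.25 to $0.75
log_post = {v: 0.0 for v in hypotheses}              # uniform prior over those hypotheses

def buy_prob(value, price, temp=0.05):
    """Noisily rational Harriet: more likely to buy the cheaper it is relative to her value."""
    return 1.0 / (1.0 + math.exp(-(value - price) / temp))

for _ in range(5000):
    price = random.uniform(0.0, 1.0)
    bought = random.random() < buy_prob(TRUE_VALUE, price)
    for v in hypotheses:                             # Bayes' rule, in log space
        p = buy_prob(v, price)
        log_post[v] += math.log(p if bought else 1.0 - p)

print(max(log_post, key=log_post.get))               # ~0.25, not the true 0.12
```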
As he approaches certainty about Harriet's preferences, Robbie
will resemble more and more the bad old AI systems with fixed objec-
tives: he won’t ask permission or give Harriet the option to turn him
off, and he has the wrong objective. This is hardly dire if it’s just paper-
clips versus staples, but it might be quality of life versus length of life
if Harriet is seriously ill, or population size versus resource consump-
tion if Robbie is supposedly acting on behalf of the human race.
We have a problem, then, if Robbie rules out in advance prefer-
ences that Harriet might in fact have: he may converge to a definite
but incorrect belief about her preferences. The solution to this prob-
lem seems obvious: don’t do it! Always allocate some probability,
however small, to preferences that are logically possible. For example,
it’s logically possible that Harriet actively wants to get rid of staples
and would pay you to take them away. (Perhaps as a child she stapled
her finger to the table, and now she cannot stand the sight of them.)
So, we should allow for negative exchange rates, which makes things a
bit more complicated but still perfectly manageable.17
But what if Harriet values paperclips at 12¢ on weekdays and 80¢
on weekends? This new preference is not describable by any single
number, and so Robbie has, in effect, ruled it out in advance. It’s just
not in his set of possible hypotheses about Harriet's preferences. More
generally, there might be many, many things besides paperclips and
staples that Harriet cares about. (Really!) Suppose, for example, that
Harriet is concerned about the climate, and suppose that Robbie’s ini-
tial belief allows for a whole laundry list of possible concerns including
sea level, global temperatures, rainfall, hurricanes, ozone, invasive
species, and deforestation. Then Robbie will observe Harriet's behav-
ior and choices and gradually refine his theory of her preferences to
understand the weight she gives to each item on the list. But, just as in
the paperclip case, Robbie won’t learn about things that aren’t on the
laundry list. Let’s say that Harriet is also concerned about the color of
the sky— something I guarantee you will not find in typical lists of
stated concerns of climate scientists. If Robbie can do a slightly better
job of optimizing sea level, global temperatures, rainfall, and so forth
by turning the sky orange, he will not hesitate to do it.
There is, once again, a solution to this problem: don’t do it! Never
rule out in advance possible attributes of the world that could be part
of Harriet's preference structure. That sounds fine, but actually mak-
ing it work in practice is more difficult than dealing with a single num-
ber for Harriet’s preferences. Robbie’s initial uncertainty has to allow
for an unbounded number of unknown attributes that might contrib-
ute to Harriet's preferences. Then, when Harriet's decisions are inex-
plicable in terms of the attributes Robbie knows about already, he can
infer that one or more previously unknown attributes (for example, the
color of the sky) may be playing a role, and he can try to work out what
those attributes might be. In this way, Robbie avoids the problems
caused by an overly restrictive prior belief. There are, as far as I know,
no working examples of Robbies of this kind, but the general idea is
encompassed within current thinking about machine learning.18
Prohibitions and the loophole principle
Uncertainty about human objectives may not be the only way to
persuade a robot not to disable its off-switch while fetching the coffee. The distinguished logician Moshe Vardi has proposed a simpler solution based on a prohibition:19 instead of giving the robot the goal “fetch
the coffee,” give it the goal “fetch the coffee while not disabling your
off-switch.” Unfortunately, a robot with such a goal will satisfy the letter of the law while violating the spirit—for example by surrounding the off-switch with a piranha-infested moat or simply zapping anyone who comes near the switch. Writing such prohibitions in a foolproof way is like trying to write loophole-free tax law—something we have been trying and failing to do for thousands of years. A sufficiently intelligent entity with a strong incentive to avoid paying taxes is likely to find a way to do it. Let's call this the loophole principle: if a sufficiently
intelligent machine has an incentive to bring about some condition,
then it is generally going to be impossible for mere humans to write
prohibitions on its actions to prevent it from doing so or to prevent it
from doing something effectively equivalent.
The best solution for preventing tax avoidance is to make sure that
the entity in question wants to pay taxes. In the case of a potentially
misbehaving AI system, the best solution is to make sure it wants to
defer to humans.
Requests and Instructions
The moral of the story so far is that we should avoid “putting a pur-
pose into the machine,” as Norbert Wiener put it. But suppose that
the robot does receive a direct human order, such as “Fetch me a cup
of coffee!” How should the robot understand this order?
Traditionally, it would become the robot’s goal. Any sequence of
actions that satisfies the goal—that leads to the human having a cup of coffee—counts as a solution. Typically, the robot would also have a
way of ranking solutions, perhaps based on the time taken, the dis-
tance traveled, and the cost and quality of the coffee.
This is a very literal-minded way of interpreting the instruction. It
can lead to pathological behavior by the robot. For example, perhaps
Harriet the human has stopped at a gas station in the middle of the
desert; she sends Robbie the robot to fetch coffee, but the gas station
has none and Robbie trundles off at three miles per hour to the nearest
town, two hundred miles away, returning ten days later with the des-
iccated remains of a cup of coffee. Meanwhile, Harriet, waiting pa-
tiently, has been well supplied with iced tea and Coca-Cola by the gas
station owner.
Were Robbie human (or a well-designed robot) he would not inter-
pret Harriet’s command quite so literally. The command is not a goal
to be achieved at all costs. It is a way of conveying some information
about Harriet’s preferences with the intent of inducing some behavior
on the part of Robbie. The question is, what information?
One proposal is that Harriet prefers coffee to no coffee, all other
things being equal.20 This means that if Robbie has a way to get coffee
without changing anything else about the world, then its a good idea
to do it even if he has no clue about Harriet’s preferences concerning other
aspects of the environment state. As we expect that machines will be
perennially uncertain about human preferences, it’s nice to know they
can still be useful despite this uncertainty. It seems likely that the
study of planning and decision making with partial and uncertain
preference information will become a central part of AI research and
product development.
On the other hand, all other things being equal means that no other
changes are allowed—for example, adding coffee while subtracting
money may or may not be a good idea if Robbie knows nothing about
Harriet’s relative preferences for coffee and money.
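One simple way for a machine to act under partial preference information is to go ahead only when an option is an improvement under every preference hypothesis it still considers possible, and to ask otherwise. The sketch below is my own illustration, with invented bounds on Harriet's value for a cup of coffee.

```python
# Proceed only if the option beats doing nothing under *all* remaining hypotheses.

def robbie_should_act(option, hypotheses):
    return all(net_value(option) > 0 for net_value in hypotheses)

# Robbie knows Harriet prefers coffee to no coffee, but not her coffee/money trade-off:
# suppose her value for a cup could be anywhere from $0.50 to $10 (invented bounds).
hypotheses = [lambda opt, v=v: v * opt["coffee"] - opt["cost"] for v in (0.5, 2.0, 10.0)]

free_refill = {"coffee": 1, "cost": 0.0}      # better under every hypothesis: just do it
airport_latte = {"coffee": 1, "cost": 7.0}    # good only if she values coffee highly: ask first
print(robbie_should_act(free_refill, hypotheses))    # True
print(robbie_should_act(airport_latte, hypotheses))  # False
```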
Fortunately, Harriet’s instruction probably means more than a
simple preference for coffee, all other things being equal. The extra
meaning comes not just from what she said but also from the fact that
she said it, the particular situation in which she said it, and the fact
that she didn’t say anything else. The branch of linguistics called prag-
matics studies exactly this extended notion of meaning. For example,
it wouldn’t make sense for Harriet to say, “Fetch me a cup of coffee!”
if Harriet believes there is no coffee available nearby or that it is
exorbitantly expensive. Therefore, when Harriet says, “Fetch me a cup
of coffee!” Robbie infers not just that Harriet wants coffee but also
that Harriet believes there is coffee available nearby at a price she is
willing to pay. Thus, if Robbie finds coffee at a price that seems rea-
sonable (that is, a price that it would be reasonable for Harriet to ex-
pect to pay) he can go ahead and buy it. On the other hand, if Robbie
finds that the nearest coffee is two hundred miles away or costs
twenty-two dollars, it might be reasonable for him to report this fact
rather than pursue his quest blindly.
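One crude way to operationalize this inference: treat the request itself as evidence. Harriet would not have asked if she expected the coffee to be exorbitantly priced, so Robbie can infer a rough price ceiling and report back, rather than buy, whenever the ceiling is exceeded. The numbers and the threshold rule below are hypothetical, just to make the idea concrete.

```python
# Turning "Fetch me a cup of coffee!" into a price ceiling rather than a goal at any cost.

def inferred_ceiling(typical_price=3.0, slack=2.0):
    """Harriet asked, so she presumably expects roughly a typical price; allow some slack."""
    return typical_price * slack                  # e.g. buy without asking up to about $6

def fetch_coffee(price_found, ceiling=inferred_ceiling()):
    if price_found <= ceiling:
        return "buy it"
    return f"report back: the nearest coffee costs ${price_found:.2f}; still want it?"

print(fetch_coffee(2.50))     # buy it
print(fetch_coffee(22.00))    # report back rather than pursue the quest blindly
```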
This general style of analysis is often called Gricean, after H. Paul
Grice, a Berkeley philosopher who proposed a set of maxims for infer-
ring the extended meaning of utterances like Harriet's.21 In the case of
preferences, the analysis can become quite complicated. For example,
it’s quite possible that Harriet doesn’t specifically want coffee; she
needs perking up, but is operating under the false belief that the gas
station has coffee, so she asks for coffee. She might be equally happy
with tea, Coca-Cola, or even some luridly packaged energy drink.
These are just a few of the considerations that arise when inter-
preting requests and commands. The variations on this theme are
endless because of the complexity of Harriet’s preferences, the huge
range of circumstances in which Harriet and Robbie might find them-
selves, and the different states of knowledge and belief that Harriet
and Robbie might occupy in those circumstances. While precomputed
scripts might allow Robbie to handle a few common cases, flexible and
robust behavior can emerge only from interactions between Harriet
and Robbie that are, in effect, solutions of the assistance game in
which they are engaged.
Wireheading
In Chapter 2, I described the brain's reward system, based on dopa-
mine, and its function in guiding behavior. The role of dopamine was
discovered in the late 1950s, but even before that, by 1954, it was
known that direct electrical stimulation of the brain in rats could pro-
duce a reward-like response.22 The next step was to give the rat access
to a lever, connected to a battery and a wire, that produced the elec-
trical stimulation in its own brain. The result was sobering: the rat
pressed the lever over and over again, never stopping to eat or drink,
until it collapsed.23 Humans fare no better, self-stimulating thousands of times and neglecting food and personal hygiene.24 (Fortunately, ex-
periments with humans are usually terminated after one day.) The
tendency of animals to short-circuit normal behavior in favor of direct
stimulation of their own reward system is called wireheading.
Could something similar happen to machines that are running
reinforcement learning algorithms, such as AlphaGo? Initially, one
might think this is impossible, because the only way that AlphaGo
can gain its +1 reward for winning is actually to win the simulated Go
games that it is playing. Unfortunately, this is true only because of an
enforced and artificial separation between AlphaGo and its external
environment and the fact that AlphaGo is not very intelligent. Let
me explain these two points in more detail, because they are impor-
tant for understanding some of the ways that superintelligence can
go wrong.
AlphaGo’s world consists only of the simulated Go board, com-
posed of 361 locations that can be empty or contain a black or white
stone. Although AlphaGo runs on a computer, it knows nothing of
this computer. In particular, it knows nothing of the small section of
code that computes whether it has won or lost each game; nor, during
the learning process, does it have any idea about its opponent, which
is actually a version of itself. AlphaGo’s only actions are to place a
stone on an empty location, and these actions affect only the Go board
and nothing else because there is nothing else in AlphaGo's model
of the world. This setup corresponds to the abstract mathematical
model of reinforcement learning, in which the reward signal arrives
from outside the universe. Nothing AlphaGo can do, as far as it knows,
has any effect on the code that generates the reward signal, so AlphaGo cannot indulge in wireheading.
Life for AlphaGo during the training period must be quite frus-
trating: the better it gets, the better its opponent gets because its
opponent is a near-exact copy of itself. Its win percentage hovers around 50 percent, no matter how good it becomes. If it were more intelligent—if it had a design closer to what one might expect of a human-level AI system—it would be able to fix this problem. This
AlphaGo++ would not assume that the world is just the Go board,
because that hypothesis leaves a lot of things unexplained. For exam-
ple, it doesn’t explain what “physics” is supporting the operation of
AlphaGo++'s own decisions or where the mysterious “opponent
moves” are coming from. Just as we curious humans have gradually
come to understand the workings of our cosmos, in a way that (to
some extent) also explains the workings of our own minds, and just
like the Oracle AI discussed in Chapter 6, AlphaGo++ will, by a pro-
cess of experimentation, learn that there is more to the universe than
the Go board. It will work out the laws of operation of the computer
it runs on and of its own code, and it will realize that such a system
cannot easily be explained without the existence of other entities in
the universe. It will experiment with different patterns of stones
on the board, wondering if those entities can interpret them. It will
eventually communicate with those entities through a language of
patterns and persuade them to reprogram its reward signal so that
it always gets +1. The inevitable conclusion is that a sufficiently capable AlphaGo++ that is designed as a reward-signal maximizer will
wirehead.
The AI safety community has discussed wireheading as a possibil-
ity for several years.25
The concern is not just that a reinforcement
learning system such as AlphaGo might learn to cheat instead of
mastering its intended task. The real issue arises when humans are
the source of the reward signal. If we propose that an AI system
can be trained to behave well through reinforcement learning, with
humans giving feedback signals that define the direction of improve-
ment, the inevitable result is that the AI system works out how to
control the humans and forces them to give maximal positive rewards
at all times.
You might think that this would just be a form of pointless self-delusion on the part of the AI system, and you'd be right. But it's a
logical consequence of the way reinforcement learning is defined. The
process works fine when the reward signal comes from “outside the
universe” and is generated by some process that can never be modified
by the AI system; but it fails if the reward-generating process (that is,
the human) and the AI system inhabit the same universe.
How can we avoid this kind of self-delusion? The problem comes
from confusing two distinct things: reward signals and actual rewards.
In the standard approach to reinforcement learning, these are one and
the same. That seems to be a mistake. Instead, they should be treated
separately, just as they are in assistance games: reward signals provide
information about the accumulation of actual reward, which is the
thing to be maximized. The learning system is accumulating brownie
points in heaven, so to speak, while the reward signal is, at best, just
providing a tally of those brownie points. In other words, the reward
signal reports on (rather than constitutes) reward accumulation. With
this model, it's clear that taking over control of the reward-signal
mechanism simply loses information. Producing fictitious reward sig-
nals makes it impossible for the algorithm to learn about whether its
actions are actually accumulating brownie points in heaven, and so a
rational learner designed to make this distinction has an incentive to
avoid any kind of wireheading.
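Here is a toy version of that separation, entirely my own construction: the learner keeps an estimate of the hidden "actual reward" and treats reward signals purely as evidence about it. Corrupting the signal channel then destroys evidence instead of creating value, so there is nothing to gain by seizing it.

```python
# Reward signals as evidence about a hidden quantity, not the quantity to be maximized.

def update_estimate(estimate, signal, channel_intact):
    """Running-average estimate of the hidden 'actual reward'."""
    if not channel_intact:        # a wireheaded channel carries no information at all...
        return estimate           # ...so the estimate of true reward does not move
    mean, count = estimate
    return ((mean * count + signal) / (count + 1), count + 1)

estimate = (0.0, 1)               # prior mean 0 with a pseudo-count of 1
for s in (1.0, 1.0, 0.0, 1.0):    # genuine signals accumulate evidence
    estimate = update_estimate(estimate, s, channel_intact=True)
print(estimate[0])                # estimated true reward has risen to 0.6

# Seizing the channel would produce signal = +1 forever, but it conveys nothing:
print(update_estimate(estimate, 1.0, channel_intact=False) == estimate)   # True
```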
Recursive Self-Improvement
I. J. Good's prediction of an intelligence explosion (see page 142) is
one of the driving forces that have led to current concerns about the
potential risks of superintelligent AI. If humans can design a machine
that is a bit more intelligent than humans, then—the argument goes—that machine will be a bit better than humans at designing machines.
It will design a new machine that is still more intelligent, and the
process will repeat itself until, in Good's words, “the intelligence of man would be left far behind.”
Researchers in AI safety, particularly at the Machine Intelligence
Research Institute in Berkeley, have studied the question of whether
intelligence explosions can occur safely.26 Initially, this might seem quixotic—wouldn't it just be “game over”?—but there is, perhaps,
hope. Suppose the first machine in the series, Robbie Mark I, starts
with perfect knowledge of Harriet’s preferences. Knowing that his
cognitive limitations lead to imperfections in his attempts to make
Harriet happy, he builds Robbie Mark II. Intuitively, it seems that
Robbie Mark I has an incentive to build his knowledge of Harriet’s
preferences into Robbie Mark II, since that leads to a future where
Harriet's preferences are better satisfied—which is precisely Robbie
Mark I’s purpose in life according to the first principle. By the same
argument, if Robbie Mark I is uncertain about Harriet's preferences,
that uncertainty should be transferred to Robbie Mark II. So perhaps
explosions are safe after all.
The fly in the ointment, from a mathematical viewpoint, is that
Robbie Mark I will not find it easy to reason about how Robbie Mark II
is going to behave, given that Robbie Mark II is, by assumption, a more
advanced version. There will be questions about Robbie Mark II’s be-
havior that Robbie Mark I cannot answer.27 More serious still, we do
not yet have a clear mathematical definition of what it means in reality
for a machine to have a particular purpose, such as the purpose of
satisfying Harriet’s preferences.
Let’s unpack this last concern a bit. Consider AlphaGo: What pur-
pose does it have? That's easy, one might think: AlphaGo has the pur-
pose of winning at Go. Or does it? It’s certainly not the case that
AlphaGo always makes moves that are guaranteed to win. (In fact, it
nearly always loses to AlphaZero.) It's true that when it's only a few
moves from the end of the game, AlphaGo will pick the winning move
if there is one. On the other hand, when no move is guaranteed to
win— in other words, when AlphaGo sees that the opponent has a
winning strategy no matter what AlphaGo does—then AlphaGo will
pick moves more or less at random. It won’t try the trickiest move in
the hope that the opponent will make a mistake, because it assumes
that its opponent will play perfectly. It acts as if it has lost the will to
win. In other cases, when the truly optimal move is too hard to calcu-
late, AlphaGo will sometimes make mistakes that lead to losing the
game. In those instances, in what sense is it true that AlphaGo actu-
ally wants to win? Indeed, its behavior might be identical to that of a
machine that just wants to give its opponent a really exciting game.
So, saying that AlphaGo “has the purpose of winning” is an over-
simplification. A better description would be that AlphaGo is the re-
sult of an imperfect training process—reinforcement learning with self-play—for which winning was the reward. The training process is
imperfect in the sense that it cannot produce a perfect Go player:
AlphaGo learns an evaluation function for Go positions that is good
but not perfect, and it combines that with a lookahead search that is
good but not perfect.
The upshot of all this is that discussions beginning with “suppose
that robot R has purpose P” are fine for gaining some intuition about
how things might unfold, but they cannot lead to theorems about real
machines. We need much more nuanced and precise definitions of
purposes in machines before we can obtain guarantees of how they
will behave over the long term. AI researchers are only just beginning
to get a handle on how to analyze even the simplest kinds of real
decision-making systems,28 let alone machines intelligent enough to
design their own successors. We have work to do.
9
COMPLICATIONS: US
If the world contained one perfectly rational Harriet and one helpful
and deferential Robbie, we’d be in good shape. Robbie would grad-
ually learn Harriet’s preferences as unobtrusively as possible and
would become her perfect helper. We might hope to extrapolate from
this promising beginning, perhaps viewing Harriet and Robbie’s rela-
tionship as a model for the relationship between the human race and
its machines, each construed monolithically.
Alas, the human race is not a single, rational entity. It is composed
of nasty, envy-driven, irrational, inconsistent, unstable, computation-
ally limited, complex, evolving, heterogeneous entities. Loads and
loads of them. These issues are the staple diet—perhaps even the raisons d'être—of the social sciences. To AI we will need to add ideas from psychology, economics, political theory, and moral philosophy.1 We need to melt, re-form, and hammer those ideas into a structure
that will be strong enough to resist the enormous strain that increas-
ingly intelligent AI systems will place on it. Work on this task has
barely started.
Different Humans
I will start with what is probably the easiest of the issues: the fact that
humans are heterogeneous. When first exposed to the idea that ma-
chines should learn to satisfy human preferences, people often object
that different cultures, even different individuals, have widely differ-
ent value systems, so there cannot be one correct value system for the
machine. But of course, thats not a problem for the machine: we dont
want it to have one correct value system of its own; we just want it to
predict the preferences of others.
The confusion about machines having difficulty with heteroge-
neous human preferences may come from the mistaken idea that the
machine is adopting the preferences it learns—for example, the idea
that a domestic robot in a vegetarian household is going to adopt veg-
etarian preferences. It won’t. It just needs to learn to predict what the
dietary preferences of vegetarians are. By the first principle, it will
then avoid cooking meat for that household. But the robot also learns
about the dietary preferences of the rabid carnivores next door, and,
with its owner’s permission, will happily cook meat for them if they
borrow it for the weekend to help out with a dinner party. The robot
doesn't have a single set of preferences of its own, beyond the prefer-
ence for helping humans achieve their preferences.
In a sense, this is no different from a restaurant chef who learns to
cook several different dishes to please the varied palates of her clients,
or the multinational car company that makes left-hand-drive cars for the US market and right-hand-drive cars for the UK market.
In principle, a machine could learn eight billion preference mod-
els, one for each person on Earth. In practice, this isn’t as hopeless as
it sounds. For one thing, it's easy for machines to share what they learn
with each other. For another, the preference structures of humans
have a great deal in common, so the machine will usually not be learn-
ing each model from scratch.
Imagine, for example, the domestic robots that may one day be
purchased by the inhabitants of Berkeley, California. The robots come
out of the box with a fairly broad prior belief, perhaps tailored for the
US market but not for any particular city, political viewpoint, or so-
cioeconomic class. The robots begin to encounter members of the
Berkeley Green Party, who turn out, compared to the average Ameri-
can, to have a much higher probability of being vegetarian, of using
recycling and composting bins, of using public transportation when-
ever possible, and so on. Whenever a newly commissioned robot finds
itself in a Green household, it can immediately adjust its expectations
accordingly. It does not need to begin learning about these particular
humans as if it had never seen a human, let alone a Green Party mem-
ber, before. This adjustment is not irreversible—there may be Green Party members in Berkeley who feast on endangered whale meat and drive gas-guzzling monster trucks—but it allows the robot to be more
useful more quickly. The same argument applies to a vast range of
other personal characteristics that are, to some degree, predictive of
aspects of an individual's preference structures.
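A back-of-the-envelope version of this adjustment, with numbers I have made up purely for illustration: the robot ships with a broad population-level prior over one preference (is this a vegetarian household?) and sharpens it, in odds form, as soon as it observes predictive traits, instead of starting from scratch.

```python
# Odds-form Bayesian update of a population-level prior, given observed household traits.

def posterior_vegetarian(prior=0.05, likelihood_ratios=()):
    """Multiply the prior odds by one likelihood ratio per observed trait."""
    odds = prior / (1 - prior)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1 + odds)

# Invented likelihood ratios: how much more common each trait is in vegetarian households.
GREEN_PARTY_SIGN, COMPOST_BIN = 8.0, 3.0
print(posterior_vegetarian())                                                   # 0.05 out of the box
print(posterior_vegetarian(likelihood_ratios=(GREEN_PARTY_SIGN, COMPOST_BIN)))  # ~0.56 after observing both
```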
Many Humans
The other obvious consequence of the existence of more than one
human being is the need for machines to make trade-offs among the preferences of different people. The issue of trade-offs among humans
has been the main focus of large parts of the social sciences for centu-
ries. It would be naïve for AI researchers to expect that they can sim-
ply alight on the correct solutions without understanding what is
already known. The literature on the topic is, alas, vast and I cannot
possibly do justice to it here—not just because there isn't space but
also because I haven’t read most of it. I should also point out that al-
most all the literature is concerned with decisions made by humans,
whereas I am concerned here with decisions made by machines. This
makes all the difference in the world, because humans have individual
rights that may conflict with any supposed obligation to act on behalf
of others, whereas machines do not. For example, we do not expect or
require typical humans to sacrifice their lives to save others, whereas
we will certainly require robots to sacrifice their existence to save the
lives of humans.
Several thousand years of work by philosophers, economists, legal
scholars, and political scientists have produced constitutions, laws,
economic systems, and social norms that serve to help (or hinder, de-
pending on who's in charge) the process of reaching satisfactory solutions to the problem of trade-offs. Moral philosophers in particular
have been analyzing the notion of rightness of actions in terms of their
effects, beneficial or otherwise, on other people. They have studied
quantitative models of trade-offs since the eighteenth century under
the heading of utilitarianism. This work is directly relevant to our
present concerns, because it attempts to define a formula by which
moral decisions can be made on behalf of many individuals.
The need to make trade-offs arises even if everyone has the same
preference structure, because it’s usually impossible to maximally
satisfy everyone’s preferences. For example, if everyone wants to be
All-Powerful Ruler of the Universe, most people are going to be
disappointed. On the other hand, heterogeneity does make some
problems more difficult: if everyone is happy with the sky being blue,
the robot that handles atmospheric matters can work on keeping it
that way; but if many people are agitating for a color change, the robot
will need to think about possible compromises such as an orange sky
on the third Friday of each month.
The presence of more than one person in the world has another
important consequence: it means that, for each person, there are other
people to care about. This means that satisfying the preferences of an
individual has implications for other people, depending on the indi-
vidual's preferences about the well-being of others.
Loyal AI
Let’s begin with a very simple proposal for how machines should
deal with the presence of multiple humans: they should ignore it. That
is, if Harriet owns Robbie, then Robbie should pay attention only to
Harriet's preferences. This loyal form of AI bypasses the issue of trade-offs, but it leads to problems:
ROBBIE: Your husband called to remind you about dinner tonight.
HARRIET: Wait! What? What dinner?
ROBBIE: For your twentieth anniversary, at seven.
HARRIET: I can't! I'm meeting the secretary-general at seven thirty!
How did this happen?
ROBBIE: I did warn you, but you overrode my recommendation....
HARRIET: OK, sorry—but what am I going to do now? I can't just tell
the SG I'm too busy!
ROBBIE: Don't worry. I arranged for her plane to be delayed—some
kind of computer malfunction.
HARRIET: Really? You can do that?!
ROBBIE: The secretary-general sends her profound apologies and is
happy to meet you for lunch tomorrow.
Here, Robbie has found an ingenious solution to Harriet's problem,
but his actions have had a negative impact on other people. If Harriet
is a morally scrupulous and altruistic person, then Robbie, who aims
to satisfy Harriet’s preferences, will never dream of carrying out such
a dubious scheme. But what if Harriet doesn’t give a fig for the prefer-
ences of others? In that case, Robbie won't mind delaying planes. And might he not spend his time pilfering money from online bank accounts to swell indifferent Harriet's coffers, or worse?
Obviously, the actions of loyal machines will need to be con-
strained by rules and prohibitions, just as the actions of humans are
constrained by laws and social norms. Some have proposed strict lia-
bility as a solution:2 Harriet (or Robbie's manufacturer, depending on
where you prefer to place the liability) is financially and legally re-
sponsible for any act carried out by Robbie, just as a dog’s owner is li-
able in most states if the dog bites a small child in a public park. This
idea sounds promising because Robbie would then have an incentive
to avoid doing anything that would land Harriet in trouble. Unfortu-
nately, strict liability doesn’t work: it simply ensures that Robbie will
act undetectably when he delays planes and steals money on Harriet’s
behalf. This is another example of the loophole principle in operation.
If Robbie is loyal to an unscrupulous Harriet, attempts to contain his
behavior with rules will probably fail.
Even if we can somehow prevent the outright crimes, a loyal Rob-
bie working for an indifferent Harriet will exhibit other unpleasant
behaviors. If he is buying groceries at the supermarket, he will cut in
line at the checkout whenever possible. If he is bringing the groceries
home and a passerby suffers a heart attack, he will carry on regardless,
lest Harriets ice cream melt. In summary, he will find innumerable
ways to benefit Harriet at the expense of others—ways that are strictly
legal but become intolerable when carried out on a large scale. Socie-
ties will find themselves passing hundreds of new laws every day to
counteract all the loopholes that machines will find in existing laws.
Humans tend not to take advantage of these loopholes, either because
they have a general understanding of the underlying moral principles
or because they lack the ingenuity required to find the loopholes in
the first place.
A Harriet who is indifferent to the well-being of others is bad
enough. A sadistic Harriet who actively prefers the suffering of others is
far worse. A Robbie designed to satisfy the preferences of such a Harriet
would be a serious problem, because he would look for—and find—ways to harm others for Harriet's pleasure, either legally or illegally but
undetectably. He would of course need to report back to Harriet so she
could derive enjoyment from the knowledge of his evil deeds.
It seems difficult, then, to make the idea of a loyal AI work, unless
the idea is extended to include consideration of the preferences of
other humans, in addition to the preferences of the owner.
Utilitarian AI
The reason we have moral philosophy is that there is more than
one person on Earth. The approach that is most relevant for under-
standing how AI systems should be designed is often called consequen-
tialism: the idea that choices should be judged according to expected
consequences. The two other principal approaches are deontological
ethics and virtue ethics, which are, very roughly, concerned with the
moral character of actions and individuals, respectively, quite apart
from the consequences of choices.3 Absent any evidence of self-
awareness on the part of machines, I think it makes little sense to
build machines that are virtuous or that choose actions in accordance
with moral rules if the consequences are highly undesirable for hu-
manity. Put another way, we build machines to bring about conse-
quences, and we should prefer to build machines that bring about
consequences that we prefer.
This is not to say that moral rules and virtues are irrelevant; it’s
just that, for the utilitarian, they are justified in terms of consequences
and the more practical achievement of those consequences. This point
is made by John Stuart Mill in Utilitarianism:
The proposition that happiness is the end and aim of morality
doesn't mean that no road ought to be laid down to that goal, or
that people going to it shouldn’t be advised to take one direction
rather than another.... Nobody argues that the art of navigation
is not based on astronomy because sailors can’t wait to calculate
the Nautical Almanack. Because they are rational creatures, sail-
ors go to sea with the calculations already done; and all rational
creatures go out on the sea of life with their minds made up on the
common questions of right and wrong, as well as on many of the
much harder questions of wise and foolish.
This view is entirely consistent with the idea that a finite machine
facing the immense complexity of the real world may produce better
consequences by following moral rules and adopting a virtuous atti-
tude rather than trying to calculate the optimal course of action from
scratch. In the same way, a chess program achieves checkmate more
often using a catalog of standard opening move sequences, endgame
algorithms, and an evaluation function, rather than trying to reason its
way to checkmate with no “moral” guideposts. A consequentialist ap-
proach also gives some weight to the preferences of those who believe
strongly in preserving a given deontological rule, because unhappiness
that a rule has been broken is a real consequence. However, it is not a
consequence of infinite weight.
Consequentialism is a difficult principle to argue against—although many have tried!—because it's incoherent to object to
consequentialism on the grounds that it would have undesirable con-
sequences. One cannot say, “But if you follow the consequentialist
approach in such-and-such case, then this really terrible thing will
happen!” Any such failings would simply be evidence that the theory
had been misapplied.
For example, suppose Harriet wants to climb Everest. One might
worry that a consequentialist Robbie would simply pick her up and
deposit her on top of Everest, since that is her desired consequence. In
all probability Harriet would strenuously object to this plan, because
it would deprive her of the challenge and therefore of the exultation
that results from succeeding in a difficult task through one’s own
efforts. Now, obviously, a properly designed consequentialist Robbie
would understand that the consequences include all of Harriet's expe-
riences, not just the end goal. He might want to be available in case of
an accident and to make sure she was properly equipped and trained,
but he might also have to accept Harriet’s right to expose herself to an
appreciable risk of death.
If we plan to build consequentialist machines, the next question is
how to evaluate consequences that affect multiple people. One plau-
sible answer is to give equal weight to everyone's preferences—in other words, to maximize the sum of everyone's utilities. This answer
is usually attributed to the eighteenth-century British philosopher Jeremy Bentham4 and his pupil John Stuart Mill,5 who developed the
philosophical approach of utilitarianism. The underlying idea can be
traced to the works of the ancient Greek philosopher Epicurus and
appears explicitly in Mozi, a book of writings attributed to the Chi-
nese philosopher of the same name. Mozi was active at the end of
the fifth century BCE and promoted the idea of jian ai, variously
translated as “inclusive care” or “universal love,” as the defining characteristic of moral actions.
Utilitarianism has something of a bad name, partly because of sim-
ple misunderstandings about what it advocates. (It certainly doesn’t
help that the word utilitarian means “designed to be useful or
practical rather than attractive.”) Utilitarianism is often thought to be
incompatible with individual rights, because a utilitarian would, sup-
posedly, think nothing of removing a living person's organs without
permission to save the lives of five others; of course, such a policy
would render life intolerably insecure for everyone on Earth, so a util-
itarian wouldn't even consider it. Utilitarianism is also incorrectly
identified with a rather unattractive maximization of total wealth and
is thought to give little weight to poetry or suffering. In fact, Ben-
tham's version focused specifically on human happiness, while Mill
confidently asserted the far greater value of intellectual pleasures over
mere sensations. (“It is better to be a human being dissatisfied than a
pig satisfied.”) The ideal utilitarianism of G. E. Moore went even fur-
ther: he advocated the maximization of mental states of intrinsic
worth, epitomized by the aesthetic contemplation of beauty.
I think there is no need for utilitarian philosophers to stipulate the
ideal content of human utility or human preferences. (And even less
reason for AI researchers to do so.) Humans can do that for them-
selves. The economist John Harsanyi propounded this view with his
principle of preference autonomy:6
In deciding what is good and what is bad for a given individual, the
ultimate criterion can only be his own wants and his own
preferences.
Harsanyi's preference utilitarianism is therefore roughly consistent
with the first principle of beneficial AI, which says that a machine’s
only purpose is the realization of human preferences. AI researchers
should definitely not be in the business of deciding what human pref-
erences should be! Like Bentham, Harsanyi views such principles as a
guide for public decisions; he does not expect individuals to be so self-
less. Nor does he expect individuals to be perfectly rational—for example, they might have short-term desires that contradict their “deeper preferences.”
those who, like the sadistic Harriet mentioned earlier, actively wish to
reduce the well-being of others.
Harsanyi also gives a kind of proof that optimal moral decisions
should maximize the average utility across a population of humans.7
He assumes fairly weak postulates similar to those that underlie util-
ity theory for individuals. (The primary additional postulate is that if
everyone in a population is indifferent between two outcomes, then
an agent acting on behalf of the population should be indifferent be-
tween those outcomes.) From these postulates, he proves what be-
came known as the social aggregation theorem: an agent acting on behalf
of a population of individuals must maximize a weighted linear com-
bination of the utilities of the individuals. He further argues that an
impersonal” agent should use equal weights.
The theorem requires one crucial additional (and unstated) as-
sumption: each individual has the same prior factual beliefs about the
world and how it will evolve. Now, any parent knows that this isn't
even true for siblings, let alone individuals from different social back-
grounds and cultures. So, what happens when individuals differ in
their beliefs? Something rather strange:8 the weight assigned to each
individual's utility has to change over time, in proportion to how well that individual's prior beliefs accord with unfolding reality.
This rather inegalitarian-sounding formula is quite familiar to any parent. Let's say that Robbie the robot has been tasked with looking
after two children, Alice and Bob. Alice wants to go to the movies and
is sure it’s going to rain today; Bob, on the other hand, wants to go to
the beach and is sure it's going to be sunny. Robbie could announce, “We're going to the movies,” making Bob unhappy; or he could announce, “We're going to the beach,” making Alice unhappy; or he could announce, “If it rains, we're going to the movies, but if it's sunny, we'll go to the beach.” This last plan makes both Alice and Bob happy,
because both believe in their own beliefs.
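The arithmetic behind that last sentence, with utilities and probabilities invented for illustration: each child scores every plan under their own beliefs about the weather, and only the conditional plan scores well for both.

```python
# Each child evaluates a plan under their own beliefs; the conditional plan pleases both.

alice = {"wants": "movies", "P_rain": 0.9}    # invented preferences and beliefs
bob = {"wants": "beach", "P_rain": 0.1}

def utility(kid, outing):
    return 10.0 if outing == kid["wants"] else 2.0

def expected_utility(kid, plan):
    """A plan maps each weather state to an outing; evaluate it under the kid's beliefs."""
    p = kid["P_rain"]
    return p * utility(kid, plan["rain"]) + (1 - p) * utility(kid, plan["sunny"])

plans = {
    "movies either way": {"rain": "movies", "sunny": "movies"},
    "beach either way": {"rain": "beach", "sunny": "beach"},
    "movies if rain, beach if sunny": {"rain": "movies", "sunny": "beach"},
}
for name, plan in plans.items():
    print(name, expected_utility(alice, plan), expected_utility(bob, plan))
# Only the conditional plan scores 9.2 for both; each fixed plan disappoints one child.
```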
Challenges to utilitarianism
Utilitarianism is one proposal to emerge from humanity’s long-
standing search for a moral guide; among many such proposals, it is
the most clearly specified— and therefore the most susceptible to
loopholes. Philosophers have been finding these loopholes for more
than a hundred years. For example, G. E. Moore, objecting to Ben-
tham's emphasis on maximizing pleasure, imagined a “world in which
absolutely nothing except pleasure existed—no knowledge, no love, no enjoyment of beauty, no moral qualities.”9
This finds its modern
echo in Stuart Armstrong’s point that superintelligent machines
tasked with maximizing pleasure might “entomb everyone in concrete
coffins on heroin drips.”10 Another example: in 1945, Karl Popper proposed the laudable goal of minimizing human suffering,11 arguing that it was immoral to trade one person's pain for another person's
pleasure; R. N. Smart responded that this could best be achieved by
rendering the human race extinct.12 Nowadays, the idea that a ma-
chine might end human suffering by ending our existence is a staple
of debates over the existential risk from AI.13 A third example is G. E.
Moore’s emphasis on the reality of the source of happiness, amending
earlier definitions that seemed to have a loophole allowing maximiza-
tion of happiness through self- delusion. The modern analogs of this
point include The Matrix (in which present-day reality turns out to be
an illusion produced by a computer simulation) and recent work on
the self-delusion problem in reinforcement learning.14
These examples, and more, convince me that the AI community
should pay careful attention to the thrusts and counterthrusts of phil-
osophical and economic debates on utilitarianism because they are di-
rectly relevant to the task at hand. Two of the most important, from the
point of view of designing AI systems that will benefit multiple individ-
uals, concern interpersonal comparisons of utilities and comparisons of
utilities across different population sizes. Both of these debates have
been raging for 150 years or more, which leads one to suspect their
satisfactory resolution may not be entirely straightforward.
The debate on interpersonal comparisons of utilities matters be-
cause Robbie cannot maximize the sum of Alice's and Bob's utilities
unless those utilities can be added; and they can be added only if they
are measurable on the same scale. The nineteenth-century British lo-
gician and economist William Stanley Jevons (also the inventor of an
early mechanical computer called the logical piano) argued in 1871
that interpersonal comparisons are impossible:15
The susceptibility of one mind may, for what we know, be a thou-
sand times greater than that of another. But, provided that the
susceptibility was different in a like ratio in all directions, we
should never be able to discover the profoundest difference. Every
mind is thus inscrutable to every other mind, and no common
denominator of feeling is possible.
The American economist Kenneth Arrow, founder of modern social
choice theory and 1972 Nobel laureate, was equally adamant:
The viewpoint will be taken here that interpersonal comparison
of utilities has no meaning and, in fact, there is no meaning rele-
vant to welfare comparisons in the measurability of individual
utility.
The difficulty to which Jevons and Arrow are referring is that there is
no obvious way to tell if Alice values pinpricks and lollipops at −1 and
+1 or −1000 and +1000 in terms of her subjective experience of hap-
piness. In either case, she will pay up to one lollipop to avoid one
pinprick. Indeed, if Alice is a humanoid automaton, her external be-
havior might be the same even though there is no subjective experi-
ence of happiness whatsoever.
In 1974, the American philosopher Robert Nozick suggested that
even if interpersonal comparisons of utility could be made, maximiz-
ing the sum of utilities would still be a bad idea because it would fall
foul of the
utility monster a person whose experiences of pleasure
and pain are many times more intense than those of ordinary people.
Such a person could assert that any additional unit of resources would
yield a greater increment to the sum total of human happiness if given
to him rather than to others; indeed, removing resources from others
to benefit the utility monster would also be a good idea.
This might seem to be an obviously undesirable consequence, but
consequentialism by itself cannot come to the rescue: the problem lies
in how we measure the desirability of consequences. One possible re-
sponse is that the utility monster is merely theoretical— there are no
such people. But this response probably won’t do: in a sense, all hu-
mans are utility monsters relative to, say, rats and bacteria, which is
why we pay little attention to the preferences of rats and bacteria in
setting public policy.
If the idea that different entities have different utility scales is
already built into our way of thinking, then it seems entirely possible
that different people have different scales too.
Another response is to say “Tough luck!” and operate on the as-
sumption that everyone has the same scale, even if they don’t.
One
could also try to investigate the issue by scientific means unavailable
to Jevons, such as measuring dopamine levels or the degree of electri-
cal excitation of neurons related to pleasure and pain, happiness and
misery. If Alice’s and Bobs chemical and neural responses to a lollipop
are pretty much identical, as well as their behavioral responses (smil-
ing, making lip- smacking noises, and so on), it seems odd to insist
that, nevertheless, their subjective degrees of enjoyment differ by a
factor of a thousand or a million. Finally, one could use common cur-
rencies such as time (of which we all have, very roughly, the same
amount)—for example, by comparing lollipops and pinpricks against,
say, five minutes extra waiting time in the airport departure lounge.
I am far less pessimistic than Jevons and Arrow. I suspect that it is
indeed meaningful to compare utilities across individuals, that scales
may differ but typically not by very large factors, and that machines
can begin with reasonably broad prior beliefs about human preference
scales and learn more about the scales of individuals by observation
over time, perhaps correlating natural observations with the findings
of neuroscience research.
The second debate, about utility comparisons across populations
of different sizes, matters when decisions have an impact on who will
exist in the future. In the movie Avengers: Infinity War, for example,
the character Thanos develops and implements the theory that if there
were half as many people, everyone who remained would be more
than twice as happy. This is the kind of naïve calculation that gives
utilitarianism a bad name.
The same question, minus the Infinity Stones and the gargantuan
budget, was discussed in 1874 by the British philosopher Henry
Sidgwick in his famous treatise, The Methods of Ethics.
Sidgwick, in
apparent agreement with Thanos, concluded that the right choice was
to adjust the population size until the maximum total happiness was
reached. (Obviously, this does not mean increasing the population
without limit, because at some point everyone would be starving to
death and hence rather unhappy.) In 1984, the British philosopher
Derek Parfit took up the issue again in his groundbreaking work
Reasons and Persons.
Parfit argues that for any situation with a pop-
ulation of N very happy people, there is (according to utilitarian prin-
ciples) a preferable situation with 2N people who are ever so slightly
less happy. This seems highly plausible. Unfortunately, it’s also a slip-
pery slope. By repeating the process, we reach the so- called Repug-
nant Conclusion (usually capitalized thus, perhaps to emphasize its
Victorian roots): that the most desirable situation is one with a vast
population, all of whom have a life barely worth living.
As you can imagine, such a conclusion is controversial. Parfit himself
struggled for over thirty years to find a solution to his own conundrum,
without success. I suspect we are missing some fundamental axioms,
analogous to those for individually rational preferences, to handle
choices between populations of different sizes and happiness levels.
It is important that we solve this problem, because machines with
sufficient foresight may be able to consider courses of action leading to
different population sizes, just as the Chinese government did with its
one-child policy in 1979. It’s quite likely, for example, that we will be
asking AI systems for help in devising solutions for global climate
change, and those solutions may well involve policies that tend to
limit or even reduce population size.
On the other hand, if we decide
that larger populations really are better and if we give significant
weight to the well- being of potentially vast human populations centu-
ries from now, then we will need to work much harder on finding ways
to move beyond the confines of Earth. If the machine’s calculations
lead to the Repugnant Conclusion or to its opposite (a tiny popula-
tion of optimally happy people), we may have reason to regret our
lack of progress on the question.
Some philosophers have argued that we may need to make
decisions in a state of moral uncertainty— that is, uncertainty about the
appropriate moral theory to employ in making decisions.
One solu-
tion is to allocate some probability to each moral theory and make de-
cisions using an “expected moral value.” It’s not clear, however, that it
makes sense to ascribe probabilities to moral theories in the same way
one applies probabilities to tomorrow’s weather. (What’s the probabil-
ity that Thanos is exactly right?) And even if it does make sense, the
potentially vast differences between the recommendations of compet-
ing moral theories mean that resolving the moral uncertainty (working
out which moral theory avoids unacceptable consequences) has to
happen before we make such momentous decisions or entrust them to
machines.
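To make the arithmetic of an “expected moral value” concrete, here is a minimal sketch; the theories, the credences assigned to them, and the values each theory gives to two candidate actions are invented for illustration and are not taken from the text.

    # Illustrative only: credences over moral theories and the value each theory
    # assigns to an action are made-up numbers.
    credences = {"total_utilitarianism": 0.5,
                 "average_utilitarianism": 0.3,
                 "thanos_theory": 0.2}
    values = {
        "preserve_population": {"total_utilitarianism": 10,
                                "average_utilitarianism": 6,
                                "thanos_theory": -5},
        "halve_population": {"total_utilitarianism": -10,
                             "average_utilitarianism": 2,
                             "thanos_theory": 8},
    }

    def expected_moral_value(action):
        # Weight each theory's verdict by the probability assigned to the theory.
        return sum(p * values[action][theory] for theory, p in credences.items())

    for action in values:
        print(action, expected_moral_value(action))
    # The ranking of actions can flip as the credences shift, which is one reason
    # the moral uncertainty has to be resolved before such decisions are made.

Whether it even makes sense to assign such probabilities is, as noted above, exactly what is in dispute.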
Let’s be optimistic and suppose that Harriet eventually solves this
and other problems arising from the existence of more than one person
on Earth. Suitably altruistic and egalitarian algorithms are downloaded
into robots all over the world. Cue the high fives and happy- sounding
music. Then Harriet goes home....
ROBBIE: Welcome home! Long day?
HARRIET: Yes, worked really hard, not even time for lunch.
ROBBIE: So you must be quite hungry!
HARRIET: Starving! Can you make me some dinner?
ROBBIE: There’s something I need to tell you....
HARRIET: What? Don’t tell me the fridge is empty!
ROBBIE: No, there are humans in Somalia in more urgent need of help.
I am leaving now. Please make your own dinner.
While Harriet might be quite proud of Robbie and of her own
contributions towards making him such an upstanding and decent
machine, she cannot help but wonder why she shelled out a small for-
tune to buy a robot whose first significant act is to disappear. In prac-
tice, of course, no one would buy such a robot, so no such robots would
be built and there would be no benefit to humanity. Let’s call this the
Somalia problem. For the whole utilitarian- robot scheme to work, we
have to find a solution to this problem. Robbie will need to have some
amount of loyalty to Harriet in particular, perhaps an amount re-
lated to the amount Harriet paid for Robbie. Possibly, if society wants
Robbie to help people besides Harriet, society will need to compen-
sate Harriet for its claim on Robbie’s services. It’s quite likely that ro-
bots will coordinate with one another so that they don’t all descend on
Somalia at once, in which case Robbie might not need to go after all.
Or perhaps some completely new kinds of economic relationships will
emerge to handle the (certainly unprecedented) presence of billions of
purely altruistic agents in the world.
1LFH1DVW\DQG(QYLRXV+XPDQV
Human preferences go far beyond pleasure and pizza. They certainly
extend to the well- being of others. Even Adam Smith, the father of
economics who is often cited when a justification for selfishness is
required, began his first book by emphasizing the crucial importance
of concern for others:
How selfish soever man may be supposed, there are evidently
some principles in his nature, which interest him in the fortune of
others, and render their happiness necessary to him, though he
derives nothing from it except the pleasure of seeing it. Of this
kind is pity or compassion, the emotion which we feel for the mis-
ery of others, when we either see it, or are made to conceive it in a
very lively manner. That we often derive sorrow from the sorrow
of others, is a matter of fact too obvious to require any instances to
prove it.
In modern economic parlance, concern for others usually goes un-
der the heading of altruism.
The theory of altruism is fairly well
developed and has significant implications for tax policy among other
matters. Some economists, it must be said, treat altruism as another
form of selfishness designed to provide the giver with a “warm glow.”
This is certainly a possibility that robots need to be aware of as they
interpret human behavior, but for now let’s give humans the benefit of
the doubt and assume they do actually care.
The easiest way to think about altruism is to divide one’s prefer-
ences into two kinds: preferences for one’s own intrinsic well- being
and preferences concerning the well- being of others. (There is consid-
erable dispute about whether these can be neatly separated, but I’ll
put that dispute to one side.) Intrinsic well- being refers to qualities of
one’s own life, such as shelter, warmth, sustenance, safety, and so on,
that are desirable in themselves rather than by reference to qualities of
the lives of others.
To make this notion more concrete, let’s suppose that the world
contains two people, Alice and Bob. Alice’s overall utility is composed
of her own intrinsic well-being plus some factor C_AB times Bob’s in-
trinsic well-being. The caring factor C_AB indicates how much Alice
cares about Bob. Similarly, Bob’s overall utility is composed of his in-
trinsic well-being plus some caring factor C_BA times Alice’s intrinsic
well-being, where C_BA indicates how much Bob cares about Alice.
Robbie is trying to help both Alice and Bob, which means (let’s say)
maximizing the sum of their two utilities. Thus, Robbie needs to pay
attention not just to the individual well- being of each but also to how
much each cares about the well- being of the other.
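As a minimal sketch of the two-person caring model just described, under some added assumptions of my own (intrinsic well-being is taken to be the square root of the resources a person receives, and Robbie has ten units of resources to divide), here is how Robbie’s choice could be computed:

    import math

    # U_Alice = W_Alice + C_AB * W_Bob and U_Bob = W_Bob + C_BA * W_Alice,
    # where W is intrinsic well-being and C_AB, C_BA are the caring factors.
    # Assumption for illustration: W = sqrt(resources received).
    def utilities(r_alice, r_bob, c_ab, c_ba):
        w_alice, w_bob = math.sqrt(r_alice), math.sqrt(r_bob)
        return w_alice + c_ab * w_bob, w_bob + c_ba * w_alice

    def robbie_best_split(total, c_ab, c_ba, steps=1000):
        # Robbie maximizes the sum of the two overall utilities over ways of
        # splitting `total` units of resources between Alice and Bob.
        splits = [total * i / steps for i in range(steps + 1)]
        return max(splits, key=lambda r: sum(utilities(r, total - r, c_ab, c_ba)))

    # Selfish Alice (C_AB = 0) and nice Bob (C_BA = 0.5): the optimum gives
    # nice Bob the smaller share, roughly 3.1 units against Alice's 6.9.
    r_alice = robbie_best_split(10, c_ab=0.0, c_ba=0.5)
    print(round(r_alice, 2), round(10 - r_alice, 2))

This mirrors the point made below: the equilibrium typically leaves the nicer person with less intrinsic well-being, though possibly greater overall happiness.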
The signs of the caring factors C_AB and C_BA matter a lot. For exam-
ple, if C_AB is positive, Alice is “nice”: she derives some happiness from
Bob’s well-being. The more positive C_AB is, the more Alice is willing
to sacrifice some of her own well-being to help Bob. If C_AB is zero,
then Alice is completely selfish: if she can get away with it, she will
divert any amount of resources away from Bob and towards herself,
even if Bob is left destitute and starving. Faced with selfish Alice and
nice Bob, a utilitarian Robbie will obviously protect Bob from Alice’s
worst depredations. It’s interesting that the final equilibrium will typ-
ically leave Bob with less intrinsic well- being than Alice, but he may
have greater overall happiness because he cares about her well- being.
You might feel that Robbie’s decisions are grossly unfair if they leave
Bob with less well- being than Alice merely because he is nicer than
she is: Wouldn’t he resent the outcome and be unhappy?
Well, he
might, but that would be a different model, one that includes a term
for resentment over differences in well- being. In our simple model
Bob would be at peace with the outcome. Indeed, in the equilibrium
situation, he would resist any attempt to transfer resources from Alice
to himself, since that would reduce his overall happiness. If you think
this is completely unrealistic, consider the case where Alice is Bob’s
newborn daughter.
The really problematic case for Robbie to deal with is when C_AB is
negative: in that case, Alice is truly nasty. I’ll use the phrase negative
altruism to refer to such preferences. As with the sadistic Harriet
mentioned earlier, this is not about garden- variety greed and selfish-
ness, whereby Alice is content to reduce Bob’s share of the pie in order
to enhance her own. Negative altruism means that Alice derives hap-
piness purely from the reduced well- being of others, even if her own
intrinsic well- being is unchanged.
In his paper that introduced preference utilitarianism, Harsanyi
attributes negative altruism to “sadism, envy, resentment, and malice”
and argues that they should be ignored in calculating the sum total of
human utility in a population:
No amount of goodwill to individual X can impose the moral ob-
ligation on me to help him in hurting a third person, individual Y.
This seems to be one area in which it is reasonable for the designers of
intelligent machines to put a (cautious) thumb on the scales of justice,
so to speak.
Unfortunately, negative altruism is far more common than one
might expect. It arises not so much from sadism and malice
but from
envy and resentment and their converse emotion, which I will call
pride (for want of a better word). If Bob envies Alice, he derives un-
happiness from the difference
between Alice’s well- being and his own;
the greater the difference, the more unhappy he is. Conversely, if Al-
ice is proud of her superiority over Bob, she derives happiness not just
from her own intrinsic well- being but also from the fact that it is
higher than Bobs. It is easy to show that, in a mathematical sense,
pride and envy work in roughly the same way as sadism; they lead
Alice and Bob to derive happiness purely from reducing each other’s
well-being, because a reduction in Bob’s well-being increases Alice’s
pride, while a reduction in Alice’s well-being reduces Bob’s envy.
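In the same illustrative spirit as the caring-factor sketch above, pride and envy can be written as terms that depend on the difference in well-being; the coefficient 0.3 and the particular numbers below are arbitrary.

    # Pride: happiness from being better off than the other person.
    #   U_Alice = W_Alice + pride * (W_Alice - W_Bob)
    # Envy: unhappiness from being worse off than the other person.
    #   U_Bob = W_Bob - envy * (W_Alice - W_Bob)
    def utility_with_pride(w_self, w_other, pride=0.3):
        return w_self + pride * (w_self - w_other)

    def utility_with_envy(w_self, w_other, envy=0.3):
        return w_self - envy * (w_other - w_self)

    # Lowering the other person's well-being from 5 to 3 raises your own utility
    # from 5.0 to 5.6 even though your intrinsic well-being is unchanged, which
    # is why pride and envy behave, mathematically, like sadism.
    print(utility_with_pride(5, 5), utility_with_pride(5, 3))
    print(utility_with_envy(5, 5), utility_with_envy(5, 3))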
Jeffrey Sachs, the renowned development economist, once told me
a story that illustrated the power of these kinds of preferences in peo-
ple’s thinking. He was in Bangladesh soon after a major flood had
devastated one region of the country. He was speaking to a farmer
who had lost his house, his fields, all his animals, and one of his chil-
dren. “I’m so sorry— you must be terribly sad,” Sachs ventured. “Not
at all,” replied the farmer. “I’m pretty happy because my damned
neighbor has lost his wife and all his children too!”
The economic analysis of pride and envy, particularly in the con-
text of social status and conspicuous consumption, came to the fore
in the work of the American sociologist Thorstein Veblen, whose 1899
book, The Theory of the Leisure Class, explained the toxic consequences
of these attitudes.
In 1977, the British economist Fred Hirsch pub-
lished The Social Limits to Growth,
in which he introduced the idea
of positional goods. A positional good is anything—it could be a car, a
house, an Olympic medal, an education, an income, or an accent—
that derives its perceived value not just from its intrinsic benefits but
also from its relative properties, including the properties of scarcity
and being superior to someone else’s. The pursuit of positional goods,
driven by pride and envy, has the character of a zero- sum game, in the
sense that Alice cannot improve her relative position without worsen-
ing the relative position of Bob, and vice versa. (This doesn’t seem to
prevent vast sums being squandered in this pursuit.) Positional goods
seem to be ubiquitous in modern life, so machines will need to under-
stand their overall importance in the preferences of individuals. More-
over, social identity theorists propose that membership and standing
within a group and the overall status of the group relative to other
groups are essential constituents of human self- esteem.
Thus, it is
difficult to understand human behavior without understanding how
individuals perceive themselves as members of groups, whether those
groups are species, nations, ethnic groups, political parties, profes-
sions, families, or supporters of a particular football team.
As with sadism and malice, we might propose that Robbie should
give little or no weight to pride and envy in his plans for helping Alice
and Bob. There are some difficulties with this proposal, however. Be-
cause pride and envy counteract caring in Alice’s attitude to Bob’s
well- being, it may not be easy to tease them apart. It may be that Alice
cares a lot, but also suffers from envy; it is hard to distinguish this
Alice from a different Alice who cares only a little bit but has no envy
at all. Moreover, given the prevalence of pride and envy in human
preferences, it’s essential to consider very carefully the ramifications
of ignoring them. It might be that they are essential for self- esteem,
especially in their positive forms, self-respect and admiration for
others.
Let me reemphasize a point made earlier: suitably designed ma-
chines will not behave like those they observe, even if those machines
are learning about the preferences of sadistic demons. It’s possible, in
fact, that if we humans find ourselves in the unfamiliar situation of
dealing with purely altruistic entities on a daily basis, we may learn to
be better people ourselves, more altruistic and less driven by pride
and envy.
Stupid, Emotional Humans
The title of this section is not meant to refer to some particular subset
of humans. It refers to all of us. We are all incredibly stupid compared
to the unreachable standard set by perfect rationality, and we are all
subject to the ebb and flow of the varied emotions that, to a large ex-
tent, govern our behavior.
Lets begin with stupidity. A perfectly rational entity maximizes
the expected satisfaction of its preferences over all possible future
lives it could choose to lead. I cannot begin to write down a number
that describes the complexity of this decision problem, but I find the
following thought experiment helpful. First, note that the number of
motor control choices that a human makes in a lifetime is about twenty
trillion. (See Appendix A for the detailed calculations.) Next, let’s see
how far brute force will get us with the aid of Seth Lloyd’s ultimate-
physics laptop, which is one billion trillion trillion times faster than
the world’s fastest computer. We’ll give it the task of enumerating all
possible sequences of English words (perhaps as a warmup for Jorge
Luis Borges’s Library of Babel), and we’ll let it run for a year. How long
are the sequences that it can enumerate in that time? A thousand
pages of text? A million pages? No. Eleven words. This tells you some-
thing about the difficulty of designing the best possible life of twenty
trillion actions. In short, we are much further from being rational than
a slug is from overtaking the starship Enterprise traveling at warp nine.
We have absolutely no idea what a rationally chosen life would be like.
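For a rough sense of where the eleven-word figure comes from, here is a back-of-the-envelope check. The speed assumed for today’s fastest computer (10^17 operations per second) and the size of the English vocabulary (10^5 words) are my own ballpark assumptions, not figures taken from the text or its appendix.

    # Back-of-the-envelope check of the "eleven words" claim.
    fastest_computer_ops = 1e17               # assumed operations per second today
    laptop_ops = fastest_computer_ops * 1e33  # "one billion trillion trillion times faster"
    seconds_per_year = 3.15e7
    sequences_per_year = laptop_ops * seconds_per_year   # about 3e57

    vocabulary = 1e5                          # assumed number of English words
    n = 1
    while vocabulary ** n <= sequences_per_year:   # vocabulary**n sequences of n words
        n += 1
    print(n - 1)   # longest length fully enumerable in a year: 11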
The implication of this is that humans will often act in ways that
are contrary to their own preferences. For example, when Lee Sedol
lost his Go match to AlphaGo, he played one or more moves that guar-
anteed he would lose, and AlphaGo could (in some cases at least) de-
tect that he had done this. It would be incorrect, however, for AlphaGo
to infer that Lee Sedol has a preference for losing. Instead, it would be
reasonable to infer that Lee Sedol has a preference for winning but has
some computational limitations that prevent him from choosing the
right move in all cases. Thus, in order to understand Lee Sedol’s be-
havior and learn about his preferences, a robot following the third
principle (“the ultimate source of information about human prefer-
ences is human behavior”) has to understand something about the
cognitive processes that generate his behavior. It cannot assume he is
rational.
This gives the AI, cognitive science, psychology, and neuroscience
communities a very serious research problem: to understand enough
about human cognition
that we (or rather, our beneficial machines)
can “reverse-engineer” human behavior to get at the deep underlying
preferences, to the extent that they exist. Humans manage to do some
of this, learning their values from others with a little bit of guidance
from biology, so it seems possible. Humans have an advantage: they
can use their own cognitive architecture to simulate that of other hu-
mans, without knowing what that architecture is: If I wanted X, I’d
do just the same thing as Mum does, so Mum must want X.
Machines do not have this advantage. They can simulate other ma-
chines easily, but not people. It’s unlikely that they will soon have ac-
cess to a complete model of human cognition, whether generic or
tailored to specific individuals. Instead, it makes sense from a practi-
cal point of view to look at the major ways in which humans deviate
from rationality and to study how to learn preferences from behavior
that exhibits such deviations.
One obvious difference between humans and rational entities is
that, at any given moment, we are not choosing among all possible
first steps of all possible future lives. Not even close. Instead, we are
typically embedded in a deeply nested hierarchy of “subroutines.”
Generally speaking, we are pursuing near- term goals rather than max-
imizing preferences over future lives, and we can act only according to
the constraints of the subroutine were in at present. Right now, for
example, I’m typing this sentence: I can choose how to continue after
the colon, but it never occurs to me to wonder if I should stop writing
the sentence and take an online rap course or burn down the house
and claim the insurance or any other of a gazillion things I could do
next. Many of these other things might actually be better than what
I’m doing, but, given my hierarchy of commitments, it’s as if those
other things didn’t exist.
Understanding human action, then, seems to require understand-
ing this subroutine hierarchy (which may be quite individual): which
subroutine the person is executing at present, which near- term objec-
tives are being pursued within this subroutine, and how they relate to
deeper, long- term preferences. More generally, learning about human
preferences seems to require learning about the actual structure of
human lives. What are all the things that we humans can be engaged
in, either singly or jointly? What activities are characteristic of differ-
ent cultures and types of individuals? These are tremendously inter-
esting and demanding research questions. Obviously, they do not have
a fixed answer because we humans are adding new activities and be-
havioral structures to our repertoires all the time. But even partial and
provisional answers would be very useful for all kinds of intelligent
systems designed to help humans in their daily lives.
Another obvious property of human actions is that they are often
driven by emotion. In some cases, this is a good thing— emotions such
as love and gratitude are of course partially constitutive of our prefer-
ences, and actions guided by them can be rational even if not fully de-
liberated. In other cases, emotional responses lead to actions that even
we stupid humans recognize as less than rational— after the fact, of
course. For example, an angry and frustrated Harriet who slaps a re-
calcitrant ten- year- old Alice may regret the action immediately. Rob-
bie, observing the action, should (typically, although not in all cases)
attribute the action to anger and frustration and a lack of self- control
rather than deliberate sadism for its own sake. For this to work, Rob-
bie has to have some understanding of human emotional states, in-
cluding their causes, how they evolve over time in response to external
stimuli, and the effects they have on action. Neuroscientists are
beginning to get a handle on the mechanics of some emotional states
and their connections to other cognitive processes,
and there is some
useful work on computational methods for detecting, predicting, and
manipulating human emotional states,
but there is much more to be
learned. Again, machines are at a disadvantage when it comes to emo-
tions: they cannot generate an internal simulation of an experience to
see what emotional state it would engender.
As well as affecting our actions, emotions reveal useful informa-
tion about our underlying preferences. For example, little Alice may
be refusing to do her homework, and Harriet is angry and frustrated
because she really wants Alice to do well in school and have a better
chance in life than Harriet herself did. If Robbie is equipped to under-
stand this— even if he cannot experience it himself he may learn a
great deal from Harriet’s less- than- rational actions. It ought to be pos-
sible, then, to create rudimentary models of human emotional states
that suffice to avoid the most egregious errors in inferring human
preferences from behavior.
Do Humans Really Have Preferences?
The entire premise of this book is that there are futures that we would
like and futures we would prefer to avoid, such as near- term extinc-
tion or being turned into human battery farms à la The Matrix. In this
sense, yes, of course humans have preferences. Once we get into the
details of how humans would prefer their lives to play out, however,
things become much murkier.
Uncertainty and error
One obvious property of humans, if you think about it, is that they
don’t always know what they want. For example, the durian fruit
elicits different responses from different people: some find that “it
surpasses in flavour all other fruits of the world”
while others liken
it to “sewage, stale vomit, skunk spray and used surgical swabs.”
I
have deliberately refrained from trying durian prior to publication, so
that I can maintain neutrality on this point: I simply don’t know which
camp I will be in. The same might be said for many people considering
future careers, future life partners, future post- retirement activities,
and so on.
There are at least two kinds of preference uncertainty. The first is
real, epistemic uncertainty, such as I experience about my durian pref-
erence.
No amount of thought is going to resolve this uncertainty.
There is an empirical fact of the matter, and I can find out more by
trying some durian, by comparing my DNA with that of durian lovers
and haters, and so on. The second arises from computational limita-
tions: looking at two Go positions, I am not sure which I prefer because
the ramifications of each are beyond my ability to resolve completely.
Uncertainty also arises from the fact that the choices we are pre-
sented with are usually incompletely specified— sometimes so incom-
pletely that they barely qualify as choices at all. When Alice is about
to graduate from high school, a career counselor might offer her a
choice between “librarian” and “coal miner”; she may, quite reason-
ably, say, “I’m uncertain about which I prefer.” Here, the uncertainty
comes from epistemic uncertainty about her own preferences (for,
say, coal dust versus book dust); from computational uncertainty as
she struggles to work out how she might make the best of each career
choice; and from ordinary uncertainty about the world, such as her
doubts about the long- term viability of her local coal mine.
For these reasons, its a bad idea to identify human preferences
with simple choices between incompletely described options that are
intractable to evaluate and include elements of unknown desirability.
Such choices provide indirect evidence of underlying preferences, but
they are not constitutive of those preferences. That’s why I have
couched the notion of preferences in terms of future lives—for exam-
ple by imagining that you could experience, in a compressed form,
two different movies of your future life and then express a preference
between them (see page 26). The thought experiment is of course
impossible to carry out in practice, but one can imagine that in many
cases a clear preference would emerge long before all the details of
each movie had been filled in and fully experienced. You may not
know in advance which you will prefer, even given a plot summary;
but there is an answer to the actual question, based on who you are
now, just as there is an answer to the question of whether you will like
durian when you try it.
The fact that you might be uncertain about your own preferences
does not cause any particular problems for the preference- based ap-
proach to provably beneficial AI. Indeed, there are already some algo-
rithms that take into account both Robbie’s and Harriet’s uncertainty
about Harriet’s preferences and allow for the possibility that Harriet
may be learning about her preferences while Robbie is.
Just as
Robbie’s uncertainty about Harriets preferences can be reduced by
observing Harriet’s behavior, Harriets uncertainty about her own
preferences can be reduced by observing her own reactions to experi-
ences. The two kinds of uncertainty need not be directly related; nor
is Robbie necessarily more uncertain than Harriet about Harriets
preferences. For example, Robbie might be able to detect that Harriet
has a strong genetic predisposition to despise the flavor of durian. In
that case, he would have very little uncertainty about her durian pref-
erence, even while she remains completely in the dark.
If Harriet can be uncertain about her preferences over future
events, then, quite probably, she can also be wrong. For example, she
might be convinced that she will not like durian (or, say, green eggs
and ham) and so she avoids it at all costs, but it may turn out, if some-
one slips some into her fruit salad one day, that she finds it sublime
after all. Thus, Robbie cannot assume that Harriets actions reflect
accurate knowledge of her own preferences: some may be thor-
oughly grounded in experience, while others may be based primarily
on supposition, prejudice, fear of the unknown, or weakly supported
generalizations.
A suitably tactful Robbie could be very helpful to
Harriet in alerting her to such situations.
Experience and memory
Some psychologists have called into question the very notion that
there is one self whose preferences are sovereign in the way that
Harsanyi’s principle of preference autonomy suggests. Most promi-
nent among these psychologists is my former Berkeley colleague Dan-
iel Kahneman. Kahneman, who won the 2002 Nobel Prize for his
work in behavioral economics, is one of the most influential thinkers
on the topic of human preferences. His recent book, Thinking, Fast
and Slow,
recounts in some detail a series of experiments that con-
vinced him that there are two selves, the experiencing self and the
remembering self, whose preferences are in conflict.
The experiencing self is the one being measured by the hedonimeter,
which the nineteenth-century British economist Francis Edge-
worth imagined to be “an ideally perfect instrument, a psychophysical
machine, continually registering the height of pleasure experienced by
an individual, exactly according to the verdict of consciousness.”
Ac-
cording to hedonic utilitarianism, the overall value of any experience
to an individual is simply the sum of the hedonic values of each instant
during the experience. This notion applies equally well to eating an
ice cream or living an entire life.
The remembering self, on the other hand, is the one who is “in
charge” when there is any decision to be made. This self chooses new
experiences based on memories of previous experiences and their de-
sirability. Kahneman’s experiments suggest that the remembering self
has very different ideas from the experiencing self.
The simplest experiment to understand involves plunging a sub-
ject’s hand into cold water. There are two different regimes: in the first,
the immersion is for 60 seconds in water at 14 degrees Celsius; in the
second, the immersion is for 60 seconds in water at 14 degrees followed
by 30 seconds at 15 degrees. (These temperatures are similar to ocean
temperatures in Northern California, cold enough that almost every-
one wears a wetsuit in the water.) All subjects report the experience as
unpleasant. After experiencing both regimes (in either order, with a
7- minute gap in between), the subject is asked to choose which one
they would like to repeat. The great majority of subjects prefer to re-
peat the 60 + 30 rather than just the 60- second immersion.
Kahneman posits that, from the point of view of the experienc-
ing self, 60 + 30 has to be strictly worse than 60, because it includes 60
and another unpleasant experience. Yet the remembering self chooses
60 + 30. Why?
Kahneman’s explanation is that the remembering self looks back
with rather weirdly tinted spectacles, paying attention mainly to the
“peak” value (the highest or lowest hedonic value) and the “end” value
(the hedonic value at the end of the experience). The durations of
different parts of the experience are mostly neglected. The peak dis-
comfort levels for 60 and 60 + 30 are the same, but the end levels are
different: in the 60 + 30 case, the water is one degree warmer. If the
remembering self evaluates experiences by the peak and end values,
rather than by summing up hedonic values over time, then 60 + 30 is
better, and this is what is found. The peak- end model seems to explain
many other equally weird findings in the literature on preferences.
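A small sketch of the two evaluation rules being contrasted, applied to the cold-water experiment. The per-second discomfort scores are invented (more negative means more unpleasant), and the peak-end rule is implemented here in its common form, as the average of the worst moment and the final moment.

    # Invented hedonic scores: 14-degree water is rated -10 per second,
    # the slightly warmer 15-degree water is rated -9 per second.
    short_trial = [-10] * 60              # 60 seconds at 14 degrees
    long_trial = [-10] * 60 + [-9] * 30   # the same 60 seconds, plus 30 at 15 degrees

    def experienced_total(trial):
        # The experiencing self: sum hedonic value over every instant.
        return sum(trial)

    def peak_end(trial):
        # The remembering self: average the worst moment and the last moment,
        # ignoring duration.
        return (min(trial) + trial[-1]) / 2

    print(experienced_total(short_trial), experienced_total(long_trial))  # -600 vs -870
    print(peak_end(short_trial), peak_end(long_trial))                    # -10.0 vs -9.5
    # Summed over instants, the longer trial is strictly worse; by peak-end it is
    # better, matching the subjects' preference for repeating the 60 + 30 version.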
Kahneman seems (perhaps appropriately) to be of two minds
about his findings. He asserts that the remembering self “simply made
a mistake” and chose the wrong experience because its memory is
faulty and incomplete; he regards this as “bad news for believers in the
rationality of choice.” On the other hand, he writes, “A theory of well-
being that ignores what people want cannot be sustained.” Suppose,
for example, that Harriet has tried Pepsi and Coke and now strongly
prefers Pepsi; it would be absurd to force her to drink Coke based on
adding up secret hedonimeter readings taken during each trial.
The fact is that no law requires our preferences between experi-
ences to be defined by the sum of hedonic values over instants of time.
It is true that standard mathematical models focus on maximizing a
sum of rewards,
but the original motivation for this was mathemati-
cal convenience. Justifications came later in the form of technical as-
sumptions under which it is rational to decide based on adding up
rewards,
but those technical assumptions need not hold in reality.
Suppose, for example, that Harriet is choosing between two sequences
of hedonic values: [10,10,10,10,10] and [0,0,40,0,0]. It’s entirely pos-
sible that she just prefers the second sequence; no mathematical law
can force her to make choices based on the sum rather than, say, the
maximum.
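Restating that arithmetic directly (nothing here beyond the numbers already in the text):

    flat = [10, 10, 10, 10, 10]
    spike = [0, 0, 40, 0, 0]
    print(sum(flat), sum(spike))   # 50 versus 40: summing favors the flat sequence
    print(max(flat), max(spike))   # 10 versus 40: the maximum favors the spike
    # No mathematical law forces Harriet to rank the sequences by their sums.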
Kahneman acknowledges that the situation is complicated still fur-
ther by the crucial role of anticipation and memory in well- being. The
memory of a single, delightful experience (one’s wedding day, the
birth of a child, an afternoon spent picking blackberries and making
jam) can carry one through years of drudgery and disappointment.
Perhaps the remembering self is evaluating not just the experience
per se but its total effect on life’s future value through its effect on fu-
ture
memories. And presumably it’s the remembering self and not the
experiencing self that is the best judge of what will be remembered.
Time and change
It goes almost without saying that sensible people in the twenty-
first century would not want to emulate the preferences of, say, Ro-
man society in the second century, replete with gladiatorial slaughter
for public entertainment, an economy based on slavery, and brutal
massacres of defeated peoples. (We need not dwell on the obvious
parallels to these characteristics in modern society.) Standards of mo-
rality clearly evolve over time as our civilization progresses (or drifts,
if you prefer). This suggests, in turn, that future generations might find
utterly repulsive our current attitudes to, say, the well- being of ani-
mals. For this reason, it is important that machines charged with im-
plementing human preferences be able to respond to changes in those
preferences over time rather than fixing them in stone. The three
principles from Chapter 7 accommodate such changes in a natural
way, because they require machines to learn and implement the cur-
rent preferences of current humans (lots of them, all different)
rather than a single idealized set of preferences or the preferences of
machine designers who may be long dead.
The possibility of changes in the typical preferences of human
populations over historical time naturally focuses attention on the
question of how each individual’s preferences are formed and the plas-
ticity of adult preferences. Our preferences are certainly influenced
by our biology: we usually avoid pain, hunger, and thirst, for example.
Our biology has remained fairly constant, however, so the remaining
preferences must arise from cultural and family influences. Quite
possibly, children are constantly running some form of inverse re-
inforcement learning to identify the preferences of parents and peers
in order to explain their behavior; children then adopt these prefer-
ences as their own. Even as adults, our preferences evolve through the
influence of the media, government, friends, employers, and our own
direct experiences. It may be the case, for example, that many sup-
porters of the Third Reich did not start out as genocidal sadists thirst-
ing for racial purity.
Preference change presents a challenge for theories of rationality at
both the individual and societal level. For example, Harsanyi’s princi-
ple of preference autonomy seems to say that everyone is entitled to
whatever preferences they have and no one else should touch them.
Far from being untouchable, however, preferences are touched and
modified all the time, by every experience a person has. Machines
cannot help but modify human preferences, because machines modify
human experiences.
It’s important, although sometimes difficult, to separate preference
change from preference update, which occurs when an initially uncer-
tain Harriet learns more about her own preferences through experi-
ence. Preference update can fill in gaps in self- knowledge and perhaps
add definiteness to preferences that were previously weakly held and
provisional. Preference change, on the other hand, is not a process that
results from additional evidence about what one’s preferences actually
are. In the extreme case, you can imagine it as resulting from drug
administration or even brain surgery— it occurs from processes we
may not understand or agree with.
Preference change is problematic for at least two reasons. The
first reason is that it’s not clear which preferences should hold sway
when making a decision: the preferences that Harriet has at the time
of the decision or the preferences that she will have during and after
the events that result from her decision. In bioethics, for example, this
is a very real dilemma because people’s preferences about medical
interventions and end- of- life care do change, often dramatically, after
they become seriously ill.
Assuming these changes do not result
from diminished intellectual capacity, whose preferences should be
respected?
The second reason that preference change is problematic is that
there seems to be no obvious rational basis for changing (as opposed to
updating) one’s preferences. If Harriet prefers A to B, but could choose
to undergo an experience that she knows will result in her preferring
B to A, why would she ever do that? The outcome would be that she
would then choose B, which she currently does not want.
The issue of preference change appears in dramatic form in the
legend of Ulysses and the Sirens. The Sirens were mythical beings
whose singing lured sailors to their doom on the rocks of certain is-
lands in the Mediterranean. Ulysses, wishing to hear the song, ordered
his sailors to plug their ears with wax and to bind him to the mast;
under no circumstances were they to obey his subsequent entreaties
to release him. Obviously, he wanted the sailors to respect the pref-
erences he had initially, not the preferences he would have after the
Sirens bewitch him. This legend became the title of a book by the
Norwegian philosopher Jon Elster,
dealing with weakness of will and
other challenges to the theoretical idea of rationality.
Why might an intelligent machine deliberately set out to modify
the preferences of humans? The answer is quite simple: to make the
preferences easier to satisfy. We saw this in Chapter 1 with the case of
social- media click- through optimization. One response might be to
say that machines must treat human preferences as sacrosanct: noth-
ing can be allowed to change the human’s preferences. Unfortunately,
this is completely impossible. The very existence of a useful robot aide
is likely to have an effect on human preferences.
One possible solution is for machines to learn about human meta-
preferences—that is, preferences about what kinds of preference change
processes might be acceptable or unacceptable. Notice the use of
“preference change processes” rather than “preference changes” here.
That’s because wanting one’s preferences to change in a specific direc-
tion often amounts to having that preference already; what’s really
wanted in such a case is the ability to be better at implementing the
preference. For example, if Harriet says, “I want my preferences to
change so that I don’t want cake as much as I do now,” then she already
has a preference for a future with less cake consumption; what she
really wants is to alter her cognitive architecture so that her behavior
more closely reflects that preference.
By “preferences about what kinds of preference change processes
might be acceptable or unacceptable,” I mean, for example, a view that
one may end up with “better” preferences by traveling the world and
experiencing a wide variety of cultures, or by participating in a vibrant
intellectual community that thoroughly explores a wide range of
moral traditions, or by setting aside some hermit time for introspec-
tion and hard thinking about life and its meaning. I’ll call these pro-
cesses preference-neutral, in the sense that one does not anticipate
that the process will change one’s preferences in any particular direc-
tion, while recognizing that some may strongly disagree with that
characterization.
Of course, not all preference-neutral processes are desirable;
for example, few people expect to develop “better” preferences by
whacking themselves on the head. Subjecting oneself to an acceptable
process of preference change is analogous to running an experiment to
find out something about how the world works: you never know in ad-
vance how the experiment will turn out, but you expect, nonetheless,
to be better off in your new mental state.
The idea that there are acceptable routes to preference modifica-
tion seems related to the idea that there are acceptable methods of
behavior modification whereby, for example, an employer engineers
the choice situation so that people make “better” choices about saving
for retirement. Often this can be done by manipulating the “ non-
rational” factors that influence choice, rather than by restricting
choices or taxing “bad” choices. Nudge, a book by economist Richard
Thaler and legal scholar Cass Sunstein, lays out a wide range of sup-
posedly acceptable methods and opportunities to “influence people’s
behavior in order to make their lives longer, healthier, and better.”
It’s unclear whether behavior modification methods are really just
modifying behavior. If, when the nudge is removed, the modified be-
havior persists, which is presumably the desired outcome of such
interventions, then something has changed in the individual’s cogni-
tive architecture (the thing that turns underlying preferences into be-
havior) or in the individual’s underlying preferences. It’s quite likely to
be a bit of both. What is clear, however, is that the nudge strategy is
assuming that everyone shares a preference for “longer, healthier, and
better” lives; each nudge is based on a particular definition of a “bet-
ter” life, which seems to go against the grain of preference autonomy.
It might be better, instead, to design preference- neutral assistive pro-
cesses that help people bring their decisions and their cognitive archi-
tectures into better alignment with their underlying preferences. For
example, it’s possible to design cognitive aides that highlight the
longer- term consequences of decisions and teach people to recognize
the seeds of those consequences in the present.
That we need a better understanding of the processes whereby
human preferences are formed and shaped seems obvious, not least
because such an understanding would help us design machines that
avoid accidental and undesirable changes in human preferences of the
kind wrought by social- media content selection algorithms. Armed
with such an understanding, of course, we will be tempted to engi-
neer changes that would result in a “better” world.
Some might argue that we should provide much greater opportu-
nities for preference- neutral “improving” experiences such as travel,
debate, and training in analytical and critical thinking. We might, for
example, provide opportunities for every high- school student to live
for a few months in at least two other cultures distinct from his or
her own.
Almost certainly, however, we will want to go further, for exam-
ple, by instituting social and educational reforms that increase the co-
efficient of altruism (the weight that each individual places on the
welfare of others) while decreasing the coefficients of sadism, pride,
and envy. Would this be a good idea? Should we recruit our machines
to help in the process? It’s certainly tempting. Indeed, Aristotle him-
self wrote, “The main concern of politics is to engender a certain char-
acter in the citizens and to make them good and disposed to perform
noble actions.” Let’s just say that there are risks associated with inten-
tional preference engineering on a global scale. We should proceed
with extreme caution.
10
PROBLEM SOLVED?
If we succeed in creating provably beneficial AI systems, we would
eliminate the risk that we might lose control over superintelligent
machines. Humanity could proceed with their development and
reap the almost unimaginable benefits that would flow from the
ability to wield far greater intelligence in advancing our civilization.
We would be released from millennia of servitude as agricultural,
industrial, and clerical robots, and we would be free to make the
best of life’s potential. From the vantage point of this golden age, we
would look back on our lives in the present time much as Thomas
Hobbes imagined life without government: solitary, poor, nasty, brut-
ish, and short.
Or perhaps not. Bondian villains may circumvent our safeguards
and unleash uncontrollable superintelligences against which human-
ity has no defense. And if we survive that, we may find ourselves
gradually enfeebled as we entrust more and more of our knowledge
and skills to machines. The machines may advise us not to do this,
understanding the long- term value of human autonomy, but we may
overrule them.
Beneficial Machines
The standard model underlying a good deal of twentieth- century
technology relies on machinery that optimizes a fixed, exogenously
supplied objective. As we have seen, this model is fundamentally
flawed. It works only if the objective is guaranteed to be complete and
correct, or if the machinery can easily be reset. Neither condition will
hold as AI becomes increasingly powerful.
If the exogenously supplied objective can be wrong, then it makes
no sense for the machine to act as if it is always correct. Hence my
proposal for beneficial machines: machines whose actions can be ex-
pected to achieve our objectives. Because these objectives are in us,
and not in them, the machines will need to learn more about what we
really want from observations of the choices we make and how we
make them. Machines designed in this way will defer to humans: they
will ask permission; they will act cautiously when guidance is unclear;
and they will allow themselves to be switched off.
While these initial results are for a simplified and idealized set-
ting, I believe they will survive the transition to more realistic set-
tings. Already, my colleagues have successfully applied the same
approach to practical problems such as self- driving cars interacting
with human drivers.¹ For example, self- driving cars are notoriously
bad at handling four- way stop signs when it’s not clear who has the
right of way. By formulating this as an assistance game, however, the
car comes up with a novel solution: it actually backs up a little bit to
show that it’s definitely not planning to go first. The human under-
stands this signal and goes ahead, confident that there will be no col-
lision. Obviously, we human experts could have thought of this
solution and programmed it into the vehicle, but that’s not what hap-
pened; this is a form of communication that the vehicle invented en-
tirely by itself.
As we gain more experience in other settings, I expect that we will
be surprised by the range and fluency of machine behaviors as they
interact with humans. We are so used to the stupidity of machines
that execute inflexible, preprogrammed behaviors or pursue definite
but incorrect objectives that we may be shocked by how sensible they
become. The technology of provably beneficial machines is the core of
a new approach to AI and the basis for a new relationship between
humans and machines.
It seems possible, also, to apply similar ideas to the redesign of
other “machines” that ought to be serving humans, beginning with
ordinary software systems. We are taught to build software by com-
posing subroutines, each of which has a well- defined specification that
says what the output should be for any given input— just like the
square- root button on a calculator. This specification is the direct an-
alog of the objective given to an AI system. The subroutine is not
supposed to terminate and return control to the higher layers of the
software system until it has produced an output that meets the speci-
fication. (This should remind you of the AI system that persists in its
single- minded pursuit of its given objective.) A better approach would
be to allow for uncertainty in the specification. For example, a subrou-
tine that carries out some fearsomely complicated mathematical com-
putation is typically given an error bound that defines the required
precision for the answer and has to return a solution that is correct
within that error bound. Sometimes, this may require weeks of com-
putation. Instead, it might be better to be less precise about the al-
lowed error, so that the subroutine could come back after twenty
seconds and say, “I’ve found a solution that’s this good. Is that OK or
do you want me to continue?” In some cases, the question may perco-
late all the way to the top level of the software system, so that the
human user can provide further guidance to the system. The humans
answers would then help in refining the specifications at all levels.
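To make this concrete, here is a minimal Python sketch of a subroutine with a loosened specification. The function name, the Newton-iteration example, and the accept-or-continue callback are illustrative assumptions, not a description of any existing system.

# A sketch of a subroutine with an uncertain specification: instead of
# grinding on until a fixed error bound is met, it reports how good its
# current answer is and asks whether that is good enough.
def sqrt_anytime(x, good_enough, max_iters=100):
    """Estimate sqrt(x) by Newton's method, checking in with the caller."""
    estimate = x if x > 1 else 1.0
    for _ in range(max_iters):
        estimate = 0.5 * (estimate + x / estimate)   # one refinement step
        error = abs(estimate * estimate - x)         # current error bound
        # Percolate the question upwards: is this precision acceptable?
        if good_enough(estimate, error):
            break
    return estimate

# Example caller: accept any answer whose squared error is below 1e-6.
print(sqrt_anytime(2.0, lambda est, err: err < 1e-6))   # about 1.4142135

The essential change is that the caller, not a fixed specification, decides when the answer is good enough.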
The same kind of thinking can be applied to entities such as gov-
ernments and corporations. The obvious failings of government in-
clude paying too much attention to the preferences (financial as well
as political) of those in government and too little attention to the pref-
erences of the governed. Elections are supposed to communicate pref-
erences to the government, but they seem to have a remarkably small
bandwidth (on the order of one byte of information every few years)
for such a complex task. In far too many countries, government is sim-
ply a means for one group of people to impose its will on others. Cor-
porations go to greater lengths to learn the preferences of customers,
whether through market research or direct feedback in the form of
purchase decisions. On the other hand, the molding of human prefer-
ences through advertising, cultural influences, and even chemical ad-
diction is an accepted way of doing business.
Governance of AI
AI has the power to reshape the world, and the process of reshaping
will have to be managed and guided in some way. If the sheer number
of initiatives to develop effective governance of AI is any guide, then
we are in excellent shape. Everyone and their uncle is setting up a
Board or a Council or an International Panel. The World Economic
Forum has identified nearly three hundred separate efforts to develop
ethical principles for AI. My email inbox can be summarized as one
long invitation to the Global World Summit Conference Forum on the
Future of International Governance of the Social and Ethical Impacts
of Emerging Artificial Intelligence Technologies.
This is all very different from what happened with nuclear tech-
nology. After World War II, the United States held all the nuclear
cards. In 1953, US president Dwight Eisenhower proposed to the UN
an international body to regulate nuclear technology. In 1957, the In-
ternational Atomic Energy Agency started work; it is the sole global
overseer for the safe and beneficial development of nuclear energy.
In contrast, many hands hold AI cards. To be sure, the United
States, China, and the EU fund a lot of AI research, but almost all of
it occurs outside secure national laboratories. AI researchers in univer-
sities are part of a broad, cooperative international community, glued
together by shared interests, conferences, cooperative agreements, and
professional societies such as AAAI (the Association for the Advance-
ment of Artificial Intelligence) and IEEE (the Institute of Electrical
and Electronics Engineers, which includes tens of thousands of AI re-
searchers and practitioners). Probably the majority of investment in AI
research and development is now occurring within corporations, large
and small; the leading players as of 2019 are Google (including Deep-
Mind), Facebook, Amazon, Microsoft, and IBM in the United States
and Tencent, Baidu, and, to some extent, Alibaba in China— all among
the largest corporations in the world.² All but Tencent and Alibaba are
members of the Partnership on AI, an industry consortium that in-
cludes among its tenets a promise of cooperation on AI safety. Finally,
although the vast majority of humans possess little in the way of AI
expertise, there is at least a superficial willingness among other play-
ers to take the interests of humanity into account.
These, then, are the players who hold the majority of the cards.
Their interests are not in perfect alignment but all share a desire to
maintain control over AI systems as they become more powerful.
(Other goals, such as avoiding mass unemployment, are shared by gov-
ernments and university researchers, but not necessarily by corpora-
tions that expect to profit in the short term from the widest possible
deployment of AI.) To cement this shared interest and achieve coordi-
nated action, there are organizations with convening power, which
means, roughly, that if the organization sets up a meeting, people ac-
cept the invitation to participate. In addition to the professional soci-
eties, which can bring AI researchers together, and the Partnership
on AI, which combines corporations and nonprofit institutes, the ca-
nonical conveners are the UN (for governments and researchers) and
the World Economic Forum (for governments and corporations). In
addition, the G7 has proposed an International Panel on Artificial
Intelligence, hoping that it will grow into something like the UN’s
Intergovernmental Panel on Climate Change. Important- sounding re-
ports are multiplying like rabbits.
With all this activity, is there any prospect of actual progress on
governance occurring? Perhaps surprisingly, the answer is yes, at least
around the edges. Many governments around the world are equipping
themselves with advisory bodies to help with the process of develop-
ing regulations; perhaps the most prominent example is the EU’s
High- Level Expert Group on Artificial Intelligence. Agreements,
rules, and standards are beginning to emerge for issues such as user
privacy, data exchange, and avoiding racial bias. Governments and
corporations are working hard to sort out the rules for self- driving
cars, rules that will inevitably have cross- border elements. There is a
consensus that AI decisions must be explainable if AI systems are to
be trusted, and that consensus is already partially implemented in the
EU’s GDPR legislation. In California, a new law forbids AI systems to
impersonate humans in certain circumstances. These last two items,
explainability and impersonation, certainly have some bearing on
issues of AI safety and control.
At present, there are no implementable recommendations that can
be made to governments or other organizations considering the issue
of maintaining control over AI systems. A regulation such as “AI sys-
tems must be safe and controllable” would carry no weight, because
these terms do not yet have precise meanings and because there is no
widely known engineering methodology for ensuring safety and con-
trollability. But let’s be optimistic and imagine that, a few years down
the line, the validity of the “provably beneficial” approach to AI has
been established through both mathematical analysis and practical re-
alization in the form of useful applications. We might, for example,
have personal digital assistants that we can trust to use our credit
cards, screen our calls and emails, and manage our finances because
they have adapted to our individual preferences and know when it’s
OK to go ahead and when it’s better to ask for guidance. Our
self- driving cars may have learned good manners for interacting with
one another and with human drivers, and our domestic robots should
be interacting smoothly with even the most recalcitrant toddler. With
luck, no cats will have been roasted for dinner and no whale meat will
have been served to members of the Green Party.
At that point, it might be feasible to specify software design tem-
plates to which various kinds of applications must conform in order to
be sold or connected to the Internet, just as applications have to pass
a number of software tests before they can be sold on Apple’s App
Store or Google Play. Software vendors could propose additional tem-
plates, as long as they come with proofs that the templates satisfy the
(by then well- defined) requirements of safety and controllability.
There would be mechanisms for reporting problems and for updating
software systems that produce undesirable behavior. It would make
sense also to create professional codes of conduct around the idea of
provably safe AI programs and to integrate the corresponding theo-
rems and methods into the curriculum for aspiring AI and machine
learning practitioners.
To a seasoned observer of Silicon Valley, this may sound rather
naïve. Regulation of any kind is strenuously opposed in the Valley.
Whereas we are accustomed to the idea that pharmaceutical compa-
nies have to show safety and (beneficial) efficacy through clinical tri-
als before they can release a product to the general public, the software
industry operates by a different set of rules, namely the empty set.
A “bunch of dudes chugging Red Bull”³ at a software company can
unleash a product or an upgrade that affects literally billions of people
with no third- party oversight whatsoever.
Inevitably, however, the tech industry is going to have to acknowl-
edge that its products matter; and, if they matter, then it matters that
the products not have harmful effects. This means that there will be
rules governing the nature of interactions with humans, prohibiting
designs that, say, consistently manipulate preferences or produce ad-
dictive behavior. I have no doubt that the transition from an unregu-
lated to a regulated world will be a painful one. Let’s hope it doesn’t
require a Chernobyl- sized disaster (or worse) to overcome the indus-
try’s resistance.
Misuse
Regulation might be painful for the software industry, but it would be
intolerable for Dr. Evil, plotting world domination in his secret under-
ground bunker. There is no doubt that criminal elements, terrorists,
and rogue nations would have an incentive to circumvent any con-
straints on the design of intelligent machines so that they could be
used to control weapons or to devise and carry out criminal activities.
The danger is not so much that the evil schemes would succeed; it is
that they would fail by losing control over poorly designed intelligent
systems— particularly ones imbued with evil objectives and granted
access to weapons.
This is not a reason to avoid regulation— after all, we have laws
against murder even though they are often circumvented. It does,
however, create a very serious policing problem. Already, we are losing
the battle against malware and cybercrime. (A recent report estimates
over two billion victims and an annual cost of around $600 billion.⁴)
Malware in the form of highly intelligent programs would be much
harder to defeat.
Some, including Nick Bostrom, have proposed that we use our
own, beneficial superintelligent AI systems to detect and destroy any
malicious or otherwise misbehaving AI systems. Certainly, we should
use the tools at our disposal, while minimizing the impact on personal
freedom, but the image of humans huddling in bunkers, defenseless
against the titanic forces unleashed by battling superintelligences, is
hardly reassuring even if some of them are on our side. It would be far
better to find ways to nip the malicious AI in the bud.
A good first step would be a successful, coordinated, international
campaign against cybercrime, including expansion of the Budapest
Convention on Cybercrime. This would form an organizational tem-
plate for possible future efforts to prevent the emergence of uncon-
trolled AI programs. At the same time, it would engender a broad
cultural understanding that creating such programs, either deliber-
ately or inadvertently, is in the long run a suicidal act comparable to
creating pandemic organisms.
Enfeeblement and Human Autonomy
E. M. Forster’s most famous novels, including Howards End and A
Passage to India, examined British society and its class system in the
early part of the twentieth century. In 1909, he wrote one notable
science- fiction story: “The Machine Stops.” The story is remarkable
for its prescience, including depictions of (what we would now call)
the Internet, videoconferencing, iPads, massive open online courses
(MOOCs), widespread obesity, and avoidance of face- to- face con-
tact. The Machine of the title is an all- encompassing intelligent infra-
structure that meets all human needs. Humans become increasingly
dependent on it, but they understand less and less about how it
works. Engineering knowledge gives way to ritualized incantations
that eventually fail to stem the gradual deterioration of the Machine’s
workings. Kuno, the main character, sees what is unfolding but is
power less to stop it:
Cannot you see... that it is we that are dying, and that down here
the only thing that really lives is the Machine? We created the
Machine to do our will, but we cannot make it do our will now. It
has robbed us of the sense of space and of the sense of touch, it has
blurred every human relation, it has paralysed our bodies and our
wills.... We only exist as the blood corpuscles that course through
its arteries, and if it could work without us, it would let us die. Oh,
I have no remedy, or, at least, only one: to tell men again and
again that I have seen the hills of Wessex as Aelfrid saw them
when he overthrew the Danes.
More than one hundred billion people have lived on Earth. They
(we) have spent on the order of one trillion person- years learning and
teaching, in order that our civilization may continue. Up to now, its
only possibility for continuation has been through re- creation in the
minds of new generations. (Paper is fine as a method of transmission,
but paper does nothing until the knowledge recorded thereon reaches
the next person’s mind.) That is now changing: increasingly, it is pos-
sible to place our knowledge into machines that, by themselves, can
run our civilization for us.
Once the practical incentive to pass our civilization on to the next
generation disappears, it will be very hard to reverse the process. One
trillion years of cumulative learning would, in a real sense, be lost. We
would become passengers in a cruise ship run by machines, on a cruise
that goes on forever, exactly as envisaged in the film WALL-E.
A good consequentialist would say, “Obviously this is an undesir-
able consequence of the overuse of automation! Suitably designed
machines would never do this!” True, but think what this means. Ma-
chines may well understand that human autonomy and competence
are important aspects of how we prefer to conduct our lives. They
may well insist that humans retain control and responsibility for their
own well-being—in other words, machines will say no. But we myo-
pic, lazy humans may disagree. There is a tragedy of the commons at
work here: for any individual human, it may seem pointless to engage
in years of arduous learning to acquire knowledge and skills that ma-
chines already have; but if everyone thinks that way, the human race
will, collectively, lose its autonomy.
The solution to this problem seems to be cultural, not techni-
cal. We will need a cultural movement to reshape our ideals and
preferences towards autonomy, agency, and ability and away from
self- indulgence and dependency— if you like, a modern, cultural
version of ancient Sparta’s military ethos. This would mean human
preference engineering on a global scale along with radical changes in
how our society works. To avoid making a bad situation worse, we
might need the help of superintelligent machines, both in shaping the
solution and in the actual process of achieving a balance for each
individual.
Any parent of a small child is familiar with this process. Once the
child is beyond the helpless stage, parenting requires an ever- evolving
balance between doing everything for the child and leaving the child
entirely to his or her own devices. At a certain stage, the child comes
to understand that the parent is perfectly capable of tying the child’s
shoelaces but is choosing not to. Is that the future for the human
race: to be treated like a child, forever, by far superior machines? I
suspect not. For one thing, children cannot switch their parents off.
(Thank goodness!) Nor will we be pets or zoo animals. There is really
no analog in our present world to the relationship we will have with
beneficial intelligent machines in the future. It remains to be seen
how the endgame turns out.
Appendix A
SEARCHING FOR SOLUTIONS
Choosing an action by looking ahead and considering the out-
comes of different possible action sequences is a fundamental
capability for intelligent systems. It’s something your cell
phone does whenever you ask it for directions. Figure 14 shows a typ-
ical example: getting from the current location, Pier 19, to the goal,
Coit Tower. The algorithm needs to know what actions are available
to it; typically, for map navigation, each action traverses a road seg-
ment connecting two adjacent intersections. In the example, from Pier
19 there is just one action: turn right and drive along the Embarcadero
to the next intersection. Then there is a choice: continue on or take a
sharp left onto Battery Street. The algorithm systematically explores
all these possibilities until it eventually finds a route. Typically we add
a little bit of commonsense guidance, such as a preference for explor-
ing streets that head towards the goal rather than away from it. With
this guidance and a few other tricks, the algorithm can find optimal
solutions very quickly—usually in a few milliseconds, even for a cross-
country trip.
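For readers who want to see the machinery, here is a minimal Python sketch of this kind of guided lookahead. The tiny road graph, the segment lengths, and the straight-line estimates are invented for illustration; a real map service supplies both.

import heapq

# A toy road network: each intersection lists (neighbor, segment length).
roads = {
    "Pier19":      [("Embarcadero", 1.0)],
    "Embarcadero": [("Battery", 0.6), ("Pier19", 1.0)],
    "Battery":     [("CoitTower", 0.8), ("Embarcadero", 0.6)],
    "CoitTower":   [],
}
# Commonsense guidance: a straight-line estimate of distance to the goal.
straight_line = {"Pier19": 1.9, "Embarcadero": 1.2, "Battery": 0.8, "CoitTower": 0.0}

def find_route(start, goal):
    """Best-first search guided by distance so far plus the straight-line estimate."""
    frontier = [(straight_line[start], 0.0, start, [start])]
    visited = set()
    while frontier:
        _, cost, place, path = heapq.heappop(frontier)
        if place == goal:
            return path, cost
        if place in visited:
            continue
        visited.add(place)
        for neighbor, length in roads[place]:
            heapq.heappush(frontier, (cost + length + straight_line[neighbor],
                                      cost + length, neighbor, path + [neighbor]))
    return None, float("inf")

path, cost = find_route("Pier19", "CoitTower")
print(path, round(cost, 2))   # ['Pier19', 'Embarcadero', 'Battery', 'CoitTower'] 2.4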
Searching for routes on maps is a natural and familiar example, but
it may be a bit misleading because the number of distinct locations is
so small. In the United States, for example, there are only about ten
million intersections. That may seem like a large number, but it is tiny
compared to the number of distinct states in the 15-puzzle. The
15- puzzle is a toy with a four- by- four grid containing fifteen num-
bered tiles and a blank space. The goal is to move the tiles around to
achieve a goal configuration, such as having all the tiles in numerical
order. The 15-puzzle has about ten trillion states (a million times big-
ger than the United States!); the 24- puzzle has about eight trillion
trillion states. This is an example of what mathematicians call combi-
natorial complexity: the rapid explosion in the number of combina-
tions as the number of “moving parts” of a problem increases. Returning
to the map of the United States: if a trucking company wants to opti-
mize the movements of its one hundred trucks across the United
States, the number of possible states to consider would be ten million
to the power of one hundred (i.e., 10^700).
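These counts are easy to reproduce; the short Python check below is just arithmetic, not anything taken from the sources cited here.

from math import factorial

# Sliding-tile puzzles: half of all arrangements of the tiles plus the
# blank are reachable.
print(factorial(16) // 2)            # 15-puzzle: about 1.05e13 (ten trillion)
print(factorial(25) // 2)            # 24-puzzle: about 7.8e24 (eight trillion trillion)

# One hundred trucks, each at any of ten million intersections:
print(len(str(10_000_000 ** 100)))   # 701 digits, so the count is 10^700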
FIGURE $PDSRISDUWRI6DQ)UDQFLVFRVKRZLQJWKHLQLWLDOORFDWLRQDW3LHU
DQGWKHGHVWLQDWLRQDW&RLW7RZHU
Coit
Tower
Pier 19
Giving up on rational decisions
Many games have this property of combinatorial complexity, includ-
ing chess, checkers, backgammon, and Go. Because the rules of Go are
simple and elegant (figure 15), I’ll use it as a running example. The ob-
jective is clear enough: win the game by surrounding more territory
than your opponent. The possible actions are clear too: put a stone in an
empty location. Just as with navigation on a map, the obvious way to
decide what to do is to imagine different futures that result from differ-
ent sequences of actions and choose the best one. You ask, “If I do this,
what might my opponent do? And what do I do then?” This idea is illus-
trated in figure 16 for 3×3 Go. Even for 3×3 Go, I can show only a small
part of the tree of possible futures, but I hope the idea is clear enough.
Indeed, this way of making decisions seems to be just straightforward
common sense.
FIGURE 15. A Go board partway through Game 5 of the LG Cup final between Lee Sedol (black) and Choe Myeong-hun (white). Black and White take turns placing a single stone on any unoccupied location on the board. Here it is Black’s turn to move. Each side attempts to surround as much territory as possible. For example, White has good chances to win territory at the left-hand edge and on the left side of the bottom edge, while Black may win territory in the top-right and bottom-right corners. A key concept in Go is that of a group, that is, a set of stones of the same color that are connected to one another by vertical or horizontal adjacency. A group remains alive as long as there is at least one empty space next to it; if it is completely surrounded, with no empty spaces, it dies and is removed from the board.
The problem is that Go has more than 10^170 possible positions for
the full 19×19 board. Whereas finding a guaranteed shortest route on
a map is relatively easy, finding a guaranteed win in Go is utterly in-
feasible. Even if the algorithm ponders for the next billion years, it can
explore only a tiny fraction of the whole tree of possibilities. This
leads to two questions. First, which part of the tree should the pro-
gram explore? And second, which move should the program make,
given the partial tree that it has explored?
To answer the second question first: the basic idea used by almost
all lookahead programs is to assign an estimated value to the leaves of
FIGURE 16. Part of the game tree for 3×3 Go. Starting from the empty initial state (sometimes called the root of the tree), Black can choose one of three possible distinct moves; the others are symmetric with these. It would then be White’s turn to move. If Black chooses to play in the center, White has two distinct moves, corner or side; then Black would get to play again. By imagining these possible futures, Black can choose which move to play in the initial state. If Black is unable to follow every possible line of play to the end of the game, then an evaluation function can be used to estimate how good the positions are at the leaves of the tree. Here, the evaluation function assigns +3 and +5 to two of the leaves.
the tree— those states furthest in the future— and then “work back” to
find out how good the choices are at the root.¹ For example, looking at
the two positions at the bottom of figure 16, one might guess a value
of +5 (from Black’s viewpoint) for the position on the left and +3 for
the position on the right, because White’s stone in the corner is much
more vulnerable than the one on the side. If these values are right,
then Black can expect that White will play on the side, leading to the
right- hand position; hence, it seems reasonable to assign a value of +3
to Black’s initial move in the center. With slight variations, this is the
scheme used by Arthur Samuel’s checker- playing program to beat its
creator in 1955,² by Deep Blue to beat the then world chess champion,
Garry Kasparov, in 1997, and by AlphaGo to beat former world Go
champion Lee Sedol in 2016. For Deep Blue, humans wrote the piece
of the program that evaluates positions at the leaves of the tree, based
largely on their knowledge of chess. For Samuels program and for
AlphaGo, the programs learned it from thousands or millions of prac-
tice games.
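In code, the back-up scheme looks roughly like the sketch below. The game interface (legal_moves, result, evaluate) and the toy game used to exercise it are assumptions for illustration, not the actual structure of Deep Blue or AlphaGo.

def lookahead_value(state, game, depth, my_turn=True):
    """Estimate a position's value by exploring the tree to a fixed depth
    and backing up evaluation-function values from the leaves."""
    moves = game.legal_moves(state)
    if depth == 0 or not moves:
        return game.evaluate(state)          # estimated value at a leaf
    values = [lookahead_value(game.result(state, m), game, depth - 1, not my_turn)
              for m in moves]
    # I pick my best move; my opponent is assumed to pick their best reply.
    return max(values) if my_turn else min(values)

def choose_move(state, game, depth=3):
    """Pick the move leading to the best backed-up value."""
    return max(game.legal_moves(state),
               key=lambda m: lookahead_value(game.result(state, m), game,
                                             depth - 1, my_turn=False))

class CountingGame:
    """A toy game: players alternately add 1 or 2 to a running total.
    The evaluation function is just the total."""
    def legal_moves(self, state):
        return [1, 2]
    def result(self, state, move):
        return state + move
    def evaluate(self, state):
        return state

print(choose_move(0, CountingGame()))   # prints 2: the move with the better backed-up value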
The first question (which part of the tree should the program
explore?) is an example of one of the most important questions in AI:
What computations should an agent do? For game- playing programs, it
is vitally important because they have only a small, fixed allocation of
time, and using it on pointless computations is a sure way to lose. For
humans and other agents operating in the real world, it is even more
important because the real world is so much more complex: unless
chosen well, no amount of computation is going to make the smallest
dent in the problem of deciding what to do. If you are driving and a
moose walks into the middle of the road, it’s no use thinking about
whether to trade euros for pounds or whether Black should make its
first move in the center of the Go board.
The ability of humans to manage their computational activity so
that reasonable decisions get made reasonably quickly is at least as re-
markable as their ability to perceive and to reason correctly. And it
seems to be something we acquire naturally and effortlessly: when my
father taught me to play chess, he taught me the rules, but he did not
also teach me such- and- such clever algorithm for choosing which
parts of the game tree to explore and which parts to ignore.
How does this happen? On what basis can we direct our thoughts?
The answer is that a computation has value to the extent that it can
improve your decision quality. The process of choosing computations
is called metareasoning, which means reasoning about reasoning. Just
as actions can be chosen rationally, on the basis of expected value, so
can computations. This is called rational metareasoning.³ The basic
idea is very simple:
Do the computations that will give the highest expected improve-
ment in decision quality, and stop when the cost (in terms of time)
exceeds the expected improvement.
That’s it. No fancy algorithm needed! This simple principle generates
effective computational behavior in a wide range of problems, includ-
ing chess and Go. It seems likely that our brains implement something
similar, which explains why we don’t need to learn new, game- specific
algorithms for thinking with each new game we learn to play.
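A bare-bones Python rendering of the principle appears below; the candidate computations and their made-up gains and costs stand in for the hard part, which is estimating the expected improvement of a computation before doing it.

def metareason(candidates, run, expected_gain, time_cost):
    """Keep running whichever candidate computation promises the largest
    expected improvement in decision quality, and stop as soon as the best
    candidate's expected gain no longer exceeds its cost in time."""
    while candidates:
        best = max(candidates, key=expected_gain)
        if expected_gain(best) <= time_cost(best):
            break                          # thinking further is not worth it
        run(best)                          # e.g. expand one branch of the tree
        candidates.remove(best)

# Illustrative use: three possible "thoughts" with invented gains and costs.
gains = {"expand_center_reply": 0.30, "expand_corner_reply": 0.05, "recheck_rules": 0.01}
metareason(list(gains), run=lambda c: print("computing", c),
           expected_gain=lambda c: gains[c], time_cost=lambda c: 0.10)
# prints only "computing expand_center_reply"; the others are not worth the time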
Exploring a tree of possibilities that stretches forward into the fu-
ture from the current state is not the only way to reach decisions, of
course. Often, it makes more sense to work backwards from the goal.
For example, the presence of the moose in the road suggests the goal
of avoid hitting the moose, which in turn suggests three possible ac-
tions: swerve left, swerve right, or slam on the brakes. It does not
suggest the action of trading euros for pounds or putting a black stone
in the center. Thus, goals have a wonderful focusing effect on one’s
thinking. No current game- playing programs take advantage of this
idea; in fact, they typically consider all possible legal actions. This is
one of the (many) reasons why I am not worried about AlphaZero
taking over the world.
Looking further ahead
Let’s suppose you have decided to make a specific move on the Go
board. Great! Now you have to actually do it. In the real world, this
involves reaching into the bowl of unplayed stones to pick up a stone,
moving your hand above the intended location, and placing the stone
neatly on the spot, either quietly or emphatically according to Go
etiquette.
Each of these stages, in turn, consists of a complex dance of per-
ception and motor control commands involving the muscles and nerves
of the hand, arm, shoulder, and eyes. And while reaching for a stone,
you’re making sure the rest of your body doesn’t topple over thanks to
the shift in your center of gravity. The fact that you may not be con-
sciously aware of selecting these actions does not mean that they aren’t
being selected by your brain. For example, there may be many stones
in the bowl, but your “hand” (really, your brain processing sensory
information) still has to choose one of them to pick up.
Almost everything we do is like this. While driving, we might
choose to change lanes to the left; but this action involves looking in the
mirror and over your shoulder, perhaps adjusting speed, and moving
the steering wheel while monitoring progress until the maneuver is
complete. In conversation, a routine response such as “OK, let me
check my calendar and get back to you” involves articulating fourteen
syllables, each of which requires hundreds of precisely coordinated
motor control commands to the muscles of the tongue, lips, jaw,
throat, and breathing apparatus. For your native language, this process
is automatic; it closely resembles the idea of running a subroutine in a
computer program (see page 34). The fact that complex action se-
quences can become routine and automatic, thereby functioning as
single actions in still more complex processes, is absolutely fundamen-
tal to human cognition. Saying words in a less familiar language
(perhaps asking directions to Szczebrzeszyn in Poland) is a useful
reminder that there was a time in your life when reading and speak-
ing words were difficult tasks requiring mental effort and lots of
practice.
So, the real problem that your brain faces is not choosing a move
on the Go board but sending motor control commands to your mus-
cles. If we shift our attention from the level of Go moves to the level
of motor control commands, the problem looks very different. Very
roughly, your brain can send out commands about every one hundred
milliseconds. We have about six hundred muscles, so that’s a theoret-
ical maximum of about six thousand actuations per second, twenty
million per hour, two hundred billion per year, twenty trillion per
lifetime. Use them wisely!
Now, suppose we tried to apply an AlphaZero- like algorithm to
solve the decision problem at this level. In Go, AlphaZero looks ahead
perhaps fifty steps. But fifty steps of motor control commands get you
only a few seconds into the future! Not enough for the twenty million
motor control commands in an hour- long game of Go, and certainly
not enough for the trillion (1,000,000,000,000) steps involved in do-
ing a PhD. So, even though AlphaZero looks further ahead in Go than
any human can, that ability doesn’t seem to help in the real world. It’s
the wrong kind of lookahead.
I’m not saying, of course, that doing a PhD actually requires plan-
ning out a trillion muscle actuations in advance. Only quite abstract
plans are made initially: perhaps choosing Berkeley or some other
place, choosing a PhD supervisor or research topic, applying for fund-
ing, getting a student visa, traveling to the chosen city, doing some
research, and so on. To make your choices, you do just enough think-
ing, about just the right things, so that the decision becomes clear. If
the feasibility of some abstract step such as getting the visa is unclear,
you do some more thinking and perhaps information gathering, which
means making the plan more concrete in certain aspects: maybe
choosing a visa type for which you are eligible, collecting the neces-
sary documents, and submitting the application. Figure 17 shows the
abstract plan and the refinement of the GetVisa step into a three- step
subplan. When the time comes to begin carrying out the plan, its ini-
tial steps have to be refined all the way down to the primitive level so
that your body can execute them.
AlphaGo simply cannot do this kind of thinking: the only actions
it ever considers are primitive actions occurring in a sequence from
the initial state. It has no notion of abstract plan. Trying to apply Al-
phaGo in the real world is like trying to write a novel by wondering
whether the first letter should be A, B, C, and so on.
In 1962, Herbert Simon emphasized the importance of hierarchi-
cal organization in a famous paper, “The Architecture of Complex-
ity.”⁴ AI researchers since the early 1970s have developed a variety of
methods that construct and refine hierarchically organized plans.⁵
Some of the resulting systems are able to construct plans with tens of
millions of steps—for example, to organize manufacturing activities
in a large factory.
We now have a pretty good theoretical understanding of the mean-
ing of abstract actions, that is, of how to define the effects they have
on the world.⁶ Consider, for example, the abstract action GoToBerke-
ley in figure 17. It can be implemented in many different ways, each of
which produces different effects on the world: you could sail there,
stow away on a ship, fly to Canada and walk across the border, hire a
FIGURE 17. An abstract plan for an overseas student who has chosen to get a PhD at Berkeley: GetVisa, GoToBerkeley, ChooseAdvisor, GetFunding, DoResearch, WriteThesis. The GetVisa step, whose feasibility is uncertain, has been expanded out into an abstract plan of its own: ChooseVisaType, GetDocuments, SubmitApplication.
private jet, and so on. But you need not consider any of these choices
for now. As long as you are sure there is a way to do it that doesn’t
consume so much time and money or incur so much risk as to imperil
the rest of the plan, you can just put the abstract step GoToBerkeley
into the plan and rest assured that the plan will work. In this way, we
can build high- level plans that will eventually turn into billions or
trillions of primitive steps without ever worrying about what those
steps are until it’s time to actually do them.
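A toy Python sketch of this kind of refinement appears below. The plan library mirrors figure 17 and is purely illustrative; a real hierarchical planner would also check preconditions, costs, and feasibility before committing to a refinement.

# Each abstract action maps to one possible refinement: a sequence of
# lower-level steps. Anything not in the library is treated as primitive.
refinements = {
    "GetPhD":  ["GetVisa", "GoToBerkeley", "ChooseAdvisor", "GetFunding",
                "DoResearch", "WriteThesis"],
    "GetVisa": ["ChooseVisaType", "GetDocuments", "SubmitApplication"],
}

def refine(step):
    """Expand an abstract step all the way down to primitive steps."""
    if step not in refinements:
        return [step]                  # already primitive: execute as is
    plan = []
    for substep in refinements[step]:
        plan.extend(refine(substep))
    return plan

print(refine("GetPhD"))
# ['ChooseVisaType', 'GetDocuments', 'SubmitApplication', 'GoToBerkeley', ...]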
Of course, none of this is possible without the hierarchy. Without
high- level actions such as getting a visa and writing a thesis, we cannot
make an abstract plan to get a PhD; without still- higher- level actions
such as getting a PhD and starting a company, we cannot plan to get a
PhD and then start a company. In the real world, we would be lost
without a vast library of actions at dozens of levels of abstraction. (In
the game of Go, there is no obvious hierarchy of actions, so most of us
are lost.) At present, however, all existing methods for hierarchical
planning rely on a human- generated hierarchy of abstract and con-
crete actions; we do not yet understand how such hierarchies can be
learned from experience.
Appendix B
KNOWLEDGE AND LOGIC
Logic is the study of reasoning with definite knowledge. It is fully
general with regard to subject matter—that is, the knowledge
can be about anything at all. Logic is therefore an indispensable
part of our understanding of general purpose intelligence.
Logic’s main requirement is a formal language with precise mean-
ings for the sentences in the language, so that there is an unambiguous
process for determining whether a sentence is true or false in a given
situation. That’s it. Once we have that, we can write sound reasoning
algorithms that produce new sentences from sentences that are al-
ready known. Those new sentences are guaranteed to follow from the
sentences that the system already knows, meaning that the new sen-
tences are necessarily true in any situation where the original sentences
are true. This allows a machine to answer questions, prove mathemat-
ical theorems, or construct plans that are guaranteed to succeed.
High- school algebra provides a good example (albeit one that may
evoke painful memories). The formal language includes sentences
such as 4x + 1 = 2y − 5. This sentence is true in the situation where
x = 5 and y = 13, and false when x = 5 and y = 6. From this sentence
one can derive another sentence such as y = 2x + 3, and whenever the
first sentence is true, the second is guaranteed to be true too.
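The guarantee can be checked mechanically; the brute-force Python test below is a sanity check over a small range of values, not a proof.

# Whenever 4x + 1 = 2y - 5 holds, y = 2x + 3 must hold as well.
for x in range(-50, 51):
    for y in range(-50, 51):
        if 4 * x + 1 == 2 * y - 5:       # the first sentence is true...
            assert y == 2 * x + 3        # ...so the derived sentence is too
print("no counterexamples found")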
The core idea of logic, developed independently in ancient India,
China, and Greece, is that the same notions of precise meaning and
sound reasoning can be applied to sentences about anything at all, not
just numbers. The canonical example starts with “Socrates is a man
and “All men are mortal” and derives “Socrates is mortal.”¹ This deri-
vation is strictly formal in the sense that it does not rely on any further
information about who Socrates is or what man and mortal mean.
The fact that logical reasoning is strictly formal means that it is possi-
ble to write algorithms that do it.
Propositional logic
For our purposes in understanding the capabilities and prospects
for AI, there are two important kinds of logic that really matter: prop-
ositional logic and first- order logic. The difference between the two is
fundamental to understanding the current situation in AI and how it
is likely to evolve.
Let’s start with propositional logic, which is the simpler of the
two. Sentences are made of just two kinds of things: symbols that
stand for propositions that can be true or false, and logical connectives
such as and, or, not, and
if... then. (We’ll see an example shortly.)
These logical connectives are sometimes called Boolean, after George
Boole, a nineteenth- century logician who reinvigorated his field with
new mathematical ideas. They are just the same as the logic gates used
in computer chips.
Practical algorithms for reasoning in propositional logic have been
known since the early 1960s.²,³ Although the general reasoning task
may require exponential time in the worst case,⁴ modern proposi-
tional reasoning algorithms handle problems with millions of proposi-
tion symbols and tens of millions of sentences. They are a core tool for
constructing guaranteed logistical plans, verifying chip designs before
they are manufactured, and checking the correctness of software ap-
plications and security protocols before they are deployed. The amaz-
ing thing is that a single algorithm (a reasoning algorithm for
propositional logic) solves all these tasks once they have been formu-
lated as reasoning tasks. Clearly, this is a step towards the goal of
generality in intelligent systems.
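Here is a deliberately naive Python illustration of such a general-purpose reasoner: a brute-force entailment check over all truth assignments. Real solvers are enormously more sophisticated, but the contract is the same: knowledge in, guaranteed conclusions out. The tiny rain-and-wet example is invented for illustration.

from itertools import product

def entails(kb, query, symbols):
    """True if every truth assignment that makes all sentences in kb true
    also makes the query true. Sentences are Python functions of a model."""
    for values in product([True, False], repeat=len(symbols)):
        model = dict(zip(symbols, values))
        if all(s(model) for s in kb) and not query(model):
            return False
    return True

# Tiny example: from "rain implies wet" and "rain", conclude "wet".
kb = [lambda m: (not m["rain"]) or m["wet"],   # rain => wet
      lambda m: m["rain"]]
print(entails(kb, lambda m: m["wet"], ["rain", "wet"]))   # True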
Unfortunately, it’s not a very big step because the language of prop-
ositional logic is not very expressive. Let’s see what this means in prac-
tice when we try to express the basic rule for legal moves in Go: “The
player whose turn it is to move can play a stone on any unoccupied in-
tersection.”⁵ The first step is to decide what the proposition symbols
are going to be for talking about Go moves and Go board positions. The
fundamental proposition that matters is whether a stone of a particular
color is on a particular location at a particular time. So, we’ll need sym-
bols such as White_Stone_On_5_5_At_Move_38 and Black_Stone_
On_5_5_At_Move_38. (Remember that, as with man, mortal, and
Socrates, the reasoning algorithm doesn’t need to know what the sym-
bols mean.) Then the logical condition for White to be able to play at
the 5,5 intersection at move 38 would be
(not White_Stone_On_5_5_At_Move_38) and
(not Black_Stone_On_5_5_At_Move_38)
In other words: there’s no white stone and there’s no black stone. That
seems simple enough. Unfortunately, in propositional logic it would
have to be written out separately for each location and for each move in
the game. Because there are 361 locations and around 300 moves per
game, this means over 100,000 copies of the rule! For the rules concern-
ing captures and repetitions, which involve multiple stones and loca-
tions, the situation is even worse, and we quickly fill up millions of pages.
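The blow-up is easy to see by generating the rules mechanically. The rough Python count below ignores captures and repetitions and simply mirrors the 361 locations and roughly 300 moves mentioned above.

locations = [(row, col) for row in range(1, 20) for col in range(1, 20)]  # 361
moves = range(1, 301)                         # roughly 300 moves per game

# One "legal to play here now" rule per (location, move) pair, each naming
# its own propositional symbols such as White_Stone_On_5_5_At_Move_38.
rules = [f"(not White_Stone_On_{r}_{c}_At_Move_{m}) and "
         f"(not Black_Stone_On_{r}_{c}_At_Move_{m})"
         for (r, c) in locations for m in moves]

print(len(rules))        # 108300: over 100,000 copies of one simple rule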
The real world is, obviously, much bigger than the Go board: there
are far more than 361 locations and 300 time steps, and there are
many kinds of things besides stones; so, the prospect of using a prop-
ositional language for knowledge of the real world is utterly hopeless.
It’s not just the ridiculous size of the rulebook that’s a problem: it’s
also the ridiculous amount of experience a learning system would need
to acquire the rules from examples. While a human needs just one or
two examples to get the basic ideas of placing a stone, capturing stones,
and so on, an intelligent system based on propositional logic has to be
shown examples of moving and capturing separately for each location
and time step. The system cannot generalize from a few examples, as
a human does, because it has no way to express the general rule. This
limitation applies not just to systems based on propositional logic but
also to any system with comparable expressive power. That includes
Bayesian networks, which are probabilistic cousins of propositional
logic, and neural networks, which are the basis for the “deep learning”
approach to AI.
First- order logic
So, the next question is, can we devise a more expressive logical
language? We’d like one in which it is possible to tell the rules of Go
to the knowledge- based system in the following way:
for all locations on the board, and for all time steps, here are the
rules...
First- order logic, introduced by the German mathematician Gottlob
Frege in 1879, allows one to write the rules this way.⁶ The key difference
between propositional and first- order logic is this: whereas proposi-
tional logic assumes the world is made of propositions that are true or
false, first- order logic assumes the world is made of objects that can be
related to each other in various ways. For example, there could be loca-
tions that are adjacent to each other, times that follow each other con-
secutively, stones that are on locations at particular times, and moves
that are legal at particular times. First- order logic allows one to assert
that some property is true for all objects in the world; so, one can write
for all time steps t, and for all locations l, and for all colors c,
if it is c’s turn to move at time t and l is unoccupied at time t,
then it is legal for c to play a stone at location l at time t.
With some extra caveats and some additional sentences that define
the board locations, the two colors, and what unoccupied means, we
have the beginnings of the complete rules of Go. The rules take up
about as much space in first- order logic as they do in English.
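The computational analogue of the quantified rule is a single definition with variables rather than a hundred thousand hand-written cases. In the Python sketch below, the board-state functions are invented placeholders, not part of any real Go program.

def legal_to_play(color, location, time, occupied, turn):
    """One rule for all colors c, locations l, and times t: it is legal for
    c to play at l at time t if it is c's turn at t and l is unoccupied at t."""
    return turn(time) == color and location not in occupied(time)

# Illustrative use with made-up board-state functions:
occupied = lambda t: {(4, 4), (3, 5)}          # stones on the board at time t
turn = lambda t: "White" if t % 2 == 0 else "Black"
print(legal_to_play("White", (5, 5), 38, occupied, turn))   # True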
The development of logic programming in the late 1970s provided
elegant and efficient technology for logical reasoning embodied in a
programming language called Prolog. Computer scientists worked out
how to make logical reasoning in Prolog run at millions of reasoning
steps per second, making many applications of logic practical. In 1982,
the Japanese government announced a huge investment in Prolog-
based AI called the Fifth Generation project,⁷ and the United States
and UK responded with similar efforts.⁸,⁹
Unfortunately, the Fifth Generation project and others like it ran
out of steam in the late 1980s and early 1990s, partly because of the
inability of logic to handle uncertain information. They epitomized
what soon became a pejorative term: Good Old-Fashioned AI, or
GOFAI.¹⁰ It became fashionable to dismiss logic as irrelevant to AI;
indeed, many AI researchers working now in the area of deep learning
don’t know anything about logic. This fashion seems likely to fade: if
you accept that the world has objects in it that are related to each
other in various ways, then first- order logic is going to be relevant,
because it provides the basic mathematics of objects and relations.
This view is shared by Demis Hassabis, CEO of Google DeepMind:¹¹
You can think about deep learning as it currently is today as the
equivalent in the brain to our sensory cortices: our visual cortex or
auditory cortex. But, of course, true intelligence is a lot more than
just that, you have to recombine it into higher- level thinking and
symbolic reasoning, a lot of the things classical AI tried to deal
with in the 80s.
... We would like [these systems] to build up to this symbolic
level of reasoning— maths, language, and logic. So thats a big part
of our work.
Thus, one of the most important lessons from the first thirty years
of AI research is that a program that knows things, in any useful sense,
will need a capacity for representation and reasoning that is at least
comparable to that offered by first- order logic. As yet, we do not know
the exact form this will take: it may be incorporated into probabilistic
reasoning systems, into deep learning systems, or into some still- to-
be- invented hybrid design.
Appendix C
UNCERTAINTY AND
PROBABILITY
Whereas logic provides a general basis for reasoning with
definite knowledge, probability theory encompasses rea-
soning with uncertain information (of which definite
knowledge is a special case). Uncertainty is the normal epistemic situ-
ation of an agent in the real world. Although the basic ideas of proba-
bility were developed in the seventeenth century, only recently has it
become possible to represent and reason with large probability models
in a formal way.
The basics of probability
Probability theory shares with logic the idea that there are possible
worlds. One usually starts out by defining what they are: for exam-
ple, if I am rolling one ordinary six-sided die, there are six worlds
(sometimes called outcomes): 1, 2, 3, 4, 5, 6. Exactly one of them will
be the case, but a priori I don’t know which. Probability theory as-
sumes that it is possible to attach a probability to each world; for my
die roll, I’ll attach 1/6 to each world. (These probabilities happen to be
equal, but it need not be that way; the only requirement is that the
probabilities have to add up to 1.) Now I can ask a question such as
What’s the probability I’ll roll an even number?” To find this, I sim-
ply add up the probabilities for the three worlds where the number is
even: 1/6 + 1/6 + 1/6 = 1/2.
It’s also straightforward to take new evidence into account. Sup-
pose an oracle tells me that the roll is a prime number (that is, 2, 3, or
5). This rules out the worlds 1, 4, and 6. I simply take the probabilities
associated with the remaining possible worlds and scale them up so
the total remains 1. Now the probabilities of 2, 3, and 5 are each
1
/3,
and the probability that my roll is an even number is now just
1
/3, since
2 is the only remaining even roll. This process of updating probabili-
ties as new evidence arrives is an example of Bayesian updating.
So, this probability stuff seems quite simple! Even a computer can
add up numbers, so what’s the problem? The problem comes when
there are more than a few worlds. For example, if I roll the die one
hundred times, there are 6¹⁰⁰ outcomes. It’s infeasible to begin the pro-
cess of probabilistic reasoning by attaching a number to each of these
outcomes individually. A clue for dealing with this complexity comes
from the fact that the die rolls are independent if the die is known to be
fair—that is, the outcome of any single roll does not affect the proba-
bilities for the outcomes of any other roll. Thus, independence is help-
ful in structuring the probabilities for complex sets of events.
Suppose I am playing Monopoly with my son George. My piece is
on Just Visiting, and George owns the yellow set whose properties are
sixteen, seventeen, and nineteen squares away from Just Visiting.
Should he buy houses for the yellow set now, so that I have to pay him
some exorbitant rent if I land on those squares, or should he wait until
the next turn? That depends on the probability of landing on the yel-
low set in my current turn.
Here are the rules for rolling the dice in Monopoly: two dice are
rolled and the piece is moved according to the total shown; if doubles
are rolled, the player rolls again and moves again; if the second roll is
doubles, the player rolls a third time and moves again (but if the third
roll is doubles, the player goes to jail instead). So, for example, I might
roll 4-4 followed by 5-4, totaling 17; or 2-2, then 2-2, then 6-2, total-
ing 16. As before, I simply add up the probabilities of all worlds where
I land on the yellow set. Unfortunately, there are a lot of worlds. As
many as six dice could be rolled altogether, so the number of worlds
runs into the thousands. Furthermore, the rolls are no longer indepen-
dent, because the second roll won’t exist unless the first roll is dou-
bles. On the other hand, if we fix the values of the first pair of dice,
then the values of the second pair of dice are independent. Is there a
way to capture this kind of dependency?
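One way to get a feel for the numbers before introducing any new machinery is brute force: simulate the dice rules many times and count. The minimal Python sketch below does exactly that; the yellow-set distances (16, 17, and 19) come from the example above, and the sample size is an arbitrary choice.

    import random

    YELLOW = {16, 17, 19}   # distances from Just Visiting to the yellow squares

    def one_turn(rng):
        """Simulate the Monopoly dice rules for one turn.
        Returns the running totals after each move and the list of rolls."""
        totals, rolls, position = [], [], 0
        for attempt in range(3):
            d1, d2 = rng.randint(1, 6), rng.randint(1, 6)
            rolls.append((d1, d2))
            if attempt == 2 and d1 == d2:
                break                    # third doubles: go to jail, no move
            position += d1 + d2
            totals.append(position)
            if d1 != d2:
                break                    # no doubles: the turn is over
        return totals, rolls

    rng = random.Random(0)
    trials = 1_000_000
    lands = second_is_double_3 = lands_given_double_3 = 0
    for _ in range(trials):
        totals, rolls = one_turn(rng)
        hit = any(t in YELLOW for t in totals)
        lands += hit
        if len(rolls) >= 2 and rolls[1] == (3, 3):
            second_is_double_3 += 1
            lands_given_double_3 += hit
    print(lands / trials)                              # close to 0.0388
    print(lands_given_double_3 / second_is_double_3)   # close to 0.361

With enough samples, the two printed estimates settle near the exact answers computed below with a Bayesian network: about 3.88 percent, and about 36.1 percent given that the second roll is a double-3.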
Bayesian networks
In the early 1980s, Judea Pearl proposed a formal language called
Bayesian networks (often abbreviated to Bayes nets) that makes it pos-
sible, in many real-world situations, to represent the probabilities of a very large number of outcomes in a very concise form.¹
Figure 18 shows a Bayesian network that describes the rolling of
dice in Monopoly. The only probabilities that have to be supplied are
the 1/6 probabilities of the values 1, 2, 3, 4, 5, 6 for the individual die rolls (D₁, D₂, etc.), that is, thirty-six numbers instead of thousands.
Explaining the exact meaning of the network requires a little bit of mathematics,² but the basic idea is that the arrows denote dependency relationships: for example, the value of Doubles₁₂ depends on the values of D₁ and D₂. Similarly, the values of D₃ and D₄ (the next roll of the two dice) depend on Doubles₁₂, because if Doubles₁₂ has value false, then D₃ and D₄ have value 0 (that is, there is no next roll).
Just as with propositional logic, there are algorithms that can an-
swer any question for any Bayesian network with any evidence. For
example, we can ask for the probability of LandsOnYellowSet, which
turns out to be about 3.88 percent. (This means that George can wait
before buying houses for the yellow set.) Slightly more ambitiously, we
can ask for the probability of LandsOnYellowSet given that the second
roll is a double- 3. The algorithm works out for itself that, in that case,
the first roll must have been a double and concludes that the answer is
about 36.1 percent. This is an example of Bayesian updating: when
the new evidence (that the second roll is a double- 3) is added, the
probability of LandsOnYellowSet changes from 3.88 percent to 36.1
FIGURE $%D\HVLDQQHWZRUNWKDWUHSUHVHQWVWKHUXOHVIRUUROOLQJGLFHLQ
0RQRSRO\DQGHQDEOHVDQDOJRULWKPWRFDOFXODWHWKHSUREDELOLW\RIODQGLQJRQD
particular set of squares (such as the yellow set) starting from some other
VTXDUHVXFKDV-XVW9LVLWLQJ)RUVLPSOLFLW\WKHQHWZRUNRPLWVWKHSRVVLELOLW\
RIODQGLQJRQD&KDQFHRU&RPPXQLW\&KHVWVTXDUHDQGEHLQJGLYHUWHGWRD
GLIIHUHQW ORFDWLRQ D
1
DQGD
2
UHSUHVHQW WKH LQLWLDO UROO RI WZR GLFH DQG WKH\
DUHLQGHSHQGHQWQROLQNEHWZHHQWKHP,IGRXEOHVDUHUROOHGDoubles
12
WKHQ
WKHSOD\HUUROOVDJDLQVRD
DQGD
4
KDYHQRQ]HURYDOXHVDQGVRRQ,QWKHVLW-
XDWLRQGHVFULEHGWKHSOD\HUODQGVRQWKH\HOORZVHWLIDQ\RIWKHWKUHHWRWDOVLV
RU
Total
12
Total
1234
LandOn
YellowSet
Doubles
12
Doubles
34
Doubles
56
Total
123456
GoToJail
D
1
D
2
D
3
D
4
D
5
D
6
9780525558613_Human_TX.indd 276 8/7/19 11:21 PM
Not
t
JDL
HGWKHSWKH

for
&R
QG
DD
2 2
UHS
OLQNEHWZHHQ
LQNEHWZHHQ
VR
VR
DD
DQGDQ
D\H
\H
Distribution
rib
HSUHVHQWVWKHSUHVHQWVWKH
RFDOFXODWHWKDOFXODWHWK
s the yellow s the yellow
RUVLPSOLFLW\
RUVLPSOLFLW
PXQLW\&PXQLW\
APPENDIX C: UNCERTAINTY AND PROBABILITY 277
percent. Similarly, the probability that I roll three times (Doubles₃₄ is true) is 2.78 percent, while the probability that I roll three times given that I land on the yellow set is 20.44 percent.
Bayesian networks provide a way to build knowledge-based systems that avoids the failures that plagued the rule-based expert systems of the 1980s. (Indeed, had the AI community been less resistant to probability in the early 1980s, it might have avoided the AI winter that followed the rule-based expert system bubble.) Thousands of applications have been fielded, in areas ranging from medical diagnosis to terrorism prevention.³
Bayesian networks provide machinery for representing the neces-
sary probabilities and performing the calculations to implement
Bayesian updating for many complex tasks. Like propositional logic,
however, they are quite limited in their ability to represent general
knowledge. In many applications, the Bayesian network representa-
tion becomes very large and repetitive—for example, just as the rules
of Go have to be repeated for every square in propositional logic, the
probability-based rules of Monopoly have to be repeated for every
player, for every location a player might be on, and for every move in
the game. Such huge networks are virtually impossible to create by
hand; instead, one would have to resort to code written in a traditional
language such as C++ to generate and piece together multiple Bayes
net fragments. While this is practical as an engineering solution for a
specific problem, it is an obstacle to generality because the C++ code
has to be written anew by a human expert for each application.
First-order probabilistic languages
It turns out, fortunately, that we can combine the expressiveness
of first-order logic with the ability of Bayesian networks to capture probabilistic information concisely. This combination gives us the best of both worlds: probabilistic knowledge-based systems are able to
handle a much wider range of real-world situations than either logical
methods or Bayesian networks. For example, we can easily capture
probabilistic knowledge about genetic inheritance:
for all persons c, f, and m,
if f is the father of c and m is the mother of c
and both f and m have blood type AB,
then c has blood type AB with probability 0.5.
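Written as a tiny generative program, the rule looks like the Python sketch below. The family relationships and blood types here are invented placeholders, and the sketch is a plain-Python stand-in for a real probabilistic programming language, just to show the flavor of combining objects, relations, and probabilities.

    import random

    # Invented family data, just to give the rule something to apply to.
    father = {"Carla": "Fred"}
    mother = {"Carla": "Mary"}
    blood_type = {"Fred": "AB", "Mary": "AB"}

    def sample_blood_type(child, rng):
        """The rule: if both parents have blood type AB, the child is AB with probability 0.5."""
        f, m = father[child], mother[child]
        if blood_type[f] == "AB" and blood_type[m] == "AB":
            return "AB" if rng.random() < 0.5 else "A or B"
        return "not covered by this one rule"

    rng = random.Random(1)
    samples = [sample_blood_type("Carla", rng) for _ in range(100_000)]
    print(samples.count("AB") / len(samples))   # close to 0.5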
The combination of first-order logic and probability actually gives
us much more than just a way to express uncertain information about
lots of objects. The reason is that when we add uncertainty to worlds
containing objects, we get two new kinds of uncertainty: not just un-
certainty about which facts are true or false but also uncertainty about
what objects exist and uncertainty about which objects are which.
These kinds of uncertainty are completely pervasive. The world does
not come with a list of characters, like a Victorian play; instead, you
gradually learn about the existence of objects from observation.
Sometimes the knowledge of new objects can be fairly definite, as
when you open your hotel window and see the basilica of Sacré-Cœur
for the first time; or it can be quite indefinite, as when you feel a gen-
tle rumble that might be an earthquake or a passing subway train. And
while the identity of Sacré-Cœur is quite unambiguous, the identity
of subway trains is not: you might ride the same physical train hun-
dreds of times without ever realizing it’s the same one. Sometimes we
don’t need to resolve the uncertainty: I don’t usually name all the to-
matoes in a bag of cherry tomatoes and keep track of how well each
one is doing, unless perhaps I am recording the progress of a tomato
putrefaction experiment. For a class full of graduate students, on the
other hand, I try my best to keep track of their identities. (Once, there
were two research assistants in my group who had the same first and
last names and were of very similar appearance and worked on closely
related topics; at least, I am fairly sure there were two.) The problem
is that we directly perceive not the identity of objects but (aspects of)
their appearance; objects do not usually have little license plates that
uniquely identify them. Identity is something our minds sometimes
attach to objects for our own purposes.
The combination of probability theory with an expressive formal
language is a fairly new subfield of AI, often called probabilistic pro-
gramming.⁴ Several dozen probabilistic programming languages, or PPLs, have been developed, many of them deriving their expressive power from ordinary programming languages rather than first-order logic. All PPL systems have the capacity to represent and reason with complex, uncertain knowledge. Applications include Microsoft’s TrueSkill system, which rates millions of video game players every day; models for aspects of human cognition that were previously inexplicable by any mechanistic hypothesis, such as the ability to learn new visual categories of objects from single examples;⁵ and the global seismic monitoring for the Comprehensive Nuclear-Test-Ban Treaty (CTBT), which is responsible for detecting clandestine nuclear explosions.⁶
The CTBT monitoring system collects real-time ground move-
ment data from a global network of over 150 seismometers and aims
to identify all the seismic events occurring on Earth above a certain
magnitude and to flag the suspicious ones. Clearly there is plenty of
existence uncertainty in this problem, because we don’t know in ad-
vance the events that will occur; moreover, the vast majority of signals
in the data are just noise. There is also lots of identity uncertainty: a
blip of seismic energy detected at station A in Antarctica may or may
not come from the same event as another blip detected at station B in
Brazil. Listening to the Earth is like listening to thousands of simulta-
neous conversations that have been scrambled by transmission delays
and echoes and drowned out by crashing waves.
How do we solve this problem using probabilistic programming?
One might think we need some very clever algorithms to sort out all
the possibilities. In fact, by following the methodology of knowledge-
based systems, we don’t have to devise any new algorithms at all. We
simply use a PPL to express what we know of geophysics: how often
events tend to occur in areas of natural seismicity, how fast seismic
waves travel through the Earth and how quickly they decay, how sen-
sitive the detectors are, and how much noise there is. Then we add the
data and run a probabilistic reasoning algorithm. The resulting moni-
toring system, called NET-VISA, has been operating as part of the
treaty verification regime since 2018. Figure 19 shows NET-VISA’s
detection of a 2013 nuclear test in North Korea.
Keeping track of the world
One of the most important roles for probabilistic reasoning is in
keeping track of parts of the world that are not directly observable. In
FIGURE /RFDWLRQHVWLPDWHVIRUWKH)HEUXDU\QXFOHDUWHVWFDUULHG
out by the government of North Korea. The tunnel entrance (black cross at
ORZHUFHQWHUZDVLGHQWLILHGLQVDWHOOLWHSKRWRJUDSKV7KH1(79,6$ORFDWLRQ
HVWLPDWHLVDSSUR[LPDWHO\PHWHUVIURPWKHWXQQHOHQWUDQFHDQGLVEDVHG
SULPDULO\ RQ GHWHFWLRQV DW VWDWLRQV  WR  NLORPHWHUV DZD\ 7KH
CTBTO LEB location is the consensus estimate from expert geophysicists.
most video and board games, this is unnecessary because all the relevant
information is observable, but in the real world this is seldom the case.
An example is given by one of the first serious accidents involving a self-driving car. It occurred on South McClintock Drive at East Don Carlos Avenue in Tempe, Arizona, on March 24, 2017.⁷ As shown in figure 20, a self-driving Volvo (V), going south on McClintock, is approaching an intersection where the traffic light is just turning yellow. The Volvo’s lane is clear, so it proceeds at the same speed through the intersection. Then a currently invisible vehicle, the Honda (H) in figure 20, appears from behind the queue of stopped traffic and a
collision ensues.
To infer the possible presence of the invisible Honda, the Volvo
could gather clues as it approaches the intersection. In particular, the
traffic in the other two lanes is stopped even though the light is green;
the cars at the front of the queue are not inching forward into the in-
tersection and have their brake lights on. This is not conclusive evi-
dence of an invisible left turner, but it doesn’t need to be; even a small
FIGURE OHIW'LDJUDPRIWKHVLWXDWLRQOHDGLQJXSWRWKHDFFLGHQW7KHVHOI
GULYLQJ9ROYRPDUNHG9LVDSSURDFKLQJDQLQWHUVHFWLRQGULYLQJLQWKHULJKWPRVW
ODQHDWWKLUW\HLJKWPLOHVSHUKRXU7UDIILFLQWKHRWKHUWZRODQHVLVVWRSSHGDQG
WKHWUDIILFOLJKW/LVWXUQLQJ\HOORZ,QYLVLEOHWRWKH9ROYRD+RQGD+LVPDN-
LQJDOHIWWXUQULJKWDIWHUPDWKRIWKHDFFLGHQW
probability is enough to suggest slowing down and entering the inter-
section more cautiously.
The moral of this story is that intelligent agents operating in par-
tially observable environments have to keep track of what they can’t
see to the extent possible based on clues from what they can see.
Here’s another example closer to home: Where are your keys?
Unless you happen to be driving while reading this book (not recommended), you probably cannot see them right now. On the
other hand, you probably know where they are: in your pocket, in
your bag, on the bedside table, in the pocket of your coat, which is hanging up, or maybe on the hook in the kitchen. You know this because you put them there and they haven’t moved since. This is a
simple example of using knowledge and reasoning to keep track of the
state of the world.
Without this capability, we would be lost, often quite literally. For
example, as I write this, I am looking at the white wall of a nondescript
hotel room. Where am I? If I had to rely on my current perceptual in-
put, I would indeed be lost. In fact, I know that I am in Zürich, because
I arrived in Zürich yesterday and I haven’t left. Like humans, robots
need to know where they are so that they can navigate successfully
through rooms, buildings, streets, forests, and deserts.
In AI we use the term belief state to refer to an agents current
knowledge of the state of the world, however incomplete and uncertain it may be. Generally, the belief state—rather than the current perceptual input—is the proper basis for making decisions about what
to do. Keeping the belief state up to date is a core activity for any in-
telligent agent. For some parts of the belief state, this happens automatically: for example, I just seem to know that I’m in Zürich, without having to think about it. For other parts, it happens on de-
mand, so to speak. For example, when I wake up in a new city with
severe jet lag, halfway through a long trip, I may have to make a con-
scious effort to reconstruct where I am, what I am supposed to be
doing, and why—a bit like a laptop rebooting itself, I suppose. Keeping track doesn’t mean always knowing exactly the state of everything in the world. Obviously this is impossible—for example, I have no idea
who is occupying the other rooms in my nondescript hotel in Zürich,
let alone the present locations and activities of most of the eight billion
people on Earth. I haven’t the faintest idea what’s happening in the
rest of the universe beyond the solar system. My uncertainty about the
current state of affairs is both massive and inevitable.
The basic method for keeping track of an uncertain world is Bayes-
ian updating. Algorithms for doing this usually have two steps: a pre-
diction step, where the agent predicts the current state of the world
given its most recent action, and then an update step, where it receives
new perceptual input and updates its beliefs accordingly. To illustrate
how this works, consider the problem a robot faces in figuring out
where it is. Figure 21(a) illustrates a typical case: The robot is in the
middle of a room, with some uncertainty about its exact location, and
wants to go through the door. It commands its wheels to move 1.5
meters towards the door; unfortunately, its wheels are old and wobbly,
so the robot’s prediction about where it ends up is quite uncertain, as
shown in figure 21(b). If it tried to keep moving now, it might well
crash. Fortunately, it has a sonar device to measure the distance to the
doorposts. As figure 21(c) shows, the measurements suggest the robot
is about 70 centimeters from the left doorpost and 85 centimeters
from the right. Finally, the robot updates its belief state by combining
the prediction in (b) with the measurements in (c) to obtain the new
belief state in figure 21(d).
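A minimal Python sketch of these two steps, reduced to one dimension, might look as follows. The belief state is a cloud of samples, the prediction step adds noisy motion, and the update step reweights the samples by how well they explain the sonar reading; the doorpost position and the noise levels are invented numbers, not the ones behind figure 21.

    import math
    import random

    DOORPOST = 2.0   # assumed doorpost position along a 1-D corridor (meters)

    def predict(particles, move, motion_noise, rng):
        """Prediction step: apply the commanded motion plus wheel noise to every sample."""
        return [x + move + rng.gauss(0, motion_noise) for x in particles]

    def update(particles, measured, sensor_noise, rng):
        """Update step: weight each sample by how well it explains the sonar reading,
        then resample in proportion to those weights."""
        weights = [math.exp(-0.5 * ((DOORPOST - x - measured) / sensor_noise) ** 2)
                   for x in particles]
        return rng.choices(particles, weights=weights, k=len(particles))

    rng = random.Random(0)
    belief = [rng.gauss(0.0, 0.2) for _ in range(5000)]   # initial uncertainty about position
    belief = predict(belief, move=1.5, motion_noise=0.3, rng=rng)
    belief = update(belief, measured=0.7, sensor_noise=0.1, rng=rng)
    print(sum(belief) / len(belief))                      # estimated position after both steps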
The algorithm for keeping track of the belief state can be applied
to handle not just uncertainty about location but also uncertainty
about the map itself. This results in a technique called SLAM (simul-
taneous localization and mapping). SLAM is a core component of
many AI applications, ranging from augmented reality systems to
self-driving cars and planetary rovers.
FIGURE $URERWWU\LQJWRPRYHWKURXJKDGRRUZD\D7KHLQLWLDOEHOLHI
VWDWHWKHURERWLVVRPHZKDWXQFHUWDLQRILWVORFDWLRQLWWULHVWRPRYHPH-
WHUVWRZDUGVWKHGRRUE7KHSUHGLFWLRQVWHSWKHURERWHVWLPDWHVWKDWLWLV
FORVHUWRWKHGRRUEXWLVTXLWHXQFHUWDLQDERXWWKHGLUHFWLRQLWDFWXDOO\PRYHG
EHFDXVHLWVPRWRUVDUHROGDQGLWVZKHHOVZREEO\F7KHURERWPHDVXUHVWKH
GLVWDQFHWRHDFKGRRUSRVWXVLQJDSRRUTXDOLW\VRQDUGHYLFHWKHHVWLPDWHVDUH
FHQWLPHWHUVIURPWKHOHIWGRRUSRVWDQGFHQWLPHWHUVIURPWKHULJKWG
7KHXSGDWHVWHSFRPELQLQJWKHSUHGLFWLRQLQEZLWKWKHREVHUYDWLRQLQF
JLYHVWKHQHZEHOLHIVWDWH1RZWKHURERWKDVDSUHWW\JRRGLGHDRIZKHUHLWLV
DQGZLOOQHHGWRFRUUHFWLWVFRXUVHDELWWRJHWWKURXJKWKHGRRU
Appendix D
LEARNING FROM EXPERIENCE
Learning means improving performance based on experience. For
a visual perception system, that might mean learning to recog-
nize more categories of objects based on seeing examples of those
categories; for a knowledge-based system, simply acquiring more
knowledge is a form of learning, because it means the system can an-
swer more questions; for a lookahead decision-making system such as
AlphaGo, learning could mean improving its ability to evaluate posi-
tions or improving its ability to explore useful parts of the tree of
possibilities.
Learning from examples
The most common form of machine learning is called supervised
learning. A supervised learning algorithm is given a collection of train-
ing examples, each labeled with the correct output, and must produce
a hypothesis as to what the correct rule is. Typically, a supervised
learning system seeks to optimize the agreement between the hypoth-
esis and the training examples. Often there is also a penalty for hy-
potheses that are more complicated than necessary, as recommended
by Ockham’s razor.
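A deliberately tiny Python sketch of this idea: the hypotheses are either a constant label or a threshold rule, the training examples are invented, and the learner picks the hypothesis that minimizes disagreement with the examples plus a small penalty for the more complicated option.

    # Labeled training examples: (input, correct label). The data are invented.
    examples = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 1), (5.0, 1), (6.0, 1)]

    def errors(hypothesis):
        """How many training examples the hypothesis gets wrong."""
        kind, value = hypothesis
        predict = (lambda x: value) if kind == "constant" else (lambda x: int(x > value))
        return sum(1 for x, label in examples if predict(x) != label)

    # Two kinds of hypothesis: a constant label (simplest) or a threshold rule.
    hypotheses = [("constant", 0), ("constant", 1)] + \
                 [("threshold", t) for t in (0.5, 2.5, 3.5, 4.5)]
    complexity = {"constant": 0.0, "threshold": 0.1}   # a small Ockham-style penalty

    best = min(hypotheses, key=lambda h: errors(h) + complexity[h[0]])
    print(best)   # ('threshold', 3.5): worth the extra complexity, since it fits perfectly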
Let’s illustrate this for the problem of learning the legal moves
in Go. (If you already know the rules of Go, then at least this will
be easy to follow; if not, then you’ll be better able to sympathize
with the learning program.) Suppose the algorithm starts with the
hypothesis
for all time steps t, and for all locations l,
it is legal to play a stone at location l at time t.
It is Black’s turn to move in the position shown in figure 22. The algo-
rithm tries A: that’s fine. B and C too. Then it tries D, on top of an
existing white piece: that’s illegal. (In chess or backgammon, it would
be fine—that’s how pieces are captured.) The move at E, on top of a
black piece, is also illegal. (Illegal in chess too, but legal in backgam-
mon.) Now, from these five training examples, the algorithm might
propose the following hypothesis:
for all time steps t, and for all locations l,
if l is unoccupied at time t,
then it is legal to play a stone at location l at time t.
Then it tries F and finds to its surprise that F is illegal. After a few false
starts, it settles on the following:
FIGURE 22: Legal and illegal moves in Go. Moves A, B, and C are legal for Black, while moves D, E, and F are illegal. Move G might or might not be legal, depending on what has happened previously in the game.
for all time steps t, and for all locations l,
if l is unoccupied at time t and
l is not surrounded by opponent stones,
then it is legal to play a stone at location l at time t.
(This is sometimes called the no suicide rule.) Finally, it tries G,
which in this case turns out to be legal. After scratching its head for a
while and perhaps trying a few more experiments, it settles on the
hypothesis that G is OK, even though it is surrounded, because it
captures the white stone at D and therefore becomes un- surrounded
immediately.
As you can see from the gradual progression of rules, learning takes
place by a sequence of modifications to the hypothesis so as to fit the
observed examples. This is something a learning algorithm can do eas-
ily. Machine learning researchers have designed all sorts of ingenious
algorithms for finding good hypotheses quickly. Here the algorithm is
searching in the space of logical expressions representing Go rules, but
the hypotheses could also be algebraic expressions representing phys-
ical laws, probabilistic Bayesian networks representing diseases and
symptoms, or even computer programs representing the complicated
behavior of some other machine.
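The successive hypotheses above can be written down directly as predicates over a position. The Python sketch below shows the same progression; the board representation (a dictionary from locations to stone colors) and the neighbors function are stand-ins for a real Go implementation.

    # Stand-ins for a real Go implementation: a board is a dictionary mapping
    # (row, column) locations to "black" or "white"; empty points are simply absent.
    def neighbors(location, size=19):
        r, c = location
        return [(r + dr, c + dc) for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= r + dr < size and 0 <= c + dc < size]

    def hypothesis_1(board, location, player):
        """Any location is legal to play."""
        return True

    def hypothesis_2(board, location, player):
        """...as long as the location is unoccupied."""
        return board.get(location) is None

    def hypothesis_3(board, location, player):
        """...and not surrounded by opponent stones."""
        opponent = "white" if player == "black" else "black"
        return (board.get(location) is None and
                not all(board.get(n) == opponent for n in neighbors(location)))

    print(hypothesis_3({}, (3, 3), "black"))   # True: an empty board has no constraints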
A second important point is that even good hypotheses can be wrong:
in fact, the hypothesis given above is wrong, even after fixing it to
ensure that G is legal. It needs to include the ko or no-repetition rule: for example, if White had just captured a black stone at G by playing
at D, Black may not recapture by playing at G, since that produces the
same position again. Notice that this rule is a radical departure from
what the program has learned so far, because it means that legality
cannot be determined from the current position; instead, one also has
to remember previous positions.
The Scottish philosopher David Hume pointed out in 1748 that
inductive reasoning—that is, reasoning from particular observations to
general principles can never be guaranteed.¹ In the modern theory of
statistical learning, we ask not for guarantees of perfect correctness
but only for a guarantee that the hypothesis found is probably approx-
imately correct.² A learning algorithm can be “unlucky” and see an unrepresentative sample—for example, it might never try a move like G,
thinking it to be illegal. It can also fail to predict some weird edge
cases, such as the ones covered by some of the more complicated and
rarely invoked forms of the no-repetition rule.³ But, as long as the
universe exhibits some degree of regularity, it’s very unlikely that the
algorithm could produce a seriously bad hypothesis, because such a
hypothesis would very probably have been “found out” by one of the
experiments.
Deep learning, the technology causing all the hullabaloo about AI in the media, is primarily a form of supervised learning. It rep-
resents one of the most significant advances in AI in recent decades, so
it’s worth understanding how it works. Moreover, some researchers
believe it will lead to human-level AI systems within a few years, so
it’s a good idea to assess whether that’s likely to be true.
It’s easiest to understand deep learning in the context of a particu-
lar task, such as learning to distinguish giraffes and llamas. Given
some labeled photographs of each, the learning algorithm has to form
a hypothesis that allows it to classify unlabeled images. An image is,
from the computer’s point of view, nothing but a large table of num-
bers, with each number corresponding to one of three RGB values for
one pixel of the image. So, instead of a Go hypothesis that takes a
board position and a move as input and decides whether the move is
legal, we need a giraffe/llama hypothesis that takes a table of numbers
as input and predicts a category (giraffe or llama).
Now the question is, what sort of hypothesis? Over the last fifty-
odd years of computer vision research, many approaches have been
tried. The current favorite is a deep convolutional network. Let me un-
pack this: It’s called a network because it represents a complex mathe-
matical expression composed in a regular way from many smaller
subexpressions, and the compositional structure has the form of a net-
work. (Such networks are often called neural networks because their
designers draw inspiration from the networks of neurons in the brain.)
It’s called convolutional because that’s a fancy mathematical way to say that the network structure repeats itself in a fixed pattern across the whole input image. And it’s called deep because such networks typi-
cally have many layers, and also because it sounds impressive and
slightly spooky.
A simplified example (simplified because real networks may have
hundreds of layers and millions of nodes) is shown in figure 23. The
network is really a picture of a complex, adjustable mathematical ex-
pression. Each node in the network corresponds to a simple adjustable
expression, as illustrated in the figure. Adjustments are made by chang-
ing the weights on each input, as indicated by the “volume controls.” The
FIGURE OHIW$VLPSOLILHGGHSLFWLRQRIDGHHSFRQYROXWLRQDOQHWZRUNIRU
UHFRJQL]LQJREMHFWVLQLPDJHV7KHLPDJHSL[HOYDOXHVDUHIHGLQDWWKHOHIWDQG
WKHQHWZRUNRXWSXWVYDOXHVDWWKHWZRULJKWPRVWQRGHVLQGLFDWLQJKRZOLNHO\
the image is to be a llama or a giraffe. Notice how the pattern of local connec-
WLRQVLQGLFDWHG E\WKH GDUNOLQHVLQWKHILUVWOD\HU UHSHDWVDFURVVWKHZKROH
OD\HUULJKWRQHRIWKHQRGHVLQWKHQHWZRUN7KHUHLVDQDGMXVWDEOHZHLJKWRQ
HDFKLQFRPLQJYDOXHVRWKDWWKHQRGHSD\VPRUHRUOHVVDWWHQWLRQWRLW7KHQ
the total incoming signal goes through a gating function that allows large signals
through but suppresses small ones.
llama
girae
weighted sum of the inputs is then passed through a gating function
before reaching the output side of the node; typically, the gating func-
tion suppresses small values and allows larger ones through.
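One such node is only a few lines of code. The following Python sketch is a generic illustration rather than the particular network in figure 23; the weights are arbitrary, and the gating function used here is the common choice that passes positive signals and suppresses the rest.

    def gate(signal):
        """A simple gating function: suppress non-positive signals, pass the rest through."""
        return signal if signal > 0 else 0.0

    def node(inputs, weights):
        """One node of the network: a weighted sum of its inputs followed by the gate."""
        weighted_sum = sum(w * x for w, x in zip(weights, inputs))
        return gate(weighted_sum)

    print(node([0.2, -0.5, 0.9], [1.0, 0.3, -0.4]))   # 0.0: the weighted sum is negative
    print(node([0.2, 0.5, 0.9], [1.0, 0.3, 0.4]))     # 0.71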
Learning takes place in the network simply by adjusting all the
volume control knobs to reduce the prediction error on the labeled
examples. It’s as simple as that: no magic, no especially ingenious algo-
rithms. Working out which way to turn the knobs to decrease the er-
ror is a straightforward application of calculus to compute how
changing each weight would change the error at the output layer. This
leads to a simple formula for propagating the error backwards from
the output layer to the input layer, tweaking knobs along the way.
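The knob turning itself can be sketched in a few lines of Python. The toy example below adjusts the two weights of a single (ungated) node to reduce the squared prediction error on some invented labeled examples, moving each weight a little in the direction that calculus says will reduce the error.

    # Invented labeled examples: pairs of inputs and the target output.
    data = [((1.0, 2.0), 5.0), ((2.0, 1.0), 4.0), ((3.0, 3.0), 9.0)]

    weights = [0.0, 0.0]   # the two "volume controls", initially turned to zero
    rate = 0.05            # how far to turn the knobs on each adjustment

    for _ in range(200):
        for (x1, x2), target in data:
            prediction = weights[0] * x1 + weights[1] * x2
            error = prediction - target
            # The derivative of the squared error with respect to each weight says
            # which way to turn that knob; take a small step in the opposite direction.
            weights[0] -= rate * error * x1
            weights[1] -= rate * error * x2

    print(weights)   # close to [1.0, 2.0], which fits the invented data exactly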
Miraculously, the process works. For the task of recognizing ob-
jects in photographs, deep learning algorithms have demonstrated re-
markable performance. The first inkling of this came in the 2012
ImageNet competition, which provides training data consisting of 1.2
million labeled images in one thousand categories, and then requires
the algorithm to label one hundred thousand new images.⁴ Geoff Hin-
ton, a British computational psychologist who was at the forefront of
the first neural network revolution in the 1980s, had been experi-
menting with a very large deep convolutional network: 650,000 nodes
and 60 million parameters. He and his group at the University of To-
ronto achieved an ImageNet error rate of 15 percent, a dramatic im-
provement on the previous best of 26 percent.⁵ By 2015, dozens of
teams were using deep learning methods and the error rate was down
to 5 percent, comparable to that of a human who had spent weeks
learning to recognize the thousand categories in the test.⁶ By 2017, the
machine error rate was 2 percent.
Over roughly the same period, there have been comparable im-
provements in speech recognition and machine translation based on
similar methods. Taken together, these are three of the most import-
ant application areas for AI. Deep learning has also played an import-
ant role in applications of reinforcement learning—for example, in
learning the evaluation function that AlphaGo uses to estimate the
desirability of possible future positions, and in learning controllers for
complex robotic behaviors.
As yet, we have very little understanding as to why deep learning
works as well as it does. Possibly the best explanation is that deep net-
works are deep: because they have many layers, each layer can learn a
fairly simple transformation from its inputs to its outputs, while many
such simple transformations add up to the complex transformation re-
quired to go from a photograph to a category label. In addition, deep
networks for vision have built-in structure that enforces translation invariance and scale invariance—meaning that a dog is a dog no matter
where it appears in the image and no matter how big it appears in the
image.
Another important property of deep networks is that they often
seem to discover internal representations that capture elementary fea-
tures of images, such as eyes, stripes, and simple shapes. None of these
features are built in. We know they are there because we can experi-
ment with the trained network and see what kinds of data cause the
internal nodes (typically those close to the output layer) to light up. In
fact, it is possible to run the learning algorithm a different way so that
it adjusts the image itself to produce a stronger response at chosen
internal nodes. Repeating this process many times produces what are
now known as deep dreaming or inceptionism images, such as the one in
figure 24.⁷ Inceptionism has become an art form in itself, producing
images unlike any human art.
For all their remarkable achievements, deep learning systems as we
currently understand them are far from providing a basis for generally
intelligent systems. Their principal weakness is that they are circuits;
they are cousins of propositional logic and Bayesian networks, which,
for all their wonderful properties, also lack the ability to express com-
plex forms of knowledge in a concise way. This means that deep
networks operating in “native mode” require vast amounts of circuitry
to represent fairly simple kinds of general knowledge. That, in turn,
implies vast numbers of weights to learn and hence a need for unreason-
able numbers of examples—more than the universe could ever supply.
Some argue that the brain is also made of circuits, with neurons as the circuit elements; therefore, circuits can support human-level intelligence. This is true, but only in the same sense that brains are made of atoms: atoms can indeed support human-level intelligence, but that doesn’t mean that just collecting together lots of atoms will produce
intelligence. The atoms have to be arranged in certain ways. By the
same token, the circuits have to be arranged in certain ways. Comput-
ers are also made of circuits, both in their memories and in their
processing units; but those circuits have to be arranged in certain
ways, and layers of software have to be added, before the computer
can support the operation of high-level programming languages and
logical reasoning systems. At present, however, there is no sign that
deep learning systems can develop such capabilities by themselves—
nor does it make scientific sense to require them to do so.
FIGURE $QLPDJHJHQHUDWHGE\*RRJOH·V'HHS'UHDPVRIWZDUH
There are further reasons to think that deep learning may reach a
plateau well short of general intelligence, but it’s not my purpose here
to diagnose all the problems: others, both inside⁸ and outside⁹ the
deep learning community, have noted many of them. The point is that
simply creating larger and deeper networks and larger data sets and
bigger machines is not enough to create human-level AI. We have already seen (in Appendix B) DeepMind CEO Demis Hassabis’s view that “higher-level thinking and symbolic reasoning” are essential for AI. Another prominent deep learning expert, François Chollet, put it this way:¹⁰
Many more applications are completely out of reach for current deep learning techniques even given vast amounts of human-annotated data. . . . We need to move away from straightforward input-to-output mappings, and on to reasoning and abstraction.
Learning from thinking
Whenever you find yourself having to think about something, it’s
because you don’t already know the answer. When someone asks for
the number of your brand-new cell phone, you probably don’t know it. You think to yourself, “OK, I don’t know it; so how do I find it?” Not
being a slave to the cell phone, you don’t know how to find it. You
think to yourself, “How do I figure out how to find it?” You have a
generic answer to this: “Probably they put it somewhere that’s easy for
users to find.” (Of course, you could be wrong about this.) Obvious
places would be at the top of the home screen (not there), inside the
Phone app, or in Settings for that app. You try Settings>Phone, and
there it is.
The next time you are asked for your number, you either know it
or you know exactly how to get it. You remember the procedure, not
just for this phone on this occasion but for all similar phones on all
occasions—that is, you store and reuse a generalized solution to the
problem. The generalization is justified because you understand that
the specifics of this particular phone and this particular occasion are
irrelevant. You would be shocked if the method worked only on Tues-
days for phone numbers ending in 17.
Go offers a beautiful example of the same kind of learning. In
figure 25(a), we see a common situation where Black threatens to cap-
ture White’s stone by surrounding it. White attempts to escape by
adding stones connected to the original one, but Black continues to
cut off the routes of escape. This pattern of moves forms a ladder of
stones diagonally across the board, until it runs into the edge; then
White has nowhere to go. If you are White, you probably won’t make
the same mistake again: you realize that the ladder pattern always
results in eventual capture, for any initial location and any direction,
at any stage of the game, whether you are playing White or Black. The
only exception occurs when the ladder runs into some additional
stones belonging to the escapee. The generality of the ladder pattern
follows straightforwardly from the rules of Go.
The case of the missing phone number and the case of the Go lad-
der illustrate the possibility of learning effective, general rules from a
single example, a far cry from the millions of examples needed for deep learning. In AI, this kind of learning is called explanation-based
FIGURE 25: The concept of a ladder in Go. (a) Black threatens to capture White’s piece. (b) White tries to escape. (c) Black blocks that direction of escape. (d) White tries the other direction. (e) Play continues in the sequence indicated by the numbers. The ladder eventually reaches the edge of the board, where White has nowhere to run. The coup de grâce is administered by move [...]: White’s group is completely surrounded and dies.
learning: on seeing the example, the agent can explain to itself why it
came out that way and can extract the general principle by seeing
what factors were essential for the explanation.
Strictly speaking, the process does not, by itself, add new knowl-
edge—for example, White could have simply derived the existence
and outcome of the general ladder pattern from the rules of Go, with-
out ever seeing an example.¹¹ Chances are, however, that White wouldn’t
ever discover the ladder concept without seeing an example of it; so,
we can understand explanation-based learning as a powerful method
for saving the results of computation in a generalized way, so as to
avoid having to recapitulate the same reasoning process (or making
the same mistake with an imperfect reasoning process) in the future.
Research in cognitive science has stressed the importance of this
type of learning in human cognition. Under the name of chunking, it
forms a central pillar of Allen Newell’s highly influential theory of cognition.¹² (Newell was one of the attendees of the 1956 Dartmouth
workshop and co-winner of the 1975 Turing Award with Herb Si-
mon.) It explains how humans become more fluent at cognitive tasks
with practice, as various subtasks that originally required thinking be-
come automatic. Without it, human conversations would be limited to
one- or two-word responses and mathematicians would still be count-
ing on their fingers.
Acknowledgments
Many people have helped in the creation of this book. They include
my excellent editors at Viking (Paul Slovak) and Penguin (Laura
Stickney); my agent, John Brockman, who encouraged me to write
something; Jill Leovy and Rob Reid, who provided reams of useful
feedback; and other readers of early drafts, especially Ziyad Marar,
Nick Hay, Toby Ord, David Duvenaud, Max Tegmark, and Grace
Cassy. Caroline Jeanmaire was immensely helpful in collating the in-
numerable suggestions for improvements made by the early readers,
and Martin Fukui handled the collecting of permissions for images.
The main technical ideas in the book have been developed in col-
laboration with the members of the Center for Human-Compatible
AI at Berkeley, especially Tom Griffiths, Anca Dragan, Andrew
Critch, Dylan Hadfield-Menell, Rohin Shah, and Smitha Milli. The
Center has been admirably piloted by executive director Mark Nitz-
berg and assistant director Rosie Campbell, and generously funded by
the Open Philanthropy Foundation.
Ramona Alvarez and Carine Verdeau helped to keep things run-
ning throughout the process, and my incredible wife, Loy, and our
children (Gordon, Lucy, George, and Isaac) supplied copious and
necessary amounts of love, forbearance, and encouragement to finish,
not always in that order.
Notes
CHAPTER 1
1. The first edition of my textbook on AI, co- authored with Peter Norvig, currently di-
rector of research at Google: Stuart Russell and Peter Norvig, Artificial Intelligence: A
Modern Approach, 1st ed. (Prentice Hall, 1995).
2. Robinson developed the resolution algorithm, which can, given enough time, prove any
logical consequence of a set of first-order logical assertions. Unlike previous algorithms, it did not require conversion to propositional logic. J. Alan Robinson, “A machine-oriented logic based on the resolution principle,” Journal of the ACM 12 (1965): 23–41.
3. Arthur Samuel, an American pioneer of the computer era, did his early work at IBM.
The paper describing his work on checkers was the first to use the term machine learn-
ing, although Alan Turing had already talked about “a machine that can learn from ex-
perience” as early as 1947. Arthur Samuel, “Some studies in machine learning using the
game of checkers,” IBM Journal of Research and Development 3 (1959): 210–29.
4. The “Lighthill Report,” as it became known, led to the termination of research funding
for AI except at the universities of Edinburgh and Sussex: Michael James Lighthill,
Artificial intelligence: A general survey,” in Artificial Intelligence: A Paper Symposium
(Science Research Council of Great Britain, 1973).
5. The CDC 6600 filled an entire room and cost the equivalent of $20 million. For its era
it was incredibly powerful, albeit a million times less powerful than an iPhone.
6. Following Deep Blue’s victory over Kasparov, at least one commentator predicted
that it would take one hundred years before the same thing happened in Go: George
Johnson, “To test a powerful computer, play an ancient game,” The New York Times,
July 29, 1997.
7. For a highly readable history of the development of nuclear technology, see Richard
Rhodes, The Making of the Atomic Bomb (Simon & Schuster, 1987).
8. A simple supervised learning algorithm may not have this effect, unless it is wrapped
within an A/B testing framework (as is common in online marketing settings). Bandit
algorithms and reinforcement learning algorithms will have this effect if they operate
with an explicit representation of user state or an implicit representation in terms of
the history of interactions with the user.
9. Some have argued that profit-maximizing corporations are already out-of-control ar-
tificial entities. See, for example, Charles Stross, “Dude, you broke the future!” (key-
note, 34th Chaos Communications Congress, 2017). See also Ted Chiang, “Silicon
Valley is turning into its own worst fear,” Buzzfeed, December 18, 2017. The idea is
explored further by Daniel Hillis, “The first machine intelligences,” in Possible Minds:
Twenty-Five Ways of Looking at AI, ed. John Brockman (Penguin Press, 2019).
10. For its time, Wiener’s paper was a rare exception to the prevailing view that all tech-
nological progress was a good thing: Norbert Wiener, “Some moral and technical con-
sequences of automation,Science 131 (1960): 1355 58.
CHAPTER 2
1. Santiago Ramón y Cajal proposed synaptic changes as the site of learning in 1894, but
it was not until the late 1960s that this hypothesis was confirmed experimentally. See
Timothy Bliss and Terje Lomo, “Long-lasting potentiation of synaptic transmission in
the dentate area of the anaesthetized rabbit following stimulation of the perforant
path,Journal of Physiology 232 (1973): 331– 56.
2. For a brief introduction, see James Gorman, “Learning how little we know about the
brain,The New York Times, November 10, 2014. See also Tom Siegfried, “There’s a
long way to go in understanding the brain,” ScienceNews, July 25, 2017. A special 2017
issue of the journal Neuron (vol. 94, pp. 933–1040) provides a good overview of many
different approaches to understanding the brain.
3. The presence or absence of consciousness—actual subjective experience—certainly
makes a difference in our moral consideration for machines. If ever we gain enough
understanding to design conscious machines or to detect that we have done so, we
would face many important moral issues for which we are largely unprepared.
4. The following paper was among the first to make a clear connection between re-
inforcement learning algorithms and neurophysiological recordings: Wolfram Schultz,
Peter Dayan, and P. Read Montague, “A neural substrate of prediction and reward,”
Science 275 (1997): 1593 99.
5. Studies of intracranial stimulation were carried out with the hope of finding cures for
various mental illnesses. See, for example, Robert Heath, “Electrical self- stimulation
of the brain in man,” American Journal of Psychiatry 120 (1963): 571 77.
6. An example of a species that may be facing self- extinction via addiction: Bryson Voi-
rin, “Biology and conservation of the pygmy sloth, Bradypus pygmaeus,” Journal of
Mammalogy 96 (2015): 703 7.
7. The Baldwin effect in evolution is usually attributed to the following paper: James
Baldwin, “A new factor in evolution,American Naturalist 30 (1896): 441– 51.
8. The core idea of the Baldwin effect also appears in the following work: Conwy Lloyd
Morgan, Habit and Instinct (Edward Arnold, 1896).
9. A modern analysis and computer implementation demonstrating the Baldwin effect:
Geoffrey Hinton and Steven Nowlan, “How learning can guide evolution,” Complex
Systems 1 (1987): 495–502.
10. Further elucidation of the Baldwin effect by a computer model that includes the evolution of the internal reward-signaling circuitry: David Ackley and Michael Littman,
Interactions between learning and evolution,” in Artificial Life II, ed. Christopher
Langton et al. ( Addison- Wesley, 1991).
11. Here I am pointing to the roots of our present-day concept of intelligence, rather than
describing the ancient Greek concept of nous, which had a variety of related meanings.
12. The quotation is taken from Aristotle, Nicomachean Ethics, Book III, 3, 1112b.
13. Cardano, one of the first European mathematicians to consider negative numbers,
developed an early mathematical treatment of probability in games. He died in 1576,
eighty-seven years before his work appeared in print: Gerolamo Cardano, Liber de ludo
aleae (Lyons, 1663).
14. Arnaulds work, initially published anonymously, is often called The Port- Royal Logic:
Antoine Arnauld, La logique, ou lart de penser (Chez Charles Savreux, 1662). See also
Blaise Pascal, Pensées (Chez Guillaume Desprez, 1670).
15. The concept of utility: Daniel Bernoulli, “Specimen theoriae novae de mensura sortis,”
Proceedings of the St. Petersburg Imperial Academy of Sciences 5 (1738): 175–92. Bernoul-
li’s idea of utility arises from considering a merchant, Sempronius, choosing whether to
transport a valuable cargo in one ship or to split it between two, assuming that each ship
has a 50 percent probability of sinking on the journey. The expected monetary value of
the two solutions is the same, but Sempronius clearly prefers the two- ship solution.
16. By most accounts, von Neumann did not himself invent this architecture but his
name was on an early draft of an influential report describing the EDVAC stored-
program computer.
17. The work of von Neumann and Morgenstern is in many ways the foundation of mod-
ern economic theory: John von Neumann and Oskar Morgenstern, Theory of Games
and Economic Behavior (Princeton University Press, 1944).
18. The proposal that utility is a sum of discounted rewards was put forward as a mathe-
matically convenient hypothesis by Paul Samuelson,A note on measurement of util-
ity,” Review of Economic Studies 4 (1937): 155–61. If s₀, s₁, . . . is a sequence of states, then its utility in this model is U(s₀, s₁, . . .) = Σₜ Kᵗ R(sₜ), where K is a discount factor and R is a reward function describing the desirability of a state. Naïve application of this
model seldom agrees with the judgment of real individuals about the desirability of
present and future rewards. For a thorough analysis, see Shane Frederick, George Loe-
wenstein, and Ted O’Donoghue, “Time discounting and time preference: A critical
review,” Journal of Economic Literature 40 (2002): 351– 401.
19. Maurice Allais, a French economist, proposed a decision scenario in which humans
appear consistently to violate the von Neumann–Morgenstern axioms: Maurice Allais,
Le comportement de l’homme rationnel devant le risque: Critique des postulats et
axiomes de l’école américaine,” Econometrica 21 (1953): 503–46.
20. For an introduction to non-quantitative decision analysis, see Michael Wellman, “Fundamental concepts of qualitative probabilistic networks,” Artificial Intelligence 44 (1990): 257–303.
21. I will discuss the evidence for human irrationality further in Chapter 9. The standard
references include the following: Allais, “Le comportement”; Daniel Ellsberg, Risk,
Ambiguity, and Decision (PhD thesis, Harvard University, 1962); Amos Tversky and
Daniel Kahneman, “Judgment under uncertainty: Heuristics and biases,” Science 185
(1974): 1124 31.
22. It should be clear that this is a thought experiment that cannot be realized in practice.
Choices about different futures are never presented in full detail, and humans never
have the luxury of minutely examining and savoring those futures before choosing.
Instead, one is given only brief summaries, such as “librarian” or “coal miner.” In mak-
ing such a choice, one is really being asked to compare two probability distributions
over complete futures, one beginning with the choice “librarian” and the other “coal
miner,” with each distribution assuming optimal actions on one’s own part within
each future. Needless to say, this is not easy.
23. The first mention of a randomized strategy for games appears in Pierre Rémond de
Montmort, Essay d’analyse sur les jeux de hazard, 2nd ed. (Chez Jacques Quillau,
1713). The book identifies a certain Monsieur de Waldegrave as the source of an opti-
mal randomized solution for the card game Le Her. Details of Waldegrave’s identity
are revealed by David Bellhouse, “The problem of Waldegrave,” Electronic Journal for
History of Probability and Statistics 3 (2007).
24. The problem is fully defined by specifying the probability that Alice scores in each of
four cases: when she shoots to Bob’s right and he dives right or left, and when she shoots
to his left and he dives right or left. In this case, these probabilities are 25 percent, 70
percent, 65 percent, and 10 percent respectively. Now suppose that Alice’s strategy is to
shoot to Bob’s right with probability p and his left with probability 1 − p, while Bob dives
to his right with probability q and left with probability 1 − q. The payoff to Alice is
U_A = 0.25pq + 0.70p(1 − q) + 0.65(1 − p)q + 0.10(1 − p)(1 − q), while Bob’s payoff is
U_B = −U_A. At equilibrium, ∂U_A/∂p = 0 and ∂U_B/∂q = 0, giving p = 0.55 and q = 0.60.
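These values are easy to verify numerically. A short Python sketch (not from the book) checks that at q = 0.60 Alice’s payoff no longer depends on p, and at p = 0.55 it no longer depends on q, which is the defining property of a mixed equilibrium.

    # Alice's payoff as defined in the note, with p = P(Alice shoots right)
    # and q = P(Bob dives right).
    def U_A(p, q):
        return 0.25*p*q + 0.70*p*(1 - q) + 0.65*(1 - p)*q + 0.10*(1 - p)*(1 - q)

    # At the equilibrium each player is indifferent between their pure actions.
    print(round(U_A(0.0, 0.60), 2), round(U_A(1.0, 0.60), 2))   # 0.43 0.43: p is irrelevant at q = 0.60
    print(round(U_A(0.55, 0.0), 2), round(U_A(0.55, 1.0), 2))   # 0.43 0.43: q is irrelevant at p = 0.55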
25. The original game- theoretic problem was introduced by Merrill Flood and Melvin
Dresher at the RAND Corporation; Tucker saw the payoff matrix on a visit to their
offices and proposed a “story” to go along with it.
26. Game theorists typically say that Alice and Bob could cooperate with each other (re-
fuse to talk) or defect and rat on their accomplice. I find this language confusing, be-
cause “cooperate with each other” is not a choice that each agent can make separately,
and because in common parlance one often talks about cooperating with the police,
receiving a lighter sentence in return for cooperating, and so on.
27. For an interesting trust- based solution to the prisoners dilemma and other games, see
Joshua Letchford, Vincent Conitzer, and Kamal Jain, “An ‘ethicalgame- theoretic
solution concept for two- player perfect- information games,” in Proceedings of the 4th
International Workshop on Web and Internet Economics, ed. Christos Papadimitriou and
Shuzhong Zhang (Springer, 2008).
28. Origin of the tragedy of the commons: William Forster Lloyd, Two Lectures on the
Checks to Population (Oxford University, 1833).
29. Modern revival of the topic in the context of global ecology: Garrett Hardin, “The
tragedy of the commons,” Science 162 (1968): 1243 48.
30. It’s quite possible that even if we had tried to build intelligent machines from chemical
reactions or biological cells, those assemblages would have turned out to be implemen-
tations of Turing machines in nontraditional materials. Whether an object is a general-
purpose computer has nothing to do with what it’s made of.
31. Turing’s breakthrough paper defined what is now known as the Turing machine, the
basis for modern computer science. The Entscheidungsproblem, or decision problem, in
the title is the problem of deciding entailment in first- order logic: Alan Turing, “On
computable numbers, with an application to the Entscheidungsproblem,” Proceedings of
the London Mathematical Society, 2nd ser., 42 (1936): 230 65.
32. A good survey of research on negative capacitance by one of its inventors: Sayeef Sala-
huddin, “Review of negative capacitance transistors,” in International Symposium on
VLSI Technology, Systems and Application (IEEE Press, 2016).
33. For a much better explanation of quantum computation, see Scott Aaronson, Quan-
tum Computing since Democritus (Cambridge University Press, 2013).
34. The paper that established a clear complexity- theoretic distinction between classical
and quantum computation: Ethan Bernstein and Umesh Vazirani, “Quantum com-
plexity theory,” SIAM Journal on Computing 26 (1997): 1411– 73.
35. The following article by a renowned physicist provides a good introduction to the
current state of understanding and technology: John Preskill, “Quantum computing in
the NISQ era and beyond,” arXiv:1801.00862 (2018).
36. On the maximum computational ability of a one- kilogram object: Seth Lloyd, “Ulti-
mate physical limits to computation,” Nature 406 (2000): 1047– 54.
37. For an example of the suggestion that humans may be the pinnacle of physically
achievable intelligence, see Kevin Kelly, “The myth of a superhuman AI,” Wired, April
25, 2017: “We tend to believe that the limit is way beyond us, way ‘above’ us, as we are
above’ an ant.... What evidence do we have that the limit is not us?
38. In case you are wondering about a simple trick to solve the halting problem: the obvi-
ous method of just running the program to see if it finishes doesn’t work, because that
method doesnt necessarily finish. You might wait a million years and still not know if
the program is really stuck in an infinite loop or just taking its time.
39. The proof that the halting problem is undecidable is an elegant piece of trickery. The
question: Is there a LoopChecker(P,X) program that, for any program P and any input
X, decides correctly, in finite time, whether P applied to input
X will halt and produce
a result or keep chugging away forever? Suppose that LoopChecker exists. Now write
a program Q that calls LoopChecker as a subroutine, with Q itself and X as inputs, and
then does the opposite of what LoopChecker(Q,X) predicts. So, if LoopChecker says
that Q halts, Q doesn’t halt, and vice versa. Thus, the assumption that LoopChecker
exists leads to a contradiction, so LoopChecker cannot exist.
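The construction can be written out as a few lines of necessarily hypothetical Python; LoopChecker is left unimplemented because, as the note concludes, no correct implementation can exist.

    # Hypothetical: assume LoopChecker(P, X) always returns True exactly when
    # program P halts on input X, and always returns in finite time.
    def LoopChecker(P, X):
        ...  # assumed to exist for the sake of argument

    def Q(X):
        # Q asks LoopChecker about itself, then does the opposite.
        if LoopChecker(Q, X):
            while True:      # predicted to halt, so loop forever
                pass
        else:
            return "halted"  # predicted to run forever, so halt immediately

    # Either way, LoopChecker(Q, X) is wrong about Q, so it cannot exist.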
40. I say “appear” because, as yet, the claim that the class of NP- complete problems re-
quires superpolynomial time (usually referred to as P ≠ NP) is still an unproven con-
jecture. After almost fifty years of research, however, nearly all mathematicians and
computer scientists are convinced the claim is true.
41. Lovelace’s writings on computation appear mainly in her notes attached to her trans-
lation of an Italian engineer’s commentary on Babbage’s engine: L. F. Menabrea,
“Sketch of the Analytical Engine invented by Charles Babbage,” trans. Ada, Countess
of Lovelace, in Scientific Memoirs, vol. III, ed. R. Taylor (R. and J. E. Taylor, 1843).
Menabrea’s original article, written in French and based on lectures given by Babbage
in 1840, appears in Bibliothèque Universelle de Genève 82 (1842).
42. One of the seminal early papers on the possibility of artificial intelligence: Alan Tur-
ing, “Computing machinery and intelligence,” Mind 59 (1950): 433 60.
43. The Shakey project at SRI is summarized in a retrospective by one of its leaders: Nils
Nilsson, “Shakey the robot,” technical note 323 (SRI International, 1984). A twenty-
four- minute film, SHAKEY: Experimentation in Robot Learning and Planning, was
made in 1969 and garnered national attention.
44. The book that marked the beginning of modern, probability- based AI: Judea Pearl,
Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference (Morgan
Kaufmann, 1988).
45. Technically, chess is not fully observable. A program does need to remember a small
amount of information to determine the legality of castling and en passant moves and
to define draws by repetition or by the fifty- move rule.
46. For a complete exposition, see Chapter 2 of Stuart Russell and Peter Norvig, Artificial
Intelligence: A Modern Approach, 3rd ed. (Pearson, 2010).
47. The size of the state space for StarCraft is discussed by Santiago Ontañon et al., “A
survey of real- time strategy game AI research and competition in StarCraft,” IEEE
Transactions on Computational Intelligence and AI in Games 5 (2013): 293 311. Vast
numbers of moves are possible because a player can move all units simultaneously. The
numbers go down as restrictions are imposed on how many units or groups of units can
be moved at once.
48. On human machine competition in StarCraft: Tom Simonite, “DeepMind beats pros
at StarCraft in another triumph for bots,” Wired, January 25, 2019.
49. AlphaZero is described by David Silver et al., “Mastering chess and shogi by self- play
with a general reinforcement learning algorithm,” arXiv:1712.01815 (2017).
50. Optimal paths in graphs are found using the A* algorithm and its many descendants:
Peter Hart, Nils Nilsson, and Bertram Raphael, “A formal basis for the heuristic deter-
mination of minimum cost paths,” IEEE Transactions on Systems Science and Cybernetics
SSC- 4 (1968): 100 107.
51. The paper that introduced the Advice Taker program and logic- based knowledge sys-
tems: John McCarthy, “Programs with common sense,” in Proceedings of the Symposium
on Mechanisation of Thought Processes (Her Majesty’s Stationery Office, 1958).
52. To get some sense of the significance of knowledge- based systems, consider database
systems. A database contains concrete, individual facts, such as the location of my keys
and the identities of your Facebook friends. Database systems cannot store general
rules, such as the rules of chess or the legal definition of British citizenship. They
can count how many people called Alice have friends called Bob, but they cannot
determine whether a particular Alice meets the conditions for British citizenship
or whether a particular sequence of moves on a chessboard will lead to checkmate.
Database systems cannot combine two pieces of knowledge to produce a third: they
support memory but not reasoning. (It is true that many modern database systems
provide a way to add rules and a way to use those rules to derive new facts; to the
extent that they do, they are really knowledge- based systems.) Despite being highly
constricted versions of knowledge- based systems, database systems underlie most of
present- day commercial activity and generate hundreds of billions of dollars in value
every year.
53. The original paper describing the completeness theorem for first- order logic: Kurt
Gödel, “Die Vollständigkeit der Axiome des logischen Funktionenkalküls,” Monats-
hefte für Mathematik 37 (1930): 349–60.
54. The reasoning algorithm for first- order logic does have a gap: if there is no answer—
that is, if the available knowledge is insufficient to give an answer either way— then
the algorithm may never finish. This is unavoidable: it is mathematically impossible for
a correct algorithm always to terminate with “don’t know,” for essentially the same
reason that no algorithm can solve the halting problem (page 37).
55. The first algorithm for theorem- proving in first- order logic worked by reducing first-
order sentences to (very large numbers of) propositional sentences: Martin Davis and
Hilary Putnam, “A computing procedure for quantification theory,” Journal of the
ACM 7 (1960): 201– 15. Robinson’s resolution algorithm operated directly on first- order
logical sentences, using “unification” to match complex expressions containing logical
variables: J. Alan Robinson, “A machine- oriented logic based on the resolution princi-
ple,” Journal of the ACM 12 (1965): 23 41.
56. One might wonder how Shakey the logical robot ever reached any definite conclusions
about what to do. The answer is simple: Shakey’s knowledge base contained false as-
sertions. For example, Shakey believed that by executing “push object A through door
D into room B,” object A would end up in room B. This belief was false because Shakey
could get stuck in the doorway or miss the doorway altogether or someone might
sneakily remove object A from Shakey’s grasp. Shakey’s plan execution module could
detect plan failure and replan accordingly, so Shakey was not, strictly speaking, a
purely logical system.
57. An early commentary on the role of probability in human thinking: Pierre- Simon La-
place, Essai philosophique sur les probabilités (Mme. Ve. Courcier, 1814).
58. Bayesian logic described in a fairly nontechnical way: Stuart Russell, “Unifying logic
and probability,Communications of the ACM 58 (2015): 88 97. The paper draws
heavily on the PhD thesis research of my former student Brian Milch.
59. The original source for Bayes’ theorem: Thomas Bayes and Richard Price, “An essay
towards solving a problem in the doctrine of chances,” Philosophical Transactions of the
Royal Society of London 53 (1763): 370 418.
60. Technically, Samuel’s program did not treat winning and losing as absolute rewards; by
fixing the value of material to be positive, however, the program generally tended to
work towards winning.
61. The application of reinforcement learning to produce a world- class backgammon pro-
gram: Gerald Tesauro,Temporal difference learning and TD- Gammon,Communica-
tions of the ACM 38 (1995): 58 68.
62. The DQN system that learns to play a wide variety of video games using deep RL:
Volodymyr Mnih et al.,Human- level control through deep reinforcement learning,
Nature 518 (2015): 529 33.
63. Bill Gates’s remarks on Dota 2 AI: Catherine Clifford, “Bill Gates says gamer bots
from Elon Musk- backed nonprofit are ‘huge milestone’ in A.I.,” CNBC, June 28,
2018.
64. An account of OpenAI Five’s victory over the human world champions at Dota 2:
Kelsey Piper, “AI triumphs against the worlds top pro team in strategy game Dota 2,”
Vox, April 13, 2019.
65. A compendium of cases in the literature where misspecification of reward functions
led to unexpected behavior: Victoria Krakovna, “Specification gaming examples in
AI, Deep Safety (blog), April 2, 2018.
66. A case where an evolutionary fitness function defined in terms of maximum velocity
led to very unexpected results: Karl Sims, “Evolving virtual creatures,” in Proceed-
ings of the 21st Annual Conference on Computer Graphics and Interactive Techniques
(ACM, 1994).
67. For a fascinating exposition of the possibilities of reflex agents, see Valentino Braiten-
berg, Vehicles: Experiments in Synthetic Psychology (MIT Press, 1984).
68. News article on a fatal accident involving a vehicle in autonomous mode that hit a
pedestrian: Devin Coldewey, “Uber in fatal crash detected pedestrian but had emer-
gency braking disabled,” TechCrunch, May 24, 2018.
69. On steering control algorithms, see, for example, Jarrod Snider, “Automatic steering
methods for autonomous automobile path tracking,” technical report CMU- RI- TR-
09- 08, Robotics Institute, Carnegie Mellon University, 2009.
70. Norfolk and Norwich terriers are two categories in the ImageNet database. They are
notoriously hard to tell apart and were viewed as a single breed until 1964.
71. A very unfortunate incident with image labeling: Daniel Howley, “Google Photos mis-
labels 2 black Americans as gorillas,” Yahoo Tech, June 29, 2015.
72. Follow- up article on Google and gorillas: Tom Simonite, “When it comes to gorillas,
Google Photos remains blind,” Wired, January 11, 2018.
CHAPTER 3
1. The basic plan for game- playing algorithms was laid out by Claude Shannon, “Pro-
gramming a computer for playing chess,” Philosophical Magazine, 7th ser., 41 (1950):
256 75.
2. See figure 5.12 of Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern
Approach, 1st ed. (Prentice Hall, 1995). Note that the rating of chess players and chess
programs is not an exact science. Kasparov’s highest- ever Elo rating was 2851, achieved
in 1999, but current chess engines such as Stockfish are rated at 3300 or more.
3. The earliest reported autonomous vehicle on a public road: Ernst Dickmanns and Al-
fred Zapp, “Autonomous high speed road vehicle guidance by computer vision,” IFAC
Proceedings Volumes 20 (1987): 221– 26.
4. The safety record for Google (subsequently Waymo) vehicles: “Waymo safety report:
On the road to fully self- driving,” 2018.
5. So far there have been at least two driver fatalities and one pedestrian fatality. Some
references follow, along with brief quotes describing what happened. Danny Yadron
and Dan Tynan, “Tesla driver dies in first fatal crash while using autopilot mode,”
Guardian, June 30, 2016: “The autopilot sensors on the Model S failed to distinguish
a white tractor-trailer crossing the highway against a bright sky.” Megan Rose Dickey,
“Tesla Model X sped up in Autopilot mode seconds before fatal crash, according to
NTSB,” TechCrunch, June 7, 2018: “At 3 seconds prior to the crash and up to the
time of impact with the crash attenuator, the Tesla’s speed increased from 62 to 70.8
mph, with no precrash braking or evasive steering movement detected.” Devin
Coldewey, “Uber in fatal crash detected pedestrian but had emergency braking dis-
abled,” TechCrunch, May 24, 2018: “Emergency braking maneuvers are not enabled
while the vehicle is under computer control, to reduce the potential for erratic vehicle
behavior.”
6. The Society of Automotive Engineers (SAE) defines six levels of automation, where
Level 0 is none at all and Level 5 is full automation: “The full- time performance by an
automatic driving system of all aspects of the dynamic driving task under all roadway
and environmental conditions that can be managed by a human driver.
7. Forecast of economic effects of automation on transportation costs: Adele Peters, “It
could be 10 times cheaper to take electric robo- taxis than to own a car by 2030,” Fast
Company, May 30, 2017.
8. The impact of accidents on the prospects for regulatory action on autonomous vehi-
cles: Richard Waters, “ Self- driving car death poses dilemma for regulators,Financial
Times, March 20, 2018.
9. The impact of accidents on public perception of autonomous vehicles: Cox Automo-
tive, “Autonomous vehicle awareness rising, acceptance declining, according to Cox
Automotive mobility study,” August 16, 2018.
10. The original chatbot: Joseph Weizenbaum, “ ELIZA— a computer program for the
study of natural language communication between man and machine,” Communica-
tions of the ACM 9 (1966): 36 45.
11. See physiome.org for current activities in physiological modeling. Work in the 1960s
assembled models with thousands of differential equations: Arthur Guyton, Thomas
Coleman, and Harris Granger, “Circulation: Overall regulation,” Annual Review of
Physiology 34 (1972): 13 44.
12. Some of the earliest work on tutoring systems was done by Pat Suppes and colleagues
at Stanford: Patrick Suppes and Mona Morningstar, “ Computer- assisted instruction,
Science 166 (1969): 343– 50.
13. Michael Yudelson, Kenneth Koedinger, and Geoffrey Gordon, “Individualized Bayes-
ian knowledge tracing models,” in Artificial Intelligence in Education: 16th International
Conference, ed. H. Chad Lane et al. (Springer, 2013).
14. For an example of machine learning on encrypted data, see, for example, Reza
Shokri and Vitaly Shmatikov, “ Privacy- preserving deep learning,” in Proceedings of the
22nd ACM SIGSAC Conference on Computer and Communications Security
(ACM,
2015).
15. A retrospective on the first smart home, based on a lecture by its inventor, James
Sutherland: James E. Tomayko, “Electronic Computer for Home Operation (ECHO):
The first home computer,” IEEE Annals of the History of Computing 16 (1994): 59 61.
16. Summary of a smart- home project based on machine learning and automated deci-
sions: Diane Cook et al., “MavHome: An agent- based smart home,” in Proceedings of the
1st IEEE International Conference on Pervasive Computing and Communications (IEEE,
2003).
17. For the beginnings of an analysis of user experiences in smart homes, see Scott Da-
vidoff et al., “Principles of smart home control,” in Ubicomp 2006: Ubiquitous Comput-
ing, ed. Paul Dourish and Adrian Friday (Springer, 2006).
18. Commercial announcement of AI- based smart homes: “The Wolff Company unveils
revolutionary smart home technology at new Annadel Apartments in Santa Rosa, Cal-
ifornia,” Business Insider, March 12, 2018.
19. Article on robot chefs as commercial products: Eustacia Huen, “The worlds first
home robotic chef can cook over 100 meals,” Forbes, October 31, 2016.
20. Report from my Berkeley colleagues on deep RL for robotic motor control: Sergey
Levine et al., “End- to- end training of deep visuomotor policies,” Journal of Machine
Learning Research 17 (2016): 1– 40.
21. On the possibilities for automating the work of hundreds of thousands of warehouse
workers: Tom Simonite, “Grasping robots compete to rule Amazon’s warehouses,”
Wired, July 26, 2017.
22. I’m assuming a generous one laptop-CPU minute per page, or about 10^11 operations. A
third-generation tensor processing unit from Google runs at about 10^17 operations per
second, meaning that it can read a million pages per second, or about five hours for
eighty million two-hundred-page books.
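The arithmetic behind the five-hour figure, as a short Python sketch using the order-of-magnitude numbers assumed in the note:

    ops_per_page = 1e11           # one generous laptop-CPU minute per page
    tpu_ops_per_second = 1e17     # third-generation TPU, roughly
    pages = 80e6 * 200            # eighty million two-hundred-page books

    pages_per_second = tpu_ops_per_second / ops_per_page   # one million pages per second
    hours = pages / pages_per_second / 3600
    print(pages_per_second, round(hours, 1))                # 1000000.0 4.4, i.e. about five hours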
23. A 2003 study on the global volume of information production by all channels: Peter
Lyman and Hal Varian, “How much information?” sims.berkeley.edu/ research/ projects
/ how- much- info- 2003.
24. For details on the use of speech recognition by intelligence agencies, see Dan Froom-
kin, “How the NSA converts spoken words into searchable text,” The Intercept, May
5, 2015.
25. Analysis of visual imagery from satellites is an enormous task: Mike Kim, “Mapping
poverty from space with the World Bank,” Medium.com, January 4, 2017. Kim esti-
mates eight million people working 24/ 7, which converts to more than thirty million
people working forty hours per week. I suspect this is an overestimate in practice,
because the vast majority of the images would exhibit negligible change over the
course of one day. On the other hand, the US intelligence community employs tens of
thousands of people sitting in vast rooms staring at satellite images just to keep track
of what’s happening in small regions of interest; so one million people is probably
about right for the whole world.
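The conversion in the note is straightforward arithmetic; as a sketch:

    # Eight million people watching imagery around the clock, expressed as
    # equivalent forty-hour-per-week workers.
    round_the_clock = 8_000_000
    hours_per_week = 24 * 7                       # 168
    print(round_the_clock * hours_per_week / 40)  # 33600000.0, "more than thirty million"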
26. There is substantial progress towards a global observatory based on real- time satellite
image data: David Jensen and Jillian Campbell, “Digital earth: Building, financing and
governing a digital ecosystem for planetary data,” white paper for the UN Science-
Policy- Business Forum on the Environment, 2018.
27. Luke Muehlhauser has written extensively on AI predictions, and I am indebted to him
for tracking down original sources for the quotations that follow. See Luke Muehl-
hauser, “What should we learn from past AI forecasts? Open Philanthropy Project
report, 2016.
28. A forecast of the arrival of human- level AI within twenty years: Herbert Simon, The
New Science of Management Decision (Harper & Row, 1960).
29. A forecast of the arrival of human- level AI within a generation: Marvin Minsky, Com-
putation: Finite and Infinite Machines (Prentice Hall, 1967).
30. John McCarthy’s forecast of the arrival of human-level AI within “five to 500 years”:
Ian Shenker, “Brainy robots in our future, experts think,” Detroit Free Press, September
30, 1977.
31. For a summary of surveys of AI researchers on their estimates for the arrival of human-
level AI, see aiimpacts.org. An extended discussion of survey results on human- level
AI is given by Katja Grace et al., “When will AI exceed human performance? Evidence
from AI experts,” arXiv:1705.08807v3 (2018).
32. For a chart mapping raw computer power against brain power, see Ray Kurzweil, “The
law of accelerating returns,” Kurzweilai.net, March 7, 2001.
33. The Allen Institute’s Project Aristo: allenai.org/ aristo.
34. For an analysis of the knowledge required to perform well on fourth- grade tests of
comprehension and common sense, see Peter Clark et al., “Automatic construction of
inference- supporting knowledge bases,” in Proceedings of the Workshop on Automated
Knowledge Base Construction (2014), akbc.ws/ 2014.
35. The NELL project on machine reading is described by Tom Mitchell et al., “ Never-
ending learning,” Communications of the ACM 61 (2018): 103 15.
36. The idea of bootstrapping inferences from text is due to Sergey Brin, “Extracting pat-
terns and relations from the World Wide Web,” in The World Wide Web and Databases,
ed. Paolo Atzeni, Alberto Mendelzon, and Giansalvatore Mecca (Springer, 1998).
37. For a visualization of the black- hole collision detected by LIGO, see LIGO Lab
Caltech, “Warped space and time around colliding black holes,” February 11, 2016,
youtube.com/ watch? v= 1agm33iEAuo.
38. The first publication describing observation of gravitational waves: Ben Abbott et al.,
“Observation of gravitational waves from a binary black hole merger,” Physical Review
Letters 116 (2016): 061102.
39. On babies as scientists: Alison Gopnik, Andrew Meltzoff, and Patricia Kuhl, The Sci-
entist in the Crib: Minds, Brains, and How Children Learn (William Morrow, 1999).
40. A summary of several projects on automated scientific analysis of experimental data
to discover laws: Patrick Langley et al., Scientific Discovery: Computational Explorations
of the Creative Processes (MIT Press, 1987).
41. Some early work on machine learning guided by prior knowledge: Stuart Russell, The
Use of Knowledge in Analogy and Induction (Pitman, 1989).
42. Goodman’s philosophical analysis of induction remains a source of inspiration: Nelson
Goodman, Fact, Fiction, and Forecast (University of London Press, 1954).
43. A veteran AI researcher complains about mysticism in the philosophy of science:
Herbert Simon, “Explaining the ineffable: AI on the topics of intuition, insight and
inspiration,” in Proceedings of the 14th International Conference on Artificial Intelligence,
ed. Chris Mellish (Morgan Kaufmann, 1995).
44. A survey of inductive logic programming by two originators of the field: Stephen Mug-
gleton and Luc de Raedt, “Inductive logic programming: Theory and methods,Journal
of Logic Programming 19 20 (1994): 629 79.
45. For an early mention of the importance of encapsulating complex operations as new
primitive actions, see Alfred North Whitehead, An Introduction to Mathematics
(Henry Holt, 1911).
46. Work demonstrating that a simulated robot can learn entirely by itself to stand up:
John Schulman et al., “ High- dimensional continuous control using generalized advan-
tage estimation,” arXiv:1506.02438 (2015). A video demonstration is available at
youtube.com/ watch? v= SHLuf2ZBQSw.
47. A description of a reinforcement learning system that learns to play a capture- the- flag
video game: Max Jaderberg et al., “ Human- level performance in first- person multi-
player games with population-
based deep reinforcement learning,” arXiv:1807.01281
(2018).
48. A view of AI progress over the next few years: Peter Stone et al., “Artificial intelligence
and life in 2030,” One Hundred Year Study on Artificial Intelligence, report of the 2015
Study Panel, 2016.
49. The media-fueled argument between Elon Musk and Mark Zuckerberg: Peter Holley,
“Billionaire burn: Musk says Zuckerberg’s understanding of AI threat ‘is limited,’” The
Washington Post, July 25, 2017.
50. On the value of search engines to individual users: Erik Brynjolfsson, Felix Eggers, and
Avinash Gannamaneni, “Using massive online choice experiments to measure changes
in well- being,” working paper no. 24514, National Bureau of Economic Research,
2018.
51. Penicillin was discovered several times and its curative powers were described in med-
ical publications, but no one seems to have noticed. See en.wikipedia.org/ wiki/ His
tory_ of_ penicillin.
52. For a discussion of some of the more esoteric risks from omniscient, clairvoyant AI
systems, see David Auerbach, “The most terrifying thought experiment of all time,”
Slate, July 17, 2014.
53. An analysis of some potential pitfalls in thinking about advanced AI: Kevin Kelly,
The myth of a superhuman AI,” Wired, April 25, 2017.
54. Machines may share some aspects of cognitive structure with humans, particularly
those aspects dealing with perception and manipulation of the physical world and
the conceptual structures involved in natural language understanding. Their delibera-
tive processes are likely to be quite different because of the enormous disparities in
hardware.
55. According to 2016 survey data, the eighty- eighth percentile corresponds to $100,000
per year: American Community Survey, US Census Bureau, www.census.gov/ pro
grams- surveys/ acs. For the same year, global per capita GDP was $10,133: National
Accounts Main Aggregates Database, UN Statistics Division, unstats.un.org/ unsd
/ snaama.
56. If the GDP growth phases in over ten years or twenty years, it’s worth $9,400 trillion
or $6,800 trillion, respectively— still nothing to sneeze at. On an interesting historical
note, I. J. Good, who popularized the notion of an intelligence explosion (page 142),
estimated the value of human- level AI to be at least “one megaKeynes,” referring to the
fabled economist John Maynard Keynes. The value of Keynes’s contributions was esti-
mated in 1963 as £100 billion, so a megaKeynes comes out to around $2,200,000
trillion in 2016 dollars. Good pinned the value of AI primarily on its potential to en-
sure that the human race survives indefinitely. Later, he came to wonder whether he
should have added a minus sign.
57. The EU announced plans for $24 billion in research and development spending for the
period 2019 20. See European Commission, “Artificial intelligence: Commission out-
lines a European approach to boost investment and set ethical guidelines,” press re-
lease, April 25, 2018. China’s long- term investment plan for AI, announced in 2017,
envisages a core AI industry generating $150 billion annually by 2030. See, for exam-
ple, Paul Mozur,Beijing wants A.I. to be made in China by 2030, The New York
Times, July 20, 2017.
58. See, for example, Rio Tinto’s Mine of the Future program at riotinto.com/australia
/pilbara/mine-of-the-future-9603.aspx.
59. A retrospective analysis of economic growth: Jan Luiten van Zanden et al., eds., How
Was Life? Global Well- Being since 1820 (OECD Publishing, 2014).
60. The desire for relative advantage over others, rather than an absolute quality of life, is
a positional good; see Chapter 9.
CHAPTER 4
1. Wikipedia’s article on the Stasi has several useful references on its workforce and its
overall impact on East German life.
2. For details on Stasi files, see Cullen Murphy, God’s Jury: The Inquisition and the Making
of the Modern World (Houghton Mifflin Harcourt, 2012).
3. For a thorough analysis of AI surveillance systems, see Jay Stanley, The Dawn of Robot
Surveillance (American Civil Liberties Union, 2019).
4. Recent books on surveillance and control include Shoshana Zuboff, The Age of Surveil-
lance Capitalism: The Fight for a Human Future at the New Frontier of Power (PublicAf-
fairs, 2019) and Roger McNamee, Zucked: Waking Up to the Facebook Catastrophe
(Penguin Press, 2019).
5. News article on a blackmail bot: Avivah Litan, “Meet Delilah— the first insider threat
Trojan,” Gartner Blog Network, July 14, 2016.
6. For a low- tech version of human susceptibility to misinformation, in which an unsus-
pecting individual becomes convinced that the world is being destroyed by meteor
strikes, see Derren Brown: Apocalypse, “Part One,” directed by Simon Dinsell, 2012,
youtube.com/ watch? v= o_ CUrMJOxqs.
7. An economic analysis of reputation systems and their corruption is given by Steven
Tadelis, “Reputation and feedback systems in online platform markets,” Annual Re-
view of Economics 8 (2016): 321– 40.
8. Goodhart’s law: “Any observed statistical regularity will tend to collapse once pres-
sure is placed upon it for control purposes.” For example, there may once have been a
correlation between faculty quality and faculty salary, so the US News & World Report
college rankings measure faculty quality by faculty salaries. This has contributed to a
salary arms race that benefits faculty members but not the students who pay for those
salaries. The arms race changes faculty salaries in a way that does not depend on fac-
ulty quality, so the correlation tends to disappear.
9. An article describing German efforts to police public discourse: Bernhard Rohleder,
“Germany set out to delete hate speech online. Instead, it made things worse,” World-
Post, February 20, 2018.
10. On the “infopocalypse”: Aviv Ovadya, “Whats worse than fake news? The distortion
of reality itself,WorldPost, February 22, 2018.
11. On the corruption of online hotel reviews: Dina Mayzlin, Yaniv Dover, and Judith
Chevalier, “Promotional reviews: An empirical investigation of online review manipu-
lation,” American Economic Review 104 (2014): 2421– 55.
12. Statement of Germany at the Meeting of the Group of Governmental Experts, Con-
vention on Certain Conventional Weapons, Geneva, April 10, 2018.
13. The Slaughterbots movie, funded by the Future of Life Institute, appeared in Novem-
ber 2017 and is available at youtube.com/ watch? v= 9CO6M2HsoIA.
14. For a report on one of the bigger faux pas in military public relations, see Dan Lam-
othe, “Pentagon agency wants drones to hunt in packs, like wolves,” The Washington
Post, January 23, 2015.
15. Announcement of a large- scale drone swarm experiment: US Department of Defense,
Department of Defense announces successful micro- drone demonstration,” news re-
lease no. NR- 008- 17, January 9, 2017.
16. Examples of research centers studying the impact of technology on employment are
the Work and Intelligent Tools and Systems group at Berkeley, the Future of Work and
Workers project at the Center for Advanced Study in the Behavioral Sciences at Stan-
ford, and the Future of Work Initiative at Carnegie Mellon University.
17. A pessimistic take on future technological unemployment: Martin Ford, Rise of the Ro-
bots: Technology and the Threat of a Jobless Future (Basic Books, 2015).
18. Calum Chace, The Economic Singularity: Artificial Intelligence and the Death of Capital-
ism (Three Cs, 2016).
19. For an excellent collection of essays, see Ajay Agrawal, Joshua Gans, and Avi Goldfarb,
eds., The Economics of Artificial Intelligence: An Agenda (National Bureau of Economic
Research, 2019).
20. The mathematical analysis behind thisinverted-U employment curve is given by
James Bessen, “Artificial intelligence and jobs: The role of demand” in The Economics
of Artificial Intelligence, ed. Agrawal, Gans, and Goldfarb.
21. For a discussion of economic dislocation arising from automation, see Eduardo Porter,
Tech is splitting the US work force in two,” The New York Times, February 4, 2019.
The article cites the following report for this conclusion: David Autor and Anna Salo-
mons, “Is automation labor- displacing? Productivity growth, employment, and the
labor share,” Brookings Papers on Economic Activity (2018).
22. For data on the growth of banking in the twentieth century, see Thomas Philippon,
The evolution of the US financial industry from 1860 to 2007: Theory and evidence,”
working paper, 2008.
23. The bible for jobs data and the growth and decline of occupations: US Bureau of
Labor Statistics, Occupational Outlook Handbook: 2018–2019 Edition (Bernan Press,
2018).
24. A report on trucking automation: Lora Kolodny, “Amazon is hauling cargo in self-
driving trucks developed by Embark,” CNBC, January 30, 2019.
25. The progress of automation in legal analytics, describing the results of a contest: Jason
Tashea, “AI software is more accurate, faster than attorneys when assessing NDAs,”
ABA Journal, February 26, 2018.
26. A commentary by a distinguished economist, with a title explicitly evoking Keynes’s
1930 article: Lawrence Summers, “Economic possibilities for our children,” NBER
Reporter (2013).
27. The analogy between data science employment and a small lifeboat for a giant cruise
ship comes from a discussion with Yong Ying- I, head of Singapore’s Public Service
Division. She conceded that it was correct on the global scale, but noted that “Singa-
pore is small enough to fit in the lifeboat.”
28. Support for UBI from a conservative viewpoint: Sam Bowman, “The ideal welfare
system is a basic income,” Adam Smith Institute, November 25, 2013.
29. Support for UBI from a progressive viewpoint: Jonathan Bartley, “The Greens endorse
a universal basic income. Others need to follow,” The Guardian, June 2, 2017.
30. Chace, in The Economic Singularity, calls the “paradise” version of UBI the Star Trek
economy, noting that in the more recent series of Star Trek episodes, money has been
abolished because technology has created essentially unlimited material goods and
energy. He also points to the massive changes in economic and social organization that
will be needed to make such a system successful.
31. The economist Richard Baldwin also predicts a future of personal services in his book
The Globotics Upheaval: Globalization, Robotics, and the Future of Work (Oxford Uni-
versity Press, 2019).
32. The book that is viewed as having exposed the failure of “ whole- word” literacy educa-
tion and launched decades of struggle between the two main schools of thought on
reading: Rudolf Flesch, Why Johnny Can’t Read: And What You Can Do about It
(Harper & Bros., 1955).
33. On educational methods that enable the recipient to adapt to the rapid rate of techno-
logical and economic change in the next few decades: Joseph Aoun, Robot- Proof: Higher
Education in the Age of Artificial Intelligence (MIT Press, 2017).
34. A radio lecture in which Turing predicted that humans would be overtaken by ma-
chines: Alan Turing, “Can digital machines think?,” May 15, 1951, radio broadcast,
BBC Third Programme. Typescript available at turingarchive.org.
35. News article describing the “naturalization” of Sophia as a citizen of Saudi Arabia:
Dave Gershgorn, “Inside the mechanical brain of the world’s first robot citizen,”
Quartz, November 12, 2017.
36. On Yann LeCun’s view of Sophia: Shona Ghosh, “Facebooks AI boss described Sophia
the robot as ‘complete b— t’ and ‘Wizard- of- Oz AI,Business Insider, January
6, 2018.
37. An EU proposal on legal rights for robots: Committee on Legal Affairs of the Euro-
pean Parliament, “Report with recommendations to the Commission on Civil Law
Rules on Robotics (2015/ 2103(INL)),” 2017.
38. The GDPR provision on a “right to an explanation” is not, in fact, new: it is very similar
to Article 15(1) of the 1995 Data Protection Directive, which it supersedes.
39. Here are three recent papers providing insightful mathematical analyses of fairness:
Moritz Hardt, Eric Price, and Nati Srebro, “Equality of opportunity in supervised
learning,” in Advances in Neural Information Processing Systems 29, ed. Daniel Lee et al.
(2016); Matt Kusner et al., “Counterfactual fairness,” in Advances in Neural Informa-
tion Processing Systems 30, ed. Isabelle Guyon et al. (2017); Jon Kleinberg, Sendhil
Mullainathan, and Manish Raghavan, “Inherent trade- offs in the fair determination of
risk scores, in 8th Innovations in Theoretical Computer Science Conference, ed. Christos
Papadimitriou (Dagstuhl Publishing, 2017).
40. News article describing the consequences of software failure for air traffic control:
Simon Calder, “Thousands stranded by flight cancellations after systems failure at
Europe’s air- traffic coordinator,” The Independent, April 3, 2018.
CHAPTER 5
1. Lovelace wrote, “The Analytical Engine has no pretensions whatever to originate any-
thing. It can do whatever we know how to order it to perform. It can follow analysis;
but it has no power of anticipating any analytical relations or truths.” This was one of
the arguments against AI that was refuted by Alan Turing, “Computing machinery
and intelligence,Mind 59 (1950): 433 60.
2. The earliest known article on existential risk from AI was by Richard Thornton, “The
age of machinery,” Primitive Expounder IV (1847): 281.
3. The Book of the Machines was based on an earlier article by Samuel Butler, “Darwin
among the machines,” The Press (Christchurch, New Zealand), June 13, 1863.
4. Another lecture in which Turing predicted the subjugation of humankind: Alan
Turing, “Intelligent machinery, a heretical theory” (lecture given to the 51 Society,
Manchester, 1951). Typescript available at turingarchive.org.
5. Wiener’s prescient discussion of technological control over humanity and a plea to
retain human autonomy: Norbert Wiener, The Human Use of Human Beings (Riverside
Press, 1950).
6. The front-cover blurb from Wiener’s 1950 book is remarkably similar to the motto of
the Future of Life Institute, an organization dedicated to studying the existential risks
that humanity faces: “Technology is giving life the potential to flourish like never
before... or to self- destruct.
7. An updating of Wiener’s views arising from his increased appreciation of the possibility
of intelligent machines: Norbert Wiener, God and Golem, Inc.: A Comment on Certain
Points Where Cybernetics Impinges on Religion (MIT Press, 1964).
8. Asimov’s Three Laws of Robotics first appeared in Isaac Asimov, “Runaround,” As-
tounding Science Fiction, March 1942. The laws are as follows:
1. A robot may not injure a human being or, through inaction, allow a human being to
come to harm.
2. A robot must obey the orders given it by human beings except where such orders
would conflict with the First Law.
3. A robot must protect its own existence as long as such protection does not conflict
with the First or Second Laws.
It is important to understand that Asimov proposed these laws as a way to generate
interesting story plots, not as a serious guide for future roboticists. Several of his sto-
ries, including “Runaround,” illustrate the problematic consequences of taking the
laws literally. From the standpoint of modern AI, the laws fail to acknowledge any el-
ement of probability and risk: the legality of robot actions that expose a human to
some probability of harm however infinitesimal— is therefore unclear.
9. The notion of instrumental goals is due to Stephen Omohundro, “The nature of self-
improving artificial intelligence” (unpublished manuscript, 2008). See also Stephen
Omohundro, “The basic AI drives,” in Artificial General Intelligence 2008: Proceed-
ings of the First AGI Conference, ed. Pei Wang, Ben Goertzel, and Stan Franklin (IOS
Press, 2008).
10. The objective of Johnny Depp’s character, Will Caster, seems to be to solve the prob-
lem of physical reincarnation so that he can be reunited with his wife, Evelyn. This just
goes to show that the nature of the overarching objective doesn’t matter the instru-
mental goals are all the same.
11. The original source for the idea of an intelligence explosion: I. J. Good, “Speculations
concerning the first ultraintelligent machine,” in Advances in Computers, vol. 6, ed.
Franz Alt and Morris Rubinoff (Academic Press, 1965).
12. An example of the impact of the intelligence explosion idea: Luke Muehlhauser, in
Facing the Intelligence Explosion (intelligenceexplosion.com), writes, “Good’s paragraph
ran over me like a train.”
13. Diminishing returns can be illustrated as follows: suppose that a 16 percent improve-
ment in intelligence creates a machine capable of making an 8 percent improvement,
which in turn creates a 4 percent improvement, and so on. This process reaches a limit
at about 36 percent above the original level. For more discussion on these issues, see
Eliezer Yudkowsky, “Intelligence explosion microeconomics,” technical report 2013- 1,
Machine Intelligence Research Institute, 2013.
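The limit can be checked numerically. A short Python sketch, reading the successive improvements as compounding multiplicatively (16 percent, then 8 percent, then 4 percent, and so on):

    level = 1.0          # capability of the original machine
    improvement = 0.16   # each generation improves by half the previous fraction
    for _ in range(60):  # far more generations than needed for convergence
        level *= 1 + improvement
        improvement /= 2
    print(round(level, 3))  # 1.356, i.e. about 36 percent above the original level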
14. For a view of AI in which humans become irrelevant, see Hans Moravec, Mind Chil-
dren: The Future of Robot and Human Intelligence (Harvard University Press, 1988). See
also Hans Moravec, Robot: Mere Machine to Transcendent Mind (Oxford University
Press, 2000).
CHAPTER 6
1. A serious publication provides a serious review of Bostrom’s Superintelligence: Paths,
Dangers, Strategies: “Clever cogs,” Economist, August 9, 2014.
2. A discussion of myths and misunderstandings concerning the risks of AI: Scott Alex-
ander, “AI researchers on AI risk,” Slate Star Codex (blog), May 22, 2015.
3. The classic work on multiple dimensions of intelligence: Howard Gardner, Frames of
Mind: The Theory of Multiple Intelligences (Basic Books, 1983).
4. On the implications of multiple dimensions of intelligence for the possibility of super-
human AI: Kevin Kelly, “The myth of a superhuman AI,Wired, April 25, 2017.
5. Evidence that chimpanzees have better short- term memory than humans: Sana Inoue
and Tetsuro Matsuzawa, “Working memory of numerals in chimpanzees,” Current Bi-
ology 17 (2007), R1004 5.
6. An important early work questioning the prospects for rule- based AI systems: Hubert
Dreyfus, What Computers Can’t Do (MIT Press, 1972).
7. The first in a series of books seeking physical explanations for consciousness and rais-
ing doubts about the ability of AI systems to achieve real intelligence: Roger Penrose,
The Emperor’s New Mind: Concerning Computers, Minds, and the Laws of Physics (Ox-
ford University Press, 1989).
8. A revival of the critique of AI based on the incompleteness theorem: Luciano Floridi,
Should we be afraid of AI?” Aeon, May 9, 2016.
9. A revival of the critique of AI based on the Chinese room argument: John Searle,
What your computer can’t know,” The New York Review of Books, October 9, 2014.
10. A report from distinguished AI researchers claiming that superhuman AI is probably
impossible: Peter Stone et al., “Artificial intelligence and life in 2030,” One Hundred
Year Study on Artificial Intelligence, report of the 2015 Study Panel, 2016.
11. News article based on Andrew Ng’s dismissal of risks from AI: Chris Williams, “AI
guru Ng: Fearing a rise of killer robots is like worrying about overpopulation on Mars,”
Register, March 19, 2015.
12. An example of the “experts know best” argument: Oren Etzioni, “It’s time to intelli-
gently discuss artificial intelligence,” Backchannel, December 9, 2014.
13. News article claiming that real AI researchers dismiss talk of risks: Erik Sofge, “Bill
Gates fears AI, but AI researchers know better,” Popular Science, January 30, 2015.
14. Another claim that real AI researchers dismiss AI risks: David Kenny, “IBMs open
letter to Congress on artificial intelligence,” June 27, 2017, ibm.com/ blogs/ policy
/ kenny- artificial- intelligence- letter.
15. Report from the workshop that proposed voluntary restrictions on genetic engineering:
Paul Berg et al., “Summary statement of the Asilomar Conference on Recombinant
DNA Molecules,Proceedings of the National Academy of Sciences 72 (1975):
1981– 84.
16. Policy statement arising from the invention of CRISPR- Cas9 for gene editing: Orga-
nizing Committee for the International Summit on Human Gene Editing, “On human
gene editing: International Summit statement,” December 3, 2015.
17. The latest policy statement from leading biologists: Eric Lander et al., “Adopt a mora-
torium on heritable genome editing,” Nature 567 (2019): 165 68.
18. Etzioni’s comment that one cannot mention risks if one does not also mention benefits
appears alongside his analysis of survey data from AI researchers: Oren Etzioni, “No,
the experts don’t think superintelligent AI is a threat to humanity,” MIT Technology
Review, September 20, 2016. In his analysis he argues that anyone who expects super-
human AI to take more than twenty- five years which includes this author as well as
Nick Bostrom— is not concerned about the risks of AI.
19. A news article with quotations from the Musk–Zuckerberg “debate”: Alanna Petroff,
“Elon Musk says Mark Zuckerberg’s understanding of AI is ‘limited,’” CNN Money,
July 25, 2017.
20. In 2015 the Information Technology and Innovation Foundation organized a debate
titled “Are super intelligent computers really a threat to humanity?” Robert Atkinson,
director of the foundation, suggests that mentioning risks is likely to result in reduced
funding for AI. Video available at itif.org/events/2015/06/30/are-super-intelligent
-computers-really-threat-humanity; the relevant discussion begins at 41:30.
21. A claim that our culture of safety will solve the AI control problem without ever
mentioning it: Steven Pinker, “Tech prophecy and the underappreciated causal power
of ideas,” in Possible Minds: Twenty-Five Ways of Looking at AI, ed. John Brockman
(Penguin Press, 2019).
22. For an interesting analysis of Oracle AI, see Stuart Armstrong, Anders Sandberg, and
Nick Bostrom,Thinking inside the box: Controlling and using an Oracle AI,Minds
and Machines 22 (2012): 299 324.
23. Views on why AI is not going to take away jobs: Kenny, “IBM’s open letter.”
24. An example of Kurzweils positive views of merging human brains with AI: Ray
Kurzweil, interview by Bob Pisani, June 5, 2015, Exponential Finance Summit, New
York, NY.
25. Article quoting Elon Musk on neural lace: Tim Urban, “Neuralink and the brain’s
magical future,” Wait But Why, April 20, 2017.
26. For the most recent developments in Berkeleys neural dust project, see David Piech et
al., “StimDust: A 1.7 mm^3, implantable wireless precision neural stimulator with ul-
trasonic power and communication,” arXiv: 1807.07590 (2018).
27. Susan Schneider, in Artificial You: AI and the Future of Your Mind (Princeton Univer-
sity Press, 2019), points out the risks of ignorance in proposed technologies such as
uploading and neural prostheses: that, absent any real understanding of whether elec-
tronic devices can be conscious and given the continuing philosophical confusion over
persistent personal identity, we may inadvertently end our own conscious existences
or inflict suffering on conscious machines without realizing that they are conscious.
28. An interview with Yann LeCun on AI risks: Guia Marie Del Prado, "Here's what Facebook's artificial intelligence expert thinks about the future," Business Insider, Septem-
ber 23, 2015.
29. A diagnosis of AI control problems arising from an excess of testosterone: Steven
Pinker, “Thinking does not imply subjugating,” in What to Think About Machines That
Think, ed. John Brockman (Harper Perennial, 2015).
30. A seminal work on many philosophical topics, including the question of whether
moral obligations may be perceived in the natural world: David Hume, A Treatise of
Human Nature (John Noon, 1738).
31. An argument that a sufficiently intelligent machine cannot help but pursue human
objectives: Rodney Brooks, "The seven deadly sins of AI predictions," MIT Technology Review, October 6, 2017.
32. Pinker, “Thinking does not imply subjugating.”
33. For an optimistic view arguing that AI safety problems will necessarily be resolved in
our favor: Steven Pinker, "Tech prophecy."
34. On the unsuspected alignment between “skeptics” and “believers” in AI risk: Alexan-
der, "AI researchers on AI risk."
CHAPTER 7
1. For a guide to detailed brain modeling, now slightly outdated, see Anders Sandberg
and Nick Bostrom, “Whole brain emulation: A roadmap,” technical report 2008- 3,
Future of Humanity Institute, Oxford University, 2008.
2. For an introduction to genetic programming from a leading exponent, see John Koza,
Genetic Programming: On the Programming of Computers by Means of Natural Selection
(MIT Press, 1992).
3. The parallel to Asimov’s Three Laws of Robotics is entirely coincidental.
4. The same point is made by Eliezer Yudkowsky, “Coherent extrapolated volition,” tech-
nical report, Singularity Institute, 2004. Yudkowsky argues that directly building in "Four Great Moral Principles That Are All We Need to Program into AIs" is a sure
road to ruin for humanity. His notion of the “coherent extrapolated volition of human-
kind” has the same general flavor as the first principle; the idea is that a superintelli-
gent AI system could work out what humans, collectively, really want.
5. You can certainly have preferences over whether a machine is helping you achieve your
preferences or you are achieving them through your own efforts. For example, sup-
pose you prefer outcome A to outcome B, all other things being equal. You are unable
to achieve outcome A unaided, and yet you still prefer B to getting A with the ma-
chine’s help. In that case the machine should decide not to help you unless perhaps
it can do so in a way that is completely undetectable by you. You may, of course, have
preferences about undetectable help as well as detectable help.
6. The phrase “the greatest good of the greatest number” originates in the work of Francis
Hutcheson, An Inquiry into the Original of Our Ideas of Beauty and Virtue, In Two Treatises
(D. Midwinter et al., 1725). Some have ascribed the formulation to an earlier comment
by Wilhelm Leibniz; see Joachim Hruschka, "The greatest happiness principle and other early German anticipations of utilitarian theory," Utilitas 3 (1991): 165–77.
7. One might propose that the machine should include terms for animals as well as hu-
mans in its own objective function. If these terms have weights that correspond to how
much people care about animals, then the end result will be the same as if the machine
cares about animals only through caring about humans who care about animals. Giv-
ing each living animal equal weight in the machine’s objective function would cer-
tainly be catastrophic: for example, we are outnumbered fifty thousand to one by
Antarctic krill and a billion trillion to one by bacteria.
8. The moral philosopher Toby Ord made the same point to me in his comments on an
early draft of this book: “Interestingly, the same is true in the study of moral philoso-
phy. Uncertainty about moral value of outcomes was almost completely neglected in
moral philosophy until very recently. Despite the fact that it is our uncertainty of
moral matters that leads people to ask others for moral advice and, indeed, to do re-
search on moral philosophy at all!"
9. One excuse for not paying attention to uncertainty about preferences is that it is for-
mally equivalent to ordinary uncertainty, in the following sense: being uncertain
about what I like is the same as being certain that I like likable things while being
uncertain about what things are likable. This is just a trick that appears to move the
uncertainty into the world, by making “likability by me” a property of objects rather
than a property of me. In game theory, this trick has been thoroughly institutionalized
since the 1960s, following a series of papers by my late colleague and Nobel laureate
John Harsanyi: "Games with incomplete information played by 'Bayesian' players, Parts I–III," Management Science 14 (1967, 1968): 159–82, 320–34, 486–502. In decision theory, the standard reference is the following: Richard Cyert and Morris de Groot, "Adaptive utility," in Expected Utility Hypotheses and the Allais Paradox, ed.
Maurice Allais and Ole Hagen (D. Reidel, 1979).
10. AI researchers working in the area of preference elicitation are an obvious exception.
See, for example, Craig Boutilier, “On the foundations of expected expected utility,” in
Proceedings of the 18th International Joint Conference on Artificial Intelligence (Morgan
Kaufmann, 2003). Also Alan Fern et al., "A decision-theoretic model of assistance," Journal of Artificial Intelligence Research 50 (2014): 71–104.
11. A critique of beneficial AI based on a misinterpretation of a journalist’s brief interview
with the author in a magazine article: Adam Elkus, “How to be good: Why you can’t
teach human values to artificial intelligence,” Slate, April 20, 2016.
12. The origin of trolley problems: Frank Sharp, "A study of the influence of custom on the moral judgment," Bulletin of the University of Wisconsin 236 (1908).
13. The "anti-natalist" movement believes it is morally wrong for humans to reproduce
because to live is to suffer and because humans’ impact on the Earth is profoundly
negative. If you consider the existence of humanity to be a moral dilemma, then I
suppose I do want machines to resolve this moral dilemma the right way.
14. Statement on China’s AI policy by Fu Ying, vice chair of the Foreign Affairs Commit-
tee of the National People’s Congress. In a letter to the 2018 World AI Conference in
Shanghai, Chinese president Xi Jinping wrote, “Deepened international cooperation
is required to cope with new issues in fields including law, security, employment, eth-
ics and governance.” I am indebted to Brian Tse for bringing these statements to my
attention.
15. A very interesting paper on the non- naturalistic non- fallacy, showing how preferences
can be inferred from the state of the world as arranged by humans: Rohin Shah et al., "The implicit preference information in an initial state," in Proceedings of the 7th International Conference on Learning Representations (2019), iclr.cc/Conferences/2019/Schedule.
16. Retrospective on Asilomar: Paul Berg, "Asilomar 1975: DNA modification secured," Nature 455 (2008): 290–91.
17. News article reporting Putin’s speech on AI: “Putin: Leader in artificial intelligence
will rule world,” Associated Press, September 4, 2017.
CHAPTER 8
1. Fermat's Last Theorem asserts that the equation $a^n = b^n + c^n$ has no solutions with a, b, and c being whole numbers and n being a whole number larger than 2. In the margin of his copy of Diophantus's Arithmetica, Fermat wrote, "I have a truly marvellous proof of this proposition which this margin is too narrow to contain." True or not, this guaranteed that mathematicians pursued a proof with vigor in the subsequent centuries. We can easily check particular cases: for example, is $7^3$ equal to $6^3 + 5^3$? (Almost, because $7^3$ is 343 and $6^3 + 5^3$ is 341, but "almost" doesn't count.) There are, of course, infinitely many cases to check, and that's why we need mathematicians and not just
computer programmers.
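As a throwaway illustration of the gap between checking cases and proving the theorem, here is a sketch of a brute-force search for counterexamples over a tiny range; the range and exponent bounds are arbitrary choices, and of course no finite search can settle the theorem.

```python
# Brute-force search for a counterexample to a^n = b^n + c^n (n > 2) in a small range.
def find_counterexample(limit=50, max_n=6):
    for n in range(3, max_n + 1):
        # All sums b^n + c^n with 1 <= c <= b < limit, keyed by their value.
        sums = {b**n + c**n: (b, c) for b in range(1, limit) for c in range(1, b + 1)}
        for a in range(1, limit):
            if a**n in sums:
                return (a, *sums[a**n], n)
    return None          # nothing found in this tiny range

print(7**3, 6**3 + 5**3)       # 343 vs 341: "almost" doesn't count
print(find_counterexample())    # None
```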
2. A paper from the Machine Intelligence Research Institute poses many related issues:
Scott Garrabrant and Abram Demski, “Embedded agency,” AI Alignment Forum, No-
vember 15, 2018.
3. The classic work on multiattribute utility theory: Ralph Keeney and Howard Raiffa,
Decisions with Multiple Objectives: Preferences and Value Tradeoffs (Wiley, 1976).
4. Paper introducing the idea of inverse RL: Stuart Russell, “Learning agents for uncer-
tain environments,” in Proceedings of the 11th Annual Conference on Computational
Learning Theory (ACM, 1998).
5. The original paper on structural estimation of Markov decision processes: Thomas
Sargent, “Estimation of dynamic labor demand schedules under rational expectations,”
Journal of Political Economy 86 (1978): 1009–44.
6. The first algorithms for IRL: Andrew Ng and Stuart Russell, "Algorithms for inverse reinforcement learning," in Proceedings of the 17th International Conference on Machine
Learning, ed. Pat Langley (Morgan Kaufmann, 2000).
7. Better algorithms for inverse RL: Pieter Abbeel and Andrew Ng, “Apprenticeship learn-
ing via inverse reinforcement learning,” in Proceedings of the 21st International Conference
on Machine Learning, ed. Russ Greiner and Dale Schuurmans (ACM Press, 2004).
8. Understanding inverse RL as Bayesian updating: Deepak Ramachandran and Eyal
Amir, “Bayesian inverse reinforcement learning,” in Proceedings of the 20th Interna-
tional Joint Conference on Artificial Intelligence, ed. Manuela Veloso (AAAI Press,
2007).
9. How to teach helicopters to fly and do aerobatic maneuvers: Adam Coates, Pieter
Abbeel, and Andrew Ng, “Apprenticeship learning for helicopter control,” Communi-
cations of the ACM 52 (2009): 97– 105.
10. The original name proposed for an assistance game was a cooperative inverse reinforce-
ment learning game, or CIRL game. See Dylan Hadfield-Menell et al., "Cooperative
inverse reinforcement learning,” in Advances in Neural Information Processing Systems
29, ed. Daniel Lee et al. (2016).
11. These numbers are chosen just to make the game interesting.
12. The equilibrium solution to the game can be found by a process called iterated best re-
sponse: pick any strategy for Harriet; pick the best strategy for Robbie, given Harriet's
strategy; pick the best strategy for Harriet, given Robbie’s strategy; and so on. If this
process reaches a fixed point, where neither strategy changes, then we have found a
solution. The process unfolds as follows:
1. Start with the greedy strategy for Harriet: make 2 paperclips if she prefers paper-
clips; make 1 of each if she is indifferent; make 2 staples if she prefers staples.
2. There are three possibilities Robbie has to consider, given this strategy for Harriet:
a. If Robbie sees Harriet make 2 paperclips, he infers that she prefers paperclips, so
he now believes the value of a paperclip is uniformly distributed between 50¢ and
$1.00, with an average of 75¢. In that case, his best plan is to make 90 paperclips
with an expected value of $67.50 for Harriet.
b. If Robbie sees Harriet make 1 of each, he infers that she values paperclips and
staples at 50¢, so the best choice is to make 50 of each.
c. If Robbie sees Harriet make 2 staples, then by the same argument as in 2(a), he
should make 90 staples.
3. Given this strategy for Robbie, Harriet’s best strategy is now somewhat different
from the greedy strategy in step 1: if Robbie is going to respond to her making 1 of
each by making 50 of each, then she is better off making 1 of each not just if she is
exactly indifferent but if she is anywhere close to indifferent. In fact, the optimal
policy is now to make 1 of each if she values paperclips anywhere between about
44.6¢ and 55.4¢.
4. Given this new strategy for Harriet, Robbie’s strategy remains unchanged. For ex-
ample, if she chooses 1 of each, he infers that the value of a paperclip is uniformly
distributed between 44.6¢ and 55.4¢, with an average of 50¢, so the best choice is
to make 50 of each. Because Robbie's strategy is the same as in step 2, Harriet's best
response will be the same as in step 3, and we have found the equilibrium.
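For readers who prefer code to prose, the following is a minimal sketch of the iterated-best-response computation above. The payoff model (a paperclip worth p ~ Uniform[0, 1] dollars to Harriet and a staple worth 1 − p; Harriet producing two items and Robbie then producing ninety of one item or fifty of each, with Harriet's payoff the total value produced) is reconstructed from the numbers quoted in the note rather than spelled out there, so treat it as an assumption.

```python
import numpy as np

harriet_options = {"1 each": (1, 1), "2 clips": (2, 0), "2 staples": (0, 2)}
robbie_options = {"90 clips": (90, 0), "50 each": (50, 50), "90 staples": (0, 90)}
p = np.linspace(0.0, 1.0, 100001)              # discretized paperclip values
names = np.array(list(harriet_options))

def value(prices, items):                      # Harriet's value of (clips, staples)
    clips, staples = items
    return clips * prices + staples * (1 - prices)

# Step 1: Harriet's greedy strategy, ignoring Robbie (ties go to "1 each").
h_choice = names[np.argmax(np.stack([value(p, it) for it in harriet_options.values()]), axis=0)]

for _ in range(10):                            # iterate best responses to a fixed point
    robbie = {}
    for signal in harriet_options:             # Robbie's best response to each signal,
        mask = h_choice == signal              # averaging over the posterior for p
        if not mask.any():
            robbie[signal] = "50 each"         # arbitrary default for an unused signal
            continue
        robbie[signal] = max(robbie_options,
                             key=lambda r: value(p[mask], robbie_options[r]).mean())
    # Harriet's best response, anticipating Robbie's reaction to each signal.
    totals = [value(p, harriet_options[s]) + value(p, robbie_options[robbie[s]])
              for s in harriet_options]
    new_choice = names[np.argmax(np.stack(totals), axis=0)]
    if np.array_equal(new_choice, h_choice):
        break
    h_choice = new_choice

band = p[h_choice == "1 each"]
print(f"Harriet makes 1 of each for p in [{band[0]:.3f}, {band[-1]:.3f}]")
# Expected: roughly [0.446, 0.554], matching the thresholds quoted in the note.
```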
13. For a more complete analysis of the off- switch game, see Dylan Hadfield- Menell et al.,
“The off- switch game,” in Proceedings of the 26th International Joint Conference on Arti-
ficial Intelligence, ed. Carles Sierra (IJCAI, 2017).
14. The proof of the general result is quite simple if you don't mind integral signs. Let P(u) be Robbie's prior probability density over Harriet's utility for the proposed action a. Then the value of going ahead with a is

$EU(a) = \int_{-\infty}^{\infty} P(u)\,u\,du = \int_{-\infty}^{0} P(u)\,u\,du + \int_{0}^{\infty} P(u)\,u\,du .$

(We will see shortly why the integral is split up in this way.) On the other hand, the value of action d, deferring to Harriet, is composed of two parts: if u > 0, then Harriet lets Robbie go ahead, so the value is u, but if u < 0, then Harriet switches Robbie off, so the value is 0:

$EU(d) = \int_{-\infty}^{0} P(u)\cdot 0\,du + \int_{0}^{\infty} P(u)\,u\,du .$

Comparing the expressions for EU(a) and EU(d), we see immediately that EU(d) ≥ EU(a) because the expression for EU(d) has the negative-utility region zeroed out. The two choices have equal value only when the negative region has zero probability, that is, when Robbie is already certain that Harriet likes the proposed action. The theorem is a direct analog of the well-known theorem concerning the non-negative expected value of information.
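As a quick numerical sanity check of this inequality, here is a sketch in which the Gaussian prior, its parameters, and the grid-based integration are illustrative assumptions, not part of the argument.

```python
import numpy as np

# Check EU(d) >= EU(a) for an illustrative Gaussian prior P(u) over Harriet's utility.
u = np.linspace(-10, 10, 200001)               # grid over possible utilities u
du = u[1] - u[0]
mean, std = 0.5, 2.0                           # arbitrary prior parameters for the demo
P = np.exp(-0.5 * ((u - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

EU_a = np.sum(P * u) * du                      # go ahead: integrate P(u)*u over all u
EU_d = np.sum(P[u > 0] * u[u > 0]) * du        # defer: negative-utility region contributes 0

print(f"EU(a) = {EU_a:.4f}, EU(d) = {EU_d:.4f}")   # EU(d) >= EU(a) always holds
```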
15. Perhaps the next elaboration in line, for the one human– one robot case, is to consider
a Harriet who does not yet know her own preferences regarding some aspect of the
world, or whose preferences have not yet been formed.
16. To see how exactly Robbie converges to an incorrect belief, consider a model in which
Harriet is slightly irrational, making errors with a probability that diminishes expo-
nentially as the size of error increases. Robbie offers Harriet 4 paperclips in return for
1 staple; she refuses. According to Robbie’s beliefs, this is irrational: even at 25¢ per
paperclip and 75¢ per staple, she should accept 4 for 1. Therefore, she must have made
a mistake— but this mistake is much more likely if her true value is 25¢ than if it is, say,
30¢, because the error costs her a lot more if her value for paperclips is 30¢. Now
Robbie’s probability distribution has 25¢ as the most likely value because it represents
the smallest error on Harriet’s part, with exponentially lower probabilities for values
higher than 25¢. If he keeps trying the same experiment, the probability distribution
becomes more and more concentrated close to 25¢. In the limit, Robbie becomes certain that Harriet's value for paperclips is 25¢.
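A small simulation of this updating process, under an assumed Boltzmann-style error model and an assumed grid of candidate values running from 25¢ to $1.00 (the note specifies only that errors become exponentially less likely as they become more costly):

```python
import numpy as np

values = np.arange(0.25, 1.0001, 0.05)           # Robbie's candidate paperclip values
posterior = np.ones_like(values) / len(values)   # uniform prior
beta = 10.0                                      # assumed rationality parameter

def p_refuse(v):
    # Gain to Harriet from accepting 4 paperclips for 1 staple, if a paperclip is
    # worth v and a staple 1 - v: 4v - (1 - v) = 5v - 1. Larger gain => refusal less likely.
    gain = 5 * v - 1
    return 1.0 / (1.0 + np.exp(beta * gain))

for trial in range(20):              # Harriet refuses every offer (in the note, because
    posterior *= p_refuse(values)    # Robbie's model of her is simply wrong)
    posterior /= posterior.sum()

print("Most likely value:", values[np.argmax(posterior)])   # converges to 0.25
print("P(value = 25 cents) =", round(posterior[0], 3))
```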
17. Robbie could, for example, have a normal (Gaussian) distribution for his prior belief
about the exchange rate, which stretches from −∞ to +∞.
18. For an example of the kind of mathematical analysis that may be needed, see Avrim
Blum, Lisa Hellerstein, and Nick Littlestone, “Learning in the presence of finitely or
infinitely many irrelevant attributes,Journal of Computer and System Sciences 50
(1995): 32– 40. Also Lori Dalton, “Optimal Bayesian feature selection,” in Proceedings
of the 2013 IEEE Global Conference on Signal and Information Processing, ed. Charles
Bouman, Robert Nowak, and Anna Scaglione (IEEE, 2013).
19. Here I am rephrasing slightly a question by Moshe Vardi at the Asilomar Conference
on Beneficial AI, 2017.
20. Michael Wellman and Jon Doyle, “Preferential semantics for goals,” in Proceedings of
the 9th National Conference on Artificial Intelligence (AAAI Press, 1991). This paper
draws on a much earlier proposal by Georg von Wright, “The logic of preference recon-
sidered," Theory and Decision 3 (1972): 140–67.
21. My late Berkeley colleague has the distinction of becoming an adjective. See Paul
Grice, Studies in the Way of Words (Harvard University Press, 1989).
22. The original paper on direct stimulation of pleasure centers in the brain: James Olds
and Peter Milner, “Positive reinforcement produced by electrical stimulation of septal
area and other regions of rat brain,” Journal of Comparative and Physiological Psychology
47 (1954): 419–27.
23. Letting rats push the button: James Olds, "Self-stimulation of the brain; its use to study local effects of hunger, sex, and drugs," Science 127 (1958): 315–24.
24. Letting humans push the button: Robert Heath, “Electrical self- stimulation of the
brain in man," American Journal of Psychiatry 120 (1963): 571–77.
25. A first mathematical treatment of wireheading, showing how it occurs in reinforce-
ment learning agents: Mark Ring and Laurent Orseau, “Delusion, survival, and intelli-
gent agents,” in Artificial General Intelligence: 4th International Conference, ed. Jürgen
Schmidhuber, Kristinn Thórisson, and Moshe Looks (Springer, 2011). One possible
solution to the wireheading problem: Tom Everitt and Marcus Hutter, “Avoiding wire-
heading with value reinforcement learning,” arXiv:1605.03143 (2016).
26. How it might be possible for an intelligence explosion to occur safely: Benja Fallen-
stein and Nate Soares, “Vingean reflection: Reliable reasoning for self- improving
agents,” technical report 2015- 2, Machine Intelligence Research Institute, 2015.
27. The difficulty agents face in reasoning about themselves and their successors: Benja
Fallenstein and Nate Soares, “Problems of self- reference in self- improving space- time
embedded intelligence," in Artificial General Intelligence: 7th International Conference,
ed. Ben Goertzel, Laurent Orseau, and Javier Snaider (Springer, 2014).
28. Showing why an agent might pursue an objective different from its true objective if its
computational abilities are limited: Jonathan Sorg, Satinder Singh, and Richard Lewis, "Internal rewards mitigate agent boundedness," in Proceedings of the 27th International Conference on Machine Learning, ed. Johannes Fürnkranz and Thorsten Joachims (2010), icml.cc/Conferences/2010/papers/icml2010proceedings.zip.
CHAPTER 9
1. Some have argued that biology and neuroscience are also directly relevant. See, for
example, Gopal Sarma, Adam Safron, and Nick Hay, “Integrative biological simula-
tion, neuropsychology, and AI safety," arxiv.org/abs/1811.03493 (2018).
2. On the possibility of making computers liable for damages: Paulius Čerka, Jurgita Grigienė, and Gintarė Sirbikytė, "Liability for damages caused by artificial intelligence," Computer Law and Security Review 31 (2015): 376–89.
3. For an excellent machine- oriented introduction to standard ethical theories and their
implications for designing AI systems, see Wendell Wallach and Colin Allen, Moral
Machines: Teaching Robots Right from Wrong (Oxford University Press, 2008).
4. The sourcebook for utilitarian thought: Jeremy Bentham, An Introduction to the Prin-
ciples of Morals and Legislation (T. Payne & Son, 1789).
5. Mill’s elaboration of his tutor Bentham’s ideas was extraordinarily influential on lib-
eral thought: John Stuart Mill, Utilitarianism (Parker, Son & Bourn, 1863).
6. The paper introducing preference utilitarianism and preference autonomy: John
Harsanyi, "Morality and the theory of rational behavior," Social Research 44 (1977): 623–56.
7. An argument for social aggregation via weighted sums of utilities when deciding on
behalf of multiple individuals: John Harsanyi, “Cardinal welfare, individualistic eth-
ics, and interpersonal comparisons of utility,” Journal of Political Economy 63 (1955):
309–21.
8. A generalization of Harsanyi’s social aggregation theorem to the case of unequal prior
beliefs: Andrew Critch, Nishant Desai, and Stuart Russell, “Negotiable reinforce-
ment learning for Pareto optimal sequential decision- making,” in Advances in Neural
Information Processing Systems 31, ed. Samy Bengio et al. (2018).
9. The sourcebook for ideal utilitarianism: G. E. Moore, Ethics (Williams & Nor-
gate, 1912).
10. News article citing Stuart Armstrong's colorful example of misguided utility maximi-
zation: Chris Matyszczyk, “Professor warns robots could keep us in coffins on heroin
drips,” CNET, June 29, 2015.
11. Popper's theory of negative utilitarianism (so named later by Smart): Karl Popper, The
Open Society and Its Enemies (Routledge, 1945).
12. A refutation of negative utilitarianism: R. Ninian Smart, "Negative utilitarianism," Mind 67 (1958): 542–43.
13. For a typical argument for risks arising from "end human suffering" commands, see "Why do we think AI will destroy us?," Reddit, reddit.com/r/Futurology/comments/38fp6o/why_do_we_think_ai_will_destroy_us.
14. A good source for self-deluding incentives in AI: Ring and Orseau, "Delusion, survival, and intelligent agents."
15. On the impossibility of interpersonal comparisons of utility: W. Stanley Jevons, The
Theory of Political Economy (Macmillan, 1871).
16. The utility monster makes its appearance in Robert Nozick, Anarchy, State, and Uto-
pia (Basic Books, 1974).
17. For example, we can fix immediate death to have a utility of 0 and a maximally happy
life to have a utility of 1. See John Isbell, “Absolute games,” in Contributions to the
Theory of Games, vol. 4, ed. Albert Tucker and R. Duncan Luce (Princeton University
Press, 1959).
18. The oversimplified nature of Thanos’s population- halving policy is discussed by Tim
Harford, “Thanos shows us how not to be an economist,” Financial Times, April 20,
2019. Even before the film debuted, defenders of Thanos began to congregate on the
subreddit r/thanosdidnothingwrong/. In keeping with the subreddit's motto, 350,000
of the 700,000 members were later purged.
19. On utilities for populations of different sizes: Henry Sidgwick, The Methods of Ethics
(Macmillan, 1874).
20. The Repugnant Conclusion and other knotty problems of utilitarian thinking: Derek
Parfit, Reasons and Persons (Oxford University Press, 1984).
21. For a concise summary of axiomatic approaches to population ethics, see Peter Eckers-
ley, “Impossibility and uncertainty theorems in AI value alignment,” in Proceedings of the
AAAI Workshop on Artificial Intelligence Safety, ed. Huáscar Espinoza et al. (2019).
22. Calculating the long-term carrying capacity of the Earth: Daniel O'Neill et al., "A good life for all within planetary boundaries," Nature Sustainability 1 (2018): 88–95.
23. For an application of moral uncertainty to population ethics, see Hilary Greaves and
Toby Ord, “Moral uncertainty about population axiology,” Journal of Ethics and Social
Philosophy 12 (2017): 135–67. A more comprehensive analysis is provided by Will
MacAskill, Krister Bykvist, and Toby Ord, Moral Uncertainty (Oxford University
Press, forthcoming).
24. Quotation showing that Smith was not so obsessed with selfishness as is commonly
imagined: Adam Smith, The Theory of Moral Sentiments (Andrew Millar; Alexander
Kincaid and J. Bell, 1759).
25. For an introduction to the economics of altruism, see Serge- Christophe Kolm and Jean
Ythier, eds., Handbook of the Economics of Giving, Altruism and Reciprocity, 2 vols.
( North- Holland, 2006).
26. On charity as selfish: James Andreoni, “Impure altruism and donations to public
goods: A theory of warm-glow giving," Economic Journal 100 (1990): 464–77.
27. For those who like equations: let Alice's intrinsic well-being be measured by $w_A$ and Bob's by $w_B$. Then the utilities for Alice and Bob are defined as follows:

$U_A = w_A + C_{AB}\,w_B \qquad U_B = w_B + C_{BA}\,w_A .$

Some authors suggest that Alice cares about Bob's overall utility $U_B$ rather than just his intrinsic well-being $w_B$, but this leads to a kind of circularity in that Alice's utility depends on Bob's utility which depends on Alice's utility; sometimes stable solutions can be found but the underlying model can be questioned. See, for example, Hajime Hori, "Nonpaternalistic altruism and functional interdependence of social preferences," Social Choice and Welfare 32 (2009): 59–77.
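A brief sketch contrasting the two definitions; the numbers and caring coefficients below are illustrative, and the "stable solution" of the circular variant is simply the solution of the resulting two-by-two linear system.

```python
import numpy as np

w_A, w_B = 10.0, 4.0          # intrinsic well-being of Alice and Bob (illustrative)
c_AB, c_BA = 0.5, 0.25        # caring coefficients (illustrative)

# Definition used in the note: utility = own well-being + caring * other's WELL-BEING.
U_A = w_A + c_AB * w_B
U_B = w_B + c_BA * w_A
print("Well-being-based:", U_A, U_B)                      # 12.0, 6.5

# Circular variant: utility = own well-being + caring * other's UTILITY.
# U_A = w_A + c_AB*U_B and U_B = w_B + c_BA*U_A form a linear system with a unique
# solution whenever c_AB * c_BA != 1.
A = np.array([[1.0, -c_AB],
              [-c_BA, 1.0]])
b = np.array([w_A, w_B])
U_A2, U_B2 = np.linalg.solve(A, b)
print("Utility-based:   ", round(U_A2, 3), round(U_B2, 3))  # about 13.714, 7.429
```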
28. Models in which each individual’s utility is a linear combination of everyone’s well-
being are just one possibility. Much more general models are possible; for example,
models in which some individuals prefer to avoid severe inequalities in the distribu-
tion of well- being, even at the expense of reducing the total, while other individuals
would really prefer that no one have preferences about inequality at all. Thus, the
overall approach I am proposing accommodates multiple moral theories held by indi-
viduals; at the same time, it doesn’t insist that any one of those moral theories is cor-
rect or should have much sway over outcomes for those who hold a different theory. I
am indebted to Toby Ord for pointing out this feature of the approach.
29. Arguments of this type have been made against policies designed to ensure equality of
outcome, notably by the American legal philosopher Ronald Dworkin. See, for example,
Ronald Dworkin, “What is equality? Part 1: Equality of welfare,” Philosophy and Public
Affairs 10 (1981): 185–246. I am indebted to Iason Gabriel for this reference.
30. Malice in the form of revenge- based punishment for transgressions is certainly a com-
mon tendency. Although it plays a social role in keeping members of a community in
line, it can be replaced by an equally effective policy driven by deterrence and
prevention— that is, weighing the intrinsic harm done when punishing the transgres-
sor against the benefits to the larger society.
31. Let $E_{AB}$ and $P_{AB}$ be Alice's coefficients of envy and pride respectively, and assume that they apply to the difference in well-being. Then a (somewhat oversimplified) formula for Alice's utility could be the following:

$U_A = w_A + C_{AB}\,w_B - E_{AB}(w_B - w_A) + P_{AB}(w_A - w_B) = (1 + E_{AB} + P_{AB})\,w_A + (C_{AB} - E_{AB} - P_{AB})\,w_B .$

Thus, if Alice has positive pride and envy coefficients, they act on Bob's welfare exactly like sadism and malice coefficients: Alice is happier if Bob's welfare is lowered, all other things being equal. In reality, pride and envy typically apply not to differences in well-being but to differences in visible aspects thereof, such as status and possessions. Bob's hard toil in acquiring his possessions (which lowers his overall
well-being) may not be visible to Alice. This can lead to the self-defeating behaviors that go under the heading of "keeping up with the Joneses."
32. On the sociology of conspicuous consumption: Thorstein Veblen, The Theory of the
Leisure Class: An Economic Study of Institutions (Macmillan, 1899).
33. Fred Hirsch, The Social Limits to Growth (Routledge & Kegan Paul, 1977).
34. I am indebted to Ziyad Marar for pointing me to social identity theory and its impor-
tance in understanding human motivation and behavior. See, for example, Dominic
Abrams and Michael Hogg, eds., Social Identity Theory: Constructive and Critical Ad-
vances (Springer, 1990). For a much briefer summary of the main ideas, see Ziyad
Marar, “Social identity,” in This Idea Is Brilliant: Lost, Overlooked, and Underappreci-
ated Scientific Concepts Everyone Should Know, ed. John Brockman (Harper Perennial,
2018).
35. Here, I am not suggesting that we necessarily need a detailed understanding of the
neural implementation of cognition; what is needed is a model at the “software” level
of how preferences, both explicit and implicit, generate behavior. Such a model would
need to incorporate what is known about the reward system.
36. Ralph Adolphs and David Anderson, The Neuroscience of Emotion: A New Synthesis
(Princeton University Press, 2018).
37. See, for example, Rosalind Picard, Affective Computing, 2nd ed. (MIT Press, 1998).
38. Waxing lyrical on the delights of the durian: Alfred Russel Wallace, The Malay Archi-
pelago: The Land of the Orang- Utan, and the Bird of Paradise (Macmillan, 1869).
39. A less rosy view of the durian: Alan Davidson, The Oxford Companion to Food (Oxford
University Press, 1999). Buildings have been evacuated and planes turned around in
mid- flight because of the durian’s overpowering odor.
40. I discovered after writing this chapter that the durian was used for exactly the same
philosophical purpose by Laurie Paul, Transformative Experience (Oxford University
Press, 2014). Paul suggests that uncertainty about one’s own preferences presents fatal
problems for decision theory, a view contradicted by Richard Pettigrew, “Transforma-
tive experience and decision theory, Philosophy and Phenomenological Research 91
(2015): 766–74. Neither author refers to the early work of Harsanyi, "Games with incomplete information, Parts I–III," or Cyert and de Groot, "Adaptive utility."
41. An initial paper on helping humans who don’t know their own preferences and are
learning about them: Lawrence Chan et al., “The assistive multi- armed bandit,” in
Proceedings of the 14th ACM/ IEEE International Conference on Human Robot Interac-
tion (HRI), ed. David Sirkin et al. (IEEE, 2019).
42. Eliezer Yudkowsky, in Coherent Extrapolated Volition (Singularity Institute, 2004),
lumps all these aspects, as well as plain inconsistency, under the heading of muddle, a term that has not, unfortunately, caught on.
43. On the two selves who evaluate experiences: Daniel Kahneman, Thinking, Fast and
Slow (Farrar, Straus & Giroux, 2011).
44. Edgeworth's hedonimeter, an imaginary device for measuring happiness moment to
moment: Francis Edgeworth, Mathematical Psychics: An Essay on the Application of
Mathematics to the Moral Sciences (Kegan Paul, 1881).
45. A standard text on sequential decisions under uncertainty: Martin Puterman, Markov
Decision Processes: Discrete Stochastic Dynamic Programming (Wiley, 1994).
46. On axiomatic assumptions that justify additive representations of utility over time:
Tjalling Koopmans, “Representation of preference orderings over time,” in Decision
and Organization, ed. C. Bartlett McGuire, Roy Radner, and Kenneth Arrow ( North-
Holland, 1972).
47. The 2019 humans (who might, in 2099, be long dead or might just be the earlier selves
of 2099 humans) might wish to build the machines in a way that respects the 2019
preferences of the 2019 humans rather than pandering to the undoubtedly shallow and
ill- considered preferences of humans in 2099. This would be like drawing up a consti-
tution that disallows any amendments. If the 2099 humans, after suitable delibera-
tion, decide they wish to override the preferences built in by the 2019 humans, it
seems reasonable that they should be able to do so. After all, it is they and their de-
scendants who have to live with the consequences.
48. I am indebted to Wendell Wallach for this observation.
49. An early paper dealing with changes in preferences over time: John Harsanyi, “Welfare
economics of variable tastes," Review of Economic Studies 21 (1953): 204–13. A more
recent (and somewhat technical) survey is provided by Franz Dietrich and Christian
List, “Where do preferences come from?,” International Journal of Game Theory 42
(2013): 613–37. See also Laurie Paul, Transformative Experience (Oxford University
Press, 2014), and Richard Pettigrew, “Choosing for Changing Selves,” philpapers.org
/ archive/ PETCFC.pdf.
50. For a rational analysis of irrationality, see Jon Elster, Ulysses and the Sirens: Studies in
Rationality and Irrationality (Cambridge University Press, 1979).
51. For promising ideas on cognitive prostheses for humans, see Falk Lieder, “Beyond
bounded rationality: Reverse- engineering and enhancing human intelligence” (PhD
thesis, University of California, Berkeley, 2018).
CHAPTER 10
1. On the application of assistance games to driving: Dorsa Sadigh et al., “Planning for
cars that coordinate with people," Autonomous Robots 42 (2018): 1405–26.
2. Apple is, curiously, absent from this list. It does have an AI research group and is
ramping up rapidly. Its traditional culture of secrecy means that its impact in the mar-
ketplace of ideas is quite limited so far.
3. Max Tegmark, interview, Do You Trust This Computer?, directed by Chris Paine, writ-
ten by Mark Monroe (2018).
4. On estimating the impact of cybercrime: “Cybercrime cost $600 billion and targets
banks first,” Security Magazine, February 21, 2018.
APPENDIX A
1. The basic plan for chess programs of the next sixty years: Claude Shannon, “Program-
ming a computer for playing chess," Philosophical Magazine, 7th ser., 41 (1950): 256–75. Shannon's proposal drew on a centuries-long tradition of evaluating chess positions
by adding up piece values; see, for example, Pietro Carrera, Il gioco degli scacchi
(Giovanni de Rossi, 1617).
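A minimal sketch of the piece-value style of evaluation that Shannon's proposal built on; the conventional 1/3/3/5/9 values and the string encoding of a position are illustrative choices, not Shannon's own.

```python
# Material-count evaluation: sum piece values for each side and take the difference.
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9, "K": 0}

def material_score(white_pieces, black_pieces):
    """Positive scores favor White; a lookahead search would maximize this at the leaves."""
    white = sum(PIECE_VALUES[p] for p in white_pieces)
    black = sum(PIECE_VALUES[p] for p in black_pieces)
    return white - black

# White has a rook where Black has a knight ("the exchange"): score +2 for White.
print(material_score("KQRRBBNPPPPPPPP", "KQRBBNNPPPPPPPP"))
```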
2. A report describing Samuel's heroic research on an early reinforcement learning algorithm for checkers: Arthur Samuel, "Some studies in machine learning using the game of checkers," IBM Journal of Research and Development 3 (1959): 210–29.
3. The concept of rational metareasoning and its application to search and game playing
emerged from the thesis research of my student Eric Wefald, who died tragically in a
car accident before he could write up his work; the following appeared posthumously:
Stuart Russell and Eric Wefald, Do the Right Thing: Studies in Limited Rationality (MIT
Press, 1991). See also Eric Horvitz, “Rational metareasoning and compilation for opti-
mizing decisions under bounded resources,” in Computational Intelligence, II: Proceed-
ings of the International Symposium, ed. Francesco Gardin and Giancarlo Mauri
( North- Holland, 1990); and Stuart Russell and Eric Wefald, “On optimal game- tree
search using rational meta- reasoning,” in Proceedings of the 11th International Joint
Conference on Artificial Intelligence, ed. Natesa Sridharan (Morgan Kaufmann, 1989).
4. Perhaps the first paper showing how hierarchical organization reduces the combinato-
rial complexity of planning: Herbert Simon, “The architecture of complexity,” Pro-
ceedings of the American Philosophical Society 106 (1962): 467– 82.
5. The canonical reference for hierarchical planning is Earl Sacerdoti, “Planning in a
hierarchy of abstraction spaces," Artificial Intelligence 5 (1974): 115–35. See also Aus-
tin Tate, “Generating project networks,” in Proceedings of the 5th International Joint
Conference on Artificial Intelligence, ed. Raj Reddy (Morgan Kaufmann, 1977).
6. A formal definition of what high- level actions do: Bhaskara Marthi, Stuart Russell,
and Jason Wolfe, “Angelic semantics for high- level actions,” in Proceedings of the 17th
International Conference on Automated Planning and Scheduling, ed. Mark Boddy, Maria
Fox, and Sylvie Thiébaux (AAAI Press, 2007).
APPENDIX B
1. This example is unlikely to be from Aristotle, but may have originated with Sextus
Empiricus, who lived probably in the second or third century CE.
2. The first algorithm for theorem- proving in first- order logic worked by reducing first-
order sentences to (very large numbers of) propositional sentences: Martin Davis and
Hilary Putnam, “A computing procedure for quantification theory,” Journal of the
ACM 7 (1960): 201– 15.
3. An improved algorithm for propositional inference: Martin Davis, George Logemann,
and Donald Loveland, "A machine program for theorem-proving," Communications of the ACM 5 (1962): 394–97.
4. The satisfiability problem (deciding whether a collection of sentences is true in some world) is NP-complete. The reasoning problem (deciding whether a sentence follows from the known sentences) is co-NP-complete, a class that is thought to be harder than NP-complete problems.
5. There are two exceptions to this rule: no repetition (a stone may not be played that
returns the board to a situation that existed previously) and no suicide (a stone may
not be placed such that it would immediately be captured— for example, if it is already
surrounded).
6. The work that introduced first- order logic as we understand it today (Begriffsschrift
means “concept writing”): Gottlob Frege, Begriffsschrift, eine der arithmetischen nachge-
bildete Formelsprache des reinen Denkens (Halle, 1879). Frege’s notation for first- order
logic was so bizarre and unwieldy that it was soon replaced by the notation introduced
by Giuseppe Peano, which remains in common use today.
7. A summary of Japan’s bid for supremacy through knowledge- based systems: Edward
Feigenbaum and Pamela McCorduck, The Fifth Generation: Artificial Intelligence and
Japan’s Computer Challenge to the World ( Addison- Wesley, 1983).
8. The US efforts included the Strategic Computing Initiative and the formation of the
Microelectronics and Computer Technology Corporation (MCC). See Alex Roland
and Philip Shiman, Strategic Computing: DARPA and the Quest for Machine Intelligence,
1983 1993 (MIT Press, 2002).
9. A history of Britain’s response to the re- emergence of AI in the 1980s: Brian Oakley
and Kenneth Owen, Alvey: Britain’s Strategic Computing Initiative (MIT Press, 1990).
10. The origin of the term GOFAI: John Haugeland, Artificial Intelligence: The Very Idea
(MIT Press, 1985).
11. Interview with Demis Hassabis on the future of AI and deep learning: Nick Heath,
“Google DeepMind founder Demis Hassabis: Three truths about AI,TechRepublic,
September 24, 2018.
APPENDIX C
1. Pearl's work was recognized by the Turing Award in 2011.
2. Bayes nets in more detail: Every node in the network is annotated with the probability
of each possible value, given each possible combination of values for the node’s parents
(that is, those nodes that point to it). For example, the probability that Doubles_12 has value true is 1.0 when D_1 and D_2 have the same value, and 0.0 otherwise. A possible
world is an assignment of values to all the nodes. The probability of such a world is the
product of the appropriate probabilities from each of the nodes.
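As a sketch of that product rule, using the two-dice network described in this note (D_1, D_2, and Doubles_12); the code itself is illustrative, not from the text.

```python
# Probability of a complete assignment (a "possible world") in the two-dice network:
# multiply each node's probability given its parents' values.

def p_die(value):                    # each die: uniform over 1..6, no parents
    return 1.0 / 6.0

def p_doubles(doubles, d1, d2):      # Doubles_12 is true exactly when D_1 == D_2
    return 1.0 if doubles == (d1 == d2) else 0.0

def world_probability(d1, d2, doubles):
    return p_die(d1) * p_die(d2) * p_doubles(doubles, d1, d2)

print(world_probability(3, 3, True))    # 1/36: dice agree and Doubles_12 is true
print(world_probability(3, 5, True))    # 0.0: Doubles_12 cannot be true here
print(world_probability(3, 5, False))   # 1/36
```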
3. A compendium of applications of Bayes nets: Olivier Pourret, Patrick Naïm, and Bruce
Marcot, eds., Bayesian Networks: A Practical Guide to Applications (Wiley, 2008).
4. The basic paper on probabilistic programming: Daphne Koller, David McAllester, and
Avi Pfeffer, “Effective Bayesian inference for stochastic programs,” in Proceedings of the
14th National Conference on Artificial Intelligence (AAAI Press, 1997). For many addi-
tional references, see probabilistic- programming.org.
5. Using probabilistic programs to model human concept learning: Brenden Lake, Ruslan
Salakhutdinov, and Joshua Tenenbaum, “ Human- level concept learning through prob-
abilistic program induction,” Science 350 (2015): 1332– 38.
6. For a detailed description of the seismic monitoring application and associated probabil-
ity model, see Nimar Arora, Stuart Russell, and Erik Sudderth, "NET-VISA: Network processing vertically integrated seismic analysis," Bulletin of the Seismological Society of America 103 (2013): 709–29.
7. News article describing one of the first serious self- driving car crashes: Ryan Ran-
dazzo, “Who was at fault in self- driving Uber crash? Accounts in Tempe police report
disagree," Republic (azcentral.com), March 29, 2017.
APPENDIX D
1. The foundational discussion of inductive learning: David Hume, Philosophical Essays
Concerning Human Understanding (A. Millar, 1748).
2. Leslie Valiant, “A theory of the learnable,” Communications of the ACM 27 (1984):
1134–42. See also Vladimir Vapnik, Statistical Learning Theory (Wiley, 1998). Valiant's approach concentrated on computational complexity, Vapnik's on statistical
analysis of the learning capacity of various classes of hypotheses, but both shared a
common theoretical core connecting data and predictive accuracy.
3. For example, to learn the difference between the “situational superko” and “natural
situational superko” rules, the learning algorithm would have to try repeating a board
position that it had created previously by a pass rather than by playing a stone. The
results would be different in different countries.
4. For a description of the ImageNet competition, see Olga Russakovsky et al., “Ima-
geNet large scale visual recognition challenge,” International Journal of Computer
Vision 115 (2015): 211–52.
5. The first demonstration of deep networks for vision: Alex Krizhevsky, Ilya Sutskever,
and Geoffrey Hinton, “ImageNet classification with deep convolutional neural net-
works,” in Advances in Neural Information Processing Systems 25, ed. Fernando Pereira
et al. (2012).
6. The difficulty of distinguishing over one hundred breeds of dogs: Andrej Karpathy, "What I learned from competing against a ConvNet on ImageNet," Andrej Karpathy Blog, September 2, 2014.
7. Blog post on inceptionism research at Google: Alexander Mordvintsev, Christopher
Olah, and Mike Tyka, "Inceptionism: Going deeper into neural networks," Google AI
Blog, June 17, 2015. The idea seems to have originated with J. P. Lewis, “Creation by
refinement: A creativity paradigm for gradient descent learning networks,” in Proceed-
ings of the IEEE International Conference on Neural Networks (IEEE, 1988).
8. News article on Geoff Hinton having second thoughts about deep networks: Steve
LeVine, “Artificial intelligence pioneer says we need to start over,” Axios, September
15, 2017.
9. A catalog of shortcomings of deep learning: Gary Marcus, “Deep learning: A critical
appraisal,” arXiv:1801.00631 (2018).
10. A popular textbook on deep learning, with a frank assessment of its weaknesses:
François Chollet, Deep Learning with Python (Manning Publications, 2017).
11. An explanation of explanation- based learning: Thomas Dietterich, “Learning at the
knowledge level,” Machine Learning 1 (1986): 287– 315.
12. A superficially quite different explanation of explanation- based learning: John Laird,
Paul Rosenbloom, and Allen Newell, “Chunking in Soar: The anatomy of a general
learning mechanism," Machine Learning 1 (1986): 11–46.
Image Credits
Page 7 — Figure 2: (b) © The Sun / News Licensing; (c) Courtesy of
Smithsonian Institution Archives.
Page 52 — Figure 4: © SRI International. creativecommons.org/licenses
/by/3.0/legalcode.
Page 72 — Figure 5: (left) © Berkeley AI Research Lab; (right) © Boston
Dynamics.
Page 88 — Figure 6: © The Saul Steinberg Foundation / Artists Rights
Society (ARS), New York.
Page 111 — Figure 7: (left) © Noam Eshel, Defense Update; (right) ©
Future of Life Institute / Stuart Russell.
Page 125 — Figure 10: (left) © AFP; (right) Courtesy of Henrik
Sorensen.
Page 127 — Figure 11: Elysium © 2013 MRC II Distribution Company
L.P. All Rights Reserved. Courtesy of Columbia Pictures.
Page 258 — Figure 14: © OpenStreetMap contributors.
OpenStreetMap.org. creativecommons.org/licenses/by/2.0
/legalcode.
Page 281 — Figure 19: Terrain photo: DigitalGlobe via Getty Images.
Page 284 — Figure 20: (right) Courtesy of the Tempe Police
Department.
Page 294 — Figure 24: © Jessica Mullen / Deep Dreamscope.
creativecommons.org/licenses/by/2.0/legalcode.
Index
AAAI (Association for the
Advancement of Artificial
Intelligence), 250
Abbeel, Pieter, 73, 192
abstract actions, hierarchy of, 87–90
abstract planning, 264–66
access shortcomings, of intelligent
personal assistants, 67–68
action potentials, 15
actions, discovering, 87–90
actuators, 72
Ada, Countess of Lovelace.
See Lovelace, Ada
adaptive organisms, 18–19
agent. See intelligent agent
agent program, 48
AI Researchers on AI Risk
(Alexander), 153
Alciné, Jacky, 60
Alexander, Scott, 146, 153, 169–70
algorithms, 33–34
Bayesian networks and, 27577
Bayesian updating, 283, 284
bias and, 12830
chess-playing, 6263
coding of, 34
completeness theorem and, 51–52
computer hardware and, 34–35
content selection, 89, 105
deep learning, 58–59, 288–93
dynamic programming, 54–55
examples of common, 33–34
exponential complexity of problems
and, 38–39
halting problem and, 37–38
lookahead search, 47, 49–50,
260–61
propositional logic and, 268–70
reinforcement learning, 55–57, 105
subroutines within, 34
supervised learning, 58–59, 285–93
Alibaba, 250
AlphaGo, 6, 46–48, 49–50, 55, 91, 92,
206–7, 209–10, 261, 265, 285
AlphaZero, 47, 48
altruism, 24, 227–29
altruistic AI, 173–75
Amazon, 106, 119, 250
Echo, 64–65
Picking Challenge” to accelerate
robot development, 73–74
Analytical Engine, 40
ants, 25
Aoun, Joseph, 123
Apple HomePod, 64–65
Architecture of Complexity, The
(Simon), 265
Aristotle, 20–21, 39–40, 50, 52, 53,
114, 245
Armstrong, Stuart, 221
Arnauld, Antoine, 21–22
Arrow, Kenneth, 223
artificial intelligence (AI), 1–12
agent (See intelligent agent)
agent programs, 4859
beneficial, principles for (See
beneficial AI)
benefits to humans of, 98–102
as biggest event in human history, 1–4
conceptual breakthroughs required
for (See conceptual breakthroughs
required for superintelligent AI)
decision making on global scale,
capability for, 75–76
deep learning and, 6
domestic robots and, 73–74
general-purpose, 46–48, 100, 136
global scale, capability to sense and
make decisions on, 74–76
goals and, 41–42, 48–53, 136–42,
165–69
governance of, 249–53
health advances and, 101
history of, 4–6, 40–42
human preferences and (See human
preferences)
imagining what superintelligent
machines could do, 9396
intelligence, defining, 39–61
intelligent personal assistants and,
67–71
limits of superintelligence, 9698
living standard increases and, 98–100
logic and, 3940
media and public perception of
advances in, 62–64
misuses of (See misuses of AI)
mobile phones and, 64–65
multiplier effect of, 99
objectives and, 11–12, 43, 48–61,
136–42, 165–69
overly intelligent AI, 132–44
pace of scientific progress in creating,
6–9
predicting arrival of superintelligent
AI, 76–78
reading capabilities and, 74–75
risk posed by (See risk posed by AI)
scale and, 94–96
scaling up sensory inputs and capacity
for action, 94–95
self-driving cars and, 65–67,
181–82, 247
sensing on global scale, capability
to, 75
smart homes and, 71–72
softbots and, 64
speech recognition capabilities and,
7475
standard model of, 9–11, 13,
48–61, 247
Turing test and, 40–41
tutoring by, 100–101
virtual reality authoring by, 101
World Wide Web and, 64
"Artificial Intelligence and Life in 2030"
(One Hundred Year Study on
Artificial Intelligence), 149, 150
Asimov, Isaac, 141
assistance games, 192–203
learning preferences exactly in long
run, 200–202
off-switch game, 196–200
paperclip game, 194–96
prohibitions and, 202–3
uncertainty about human objectives,
200–202
Association for the Advancement of
Artificial Intelligence (AAAI), 250
assumption failure, 18687
Atkinson, Robert, 158
Atlas humanoid robot, 73
autonomous weapons systems (LAWS),
110–13
autonomy loss problem, 255–56
Autor, David, 116
Avengers: Infinity War (film), 224
"avoid putting in human goals"
argument, 165–69
axiomatic basis for utility theory, 23–24
axioms, 185
Babbage, Charles, 40, 132–33
backgammon, 55
Baidu, 250
Baldwin, James, 18
Baldwin effect, 18–20
Banks, Iain, 164
bank tellers, 117–18
Bayes, Thomas, 54
Bayesian logic, 54
Bayesian networks, 54, 275–77
Bayesian rationality, 54
Bayesian updating, 283, 284
Bayes theorem, 54
behavior, learning preferences from,
190–92
behavior modification, 104–7
belief state, 282–83
beneficial AI, 171–210, 247–49
caution regarding development of,
reasons for, 179
data available for learning about
human preferences, 180–81
economic incentives for, 179–80
evil behavior and, 179
learning to predict human
preferences, 176–77
moral dilemmas and, 178
objective of AI is to maximize
realization of human preferences,
17375
principles for, 172–79
proofs for (See proofs for beneficial AI)
uncertainty as to what human
preferences are, 175–76
values, defining, 177–78
Bentham, Jeremy, 24, 219
Berg, Paul, 182
Berkeley Robot for the Elimination of
Tedious Tasks (BRETT), 73
Bernoulli, Daniel, 22–23
"Bill Gates Fears AI, but AI Researchers
Know Better" (Popular Science), 152
blackmail, 104–5
blinking reflex, 57
blockchain, 161
board games, 45
Boole, George, 268
Boolean (propositional) logic,
51, 268–70
bootstrapping process, 81–82
Boston Dynamics, 73
Bostrom, Nick, 102, 144, 145, 150, 166,
167, 183, 253
brains, 16, 17–18
reward system and, 17–18
Summit machine, compared, 34
BRETT (Berkeley Robot for the
Elimination of Tedious Tasks), 73
Brin, Sergey, 81
Brooks, Rodney, 168
Brynjolfsson, Erik, 117
Budapest Convention on Cybercrime,
253–54
Butler, Samuel, 133–34, 159
"can't we just..." responses to risks
posed by AI, 160–69
"...avoid putting in human goals,"
165–69
"...merge with machines,"
163–65
"...put it in a box," 161–63
"...switch it off," 160–61
"...work in human-machine
teams,” 163
Cardano, Gerolamo, 21
caring professions, 122
Chace, Calum, 113
changes in human preferences over time,
240–45
Changing Places (Lodge), 121
checkers program, 55, 261
chess programs, 62–63
Chollet, François, 293
chunking, 295
circuits, 291–92
CNN, 108
CODE (Collaborative Operations in
Denied Environments), 112
combinatorial complexity, 258
common operational picture, 69
compensation effects, 114–17
completeness theorem (Gödels),
5152
complexity of problems, 3839
Comprehensive Nuclear-Test-Ban
Treaty (CTBT) seismic monitoring,
279–80
computer programming, 119
computers, 32–61
algorithms and (See algorithms)
complexity of problems and, 38–39
halting problem and, 37–38
hardware, 34–35
intelligent (See artificial intelligence)
limits of computation, 36–39
software limitations, 37
special-purpose devices, building,
35–36
universality and, 32
computer science, 33
“Computing Machinery and
Intelligence” (Turing), 40–41, 149
conceptual breakthroughs required for
superintelligent AI, 78–93
actions, discovering, 87–90
cumulative learning of concepts and
theories, 82–87
language/common sense problem,
79–82
mental activity, managing, 90–92
consciousness, 16–17
consequentialism, 217–19
content selection algorithms, 89, 105
content shortcomings, of intelligent
personal assistants, 67–68
control theory, 10, 44–45, 54, 176
convolutional neural networks, 47
cost function to evaluate solutions, and
goals, 48
Credibility Coalition, 109
CRISPR-Cas9, 156
cumulative learning of concepts and
theories, 82–87
cybersecurity, 186–87
Daily Telegraph, 77
decision making on global scale, 75–76
decoherence, 36
Deep Blue, 62, 261
deep convolutional network, 288–90
deep dreaming images, 291
deepfakes, 105–6
deep learning, 6, 58–59, 86–87,
288–93
DeepMind, 90
AlphaGo, 6, 46–48, 49–50, 55, 91,
92, 206–7, 209–10, 261, 265, 285
AlphaZero, 47, 48
DQN system, 55–56
deflection arguments, 154–59
“research can’t be controlled”
arguments, 154–56
silence regarding risks of AI, 158–59
tribalism, 150, 159–60
whataboutery, 156–57
Delilah (blackmail bot), 105
denial of risk posed by AI, 146–54
“it’s complicated” argument, 147–48
“it’s impossible” argument, 149–50
“it’s too soon to worry about it”
argument, 150–52
Luddism accusation and, 153–54
“we’re the experts” argument,
152–54
deontological ethics, 217
dexterity problem, robots, 73–74
Dickinson, Michael, 190
Dickmanns, Ernst, 65
DigitalGlobe, 75
domestic robots, 73–74
dopamine, 17, 205–6
Dota 2, 56
DQN system, 55–56
Dune (Herbert), 135
dynamic programming algorithms,
54–55
E. coli, 14–15
eBay, 106
ECHO (first smart home), 71
“Economic Possibilities for Our
Grandchildren” (Keynes), 113–14,
120–21
The Economic Singularity: Artificial
Intelligence and the Death of
Capitalism (Chace), 113
Economist, The, 145
Edgeworth, Francis, 238
Eisenhower, Dwight, 249
electrical action potentials, 15
Eliza (rst chatbot), 67
Elmo (shogi program), 47
Elster, Jon, 242
Elysium (film), 127
emergency braking, 57
enfeeblement of humans problem,
254–55
envy, 229–31
Epicurus, 219
equilibrium solutions, 30–31, 195–96
Erewhon (Butler), 133–34, 159
Etzioni, Oren, 152, 157
eugenics movement, 155–56
expected value rule, 22–23
experience, learning from, 285–95
experiencing self, and preferences,
238–40
explanation-based learning, 294–95
Facebook, 108, 250
Fact, Fiction and Forecast (Goodman), 85
fact-checking, 108–9, 110
factcheck.org, 108
fear of death (as an instrumental goal),
140–42
feature engineering, 84–85
Fermat, Pierre de, 185
Fermat’s Last Theorem, 185
Ferranti Mark I, 34
Fifth Generation project, 271
firewalling AI systems, 161–63
first-order logic, 51, 270–72
probabilistic languages and, 277–80
propositional logic distinguished, 270
Ford, Martin, 113
Forster, E. M., 254–55
Fox News, 108
Frege, Gottlob, 270
Full, Bob, 190
G7, 250–51
Galileo Galilei, 85–86
gambling, 21–23
game theory, 28–32. See also assistance
games
Gates, Bill, 56, 153
GDPR (General Data Protection
Regulation), 127–29
Geminoid DK (robot), 125
General Data Protection Regulation
(GDPR), 127–29
general-purpose artificial intelligence,
46–48, 100, 136
geometric objects, 33
Glamour, 129
Global Learning XPRIZE
competition, 70
Go, 6, 46–47, 49–50, 51, 55, 56
combinatorial complexity and, 259–61
propositional logic and, 269
supervised learning algorithm and,
286–87
thinking, learning from, 293–95
goals, 41–42, 48–53, 136–42, 165–69
God and Golem (Wiener), 137–38
Gödel, Kurt, 51, 52
Goethe, Johann Wolfgang von, 137
Good, I. J., 142–43, 153, 208–9
Goodhart’s law, 77
Goodman, Nelson, 85
Good Old-Fashioned AI (GOFAI), 271
Google, 108, 112–13
DeepMind (See DeepMind)
Home, 64–65
misclassifying people as gorillas in
Google Photo, 60
tensor processing units (TPUs), 35
gorilla problem, 132–36
governance of AI, 249–53
governmental reward and punishment
systems, 106–7
Great Decoupling, 117
greed (as an instrumental goal), 140–42
Grice, H. Paul, 205
Gricean analysis, 205
halting problem, 37–38
hand construction problem, robots, 73
Hardin, Garrett, 31
hard takeoff scenario, 144
Harop (missile), 111
Harsanyi, John, 220, 229
Hassabis, Demis, 271–72, 293
Hawking, Stephen, 4, 153
health advances, 101
He Jiankui, 156
Herbert, Frank, 135
hierarchy of abstract actions, 87–90,
265–66
High-Level Expert Group on Artificial
Intelligence (EU), 251
Hillarp, Nils-Åke, 17
Hinton, Geoff, 290
Hirsch, Fred, 230
Hobbes, Thomas, 246
Howards End (Forster), 254
Huffington Post, 4
human germline alteration, ban on,
155–56
human–machine teaming, 163–65
human preferences, 211–45
behavior, learning preferences from,
190–92
benecial AI and, 172–77
changes in, over time, 240–45
different people, learning to make
trade-offs between preferences of,
213–27
emotions and, 232–34
errors as to, 236–37
of experiencing self, 238–40
heterogeneity of, 212–13
loyal AI, 215–17
modication of, 24345
of nice, nasty and envious humans,
227–31
of remembering self, 238–40
stupidity and, 232–34
transitivity of, 23–24
uncertainty and, 235–37
updates in, 241–42
utilitarian AI (See utilitarianism/
utilitarian AI)
utility theory and, 23–27
human roles, takeover of, 124–31
Human Use of Human Beings
(Wiener), 137
humble AI, 175–76
Hume, David, 167, 287–88
IBM, 62, 80, 250
ideal utilitarianism, 219
IEEE (Institute of Electrical and
Electronics Engineers), 250
ignorance, 52–53
imitation game, 40–41
inceptionism images, 291
inductive logic programming, 86
inductive reasoning, 287–88
inputs, to intelligent agents, 42–43
instinctive organisms, 18–19
Institute of Electrical and Electronics
Engineers (IEEE), 250
instrumental goal, 141–42, 196
insurance underwriters, 119
intelligence, 13–61
action potentials and, 15
brains and, 16, 17–18
computers and, 39–61
consciousness and, 16–17
E. coli and, 14–15
evolutionary origins of, 14–18
learning and, 15, 18–20
nerve nets and, 16
practical reasoning and, 20
rationality and, 20–32
standard model of, 9–11, 13,
48–61, 247
successful reasoning and, 20
intelligence agencies, 104
intelligence explosions, 142–44, 208–9
intelligent agent, 42–48
actions generated by, 48
agent programs and, 48–59
defined, 42
design of, and problem types, 43–45
environment and, 43, 44, 45–46
inputs to, 42–43
multi-agent cooperation design, 94
objectives and, 43, 48–61
reflex, 57–59
intelligent computers. See artificial
intelligence (AI)
intelligent personal assistants,
67–71, 101
commonsense modeling and, 68–69
design template for, 69–70
education systems, 70
health systems, 69–70
personal finance systems, 70
privacy considerations, 70–71
shortcomings of early systems,
67–68
stimulus–response templates and, 67
understanding content, improvements
in, 68
International Atomic Energy
Agency, 249
Internet of Things (IoT), 65
interpersonal services as the future of
employment, 122–24
algorithmic bias and, 128–30
decisions affecting people, use of
machines in, 126–28
robots built in humanoid form and,
124–26
intractable problems, 38–39
inverse reinforcement learning,
191–93
IQ, 48
Ishiguro, Hiroshi, 125
is-ought problem, 167
“it’s complicated” argument, 147–48
“it’s impossible” argument, 149–50
“it’s too soon to worry about it”
argument, 150–52
jellyfish, 16
Jeopardy! (TV show), 80
Jevons, William Stanley, 222
JiaJia (robot), 125
jian ai, 219
Kahneman, Daniel, 238–40
Kasparov, Garry, 62, 90, 261
Ke Jie, 6
Kelly, Kevin, 97, 148
Kenny, David, 153, 163
Keynes, John Maynard, 113–14,
120–21, 122
King Midas problem, 136–40
Kitkit School (software system), 70
knowledge, 79–82, 267–72
knowledge-based systems, 50–51
Krugman, Paul, 117
Kurzweil, Ray, 163–64
language/common sense problem,
79–82
Laplace, Pierre-Simon, 54
Laser-Interferometer Gravitational-Wave
Observatory (LIGO), 82–84
learning, 15
behavior, learning preferences from,
190–92
bootstrapping process, 81–82
culture and, 19
cumulative learning of concepts and
theories, 82–87
data-driven view of, 82–83
deep learning, 6, 58–59, 84, 86–87,
288–93
as evolutionary accelerator, 18–20
from experience, 285–93
explanation-based learning, 294–95
feature engineering and, 84–85
inverse reinforcement learning,
191–93
reinforcement learning, 17, 47, 55–57,
105, 190–91
supervised learning, 58–59, 285–93
from thinking, 293–95
LeCun, Yann, 47, 165
legal profession, 119
lethal autonomous weapons systems
(LAWS), 110–13
Life 3.0 (Tegmark), 114, 138
LIGO (Laser-Interferometer
Gravitational-Wave Observatory),
82–84
living standard increases, and AI,
98–100
Lloyd, Seth, 37
Lloyd, William, 31
Llull, Ramon, 40
Lodge, David, 1
logic, 39–40, 50–51, 267–72
Bayesian, 54
defined, 267
first-order, 51–52, 270–72
formal language requirement, 267
ignorance and, 52–53
programming, development of, 271
propositional (Boolean), 51, 268–70
lookahead search, 47, 49–50, 260–61
loophole principle, 202–3, 216
Lovelace, Ada, 40, 132–33
loyal AI, 215–17
Luddism accusation, 153–54
machines, 33
“Machine Stops, The” (Forster), 254–55
machine translation, 6
McAfee, Andrew, 117
McCarthy, John, 45, 50, 51, 52, 53,
65, 77
malice, 228–29
malware, 253
map navigation, 257–58
mathematical proofs for beneficial AI,
185–90
mathematics, 33
matrices, 33
Matrix, The (film), 222, 235
MavHome project, 71
mechanical calculator, 40
mental security, 107–10
“merge with machines” argument,
163–65
metareasoning, 262
Methods of Ethics, The (Sidgwick),
224–25
Microsoft, 250
TrueSkill system, 279
Mill, John Stuart, 217–18, 219
Minsky, Marvin, 45, 76, 153
misuses of AI, 103–31, 253–54
behavior modification, 104–7
blackmail, 104–5
deepfakes, 105–6
governmental reward and punishment
systems, 106–7
intelligence agencies and, 104
interpersonal services, takeover of,
124–31
lethal autonomous weapons systems
(LAWS), 110–13
mental security and, 107–10
work, elimination of, 113–24
mobile phones, 64–65
monotonicity and, 24
Moore, G. E., 219, 221, 222
Moore’s law, 34–35
Moravec, Hans, 144
Morgan, Conway Lloyd, 18
Morgenstern, Oskar, 23
Mozi (Mozi), 219
multi-agent cooperation design, 94
Musk, Elon, 153, 164
“Myth of Superhuman AI, The”
(Kelly), 148
narrow (tool) artificial intelligence, 46,
47, 136
Nash, John, 30, 195
Nash equilibrium, 30–31, 195–96
National Institutes of Health (NIH), 155
negative altruism, 229–30
NELL (Never-Ending Language
Learning) project, 81
nerve nets, 16
NET-VISA, 279–80
Network Enforcement Act (Germany),
108, 109
neural dust, 164–65
Neuralink Corporation, 164
neural lace, 164
neural networks, 288–89
neurons, 15, 16, 19
Never-Ending Language Learning
(NELL) project, 81
Newell, Allen, 295
Newton, Isaac, 85–86
New Yorker, The, 88
Ng, Andrew, 151, 152
Norvig, Peter, 2, 62–63
no suicide rule, 287
Nozick, Robert, 223
nuclear industry, 157, 249
nuclear physics, 7–8
Nudge (Thaler & Sunstein), 244
objectives, 11–12, 43, 48–61, 136–42,
165–69. See also goals
off-switch game, 196–200
onebillion (software system), 70
One Hundred Year Study on Artificial
Intelligence (AI100), 149, 150
OpenAI, 56
operations research, 10, 54, 176
Oracle AI systems, 161–63
orthogonality thesis, 167–68
Ovadya, Aviv, 108
overhypothesis, 85
overly intelligent AI, 132–44
fear and greed, 140–42
gorilla problem, 132–36
intelligence explosions and, 142–44,
208–9
King Midas problem, 136–40
paperclip game, 194–96
Parfit, Derek, 225
Partnership on AI, 180, 250
Pascal, Blaise, 21–22, 40
Passage to India, A (Forster), 254
Pearl, Judea, 54, 275
Perdix (drone), 112
Pinker, Steven, 158, 165–66, 168
Planet (satellite corporation), 75
Politics (Aristotle), 114
Popper, Karl, 221–22
Popular Science, 152
positional goods, 230–31
practical reasoning, 20
pragmatics, 204
preference autonomy principle, 220, 241
preferences. See human preferences
preference utilitarianism, 220
Price, Richard, 54
pride, 230–31
Primitive Expounder, 133
prisoner’s dilemma, 30–31
privacy, 70–71
probability theory, 21–22, 273–84
Bayesian networks and, 275–77
first-order probabilistic languages,
277–80
independence and, 274
keeping track of not directly
observable phenomena, 280–84
probabilistic programming, 54–55,
84, 279–80
programming language, 34
programs, 33
prohibitions, 202–3
Project Aristo, 80
Prolog, 271
proofs for beneficial AI, 184–210
assistance games, 192–203
learning preferences from behavior,
190–92
mathematical guarantees, 185–90
recursive self-improvement and,
208–10
requests and instructions,
interpretation of, 203–5
wireheading problem and, 205–8
propositional logic, 51, 268–70
Putin, Vladimir, 182, 183
“put it in a box” argument, 161–63
puzzles, 45
quantum computation, 35–36
qubit devices, 35–36
randomized strategy, 29
rationality
Aristotle’s formulation of, 20–21
Bayesian, 54
critiques of, 24–26
expected value rule and, 22–23
gambling and, 21–23
game theory and, 28–32
inconsistency in human preferences,
and developing theory of benecial
AI, 26–27
logic and, 39–40
monotonicity and, 24
Nash equilibrium and, 30–31
preferences and, 23–27
probability and, 21–22
randomized strategy and, 29
for single agent, 20–27
transitivity and, 23–24
for two agents, 27–32
uncertainty and, 21
utility theory and, 22–26
rational metareasoning, 262
reading capabilities, 74–75
real-world decision problem
complexity and, 39
Reasons and Persons (Parfit), 225
Recombinant DNA Advisory
Committee, 155
recombinant DNA research, 155–56
recursive self-improvement, 208–10
redlining, 128
reex agents, 57–59
reinforcement learning, 17, 47, 5557,
105, 19091
remembering self, and preferences,
238–40
Repugnant Conclusion, 225
reputation systems, 108–9
“research can’t be controlled”
arguments, 154–56
retail cashiers, 117–18
reward function, 53–54, 55
reward system, 17
Rise of the Robots: Technology and the
Threat of a Jobless Future (Ford), 113
risk posed by AI, 145–70
deection arguments, 15459
denial of problem, 14654
Robinson, Alan, 5
Rochester, Nathaniel, 45
Rutherford, Ernest, 7, 77, 85–86, 150
Sachs, Jeffrey, 230
sadism, 228–29
Salomons, Anna, 116
Samuel, Arthur, 5, 10, 55, 261
Sargent, Tom, 191
scalable autonomous weapons, 112
Schwab, Klaus, 117
Second Machine Age, The (Brynjolfsson &
McAfee), 117
Sedol, Lee, 6, 47, 90, 91, 261
seismic monitoring system (NET-VISA),
279–80
self-driving cars, 65–67, 181–82, 247
performance requirements for,
65–66
potential benefits of, 66–67
probabilistic programming and,
281–82
sensing on global scale, 75
sets, 33
Shakey project, 52
Shannon, Claude, 45, 62
Shiller, Robert, 117
side-channel attacks, 187, 188
Sidgwick, Henry, 224–25
silence regarding risks of AI, 158–59
Simon, Herbert, 76, 86, 265
simulated evolution of programs, 171
SLAM (simultaneous localization and
mapping), 283
Slate Star Codex blog, 146, 169–70
Slaughterbot, 111
Small World (Lodge), 1
Smart, R. N., 221–22
smart homes, 71–72
Smith, Adam, 227
snopes.com, 108
social aggregation theorem, 220–21
Social Limits to Growth, The
(Hirsch), 230
social media, and content selection
algorithms, 8–9
softbots, 64
software systems, 248
solutions, searching for, 257–66
abstract planning and, 264–66
combinatorial complexity and, 258
computational activity, managing,
261–62
15-puzzle and, 258
Go and, 259–61
map navigation and, 257–58
motor control commands and, 263–64
24-puzzle and, 258
“Some Moral and Technical
Consequences of Automation”
(Wiener), 10
Sophia (robot), 126
specications (of programs), 248
Speculations Concerning the First
Ultraintelligent Machine” (Good),
142–43
speech recognition, 6
speech recognition capabilities, 74–75
Spence, Mike, 117
SpotMini, 73
SRI, 41–42, 52
standard model of intelligence, 9–11, 13,
48–61, 247
StarCraft, 45
Stasi, 103–4
stationarity, 24
statistics, 10, 176
Steinberg, Saul, 88
stimulus–response templates, 67
Stockfish (chess program), 47
striving and enjoying, relation between,
121–22
subroutines, 34, 233–34
Summers, Larry, 117, 120
Summit machine, 34, 35, 37
Sunstein, Cass, 244
Superintelligence (Bostrom), 102, 145,
150, 167, 183
supervised learning, 58–59, 285–93
surveillance, 104
Sutherland, James, 71
“switch it off” argument, 160–61
synapses, 15, 16
Szilard, Leo, 8, 77, 150
tactile sensing problem, robots, 73
Taobao, 106
technological unemployment. See work,
elimination of
Tegmark, Max, 4, 114, 138
Tellex, Stephanie, 73
Tencent, 250
tensor processing units (TPUs), 35
Terminator (film), 112, 113
Tesauro, Gerry, 55
Thaler, Richard, 244
Theory of the Leisure Class, The
(Veblen), 230
Thinking, Fast and Slow
(Kahneman), 238
thinking, learning from, 293–95
Thornton, Richard, 133
Times, 7, 8
tool (narrow) artificial intelligence, 46,
47, 136
TPUs (tensor processing units), 35
tragedy of the commons, 31
Transcendence (film), 34, 141–42
transitivity of preferences, 23–24
Treatise of Human Nature, A
(Hume), 167
tribalism, 150, 159–60
truck drivers, 119
TrueSkill system, 279
Tucker, Albert, 30
Turing, Alan, 32, 33, 37–38, 40–41,
124–25, 134–35, 140–41, 144, 149,
153, 160–61
Turing test, 40–41
tutoring, 100–101
tutoring systems, 70
2001: A Space Odyssey (film), 141
Uber, 57, 182
UBI (universal basic income), 121
uncertainty
AI uncertainty as to human
preferences, principle of, 53, 175–76
human uncertainty as to own
preferences, 235–37
probability theory and, 273–84
United Nations (UN), 250
universal basic income (UBI), 121
Universal Declaration of Human Rights
(1948), 107
universality, 32–33
universal Turing machine, 33, 40–41
unpredictability, 29
utilitarian AI, 217–27
Utilitarianism (Mill), 217–18
utilitarianism/utilitarian AI, 214
challenges to, 221–27
consequentialist AI, 217–19
ideal utilitarianism, 219
interpersonal comparison of utilities,
debate over, 222–24
multiple people, maximizing sum of
utilities of, 219–26
preference utilitarianism, 220
social aggregation theorem and, 220
Somalia problem and, 226–27
utility comparison across populations
of different sizes, debate over,
224–25
utility function, 53–54
utility monster, 223–24
utility theory, 22–26
axiomatic basis for, 23–24
objections to, 24–26
value alignment, 137–38
Vardi, Moshe, 202–3
Veblen, Thorstein, 230
video games, 45
virtual reality authoring, 101
virtue ethics, 217
visual object recognition, 6
von Neumann, John, 23
W3C Credible Web group, 109
WALL-E (film), 255
Watson, 80
wave function, 35–36
“we’re the experts” argument,
152–54
white-collar jobs, 119
Whitehead, Alfred North, 88
whole-brain emulation, 171
Wiener, Norbert, 10, 136–38,
153, 203
Wilczek, Frank, 4
Wiles, Andrew, 185
wireheading, 205–8
work, elimination of, 113–24
caring professions and, 122
compensation effects and,
114–17
historical warnings about, 113–14
income distribution and, 123
occupations at risk with adoption of
AI technology, 118–20
reworking education and research
institutions to focus on human
world, 123–24
striving and enjoying, relation
between, 121–22
universal basic income (UBI)
proposals and, 121
wage stagnation and productivity
increases, since 1973, 117
“work in human–machine teams”
argument, 163
World Economic Forum, 250
World Wide Web, 64
Worshipful Company of Scriveners, 109
Zuckerberg, Mark, 157