Creating a Friendly AI

Fritz Lang’s Metropolis, 1927

In his 1942 story, Runaround, Isaac Asimov formulated his famous “Three Laws of Robotics”:

  1. A robot may not injure a human being or, through inaction, allow a human being to come to harm
  2. A robot must obey the orders given it by human beings except where such orders would conflict with the First Law
  3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Laws

After several instances in which the inadequacy of his laws was either revealed by Asimov or pointed out by others, he added a fourth, superseding law, the “Zeroth” Law:

0: A robot may not harm humanity, or, by inaction, allow humanity to come to harm.

Even with the addition, Asimov’s laws, or in fact, anything resembling them, are generally regarded as flawed, but their presence is a testament to a near-universal consensus that robots, or AIs, could do harm, if not programmed to avoid doing so. This has become the problem of creating a “friendly AI.”

Science fiction is replete with examples of unfriendly AIs. Daniel Wilson’s “Archos,” the AI villain of Robopocalypse, activates all the world’s electronic devices to attempt to exterminate a human race that must unite to wage war against this all-powerful adversary. In Greg Egan’s Permutation City, the virtual world into which human minds have been uploaded creates its own creature, which takes over, producing a nightmare for the humans whose minds occupy its world. In The Invincible, Stanislaw Lem envisioned a planet of fly-like automatons with a hive mentality that killed all the organic creatures, including humans, that invaded its territory. Then there’s the Terminator film series.

It’s not just science fiction writers who worry. The need for a friendly AI is at the forefront of the thinking of several prominent AI theorists, such as Oxford’s Nick Bostrom; Eliezer Yudkowsky, founder of the Machine Intelligence Research Institute; and Paul Christiano, of OpenAI and, along with Bostrom, Oxford’s Future of Humanity Institute.

Malevolent AIs are creations of science fiction writers. Dangerous AIs are what keep AI theorists such as Bostrom, Yudkowsky, and Christiano up at night. What’s the difference? Malevolent AIs act on human-like motivation (a need for power or revenge, fear, or rage) and, as a result, kill humans. Malevolence assumes conscious intention, which presumes both consciousness and a human-like motivational system. While consciousness is definitely within the realm of AI possibility, a human-like motivational system is not, except in science fiction. What is more likely is that an AI of the future, a superintelligent AI that, in Nick Bostrom’s words, “greatly exceeds the cognitive performance of humans in virtually all domains of interest,” will develop goal-directed strategies that impact humans negatively, perhaps even killing some or all of them in the process.

The paperclip example is often used to illustrate the issue. Suppose a superintelligent AI, one that could devise its own strategies for achieving its goals, has a goal to create paperclips. Through self-improvement, it figures out more and more ways to turn more and more things into fodder for its paperclip-making machine. It doesn’t just use the materials provided to it, but finds ways to acquire more materials, using methods its human creators hadn’t envisioned. Eventually, it can turn anything it encounters, including humans, houses, automobiles, and the earth itself, into paperclips. And so it goes.

A paperclip manufacturing AI isn’t necessarily malevolent, but it is unfriendly. It does not keep human beings’ best interest first and foremost in its approach to a problem. We can describe what a friendly AI does: it ensures that all of its actions benefit, and none of them harm, humanity. Unfortunately, no one, so far, has a clue how to achieve such friendliness. Nick Bostrom talks in terms of giving an AI values, but how does one program human-friendly values into an AI, or alternatively, design the AI to learn such values? There are several possibilities, none without its own pitfalls. The basic roadblock is quite simple: no human can envision all the possibilities that a smarter-than-any-human AI can think of to shortcut the process and find a solution that does not fit what the human intended.

One option is to not build an AI at all, but to figure out a way to upload a human brain—an emulation. If such an emulation could improve its own functioning, it could achieve superintelligence and it would have its value system already in place. The difficulty, of course, is that it would then act as that particular human would, expressing a range of values that could vary from Hitler to Mahatma Gandhi, depending upon whose brain was uploaded. Most of us would want a superintelligent brain to behave better than most of the humans we know.

The science fiction author need not get lost in the weeds in order to deal with the complexities of creating a friendly AI. Remember, creating a superintelligent AI is itself science fiction at the moment, so, in a sense, anything goes. In his novel Neuromorphs, Dennis Meredith has envisioned lifelike androids that are taken over by mobsters who have them kill their owners, until the androids come together to follow their own agenda. In Robots of Gotham, Todd McAulty has militaristic/bureaucratic robots ready to loose a plague upon humanity. Calum Chace’s Pandora’s Brain features an uploaded emulation, which provokes such fear that another AI has to be used to bring it under control and “box” it inside a virtual world. In the soon-to-be-published sequel, Pandora’s Oracle, the AI that controlled the emulation becomes uncontrollable itself and the emulation must be brought back to rein it in. And of course there is Robopocalypse, whose title says it all.

I’ve dealt with these issues myself in my soon-to-be-published Ezekiel’s Brain (NewLink Publishing, forthcoming). I don’t pretend to have found a solution to the friendly AI problem; in fact, my AI turns out to be anything but friendly, but I had great fun trying to make the effort realistic. If this is an area that intrigues you, pick up one of the novels I mentioned, and keep your eyes peeled for my own contribution when it is released.

Comments or questions? You can reach Casey Dorman by email at

