Artificial Intelligence Safety And Security Independent Study Week 2

Last week was a good start, and our exploration rate was fairly high. This week we are starting with a review of AI safety literature, which will serve as a reference for the rest of the semester. We will address shortcomings in this paper and find relevant recent developments that were not around at the time of writing.

Week 2 Monday

"Responses to Catastrophic AGI Risk: a survey"
by Kaj Sotala and Roman V. Yampolskiy

This paper is very long, so today I'm just reading over it without taking notes.

Week 2 Wednesday

"Darwin Among the Machines"
by Samuel Butler (signed "Cellarius", which is how I will refer to the author) in 1863

Original Scan

This was cited in "Responses to Catastrophic AGI Risk" as one of the earliest instances of someone advocating action to prevent the overpowering of humans by machines. Cellarius advocates the complete disassembly of all machinery and an end to the creation of machines altogether, in order to preserve humanity in the long run. They cite the ever-increasing abilities of machines, and the use of machines to create machines, as reasons to believe that machine reproduction and consciousness will happen eventually.

There's really no ground to criticize this article itself, but it does lay the foundation for a great deal of misunderstanding (in my opinion) about the nature of minds, so it's helpful to understand why its points are flawed.

They make a familiar argument about the "progress" from non-living matter, to basic life, to plants and animals and finally to humans, and on to our mechanical creations that evolve faster than any living thing ever could. H.G. Wells's writings make me doubt that continued progress is as guaranteed as scientific thinkers generally claim it to be, and "The Patterning Instinct" makes me wary of attributing "upward and accelerating without bounds" to any process. Also, the vegetable kingdom is still around despite the grand success of animals in populating the Earth, which is a very different case from the outcomes expected in a superintelligence event. I need to be aware of my scientific tendency to wish for linear continuations even in very complex and non-linear systems.

If I'm understanding correctly, they argue that the decreasing size from large, cumbersome clocks to small, intricate watches is similar to the progression from large land animals to the "highly organised" mammals we see today, like humans. Miniaturization is by no means a universal sign of progress, and the opposite seems to be true in nature: viruses are inferior to bacteria, which are inferior to insects, which are inferior to mammals. Maybe I'm misunderstanding them; the dialect the article is written in is a bit strange to my eyes.

They then go on about how machines will be greater than humans due to lack of harmful emotions and impulses: "Their minds will be in a state of perpetual calm, the contentment of a spirit that knows no wants, is disturbed by no regrets." This is a view that is still around today. Very simple machines, like those around in Cellarius's time, certainly lack emotions but can still be attributed desires. For instance, when understanding a thermostat it is very convenient to talk about the thermostat "desiring" to keep the room at a steady temperature.

As we have built more complex AI for solving problems, more "soft" and emotion-like approaches have been very fruitful, such as the use of a curiosity metric to aid in playing video games. A future AGI is likely to have many of these soft metrics. As these metrics interact in complex ways, they might be best understood as a system of feelings, to anthropomorphize. Anthropomorphizing is very dangerous around AI, but more intelligent systems can often be best understood in terms of human cognition. The AI prefers having its king on the back row. The AI is curious about what's in the rest of the level. The AI wants to collect coins. A "perfect" utility maximization agent wouldn't need to have any of these biases, but many practical designs for AI have organic qualities.
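To make the "curiosity metric" idea concrete for myself, here is a minimal sketch of a count-based novelty bonus. This is only one simple way to get curiosity-like behavior, and I am not claiming it is the method used in the video game work mentioned above. The names `CuriosityBonus`, `scale`, and `total_reward` are hypothetical and chosen purely for illustration.

```python
from collections import Counter
import math


class CuriosityBonus:
    """Count-based novelty bonus (illustrative assumption, not a specific published method).

    States the agent has rarely visited earn a larger intrinsic reward,
    so the agent behaves as if it were "curious" about new places.
    """

    def __init__(self, scale=1.0):
        self.scale = scale        # hypothetical knob: how much curiosity matters
        self.visits = Counter()   # number of times each state has been seen

    def reward(self, state):
        # Record the visit, then pay a bonus that shrinks as 1 / sqrt(visit count).
        self.visits[state] += 1
        return self.scale / math.sqrt(self.visits[state])


def total_reward(extrinsic, state, curiosity):
    """Combine the game's own reward (e.g. coins) with the curiosity bonus."""
    return extrinsic + curiosity.reward(state)
```

With this sketch, a first visit to a state pays the full bonus and repeat visits pay less, so "the AI is curious about what's in the rest of the level" is a convenient summary of a term in its reward rather than a claim about inner experience.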

Another issue is the assumption that machines will be more powerful than us and without emotion. If a machine "knows no wants", then what actions would it ever take in the world? I believe Cellarius has an image in their mind of machines similar to humans, filling human roles out of duty and goodness, without explicit goals.

Next they make a very dangerous claim that has only recently been refuted (TODO when did we realize this was terrible?): "man will have become to the machine what the horse and the dog are to man. He will continue to exist, nay even to improve, and will be probably better off in his state of domestication under the beneficent rule of the machines than he is in his present wild state."

This is an idea to be extremely wary of. Cellarius draws the analogy between horse-human and human-machine to indicate that humans will still exist under machine rule. Today, horses are mostly an eccentric luxury and rarely used for practical needs (except by the Amish and other cultures without extensive machinery). If the horse-human, human-machine analogy is true (it's probably not), then what would the future of humankind look like? Would human slaves only be used by "eccentric" machines who would prefer to have humans working their acid mines instead of automated drones? It is possible, under this imagined world, that humans would serve as luxury pets to machines and be treated very well, or at least to the extent that machines understand what's good for humans. Even this bleak future seems extraordinarily unlikely, as a goal-optimizing AI would never need to keep a single live human around once our functionality can be replicated more efficiently. An AI that would keep any humans around in good condition would be a fairly safe AI, and work in AI safety has shown this to be very difficult to achieve in a foolproof way; it is very unlikely that "evolutionary progress" in machines would organically lead to this outcome. A goal-seeking superintelligence would consume all matter, and all living things with it, as soon as it was convenient. Humans don't come even close to doing this.

Week 2 Friday

I wrote about "Meditations on Moloch" this day, regarding values of collectives, such as uncoordinated groups of people. Read it here.