Thesis Week 8 (Spring)

It’s Monday! Starting on time today. Or maybe a minute late; I had to fix a loose wire in my computer that was bumping against the GPU’s fan. Anyway, what do I need to do? I remember my AI safety section has a lot of TODOs. I’ll dig up all the landmark literature from AI safety and weave it into a few pages of related works.

Work log:


I’ve downloaded the major AI safety works from the 80,000 Hours AI safety curriculum. They make having a career seem so easy. Maybe I should do that. But what I really want is to research alife, specifically ISC and “useful” alife which is both theoretically sound and also messy and alive, and most importantly physically realizable at any scale. But that’s not what this thesis is about, so I should save that for another time.

I’m taking paper notes as I read (or skim) each of them.

Oops I downloaded one in a foreign language. Lemme see if I can find it in English.

Also, my focus and alertness are not good today. My sleep schedule has been off from the weekend (and I slept in after waking around 8 today). The adrafinil doesn’t seem to have much of an effect either, but hopefully I’ll have better sleep tonight and give it a better chance to work.

Oh, well for one thing, “Superintelligence: Paths, Dangers, Strategies” is a book, and for another, I’ve already read it (actually, it’s the source of 90% of my AI safety understanding). I should find a review of it so I can poach the names and concepts.

Lunch time. I got through a few papers. Diving into the sea of all (supposed) knowledge is disorienting, but having this AI safety background built into my paper will be a good thing and will add a lot of content.

Back from lunch, 9 minutes late. I took a little nap at the end, but still don’t feel refreshed. Oh well, I should still slog through the rest of the day. Annie agrees with my sentiment that research reading is boring.

Reading this work puts my work in perspective as “AI safety lite”. This isn’t a bad thing - AI safety is important in the far future, and my work on modern safety with AI is useful now, even if it doesn’t see quite as far or on the same scale.

It’s 2:11 and I don’t think I’ve been terribly productive, but I have been working this whole time (with a typical amount of distraction, maybe slightly more susceptibility to it). Also, my handwriting is really terrible while taking notes. I think my handwriting is only passable when I’m being very patient, and when note-taking I want brevity, quickness, and no subtlety, so my handwriting suffers.

Anyway, that’s enough lit review for today. I’m going to track down citations for these papers and books and rename the files to their appropriate locations. I also need to copy in the PDFs from my laptop.

I did spend the last 20 minutes setting up FTP to use SSL, but that’s okay. I have my PDFs now. I’m starting to get a headache; I think I might be a bit sleep deprived? Except I didn’t stay up that late. Hmm.


I was 5 minutes late; the walk to work took a bit longer because of the construction. I’m now doing a walk to work (clockwise around the block) and from work (counterclockwise around the block) to make my work environment feel more real, and also to make time for a bare minimum of movement and time outside. It also means I have to dress presentably for work, and I think I’ll keep my shoes on “during work”.

The main problem with my work boundaries right now is that my work computer is also my game-playing computer (and non-work programming computer). This reduces the workfulness of the space, and I’m not sure how to fix it. I could find another table to move my monitor to when doing non-work, or set up the room so I can sit on a different side of the table for work vs. non-work usage. The fake commute to and from work is fairly effective, though. Also, I don’t have the space or furniture for two computer desks. I barely have one. And I have way too much stuff right now to be buying furniture. But that’s a home problem, and I need to get to work because I’ve now spent almost 10 minutes blogging about work. I guess that counts as work, but we’re much too meta already.

So what to do today? I have all these new papers, and I need to write an AI-safety related-works subsection. Also, my handwriting is terrible; I shouldn’t have handwritten those notes. Oops. I’ll transcribe them as comments in the .bib file for now.

I’m going to put on some work music, too. I could put Spotify on my work computer, but this machine is fairly free (as in freedom), so putting a DRM music player on here seems wrong. Despite the fact that I have non-free Nvidia drivers and two non-free games already installed. Oops. Sorry, Richard Stallman.

Oh, first I’ll make a launch-thesis script. That way I can start working every day with the same windows. Not super focused, but I need an open-and-shut technical job to do today because I’m a bit worse for wear right now. I was woken up too early by construction.

Okay! Done with that. Took about 25 minutes, and I think it was time well spent. That’ll save me 20 seconds every day; over the remaining 50-some days in my work schedule, that’s about 17 minutes saved.
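For the curious, here’s a rough sketch of what such a launch script could look like, in Python for illustration. All the program names and paths here are stand-ins, not my actual setup:

```python
import subprocess

# The windows I want open at the start of every work day. These
# programs and paths are illustrative guesses, not my real setup.
WORKSPACE = [
    ["gedit", "thesis/related-work.tex"],       # editor on the current chapter
    ["evince", "thesis/main.pdf"],              # compiled PDF for reference
    ["xterm", "-e", "cd thesis && exec bash"],  # shell in the project directory
]

def launch(commands=WORKSPACE, runner=subprocess.Popen):
    # `runner` is injectable so the launcher can be dry-run without
    # actually spawning any windows.
    return [runner(cmd) for cmd in commands]
```

Calling `launch()` spawns everything; passing a stub `runner` lets you inspect the command list without opening any windows.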

Hmm I’m getting a pretty bad headache. I didn’t sleep that well last night.

Okay, so I added AI Safety Gridworlds; it’s one of my favorite AI safety papers. I also spent 15 minutes making a command to copy the most recent download to a given destination, since sorting through my downloads folder gets cumbersome.
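In case it’s useful later, the logic of that copy-the-latest-download command is roughly this (a Python sketch; the function name and shape are mine, not the actual command):

```python
import shutil
from pathlib import Path

def copy_latest_download(downloads, dest):
    # Pick the most recently modified regular file in the downloads
    # folder and copy it (with metadata) to `dest`.
    files = [p for p in Path(downloads).iterdir() if p.is_file()]
    latest = max(files, key=lambda p: p.stat().st_mtime)
    return shutil.copy2(latest, dest)
```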

I was actually on a roll there, I think this AI section is going to be pretty good.

Lunch break!

I’m back, 5 minutes early to make up for this morning (or just because I finished eating and doing the dishes and I didn’t know what to do with 5 minutes).

I don’t especially want to work, but I do want this project to be done on time, and that relies on all past me’s having done regular work and all future me’s doing work. Since the past, present, and future are (within the scheduled time) under the same rules, I can’t expect to find a way out of work today that wouldn’t either 1. make it possible to do the same on any other day or 2. create an exception to the rules, which is a small step toward (1). This is exactly what Functional Decision Theory is about: recognizing decisions as functions that exist in multiple places in time and space, and choosing the right function. For me, I have a (work/don’t work) decision every minute of every day, and I’ve made a function: “until 2 weeks before finals, work 10am-12pm and 1-3pm Monday, Tuesday, and Friday, and 10am-12pm on Thursday”. Adding exceptions to that function right now (don’t work if you have a headache, don’t work if you don’t feel great) will add those exceptions to the future, allow for the easier addition of more exceptions, and create perverse incentives to evaluate myself as feeling worse so I can skip work when I’m lazy. So here I am, suffering a little bit from working in a suboptimal state for the sake of continuity and sanctifying the decision function. I think functional decision theory might be a candidate for replacing all of morality… or at least enhancing our default, true morals with the ability to work on unbounded time horizons and with pseudo-empathy for unlimited living beings.
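Put literally, the decision function is small enough to write down; a sketch in Python (I’m treating the time boundaries as half-open intervals, which is my own reading, and leaving out the 2-weeks-before-finals cutoff since it depends on the term calendar):

```python
from datetime import datetime

def should_work(now: datetime) -> bool:
    # "Work 10am-12pm and 1-3pm Monday, Tuesday, and Friday,
    # and 10am-12pm on Thursday."
    morning = 10 <= now.hour < 12
    afternoon = 13 <= now.hour < 15
    day = now.strftime("%A")
    if day in ("Monday", "Tuesday", "Friday"):
        return morning or afternoon
    return day == "Thursday" and morning
```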

Anyway, that’s only roughly tangentially related to a topic hardly worth mentioning in a footnote of my thesis, so now instead of being here 5 minutes early it’s effectively like I’m 7 minutes late. Oh well.

Oops, I lost my mouse. I guess with my keyboard-only pro technique there are some downsides. I don’t really need it, anyway. I could use an inertial scroll wheel on my keyboard, though.

Done for the day, time to walk back home (okay, I am home but the ritual - turn right and walk around the block).


Oops, I’m 5 minutes late. Anyway, I really didn’t like the adrafinil pills, so I emptied one into some tea because some people drink it that way instead. However, it was so extremely bitter (even with a tablespoon of sugar) that I could not drink it. So I poured it out. Apparently it’s palatable with cranberry juice, so I’ll try that in week 10, but for this week I’ll just not take anything today and tomorrow. Maybe an herbal tea, just for the ritual.

Also, not work related, but I got back into writing PyTorch code yesterday, and it was just as easy as I remember. I’m working on a GPU Moveable Feast Machine implementation where the cells are neural networks.
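The core MFM dynamic (asynchronous, purely local event windows on a grid, with no global clock) fits in a few lines. A toy sketch, with a plug-in update rule standing in for the per-cell neural network and a 3x3 window instead of the full MFM event window:

```python
import random

def step(grid, update_rule, rng=random):
    # One MFM-style event: pick a random site, read its (toroidal)
    # 3x3 neighborhood, and let the update rule rewrite just that site.
    # In the actual project the rule would be a neural network on the GPU.
    h, w = len(grid), len(grid[0])
    y, x = rng.randrange(h), rng.randrange(w)
    window = [grid[(y + dy) % h][(x + dx) % w]
              for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    grid[y][x] = update_rule(window)
    return grid
```

Because each event touches only one site, events at distant sites commute, which is what makes the model scale without a global clock.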

Anyway, I need to get to writing today. I’m going to make evaluating AI safety concerns its own section, to complement the system classification section.

I’m not sure if this is actually a good section to write. Anyone researching AI today is either using the standard tools (not powerful enough to have any hope of exploding) or doing original research in the direction of AGI (and they are perfectly aware of what they’re doing). The target audience of my paper is firmly in the first category - even businesses that have an “AI ethics committee”.

I just read over the article Dr. Yampolskiy sent me on facial recognition biases and how use of facial recognition is bad to begin with.

[] (Can the Biases in Facial Recognition Be Fixed; Also, Should They?)

He also sent me his list of movies with AI safety relevant plot lines. Alphaville sounds like an especially interesting one, with suppression of humanity instead of its destruction (I think Bostrom classifies that as a “Shriek”).

Okay, so I thought my system was bad, but it’s actually coming along well. What I have so far is pretty basic and naive, but it lays the groundwork for a more thought-out system to be built on top of it.

\section{AI Safety Evaluation}

In this section, we evaluate AI safety concerns independently of the details of the system the AI
is used in. While no current AI system poses an existential threat, the possibility of a ``foom''
scenario \cite{bostrom2001extinction} is taken seriously, as it is unknown how far we are from that
point, and recent advancements (such as GPT-3 \cite{brown2020gpt3}) remind us how rapidly AI
capabilities can increase. Here we create a schema for classifying the AI itself in terms of
autonomy, oversight, and escape potential.


\subsection{Autonomy}

While autonomy is hard to pin down, we all know how to identify it. Or as Supreme Court Justice
Potter Stewart said, ``I know it when I see it''. [TODO this sentence is a joke, remove it and
probably this subsection]

Level 0: Little or no autonomy, the program is explicitly designed and won't act in unexpected ways
except due to bugs, and this unexpected behavior is unintelligent in nature.

Level 1: An optimization process is used, but not one that will attempt to break out of confinement.
Examples include gradient descent and other standard machine learning algorithms. Image
classifiers and GPT-3 fall into this class, although there is speculation that with some
prompts GPT-3 could emulate enough of a dangerous mind to attempt a break from confinement.

Level 2: The program is designed to have agency and interacts with its environment (even a
simulated one) in a goal-directed way. This includes reinforcement learners and agents trained by
genetic algorithms.

Level 3 (danger): A practical understanding of self, and a theory of other minds, develops.

\subsection{Goal Observability}

Software that has very simple functionality (such as a thermostat) has completely comprehensible
goals, to the extent that we hardly consider them goals so much as the system's only functionality.
More complex software may use an optimization process to maximize some objective function (or
minimize a loss function), which provides a goal that is understandable, even if the goal-seeking
behavior of the agent is not.

Level 0: Programs that ``just run'' and have no goal-seeking behavior

Level 1: Basic goal-seeking behavior such as numerical optimization

Level 2: Complex goal seeking with the ability to pursue subgoals and exhibit reward hacking

Level 3 (danger): The ability to realize instrumental goals and seek self-improvement

\subsection{Escape Potential}

Level 0: None

Level 1: The agent may find unexpected behaviors which technically achieve the goal but are not
what the designers intended
Level 2: The agent intentionally seeks to hack its environment

Level 3 (danger): The agent has the ability and drive to break through most cybersecurity and
containment procedures


\subsection{Human Likeness}

Level 0: Obviously a machine

Level 1: Some surface-level appearance of humanity (natural language interaction, a robot
chassis with human features)

Level 2: Level 1 but with software to support the illusion of humanity (speech interaction,
human-ish actuated movements)

Level 3 (danger): The AI can be mistaken for a human even with unrestricted communication (may be
text-only, voice, or in person)

Hmm, that didn’t format exactly right. Anyway, I ended up doing some actual work today! But it’s quittin’ time (20 minutes past, actually). Bye!


I think I had a dream that I configured this website to use Pandoc for the markdown formatting. Weird. Anyway, I’m here totally on time today! Although I do have a bagel in the toaster that I’m going to go prepare and eat here in a minute.

Well, that was a solid 10-minute procrastination.

I’ve added a little bit but I’m just not working that well today. I’ll read some Perrow instead to make the best of this unfocused day.

So far, “Normal Accident Theory” is interesting but not especially relevant to my thesis as it focuses so heavily on nuclear power plants. They make decent metaphors for use cases of AI, but only in a very abstract way.

Lunch Break

I’m back. I think I’m going to work on the blog code rendering issue, because I am having trouble focusing on work and reading Perrow doesn’t feel especially productive.

Hey I did it! I set up the site to run on my desktop (sbcl, quicklisp) and had to make some changes (for some reason the string library was being more strict about types it would accept). Then I changed the bit of code that used cl-markdown to instead use a system call to pandoc, and fixed a bug I had with the header chomping eating up a bunch more text than it was supposed to and everything worked! For doing software maintenance on something I made like 2 or 3 years ago, that was surprisingly fast and easy. I guess lisp is just magic like that.
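For reference, the shape of that cl-markdown-to-pandoc swap, sketched in Python rather than Lisp. The flags are standard pandoc options, but the exact invocation the site uses is a guess:

```python
import subprocess

def pandoc_args(fmt_in="markdown", fmt_out="html"):
    # Standard pandoc flags; the site's exact invocation may differ.
    return ["pandoc", "--from", fmt_in, "--to", fmt_out]

def render_markdown(text):
    # Shell out to pandoc, feeding the post body on stdin and reading
    # the rendered HTML from stdout (mirrors the Lisp system call).
    result = subprocess.run(pandoc_args(), input=text,
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Delegating to an external process like this trades a library dependency for a runtime one, but it means the site gets every pandoc improvement for free.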

Oof okay I still need to work. I don’t know what to write. I guess I could start on another case study, maybe a fictional one.

Well, HAL-9000 turned out to be pretty good to analyze in my system. The AI safety section was especially applicable because 2001: A Space Odyssey is mostly about an alignment issue with a human-level AI.

Time to spellcheck, then commit and push my changes, and I’ll be done for the week. And a quick status update to Dr. Yampolskiy.