I’ve been interested in AI risk for a while and my confidence in its seriousness has increased over time, but I’ve generally harbored some hesitation about believing some combination of short-ish AI timelines and high risk levels. In this post I’ll introspect on what comes out when I try to expand on the reasons for this hesitation, and categorize those reasons as seemingly (likely) unjustified vs. potentially justified.
I use justified as “should affect my credence in AI risk levels, timelines, etc.” and unjustified as the opposite. These categorizations are very tentative: I could imagine myself changing my mind about several considerations.
I also describe my current overall attitude toward the importance of AI risk given these considerations.
I have somewhat contrarian instincts and enjoy debating, playing devil’s advocate, etc. It feels boring to agree with the 80,000 Hours ranking of AI risk as the most important problem; it would feel more fun to come up with a contrarian take and try to flesh out the arguments and get people on board. But this doesn’t mean that the contrarian take is right; in fact, given my beliefs about how talented EAs are, I should expect the current take to be more likely than the contrarian one before looking into it.
Desire for kids
I’ve always enjoyed spending time with kids and as such have likely wanted to have kids for as long as I can remember. It’s hard for me to grapple with the idea that my kids’ most likely reason to die young would be AI risk, perhaps by a wide margin. I’ve become more hesitant about my desire to have kids due to a high perceived risk level and also the potential for reduced productivity during a very important period; I’d want to be able to spend a lot of time with my kids and not treat them as a second priority to my work. This has been tough to swallow.
Uncomfortable about implications for EA
I got into EA via Doing Good Better and was originally excited about the opportunity to clearly save many lives throughout my career. I went vegan due to animal welfare concerns and still feel a lot of intuitive sympathy for the huge amounts of suffering many humans and animals are currently going through. It feels a bit sad to me that as my beliefs have been evolving it’s been hard to deny that there’s a decent chance that AI safety and things that feed into it (e.g. movement building, rationality/epistemics improvement, grantmaking) have a much higher EV than other activities, all else equal.
I might feel more at peace if my beliefs implied higher levels of variance in what the most impactful activities were. Having a relatively diverse and inclusive movement feels important and more fun than one where the most talented people are mostly funneled into the same few activities. Doubly so compared to a focus on AI that feels weird to many people and could be badly mistaken given our level of understanding. And I’d still be reluctant to encourage people who feel very passionate about what they do and are doing useful things to switch to working on AI safety.
But it might just be a fact about the world that AI safety is by a substantial amount the most important cause area, and this is actually consistent with original EA arguments about unexpectedly high differences in impact between cause areas. And how I feel about this fact shouldn’t influence whether I believe it’s true.
Feelings about AI risk figures
I admire Eliezer in a lot of ways but I find it hard to get through his writing given his drawn-out style, and he seems overly bombastic to me at times. I haven’t read the Sequences though I might at some point. I haven’t gotten past Chapter 1 of HPMOR and probably don’t intend to. And his beliefs about animal suffering seem pretty crazy to me. But my feelings about Eliezer don’t affect how strong the object-level arguments are for AI posing an existential risk.
Worries about bias towards AI and lack of AI expertise
I was pretty interested in machine learning before I found out about EA. This made me suspicious when I started to seriously believe that AI risk was the most important cause area; wasn’t this a bit too fishy? I projected these worries onto others as well, like: isn’t it a coincidence that people who love math concluded that the best way to save the world is by thinking about fun math stuff all day?
Reflection has made me less concerned about this because I realized that I had opposing hesitations depending on whether the person worried about AI risk was an AI expert or not. If they were an AI expert, I had the worry described above that the conclusion was too convenient. But for people worried about AI who weren’t AI experts, I had the worry that they didn’t know enough to be worried! So either way I was coming up with a justification to be hesitant. See also Caution on Bias Arguments.
I think there would be more reason for concern if those concerned about AI risk were overwhelmingly either AI experts or AI novices, but in fact it seems like a healthy mix to me (e.g. Stuart Russell is an expert, most of the 80,000 Hours team are novices). Given this and my opposing intuitions depending on the advocate, I think these reasons for hesitancy aren’t much of a concern.
EDIT: As pointed out in this comment, it's possible that both experts and novices are biased towards AI because they find it cool/fun.
Doomsaying can’t be vindicated
I’m a competitive guy, and I really like the feeling of being right/vindicated (“I told you so!”). I don’t like the opposite feeling of losing, being wrong and embarrassed, etc. And to a first approximation, doomsaying can’t be vindicated; it can only be embarrassed! In this way I admire MIRI for sticking their neck out with relatively short timelines and high p(doom) with a fast takeoff. They only have the potential to be embarrassed; if they’re right we’ll likely all drop dead with approximately no time for “I told you so!”.
We have no idea what we’re doing
While “no idea” is hyperbole, recent discussions such as the MIRI conversations have highlighted deep disagreements about the trajectory of AI and which approaches seem promising as a result. Predicting the future seems really hard, and technological predictions are often too aggressive. It’s likely that 20 years from now we’ll look back on the work we’re doing today and think it was very misguided, similar to how we might now look at lots of work from 20 years ago. But note that this unpredictability can cut both ways; it might be hard to rule out short timelines, and some past technological predictions may have been too conservative.
Note that this could also potentially point toward “figuring out what we’re doing” rather than deprioritizing AI risk, depending on views on just how hard it is to figure out what we’re doing. This is basically my current take though I think “trying to actually do stuff” should be a large part of the portfolio of figuring things out.
Many smart people disagree
The post “But have they engaged with the arguments?” points out, in the context of AI risk:
The upshot here seems to be that when a lot of people disagree with the experts on some issue, one should often give a lot of weight to the popular disagreement, even when one is among the experts and the people's objections sound insane. Epistemic humility can demand more than deference in the face of peer disagreement: it can demand deference in the face of disagreement from one's epistemic inferiors, as long as they're numerous. They haven't engaged with the arguments, but there is information to be extracted from the very fact that they haven't bothered engaging with them.
I think this is a legitimate concern and enjoy efforts to seek out and flesh out opinions of generally reasonable people and/or AI experts who think AI risk is misguided. This may be a case where steelmanning is particularly useful. Recent efforts in this direction include Transcripts of interviews with AI researchers and Why EAs are Skeptical about AI Safety.
But I think at a certain point you need to take a stand, and overly modest epistemology has its downsides. I also have the intuition that oftentimes if you want to have a big impact, at some point you have to be willing to follow arguments you believe in even if they’re disputed by many reasonable people. You have to accept the possibility you might be badly mistaken and make the bet.
Expected value of the future
This is a concern with a brand of longtermism in general rather than AI specifically, and note that it might push toward working on AI from more of a suffering-focused perspective (or even mostly doing standard AI risk stuff depending on how much overlap there is) rather than deprioritizing AI stuff.
But I do have some unresolved uncertainties about the expected value of the future; it seems fairly unclear to me though still positive if I had to guess. I’m planning on spending more time thinking about this at some point but for now will just link to some relevant posts here, here, and here. Also related is Holden’s suggestion to explore how we should value long-run outcomes relative to each other.
Bias toward religion-like stories
My concerns are broadly similar to the ones described in this post: it seems like concerns about AI risk follow similar patterns to some religions/cults: AI is coming soon and we’ll probably either enter a ~utopia or all die within our lifetimes, depending on what actions we take.
I don’t think we should update too much on this (the replies to the post above are worth reading and generally convincing imo) but it seems useful to keep in mind. Lots of very impactful groups (e.g. startups) also have some features of cults/religions, so again I feel at some point one has to take a stand on the object-level issues based on their best guess.
Track record of AI risk figures
There are at least some data points of Eliezer being overconfident in the past about technological timelines, which should maybe cause us to downweight his specific assessments a little. Though he has also been fairly right on the general shape of the problem and way ahead of everyone else, so we also need to take that into account.
Not sure I have much more to add here besides linking this more comprehensive post and comment section.
My best guess is that the high-level argument of the form “We could in principle create AI more intelligent than us, it seems fairly likely it will happen this century, and creating agents more intelligent than us would be a really big deal and could lead to very good or bad outcomes”, similar to the one described here, is basically right and alone implies that AI is an extremely important technology to pay attention to. This plus instrumental convergence plus the orthogonality thesis seem sufficient to make AI the biggest existential risk we know of by a substantial margin.
Over time I’ve become more confident that some of my hesitations are basically unjustified and the others seem more like points for further research than reasons to not treat AI risk as the most important problem. I’d be excited for further discussion and research on some of the potentially justified hesitations, in particular: improving and clarifying our epistemic state, seeking out and better understanding opinions of reasonable people who disagree, and the expected value of the future.
Thanks to Miranda Zhang for feedback and discussion. Messy personal stuff that affected my cause prioritization (or: how I started to care about AI safety) vaguely inspired me to write this.
Something like >50% chance of AGI/TAI/APS-AI within 30 years
Say, >15% chance of existential catastrophe this century
I forget if I actually realized this myself or I first saw someone else make this point, maybe Rob Wiblin on Twitter?
I’m actually a bit confused about this though; I wonder how useful MIRI considers its work from 20 years ago to be?