Brief thoughts on forecasting/epistemics interventions

All views are my own rather than those of any organizations/groups that I’m affiliated with. Trying to share my current views relatively bluntly. Note that I am often cynical about things I’m involved in. Thanks to Adam Binks for feedback. Crossposted to EA Forum comment (comment is easier to read on mobile).

Edit: I think the grantmaking program has a different scope than I was expecting; see this comment by Benjamin for more.

Following the EA Forum comments skeptical of the value of the new OpenPhil forecasting grantmaking program, I figured it might be useful to quickly write up some personal takes on forecasting’s promise and what subareas I’m most excited about (where “forecasting” is defined as things I would expect to be in the scope of OpenPhil’s program to fund).

  1. Overall, most forecasting grants that OP has made seem much lower EV than the AI safety grants (I’m not counting grants that seem more AI-y than forecasting-y, e.g. Epoch, and I believe these wouldn’t be covered by the new grantmaking program). Due to my ASI timelines (10th percentile ~2027, median ~late 2030s), I’m most excited about forecasting grants that are closely related to AI, though I’m not super confident that no non-AI related ones are above the bar.
  2. I generally agree with the view that I’ve heard repeated a few times that EAs significantly overrate forecasting as a cause area, while the rest of the world significantly underrates it.
    1. I think EAs often overrate superforecasters’ opinions; they’re not magic. A lot of superforecasters aren’t great (not only at general reasoning, but even at geopolitical forecasting), and there’s plenty of variation in quality.
      1. General quality: Becoming a superforecaster selects for some level of intelligence, open-mindedness, and intuitive forecasting sense among the small group of people who actually make 100 forecasts on GJOpen. There are tons of people (e.g. I’d guess very roughly 30-60% of AI safety full-time employees?) who would become superforecasters if they bothered to put in the time.
        1. Some background: as I’ve written previously I’m intuitively skeptical of the benefits of large amounts of forecasting practice (i.e. would guess strong diminishing returns).
      2. Specialties / domain expertise: Contra a caricatured “superforecasters are the best at any forecasting questions” view, consider a grantmaker deciding whether to fund an organization. They are, whether explicitly or implicitly, forecasting a distribution of outcomes for the grant. But I’d guess most would agree that superforecasters would do significantly worse than grantmakers at this “forecasting question”. A similar argument could be made for many intellectual jobs, which could be framed as forecasting. The question of whether superforecasters are relatively better isn’t “Is this task answering a forecasting question?” but rather “What are the specific attributes of this forecasting question?”
        1. Some people seem to think that the key difference between questions superforecasters are good at and questions smart domain experts are good at is whether the questions are *resolvable* or *short-term*. I tend to think that the main differences are along the axes of *domain-specificity* and *complexity*, though these are of course correlated with the other axes. Superforecasters are selected for being relatively good at short-term, often geopolitical questions.
        2. As I’ve written previously: It varies based on the question/domain how much domain expertise matters, but ultimately I expect reasonable domain experts to make better forecasts than reasonable generalists in many domains.
          1. There’s an extreme here where e.g. forecasting the best chess move is obviously better done by chess experts than by superforecasters.
          2. So if we think of a spectrum from geopolitics to chess, it’s very unclear to me where things like long-term AI forecasts land.
          3. This intuition seems to be consistent with the lack of high-quality existing evidence described in Arb’s report (which debunked the “superforecasters beat intelligence experts without classified information” claim!).
    2. Similarly, I’m skeptical of the straw rationalist view that highly liquid well-run prediction markets would be an insane societal boon, rather than a more moderate-large one (hard to operationalize, hope you get the vibe). See here for related takes. This might change with superhuman AI forecasters though, whose “time” might be more plentiful.
  3. Historically, OP-funded forecasting platforms (Metaculus, INFER) seem to be underwhelming on publicly observable impact per dollar (in terms of usefulness for important decision-makers, user activity, rationale quality, etc.). Maybe some private influence over decision-makers makes up for it, but I’m pretty skeptical.
    1. Tbh, it’s not clear that these and other platforms currently provide more value to the world than the opportunity cost of the people who spend time on them. For example, I was somewhat addicted to Metaculus and later Manifold for a bit, and spent more time on these than I would reflectively endorse (though it’s plausible that they were mostly replacing something worse like social media). I resonate with some of the comments on the EA Forum post mentioning that it’s a very nerd-sniping activity; forecasting to move up a leaderboard (esp. w/ quick-resolving questions) is quite addicting to me compared to normal work activities.
  4. I’ve heard arguments that getting superforecasted probabilities on things is good because they’re more legible/credible because they’re “backed by science”. I don’t have an airtight argument against this, but it feels slimy to me due to my beliefs above about superforecaster quality.
  5. Regarding whether forecasting orgs should try to make money, I’m in favor of pushing in that direction as a signal of actually providing value, though it’s of course a balance re: the incentives there and will depend on the org strategy.
  6. The types of forecasting grants I’d feel most excited about atm are, roughly ordered, and without a claim that any are above OpenPhil’s GCR bar (and definitely not exhaustive, and biased toward things I’ve thought about recently):
    1. Making AI products for forecasting/epistemics in the vein of FutureSearch and Elicit. I’m also interested in more lightweight forecasting/epistemic assistants.
      1. FutureSearch and systems in recent papers are already pretty good at forecasting, and I expect substantial improvements soon with next-gen models.
      2. I’m excited about making AIs push toward what's true rather than what sounds right at first glance or is pushed by powerful actors.
      3. However, even if we have good forecasting/epistemics AIs, I’m worried that they won’t convince people of the truth, since people are irrational and variance in their beliefs is often explained by gaining status/power, vibes, social circles, etc. It seems especially hard to change people’s minds on very tribal things, which seem correlated with the most important beliefs to change.
        1. AI friends might actually be more important than AI forecasters for epistemics, but that doesn’t mean AI forecasters are useless.
      4. I might think/write more about this soon. See also Lukas's Epistemics Project Ideas and ACX on AI for forecasting.
    2. Judgmental forecasting of AI threat models, risks, etc. involving a mix of people who have AI / dangerous domain expertise and/or a very strong forecasting track record (>90th percentile superforecaster), ideally as many people as possible who have both. Not sure how helpful it will be but it seems maybe worth more people trying.
      1. In particular, forecasting that can help inform risk assessment / RSPs seems like a great thing to try. See also discussion of the Delphi technique in the context of AGI risk assessment here. Malcolm Murray at GovAI is running a Delphi study to get estimates of likelihood and impact of various AI risks from experts.
      2. This is related to a broader class of interventions that might look somewhat like a “structured review process” in which one would take an in-depth threat modeling report and have various people review and contribute their own forecasts in addition to qualitative feedback. My sense is that when superforecasters reviewed Joe Carlsmith’s p(doom) forecast in a similar vein, the result wasn’t that useful, but the exercise could plausibly be more useful with better quality reviews/forecasts. It’s unclear whether this would be a good use of resources above the usual ad-hoc/non-forecasting review process, but it might be worth trying more.
    3. Forecasting tournaments on AI questions with large prize pools: I think these historically have been meh (e.g. the Metaculus one attracted few forecasters, wasn’t fun to forecast on (for me at least), and I’d guess significantly improved ~no important decisions), but I think it’s plausible things could go better now as AIs are much more capable, there are many more interesting and maybe important things to predict, etc.
    4. Crafting forecasting questions that are cruxy on threat models / intervention prioritization between folks working on AI safety
      1. It’s kind of wild that there has been so little success on this front. See frustrations from Alex Turner: “I think it's not a coincidence that many of the "canonical alignment ideas" somehow don't make any testable predictions until AI takeoff has begun.” I worry that this will take a bunch of effort and not get very far (see Paul/Eliezer finding only a somewhat related bet re: their takeoff speeds disagreement), but it seems worth giving a more thorough shot with different participants.
      2. I’m relatively excited about doing things within the AI safety group rather than between this group and others (e.g. superforecasters) because I expect the results might be more actionable for AI safety people. (edit: I got feedback that this bullet was too tribal and I think that might be right, though it's possible that within the group just is more promising/actionable on the margin; another possible distinction that might be better is preferring inclusion of people who've thought deeply about AI to generalists)

I incorporated some snippets of a reflections section from a previous forecasting retrospective above, but there’s a little that I didn’t include if you’re inclined to check it out.

Edit: copying in a follow-up comment I wrote on the forum:

Just chatted with @Ozzie Gooen about this and will hopefully release audio soon. I probably overstated a few things / gave a false impression of confidence in the parent in a few places (e.g., my tone was probably a little too harsh on non-AI-specific projects); hopefully the audio convo will give a more nuanced sense of my views. I'm also very interested in criticisms of my views and others sharing competing viewpoints.

Also want to emphasize the clarifications from my reply to Ozzie:

  1. While I think it's valuable to share thoughts about the value of different types of work candidly, I am very appreciative of both people working on forecasting projects and grantmakers in the space for their work trying to make the world a better place (and am friendly with many of them). As I maybe should have made more obvious, I am myself affiliated with Samotsvety Forecasting and with Sage, which has done several forecasting projects (and am for the most part more pessimistic about forecasting than others in these groups/orgs). And I'm also doing AI forecasting research atm, though not the type that would be covered under the grantmaking program.
  2. I'm not trying to claim with significant confidence that this program shouldn't exist. I am trying to share my current views on the value of previous forecasting grants and the areas that seem most promising to me going forward. I'm also open to changing my mind on lots of this!

Edit 2: Here's the podcast.