
Celerity

Fri Dec 13, 2024, 09:43 PM

Assessing the Risk of Takeover Catastrophe from Large Language Models



Seth D. Baum
Global Catastrophic Risk Institute
http://sethbaum.com · http://gcrinstitute.org
Forthcoming in Risk Analysis, DOI: 10.1111/risa.14353. This version 3 July 2024.

https://gcrinstitute.org/papers/072_llm-takeover.pdf

Abstract

This article presents a risk analysis of large language models (LLMs), a type of “generative” artificial intelligence (AI) system that produces text, commonly in response to textual inputs from human users. The article focuses specifically on the risk of LLMs causing an extreme catastrophe in which they do something akin to taking over the world and killing everyone. The possibility of LLM takeover catastrophe has been a major point of public discussion since the recent release of remarkably capable LLMs such as ChatGPT and GPT-4. This arguably marks the first time that actual AI systems, rather than hypothetical future systems, have sparked concern about takeover catastrophe. The article’s analysis compares (A) characteristics of AI systems that may be needed for takeover, as identified in prior theoretical literature on AI takeover risk, with (B) characteristics observed in current LLMs. This comparison reveals that the capabilities of current LLMs appear to fall well short of what may be needed for takeover catastrophe. Future LLMs may be similarly incapable due to fundamental limitations of deep learning algorithms. However, divided expert opinion on deep learning and the surprising capabilities found in current LLMs suggest some risk of takeover catastrophe from future LLMs. LLM governance should monitor for changes in takeover characteristics and be prepared to proceed more aggressively if warning signs emerge. Unless and until such signs emerge, more aggressive governance measures may be unwarranted.

KEYWORDS: Artificial Intelligence, Large Language Models, Catastrophic Risk

1. Introduction

Throughout the history of artificial intelligence (AI), there has been concern about the possibility that someday, one or more advanced AI systems may conquer their human creators, with potentially catastrophic results. The basic idea is that the AI system(s) would become more intelligent than humans, enabling them to outsmart humanity, seize control of the planet, and then cause catastrophe in the pursuit of some flawed set of goals, potentially even resulting in human extinction (Good, 1965; Vinge, 1993; Bostrom, 2014; Russell, 2019). This article will refer to this type of event as an AI takeover catastrophe.

Prior research on AI takeover catastrophe has been largely theoretical, focused on hypothetical future AI systems. This includes general discussions of the topic (Bostrom, 2014; Russell, 2019) and risk analyses (Barrett and Baum, 2017; Sotala, 2018). Some recent studies analyze the risk of takeover catastrophe if future advanced AI systems resemble current state-of-the-art systems (Carlsmith, 2023; Ngo et al., 2023); this work is of a more empirical character, based in part on observations of actual AI systems. However, all of these studies are future-oriented. They are premised on the idea that some early attention is warranted, given the paramount importance an AI takeover catastrophe would have if such an event were ever to occur. This work falls broadly within the scope of anticipatory governance (Guston, 2014).

Now, for arguably the first time, there are actual AI systems raising significant concerns about takeover: large language models (LLMs), a form of “generative” AI that generates text in response to user queries. Recent LLMs have shown remarkable capabilities across topics spanning perhaps the entire breadth of contemporary human discourse. Of course, existing LLMs have not yet caused a takeover catastrophe, but perhaps they still could, or perhaps takeover might come from future LLMs. Some have expressed concern about LLM takeover catastrophe (Leahy, 2021; FLI, 2023; Yudkowsky, 2023), whereas others have criticized that concern (Gebru et al., 2023; Marcus, 2023; Kambhampati, 2024). The matter has attracted public and policy interest, even appearing in a White House press conference (White House, 2023). Additionally, a document prepared for the United Kingdom AI Safety Summit considered current LLMs as a possible precursor to future AI systems that could cause takeover catastrophe (DSIT, 2023a). However, LLM takeover risk has not yet been analyzed in detail; that is the purpose of this article.

snip