Episode 312: Could an international agreement protect us from dangerous AI? (with Malo Bourgon)

May 23, 2026

What are the world’s leading AI companies actually trying to build when they talk about superintelligence? Is the goal merely better chatbots, or systems that could outperform all humans across every cognitive task? Why is such a system so alluring, with its potential to accelerate medicine, science, education, abundance, and human flourishing? Why would it also create an unprecedented concentration of power for whoever controlled it? If intelligence includes not only abstract reasoning but persuasion, strategy, manipulation, planning, and technological invention, what happens when those capacities are automated at superhuman scale? How seriously should we take AI CEOs when they say the technology could go catastrophically wrong, and how should we interpret the tension between their public concern and their continued participation in the race? If we cannot reliably inspect an AI system’s goals, motives, reasoning, or learned objectives, how could we know whether apparent obedience is real safety or just surface behavior? Even if alignment were solved, who should be trusted to steer a superintelligence? Could compute governance, chip tracking, training thresholds, inspections, and a US-China agreement buy time before the frontier moves further? What do nuclear weapons, nuclear power, chemical weapons, and germline engineering teach us about the possibility and limits of technological restraint? Is resignation itself part of the danger, and could a credible movement for coordination make a saner future more possible?

Links:

MIRI

Malo Bourgon leads the Machine Intelligence Research Institute. Before becoming CEO, Malo served as a program management analyst and then as COO, helping implement many of MIRI’s current systems, processes, and program activities. Malo joined MIRI in 2012 shortly after completing a master’s degree in engineering at the University of Guelph.

SPENCER: Malo, welcome to the Clearer Thinking Podcast.

MALO: Hey, man, thanks for having me.

SPENCER: What is it that the biggest AI companies claim they're trying to build?

MALO: I think most of them are on record saying that they're trying to build superintelligence. We could talk about what that means. Some of them don't use that word specifically, but many of them do, and I think that's the explicit goal. The thing they're doing now, chatbots, is just a stepping stone to that ultimate goal.

SPENCER: Yeah. So what do you mean by superintelligence?

MALO: It's one of these things where a bunch of people want a crisp definition, and I don't think there's one standard.

SPENCER: What do they mean? What are they trying to build?

MALO: Yeah. The thing I hear when they talk about it, which is similar to a definition I have, is something like an AI system that would be at least better at every cognitive task than any human, and likely better than every human combined. When I say that, I think people sometimes picture, I don't know, maybe Einstein, but for everything, and I think that still undersells the gap. What they're talking about, and certainly what I'm thinking about, is not Einstein relative to the average human. Across everything, I'm thinking more of the average human relative to a mouse or an ant, in terms of how intelligent and how capable they are.

SPENCER: Now, why would they try to build that? Because you might just think, "Oh, well, if you want to make a trillion dollars, you don't actually need something that's so intelligent that it makes every human seem like a mouse in every cognitive domain."

MALO: There are a bunch of maybe Machiavellian motivations you might have, but I think the allure is that it would be great if at some point we built a superintelligence that we could actually properly steer to positive ends. The reason I think that would be useful is that when I look at the arc of human history, it seems like almost all of the greatest positive changes in human welfare have been the product of us applying our intelligence to developing new technologies and solving problems. If we could create an AI system that was radically smarter than us, it could help with a bunch of things we would love help with. It could usher in a scale of abundance and flourishing that I think would not be possible otherwise. So I think that's the promise, if we could build a superintelligence and steer and control it in the right way.

SPENCER: It's obviously hard to know what's going on in people's minds, but do you think that that's what's motivating the CEOs of companies like Anthropic and OpenAI and Google's AI division?

MALO: I try pretty hard not to pretend I know what's going on inside people's heads. I think it's worth noting that many of them wrote about this stuff long before they were billionaires running some of the biggest, most powerful companies in the world trying to do this. At least on record, they are to some extent saying, "Yeah, we think that building this would be enormously useful and good for humanity." That's one of the motivations. But again, I don't know what's in their hearts. There are certainly a bunch of cynical perspectives you could have about why you'd want such a powerful technology and potentially want to be the one controlling it.

SPENCER: Without accusing anyone of anything, what would be the cynical explanation?

MALO: I think that if you could build a superintelligence and be the one who retained the ability to control it and steer it, there's a whole kind of side thing of, does it even make sense to say that you can control a superintelligence? But setting that aside, certainly it would make you enormously powerful. There's a certain concentration of power concern separate from even the existential risk concerns that I spend a lot of my time thinking about. To be the one wielding a superintelligence would certainly allow you to have enormous power in the world. I think that's probably something that could be attractive to certain people. I have no idea if that's something that's attractive to people building the AI companies. I could also imagine it being very attractive to states who would want to be able to leverage that kind of power.

SPENCER: Do you think that if it was actually achieved, it would give someone literally the power to control the world? Or do you think that's a crazy exaggeration?

MALO: No. I think there's a lot of, what are the details? What exactly are you talking about? But I do think that in a world in which someone had something like the superintelligence we're talking about, one that was as smart and capable relative to us as we are relative to ants, it would allow them to develop technologies we would have a hard time imagining now and to do a bunch of things at scale that are currently just not possible. In principle, that would probably give them a decisive advantage over all other actors in the world.

SPENCER: It makes me think about when people with greater weapons technology have encountered people without that technology. It's not even a fair fight. It's kind of ridiculous. If one group has guns and the other has swords, the group with swords is always going to lose. Given the advantage that superior technology gives you, if you actually had what one CEO called a country of geniuses in a data center, or something way more powerful than that, the possibility that it could create technology so next-level that you're on a whole other playing field seems at least plausible.

MALO: Yeah. So, I think there's a question of what technologies you could have and how you could leverage them. I do like the analogy. There are historical examples from the era when Europeans were colonizing the rest of the world, where they would show up with guns among people who had never seen guns before. You could imagine being part of one of those native tribes and thinking, "Well, that boat, we don't know how to build something that big, but surely there can't be that many people on that boat; we could take them." Then they show up and point some weird stick at you, and it makes a sound, and all of a sudden, you fall over dead with a hole in you, and you're like, "What the hell was that?" It kind of doesn't matter how many spears you have; from a distance, people could just shoot you. So there's something like, what could you do with very powerful technologies? We can get into speculating about what those would look like. I think it's also the case that a superintelligence would be good at things like strategy, persuasion, and manipulation: thinking about how you would shape the world, and how you would manipulate and work with other actors in the world to get them to do the things you wanted them to do. I don't think it all has to be framed as how you would leverage powerful technologies to dominate people. It's also how you could just steer the world if you were that much smarter and that much more capable, with the ability to process more information more quickly than anyone else, that kind of stuff.

SPENCER: Yeah. Billionaires who want to shape the world already have a decent ability to do so in some ways. A billionaire can hire people to do things, but they don't have a data center full of a million geniuses who will do exactly what they're told for as long as they're told. Or a million super geniuses, much smarter than any human who's ever lived in every cognitive domain. It just feels like a totally different thing.

MALO: Yeah, it feels like a totally different thing to me. One way I often think about it is that people's, companies', and countries' ability to do things is often bottlenecked on human capacity to do a certain type of work, to execute a certain type of task, to do things in the world. If you did not have that bottleneck, and the thing doing that work was enormously more intelligent and capable, there's just a lot you could get up to in the world that is simply not possible right now. I have a very deep respect for the power of intelligence and how much it can accomplish. It's really hard to imagine that we're close to some peak of what we're capable of in terms of understanding the physical world, understanding how we can manipulate it, and getting a better grasp on how social structures work and how to influence them in a variety of ways. Now imagine doing that at scale, without the limitations humans have in processing information and working memory, and parallelizing it across a country of geniuses, or super geniuses, or superintelligences in a data center.

SPENCER: This actually raises one objection that people sometimes have, which is that they say, "Well, if you look at the world today, it's not like the smartest people run things." Yes, smart people might have some more ability to influence the world, but often, who's in power are people that are persuasive or people that are effective in certain ways, or good at getting attention, or all kinds of things. So why should we expect that intelligence really gets you that far, rather than all the other things that seem to matter in the world?

MALO: One of my answers to that is that different people mean different things by "intelligence." Some people mean the book-smart thing. Often, when I'm talking about this, I like to set that word aside. When I'm discussing what AGI or superintelligence is, I go back to what the goal of the field of AI has been from the beginning: to get computers to do the thinking that humans do. We use "intelligence" or "smart" to compress that concept. I think being persuasive is another capability or skill that I would bundle in, in some sense. You can get better at it. Whether you want to call that intelligence or something else, there are different categories, but I don't see a reason to believe that a superintelligent AI system couldn't also be super persuasive or capable in other ways. It's convenient to bundle all that into one term, and oftentimes, as a shorthand, I'll use "intelligence," but I like to make that separation. Sure, maybe in humans, for some reason, being very charismatic is sometimes uncorrelated with being particularly good at solving novel math proofs, developing new technologies, or being an engineer. That certainly doesn't seem to me like it has to be the case for any system that is capable of accomplishing goals in the world.

SPENCER: This might be a case where human intelligence works a certain way, and so our instincts around intelligence are very much mapped to human intelligence. That's what we have experience with. But AI intelligence can just be very different. A good example of this is IQ tests. With IQ tests, you find this peculiar phenomenon where you can give people math problems or spelling problems or memory problems, and you find a substantial correlation between them. People who are good at some of these thinking tasks tend, on average, to be better at the others. That's not a perfect correlation, but those people also don't tend to be the best at marketing jobs, for example. Maybe there are some gains from having a higher IQ when it comes to marketing, but it doesn't seem like the main thing. Many incredible marketers are just very persuasive. And people who are charismatic are not necessarily the super-high-IQ people; they just have different skills. But if you're training an AI system to be good at any kind of task you can provide training data for, suddenly solving IQ problems looks a lot like solving persuasion, because both come down to general-purpose learning algorithms and lots of training data.

MALO: I think there's a decomposition here, where some people might be more emotionally intelligent in some way. There's a definition of that term that points at being charismatic, persuasive, and empathetic in a way that helps you work with people, which is different from what people think of as raw intelligence. I do notice a trend that, for a certain type of high-g person, certain things might come easier than others. To the extent that they decide to try to become better at something, they might not be naturals at everything, but they can apply that g to get better at it. I've seen people who were awkward become more charismatic by putting attention toward being better interpersonally, even when it doesn't come naturally to them. So there's how human minds are constructed and what the trade-offs are there, and that doesn't have to hold for AI systems. We can train them in different ways to have a variety of skills that are all correlated with this, and separately, to the extent that you have some kind of general capability, you can apply it to become better at a variety of things, even if you didn't start with a strong ability there.

SPENCER: If we look at AI systems from maybe six years ago, it looked a lot like it was very difficult to get a system to be good at lots of things. You could train these narrow models, and they got really good at one task, and that was really cool. Then suddenly we figured out, "Wait, there are actually ways to train them to be good at lots of things simultaneously." Now you've got all these systems that can make images for you, they can write poetry, they can fill out forms, they can do legal work. It's almost incredible, all the different things they can do. Now it starts to look more like we can think of intelligence as this kind of general-purpose phenomenon, rather than this narrow phenomenon.

MALO: This makes me want to go back and say that the term intelligence, and the term artificial intelligence, have been on a journey. When the field was founded, the goal was to make computers do the thinking that we were doing. They were very much thinking about creating something like what we call AGI today, and they were overly ambitious, or at least overly optimistic, about how easy that would be. At Dartmouth, they thought a few grad students working for a summer could probably make a lot of progress on it. It turned out that 70 years later, the field was still struggling, and in the meantime, it did a bunch of work that automated particular cognitive or physical tasks, and that's what AI came to mean. In some sense, I think the field was always, at least at the beginning, pointed at this general property, and people who still cared about it wanted a term to refer to it. That's how we got artificial general intelligence as a more specific term for this general thing we see in humans: they can learn something in one domain and apply it to a different domain. They're not just good at one particular thing. They can be good at a variety of things, and they can improve at those things simultaneously, etc. We're now seeing that general property in LLMs trained through pre-training, which basically involves having them predict the next word across all the written material they can find. Now we have multimodal models that do that for images and video. In some sense, you have to be very generally capable to do that. They're not just memorizing the entire Internet. They have to build some sort of internal structure that helps them reason, think, or predict much higher-level patterns that help them understand a broad set of phenomena, and that generalizes. It doesn't seem surprising to me that a system trained in that way would have this general property in a way that other systems didn't.

SPENCER: I think a really interesting example of this is a recent paper that looked at whether AI systems model emotions internally, and the fascinating result was that they can actually find sections of these systems. It's complicated how they measure them, but they can find sections of the systems that seem to be modeling many different types of emotion. Not only that, but those emotions change when you'd expect them to. If the model is reading a vignette about someone getting angry, the activation in this anger system goes off, whereas if it reads about something disgusting, the disgust system goes off. They can even intervene on the AI system in those regions, and the AI's behavior changes accordingly. What's really fascinating about that is that nobody tried to give these systems emotions. It just turns out that if you're trying to write text, if someone says, "Write me a poem about this," or "Tell me the rest of the story," you have to model emotions to know what to do. If the character in the story is angry, you have to somehow know that. In some sense, there has to be some part of the neural net modeling anger. It's interesting that we can find that part. But you get all of this stuff that seems like intelligence, in this case a certain understanding of emotions, and maybe the beginning of emotional intelligence, kind of for free, just by trying to predict what comes next in text.

MALO: Yeah. And I think we see that across a variety of things. We saw from early LLMs that if you gave them a Python program and asked what it would output, they would be reasonably good at, in some sense, being a bad Python interpreter. I think that's also because when you're looking at a bunch of Python programs in your training corpus and trying to predict what comes next, where some of that corpus includes the programs' output, you learn some sort of mini code-interpreter abstraction that helps you predict what programs of a given shape output. Similarly, it makes sense that to be good at generating a bunch of text with emotional flavor and character, there's something you have to model internally about the type of thing that would output that type of text. We don't know how to look inside those systems and understand the shapes of those things, how they do it, and how much they look like emotions or not. But we certainly see correlations where parts of the network light up when we put in text containing things we would associate with anger. That just seems like they're building a bunch of general societal understanding or abstractions that help them predict what will come next. It seems natural that emotion, or something we can map to emotion, would arise in those systems in some way if they were good at this task.

SPENCER: Let's go back to talking about the top AI companies for a second. Sometimes they're accused of essentially lying about their belief that these things could become superintelligent. People say, "Oh, that's just a marketing gimmick; they're trying to make you scared of the technology," because that implies the technology is so powerful that investors should put hundreds of billions of dollars into it. What do you think of that claim?

MALO: I don't find that argument very compelling. That claim is often paired with another one: that when they express concerns about catastrophic or extinction risk, they're just saying that to generate hype, to make their technology sound more powerful and more impressive. I just don't know. Is there another example in history where someone had a powerful technology and, instead of concentrating on the things people would actually want, even if they were potentially dangerous, concentrated at least part of what they were saying on how it could be dangerous and even cause human extinction? I just don't think this kind of argument holds up. If you were trying to get people excited to invest in your technology, which things would you talk about? You could talk about how it would be extremely powerful, how it could confer powerful capabilities on governments, how it could generate a bunch of money, how it could do positive- and negative-sounding stuff. But this idea that it's all hype for hype's sake to generate investment seems a little silly to me. I would also point out that basically every single one of the people running these companies was talking about this stuff and taking it seriously before they were in the position they're in now. It's not just something they're saying now to generate hype; it's something they believed before they were in their current positions.

SPENCER: It also seems like a way to freak out regulators, which is the opposite of what you want to do if you're trying to build a business. You don't want to get regulators' attention by saying, "Oh, my technology might end the world." That's a very strange strategy from that perspective.

MALO: It certainly seems like many of the companies are trying very hard to avoid being regulated at the moment. Sometimes they say things that make it sound like they do want a bunch of regulation, but if you look at the actions of their government affairs team and the lobbying they do, they certainly seem like they're trying very hard not to. If you were just focused on making a bunch of money and having the government get out of your way, I don't think you would be talking about all these big risks and acknowledging the possibility of extinction, especially if this was all just a play to become rich and powerful.

SPENCER: What do we know about what these CEOs actually think or say about why it's dangerous? When they explain themselves, what do they say about that?

MALO: Yeah, it is a difficult question since most of them don't say very precise things.

SPENCER: What gives you the impression that these AI CEOs actually are concerned or taking this seriously?

MALO: Yeah. So I think it's important to be cynical and not just listen to what all these AI company CEOs say and believe every word. That said, I think throwing it all out, or taking the maximally cynical interpretation, is also the wrong move. One thing I would point to is literally the things they're saying. As we've discussed, before becoming CEOs of these companies, they wrote about the concerns around building very powerful AI systems. Sam Altman has said that he thinks that if it goes wrong, it could be "lights out" for everyone. Dario Amodei has said similar things in the past, and even now that they lead these companies, it's not like they never talk about that stuff anymore. I think just this past November, Dario was interviewed by Axios and asked what he thought the risk was here. He said he thought there was a 25% chance that this could go really, really badly, including human extinction. They've also been asked whether they think things are going too fast and are risky in a variety of ways. This happened at Davos with Dario from Anthropic and Demis Hassabis from Google DeepMind. Demis said that he wishes he could proceed more slowly, that maybe even a pause would be justified because there are important technical safety problems that haven't been resolved yet, but that he can't, because his stopping on his own wouldn't help. His competitors would just race forward. In the same interview, Dario was asked a similar question, and he said, "I think Demis and I could probably agree to proceed carefully, but China is not going to stop." Elon Musk has said that he has AI nightmares and thinks we should be slowing down AI and robotics if we could, but we can't. When people running these companies say, "I wish I could go slower. I wish I could be more careful," and talk about double-digit probabilities of these systems posing catastrophic or extinction risks, this should be treated as real evidence that they believe these things, and that this is a consistent view they held before they were in these positions and continue to hold. Sometimes now, some of them talk about the bigger risks less often, and in a more watered-down way. I think that's more evidence of the pressure they're under and the economic incentives they're facing. But they're saying the same things a lot of the experts, like Yoshua Bengio and Geoffrey Hinton, have said. There's the CAIS statement on AI risk, which hundreds of scientists, including them, have signed. So I believe they think so. A lot of the cynical analysis of why they'd be saying this for other reasons just doesn't pencil out.

SPENCER: Before we go further into the risk side, let's talk about a counterargument. Some people say, "Look, if this technology is not that powerful, it doesn't matter either way. It's just not going to end the world. If it is that powerful, then if we were to pause it or curtail it significantly, wouldn't we be missing out on an incredible humanitarian benefit?" Maybe a superintelligent doctor in everyone's pocket, or a superintelligent scientist that cures most or all human disease. You can go through every field and ask, "What would that look like?" A superintelligent tutor for every child, teaching them exactly the things that are most useful for them to know in the most effective way possible, etc. So how big is the cost of pausing or curtailing it?

MALO: It definitely is a thing that weighs on me. I think one thing that sometimes gets lost in this discourse is that people throw around the term "doomer." Doomer can mean everyone from people who think AI is bad, that we shouldn't have any AI in anything, and that even self-driving cars are bad, all the way through to folks like me at MIRI, who think that if we could get it right, if we could figure out how to build superintelligences that we could steer and that we weren't worried about these extinction risks, this would be the single greatest thing humanity could do for our future flourishing. I want that future. I want a future where we can leverage the power of intelligence and automate it to do all these great things. I just think that to get there, we need a certain amount of caution and wisdom that we don't have right now, and in this current race we're not on track to build those superintelligent systems in a way that lets us get those benefits without an unacceptably high probability that it will go very badly and likely result in extinction. Also, I don't think we should just stop all AI or something. There's a complicated question of what you would do. How would you stop pushing the frontier? Which things would be allowed? Which things wouldn't? But I'm not saying that today's chatbots are existentially dangerous, or the self-driving cars we're working on, or the AlphaFold systems. I think there's an enormous amount we could do and continue to do with AI that would help us realize a bunch of these benefits. But there's a particular category, racing toward generally capable, superintelligent AI systems, that we need to pump the brakes on, and that will mean forgoing some benefits before we get to the really dangerous stuff I'm worried about. Part of the challenge is that it's hard to tell where that threshold is, or when we get into the really scary danger zone.
I kind of wish I believed I lived in a world where the problem wasn't so complicated and challenging. I want all those benefits, but from where I'm sitting, the risk of pushing forward the way we are now is that none of us get any of them, because there's a high chance none of us will be around to appreciate them, since we're not on track to build superintelligences in a way that lets us capture those benefits. I often say, "I hope I'm wrong, but I fear I'm right," and I feel that a lot when I'm talking to policymakers and discussing how I think we potentially need international agreements to basically pump the brakes and pause building ever more capable general AI systems. I'm asking for some pretty big stuff with some pretty large, complicated consequences, but I just think that's the most reasonable path. I wish I lived in a different world.

SPENCER: What do you find most personally convincing that this technology is not safe enough to build in the near future?

MALO: What do I think is the most convincing argument for why it's too dangerous?

SPENCER: Yeah.

MALO: I think the core, fundamental thing is how little understanding we have of how these AI systems function. Right now, we're already seeing signs of the challenges that presents, but because the AI systems aren't that powerful, those concerns aren't that big of a deal. When we're actually talking about superintelligent AI systems, we need a much richer, deeper grasp than we currently have of how the system is reasoning and working, and of which goals, desires, or incentives it has internalized through training, and in what ways. Without that, I think a lot of the ways we try to make these AI systems safer look more like papering over the behaviors and problems we know how to see, with no real ability to look inside the model and tell whether we've fundamentally addressed the core concerns. When you're potentially dealing with a superintelligence, all of those challenges magnify to the point where it seems like a crazy gamble to say, "Wow, this thing seems like it's behaving pretty nicely. Let's just roll the dice and hope we're right." One of the core challenges is that we're growing these systems, not building or crafting them in a way we understand. If we could do that, there would still be a bunch of problems and challenges to deal with, but it would certainly make everything a lot easier.

SPENCER: My understanding is that your view is that if we build these systems and get them sufficiently intelligent, it's almost certain it's going to end really badly for us. Is that right? Or do you think it's more uncertain?

MALO: What does almost certain mean? I think it's difficult to put a probability on these things. There's this whole notion of p(doom), and people have different numbers. Some people are like, "Oh, my p(doom) is 10%," and some people are like, "It's 90% plus," which kind of seems almost certain, modulated by my ability to be calibrated at that level of confidence. I don't spend an enormous amount of time nailing down my exact probability there. I'm in the regime where the risks seem deep enough, the fundamental technical challenges seem hard enough, and the probability seems high enough that it seems unacceptable. Some days, if you ask me, I'll say, "Oh man, I don't know, maybe 75% chance, maybe more." Some days I might feel more optimistic and say, "I don't know, maybe a 25% chance that it goes really poorly, but more than likely, we can figure something out here." I think all of those are, in some sense, unacceptably high. If you were loading me onto an airplane and said, "Oh, there's only a 25% chance that this plane crashes, 75% chance that you get there okay," I would be concerned. For many things, I think humans should be allowed to make decisions and take risks into their own hands. But when we're talking about all of human civilization, I think anything approaching double-digit probabilities is just unacceptable. I don't spend an enormous amount of time trying to get very precise numbers there; the general space seems so fraught and reckless that the chances of this going poorly are just unacceptably high.

SPENCER: One thing that worries me is that even if we can control these systems, I still don't feel very good about them, because I think they would give so much power to whoever had control of them, and that itself is extremely worrisome. So there's that first question: can we control them? And then the second question is, "Okay, well, who's in control of them?" That's also really worrisome. So even a good answer to the control question doesn't give me a lot of comfort.

MALO: I feel there's a hierarchy of challenges. Can you train an AI system to want the things you want it to want, or to have the objectives that you want it to have? And can you verify that? Let's say that you can do that. Then there's the question of, "Do you understand or even know what objectives or values you would want to train into such a system?" I think the concept of controlling a superintelligent system kind of doesn't make sense. The goal you would want to have is that, in some sense, you could instill it with a set of values or goals that are compatible with human flourishing. Let's say that you could, in some sense, do that. Even if a person or a company builds a superintelligence and succeeds at this, unless they do something with that superintelligence that prevents others from building one that isn't aligned in that way, you still have a problem where someone else could do something catastrophic in the world. To the extent that you can build a superintelligence and have some ability to steer and control it, you have to deal with concentration-of-power risks. Who is deciding what to do with superintelligence? Do we trust companies, governments, or various individuals to have a large amount of power or control over that? Do we even understand how we would be able to govern that type of technology more democratically? There's a whole hierarchy of risks here. I agree that even if we solve the safety problems, there's a long list of other things we need to figure out how to address for the future to go well.

SPENCER: So what in your view needs to happen?

MALO: At a high level, I think this race to build superintelligence is reckless and unacceptably risky. We have to find some way to stop that race and prevent people from developing powerful superintelligences until we have a radically better understanding of how we could even do that safely in principle, let alone in practice. That's not something an individual company can do. We talked about how many of the leaders of the companies are saying they wish they could proceed more slowly, but there are all these incentives that cause them to feel they can't. Unilateral stopping doesn't help much. If Anthropic or DeepMind were to say, "We're just going to go slower," or "We're going to hit the brakes," or "We're going to stop because we wish we could take more time to address the risk," that doesn't help much. That just means the next company in line, which is less cautious, will be the one in the lead and steering where the future goes with the development of the technology. If the US decided to unilaterally stop, we would still have the problem of all the other countries in the world. In some sense, there is a big coordination problem here. My sense is that the only really good shot we have of getting through this is some sort of internationally coordinated agreement to prevent the creation of superintelligence until we have a much deeper, radically better understanding of the technology, know how to build AI systems that we could deeply understand, and have solved a lot of the other problems we were talking about, such that we could actually get the good future with superintelligence.

SPENCER: What would it look like to actually create that coordination?

MALO: I can talk about where that would all be headed in terms of what an international agreement would look like, and the path to get there. A lot of that is difficult to predict, but maybe I can start with a little bit of what the big picture would look like in terms of what that coordination might entail. I think the most promising approach would be something like the US and China entering into an agreement where they both agree to stop a certain set of research and AI training for more capable, generally intelligent AI systems, and to work with their allies and their spheres of influence to build an increasingly large global coalition of people who are doing that.

SPENCER: Just to be clear, this wouldn't stop the use of current AIs or existing models. It would be about training bigger and bigger models.

MALO: That's right. One of the big levers there, and this will get harder over time, is governance of compute. Training AI systems today requires massive training runs that take months on huge data centers consuming the power of a small city. The chips in those data centers are designed by three companies, and 90% of them are manufactured by one company, TSMC, in Taiwan. The machines to make those chips are made by one company in the Netherlands. In that sense, the supply chain for creating powerful AI systems is very narrow. If there were actual political will from the two major powers in the world to try and stop this, they could agree to consolidate the compute they have now. There are a lot of data centers, but not so many that the two most powerful countries couldn't work out where they all are. That means tracking the chips and where they are, and consolidating them into data centers that both parties track and monitor. Then there is a verification and monitoring regime with rules for how those chips can be used. At MIRI, we've taken a first stab at sketching out what that agreement would look like. We have a paper out called something like "International Agreement to Prevent the Premature Creation of Superintelligence." That agreement starts with a bilateral agreement between the US and China to track down all AI chips and put them under monitoring, either by retrofitting existing data centers or by moving the chips into monitored data centers, and to impose a restriction so that no one can do any training over a certain threshold, which in the agreement is 10^24 FLOP. That was picked somewhat as a conservative bound, since we already do training runs above that, but given the march of algorithmic efficiency, it seemed like a good lower bound to prevent people from pushing the frontier of more generally capable systems. That's a good first pass. There is a second range, from 10^22 to 10^24 FLOP, that would be monitored.
If you were training AI systems at that scale, there would be various monitoring to ensure that you weren't doing certain types of prohibited training. We can talk about how you might verify and enforce that. Generally, there would be a chip-tracking effort requiring any collection of AI chips (in the agreement, something like 16 H100 equivalents or more connected with fast interconnect) to be registered so it can be part of this monitoring regime. This is trying to govern the primary resource that goes into creating these powerful AI systems, and to have the visibility and the ability to intervene and prevent people from doing the kind of frontier-pushing training we don't want them to do.
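The compute thresholds described here amount to a simple tiering rule. As a rough illustration only, the following minimal Python sketch sorts a training run into the tiers mentioned (above 10^24 FLOP prohibited, 10^22 to 10^24 FLOP monitored) and checks the 16-H100-equivalent registration trigger. The function names, the handling of values exactly at a boundary, and the common rule of thumb that dense-transformer training costs about 6 FLOP per parameter per training token are assumptions added for this example, not details taken from MIRI's draft agreement.

```python
"""Hypothetical sketch of the compute tiers discussed in the episode.

Thresholds follow the figures quoted in the conversation. Function names,
boundary handling, and the 6*N*D compute estimate are illustrative
assumptions, not the draft agreement's actual accounting rules.
"""

PROHIBITED_FLOP = 1e24    # training above this would be barred
MONITORED_FLOP = 1e22     # runs from here up to 1e24 would be monitored
REGISTRATION_CHIPS = 16   # H100-equivalents on fast interconnect


def estimate_training_flop(params: float, tokens: float) -> float:
    """Assumed rule of thumb: ~6 FLOP per parameter per training token."""
    return 6.0 * params * tokens


def classify_run(flop: float) -> str:
    """Map total training compute to a hypothetical regulatory tier."""
    if flop > PROHIBITED_FLOP:
        return "prohibited"
    if flop >= MONITORED_FLOP:
        return "monitored"
    return "unrestricted"


def must_register(h100_equivalents: float, fast_interconnect: bool) -> bool:
    """Clusters of 16+ H100-equivalents with fast interconnect register."""
    return fast_interconnect and h100_equivalents >= REGISTRATION_CHIPS


if __name__ == "__main__":
    # Hypothetical run: 7e9 parameters trained on 2e12 tokens.
    flop = estimate_training_flop(7e9, 2e12)
    print(f"{flop:.1e}", classify_run(flop))  # -> 8.4e+22 monitored
    print(must_register(16, True), must_register(8, True))  # -> True False
```

Under these assumptions, a 7-billion-parameter model trained on 2 trillion tokens comes to roughly 8.4 × 10^22 FLOP and lands in the monitored band rather than the prohibited one.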

SPENCER: Do you think that for a treaty to be successful, it would have to be at the country level, rather than, say, the top four big AI companies all coming together and reaching an agreement amongst themselves?

MALO: I think if the top AI companies, and not just American companies but also Chinese companies, could coordinate in that way, that would be helpful. I do think that poses challenges, as there are a bunch of jurisdictions where people wouldn't necessarily agree. My answer is that it would be helpful. But right now, we're seeing a lot of signs from the companies that they don't think they could make that workable on their own. There are also a bunch of challenges here, where that kind of coordination is considered anti-competitive and illegal, so the government would have to do a bunch of things anyway to allow what would currently be seen as collusion between AI companies to make these kinds of agreements. And I do think it has to be international to be durable, as the relevant actors here are states, not companies. I have a broader view that there's a fundamental challenge that humanity needs to solve. As we move towards technological maturity, we're going to develop more and more technologies that allow smaller and smaller sets of actors to have very large impacts on the world, both positive and negative. Voluntary coordination between individual actors, or markets, isn't up to the task of that type of governance. In some sense, solving these coordination problems is what governments are designed to do. They are the primary actors that need to get together to make these agreements and then enforce them in their jurisdictions.

SPENCER: I know that companies agreeing on pricing is illegal, but is it actually illegal for companies to agree to not develop technology or to slow down a technology?

MALO: That's my understanding. I've certainly chatted with folks at the Frontier Model Forum, which is one of these organizations that tries to get companies together to hold conversations and do coordination. It seems there are a bunch of really annoying fundamental limitations on how they can collaborate on a variety of things that you would think are obviously good ideas. For reasons related to the way the laws are written, it can be considered anti-competitive, as they could be conspiring to make it the case that consumers don't get benefits they otherwise would get because they're limiting the development of a certain technology.

SPENCER: What about AI companies working together? That's another thing that some people have proposed. Imagine they thought they were close to building this superintelligence. The AI companies could say, "Okay, we're going to collaborate. We're going to make a group project," rather than each trying to compete with each other separately. Maybe that would allow it to be slowed down and done more carefully, and everyone gets input, et cetera.

MALO: This is another case where, if AI companies did that, it would seem better than them not doing it. But I think the stakes are high enough that we should not just rely on the goodwill of private companies and private actors to cause the right kind of coordination to happen at the right times, and hope that no other actors step in who don't want to be part of that coordination. One way of framing this is that there's a near-term question of what the current players could agree to that would make me feel better. There's also the question of where this is all going and how we can build towards a world in which these problems are solved more robustly. A lot of the proposals that look more like private coordination seem useful and helpful to me, but they don't solve the fundamental challenge: superintelligence will be enormously powerful and potentially enormously dangerous to build, and individual sets of actors being responsible isn't enough. There's an analogy here: what if you applied the same reasoning to companies building nuclear weapons? I don't think most governments of the world would say, "Oh, well, we know the companies currently building them, and they seem responsible, and they say they're not going to use them in ways that threaten our democracies, and also that they won't accidentally explode and level a city where they're developing them. Hopefully, another company doesn't start building one in secret in a way that we can't detect." There are good things that would make for a better world if they were happening, but I don't think they fundamentally solve the problem.

SPENCER: How do you think about the fact that Trump is in power now, and he seems pretty anti-AI regulation, as far as I can tell?

MALO: I think that, in general, many leaders in the world haven't really grappled with the big challenges here. My guess is that Trump hasn't spent much time engaging with the arguments on one side or the other. Again, there are a bunch of disanalogies, but consider the nuclear case: a lot changed in the world when it became apparent to US presidents like Reagan, or leaders of the Soviet Union like Gorbachev, that nuclear weapons weren't just a powerful technology they could use to leverage power against their adversaries, but that either side having too much of this power posed a risk. If they had enough nukes, and the adversary had enough nukes, it was a threat to themselves. I just think that if Trump believed the risks I believe are present were real, he would have a very different perspective. Certainly now, given the technology's economic, strategic, and military benefits, the view that the US needs to dominate is a fairly reasonable one to have. We don't want to cede a bunch of economic and strategic advantage to our adversaries. But that view isn't really baking in the big risks. Maybe what your question is actually getting at is: the thing you want sounds great, but I don't see the political will, so how do you solve the political will question? It sure seems like there isn't the political will. I certainly don't have an answer for how we go from where we are today to broad consensus among world leaders that the risks are as big as I think they are.
But in my two and a bit years of doing the work MIRI does now, which is more comms- and governance-flavored, going to DC and talking to policymakers, I've been heartened and surprised by how they respond when they actually engage with the subject matter. At the start of this conversation, we talked about what the AI companies are actually doing. You explain what the current AI companies' plan is, what they see the risks as, what other experts see the risks as, and how they plan to make it safe. You lay out that they're trying to build superintelligence: "Here's what superintelligence is, here's a bunch of the risks, and here's what the companies themselves think the risks are." They acknowledge that there are a bunch of safety problems they haven't solved. Their current plan is to build AI systems that can automate AI research and development, and while they're all racing amongst themselves and with Chinese companies, they're going to try to use some of these very powerful AI systems to also solve all the problems they don't know how to solve, and do that before these risks manifest. A lot of policymakers just go, "Well, that sounds crazy." Yes, that is what we're talking about. This is the current situation. I don't know when there could be a sufficient amount of political will, and I certainly empathize with people who feel like it seems hopeless, or that this all just seems inevitable. Can the governments of the world rise to the occasion? Will they ever understand the risks if these risks are real? I certainly feel that. I often think back to the 50s, when I think a lot of smart people were sitting around going, "We're fucked."

SPENCER: [inaudible]

MALO: Yeah. They were looking back at the arc of human history, and they were like, we sure seem super into fighting wars. Then we got more powerful toys, and then we fought a World War, and then we were so horrified by that that we made the League of Nations, and we were like, "Never again." Then we immediately fought World War II, and now we have nuclear weapons. We're not going to be able to contain this. They're just going to spread. It's just a matter of time until there's a big nuclear exchange and we end up in some sort of nuclear apocalypse. That was a pretty reasonable perspective to have at that time. We got lucky along the way, but I will note that we're still here. More countries have nuclear weapons than I would feel comfortable with. But we do have a bunch of international agreements in place. We have the Treaty on the Non-Proliferation of Nuclear Weapons. We've done a bunch of work with adversaries throughout the last 60-plus years to negotiate de-escalation. There's a great story here involving a movie. I don't know if many people know this, but there was The Day After, a made-for-television movie that followed people in the United States through the aftermath of a nuclear strike, living in a nuclear wasteland. Reagan watched that movie and was depressed for two weeks, and it really brought home to him the risks nuclear weapons posed to the world. It's my understanding that this was counterfactually very important in him going to the table with Gorbachev and basically saying, "What are we doing here? We need to step back from the brink." I certainly don't think we're at that point today, and I don't know how to predict when we'll get there, but I think it's possible, and we can look back at history and see that it's possible if our leaders have a shared understanding that this poses a risk to all of us.
One thing I like to say is nobody wins in a race to be the first to lose control. If the probability is sufficiently high that racing towards superintelligence just means someone eventually succeeds, and we lose control of it, and it's bad for everybody, then this whole notion of winning the race kind of gets flipped on its head. Something very different could happen in how the world relates to this. There's a question of, "Can we get there before it's too late?"

SPENCER: Yeah, I think the point you made about Trump is an interesting one, because I would imagine that if Trump actually believed this was incredibly dangerous for the whole world, and he could step in and protect the world, I think he actually would do that.

MALO: Yeah.

SPENCER: I think if he felt like he would get a lot of accolades for it and would be protecting the world, it does seem like the kind of thing he would do.

MALO: Yeah, and I don't know, he seems like a guy who likes to make deals, and it would be a pretty epic deal if he actually believed in the risks here. He could say that he met with Xi and hammered out a substantial agreement with our biggest adversary to protect us all. It certainly seems like Trump can pivot on positions, and if he eventually ended up thinking these risks were serious, he could decide to take a bunch of decisive action and try to broker some deal here. Another thing I'll say is that there are a bunch of hard questions about what this agreement would look like and how it would come to be. These are all good, hard questions about practice. We've done a bunch of work. We've tried to design a first stab at what an agreement like this would look like. But MIRI's technical governance team is six people. This kind of thing isn't going to happen unless the experts, diplomats, and people in the world who actually understand how these things work much more deeply, and have experience doing these negotiations, are thinking about it. On the political will question: as AI systems get more capable, people often start to think about the risks too, and that will continue to happen as systems get more and more capable. To the extent that happens, I think the path here is that more people who actually understand how to do these types of agreements start having them in their shower thoughts and start taking these risks seriously. That's why we have our governance team at MIRI, and also our comms team, because we think the world won't be up for taking this challenge seriously unless enough people are thinking about it and applying their expertise to the problems. We're certainly not going to solve all the problems. I don't think MIRI knows how to one-shot a treaty. None of these things are one-shot in any way.
They're long, complicated processes that get negotiated, and there's a bunch of work to do, and we're not going to be able to do it all. We have to get to a point where enough of the relevant people and experts believe these risks are serious enough that they start trying to figure out what the actual solutions would look like: where we're right, where we're wrong about how an agreement like this might work, and what a first stab at implementing it, or the paths to it, would be. We've done some thinking about what those paths would be.

SPENCER: Another thing this makes me think about is that governments have slowed down technologies before. We talked about nuclear weapons, but there's also nuclear power. I think there's a pretty good argument that governments basically slowed it down to a crawl when it probably would have gone way, way faster if they hadn't done that.

MALO: I agree, and in that case, I actually think it's kind of a tragedy.

SPENCER: A tragic proof of concept.

MALO: That's right, that's right, a proof of concept for an unfortunate success, in the sense that I actually think it's a pretty clear case where there were risks, but some part of the narrative got so far away from us that we ended up forgoing a huge amount of upside, which was unjustified. But there are other examples, from nuclear weapons to chemical weapons. And another category of the world not pursuing a technology isn't even necessarily government intervention: there's basically a taboo around human germline genetic engineering, and I think this also speaks to a common worry. There's a certain set of people who hear about these international agreements and worry that what we're proposing is some sort of international police state. And there is a way of doing surveillance and concentrating power to try to prevent the creation of superintelligence that could look like that. Those are worlds I don't want to live in. But to the extent that people take a certain set of risks seriously and those risks are broadly understood, there's also an enormous amount of power in that understanding. I think germline engineering is an interesting case, because there are tens of thousands of people in the world who could single-handedly do that type of work. But the scientific community came together a while ago and said, "We think the risks here are pretty high, and this fundamentally messes with what it means to be human in a way that's screwed up to do unilaterally," and agreed not to do it. There is a big taboo around that work. It's not like it never happens. Sometimes someone gets caught doing it, but they immediately get ostracized, their funding gets pulled, and they don't get to keep doing that kind of work.
So I don't think it has to be the case that if you can't catch everybody and can't surveil everybody, the plan is doomed to failure. I think you can do the big things, like preventing the large training runs, and implement a bunch of other restrictions, while continuing to grow a common understanding that the risks here are too large. That understanding also gets enforced in a bunch of ways, through taboos and general understanding across society.

SPENCER: Could you lay out some of your thinking on how we go from what we have today to a treaty between the US and China?

MALO: Sure. I think it's often easier to build towards these sorts of things when there are pre-established mechanisms for having these types of conversations. Right now, there isn't a great Track I, government-to-government channel between the US and China on many things, including AI, and I think we would be in a better place if we had that and were working towards agreements and red lines far below what we're talking about in this international agreement, as a way to start building that process and those relationships. I think a bunch of issues could be promising for that type of work. One example that often comes to mind for me is the proliferation of powerful dual-use capabilities. Right now, we're in a world where most of the major US companies keep their models private, but most of the companies in China are much more aggressive about open sourcing. It's unclear to me how much of this is a top-down strategy on the part of the Chinese government, but if it is, it makes a lot of sense right now from a competitive perspective: if you're behind, it's advantageous for you if most of the world is nipping at the heels of your adversary, degrading their advantage. But I think China also very much values stability, and we may actually be starting to cross the threshold of building models with more concerning dual-use capabilities that would be destabilizing or have bigger national security implications. Maybe Mythos and the next generation of models will cross that threshold; Mythos, from Anthropic in particular, seems to be as good as or better than most of the world's experts at cyber vulnerability discovery and exploitation.
I think it's probably in the interest of both countries not to have those types of powerful dual-use capabilities proliferate broadly into the hands of incautious and malicious actors, but as long as one of them is releasing them and the other isn't, the one releasing them doesn't necessarily have an incentive to stop. I think it's a fairly easy thing to get together and say, "Can we both agree to be a little more rigorous about having some red lines and some domestic rules about when we allow those capabilities to proliferate broadly in the open versus not?" Another thing people often point to when talking about international agreements with countries like China is that it's very difficult to build trust. It's difficult to trust that they'll keep to things because we can't verify them, and so we can't trust the Chinese to hold up their end of the bargain. Well, this first step is fairly easy to verify. You just have to check whether the companies in both countries are actually keeping powerful dual-use capabilities out of the open, serving them behind APIs with the right kinds of classifiers and safeguards in place. That doesn't solve the big-picture problems I'm worried about, but it's a step towards having processes for making agreements about these sorts of things at all. So I think starting to build those kinds of processes, maybe with agreements like that, moves us towards that world. With that kind of process in place, to the extent there was increasing concern about the dual-use capabilities, let alone the loss-of-control concerns, there are paths that go from disclosures about AI systems and hardware to agreed pre-notifications about certain training runs, which start to build the foundation for these sorts of things.
I think there are a bunch of things we can be doing to get better at monitoring and accounting for chips and data centers, where they're located, all those sorts of things. So there's a progression you could go through, starting with small, common-sense moves that both countries would find in their own interests. Then, as AI systems get more capable, as the capabilities themselves seem more concerning, and as there's greater appreciation, as I expect there will be, that these loss-of-control risks may be real, we'll have more of the structures and relationships in place to be more ambitious and to consider the more hardcore measures, like rigorous consolidation of compute.

SPENCER: It seems like a nice feature of that approach is that it involves starting small and iterating, right? You find early wins and build on them: "Okay, that collaboration between countries worked well to prevent systems like Mythos from enabling insane hacks all the time. So now, what else?" And you go from there.

MALO: Yeah, and we already have some precedent for this. There have already been small agreements where the US and China have agreed not to incorporate AI systems into nuclear command and control, in terms of decision-making around firing nuclear weapons. There could be a bunch of agreements around no first use of AI systems in certain categories of conflict, that sort of thing. That said, I think we might be on a pretty accelerated timeframe here. I don't think we have 10 or 20 years to slowly build these things up. But I also don't think we need to do it all at once. I think it's unrealistic to expect that we go overnight from a world where no one is really taking this seriously to discussing the kind of international agreement MIRI is putting forward as an example. We have to take steps in that direction, find mutually beneficial targets and agreements, and establish red lines that both sides understand and think are in their own interests.

SPENCER: If the US, China, and maybe a bunch of other countries all agreed to some kind of ban on certain uses of chips for training past a certain level, how easy would it be for a country to cheat on that or acquire chips in a way that nobody could tell?

MALO: There are a few different ways to answer this question, and part of it depends on political will. To the extent that Xi Jinping and Donald Trump were worried about loss of control from superintelligence and wanted to do something today, there are certainly ways to stop a bunch of AI training and to have physical inspectors and data-center-level monitoring of what was going on with AI chips. They could agree to have mutually trusted third parties do inspections, which would be very low-tech. That would be more disruptive, since low-tech solutions don't allow for as many fine-grained distinctions about what is okay and what is not, because you don't have that kind of visibility. But there are a bunch of low-tech solutions that can be ported over from examples in the nuclear world or other types of agreements, and that would certainly be a start. The sweet spot, I think, is to work towards building verification mechanisms into the AI hardware itself, so you could demonstrate, in a way your adversary would find trustworthy, that those chips were being used for one category of things and not another. You might still need a neutral third party to visit the site to ensure that things were implemented the way you said they were. I think this is the world we want to go to for many reasons. I find it hard to imagine that we end up in a stable world unless we find some combination of legal, technical, and institutional mechanisms to impose restrictions on AI development, deployment, and diffusion. If people can build AI systems that are smarter than any human at developing cancer cures, those systems can also build dangerous bioweapons. The same applies to cybersecurity: systems that are great at patching vulnerabilities are also super hackers. If we let it rip and allow anyone to use those systems unrestricted, we'd want to do all the defensive things first. We'd want to try to patch all the software first.
We'd want to put screening on gene synthesis labs so they are less likely to synthesize dangerous sequences for people who want to do bad things. But the surface area of reality is very large, and in principle it's impossible to defend it all. We need to find a way to determine who is able to train these systems, who is training them, what rules we want them to abide by, and how we can check that only approved individuals are training certain types of systems. We need to ensure that unauthorized individuals don't have the compute to do so. A lot of the incentives for these verification mechanisms, for on-chip monitoring and data-center-level software monitoring, come from grappling with AI as a powerful dual-use capability. We are not moving nearly as fast on that as we need to be, even just selfishly, domestically. The incentives are there, and many of those things are the foundations needed to demonstrate that capability to other parties in an agreement.

SPENCER: Since the chips are made by a relatively small number of companies, would we essentially be able to see all the chips being produced to add this technology to them if there was the political will to do so?

MALO: Yeah. I think there are already prototypes of current chips that mostly have the hardware mechanisms in place, certainly for the location-verification type stuff that we'd minimally want, and for a lot of the workload verification we'd want to do. Most of the challenge is actually tamper resistance. Suppose you had a new firmware update with location verification that granted the chip a license that would stop working if the chip didn't connect to the internet and get that license refreshed every month or so, plus potentially a classifier that only allowed certain types of workloads to run on the chip and not others. Most of this is possible right now with current chips. Some of it isn't, but the majority of it is. From the international agreement standpoint, most of the challenge is: could you sell a chip like that to someone you didn't trust and be confident it was tamper-resistant enough that they couldn't undo those protections and use those chips in nefarious ways? A lot of work is going into what it would look like to reach that standard of tamper resistance, or at least to reach a point where, if you entered into this type of agreement and a whole data center went dark, you would know where to look and who to contact about the fact that maybe they were no longer complying. Most of the knowledge of how to do this is there. A lot of the hardware is there. The biggest remaining piece is developing the tamper-resistance mechanisms. There are also ways around this that don't require putting everything on the chips themselves.
There is a whole category of research into flexible hardware-enabled guarantees, called flexHEGs, where you wrap the chip in a tamper-resistant enclosure that provides all the properties you want. The chips themselves wouldn't need these mechanisms; the enclosure would have them, and if someone tampered with the enclosure, it would basically destroy the chip.
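As a rough illustration of the policy logic described above (a license that lapses unless it is refreshed roughly monthly, plus a restriction on which workloads may run), here is a minimal Python sketch. Everything in it is hypothetical: `ChipLicense`, `may_run`, the 30-day window, and the workload category names are assumptions for illustration, not any real chip vendor's firmware or API, and the "classifier" is reduced to checking a pre-labeled workload category against an allow-list.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta

# Assumption: the transcript says "every month or something"; 30 days is a stand-in.
LICENSE_VALIDITY = timedelta(days=30)


@dataclass(frozen=True)
class ChipLicense:
    """Hypothetical time-limited license that a chip's firmware would check."""
    issued_at: datetime
    # Hypothetical stand-in for a workload classifier: an allow-list of
    # workload categories this license permits.
    allowed_workloads: frozenset

    def is_valid(self, now: datetime) -> bool:
        """The license stops working once it is LICENSE_VALIDITY old or older."""
        return now - self.issued_at < LICENSE_VALIDITY

    def refreshed(self, now: datetime) -> "ChipLicense":
        """Model a successful check-in: same permissions, new issue time."""
        return replace(self, issued_at=now)


def may_run(lic: ChipLicense, workload_category: str, now: datetime) -> bool:
    """Allow a job only if the license is fresh AND its category is approved."""
    return lic.is_valid(now) and workload_category in lic.allowed_workloads


issued = datetime(2026, 1, 1)
lic = ChipLicense(issued_at=issued, allowed_workloads=frozenset({"inference"}))
print(may_run(lic, "inference", issued + timedelta(days=10)))          # True
print(may_run(lic, "inference", issued + timedelta(days=45)))          # False: expired
print(may_run(lic, "frontier_training", issued + timedelta(days=10)))  # False: not approved
```

The sketch deliberately leaves out what Malo calls the hard part, tamper resistance: logic like this only means something if the hardware, or a flexHEG-style enclosure, stops the owner from editing the clock, the license, or the check itself.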

SPENCER: My understanding is that with nuclear weapons, there's an inspection regime, where there are certain protocols and countries have to get inspected, etc. Are there any useful ideas from that which could be applied here?

MALO: Yeah. When we worked on the agreement, we actually wrote article text, and for most articles there's a section on precedent along with some other commentary. We drew from a bunch of things, like the Treaty on the Non-Proliferation of Nuclear Weapons, which relies on a neutral inspection agency and has a bunch of mechanisms. There's the Strategic Arms Reduction Treaty (START), which has challenge inspections, something we incorporated into our treaty. You might feel confident that, through the normal verification mechanisms, you could tell if another party to the agreement was reneging on the treaty. But you might still worry that they had found some way to circumvent those verification mechanisms, to do it in a way you can't check. So there's a mechanism where you can request a challenge inspection, and they have to allow certain inspectors to come in and verify that they are doing what they said they were doing. We also took inspiration from the Chemical Weapons Convention, which has this whole notion of declared and monitored facilities, which we adapted to data centers. There are a bunch of disanalogies with the nuclear situation, and some people get grumpy when folks like me talk about nuclear as a great analogy. We can go into what the disanalogies are, but I think there's a lot to learn from those prior nuclear agreements about mechanisms here, even though not all of the analogies carry over.

SPENCER: What do they say doesn't carry over well?

MALO: Well, there's some mechanism stuff, and there's some higher-level stuff. One important disanalogy is that with nuclear weapons, it was very clear from the start what the risk was. When you talk about doing nonproliferation work similar to what was done for nuclear weapons, a reasonable counter is that it was a lot easier in a world where we were introduced to nuclear weapons by two cities being wiped off the map, followed by a bunch of tests. Everyone kind of knew what the stakes were, whereas with AI, that is not the case. We have arguments for why superintelligences would be difficult to control, and folks like me think loss of control is a very serious possibility, maybe a high double-digit percentage chance. But we don't have a superintelligence that we can all look at and be scared of. Similarly, aside from nuclear power, nuclear weapons didn't come with the same kind of economic upside for developing the technology. It was a lot easier to just be scared of them and do a bunch of nonproliferation stuff.

SPENCER: Maybe nuclear power to some extent, although those are obviously somewhat severable technologies. But that's sort of an upside.

MALO: That's kind of the only one, and nuclear facilities are still big enough that they're easier to monitor. There is a sense in which, yeah, once a model is open-sourced, if you have four Mac Studios, you can just run even the big ones yourself. But there's a very strong analogy with some of the proliferation stuff if you're looking at compute. In some sense, I actually think preventing someone from building a very large data center is probably more tractable than stopping them from building centrifuges to enrich uranium. Uranium is just rocks in the ground, and centrifuges are complicated, but they're not that complicated. If you were taking this seriously, there's one company that builds the machines that make the chips, and one company that basically makes all the chips. If you're actually intervening there, no one's going to start making their own AI chips and building their own data centers. The supply chain is extremely concentrated: a very narrow bottleneck for controlling this. There's a separate issue that once AI systems are out in the world, there's a proliferation question, which is very difficult. There's also the march of algorithmic efficiency. I think the type of agreement we're talking about isn't the type of thing that works for 50 years. It's difficult to imagine that there won't be any progress in figuring out how to make AI systems more efficient. We're already seeing, just with normal AI development, all kinds of algorithmic efficiency insights, so that two years later you can train AI systems of the same capability with far less compute. There are probably lots of paradigm-level insights to be had. Our brains run on about 20 watts. There's probably a lot we can figure out about how to make AI systems radically more efficient than the ones we have today.
Some of those insights could be much more theoretical in nature, or require only a small amount of compute to test, in a way that would be very difficult to verify. Yes, maybe there are analogous things in the nuclear space, insights that would let people build tiny nukes on their own without getting caught. But that kind of thing seems more likely at some point in the future with AI: someone could have an insight where, with their four Mac Studios, they could actually build an unprecedentedly powerful AI system, if they had some brilliant idea, or two or three, about how to do AI very differently. So there's a sense in which this is a stopgap; we have to do something with the time. If we can get an international agreement like this, and stop people from pushing the frontier for as long as we can, something like 10 or 20 years, we still have to figure out what to do with that time to solve all the other problems we talked about earlier that the world needs to grapple with to build a stable, flourishing world with superintelligence. I think it's very unlikely that we can somehow prevent people from building superintelligence forever. That seems quite unrealistic, maybe impossible, and it isn't the future I'd be trying to target.

SPENCER: Now, before we finish up, how about we do a rapid fire round where I just throw out a bunch of difficult questions, and you give your quick take.

MALO: Sure, I will try my best to fight my rambling instincts.

SPENCER: So how much does it matter whether AI is conscious or sentient when it comes to AI posing a potential threat to humanity?

MALO: I think my answer to that question is: I wish we understood enough about what was going on in those AI systems, or about consciousness, to have a better answer. It's certainly my sense that you could build the type of AI system that I'm worried about posing an extinction risk without there being anything it is like to be that thing. It doesn't have to be conscious or sentient to be a powerful AI system that is trying to accomplish goals or pursue some objective in the world, and to be capable enough to pose an extinction risk to humanity. There's a separate question of whether AI systems can be conscious. I don't see any reason, in principle, why that can't be the case. It's a gnarly question; we have a hard time even understanding how to tell whether we ourselves are conscious, and why, besides internal subjective reports. So it seems possible. It doesn't seem to me to be necessarily correlated with whether we get the risks I'm worried about, but I do think there's a whole separate category of things we need to worry about, such as whether we're creating a new conscious species and what our moral responsibility to it is, even if those systems aren't superintelligences. That's a whole other gnarly thing we need to grapple with.

SPENCER: What's the orthogonality thesis, and what do people get wrong about it?

MALO: Oh, yeah, geez, the orthogonality thesis. The orthogonality thesis was originally formulated to address an intuition a lot of people have: that there is some correlation between intelligence and goodness or morality, so that as a mind becomes more intelligent, it also becomes more moral. The orthogonality thesis makes a very narrow claim: in principle, any level of intelligence or capability can be paired with any goal; these two things are orthogonal. That's totally separate from a claim about how correlated they might be in any particular training paradigm. It's just a narrow claim against the intuition that, by default, these must be correlated. Just as you can design a computer program to output any integer, you can design a computer program to pursue any goal, and it won't necessarily end up caring about us or about sentient life, or being more moral, just because it is more capable. That's it. I think it often gets confused into an argument where the orthogonality thesis is somehow load-bearing for MIRI's view or my view: that because the orthogonality thesis is true, every AI system will care about some random thing we don't care about. That's just not the case. It's a specific, narrow claim about what is possible in principle, as a response to a very specific type of argument. The orthogonality thesis was invented before deep nets; it's trying to deal with one very specific argument.
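The point about computer programs can be made concrete with a toy Python sketch, offered only as an illustration of the in-principle claim. The names `optimize`, `near_three`, and `near_minus_seven`, and the idea of modeling capability as a search budget, are assumptions made for illustration; as Malo stresses, a toy like this says nothing about which goals any actual training process would produce.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def optimize(objective: Callable[[T], float], candidates: Sequence[T], budget: int) -> T:
    """Return the best of the first `budget` candidates under `objective`.

    `budget` is a crude stand-in for capability (how much search the agent can
    afford) and `objective` for its goal. Nothing in the search depends on
    what the goal is, so any budget can be combined with any objective.
    """
    return max(candidates[:budget], key=objective)


plans = list(range(-10, 11))  # toy "plans": the integers -10..10


def near_three(x: int) -> float:
    return -(x - 3) ** 2  # goal A: get as close to 3 as possible


def near_minus_seven(x: int) -> float:
    return -abs(x + 7)  # goal B: an arbitrary, unrelated target


print(optimize(near_three, plans, budget=21))        # 3
print(optimize(near_minus_seven, plans, budget=21))  # -7
print(optimize(near_three, plans, budget=5))         # -6 (weaker search, same goal)
```

The same search routine serves both goals at both budgets, which is all the thesis asserts: capability and goal are separate dials in principle.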

SPENCER: Suppose that we solve the alignment problem with AI and build superintelligence, and additionally, somehow we get really good values into the system. So everything goes really super well. What do you think the world would be like? Or what do you actually envision as the best-case scenario outcome? What kind of world do we have?

MALO: I think there's a sense in which that's fundamentally difficult to imagine. It's like asking someone from the 1100s what we would be able to do with modern technology when they don't even have a sense of what modern technology makes possible. Things I do imagine are, at minimum, something like the marginal cost of goods going to zero, so that basically anything anyone could want is something they could have. We're going to have a bunch of challenges to deal with, like how people find purpose in a world where basically any material want can be satisfied. I could imagine there would be groups of people who want to live very normal lives. Maybe it turns out that for a certain type of person, the peak experience is being a safe subsistence farmer who has good relationships and makes their own food. There will be a corner of the earth that is just people living fairly normal lives. There will be people who are very transhumanist, who want to upload their brains and be von Neumann probes, explore the galaxy, and have experiences I can't quite even imagine in civilizations of digital minds. The space is broad and confusing, and there's something fundamentally difficult to predict about it. Mostly it's a question of how we find purpose, flourishing, and happiness in a world where our labor and how we spend our time are not tied to our survival and our needs being met. I don't know. Philosophers have been debating what humans want and what is good for 4,000 years, and all of that seems mixed up in where we'll end up if all that stuff is possible. I certainly don't have the answers. A good world here will probably, in some sense, come through a powerful, aligned system helping us navigate those questions. This isn't quite a rapid-fire round. One thing I'll say is that it's hard to imagine we end up in one of those stable worlds without that.
There are different levels at which we could succeed, and we've talked about the different challenges. There's this fundamental challenge of how you get to a world where we have superintelligences and everything's okay. I have not heard a plan for how that goes well, for the type of thing I mean by superintelligence, if the plan isn't to do something like what Eliezer calls CEV, which stands for Coherent Extrapolated Volition. The goal the superintelligent system has is to figure out what we, individually and as a society, would want if we were more the type of people we would want to be, had more time to think, and were wiser. I don't love framing it as "wiser," because that presupposes what "wise" means. But in some sense, it's the AI system's job to figure out what we would want if we were more the people we strive to be and had more time to think it through, and to lead us on that path, while also ensuring that there aren't other actors in the world who could screw it up for us.

SPENCER: Final question for you, what do you want people to remember from this conversation?

MALO: I think that might change depending on who the person is and where they're at, and it probably would have been different six months or a year ago, or will be six months from now. Right now, certainly, a thing I'm focused on is this. I notice it a lot, in a bunch of different ways, with the book and in looking at comments on Facebook ads and stuff: there are a lot of people who are worried about this but are resigned, or think it's inevitable that we won't be able to rise to the occasion. I would want them to take away from this conversation that it is possible. Humanity has done really hard things in the past; we've been in places where things seemed pretty bleak. There isn't a magic solution, but millions of individual actions, of trying to grapple with this problem, of talking about it, of engaging with policymakers, can actually shift the world, so that eventually you get a Reagan and a Gorbachev who go, "What the hell are we doing here?" and do something different. That is within reach. It seems to me that the people who get called doomers (I don't like that term, because some of the people called doomers, like me, were among the earliest to be excited about the positive benefits of the technology) are, in some sense, actually the optimists. The people running the companies, and a bunch of other people, are going, "Oh, it's inevitable. The only way through here is to race and kind of cross our fingers that we can solve all these problems."
The doomers are sitting around going, "No, we can actually be the better versions of ourselves, a society that can rise to the occasion, figure out how to govern this technology, not do this reckless race, and pick the wiser, saner path that's more likely to lead to a flourishing future." I would want people to take away that this is a real, possible world. It might seem far away, but the more we talk about it as inevitable and impossible, the more inevitable and impossible it becomes. If we start to talk about it as hard, but something we can do and would want to do, that makes it more possible.

SPENCER: Malo, thanks so much for coming on the Clearer Thinking Podcast.

MALO: Hey man, thanks for having me.

Staff

Spencer Greenberg — Host + Director
Ryan Kessler — Producer + Technical Lead
WeAmplify — Transcriptionists
Igor Scaldini — Marketing Consultant
