with Spencer Greenberg
the podcast about ideas that matter

Episode 207: Should we pause AI development until we're sure we can do it safely? (with Joep Meindertsma)

April 25, 2024

Should we pause AI development? What might it mean for an AI system to be "provably" safe? Are our current AI systems provably unsafe? What makes AI especially dangerous relative to other modern technologies? Or are the risks from AI overblown? What are the arguments in favor of not pausing — or perhaps even accelerating — AI progress? What is the public perception of AI risks? What steps have governments taken to mitigate AI risks? If thoughtful, prudent, cautious actors pause their AI development, won't bad actors still keep going? To what extent are people emotionally invested in this topic? What should we think of AI researchers who agree that AI poses very great risks and yet continue to work on building and improving AI technologies? Should we attempt to centralize AI development?

Joep Meindertsma is a database engineer and tech entrepreneur from the Netherlands. He co-founded the open source e-democracy platform Argu, which aimed to get people involved in decision-making. Currently, he is the CEO of, a software development firm from the Netherlands that aims to give people more control over their data; and he is also working on a specification and implementation for modeling and exchanging data called Atomic Data. In 2023, after spending several years reading about AI safety and deciding to dedicate most of his time towards preventing AI catastrophe, he founded PauseAI and began actively lobbying for slowing down AI development. He's now trying to grow PauseAI and get more people in action. Learn more about him on his GitHub page.

JOSH: Hello, and welcome to Clearer Thinking with Spencer Greenberg, the podcast about ideas that matter. I'm Josh Castle, the producer of the podcast, and I'm so glad you've joined us today. In this episode, Spencer speaks with Joep Meindertsma about provably safe AI systems, the need for AI regulation, and pausing the development of AI capabilities.

SPENCER: Joep, welcome.

JOEP: Hey, Spencer, nice to be here.

SPENCER: Nice to have you. Let's just start right off the bat. Why do you think that we should pause AI development?

JOEP: Well, because it's, in my opinion, too dangerous to continue on the path we're currently on. And pausing actually buys us time to think about how we can actually use this technology in a safe way, how to build it in a safe way, and how to get the right regulations in place. Right now, we're currently in this blind race, and I feel like pausing is the most sensible thing to do right now.

SPENCER: And what does pausing mean? Could you explain what that would entail as a policy?

JOEP: About nine months ago, there was a pause letter started by the Future of Life Institute, and that was calling for a pause on the development of the largest AI systems for six months. So basically, all AI models that are currently being used would be completely legal, nothing would change. But the development or training runs for these large AI systems would be paused for six months. What we are calling for at PauseAI is not necessarily a pause for six months, but one that is basically taking as long as we need to make sure AI is safe. So we're calling for a pause until AI can be developed provably safe. And again, that's only for the largest models and not the smaller ones.

SPENCER: What does provably safe mean?

JOEP: Provably safe AI can be mathematically guaranteed not to result in very unsafe behaviors. So when we're talking about unsafe behaviors, we're thinking about an AI going rogue — basically pursuing its own objective without being able to be shut down — or contributing to very harmful effects such as creating a bioweapon or creating cybersecurity weapons. So a provably safe AI system is built in such a way that we can know, before it's released or even trained, that it cannot do these dangerous things.

SPENCER: So when you're talking about it not doing dangerous things, what are some of the things that you have in mind that you wouldn't want an AI system to do?

JOEP: There's actually a very broad range of risks that AI systems can pose. There's already risks from systems that are currently being deployed and currently being used. But I'm mostly interested in the types of risks that AI models could pose that do not exist yet, because these tend to be so much more catastrophic. I have a background in software engineering, and one thing I tend to focus a lot on is the cybersecurity risk of AI. So a model like GPT-4 can already find some security vulnerabilities in code bases. And if that capability advances to a superhuman level at some point, an AI will be able to find these zero-day vulnerabilities in all sorts of code bases. And that will basically mean that we end up with AI that has the capability to hack into all sorts of computers, networks, etc.

SPENCER: Just to clarify, by zero-day, you mean exploits that have not yet been discovered by the people that developed the software. Is that correct?

JOEP: Exactly. Unknown vulnerabilities. Sufficiently capable AI can basically find all sorts of zero-day exploits by looking at code bases, hypothetically. And that sort of capability could be very hard on society.

SPENCER: Yeah, I can see why that would be a complete disaster. Because if you could have an AI scan through really commonly used code, let's say commonly used JavaScript libraries or C++ libraries, find new exploits nobody's ever found, then it could do mass scale hacking based on those exploits. Right now, hackers do this, but it's very time consuming and very expensive. People have to pay a ton of money to buy these exploits, so a lot of it, as far as I understand, is done by government hacking. Is that accurate?

JOEP: Yeah, exactly. So maybe you've heard about the Stuxnet virus, for example?

SPENCER: Yeah. Why don't you explain that for a moment?

JOEP: Yeah. The Stuxnet virus was created to target nuclear facilities in Iran, and nobody's entirely sure who created it — but you know, it's probably some large governments — and that virus used seven zero-day exploits. That is the largest number of these types of unknown vulnerabilities that has ever been seen in a virus. And it basically means that a virus has all sorts of methods, in this case, seven different methods to enter systems and take some degree of control over them. And finding these exploits, as you've mentioned, is extremely costly. I think right now, the cost of a zero-day exploit is like $100,000. But if that cost is reduced to a mere number of cents, that means that in a very short amount of time, a lot of people can find a lot of zero-day exploits. And that could either be used to improve security. But patching systems takes a lot of time. And in that time, between, let's say, at some point, we have this capability, that we can find zero-day exploits in code bases for a very low price using some sort of new type of AI. And this capability is released, so people can use it. Well, obviously, most people are the good guys, right? So they try to patch our software, improve the security of existing systems. But there will also be a small number of people who will try to break into systems or maybe just maximize the amount of havoc that they can cause. And when they have this capability, they have the advantage of time, because all the ones who are trying to do defense will need, on average, a couple of weeks to patch the existing systems. And that means there's a window of vulnerability. So this is one specific way in which I think having an AI with very high capabilities, in this case only cybersecurity capabilities, could lead to really bad outcomes. Our society is completely reliant on technology.

SPENCER: You talked about AI systems being provably safe. I have trouble seeing how that could apply in this scenario. Because if you have an AI that can debug code and look for exploits at all, what's to stop someone from applying it to code that they didn't create? And how could the AI possibly know they didn't create it? So I don't really understand how you could prevent this capability, other than just not allowing AI to debug code?

JOEP: Well, that's a very good point. So there are some people who are saying that making a provably safe AI system is completely impossible. And they make strong arguments similar to the ones you're basically making like, "Okay, if you have a system that can find security vulnerabilities, then it can be used for good or bad. So if it's provably safe, maybe it should never be able to find these vulnerabilities in the first place. Right?"

SPENCER: Well, right. That's what I'm thinking. I just don't see how you could make sure it only would use it for good if it had this capability at all.

JOEP: Yes, that's a very good point. And I'm also not convinced that a provably safe AI system is actually buildable, I have no idea how to build it, I don't even know exactly how to define it. But I do know that, right now, the systems that are being developed are provably not safe. And I think the current paradigm is inherently dangerous and warrants a pause.

SPENCER: When you say provably not safe, what does that mean? Has someone proven that they're not safe?

JOEP: We've already seen the lack of controllability of our current AI models. So what I'm talking about, for example, is the fact that virtually every large language model that has been released has been jailbroken. So that means that someone tries to get the AI to do things that it was supposed not to do. So the AI model is supposed to not make racist remarks, for example, or tell people how to make a bomb. But with the right prompts, basically, every single AI system out there can be jailbroken and turned into a system that does the things that they shouldn't be doing. So when I mean provably unsafe, I'm basically saying the models out there right now are provably not adhering to the safety constraints that they should be adhering to.

SPENCER: How do we differentiate between something like a flashlight, which is also not provably safe, because you could hit someone over the head with it and kill them, and something like an AI system? It seems to me that the bar can't be: you can use it for anything dangerous, because if we had that bar, we wouldn't allow flashlights, right? It seems like we need some kind of other bar here about some kind of level of severity of danger or something like that.

JOEP: That's a very valid question. And I don't know exactly where that bar needs to be. Maybe a provably safe AI system should just not be able to escape human control. Maybe a provably safe AI system should be way safer than that. I'm not entirely sure how we should specify that or how to define that. But I do believe that if you have a group of people who are experts on this topic, for example, mathematicians, people who have strong backgrounds in AI safety, they can probably come up with a good standard or specification that we can then use to decide when we maybe should continue developing the largest AI systems. I think pressing play, again, should at least mean we have some sort of agreement that we can do it, to some degree, safely enough. And the second thing we should agree on is whether we know how such an AI will be used, who gets to own it, who gets to control it, who gets to basically turn their wishes into reality.

SPENCER: I could imagine from the point of view of an AI researcher or a startup founder who's running an AI related company that your proposal might be frustratingly vague, where they might say, "Well, you're suggesting we pause this, but we don't have clear criteria for when it'd be unpaused. So you're just basically putting us into this weird limbo where we can't do anything." What would you say to someone with that complaint?

JOEP: I would say that pressing pause is a reversible option. We can press play again. We can press play when we, in some way, like politicians, democratic societies, decide to press play. But once we get to superhuman intelligence, that will be a very, very irreversible thing. So I agree that what I'm proposing is a bit vague. I don't know exactly when we should press play, I don't know exactly how to define the conditions for pressing play. But I do believe that just continuing on our current trajectory means that we're basically making an irreversible decision on that.

SPENCER: Another thing people might say is that, well, AI is really cool. It can do things like make art and write essays. But how is it fundamentally different from so many other technologies? And with all these other technologies, by and large, society has gotten by okay. Maybe there have been some struggles, but we've been able to manage them. What makes AI special, that we should try to pause it as opposed to these other technologies where we mainly haven't had to pause them?

JOEP: Intelligence is a really interesting concept, and it's extremely powerful. So intelligence can lead to new inventions, it can lead to new types of other innovations, it can lead to new technologies. So I don't know of any other technology that can lead to so many other types of innovations that we don't even know of as AI can.

SPENCER: Well, you might think that electricity or the internet might be like that, or oil refining, things like that, that are sort of core foundational technologies that spur hundreds or thousands, even hundreds of thousands of other technologies.

JOEP: Yeah, I think, in a way, electricity and the internet are somewhere at the root of a skill tree. So you can build on them and find new things, and then you can use knowledge of the base thing to get more complicated things. But artificial intelligence can be used to discover entirely new branches. It can be used to, at some point, understand and get to a deeper level of understanding of physics, for example. We're definitely not there yet with AI. But the point is that we, at some point, probably will be. And when we reach that point, AI will be used or will be able to create all sorts of new technologies that are very hard to imagine at this point.

SPENCER: I think a hypothetical that makes me take this kind of consideration really seriously, is imagining that some company one day could do something like turn on a million Alan Turings and have them work on whatever problem they want at 1000 times the speed of a human mind. And just say, well, a million Alan Turings working at 1000 times the speed of the human mind, holy shit. The amount of stuff that could do is absolutely mind boggling. And the amount of progress that could be made in such a short period might be just beyond anything that we've ever conceived of before, especially if these Alan Turings, unlike Alan Turing, actually have access to most of the knowledge of humanity that's already on the internet, all embedded in their minds already.

JOEP: Yeah, it's absolutely wild to try to imagine what types of innovations could come from such a process. All I know is that it's going to be weird.

SPENCER: So you have been involved in this organization, PauseAI. And as I understand it, you've been doing some protests to try to spread this idea of pausing. Is that right?

JOEP: Yes, that's correct.

SPENCER: And what's the nature of those protests been? And what kind of reception have you had to them?

JOEP: The protests themselves have been pretty small so far. I think the largest protest was maybe 20 people. We protested in a lot of different places: Brussels, the Netherlands, Belgium, there were protests in Germany, Denmark, the US, and the United Kingdom. A lot of them in the United Kingdom, actually. And people tend to receive these protests actually pretty well. People on the streets tend to be interested and friendly. I think part of what makes the protests a bit different from other types of protests is maybe the type of audience who's currently protesting. So many of the people who are currently protesting are maybe AI engineers, software engineers, people with technical backgrounds — something that we're really trying to diversify and work on — but yeah, it makes for a unique group of people who are protesting.

SPENCER: Where do you run these protests? Is it a tech company headquarters?

JOEP: Sometimes it's at tech companies, sometimes at Parliament, for example, or a relevant location where an event is being held. So there's this whole process of basically having discussions and trying to find the most suitable location. You want a location where there are other people, and it needs to make sense from a narrative perspective that you're protesting at that location.

SPENCER: And what's your theory of change? What's the idea of how these protests actually lead to things being different in the world?

JOEP: Protests have been very effective in the past in making a topic more well known, getting people to think about a topic, getting people to talk about a topic. And how that typically works is that protests are covered by journalists and are shared on social media. That gets people to talk about the subject, and that can help shape the narrative, in a way.

SPENCER: How has the media covered your protests so far? Like what kind of coverage and what is the angle they've taken on that?

JOEP: Our very first protest was in Brussels. I reached out to this journalist who was there and basically said, "Hey, we're going to protest at a location where Sam Altman was going to speak. And it's probably gonna be the first anti-AI protest in the world, where we're basically saying, 'Hey, this technology is dangerous, and it's an existential threat. We should pause.'" And the journalist was there; we were there with seven people. The journalist asked a bunch of questions. We talked about AI safety for half an hour. And then an article appeared on Politico. The title, I think, was "The rag-tag group trying to pause AI." And it was just a very fair description of who we were. We were not professional protesters; we're just a bunch of people who were concerned about AI, and who said, "Okay, wow, this stuff is scary. Let's do something about it."

SPENCER: Have you gotten the sense that journalists are sympathetic to your cause, generally, or they're skeptical of your cause, or they're kind of unsure what to make of it?

JOEP: I think most journalists have some degree of concern. But journalists also have this interesting tendency where they don't actually fully internalize what they're hearing. They have some sort of maybe a safe distance from the things they are hearing. I think that's important if you're a journalist. You cover all sorts of horrible or extreme things, and you need to have some sort of mental sanity. So I think many of the journalists were a little distanced from the things that we're talking about. I feel like we're having a conversation about such an emotional subject and such a big important thing. I think most of the coverage so far has been pretty fair and good. But I kind of worry that, at some point, the narrative of pausing AI will become boring. So I think when we started out, we had this luck that nobody was doing this. And then journalists have a new story to tell. But at some point, that story gets boring. So we noticed the first protests were covered well. Later on, we did a bunch of protests, and they weren't covered, basically, at all. So what we're currently focusing more on is basically growing the movement, getting people to join the cause. People are doing flyering sessions, some people are making videos on YouTube. There's all sorts of projects that people are doing at PauseAI. And protests are just basically a small part of that.

SPENCER: Have you gotten backlash, for example, from AI researchers?

JOEP: Well, mostly on Twitter. There is this group called e/acc, which stands for effective accelerationism. Most of the group tends to believe that AI safety is just not worth the effort and pausing is a bad idea. So that group is really, really strongly against what PauseAI is pushing for. But so far, no real backlash from the media, for example.

SPENCER: So the accelerationists, what are the kind of arguments they make about why this is a bad idea?

JOEP: So yeah, I should steelman the case for e/acc. I think the primary argument that people make is, AI does not pose any catastrophic risks. That's the number one argument that people tend to make. I think that argument seems hopelessly naive in a way or optimistic. There are a lot of different scenarios where AI can go wrong. We already talked about cybersecurity risk. There's also biohazard risks. There's rogue AI risks. And most AI scientists believe that some of these risks are definitely something to be concerned about. And ignoring all of them, to me, seems hopelessly naive. But there is also a second argument. And that second argument that accelerationists tend to make is that it's better if everybody has their own AI. So if we all get our own superintelligent AI, we basically distribute the power. Everybody has their own omnipotent god in their own pocket, in a way. Problem with that argument, in my mind, is that the universe where multiple superintelligences exist is extremely unstable. What if one of these people uses their AI to take control of a part of land, or try to disable the other AIs? What happens? I'm afraid that the universe quickly converges onto a state where only one superintelligent AI exists. And I think it's going to be a very chaotic place for a while. What do you think about this? Would a world work where many people have their own superintelligent AIs?


SPENCER: If by superintelligence we mean, for example, an AI that's better at 99% of tasks than 99% of humans or something like this, so it can do almost everything that almost any human can do better than almost every human. Then that also implies it could do things like hacking. It can do things like manipulation, like social manipulation. It could do things like trying to make money in lots of different ways. That seems to me very, very unstable, if everyone has something like this. However, if they were extremely well-constrained, for example, if we had a really, really good sense of how to limit them to prevent them from causing harm, then maybe you could imagine a stable world where everyone has an AI in their pocket. But if you haven't solved the kind of constraint issue, then it just seems, to me, extremely unstable as a situation because, as you say, eventually some people are going to use the system to do things that either are going to cause harm in various ways, or try to wrest control from other people and so on. And so, yeah, I don't see how that leads to an equilibrium that we like.

JOEP: Yeah, I fully agree with this. And I think it's part of the reason why many people who are currently accelerationists should, in my view, probably also want to support a pause, because we're not just asking for stopping AI developments right now at the largest language models. But we're also saying, let's only continue if we know how to use this properly, how we can perhaps distribute this power properly, how we can democratize this properly. Because right now, I think we're on a trajectory where I don't expect it to be well distributed. And even if it is distributed well, as you said nicely, we won't probably get a nice, stable equilibrium with everybody having a very intelligent AI in their pocket.

SPENCER: I want to run by you a couple arguments that I hear against things like pausing AI to see your response to them. The first is: people argue that this idea of thinking of an AI as superintelligent is sort of a misguided concept from the beginning, because it sort of treats intelligence as though it's like a single thing. But we know it's not a single thing. And one way we know that is if you look at something like GPT-4 today, GPT-4 today is better than almost all humans — maybe even all humans — in certain things already. For example, its knowledge base is far more than any human in history has ever been able to attain. And yet, there's still ways that it's dumber than most humans. And so, this idea that, 'Oh, we're just gonna get this thing that's super intelligent, and then it's going to be smarter than everyone,' they argue, it doesn't really make sense. Really, we need to think about specific capabilities instead of just imagining this sort of magic genie that can do everything.

JOEP: I agree with that way of looking at intelligence. I think, indeed, it is a common trap, basically, to think of intelligence as a sort of linear property, where it goes on an IQ scale, for example, and GPT-4 may have some IQ at some level, but intelligence is a highly dimensional thing. But I think when we're talking about superintelligence, we are talking about a thing that is way more capable at practically everything than all people. And that will, by definition, also be more capable at these dangerous capabilities. And I think talking about dangerous capabilities is probably one of the things we should do way more often. So when we're talking about dangerous capabilities, I'm talking about cybersecurity capabilities, but also human manipulation. And I think some capabilities are very abstract and very dangerous because they can lead to basically other capabilities. So if you have an AI that is very good at reasoning, simply extremely smart in the sense that it can find new conclusions from any given set of data, I think that capability, on its own, is probably a dangerous thing, because it can lead to very unpredictable new types of innovations and technological breakthroughs that can, on their own, be very destabilizing.

SPENCER: I think this kind of critic, though, will say that it's silly to talk about superintelligence given that we know that intelligence, when you're talking about AI systems versus a human, isn't just a single variable; it's all these different variables. They're kind of skeptical that we can even say, "Well, assume there's eventually going to be superintelligence, what's going to happen?" They'll say, "No, no, it's just not the way it's gonna work. It's just gonna improve at different specific capabilities."

JOEP: So one way of thinking about this is to try to visualize it. I know, it's difficult to do that on audio podcasts, but we can try it anyway. So you probably know these circular graphs, where there's a bunch of dimensions. So you have a spike going on the top, and the right, and bottom right, etc. And every single one of these spikes represents a certain capability in our story. So then you can imagine that there is some shape that GPT-4 has. So GPT-4 probably has a way higher spike at knowledge than a regular human has. And maybe it has a smaller spike at the level of reasoning capabilities. And then there's also this other shape for a very dangerous AI, like there exists some shape of an AI that is dangerous, and it has a bunch of capabilities that, when combined, could lead to very, very dangerous situations. I'm not entirely sure what the shape looks like. I'm not entirely sure how you could predict that shape. I know that there are some capabilities that we know will be dangerous when they are created. But I think if you think about such a shape, you can probably imagine that there could exist some type of AI that would be definitely dangerous.

SPENCER: So is the argument, then, that it doesn't really matter whether it becomes superintelligent? Maybe superintelligent means, if you think of each of those capabilities as being a spike on the chart you're imagining, that a superintelligence would have a really high level on every single spike. But I think you're saying, well, that doesn't really matter. What matters is that some combinations of having high enough spikes at some set of those skills leads to extreme danger of different forms, whether it's someone using the AI to build a bioweapon that wipes out 99% of people on Earth, or using the AI to hack millions of computers worldwide simultaneously, or using it to help them create an authoritarian government, whatever. The point is, we need to avoid any of those configurations, not whether it's actually technically superintelligent.

JOEP: Yeah, well said. Exactly. I think we're focusing often too much on what superintelligence actually exactly is, when we should be talking way more about which capabilities are dangerous. But from a policy point of view, I think it does make sense to just say, okay, everything above this threshold should probably be considered dangerous because, in practice, it is very hard to predict which exact capabilities the system will have, especially in the current paradigm of training AIs.

SPENCER: Another argument that I hear people make, that makes people not as concerned about these issues that would lead to thinking we should pause AI, is they say, "Well, we're really not actually that close to building something that's that smart or that dangerous." They say, "Well look at the current systems, they have all these limitations. We haven't had any massive disasters caused by AIs yet. You think that if they were really that close to being as powerful as people claim, we would already be seeing some really bad things occurring from them." What's your response to that?

JOEP: I think it does make sense. You don't want to pause too early, right? You don't want to prevent too many types of innovation in AI. Because, obviously, these systems can also be very valuable. So, you want to pause at a moment when the risk becomes unacceptably high. And it is a very hard thing to predict when exactly that is. As I mentioned before, we don't know how to predict which capabilities an AI system will have in advance. And I think, if we look at the average estimates that AI researchers have on when we get to superhuman capabilities, these have been dropping extremely fast in the last few years. Basically, since the release of GPT-3 and GPT-4, these estimates have dropped quite rapidly. Then, you need to stop before it's too late. And then you need to ask yourself, how much risk are you willing to take? Are we going to wait until the median estimate? Are we going to wait until the bottom 10% estimates? Are we going to take a 1% chance that we'll have AI and then pause? I think we've already reached a point where we're above a 1% chance that dangerous level AI is created in the next training run. And that's why I would say the moment to pause is right now.

SPENCER: How does the general public right now think about this topic? If you know, in surveys and interviews, what do people who are not AI experts, who are not involved in this field, but who just maybe have tried out a few of these systems, say?

JOEP: The public is actually quite concerned about AI, more than I expected it to be, especially in the United States and the United Kingdom. And the public also quite broadly supports either slowing down or pausing AI developments. I think there was a survey last week that showed, I think, three to one support for an international treaty that bans the creation of superintelligent AI. And earlier surveys, I think, showed about 70% support slowing down and 60% support, again, banning superintelligence. So the public is actually quite in favor of drastic policy measures. And I think it's a real shame that politicians right now are not even thinking of these policy measures as something to be taken seriously.

SPENCER: I wonder if politicians might think, "Well, sure, in a survey, people say that they have this attitude, but this isn't really the thing that they think about day to day. What they think about day to day is their health care, or the education of their children, or hot button political issues that make them hate the outgroup. Not AI development."

JOEP: I think you're probably right. I don't think AI right now is a huge topic that people talk about a lot, especially not the type of risks that we at PauseAI are mostly concerned about. But people are, in fact, to some degree, concerned about this. It's really fascinating to me how little politicians are currently doing in order to prevent the possibility of human extinction right now. The risks are so insanely high, that I really just can't grasp the lack of action by some people.

SPENCER: My understanding is that Biden took some steps towards moving in these directions. What was it that he did recently?

JOEP: So Biden's executive order, from my understanding, did require these companies to do pre-deployment evaluations, which basically means that before you release an AI to the public, you test which types of capabilities it has, and you make sure that it doesn't do the wrong thing. And this is obviously a good thing. So there are people who are working on this problem. And we have seen serious policy measures arising in the past year, not only in the US, but also from the AI Safety Summit, there were pre-deployment measures being proposed. But I think these pre-deployment measures are not enough. They still allow companies to create dangerous AIs. They can create AIs that may be able to take over the world or may be able to hack every system on the planet. And yes, maybe they won't release it to the public. But the fact that that AI exists somewhere at a server park, is already a danger in itself. Maybe the weights are leaked; we've seen this happen with the Llama AI by Meta, for example. It's also possible that some hacker gets access to the weights. Also, things can go wrong inside the AI lab itself. So even before deployment, things can go wrong. And obviously, these evaluations aren't perfect, so they are likely to miss certain capabilities. And then there's another problem because these capabilities that are present in a trained model can often be extended at runtime with other types of tools. So we've basically seen how GPT-4 became way more powerful by having access to all sorts of APIs, or having access to long-term memory using another extension. So expect more of these types of innovations to occur, and it's very hard to predict which types of innovations they will be. But we can expect AI models to become more capable and have new types of capabilities that weren't present in the base model.

SPENCER: Another piece of criticism I hear regarding this idea of pausing, is that if we require pauses, then the worst actors will be the ones that don't pause, and then they'll catch up or even get ahead of the more responsible actors. For example, it could be unscrupulous companies that break the law. It could be maybe individual rogue groups that don't care about what the law says. It could be countries that maybe are less concerned about AI safety that will then take the lead. How do you respond to that?

JOEP: It's definitely a risk. If we pause and we don't do it well, we don't do it carefully, then I do expect people to continue. So, you don't just have to tell companies like, "Oh, please stop training these very large models, or else you'll get a fine." You need to make it actually hard to do so. So one way of preventing dangerous AI models from being trained, in rogue settings for example, is you can do compute tracing. You can track the sales of GPUs, for example. Virtually all these chips are created in Taiwan by TSMC. Basically, all the hardware that TSMC is getting is coming from this Dutch company called ASML. NVIDIA is currently producing pretty much all of these AI training chips, like H100s. So the pipeline is very much controllable and centralized. That means that with compute governance, we can actually prevent actors from getting their hands on very powerful hardware that is currently required to train these dangerous models. So I think with proper governance, you can actually prevent these rogue actors from getting their hands on such a model.


SPENCER: You mentioned earlier that this is an emotional topic. But to most people, this is not an emotional topic. And so I wonder, how do you feel about this on an emotional level? Do you believe that you're going to die from these AI systems? Or do you think of it more like a distant possibility that you, on some level, don't emotionally fully process?

JOEP: Yeah, this is a topic that I find really fascinating because I noticed with myself that I first have been thinking about this topic for like seven years and it wasn't emotional. I was just thinking about AI safety as this abstract, but very interesting topic. It was just interesting. But it was also not really applicable to me. It wasn't going to happen in my lifetime. It wouldn't happen to my family and friends. It would happen sometime in the future, and it's somebody else's problem to fix. I have my own life; that was basically the way I looked at it. And then GPT-3, GPT-4 happened. And at some point, I basically started to emotionally internalize that this could happen in my lifetime. And it could happen pretty soon. And I think that was the moment when it started to become emotional for me. So basically, after internalizing all of this, around the time when GPT-4 was launched (I think that was it), I just cried quite often, actually, maybe once every few days, for maybe two months. And I think, in retrospect, it was a process of grief. It may be similar to having some sort of very dangerous diagnosis for a disease, and you have to accept the fact that, at some point, you could die from this. And that's a very sad thing to internalize. The more I thought about this, the more I felt like everybody around me was crazy in a way. It's just so frustrating to see if people actually believe the risks are real and possible, that they don't act as if they are real and possible. And I think it has everything to do with internalizing it emotionally. If you hear about existential risk from AI, or the other types of risk from AI, and you do take them seriously on an intellectual level, you'd probably ask yourself, are you taking these also seriously on an emotional level? Because I think the two are very different in how you act on them and whether you start doing things in a different way in your life. So how was this for you? Are you worried about AI?

SPENCER: I am worried about it. I'm worried that it will cause tremendous damage to society. I have trouble getting an accurate, confident sense of when it will happen. I think I'm extremely uncertain about that. But I do, yeah, it is in the back of my mind; a concern that does come into my mind fairly regularly.

JOEP: And how do you process that emotionally? Do you envision things maybe going wrong in your own life, or is it just so abstract?

SPENCER: I think it's mainly still abstract. Occasionally, I think about how it could actually manifest, like what could actually happen if you build a sufficiently intelligent rogue AI system that we don't have to control? But a lot of that, I think, is a more abstract kind of worry at the back of my mind.

JOEP: Do you feel sadness because of that?

SPENCER: I would say more anxiety. I think of sadness as more an emotion I and others are likely to feel when you kind of internalize something as lost, whereas anxiety is more something might be lost. There's an uncertainty around it, and so I tend to process it more as anxiety. How about you?

JOEP: That's interesting. I don't think I feel anxiety because of this. But yeah, mostly just sadness. And that's why I mentioned that metaphor of having some sort of diagnosis because, in a way, I think there's many ways of processing such news. And there's a lot of ways in which people cope. And for me, it's probably a combination of trying to prevent it and act on it. But also, the sadness part, I find it also very fascinating to see how other people cope with hearing about the existential risks from AI. So, when you tell people about this, many of them just shoot into complete denial, right? There's a couple of ways in which people deny it. Sometimes they say, "I just don't believe it. It's just sci-fi nonsense. I just don't believe it. It's very simple." And sometimes people do believe it, but they will say, "Yeah, I just don't want to think about it." I heard this response a lot. I also had a response from people where they say, "Yeah, that could happen. But then it doesn't matter, right? Because we'll all be dead. And that's fine." Many people have this gut response to hearing about these tremendous risks as being some sort of okay with them. And yeah, I don't really understand how that works, but I think it's some sort of coping mechanism.

SPENCER: Well, suppose someone tells you that the clothing shops that you shop at are using child labor that's exploited. Well, what do you do in that instance? Well, one option is you disbelieve. Another option is you keep shopping in the stores, but you just kind of feel guilty about it. Third option is you say, "Well, maybe they're using child labor, but maybe those children are better off, right? Maybe they'd be even worse off if they weren't working in the factory." A fourth option is you go shop somewhere else, and you just try to find other stores. But I think with something like this, it's even worse than the child labor situation, because the child labor one is at least there's a concrete action people feel they can take like, okay, at least I could stop shopping at the stores. While some percentage of people will deny it or rationalize or whatever, some percent will be like, "Okay, I'll do it. It will be a nuisance, but I will find other stores to shop at." But with this, I think it's even worse, because people just don't know what they could even do.

JOEP: Yeah, I think you hit the nail on the head. An idea is so much harder to internalize. A really, really bad, dangerous type of scenario is really hard to internalize, if you have no way to protect against it. And I think the fact that people will have no way of preventing this is part of what makes people shoot into denial. It's just the lack of action. One of the reasons why I'm trying, at least with PauseAI, is to give people a source of hope that there's something you can do: you can protest, you can send letters to your representatives. Many people support a pause; we can actually do this. I think if you give people a really dark piece of news and you combine it with something that they can do to prevent it, maybe they are more likely to actually believe the bad piece of news.

SPENCER: So what are the concrete actions? Let's suppose someone wants to do something really specific.

JOEP: I think the first thing you should do is get in touch with other people who are working on this. So for example, at PauseAI, we have a Discord server with like 800 people who are talking about this every day. So we're just talking about what we can do. How can we be most effective? And many people have different skills. Some people are really great at designing cool posters. So maybe you should help out with designing a poster. Other people have great writing skills, and they should send an email to their representative, for example. Other people have great policy ideas, and they should help out with writing policy. Some people can help out with solving the alignment problem by diving into AI safety. I think pretty much everyone has a way to contribute. But you just have to see all the types of things that you could contribute to, take something and help out with it. I'm pretty sure 99% of people could help out in some way.

SPENCER: What do you think about the psychology of people that lead these top AI labs? And I don't mean the ones that are just like, "Yeah, it's totally safe. There's nothing to worry about." I mean, people who seem genuinely concerned. They will acknowledge publicly, "Yes, there's dangers of AI. We have to be careful." But they continue working on building it. And they work on not only building it, but pushing the envelope, building the better models, the faster models, the bigger models.

JOEP: Yeah, it is hard to speculate, of course. But it must be tough. I'm absolutely sure it must feel like a huge weight on your shoulders. Like, if you're Sam Altman or Demis Hassabis, and you understand that this technology can be so incredibly dangerous. I'm not entirely sure what's going on inside their heads, but I can imagine that what they're thinking is 'rather me than somebody else'. Like, "I think it would be better if I lead this company and make the decisions because I can do it safely. I trust my own character more than I trust any other AI lab at this point." I can imagine that that's the sort of sentiment that gets them to say, "Okay, we should go ahead and do this thing, because the alternative is just somebody else gets to do this, and that could be very wrong. It could be very dangerous."

SPENCER: I suspect you're right. I suspect that if we take Sam Altman and Demis Hassabis, who lead OpenAI and DeepMind respectively, I think that they have both heard extensive arguments about why AI can be really risky. I think that they have spent tons of time thinking about this. I don't think that they just dismiss them as ridiculous. I mean, maybe they dismiss some of the concerns that are ridiculous. But I think they do take a bunch of them seriously. So yeah, I suspect that you're right, that they think, "Well, I'd rather do it," "I'd rather be in charge of it," "I think someone else is going to do it, so better me than them." Maybe a bit of a sense of being the hero in the story. If they can build it safely, then AI could usher in really incredible benefits to society. If you could have millions of synthetic minds working on solving the biggest problems in society, and these minds are far smarter than the smartest human, you can imagine that you can make incredible progress. And so, maybe that's also a motivating factor thinking, "Well, not only can I do it safely, but I can usher in a new age for society that could be far better than what we've ever had."

JOEP: Yeah, I can totally understand that that's like a primary motivation. Especially if you combine that with, let's say, a large chance of things going right, then I can totally understand why people would go ahead and take some degree of risk. But I think that for the rest of society, it feels very rare to have a small group of people basically deciding the fate of all of humanity, and us not having any say in this, right?

SPENCER: You know, I suspect there's also another aspect here, which is this idea of: can things be done iteratively? So I suspect that Sam Altman thinks that the best way to build safe AI is to iterate to it, to build one version, see what problems it has, fix it, build the next version, pay close attention to where it's going wrong, and kind of iterate your way to it. And I think that if you have that mindset, then maybe his strategy makes some sense. And I'm curious to hear your reaction to that, if people think, "Well, the only thing we can do to make it safe is through this iterative process."

JOEP: I remember when OpenAI launched their GPT plugins. About one week before Sam Altman announced that they launched the plugins, their Head of AI safety, Jan Leike, tweeted something in the sense of, "Let's not rush ahead and integrate AIs with everything in our society before we know how to do it safely." So I think many people at OpenAI also believe that we need to be careful with deploying things to the public. I know that there's a lot of potential for things to go wrong, because we don't know exactly what is going to happen. And I do understand at some level that you want to iterate, you want to try things out, you want to give normal people a feeling of what this technology can actually do. But I think this focus that these companies have on releasing models as quickly as possible is mostly simply the result of economic pressures. They want to win, they want to be the best, they want to be the best company, the most profitable one. And there's all sorts of incentives that lead them to deploying their models as soon as possible. I know that GPT-4 wasn't released for seven months, because they did RLHF, this really intensive process of making it more aligned, basically more controllable by humans. And they wrote an extensive report on AI safety before they released it. But then Google came out with their Bard model. And there was no mention of safety at all, because Google knew they were kind of lagging, and they just had to rush and get this model out as quickly as possible. And I feel like that's the current situation we're in. We're in a racing dynamic, where companies have a lot of incentives to rush ahead and release their models as quickly as possible. So even if iterating would be safe in some way, I don't think that's the argument that's currently being used. It's probably just market pressure and capitalism.

SPENCER: Yeah, I think there's kind of two aspects to that. One is, even if you're iterating carefully, you may be pushing others to iterate less carefully, or just to leap ahead through a competitive dynamic or through them learning from what you're doing. I also wonder whether an iterative process really is safe. And I think it depends on whether you think capabilities will suddenly change. If you think there's this really smooth curve of capabilities, then maybe you can kind of keep predicting what's about to happen with the newer models. But if you think that there might be one day when they train a new model, and it actually has extremely different behavior than they've seen before, kind of the difference between GPT-2 (which could kind of write cute stories), and GPT-3 (which suddenly started to actually feel like it could do real work), and GPT-4 (which actually starts to feel like you have your own little digital intern). There's a qualitative difference in their behavior, and you can imagine GPT-5 or GPT-6 or GPT-7 actually behaving so differently that they can't predict how it's going to behave. And the iterations are just too big, essentially.

JOEP: Yes, I think, already, the iterations are incredibly large. And it strikes me as odd to hear people like Sam Altman, for example, mention that they consider these unpredictable capability advancements as a fun little guessing game in their own office. So yeah, it's worrisome to hear that even the AI labs themselves really have a hard time predicting which types of capabilities are going to come out from the next model. And the training runs are so incredibly expensive, so if you make a new iteration, you probably have to make it way larger. So you kind of have this economic incentive to make a large jump and not baby steps.

SPENCER: If Sam Altman or Demis Hassabis or people in similar roles ever hear this, what would you want to say to them?

JOEP: Please, Sam, Demis, if you're listening to this, consider, again, the possibility of supporting a pause. I don't think it's too late. I think we still have time to take this race and slow it down. I think all of humanity depends on your support for this policy measure.

SPENCER: And what would you say to them about what they're getting wrong? Or what about their actions do you think is irresponsible?

JOEP: I think training a very large language model or a new AI is probably safe, but there is some nonzero chance, one that is increasingly larger, that it is going to be the worst thing to ever happen to mankind. If you're in charge of an organization that builds this thing, you have almost full responsibility for things going wrong. I really want them to internalize what this responsibility actually means. It's just so hard for me to think about how I would feel if I were in their shoes. I would never end up in that position, I'm pretty sure about that. If I were as smart as them as...I could probably find... [sighs] I just want to know what they're thinking. I just want to understand their actions way better. I just want to sit down with them and ask them about: how do we navigate to safety? How do you know for sure that if you train a new model, that everything is going to be fine? And I think part of where we differ in what we believe about the world is whether a pause is actually feasible. I know, for example, Elon Musk has said that if he had a button to pause AI development, he'd definitely press it. But at the same time, he starts a new AI company, because he says, "Well, I don't have the button, I can't do it. So I'm going ahead and just start a new AI company and make the race a little bit more intense." In a way I kind of get that. But I would really wish for them to push way harder for the pause option.

SPENCER: One thing I wonder about is, if it would be good if companies like OpenAI and DeepMind, which is part of Google, would collaborate more and work together more to try to combat competitive dynamics. What do you think about that?

JOEP: Yes, that would probably be a huge, huge improvement. Have you heard about the MAGIC proposal, for example?


JOEP: So there's this proposal by this other advocacy group called ControlAI. And they're basically saying what we should do is not just pause AI development, but we should centralize all further AI development, and we should make sure that this centralized organization has the only rights to further capabilities and for further AI frontier models. So the idea is to have one singular organization that is allowed to make these capability gains. And I think that's probably a way safer world to be in. But I can also see why people are concerned about this. Because you have this huge amount of power centralization. This new organization, this new thing will, quite literally, if they manage to build superintelligence, have all the power in the world. But then again, I would rather have that to be some sort of thing that is under democratic control, than have that be OpenAI or DeepMind or any other company for that matter.

SPENCER: Why do you focus on pausing as an approach to making AI safer? Why not use other strategies?

JOEP: If I were to see a better strategy, then I would definitely start advocating for that. But I haven't heard of any better strategy so far. Which ones do you think would be interesting alternatives?

SPENCER: Well, if I think about other people who are very concerned about safety and what they are doing, some are focusing on policy. So they're trying to get the government to increase policies around monitoring, around deployment, and so on. We talked about that a little earlier. Other strategies I see are more focused on influencing these larger AI companies, like trying to influence OpenAI or influence DeepMind. Then there's strategies that are really focused on the technical side, really kind of saying, "Well, the best thing to do is actually trying to figure out how to make AI safe. And then if we can figure that out, then we can try to get these large players to adopt our approaches." So those are three strategies that are different from pausing, or not necessarily that they run against pausing, but they are kind of at least complementary and take different focuses.

JOEP: I think all of these strategies are valid and should probably happen more rather than less. The first you mentioned was policy measures. I think what we're proposing with PauseAI is, in fact, just another policy measure. But you specifically mentioned evaluations and pre-deployment type of checks, and we already discussed that I think that these are just insufficient, because you still allow dangerous AI to be created. You also mentioned doing safety work. And again, that's great, people should definitely contribute to AI safety. I just think the group of people who are doing AI safety will need to have more time, I just want to buy that group of people time. And that's my primary reason for pausing: to get them to work longer on this problem.

SPENCER: Do you have any final thoughts you want to leave the listener with before we wrap up?

JOEP: Yeah, sure. I think most people can agree that AI poses a large amount of risks. And some of these risks are so incredibly bad, that our minds are basically incapable of thinking about it. We are really bad at getting horrible news and internalizing it. We can barely deal with the death of people or our own mortality, let alone the possibility of human extinction. And I think right now, the biggest problem for humanity is that we are not able to have the right emotional response to the risks that we're facing. What we should do is, take it easy, implement policy measures that make sure that nobody's building the technology that is extremely dangerous, before we know how to do it safely. And if you feel like this is a good idea, maybe you should join a movement and help out. I'm pretty sure there's a way you can.

SPENCER: Joep, thank you for coming on.

JOEP: Thanks so much for having me.


JOSH: A listener asks: you had a bunch of psychology related guests on recently. What is your synthesis of the different viewpoints? For example, CBT versus metacognitive versus super shrinks?

SPENCER: It's a tough question because there's so many different perspectives on this issue. But the way that I think about it, I certainly believe that there are certain attributes that make someone better or worse as a therapist. For example, I think if a therapist has contempt for their client, that can be a really big problem. And I think if they have empathy, that can be really helpful. I also think that the emphasis on learning, where you give feedback forms and you actually get feedback, can really help a therapist. I think it can help in almost any profession, if you can create more feedback loops that actually tell you how you're doing. That can be really useful. At the same time, I really believe that some psychological methods work better than others at treating specific problems. And partly, I think this is because some of the psychological theories are truer than others. But also, just because I think that there's a real causal structure to many problems that people have. And with certain types of causal structures in the world, there are going to be certain kinds of solutions that help untangle that causal structure. So for example, let's say someone has a lot of fear interacting with other people. And as a consequence, they start avoiding social interaction. It makes sense that over time, they don't really have the feedback mechanism to kind of learn to build that confidence because they keep avoiding interaction. Exposure therapy says, "Okay, we can break the cycle, we can start to put people in situations where they're actually pretty likely to do okay, or at least not fail catastrophically, where they get to have social interaction." And then the brain starts to learn, "Oh, wait, maybe this is not so dangerous, not so scary," and it can kind of break that cycle. So that's one example where I think having a deep causal understanding of the mechanisms helps lead to a treatment that is actually effective.
And I think that there are many different types of effective psychological techniques, but they need to be matched to the kind of causal structure of the situation someone finds themselves in.