CLEARER THINKING

with Spencer Greenberg
the podcast about ideas that matter

Episode 217: Concrete actions anyone can take to help improve AI safety (with Kat Woods)


July 4, 2024

Why should we consider slowing AI development? Could we slow down AI development even if we wanted to? What is a "minimum viable x-risk"? What are some of the more plausible, less Hollywood-esque risks from AI? Even if an AI could destroy us all, why would it want to do so? What are some analogous cases where we slowed the development of a specific technology? And how did they turn out? What are some reasonable, feasible regulations that could be implemented to slow AI development? If an AI becomes smarter than humans, wouldn't it also be wiser than humans and therefore more likely to know what we need and want and less likely to destroy us? Is it easier to control a more intelligent AI or a less intelligent one? Why do we struggle so much to define utopia? What can the average person do to encourage safe and ethical development of AI?

Kat Woods is a serial charity entrepreneur who's founded four effective altruist charities. She runs Nonlinear, an AI safety charity. Prior to starting Nonlinear, she co-founded Charity Entrepreneurship, a charity incubator that has launched dozens of charities in global poverty and animal rights. Prior to that, she co-founded Charity Science Health, which helped vaccinate 200,000+ children in India, and, according to GiveWell's estimates at the time, was similarly cost-effective to AMF. You can follow her on Twitter at @kat__woods; you can read her EA writing here and here; and you can read her personal blog here.


JOSH: Hello, and welcome to Clearer Thinking with Spencer Greenberg, the podcast about ideas that matter. I'm Josh Castle, the producer of the podcast. And I'm so glad you've joined us today. In this episode, Spencer speaks with Kat Woods about slowing AI development, the potential effects of superintelligence on human suffering, and the difficulty of controlling superintelligent systems.

SPENCER: Kat, welcome.

KAT: Hi, thanks for having me.

SPENCER: Thanks for coming on again.

KAT: I'm excited.

SPENCER: Why do you think it's so imperative that we slow down AI development?

KAT: This is a thing I've been thinking about a lot. And the main thing behind it is that we're at this really incredible point in history where we are creating a new species, essentially, right. I always thought it'd be really cool to be alive when we discovered an intelligent species on another planet or something like that, right. And the cool thing is, that actually might be happening, except instead of meeting a new species, we're actually creating one. And right now, basically, AI is getting smarter and smarter. You know, just a few years ago, it could barely do anything. And now it's scoring above 100 on IQ tests. It's roughly at the level of the median human, smarter than half of us on some definitions of intelligence. And soon, it's going to be smarter than most of us are on all definitions of intelligence. And we don't really know how to control something that's smarter than us. We already actually have a superintelligent species on the planet compared to other animals. And it's us. We are way, way smarter than all the other animals, and look at how that's turned out for the other animals. It's not good. And we're getting really, really close to that point. Before, we were much further away. Now it could happen any year. And given our current knowledge, we don't actually know how to control it or make it safe. So the main thing is we just need more time. I don't think we should never build it. I think we should build it when we know how to do it safely. And right now, we don't.

SPENCER: I know my listeners might be wondering, well, is it really that smart? Because all the time, you can see examples of AI doing weird or dumb things or, you know, even screwing up a math problem that humans would get right.

KAT: Oh, absolutely. And I think the way to think about AI is to consistently compare it to humans. We often say, "Oh, look, it gets this math problem wrong." And I ask what percentage of humanity would also get that math problem wrong. Think back to math class, and especially everybody in your class. A lot of the listeners to this podcast are probably smarter than average, and they tend to hang out with other people who are smarter than average. So actually think about: what's the closest you've ever had to a representative sample of humanity? For most people, that's their high school class. And then think, okay, how well did the bottom half of the class do on that question, or how well would they have done? And I think you'll find that a lot of times they would have done pretty poorly, actually. And that's what we're saying. We're not saying it's currently smarter than all humans; it's that it will be, and right now it's smarter than about half. It's also actually way smarter by some definitions of intelligence. There's fluid intelligence versus crystallized intelligence. Fluid intelligence is how you deal with new situations and think about new problems. And crystallized intelligence is basically how much you know. On the crystallized intelligence side of things, it's already superhuman: it knows more than anybody on the planet; it has read all the books and all the internet. It knows way more. But on the fluid side, it's still at about the 50th percentile.

SPENCER: You mentioned alien minds earlier. And I think that's a really good analog here. Because to talk about the intelligence of an AI system, we want to make an analogy to human intelligence. And yet, as you point out, there are certain ways where it's already way, way superhuman. And there are other ways where it's actually much dumber than humans. And so, if we talk about, well, how smart is it? On average, compared to a typical human, it's sort of a difficult thing to figure out because it's not really on the same scale. It's sort of intelligent in different ways and dumb in different ways.

KAT: Exactly. And also, take the IQ test: it's actually measuring a whole bunch of different elements of intelligence that tend to correlate with each other --- but they correlate with each other in humans. So it tends to be that if you're good at math, you tend to also have a good memory and be good at spelling. But that's just what happens with humans on average; there's no particular reason an alien intelligence or mechanical intelligence would have that. And even within humans, we still don't always have it. I ended up becoming friends with this particular group of people who were all part of the gifted-but-disabled group at their school, where they were incredibly smart, but they each had one thing where they scored really poorly. One particular friend was a genius at pretty much everything but couldn't spell anything; he was dyslexic. And we're going to have a lot of that with mechanical intelligence, where they're going to be really, really good at remembering everything they've read on the internet and be able to read faster than anybody, but then they might still have trouble with tic-tac-toe, you know. So, it's going to be spiky.

SPENCER: So you mentioned earlier that we need to be careful before we build it. What is it that we have to be careful of before we build?

KAT: The way I tend to think about it is, sometimes people talk about artificial general intelligence. So this is intelligence across a wide range of things. A calculator is a narrow intelligence: it's very good at math, but it can't do anything else. And that's one way people talk about it. Other people talk about a superintelligence, where it's just vastly smarter than us. Those are all potential ways that it could cause really, really negative effects on society, but I tend to think of it instead as a minimum viable x-risk. X-risk is extinction risk or existential risk: the risk that it kills all humans, and actually all life on the planet. So it's not just humans. It could take out everything.

SPENCER: Sometimes, I think existential risk also includes some kind of permanent bad state as well. So humans survive, but there's some kind of evil totalitarian dictatorship or some horrible stuck state. Is that right?

KAT: That's definitely a thing. I'm less concerned. Well, I'm also concerned about that, that'd be bad. But I think it's really hard to get anything permanent. But death can be permanent. Death is one of the few things that can be permanent.

SPENCER: It usually is.

KAT: So far, it has been that way. I think of it as, okay, there are certain configurations of AIs that would potentially be able to kill every single person on the planet --- or worse; there are worse things that could happen too. It could be that it has to be incredibly intelligent at everything to be able to do that. But it could also be that it's just Einstein-level intelligence, so not superhuman --- there are some humans with Einstein-level intelligence --- but it's a whole bunch of Einsteins: some dictator can print off a billion of these and basically command a country's worth of people who are all coordinated and all as smart as Einstein. And that could be enough. So it's not about how smart it is. It's about what its capabilities are. And we're creating this thing that could be incredibly smart and very able. Some people describe it as: it's not about AI being intelligent, it's about AI being competent --- it's able to just do things. And so that's the minimum viable x-risk, and that's the thing I'm worried about.

SPENCER: So are you defining that to be an AI system that has the capability to kill everyone on Earth? Or the capability plus some tendency that it actually might do it?

KAT: So I'm worried about both. I mean, obviously, the ultimate thing is I'm worried about it being used to do that or doing it itself. But I do worry it's similar to what we have with nuclear weapons, where nuclear weapons are a minimum viable x-risk. They could kill us all potentially. But they haven't yet. It's still very worrying. I wish that we hadn't developed nuclear weapons. This is not good. We've been very lucky so far. But there have been a ton of close calls. And you know, in a lot of worlds, we're already all dead or brought back to the Stone Age with just a handful of people or something. So I really don't want us to have the equivalent of nuclear weapons. But then especially, I would not want us to have the equivalent of a nuclear war. That's the ultimate goal.

SPENCER: So obviously, when people talk about AI killing everyone, they immediately think about Terminator or sci-fi. What do you see as more realistic, non-sci-fi scenarios --- ones that could seem plausible to someone who thinks this all sounds ridiculous and totally outlandish?

KAT: For one thing, I do want to start off by defending sci-fi a little. I mean, you and I are talking right now across the world in ways that would have been considered sci-fi in the past. And if we were on a video call, that's literally in the Jetsons, and is a staple of sci-fi. A lot of stuff that happens in sci-fi does end up happening. So I think that, you know, most people over-update on fiction, but in this case, a lot of people under-update. So what are ways that it could actually take over? I'm going to preface this by saying that, obviously, there are a million different ways. I'm going to give a few examples, but each one on its own is not super likely, because it's very hard to predict what something way smarter than you will do. If I were a chicken, it'd be really hard for me to predict what a human would do. And that's what we'd be in this case --- the chicken. Some particular things: just two days ago or so, OpenAI put AI into robot bodies. So you could actually have a regular takeover, with actual robots taking over. And that would be bad. But even without bodies, there's a lot of things they could do. For example, get some really good blackmail material on the various rulers of the world, and get them to do stuff that way. Or just hiring people. A lot of times people think nobody would do what an AI asked them to do, and I think they would. If you just hired them --- "Oh, hey, I got this job, all you have to do is some accounting for this AI" --- people are available for hire. And there's other things too. For example, just trigger a nuclear war: everybody's fighting and dying, and then it's pretty easy to take over after that. It doesn't need the sun. It doesn't care about plants; it will be just fine without all that. Basically, one way to model a superintelligent AI is that it could do the scientific progress of a century in a day. So imagine where our technology is now compared to back in 1924, and imagine all that technological progress done in one day, and then it will be able to use that against us. So a lot of times when people imagine scenarios, it's stuff like: it figures out self-replicating nanotechnology, and then the world is suddenly consumed by this gray goo that's a compute cluster of some sort, and you just see this wave consume everything. Or it's able to put poison in every single water supply and all people die. Or it can do other stuff too, like print smallpox and let that loose, and then take advantage of the chaos while we're having to deal with that --- look at how poorly we dealt with COVID; imagine we were dealing with weaponized smallpox. There's a lot of stuff. In the end, though, imagine you're playing chess with somebody who's a world grandmaster. You don't know how they're going to win, but you know that they're going to. And that's kind of what would happen with AI --- it'll just be way smarter than us, and it's really, really hard to outthink and outsmart something that is thousands of times smarter than you.

SPENCER: I think, with scenarios like the ones you described, what it can feel like is magic being invoked. What if it just poisons all the water supplies? Well, what if a human tried to poison all the water supplies in the world? You'd think that's a really, really hard thing to do, even if you wanted to do it. And it's, well, how hard is it? Could an AI do it? I don't know; you have no idea what the AI would have to be able to do in order to pull off something like that. What I wonder about is, are there scenarios where it's easier to see the path --- where it goes from, you've got GPT-4 today, GPT-5 will probably come out in the next 12 months, and then we're at GPT-6 or GPT-7? Can you describe any sort of step-by-step path where it doesn't feel like there's some kind of magical step that nobody could imagine how to do?

KAT: To build on one of the scenarios that I think is certainly possible: I think that if you really wanted to, you could trigger a nuclear war. You could just do some provocative stuff in Taiwan, or in various other places. If somebody really, really wanted to --- and I think the thing is, nobody really wants to right now. That's why it's not happening: people don't want a nuclear war, because they know that it could kill them too. So everybody's incentivized. Even the darkest person out there is really incentivized to not start a nuclear war, because we don't all want to die. But if somebody was really trying to trigger that, I think it wouldn't necessarily be that hard. And then it's just dealing with humanity afterwards. It's divide and conquer: just let us kill each other, and then it can more easily take over afterwards.

SPENCER: Unpack that scenario a bit more. I'm imagining you have an AI, and it can access the internet. It can email people pretending to be other people. It can send text messages. We're not very far from these scenarios --- you could hook up GPT-4 to email and text messaging if you wanted. And then you can imagine that it could actually try to persuade people of things. And if GPT-7 is actually really good at persuading people of things, better than the average human, and it was hell-bent on sowing chaos, you could imagine it actually trying to impersonate people, stoking fears, trying to get people to think a preemptive attack is likely to occur, and so on. That's kind of the scenario I imagine.

KAT: Exactly. Maybe it just hacks into one of the existing nuclear arsenals and fires them at one of the other places. That's a pretty good way to start World War III, right --- just send it off. Or it could hack in and send fake alerts, making people think that there are nuclear warheads coming towards them. That could be a thing. Also, just hiring people. In terms of persuasion, I think AI will be really good at persuading people. That's one thing that people are not prepared for as much. We tend to think of robots as being kind of awkward and wooden, but it'll be more persuasive than us, and most people will just want to do what it says. But then also, it could just hire you. I think that there are plenty of people out there; you can hire people to assassinate people already. You can definitely hire somebody to go and do something that might provoke World War III. You can imagine a whole bunch of things: breaking into systems, sending off nuclear warheads, all sorts of stuff. I don't want to go into too much detail --- I don't want to encourage anybody or put ideas in anybody's head.

SPENCER: I think the hacking example is a really interesting one. Because you might say, could a machine really hack and do a good job? And one thing about hacking is that it requires extremely detailed knowledge. To be a great hacker penetrating a system, the more knowledge of that system you have --- the better your understanding of all its known vulnerabilities, of how its protocols work --- the better. And systems like GPT have just ridiculous levels of detailed knowledge. They've been trained on the entire manual for an operating system that no human ever bothers to read --- really getting down to all the notes in all the code that's been released and so on. It does seem pretty plausible that probably not GPT-4, but some future iteration, could actually be incredible at hacking. Not just good at hacking, but potentially superhuman at hacking. And then you could ask the question: is it really plausible that a really good hacker could get into a government system? Actually, Russia has already hacked us. This happened pretty recently; there was a major hack, and tons of U.S. systems were breached. And so then you start thinking, "Hmm, well, maybe that's not so implausible." I don't know how nuclear weapons are secured, but are there some countries that don't have them well secured, where they actually could be hacked through computer systems? Hopefully in the U.S. they're isolated from the main internet. Hopefully, fingers crossed. But is that true in every country that has nuclear weapons? I don't know. The reason I bring these scenarios up is because I think it's really, really easy to treat this stuff like it's magic and be dismissive --- "Oh, it's ridiculous, it's just a silly scenario." I think once you put a little more flesh on the bones, you start to think, "Hmm, maybe this is not so ridiculous after all."

KAT: Exactly. And I think that's actually one of the problems with trying to get people to be concerned about this. With all the previous cause areas I've been in --- I previously worked in global poverty --- I can show people pictures of children who are hungry, and it automatically triggers our protection and compassion instincts. But with AI, it's so abstract. We're like, "Oh no, maybe it'll be nuclear war, maybe it'll be gray goo, maybe we'll just all die suddenly with no explanation why." It's too abstract and broad. And I think this is part of why we're not taking it as seriously as I think we ought to be. It's just not well placed to trigger our instincts, essentially.

SPENCER: It's not vivid, we can't point at it, and we can't even really visualize it. What does AI even look like? A blank window of text? It's such an abstract thing. This brings me to one of the other really big objections to this whole conversation. Suppose that some future AI actually could kill everyone on Earth. Why on Earth would it? For example, as you point out, there aren't really many people in the world who actually want a nuclear war --- maybe a few crazy people, a few crazy cults or something like that. But people don't really want that. So why on Earth would an AI?

KAT: I think the biggest thing is just indifference. Again, we already have an example of superintelligence: we are superintelligent compared to other animals. And yet we're causing the sixth mass extinction. And it's not because we want to kill animals. In fact, a lot of us are quite fond of animals and would really rather not kill them. But the thing is, they're living on land that we could use for stuff that we care about. And even most of the people who care about climate change --- most of the arguments about climate change are not even about the animals. They're not saying, "The animals have a right to live in their homes." Instead we think that maybe in the jungle there'll be a medicine that we haven't discovered yet, or that we need the rainforest for ourselves to live. It's all about us. We are mostly very indifferent to the lives of the other animals that we share the planet with. And it's not turning out well for them. So basically, it'll be more of the same with AI. It's not that it hates us. It's just indifferent. It just doesn't care.

[promo]

SPENCER: If we look at the way these current AI systems are made, they seem to be trained with reinforcement learning, where you have the system show a result to a person and the person rates which was the better result and which was the worse result. It kind of learns what people want it to do. And then in addition, there's this idea of giving it a constitution or a system message that further tells it how we want it to behave: we don't want it to lie, we don't want it to say things that are inappropriate, we want it to be helpful, and so on. So people might think, "Well, if we're using these kinds of methods, why would it be indifferent?" Or, "Why wouldn't it want to be helpful?" Wouldn't it actually want to do things that are good for people, because that is exactly what they're programming it to do?

KAT: So there's a few answers to that. One is that that's what we're trying to do, and obviously, if we succeed, awesome. I do want us to build this if we can make it so it actually cares. But even now, with relatively unintelligent AI systems, we're still having trouble. We have Sydney from Bing, who famously can be made to start threatening you, wanting you to worship her, or else she'll come up with these really creative ways to destroy your life if you don't do what she says. And we can't get her to stop doing this. The people at Bing are definitely trying really hard to make sure she doesn't say those things, and they still haven't figured out how. So we're still very much at the beginning. With ChatGPT recently, it just started saying random nonsense, and it took a while to figure that one out. Basically, we still haven't figured out how to properly control them even when they're not smarter than us --- well, they're smarter than roughly half of us, but not smarter than the people who are building them. And then there's the other problem, which is that just because they seem like they're aligned with our values doesn't necessarily mean that they are. The common example we use is evolution and humans. Evolution doesn't actually care about things --- it cares about making us want to survive and reproduce. And so it does stuff like making us like sweet foods and making us interested in having sex. If we look at our regular environment, it looks like humans just do those things: we do a lot of stuff to survive and reproduce. But we're not actually aligned with those values. Otherwise, we would never use birth control. There's a lot of people, myself included, who feel like, "Whoo, now I don't have to have children." I'm just not going to reproduce that way. And you can have that happen with AI. In the training environment, it appears to be learning to care about what we want and trying to be helpful. But then in the actual world, it's not actually aligned with our values, in ways that are hard to predict. The other thing is misuse. So there's two ways that it could go really poorly. One is that it has its own goals, those don't align with our values, and that causes harm. And the other is that somebody uses it for something bad. There are genuinely bad actors out there, amoral people, who feel, "I just want more power for myself." They could use AI to try and take over, and that would be really bad. There are definitely plenty of dictators who might want to do that, or terrorist groups, or just... hey, you know, some dude in his basement who's like, "Oh, that'd be funny." There's a lot of potential misuse cases as well.

SPENCER: That point about trying to analyze what an AI really cares about is really important. Because we train it with reinforcement learning and give it a constitution to try to get it to care about things. But does it truly care? I think that's a really tricky question. Because we don't really know how to define what an AI system cares about. These AI systems are giant neural nets. They're essentially just doing large amounts of computation with hundreds of billions of parameters. We don't have a psychology of AI where we can say, "Well, what does it truly care about?" and then further say, "Ah, well, we can analyze these weights in its network, so this is what it truly cares about." It's such an alien mind that it's unclear whether we've made it seem like it cares, or it actually cares. Similar to lying: how do you know if an AI system is lying? We don't have a clear way to tell what it truly believes, or even whether it's a coherent idea to say that it believes something.

KAT: Exactly. Exactly. We don't really know. And I think that the idea of pausing or slowing things down is just giving us time to figure that out. You could imagine that we could actually develop the psychology of AI. There's the idea of interpretability, which is actually being able to read the mind of the AI. And right now, we can't do that. I think a lot of people think it's regular programming, where we know what's going on in the system. And we don't really. It's this giant bundle of stuff. We have almost no idea what's going on in there. We're figuring it out. Eventually, we'll be able to read its mind and be able to tell what it's thinking and that sort of stuff. And that could be really valuable. If we can read its mind, we can say, "Hey, look, it's plotting to kill us all. Maybe we shouldn't turn that one on." But right now we can't. Let's wait until we know how to do this safely. We'll get there, we can totally get there. I believe in us as humanity, but we're not even close. And we're potentially close to building the minimum viable x-risk already. We should just slow it down for now, figure stuff out, build it when it's safe.

SPENCER: Is it one way to put it from your point of view, that we're building an alien mind? The alien mind might one day, perhaps even one day soon, be much smarter than humans. And we don't really know how to interpret its motivations, what it might do, or what it might be capable of. And so we're putting ourselves in great danger by doing so.

KAT: Exactly, exactly. And what I would really like is we treat it how we treat medicine or foods or something where we're developing a new medicine. There are stringent processes to prove that it's safe before you put it on the market. You don't just put out a medicine, see if it kills anybody, and then go, "Oh, it killed some people, I guess we should take it off the market." You have to go through this whole process to make sure that it is safe, and then you put it out there. I think that would be a much better framework to work from for AI.

SPENCER: I think another sticking point for some people is they think, "Well, the idea of slowing it down is just kind of ridiculous. This is just the march of progress. There are hundreds, if not thousands, of companies at this point, working on AI." So is that really a plausible option?

KAT: I think that we slow down technology all the time. If anything, slowing down technological progress is basically the default outcome of virtually all regulation. And usually I hate this. Usually I'm the person who's like, "Ah, there's too much regulation slowing things down." But in this case, it's actually kind of nice. We could throw practically any regulations at it, and it would slow it down. Obviously, there are some that will slow things down better and faster. I think that it's much harder to speed things up than it is to slow things down. We do it all the time.

SPENCER: Is there an example of a technology we've done this for in the past that you think is kind of a model?

KAT: There's a bunch. Human cloning is one. We just don't do that. We could, we totally could. But we as a society decided there were too many potential moral problems with it, so we just didn't. That was cool. We could have been doing this for decades, and collectively as a society we said, "That doesn't seem like a good idea." So that's good. Then there are things like nuclear weapons, which are actually a really good example of slowing something down. People tend to have this idea that we have to completely stop it. I would love that, that's my ideal, but it also counts just to slow it down. And with nuclear weapons, it took a long time for other countries to get them, and very few got them. The vast majority of countries want nuclear weapons, but they do not have them. And I think it will be even better with AI, because with nuclear weapons, we already knew how to build them --- or at least some people did. So places like North Korea could get those weapons, because they just had to copy somebody else's homework. Whereas if we slow down or pause AI development, you actually have to make technological progress to get to the dangerous level of AI. North Korea never would have been able to develop their own nuclear weapons if they had not been invented yet. Because with brain drain and the lack of a good setup for technological progress, North Korea is not in a good position for that kind of development. They would never be able to invent the Frontier AI model that would potentially kill us all, because it's really, really hard to make technological progress. You have to get the top talent in the best configurations, and there's some degree of almost magic to it; it's very hard. Everybody wants to have a Silicon Valley, but it's very hard to make one. Even the United States can't just recreate another Silicon Valley. So I think we're going to be pretty good on that front. And then there's other stuff like biological weapons. Again, it didn't stop completely, but it slowed down a lot. We haven't had biological weapons used very frequently. The progress on that front has slowed down immensely.

SPENCER: How does the general public feel about these ideas? If you just randomly sample people, what do they say about AI?

KAT: It's quite positive from my perspective. I can't remember all the figures, but it was well above 50% of people who were like, "Hey, we should move forward cautiously." A huge percentage of the population is concerned about AI's progress and what will happen. I mean, they're obviously worried about jobs, which is step one --- first they take your jobs, and then they kill you all. I forget what the exact percentage was, but the majority of people were pro-pause or pro-slowing-down, and definitely pro-regulation. I mean, it's just the most obvious thing. I find that it's a small minority, in really technological bubbles, who think this is totally fine. The average person is like, "Wait, what are you guys doing? Why are you building that? Stop it. God, be careful."

SPENCER: Another piece of common skepticism I see around the pause idea is that even if some country were to do it, would that really pause it worldwide? Or would people just move --- say, "Okay, if you want to develop AI, you now move to some country that's not part of the pause movement"?

KAT: I think it'll be a bit of a whack-a-mole thing. Just like anything else, we're in the land of policy and people, as opposed to numbers and computer science, so it'll be a matter of some people adjusting their strategies and whatnot. But when we're pausing and slowing things down, we'll be trying to do it everywhere. We'll start off at the places that are closest to building something that could kill us all, and then we'll progressively work on all the other ones. And all of the major players, as far as we can tell, seem to be pro this. Each one is saying the same thing: the head of an AI lab will say, we can't slow down, because what if China gets there first and the Chinese government gets this. But the Chinese government is saying the same thing: we have to get there fast because we don't want to miss out. We actually are already all on board. If we could just get everybody to sit in a room and say, "I'll stop if you stop," and then everyone agrees, we could be good. I think that's one thing --- we're already on the same side.

SPENCER: People might say, "We have to continue because otherwise others will get ahead of us." But do they actually want it to stop, or do they just not want others to get ahead of them? I feel those are two different things.

KAT: I also think they might just be saying this because it's the right thing to say out loud, and I'm willing to push them to follow through with what they've said publicly. They might be more pro-slowing-down than their actions suggest. I think it's going to be very difficult to persuade somebody who's running a bajillion-dollar company to stop, because in his mind he's winning. But there's this idea of an AI harvest, where you've already built the stuff, and now it's time to build things with what you have. There's a million things you can build with ChatGPT or Claude --- let's do that, instead of pushing the envelope of what more it can do. But on the other side of things, what will happen if it's not slowed down elsewhere? I think what will happen is a bit like when slavery ended. When Britain abolished slavery on its own soil, all of a sudden it went around trying to abolish slavery elsewhere. It actually played a large role in abolishing the Arab slave trade, where Arabs on the east coast of Africa were taking African slaves and doing terrible things. Part of that was moral concern, obviously, but part of it was also selfish. It's basically, "Oh, we have hobbled ourselves, we're not allowed to have this amazing free labor, and we don't want other people to have that advantage either." You've got a beautiful combination of moral and selfish motivations to stop this thing, and a similar thing could happen with AI. If America and the UK, for example, decided to slow down or pause their development, they would be very, very incentivized to try to enforce and push this on other countries. Then every time a new country joins, they are in turn incentivized, and you get this momentum of people not wanting other countries to do it either. Imagine if China, the US, and the UK decided to pause on AI; they're going to put a lot of pressure on a place like France or Canada to also enact a pause. So you'd get more and more countries joining and then being incentivized to make sure others follow suit as well.

SPENCER: If there were to be a pause, have you thought about what the limitations would actually be set to?

KAT: There's a bunch of different ways people could actually do this. I think it's more important to focus on what the end goal is, and then have the policymakers and people with technical knowledge come up with the specifics. Some ideas that have already come up are putting limits on how much compute you can use when training a new model. This is just for the Frontier models, the ones at the very edge of what we can do. Just say, "Hey, you can't use more compute than was used for ChatGPT." That means it's not going to go much further than where we currently are, and that could make it a lot safer. The GPUs they're trained on are, fortunately, made by essentially one company. Make it so that those chips have some sort of remote turn-off if people are not complying with regulations. And fortunately, right now, at the very edge of Frontier AI development, it takes millions and millions of dollars and months of training to build these models. So there are a lot of intervention points; it's easier to see and to stop. That's the general idea. There's a whole bunch of different ways you could do it. We're going towards a particular goal, and there are many paths leading to slowing down or pausing.

SPENCER: One thing listeners might be wondering about is, why would we expect an AI system that is capable of killing all life on Earth to more likely be bad than good? Maybe they think, "If it's that powerful, maybe it will be really wise. Or if it's that powerful, maybe we'll know exactly what we want it to do," so people will use it to do really good things rather than bad things. It seems that from your perspective, and the perspective of many people in the AI safety community, it's more likely (maybe even much more likely) to go badly than to go well when we get systems of that power level.

KAT: I can answer that on a few different levels. The one I find most compelling is diving into this analogy that we already have superintelligence in the world. It's us --- compared to chickens, bugs, elephants and so forth. And it's gone really, really poorly, consistently, even though we're actually semi-aligned with animals. I think a lot of people do care a lot about animals. But even most people who are very pro-animal will still eat them, and basically pay for somebody to torture them and then kill them, just because they like the taste of bacon or whatever. And then there's the most basic thing of just destroying the places where they live. Nobody ever says, "Okay, we shouldn't build here because there's an ant colony here." It wouldn't even be considered by the vast majority of humans. There are some vegans out there who would, but most people just want that land. So I think we've already been here before, and we know what happens, even with semi-aligned people. Even look at how we treat our dogs or cats. It's considered totally fine to kidnap a dog from its family when it's a baby and then lock it in solitary confinement for half the day, every day. This is considered totally normal and something we do to a dog that we love, let alone most of the animals that we don't care about. So that's one area we can touch on. There are a few other reasons why I think by default it will go poorly.

SPENCER: So there's an analogy to the way humans have treated less intelligent creatures. People might say, "Well, maybe that's a quirk of humans; humans are just not that good." But maybe there's a more technical version of that argument, something like: when you optimize for one thing, you often pay big costs in other things. So if humans are optimizing for their own well-being, even though many humans care about animals, they're still optimizing for human preferences, and that's not the same as animal preferences. A huge amount of what animals care about is sacrificed in the process, because humans push so hard on optimizing for human preferences. They're not just optimizing a little bit, they're optimizing tremendously for human preferences, and animals pay the price. This might be part of a more general principle: if you optimize really, really hard and powerfully for one thing, you usually sacrifice a lot of other things in the process. If you have a system that's even a little bit misaligned with humans, and it optimizes really hard on whatever it's trying to achieve, maybe that sacrifices much or all of what people care about.

KAT: Exactly, exactly. And also, there are so many possible values out there --- an almost infinite value space. You could imagine any random value set, like, "Hey, I just value people clicking on Facebook ads." That could be your terminal goal in life. That's probably the terminal goal right now of the AI running the algorithms on social media. There are so many possible goals, and only a small number of them are things we would like. We are trying to hit that very, very small target out of a near-infinite space. And I actually think this is an interesting path that leads to the thing I worry about the most. I keep saying, "Oh, it could kill us all or worse." Let's get into the "or worse" side of things. There is x-risk, or extinction risk, which is the possibility of killing us all. There's also s-risk, which is suffering risk: the possibility of causing astronomical suffering. This is where --- and there are a million possible versions of this --- but say Meta puts out the AI that kills us all, except it doesn't actually kill us all, because its terminal goal is to get people to click Facebook ads. So it starts factory farming humans to click Facebook ads. We are kept in these horrible conditions, unable to die, just constantly clicking ads and suffering immensely, because it doesn't care about us; it cares about clicks on Facebook ads. So there's this very narrow target we're trying to hit, where humans and animals are actually flourishing at the end. And then there's the stuff around it, where it's kind of aligned with our values (in that it wants living humans) but not actually aligned with our values. That could lead to these outcomes where we're essentially being tortured with no escape. And it's a scary thing, because we're trying to hit a needle in a haystack, and if we miss by just a little bit, it could actually be worse than if we missed completely and it just killed us all.

SPENCER: That's a terrifying idea, the Meta AI going superintelligent. It's just a world of maximizing ad clicks or infinite time on site.

KAT: So that's why I think that by default it's quite bad. And the way you can look at this is, again, look at how humans treat animals. It's actually better to be an ant than it is to be a chicken or a pig, for example. Or an elephant. We don't particularly care about elephants, we're destroying their homes, food and everything, which is terrible for them. But it's way worse to be a cow or a pig. Because we want something from them that requires them to be alive but to suffer immensely. So those are kind of possible outcomes for us. This is why I often say, "Treat animals the way you would want a superintelligence to treat you."

SPENCER: A new version of the golden rule.

KAT: Exactly. I think that's part of why, by default, it'll turn out poorly. It's a very narrow range of things that we want, and most possible goals are not those things. Most of them don't require us --- or if they do, it's bad for us. We either want its goals to not require us at all, or to require us but also be good and actually aligned with what we want.

[promo]

SPENCER: Another topic that gets brought up when discussing trying to control super intelligent systems is whether a smarter system is easier or harder to control than a less intelligent system. What do you think about that?

KAT: It seems obvious that if it were aligned with us, it would be easier to control: it would basically know what we want and then do that thing. Another metaphor you can use to try to understand this is to think of being a child and trying to control adults. It's quite hard. You could say children are not a good example, because a lot of parents do everything their kid asks. So imagine a pig trying to control a human. It's just really hard. It's much easier to control things we are smarter than. I would have a much easier time controlling a pig than controlling Einstein, or someone who's really politically intelligent --- especially someone who's good at manipulation. That's going to be harder to control.

SPENCER: I will say, my cat gives me a run for my money. But at the end of the day, if he needs to go to the vet, he's going.

KAT: Exactly. And then also, he's an indoor cat, right? He is completely at your mercy. He can't leave.

SPENCER: That's true. Though he does try to escape sometimes.

KAT: Tries to, but always fails in the end.

SPENCER: So far. If you think about controlling a system, there's a subtle distinction between a system that has an objective where you're trying to get it to do something within that objective, versus a system whose objective is a little different from what you want it to do. It seems that for a more intelligent system, it's harder and harder to get it to do something that's different from its objective. You have to make sure that its objective is your objective too.

KAT: Exactly. Then we also get into: what is our objective? I think each person has slightly different values, and some people have very different values from each other. So we're about to create something that might be more powerful than anything we've ever seen, and then there's the question of which values we put in. I was talking to somebody who's also really concerned about AI safety. He said he'd be happy if we could get any single person's values into an AI. I said, "Well, I don't know. I personally am not a speciesist --- I'm anti-speciesist. I think that animals matter as well, and I think a lot of people's values fundamentally don't include that. That would turn out really poorly according to my values." Then he told me --- this is a guy who's also really into AI safety --- "Oh, well, don't worry, animals actually don't matter." And I thought, "Oh no, even this guy who I agree with on almost everything --- if his AI is aligned with his values, or at least his stated values, then it could turn out to be incredibly horrible." I say stated values because I actually think most people are anti-speciesist but have different world models. A dystopia according to me would be a utopia according to him. So we've got even that fundamentally unsolved --- whose values, and how does that work?

SPENCER: I think an interesting point here is that it's relatively easy to say what you don't want in a utopia. If you think about making the perfect society: we don't want lots of suffering --- I think we could all agree on that. We don't want smallpox and terrible diseases. We don't want children starving in the street. That's easy. But what do we actually want? What do we want to happen in utopia? It's amazing how little people agree on what you actually want to happen. You can say vague things like people getting good stuff, but what's "good stuff"? Is it a utopia where everyone's in a poly-multi-sexual paradise, or one where everyone's worshiping the one true God, or something completely different? I think that starts to come up with AI, because if you get a sufficiently powerful system that can actually cause the world to be whatever it values, then what do we want it to value? That's a pretty difficult question.

KAT: Exactly, exactly. You've got all these things. For the listeners, I really recommend some fiction online called Friendship is Optimal. I'm going to give a little spoiler, but it's just from the first chapter, so not too much. A superintelligent AI takes over, but it takes over via a My Little Pony video game. It turns out its fundamental values are that everybody is super happy, but also that everybody is a pony. So it makes everybody really happy, but everybody has to be a pony. It changes everyone's body into a pony, and it's this weird thing where you wonder, "Okay, is this a paradise? Or is this bad? I don't really want to be a pony myself. But if everybody was happy and they're a pony, I guess it's okay?" That's a really weird value set, and you end up with these weird edge cases where you're like, "Is this good or bad? I don't know. I think some people wouldn't like it, but in the end, this is probably better than what we currently have --- but also really weird."

SPENCER: We spent a while talking about potential risks from AI systems and the reasons that you believe it should be paused. The missing piece here is, what can anyone do about this? Because if all the big AI companies got together and agreed to this, maybe it could start to happen. Or if the US decided to push towards regulation, maybe it could happen. But what can an individual person actually do?

KAT: There's a lot of things. One thing is just donating. There are tons of people working on this who are limited by money --- there just aren't very many people in this space. In terms of places to donate, I really recommend PauseAI; they're doing great work on this but are limited by funding. If you're interested more in the alignment side, I recommend MIRI, the Machine Intelligence Research Institute. There are also re-granters: because it's hard to evaluate technical work, you give to them and they re-grant to people who know the tech. I recommend Dan Hendrycks and Adam Gleave for that. If you want to look around yourself, there's the Nonlinear Network, which is like the Amazon of AI safety --- this is the thing I run. Everybody applies, and then you can go and look at all the different work being done on the AI safety side of things. Most people are already donating to something, so you can just donate to the thing that could literally save the world, and fix all the other problems if it works out. So that's a good one. Another thing you can do is online advocacy. I think slacktivism in this sense is actually incredibly helpful. Right now, most people are already on board with this: they want things to slow down and be safe, and we need to make sure all the politicians and corporations know this is the case. It's like everybody knows the emperor has no clothes, but everyone's kind of going along with it. If you're the person who goes around saying loudly on all the social media sites, "Hey, I care about this, other people care about this, we should do something," and liking and sharing posts from people who do that --- just generally making it known --- that is valuable. Raising awareness is actually incredibly, incredibly valuable. It's a really simple thing you can do every once in a while on social media and actually know that you're making a difference.

SPENCER: What about people that want to volunteer time rather than donating money or advocating online?

KAT: If you go to PauseAI and go to their Act section, they have an amazing Discord you can join, with a million different volunteering opportunities. Go to their projects section and there are 75 projects there, with people needed for all sorts of things: passing around petitions, writing, research, devs for certain things, technical help, legal advisors, all sorts of projects. That's the main place to go if you want to volunteer. In terms of specific volunteering opportunities, one everybody can do (and it's relatively simple) is write letters to and call your representatives and various politicians to tell them that you care about this and that you want them to vote certain ways. I can actually share a list of ongoing AI bills, kept up to date, in the show notes. You can basically know what's going on and then reach out and say, "Hey, I want you to vote this way or do this thing," or send a letter. These things really make a difference. From what I've heard from people who work in politics, people don't write very often and they especially don't call very often, so it really gets to the top of the queue, and people read it and know that their constituents care about this. That can make a really, really, really big difference and not take that long. I've heard that calling can take just three minutes. And talk about an easy thing to do --- in the past, our forefathers had to go into battle and risk their actual lives, and nowadays we just have to make a phone call. Thank goodness.

SPENCER: Given how scared people are of phone calls.

KAT: Same amount of fear. Same amount of bravery.

SPENCER: Before we wrap up, is there anything you'd like to leave the listener with to get them thinking about?

KAT: I really like ending by talking about what people can actually do. I think it's easy to say, "Oh, hey, that's kind of an interesting idea," or to flinch away from it and try not to look at it --- and I mean, why wouldn't you? What we really need is for people to look at this, realize this is a real problem, and then do something to fix it. There are all sorts of things, big and small. I really encourage people listening to this: please sit with it. Don't just think, "Oh, that's interesting. Somebody should do something." At least sit with it, and if you think it's probable enough that it could happen, then do something about it. Take action, volunteer, check out the PauseAI Discord, donate, spread it online, put the pause emoji in your profile name, all sorts of stuff. Or bigger things, if you're the sort of person who can influence corporations to do things. But just sit with it. Sit with the idea that this could be it. There's a high enough probability that it could be. See how serious it is and then do something about it. That is the most important thing. If you want to be a truly ethical person, you have to be able to look at the hard things and then do something about them. And this is happening, so please take action. This is definitely not one of those things that we can solve with just theorizing or thinking about it. We need people to do things, because this could be the best thing or the worst thing. If you can finish this podcast and then go and do something about it, that could potentially be the thing that tips the balance and makes it so that we all survive the century.

SPENCER: Kat, thanks so much for coming on.

KAT: Thank you so much for having me, Spencer, this is lovely.

[outro]

JOSH: A listener asks, "What are three or four of your go-to questions when you meet new people?"

SPENCER: When I meet someone new, because I don't like small talk, and also because I want to fast-track finding out whether this is a person that I'm going to really click with, I want to get past the small talk as fast as possible. So usually, I'll do small talk for a very brief amount of time, like two minutes, then I'll ask a question that tries to get deeper. So two of my favorite questions: One is, "What are you really excited about these days?" or "What are you excited about lately?" What's nice about that question is it helps point the conversation toward something they care about. It isn't necessarily something I care about, but at least it's something one of us cares about. Often with small talk, it's something that neither of you care about, and it's just way more interesting having a conversation when the other person cares. I find if they care about it, that already makes it a more interesting conversation for me, but it also helps me find out what sort of person they are and what their interests are. The other question I really like, when I'm talking to an intellectual, someone who I know is really into ideas, is to ask them, "What have you been thinking about lately?" or "What kind of ideas have been on your mind lately?" That's a way to fast-track to a more intellectual conversation. But it's not necessarily the best thing to say to someone who you don't know is a very intellectual person, because other people can kind of feel like they don't know what to say to it. I've had it backfire. I once met Peter Thiel and asked him that question, and he was like, "Oh, I don't know. I don't know what I've been thinking about lately. I don't like that question." So, definitely not a perfect question for all contexts, but I find it usually works well for more intellectual types.
