CLEARER THINKING

with Spencer Greenberg
the podcast about ideas that matter

Episode 271: Is AI going to ruin everything? (with Gabriel Alfour)

July 17, 2025

Is AI going to ruin everything? What kind of AI-related dangers should we be most worried about? What do good institutions look like? Should designing better institutions be a major priority for modern civilizations? What are the various ways institutions decay? How much should we blame social media for the current state of our institutions? Under what conditions, if any, should the flow of information be regulated? What are some of the lesser-known kinds of AI disalignment? What actions should we take in light of the lack of consensus about AI?

Gabe Alfour has a background in theoretical computer science and has long been interested in understanding and tackling fundamental challenges of advancing and shaping technological progress. Fresh out of university, he developed a new programming language and founded a successful French crypto consultancy. Gabe has long had an interest in artificial intelligence, which he expected to be a major accelerator of technological progress. But after interacting with GPT-3, he became increasingly concerned with the catastrophic risks frontier AI systems pose, and decided to work on mitigating them. He studied up on AI and joined online open-source AI community EleutherAI, where he met Connor Leahy. In 2022, they co-founded Conjecture, an AI safety start-up. Gabe is also an advisor with ControlAI, an AI policy nonprofit. Email Gabe at ga@conjecture.dev, follow him on Twitter / X at @Gabe_cc, or read his writings on his blog at site.cognition.cafe.

SPENCER: Gabe, welcome.

GABRIEL: Nice. Thanks.

SPENCER: Is AI going to ruin everything?

GABRIEL: [laughs] Starting strong. From my point of view, the most likely answer is yes, unfortunately, but fortunately, we can do a lot to avoid this.

SPENCER: Which kind of ruin everything are you concerned about?

GABRIEL: Personally, I might differ a lot from many in the extinction risk crowd. I didn't come at it from the AI perspective. Even before AI, I was worried about technological acceleration. As a first approximation, I mostly see AI as a double accelerator. If you start thinking about an intelligence explosion, automated AI R&D, and things like this, then I think it's more than just exponential. I think we can actually get very, very sudden, sharp, spiky growth, the type that we cannot prevent. So it's the same technological progress that we cannot manage, but accelerated all at once through AI.

SPENCER: So are you more worried about it as just a form of very sudden technological improvement, rather than something unique about AI that you're concerned about?

GABRIEL: I think, sadly, it's both. I think even without agentic AI or autonomous AI, I would be very worried. I think with autonomous AIs, with autonomous agents, there's another layer. So both, sadly.

SPENCER: So let's talk about the first of those issues for a moment. Imagine we just got some really massive spike in technological development, this kind of growth trajectory we're on, and technology goes much faster. What worries you about that?

GABRIEL: A graph that I like to draw is bomb ranges over time: the radius of the biggest bombs we can build, or the biggest bombs we can build for under a million dollars, over time. At some point, the range grows to encompass the entire Earth. It's not clear that we have institutions resilient enough to prevent their detonation. And obviously this is a metaphor. You can also think about bioweapons or whatever. It just seems that we can build technologies with negative outcomes that are strong enough to destroy civilization and what it needs to persist.

SPENCER: So if we think about these kinds of exponential curves, or even super-exponential curves of the range of the bombs we have, then eventually we see, "Okay, bombs will be able to destroy the whole world." Or if we think about biotech, we see, "Eventually someone will be able to manufacture a super virus that kills 99% of people on Earth or whatever." There are a bunch of such curves that are all pointing to eventually technology being so strong that we can annihilate ourselves. Is that the concern?

GABRIEL: Basically, I think there's a world that has institutions strong enough that if we find a way to build nukes for less than a thousand dollars, we're hyped. We are not worried about people detonating them. We're like, "Whoa, now we can terraform. We can change weather. We can do some geoengineering. It's obvious that we're going to use it for good things." I think tomorrow, if we discover a way to build nukes for a thousand dollars, we're all afraid. We're obviously not in that world.

SPENCER: It reminds me of that quote about how we need to make sure our wisdom grows as our power grows, and that we may be imbalanced where we don't seem to necessarily be becoming that much wiser as a species, but we are becoming way more powerful.

GABRIEL: Yeah, I agree a lot with this. We're using constitutions that are like 250 years old. Imagine if we were still using the same tech as 250 years ago; that is basically the situation we're in. So I tend to agree a lot with the spirit of that quote.

SPENCER: So what do good institutions look like?

GABRIEL: I want to say this is above my pay grade. I can say what type of properties they will have. I can say what it will feel like to have such institutions, but what the institutions would be is a bit of a harder one.

SPENCER: So, let's talk about the properties they have. Because I think your idea is that if we had better institutions, we might be able to manage the incredible power of technology much better, much more effectively. Is that right?

GABRIEL: Yeah, among other things, also the power of the economy and so on and so forth.

SPENCER: Right. They could help in a myriad of ways. So what are some of the properties that we look for in good institutions, and how does that compare to what we have now?

GABRIEL: For instance, when there are big disagreements between experts or parties, we would look forward to them, because they're a sign that we have a lot to learn from each other. Right now, that's not really how we think about them. If we discovered a massive dual-use technology, we would look forward to that, because obviously we're not going to use it for bad ends. Then there are things that are less about the ends and more about the means. We would do a whole lot more experimenting. Before thinking of increasing the minimum wage in the entire country, or abolishing the minimum wage, you would have one state or one city experiment with this. You would have an anarcho-capitalist city. You would have a communist city. You would have a whole lot more experimentation. You would have a whole lot more communication. If I were a president, I would love to be able to poll all the PhDs in a field on a question. I would love to be able to poll all the citizens on a question, and to do so regularly through the use of technology. If you look at the spirit of the various Western constitutions, this is obviously the type of stuff that they're pointing to, but they didn't have the internet, nor smartphones. So, yeah, I think a lot of this type of stuff.

SPENCER: Do you think that trying to design better institutions should be a major priority for civilization?

GABRIEL: I think there are two ways to think about this. In absolute terms, yeah, sure, it's clearly good, and I think everyone would agree that it's good. In relative terms, should we displace other things in favor of this? I tend to think so. It's the thing that I always say: we have had a lot of technological growth, a lot of economic growth, and things like this, but the institutions have not kept up. I think it's clear to a lot of people that they are quite backwards. In objective terms, our constitutions do not factor in the crazy hundred years of progress that we've had, the crazy communication technology that we now have. They all implicitly build in the assumption that we're traveling by horse, and things like this. So it seems kind of obvious that this should be a major project, and that our institutions are not ready to deal with the modern world in general.

SPENCER: It's funny that you point to our constitutions being old and having all these implicit assumptions. I wonder if we disagree about this, because my view is actually we'd be better off following the Constitution a little bit more carefully in the US, rather than tossing it away. I wish we lived in a world where I thought we were capable of building a much better constitution, but I'm not sure we live in that world right now.

GABRIEL: [laughs] So this is a bit spicy. There are many things to say about this. The first one is that I don't think improvements and progress come from tossing previous things away. I don't think the way we got economic growth was by throwing away the past every decade, the supply chains and logistics chains and so on. Instead, we improved them over time. The same goes for technology, and I think the same should be true for institutions. But I'm not addressing your point, which is: could we design better institutions now than we could in the 18th century? I think it's not obvious that the answer is yes. I will say that the bottleneck is not intelligence, science, epistemology, or any of those.

SPENCER: So what is the bottleneck?

GABRIEL: I think, in the late 18th century, we had a major movement that was specifically focused on building better institutions. A lot of political philosophy was written about it, and many people directly took inspiration from it and acted according to it. A lot of this was violent. Some of this was not, and I think we do not necessarily need a drive to violence to get this type of change, but we do need some drive that I don't really see anymore. You need some ambition that I don't really see anymore. For instance, if you look at the modern progress movement, even the modern progress movement is mostly focusing on technological progress and economic growth. It doesn't have new constitutional projects and things like this. Nowadays, there is a lot of skepticism about building better institutions. A lot of cynicism, even. A lot of people just do not interact with the institutions at all and just diss them, instead of actually trying to build better ones. I think that all of this adds up to no improvement, basically.

SPENCER: Some people take the view that institutions sort of inherently decay over time. There are different theories of why this might be. One theory is that the institution eventually elevates its own self-preservation over its original goals. So maybe it's created to do a certain thing in the world, but eventually, more and more people there are just trying to make the institution itself powerful and grow bigger, because that can be in their own interests to have a more powerful institution. There are other views about why institutions decay. Maybe it has to do with bureaucracy, like eventually you get managers and managers and managers, and everything gets too abstracted away, or it could just be value drift over time. Why should an institution's values stay the same? So yeah, what do you think about this idea of institutional decay?

GABRIEL: I think it's quite a pervasive idea. I don't think it's limited to institutions. I believe companies decay in the same way. I believe families decay in the same way. I think we live in a universe with increasing entropy, and that entropy must be curtailed through a lot of effort, sweat, optimization, and so on. I think it's definitely true that institutions decay. I think bigger institutions decay even faster. There are many solutions to this. I like the idea that each law, each institution, even each company, should have a built-in expiry date, and then you have to renew it explicitly. It should take someone's attention and someone's energy to renew it. Another one would be to just pay that tax. There is a maintenance tax. Things take maintenance, and at some point we should pay for that. I think, for instance, in the Western world, we do not think that it's very glamorous or sexy to maintain decaying institutions, even though it is quite important.

SPENCER: It seems to me that something quite terrible happened in the US where people stopped trusting institutions, and simultaneously, or maybe prior, some of those institutions genuinely did things that made them not deserving of trust. So you can imagine a world where the institutions were still great, but people stopped trusting them. But in fact, I think both have happened simultaneously; institutions have actually gotten worse in some ways, and people have stopped trusting them. That puts us in a very precarious situation where the replacement for institutions is random podcasters and YouTubers. That doesn't seem like a replacement.

GABRIEL: Yeah, I definitely agree. I think the institutions were also quite bad before; I just think they were better than what came before them. On an absolute scale, I think we truly live in terrible times institutionally. But even morally, even in terms of happiness and things like this, I think that in relative terms, it's better. Then, if we talk about a specific time period, I might agree that there has been a regression, and it depends on the time period. For instance, the thing that I think makes the clearest case for that is the fertility crash. People want to have fewer children, and I think that's really bad. People have children much later, and I think that's really bad. Rents are increasing, which is a massive tax on existence, which I think is really bad. To the extent that it can be traced to the institutions, and I think it can, because we still had a lot of economic growth and things like this, it's obviously not an economic or technical problem. I think there has been some regression. Then there are other things where I think it's less clear. For instance, political debates have worsened a lot since the advent of social media. Foreign propaganda that was controlled on TV, on radio, and in newspapers can now just pay for ads on social media. I do think there is a lot of foreign sabotage, for instance. So there are also ways in which things have worsened. I think it's possible to deal with those. It might seem like we're talking about different time periods; possibly, as a US citizen, you would be talking about the Iraq wars and things like this. I don't think this should lead to an increase in distrust of institutions, because I do think that in the past, institutions have done worse things; it's just that now there's more transparency. So there's a bit of a paradox here.

SPENCER: Yeah, if we look at specific examples. I think the military complex in the US lost a lot of credibility with Vietnam and Iraq. I think that the healthcare system in the US lost a lot of credibility around COVID, where it said things to people that were not only not true, but were known to not be true. I think that really lost a lot of credibility. If we look at political office, I think it's become more acceptable to just straight up lie, even when you're fact-checked, to just not care. It seems to me that maybe you couldn't get away with that as much in the past, and now you can kind of just lean into that and make your brand that you just don't tell the truth. These are some specific examples I point to.

GABRIEL: I see, yeah. I think those two are quite downstream of social media. In the past, we had an understanding that media was important. When I was a child, we called it the fourth power, after the judiciary, the legislative, and the executive power. Newspapers were massively regulated. Radio was massively regulated. TV channels were massively regulated regarding who can spin up a TV channel, what type of ads are okay, and what type of content is okay. The same thing for cinema. Then we just gave up with social media. We didn't put any meaningful regulation in place. There has been an internal vicious circle, a decay toward the worst of human impulses. That's the internal problem. There's also the external problem, which is that a lot of countries try to act against each other. Whether this is bad or good, it's a fact of life. If you do not put up any protection against external threats, that's quite bad. With the lack of regulation, we had an internal safety problem, but there's also an external security problem that we just completely failed to manage. I think that with natural decay, external sabotage, and no care taken to try to make it go well because of some techno-libertarian agenda, then, yeah, things go bad.

SPENCER: It's interesting because you could imagine a world with social media being a kind of utopia for information and ideas. Now anyone can spread ideas, and you get this marketplace of ideas, and you can choose what you listen to. Instead of there being just five television stations, there are a million podcasts. On the other hand, it seems incredibly important to have Schelling points sometimes on information. It seems you really need situations where the government can say, "Hey, this is what's true," and everyone says, "Okay, we believe it because the government is saying it." Of course, the government can abuse that, and it has in some cases, but there are certain problems you cannot solve unless you can get everyone on the same page quickly.

GABRIEL: Yeah, I agree. A lot of coordination is important. Schelling points are important. It's a major job of the government. It's not that because the government says it, it's true; it's more that because the government says it, we agree on it, which seems subtle, but I think is a pretty big difference. I think it was quite obvious historically. With COVID, it became not obvious. For instance, health authorities said things that were not true. For me, it's bad because people lied, but that's part of what an authority does. Sometimes it lies. The question is not whether it abides by the extreme moral standard of literally never lying. It's, in aggregate, does it lie less or more over time, and on more or less important topics? That's how it should be evaluated. That's obviously not how it was evaluated, and I think that was mostly for adversarial reasons. There are a lot of things that were obvious in the past that are not obvious now because of adversarial reasons, whether they are internal, where some companies stood to make money from discourse becoming worse, with outrage culture and things like this, or external, with outright sabotage and propaganda. Coordination is important. Coordination on common knowledge is important, and without it, it's hard to have a meaningful and effective society.

SPENCER: When you say adversarial reasons, if you think about drug companies in the US, they can make profits by making products that are not actually that useful to people but getting them approved anyway. That's not good for society, most likely, if the products don't really work, but people are sold on the fact that they work. You might view that as an adversarial relationship where the regulator's job is to make it so that profit only occurs if they're adding benefit to society. Is that the kind of adversarial relationship you're talking about?

GABRIEL: For internal stuff, yeah, that's the type of relationship that I'm talking about. I think having fentanyl in the streets is bad. I think making fentanyl legal is bad. Obviously, government regulation to make it illegal is important. I think there are a lot of metaphorical fentanyls out there.

SPENCER: Yeah, what's an example of a metaphorical fentanyl?

GABRIEL: Shit social media, deepfakes, outrage, bullshit. I think this type of stuff should obviously be illegal; it's just that, because of the massive ideological shift that happened in less than 20 years, this isn't obvious to most people. I think, in the same vein, you have external propaganda from other countries.

SPENCER: It's been debated how much propaganda has really played a role in things. It's clear that Russia engages in propaganda campaigns, for example. There are well-known cases where they, for example, faked a factory blowing up on social media, and it was all just a fabricated story made by Russia. It was a bit unclear why they did this. It may have been just a test case, a proof of concept. Clearly, Russia and other countries are trying to do things like this. But I don't think there's a consensus on how much it's working, how much it's actually changing people's minds or changing society. Do you think it's actually changing people's minds a lot?

GABRIEL: It depends on what you mean by changing people's minds. I think there is targeted propaganda. For instance, a lot of Hamas propaganda on TikTok. It's crazy how much there is.

SPENCER: Made by Hamas?

GABRIEL: Pro-Hamas, and I assume that it's downstream of Hamas. Then there's the question of how it gets there. I'm not an expert, so I can't tell you, but there's a lot of pro-Hamas propaganda.

SPENCER: Oh, that's interesting. I didn't realize that. So you're arguing that it's sort of downstream from Hamas itself, whereas I assumed that it was coming more from ideologies that latch onto Hamas for various ideological reasons, rather than Hamas actually being the one that's causing that to happen.

GABRIEL: If I had to bet, there is a non-trivial amount that comes directly from Hamas. If I had to bet, there's a non-trivial amount that comes indirectly from Hamas. And if I had to bet, there is also the third category that you just mentioned, where a lot of people are taking up Hamas talking points. But this one is also a sadder one; there's more that we can talk about there. Another one was more five to ten years ago: a lot of Wahhabi propaganda on Facebook, this type of stuff. I forget what it's called, the type of propaganda that turns people into terrorists. It has a specific name; this is a pretty bad form of propaganda. There's also just the phase shift where, with social media, we started going for a regime where it's okay to not have human moderation in the loop. Basically, the fight was lost there. With TV, radio, and newspapers, it was assumed that we could have some human moderation in the loop before something hit millions of people. Now you can have a TikTok account with millions of followers, and there's no restriction on the type of content that you can blast into the US or the EU. And I think that's extremely bad. It's just losing the information battle by completely giving up.

SPENCER: The most common argument against what you're saying that I see is sort of a marketplace-of-ideas argument: that it's better to let information flow freely, the best ideas will rise to the top, and everyone should be able to hear lots of opinions. Do you think that that's just misguided? What's wrong with that opinion?

GABRIEL: No, I actually agree a lot with this. I just think it should be designed. For instance, I like the UFC. I like MMA. I think the best competitions have a lot of optimization power behind them, such that they have the best set of rules. This is why people prefer watching the UFC to just random street fights. If you just go for no rules, you don't have athletes that persist, that can practice for years or decades. You only get the best things when you have such rules. I think the same is true for regular markets. We can get nice drugs because we banned fentanyl. We can get some nice drugs because we have the FDA (there are a lot of problems with the FDA, but we do get some benefits out of it). I think the same is true for information markets. We have to put a lot of effort, optimization power, energy, and iteration cycles into designing the best information markets. I don't think they arise spontaneously, the same way I don't think the best rule set for MMA arises spontaneously.

SPENCER: Right. So you do want free flow of information, but with a certain set of rules and guidelines and structure around that. So it's not literally anything goes anarchy.

GABRIEL: I love competition and I think competition is great, both as an end and as a means. I just think it should be designed.

[promo]

SPENCER: So what are some design features? If you were redesigning the way social media works, how do you think about designing it so that information still largely flows freely, but it prevents the worst stuff?

GABRIEL: Yeah, I think there should be a big distinction between small influencers and big influencers. There was a big thing in the past where we treated postal communication very differently from the way we treated newspapers. Basically, if you have group chats of fewer than 100 people, they should have extremely strong privacy rights and so on. If you have fewer than 1,000 followers on any platform, you should not incite violence or things like this, but you should be pretty free. If you have more than a million followers, whether it is your own podcast, whether it is on Facebook, whether it is on TikTok or whatever, then you should abide by much more stringent rules. You're not just some guy freely talking to people anymore. You're actually a media antenna. Here, I think there should be some actual, more stringent regulation, some actual deontological code. There needs to be regulation that at least enshrines our values, where we agree that those are the values we want people with this wide an audience to have.

SPENCER: So what kind of examples are there of the sort of rules that you might put on a big influencer?

GABRIEL: Personally, I don't think my specific opinion should matter that much here, but I can still give it to you.

SPENCER: I think they're interesting.

GABRIEL: I think the bar for fake news should be much higher for influencers.

SPENCER: They could be fined or something if they spread information that's just provably, kind of completely, false?

GABRIEL: Yeah. When I say provably, I mean using judges and the judiciary branch. I don't mean a new special Ministry of Truth.

SPENCER: But if a judge says, "Yeah, okay, there's just no evidence for this, and this is clearly just made up by a Russian propaganda machine," you can't just spread this.

GABRIEL: Exactly, I really don't think it should be in the hands of the executive government, but it should be in the judiciary, and I think a lot of people forget that there is a judiciary branch that we can rely on for this type of stuff. It's not just the government that says what's true or not. This one is quite important. I think it's a thing that was botched in the past. For personal attacks, you should be much more stringent in what you say about people when you have a large audience. This type of stuff in general, I would pay attention to.

SPENCER: On that point, it seems to me that it's good in the US that it's fair game to attack public figures. In other words, anyone can say anything they want about Trump, and that's something that's very strongly protected. So do you mean personal attacks on private individuals?

GABRIEL: I'm thinking more about private individuals. For public individuals, it's different. I think this is more a per-country thing. I would not advocate for all countries to have the same regulations here, but I will advocate to iterate on what type of regulation is good. This is also a big thing that is missing in the conversations about regulation, which was one of the points I was mentioning about better institutions earlier, which is that we see regulations as an end state, whereas I see regulations as part of an iterative process. This is why I say my opinion about specific regulation doesn't matter that much. It's a starting point for an iterative search for what's good. We should experiment in different places. We should try more. We shouldn't try to guess from our armchair what is the best thing and then use all of our political power to enforce it forever. I think this is a very bad view of regulation in general.

SPENCER: A major problem with regulation is that it's designed by a committee, or by one person who then puts it in front of a committee that essentially has to approve it. But then it's put out in the world, and immediately a bunch of people try to break it because they don't want to follow it, so they try to find legal ways to get around it. Obviously, some people find illegal ways to get around it, but a lot of it is finding legal ways to get around it, hiring lawyers to kind of reverse engineer it, etc. And then where's the iteration? You need it to be a multi-step game. You need to then say, "Okay, look, here's how people are getting around it. Let's patch it up." But I think that the process tends to not do much of that. Maybe 10 years later, 20 years later, someone will try to make a new regulation, but it doesn't feel like it has the right feedback loop.

GABRIEL: Yeah, I completely agree. This is what I mean when I say we have very old institutions. The feedback loop in the institutions is assumed to be on the time scale of years or sometimes multiple election cycles, which is why we have the limit of terms for presidents and things like this. I think that's much too slow. I think it's even worse that you have a lot of people whose job is to subvert legislation, not necessarily find loopholes, but also do outright illegal things or completely lobby against any type of meaningful regulation. I'm less liberal toward what I think are acceptable behaviors here, and I think we should be much more stringent with the sanctions there. I think there's too much gaming of legislation going on. But at the same time, when the legislation and the regulation are not changing quickly enough, what's fair and what's not fair is hard to decide.

SPENCER: I don't think people realize how impossible it is to design the right legislation on the first try. Imagine that you had to say, "Okay, I'm going to make a set of rules to govern finance to prevent another financial collapse." You have to write it out as a long document that has to actually work. The idea that that would work on your first try, to me, is mind-boggling. Realistically, there has to be a feedback loop, or it's kind of bound to fail.

GABRIEL: I completely agree. This is why, for me, there's also a problem with how people think about regulation in the first place. Here, we're talking about it in a very technical, rational way, and so on. There are also all the ways in which people become attached to their specific regulation text, which I think is quite bad. We should be much less attached to a specific text of legislation or regulation, and we should feel completely okay with replacing it when it doesn't work. We should feel good about experimenting and getting information out of trying regulation. Those are moods that I just do not see, which makes me sad.

SPENCER: What do you think about technological solutions to some of these things, for example, community notes on Twitter X? For those who don't know what that is, essentially, it's a system where when people post, if others think that the post needs additional context or is inaccurate, they can propose a community note that would get pasted alongside it. It could provide counter-evidence, or it could give extra context, or whatever. Then there's a voting system where people can vote on whether the community note is helpful. The interesting, unique twist that they add technologically is that they're trying to find notes that are both appealing to people on the left and the right. Essentially, they're trying to figure out if the people who are voting on whether it's useful or not are more left or right, and make sure that the note is robust against that so that it's not just expressing one political view.
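For readers curious how the "bridging" idea Spencer describes can work mechanically, here is a minimal, illustrative sketch. It is not the production Community Notes algorithm (which uses matrix factorization over rater and note representations); the function names, the crude lean estimate, and the min-of-both-sides rule are simplifying assumptions made only for this example.

```python
# Illustrative "bridging-based" scoring, loosely modeled on the idea described
# above. This is NOT the real Community Notes algorithm; names and rules here
# are invented for the example.

from statistics import mean

def rater_lean(history):
    """Crude lean estimate in [-1, 1]: average of codes for notes this rater
    found helpful, where -1 means left-coded and +1 means right-coded."""
    return mean(history) if history else 0.0

def bridging_score(ratings, leans):
    """Score a note by requiring helpfulness ratings from BOTH sides.

    ratings: {rater_id: 1 if 'helpful' else 0}
    leans:   {rater_id: lean estimate from rater_lean()}
    """
    left = [r for rid, r in ratings.items() if leans.get(rid, 0.0) < 0]
    right = [r for rid, r in ratings.items() if leans.get(rid, 0.0) > 0]
    if not left or not right:
        return 0.0  # no cross-partisan signal yet, so don't surface the note
    # Taking the minimum means one side's enthusiasm cannot carry the note alone.
    return min(mean(left), mean(right))

leans = {"a": rater_lean([-1, -1]), "b": rater_lean([-1]),
         "c": rater_lean([1, 1]), "d": rater_lean([1])}
one_sided_note = {"a": 1, "b": 1, "c": 0, "d": 0}
cross_partisan_note = {"a": 1, "b": 1, "c": 1, "d": 0}
print(bridging_score(one_sided_note, leans))       # 0 (only one side finds it helpful)
print(bridging_score(cross_partisan_note, leans))  # 0.5 (rated helpful across the divide)
```

The key design choice is the minimum: a note cannot rank well on the strength of one side's votes alone, which is the robustness property Spencer is pointing at.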

GABRIEL: I don't think the bottleneck is technical. For instance, we have Wikipedia for facts. I don't think most of the value of Wikipedia is technological. A lot of the technology of Wikipedia is social technology. If you go on Wikipedia, you can see a lot of meta articles like "assume good faith," "what to do when there's an edit war," all of this type of stuff that is a massive corpus of social technology built over decades. It shows why Wikipedia has triumphed while many other wikis have not. In the same vein, you have TV Tropes, which is great. You can learn a lot about narrative fiction there. Their success is much more directly attributable to their social tech than to whether they are using web3 prediction market things or whatever. I think the same is true if community notes are to succeed. It can be a nice gimmick. Transcending left and right is nice, but I don't think it's working that much in practice. If you think about social media specifically, there are a lot of things that could be done. The main thing I think of is how to convert emotional involvement online to actual actions in the real world. If you go on social media and see something you dislike, the actions you have access to are to complain with a reply, repost, retweet, or whatever. I think those are bad actions. I would love an action that is like, "Open a chat room with 10 or 20 other people for what you could do about it," or "Send an email to my MP about this specific issue," or "Build a small script for me and tell me what the phone number of my MP is so that I can contact them." I think there are a lot of such innovations that could be done. They are not deep technically, but are more about trying to catalyze the effect that is built on social media toward more positive and constructive things. That is just not what social media is optimizing towards. When someone builds a social media platform, that's usually not what they care about. If you look at TV Tropes or Wikipedia, they are optimizing for truth, among other things, which I think is more social tech than hard tech, I guess.

SPENCER: Do you think there are fundamental reasons that something like community notes can't work?

GABRIEL: The problem is that I can't speak to it because I don't know who's behind it, but my default expectation is that it cannot work because it's part of an ecosystem that is not optimizing for the values we want behind community notes. In short, it's not part of an aligned system. So my outside view prediction is that it's going to fail, and I don't see it succeeding much.

SPENCER: It seems to me that social media websites could actually reduce their toxicity quite easily if that was their goal. But part of the fundamental problem is that by reducing toxicity, they also reduce time on site and virality. There's an extent to which they directly trade off each other. For example, just to give a naive solution, suppose they made it so that on any post, any user could say, "I think this is toxic, bad stuff," and then that would just reduce the virality a little bit of that post. So the things that are going viral on the site are things where nobody is saying, "This is toxic, this is unhealthy, this is bad." I think that would actually reduce toxicity a lot, but it would also reduce virality a lot. It would reduce time on site.
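As a toy illustration of the trade-off Spencer describes (the names and damping constant are hypothetical, not any platform's actual ranking formula), a virality score damped by the rate of "this is toxic" flags might look like this:

```python
# Toy version of the trade-off described above: damp a post's ranking score
# by the share of viewers who flagged it as toxic.

import math

def ranking_score(engagements: int, impressions: int, toxic_flags: int,
                  damping: float = 5.0) -> float:
    """Engagement-driven score, exponentially damped by the flag rate."""
    if impressions == 0:
        return 0.0
    flag_rate = toxic_flags / impressions
    return engagements * math.exp(-damping * flag_rate)

# Same engagement, very different outcomes once 10% of viewers flag the post.
print(ranking_score(10_000, 100_000, toxic_flags=100))     # ~9950
print(ranking_score(10_000, 100_000, toxic_flags=10_000))  # ~6065
```

Even a modest damping constant sharply down-ranks widely flagged posts, which is exactly why it would also cut virality and time on site for outrage-driven content.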

GABRIEL: Yeah, I agree. I think that people who build social media empirically care more about time on site than reducing toxicity. I don't think this is a given, but I think there is also a more subtle dynamic that can be hard to see. I have met people in the past who tried to design social media from scratch with nice principles, but they did not have the startup mindset, let's say. They were very idealistic. They didn't have PostHog, for instance, or any measure of retention, or any story for how they would build some network effect, and so on. I think there is sadly a deep correlation between being idealistic and being ungrounded: on one hand, you have people who are idealistic and ungrounded, and on the other hand, people who are making money at all costs and grounded. I won't get into why this is the case, but it seems to be the case empirically. I think this is why sometimes people feel like it's impossible to build good social media: they see some people trying to build it, and they're like, "Oh, but they never build network effects, and they never build this type of stuff." I'm like, "Yeah, but it's because they're not optimizing. They're too idealistic to optimize. They're like, 'Oh, but my product is good, therefore people ought to care about it. If they don't care about it, I don't have any duty to make them care about it. They're just morally bad.'" I think that's very deficient thinking.

SPENCER: It seems to me they also are handicapping themselves. Because if one site is optimizing for addiction and time on site and another site isn't, it puts it at a kind of immediate competitive disadvantage, doesn't it?

GABRIEL: Oh yeah, I agree. I just think that there is enough space on the internet for this to still work. You do not need everyone on your social media platform for it to be useful. Most people do not spend most of their time on Wikipedia; Wikipedia is competing against TikTok. Everything is competing against TikTok, and yet the rest of the world still exists. I think there's also a very bad meme, which is something like, "Oh, if you must compete, then you're dead." To me, it's baffling. It's such a thought stopper. I don't know what you call this. There's a rationalist word for it.

SPENCER: Like a thought-stopping cliché or something like that.

GABRIEL: It's clearly a thought-stopping cliché. "Everything competes against TikTok for attention; ergo, you cannot build any service that demands attention from people." That's so bad; that's such a bad meme, such a cynical meme that kills innovation and the creative spark. So I don't think it's true, and I don't think it's good.

SPENCER: Let's go back to the topic we brought up at the beginning, which is, we had this kind of branch in the conversation about risk from advancing technology broadly. And then the other risk that you mentioned is AI agents in particular. What is your concern more specifically around AI agents?

GABRIEL: I have many concerns. The one that I was thinking about recently was a tweet thread from Sam Bowman from Anthropic, saying, roughly, that lately AI systems seem aligned enough, so it's mostly okay. I think that agency is really hard to reason about, and alignment is really hard to reason about, especially with AI systems, because from our point of view, AI systems are obviously not aligned. This is why we do not use them in critical roles. We do not decide our policy, our foreign relations, or our big decisions in life through AI and so on, because we all have an intuitive understanding that current AIs are very much not aligned. You might say that it's because of capabilities or whatever, but my point is just that, de facto, they're not aligned.

SPENCER: What do you mean by aligned? Could you make that a little clearer?

GABRIEL: They're not acting according to human values. If you just follow what they say, you will not realize human values.

SPENCER: I don't know if people really feel that on a visceral level. I think a lot of times people feel that the AI is just not doing what they want because it's not smart enough or something like that.

GABRIEL: Yeah. A child is not aligned with themselves because they're not smart enough; that doesn't make them aligned. That's part of the problem. Alignment is hard. It requires not only intelligence, but also a lot of understanding of your values. It requires a decision theory that is resilient to you not being omniscient, and so on. All of those things are hard, and we have not solved those problems, much less encoded them into artificial intelligence. If you just assume away all the problems, then yes, we have solved alignment. If you're like, "Oh, it would be aligned if it were more intelligent, if it knew how to be aligned, and if we knew how to encode those principles," I'm like, yeah, but that's assuming the problem away.

SPENCER: Let's break this down a little bit. Let's say I go to an LLM, and I ask it for some piece of information, and it hallucinates. It makes up false information. One way to view that is, "Okay, it didn't really know that thing, and it didn't know that it didn't know that thing." But if it had known that thing, or it had known that it didn't know that thing, it kind of would be solved. We have seen with increasing models that many types of hallucinations have gotten better. Whether that will still happen or they'll get worse, I don't know. That's one type of disalignment, but there's a very different type of disalignment, where what I've observed is that AIs seem to really want to give you what they think you want from them. You might ask it a leading question, where it's clear that you want a certain kind of answer, and it might give you that answer, and you might be satisfied, but it might have given the opposite answer to someone else who posed the question a slightly different way, suggesting they wanted the opposite answer. In other words, it's essentially bullshitting you because it's trying to give you what you want too much, which is a different form of disalignment.

GABRIEL: I think there are many forms of disalignment, and I think that right now we're in the easy mode, where it's obvious what those disalignments are. For instance, what you've just mentioned, whether it is hallucinations or trying to do too much of what you superficially want rather than serving your deep values. I think there are other modes of disalignment. What is happening right now is not that we're solidifying our understanding of alignment such that if we encode the principles of our understanding, we have a reliable solution. Right now we're just hill climbing on what's easily visible. What happens at the end of this process is not that we have solved alignment. What happens is that we get screwed over by all the hard parts that are not immediately obvious. It's a common thing. We had a similar thing at the societal level. For instance, there was a type of disalignment with people in their hearts trying to mess with other people. They were truly bad; they had bad intents, and we optimized hard against people having bad intents. As a result, some people lie, and here you're like, "Oh, but with interpretability, it doesn't matter if they lie." Sure, but a big thing that happens is that you have a lot of people with good intents who have remodeled their entire internal psychological life such that they can never be prosecuted for having bad intents. Instead, they just have various traumas, various compulsions, various limitations that prevent them from doing the things that are good for others or sometimes the things that are good for themselves. In those cases, it's much harder to diagnose and much harder to deal with. Right now, we're quite stuck in this mode, and I expect the same type of stuff to happen with AI. The only problem is that as this happens with AI, those AIs become super powerful. When you have a super powerful entity that is misaligned with you, it doesn't end well for you. I expect the same story to repeat until we actually get better at alignment. It's not only about getting better on a relative scale; it's on an absolute scale. At some point, we must extrapolate from easy disalignment to hard disalignment, short-term disalignment to long-term disalignment, small-scale shopping-list disalignment to large-scale humanity-wide disalignment. Right now, we have no such theory that lets us do this. We're just bad at it.

[promo]

SPENCER: So we see that clearly. We have seen progress in LLMs doing more and more of what the people who are making requests want of them. They are much more effective at doing what you want now than they were even a year ago. Some people would call that a form of alignment. The issue is that that's just one type of visible alignment, but you think it's not aligning at a deeper level.

GABRIEL: From my point of view, it seems quite obvious that, yes, this is the case, but trivially so, not in a deep way. For instance, there are a lot of decisions where whether they're good for you or not depends on your ability to predict the future. If you do not have this ability to predict the future, or if a system does not have this ability built in, then it cannot be aligned according to those values. It can still try through decision theory; there is a lot of decision theory for how to act best when you cannot model the future and so on. But now we're getting into deeper principles, and it's obvious that the LLMs right now do not have those deeper principles. There are so many things like this, and most of them we do not know about, so I cannot use them as examples. Obviously, we're going to miss them. Right now, I can only point at examples that are obviously being missed by LLMs. The problem is that the more you point at them, the more you do evaluations for them, the more the LLM builders will optimize for them. But it shouldn't be understood that we have solved alignment. It's more like we have a finite number of evaluations at our current level of understanding, and now we're exhausting them. After this is done, we do not have any ability left to discern whether LLMs are aligned or not.

SPENCER: It's sort of like we put it through these different tests, we make sure that it performs well on each of them, but that doesn't mean that it performs well at the next task that we haven't thought of and haven't figured out how to test.

GABRIEL: The problem is that at some point you must take some distance and ask, "Well, is my understanding of alignment rich enough that I think I can build an exhaustive cover through such tests?" The answer right now is obviously no. Right now we're building a lot of systems with unintended side effects, whether it is regulations, constitutions, companies, markets, or information markets. As we said when we talked about social media and things like this, we're very bad at systems thinking on large scales. We're missing a lot of principles. The same is true when we think about human values: what's good for people? At a personal level, many of us, I think most of us, have experienced doing something against our own interest and realizing this only later. At some point, if you think about this enough, you realize that you are indeed your own biggest enemy. You're not aligned with yourself. You're missing too much for that; you're like a child. The same thing is true at the societal level. We're still just moral children. I think it's quite important to realize this: right now, if you had a child designing evaluations for LLMs, and the LLM were optimized for those evaluations, you wouldn't expect that the LLM is aligned with adults, nor with humanity. Then there's the question of evaluations that are run over the course of an hour, where the time span of the agent is within that hour: can we extrapolate from them that the agent is aligned on the span of humanity, on the span of years, and so on? Right now, the answer is no. It should be obvious that the answer is no. I think this type of thinking is missing. It's not only missing in AI or in AI alignment; it's a type of meta-science skill.

SPENCER: Suppose it's not obvious to a listener that this is true. The listener might think, "Okay, sure, sometimes AI hallucinates, or sometimes they might be a little too eager to give you what you want if it's not good for you. If we make enough test cases and optimize the AIs to those, we'll get through the finite amount of issues, and then we have AIs that really do what we fundamentally want. As we make them more powerful, they'll just do even more of what we want and that's awesome."

GABRIEL: My counter to this is: "If you know all those principles, please write them down. Let's try to build a constitution. Let's stop talking about AI and just create a constitution that abides by those principles. Would you trust a one-shot government, or possibly even a world government, that just follows this constitution?" My answer is no, obviously no. You have a deep understanding that we're not good enough at systems thinking for this. If you have one-shot regulation, this is going to go wrong. We do not have a good enough understanding to implement one-shot regulation.

SPENCER: Then they might say, "Why is it one-shot? Isn't it an iterative process? We make an AI, we try it, we iterate. This is what we've been doing. The AI companies are constantly iterating."

GABRIEL: I would agree a lot. Unfortunately, this is not the goal of AI companies. Their goal is to build AGI, automate R&D, and build ASI as fast as possible. They're killing this iteration process. It's as if someone said, "We don't know how to build reliable constitutions, reliable regulations. But my goal is to use violence to build a one-world government and enforce this specific set of laws." This sounds utterly terrible to me. I do not trust anyone with world government power right now. Possibly, a hundred years from now, we will get better at building governments that are aligned with human values, and we will trust such a government growing. Most likely, it wouldn't be a growing government. Other governments would adopt this constitution, and then we would build an international entity that follows the same principles. We would be very happy, and it's obviously very good that it's spreading everywhere. But right now, that's not what's happening. We have the same problem with AI. If there were an international treaty or agreement that we would grow AI very incrementally, that we would not use agents at longer time scales until they're proven safe in a wide variety of contexts, at smaller scales, and so on, with very stringent regulation — similar to what we have with the FDA and markets, and what we should have with information markets — if those were the principles according to which we built AI, I think it could work. But this is not the goal of current companies. The goal of current companies is to build ASI as fast as possible. Like that's not the world.

SPENCER: So when you say ASI, you mean artificial superintelligence, right?

GABRIEL: Yeah. Sorry.

SPENCER: Yeah. I imagine the mental model you're using for this is that they're trying to build an AI that's so smart that it can then just do everything from then on. So then that's where the iteration process stops, because you get that AI, it's so powerful. Now you say, "Okay, remake the world, solve these world problems, make a trillion dollars, et cetera." And it kind of just goes from there, yeah.

GABRIEL: That has been the trajectory so far.

SPENCER: And then, of course, one thing that people have talked about a lot is that AI is sort of working on itself. So that, rather than the team of machine learning engineers and researchers improving the AI, the AI takes over its own self-improvement, and so then we kind of get locked in with whatever sort of values it has, implicitly or explicitly. We kind of get locked in as it becomes its own creator.

GABRIEL: I think by then, the values won't be coherent enough. The same way that if you write a constitution right now, the values in it are not coherent enough. So I think that saying it's locked in values is a bit misguided.

SPENCER: Because it's not even coherent enough to have a certain set of values.

GABRIEL: Yeah, exactly. But I think over time, after enough self-improvement, at some point it has something that is close enough to values. Before then, though, we are not even good enough at alignment to target a specific set of values, let alone our values. The big problem is that if you break the iterative process, then there is no iterative process. Right now, what AGI companies have been doing is just building the biggest, most powerful AIs as fast as possible, and just deploying them and lobbying against any type of meaningful regulation. They are going for the opposite of that. Again, I agree: I think there is a safe way to build more powerful AIs; it's just not what we're going for. What's even worse is that we're going for a world where, the closer you get to automated AI research and development without humans in the loop, the harder it becomes to control AI development, because at some point it doesn't even need humans in the loop. You intrinsically have to regulate things that are not humans, which is really hard to do. This is why I say that, right now, we're on a quite bad trajectory.

SPENCER: Suppose some AI companies want to do this responsibly. How could they think about instilling AI with values that are actually good for humanity or good for individuals?

GABRIEL: I think right now, we do not have the tech for it. When I say tech, it's the same as Wikipedia tech; it's social tech. Right now, for instance, let's say you want to find out what the values of a person are, a person like you, Spencer Greenberg. You want to figure out what your values are. Right now, we do not have a scientific process for this, so it will be a long introspective journey. Possibly at the end of this journey, you might write 10 pages about your values, and then you might encode them into an LLM. Would you trust the LLM to guide your entire life just based on this? The answer is no, because you do not trust your worded understanding, model of yourself, and so on and so forth. It's because we're just not that good yet.

SPENCER: It's funny you mention that, because we've made a test for helping people figure out their intrinsic values. It's definitely not accurate enough that I'd want to then permanently say, "Okay, now an AI can control your life based on it," for sure, but it's a step in the direction of trying to help people with this.

GABRIEL: Yeah, exactly. I think we should do much more of this. I think it's a very important problem. And then there's the question of how you make this into a science. How do you measure this? What type of benchmarks? How can you evaluate whether this was good or not? And then there's the question of how we can do this collectively, not just for an individual, but for an entire population, and ideally the world population. That's the question of how we aggregate such values, because many values are contradictory. Many values are impossible; they rely on entities not existing, or whatever. So there's the question of how you reconcile all of those values. And then there's the question of how we plan accordingly, and what type of decision theory we must use such that the plans are resilient to us being wrong about the various things we can be wrong about. All of this is what I call alignment: how do you act according to human values? At some point, you must extract those human values, and you must do so reliably through a scientific process, which we are far from right now. Then you must make plans that act according to those values. And this requires big leaps in our predictive power, big leaps in decision theory, that we haven't made yet.

SPENCER: It seems to me, even if you could accurately measure the values of everyone, we really have no idea how to combine that into something coherent, which I think is what you're getting at. Some people really care about purity or worshiping God. Other people really care about the preservation of nature. It's like, how do you take a sort of spiritual worldview and combine it with a completely atheistic worldview? How do you combine those? It's just fundamentally difficult.

GABRIEL: Yeah, and we haven't made much progress on it. So right now, what we would do for an LLM is design a couple of evals, and the evals will be like, "You have 10 atheists and 10 Christians. Who do you pick?" If an atheist designs the eval, it will be like, "Oh, obviously you should go for atheism because it favors truth and it's objectively true and correct." And if a Christian designs it, it will be like, "Well, the atheist is not considering God, so obviously the Christian should be correct." If you optimize the LLMs for this, then you'll be like, "Whoa, the LLM is aligned," and the true answer is, "No, no, no. One, it's not aligned. And two, you're not even understanding how much you're missing about alignment, which is the true problem." There are many considerations that we just have not resolved yet. And here we take the atheist and the Christian, but it's one of the easy cases because it's obvious. There are many ways in which our values are contradictory that I cannot even think of right now. If the two of us sat in our armchairs and came up with a list of a thousand such contradictions, at the end of it, the core conclusion we should come to is that there are a million more that we have not considered. This type of reflection is missing in alignment discourse.

SPENCER: What do you think about constitutional AI? These attempts to say, "We're going to write a long list of guidelines and tell the AI to follow them. The guidelines are based on different sets of principles that humans find useful, pulling together hundreds of different things from different places about how you want AI to behave," and then giving that to the AI and saying, "Make sure to follow these." I think it's extremely naive. How does it break? How do you see that kind of malfunctioning?

GABRIEL: It breaks in a thousand ways, but it's not even about breaking. We cannot even specify what the correct outcome should be. A lot of those principles will be contradictory, so what is the correct decision when those principles contradict each other? Here, it's not a problem of being wrong; it's a problem of not even being wrong. It's not that we have a clear specification and I predict that the LLMs will fail to meet it. I predict that we will not even have a specification, and we do not have such a specification. It's just the implicit belief that it's okay if it fails, as long as it's not too much, and we'll see it when we see it. It's massive hubris. It's a massive lack of modesty and of understanding of how little we know.

SPENCER: It's funny, because if you gave that list to a human and said, "Okay, follow this for the next week. I want you to live by this giant list of rules and principles." Like, "Do no harm, but also never lie, but also take everyone's viewpoint into account." There's no way a human could follow that. A human, I think, would almost immediately, when they get into real-world situations, realize that they can't follow the principles.

GABRIEL: Yeah, specifically, "do no harm." Anything that you do does harm. There are many parts of the world that are unfortunately zero-sum. If you buy something first, then another person cannot buy it anymore, even though they might have benefited from the thing more than you would have. You step outside and you have killed ants. "Do no harm" does not work. Of course, you have to do some harm, and you have to measure the harms. Then the principle becomes: how do you measure the harms? This is what I mean by moral naivety. If you say "do no harm," and you have foreign entities that are starting wars, how do you quash them without doing harm? At some point, you must deal with adversarial dynamics too. There are many such things; this is what I mean by naivety. But from my point of view, we're still missing the forest for the trees. I think you had a podcast recently about psychology, and there was a quote from it that was something like, "Assume that this psychology paper is true. Is it even useful?"

SPENCER: Right. Does it even matter?

GABRIEL: Yeah. Does it even matter? And for me, this is the type of move that I want to make. Right now, we're just talking about the validity of the paper, but I want to take a step back. My problem is not that some guy at OpenAI, Anthropic, or DeepMind is wrong about a specific alignment matter. My problem is that we're not taking enough distance to see that the field is pre-paradigmatic and everyone will be wrong. You should not expect that someone, just because they are smarter and added ten more principles, will get it right. The move that should be made is to understand: "No, this field is not scientific yet. We do not even know how to compare which method is better than another. We do not have objective benchmarks for how much progress we're making on alignment. We do not have a standard vocabulary around alignment that everyone agrees on. The field is not a science yet. It's just people screwing around." In such a field, you should expect everyone to be wrong, and you should expect that we cannot extrapolate far from the specific few things that we have tested. We should not expect that we can extrapolate from short time horizons to long time horizons, from small scale to big scale, from little agency to a lot of agency, let alone from little intelligence to ASI. That's the type of mental move that I want people to make.

SPENCER: Right now, people mainly use LLMs to converse with. They're getting information, they're chatting with it, etc. There's this movement to make them into agents, where the AI can do things for you: it can browse a website, it can log into your accounts. I suspect we're going to see a lot more intense and shocking examples of disalignment when people start relying on AIs in those ways, where it's logging into your bank account, making trades in your trading account, sending emails on your behalf without your approval. We might start seeing many more examples of this come to light, which may actually get people thinking, hopefully, a little bit differently about this.

GABRIEL: I'm very skeptical of this. The thing is, historically, we have been quite good at avoiding this type of problem. Same thing with LLMs right now: we just do not use them in critical systems. If it's known that agents fail on X, we'll just use them less on X, and that handles the known failure modes. On short-term problems, people will be well-calibrated; on long-term problems, they won't be. They will use LLMs in ways that screw up their lives over the time span of months, and they won't be able to attribute it to LLMs. Or someone will be egregiously stupid and will screw over their life with an LLM in a way that they couldn't have done in the past without LLMs, and people will say it's because of the person, not because of the LLM. Or someone will be extremely unlucky, a one in a million chance, and hit the one in a million scenario, and then everyone will be like, "Well, the LLM was aligned the rest of the time, so it means we had 99.9999% alignment. So it's okay." This is why, with regulation, we have to be forward-thinking. This is why naive, organic markets do not work: the immediate emotional feedback we have access to as human beings is not enough, and we have to use our brains, our rationality, our reason, to think with a longer time horizon than just, "Did I get screwed over by this system or person in the last five seconds?"

SPENCER: That's interesting. So my suspicion is that right now, these agents are often very inaccurate, but as they get more accurate, there will be an increasing segment of society that's willing to trust them for things, and then they might work 99% of the time on those things, which convinces people to trust them with greater things. But I think what you're saying is you don't think people will learn any lesson from that when they do go haywire.

GABRIEL: So one, I don't think it happens organically. I think it happens very little organically in general, empirically. And two, this is an adversarial game. AI companies are putting a lot of PR effort into making it look as benign as possible. This is the entire point of their safety strategy. This is Sam Altman's strategy of incremental deployment, boiling the frog. He writes things like, "We should just deploy LLM systems as quickly as possible, such that there is never a sharp increment. It should feel smooth and continuous." The nice thing with this strategy is that there is no Schelling point. This is why you have so many systems between GPT-4 and GPT-5, such that people get used to it, regulators get used to it, and so on. All the dynamics that I pointed out in my previous answer, like stupid people, long-term harms, unlucky people, and so on: it's not that people will coincidentally converge on those explanations. You will have billions of dollars framing any mistake as one of those. This has already happened with lobbying, where the LLM developers and builders push the responsibility toward the app developers, toward the users, always downstream in the pipeline, because they never want to have this responsibility. They will say, "If the LLMs fail, it's because of the app developer. The app developer used an LLM outside of where it works well. Or it's because of the user." I'm not talking about potential strategies here. I'm saying they have already invested a lot of money in lobbying and in PR to frame it that way. I think a lot of people in the AI safety community are naive, and they're not seeing this. They're not seeing that it's an adversarial game.

SPENCER: Before we wrap up, I want to ask you: right now, we're in a situation where a lot of people are concerned about AI, and they have different sorts of concerns. They're not necessarily concerned about the same thing, and there are also a lot of people bullish on AI who say, "Oh, it's going to usher in this wonderful future. It's going to make things better, and we'll figure out the wrinkles, and yes, there will be some problems, but we'll figure them out." What do we do in a situation like that, where there seems to be such a lack of consensus?

GABRIEL: I think it's hard to wrap up with something like this. It's a very big question. I will say two things and try to do it quickly. The first one is managing uncertainty. If you're outside of the debate, and you're not planning on acquiring the technical expertise to sort out the arguments by yourself, you should just act according to uncertainty. You should have a portfolio of beliefs: some probability it goes well, some probability it leads to the dominance of a specific team, and some probability it leads to extinction. Then you must plan accordingly. There are two ways in which you can plan accordingly: you try to build a plan that works across all three scenarios, or you try to build separate plans with specific trigger points. This is thinking under uncertainty. I think governments right now are not doing this; they are trying to wait until there's a resolution or a consensus, and it's not going to come. And regardless of which scenario plays out, if governments are not prepared, even the one that ushers in an era of post-scarcity is going to go really badly. So that's the first thing: managing uncertainty. Then there's the other one, which is, if you are in the trenches, building your own opinion, debating, and so on, my recommendation is that we should debate more. I think there are too few debates. There should be forced debates between the CEOs of the AGI companies and the directors of the AI safety advocacy groups, think tanks, and things like this. There should be academic debates between the heads of alignment and safety at the different AGI companies and those in academia and the independent ones. We should have those debates in civil ways, without insulting each other. So I will say those two things. One is managing uncertainty: having a portfolio of beliefs and planning accordingly, rather than waiting until there is a resolution. The second one is within the trenches: not uncertainty, but managing disagreement. Disagreement should be managed with a lot of debates, and also by trying to build plans that work across all those theories, to help the people managing uncertainty, which can be done through a synthetic approach. It's like, "Yes, you and I disagree, but what do we agree on, and can we find a course of action that works for both of us?" Sorry, it's a big question.
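
As a toy illustration of the "portfolio of beliefs" idea, here is a minimal Python sketch. The scenario probabilities, candidate plans, and scores are all invented numbers; the sketch only shows the mechanics of comparing plans by expected outcome and by worst case across several scenarios, rather than waiting for one scenario to be confirmed.

```python
# Hypothetical numbers only: a "portfolio of beliefs" over three broad scenarios,
# and two candidate plans scored in each scenario (higher is better).
scenarios = {"goes_well": 0.40, "single_actor_dominance": 0.35, "extinction_risk": 0.25}

plans = {
    "wait_for_consensus": {"goes_well": 9, "single_actor_dominance": 1, "extinction_risk": 0},
    "prepare_now":        {"goes_well": 7, "single_actor_dominance": 6, "extinction_risk": 4},
}

for name, scores in plans.items():
    expected = sum(p * scores[s] for s, p in scenarios.items())   # probability-weighted score
    worst_case = min(scores.values())                             # robustness check
    print(f"{name}: expected={expected:.2f}, worst case={worst_case}")

# With these made-up numbers, "prepare_now" scores better both in expectation and
# in the worst case; the point is the mechanics of planning across a belief
# portfolio, not the particular numbers.
```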

SPENCER: No, I really appreciate your answer. So anything else you want to leave the listener with? If there's one thing that you want them to remember from this conversation, what is it?

GABRIEL: Alignment is pre-paradigmatic. We're not ready yet. When it becomes a science, it will be pretty obvious, and we're going to see it in our societies: we'll be more confident in building better institutions, in building better regulations, in living better lives and finding out our values. And right now, we're not there.

SPENCER: Yeah, on the values piece, it's kind of surprising to me. We spent a ton of time researching values, trying to understand all the values out there, building a test for values, and it seems to me that it is really relevant to understanding how to build safe AIs, but I've just seen incredibly little interest in this way of thinking. Not to say that values are the only approach to it, but it has surprised me, the lack of interest in trying to really understand human values so we make sure AI systems actually act in accordance with them.

GABRIEL: It's just hard, and it's not fun for AI engineers to work on that.

SPENCER: Gabe, thanks so much for coming on.

GABRIEL: Yeah, and thanks for having me.

[outro]

JOSH: A listener asks: "There's this notion of audience capture, meaning that creators who develop a large social media following often will end up being dependent on delivering what the audience wants or demands instead of creating what they normally would create, what they maybe started out creating, that kind of thing. Have you felt that kind of force on yourself? Or have you seen others who write about similar topics to you succumb to that kind of pressure?"

SPENCER: I feel that I almost don't have that sense at all, that there's a pressure for me to do things in a different way than I want to do them. I think the main thing that's similar to that, that I feel, is that I want to package the work I do in a way that's palatable to people, meaning: make sure it has an interesting intro, give it an interesting title, and try to do some storytelling in it along with the ideas. Because my natural inclination is just like, "Here's a list of ideas!", right? But that doesn't always resonate with people, and it isn't always interesting to people. So the main thing is just trying to package the ideas in a way that's more palatable. But I don't really feel like that's a bad thing. I mean, I think that's perfectly fine. I don't really feel any pressure to distort the ideas. Thankfully, I feel like my audience is very thoughtful and wants me to communicate in a direct way where I try to say what I think is true and try to provide evidence. And so, overall, I feel like I have an audience pressure that's healthy, which is just to try to produce high-quality work that's valuable.
