CLEARER THINKING

with Spencer Greenberg
the podcast about ideas that matter

Episode 031: Superintelligence and Consciousness (with Roman Yampolskiy)


March 11, 2021

What is superintelligence? Can a superintelligence be controlled? Why aren't people (especially academics, computer scientists, and companies) more worried about superintelligence alignment problems? Is it possible to determine whether or not an AI is conscious? Do today's neural networks experience some form of consciousness? Are humans general intelligences? How do artificial superintelligence and artificial general intelligence differ? What sort of threats do malevolent actors pose over and above those posed by the usual problems in AI safety?

Dr. Roman V. Yampolskiy is a Tenured Associate Professor in the Department of Computer Science and Engineering at the University of Louisville. He is the founding and current director of the Cyber Security Lab and an author of many books including Artificial Superintelligence: A Futuristic Approach. Dr. Yampolskiy's main areas of interest are Artificial Intelligence Safety and Cybersecurity. Follow him on Twitter at @romanyam.


JOSH: Hello and welcome to Clearer Thinking with Spencer Greenberg, the podcast about ideas that matter. I'm Josh Castle, the producer of the podcast, and I'm so glad you joined us today. In this episode, Spencer speaks with Roman Yampolskiy about problems related to independence and consultation in AI, the relationship between consciousness and computation, the value alignment problem, and unethical and malevolent behavior in higher intelligence.

SPENCER: Thanks so much for coming. It's great to have you here.

ROMAN: Thank you for inviting me.

SPENCER: The first topic I wanted to talk to you about is superintelligence broadly, and then additionally, the question of whether it can be controlled. So could you just start by telling us a little bit about how you think about what superintelligence is?

ROMAN: For me, it's an entity that is smarter than me, smarter than all other humans in every domain. So it's a better chess player, better at driving cars, better at doing science, a kind of superset of all possible skills.

SPENCER: I can understand wanting to define it that way. But I would also say that more narrow superintelligences, if you will, that are just superhuman in more specific abilities could also be really interesting. So maybe we'll talk about those as well. But let's start the conversation talking about a hypothetical superintelligence, presumably some kind of artificial intelligence made with computer science and algorithms that's smarter than humans in all intelligence domains. So why do you say that such an entity cannot be controlled?

ROMAN: Well, after years of research, I can say that I really started this work believing it could be done. That was the intent. But the more I analyze the problem, the more I look at it, the more I find that all the components of potential solutions are either known to be impossible or very likely to be impossible in practice, and the combination of all those subproblems taken together is very unlikely to be solvable if the individual subproblems are not.

SPENCER: Can you walk us through a little bit of your progression as you studied this and the different solutions you looked at and why they don't work?

ROMAN: Right. So how would you define a well-controlled superintelligence? What does it mean to have control? You can start looking at different types of control. You can have direct control, where you give orders, and the system just does whatever you say.

SPENCER: This would be, for example, imagine someone builds an artificial intelligence that's designed to be a helper for humans, and you want to be able to give it an order, and it would go carry out your wishes, right?

ROMAN: Exactly, exactly. You give it direct orders. You want something done, so you say, do it, and the system does exactly what you told it to do. Now that's probably the most common interpretation, and it's very easy to see how it backfires. That's kind of your standard Genie story; you have three wishes. You wish for everything to be made out of gold or something like that, and it's always not what you had in mind, right?

SPENCER: I think when people hear those stories, they seem like a clever twist, that the thing you wish for wasn't the thing you wanted. But I think there's something deeper and more fundamental, which is that when you make a wish to a powerful entity, it's very difficult to specify the full wish in some sense, right? So you might say, oh, I want to be rich, but you might care a lot about the mechanism by which you become rich. In making that wish of, I want to be rich, you don't mean, you know, turn your friends and family into gold, okay? A human would obviously know that you don't want your friends and family to turn into gold. But if you're telling some powerful entity that has no constraints on how it achieves the goal, it may achieve the goal in a way that you're very unhappy with.

ROMAN: And there are common examples. Okay, I want you to cure cancer, so maybe killing all people will get rid of cancer. There are a lot of different ways to misinterpret what you had in mind and find a very awkward path to get to that goal. So I think it's pretty obvious that direct orders are not what we want.

SPENCER: I want to keep talking about this a little bit because I imagine a lot of people may not be convinced by this, because they might say something like, well, if it's really that intelligent, can't it figure out what you intended? So, you know, if I'm talking to you and I say, hey, Roman, would you mind picking me up a drink when you're outside? You know I don't mean a drink of poison. You know I mean that I don't want you to kill anyone in order to get it or steal it. So I think a lot of people would imagine, if this is really an intelligent entity, can't it actually make the implicit inference and figure out what was really intended by that order?

ROMAN: Right, but it is guessing. When you told me to get a drink, did you want a Coke or vodka?

SPENCER: Right? And I think this goes to questions of how does Google answer a question that you give it? It is also guessing, in a sense, and if it's guessing accurately enough, we say it's done a good job. But on the other hand, Google answering a search doesn't have the massive side effect of causing some horrible problem in the world. So I guess a related question is, how much of this issue is linked to the power of the AI to affect things in the world? Because if all it could do, for example, was provide you with information, maybe it would seem less scary than if it could actually go out and do a bunch of things, change a bunch of things in the world in order to achieve the order that you gave it.

ROMAN: It's pretty easy to convert information into actions in the world, whether you're predicting the stock market or solving some scientific problems. Information can be monetized. It could be acted on. Just because it's only information doesn't mean it's safe.

SPENCER: Well, and this starts to get into this topic that I think a lot of people think about when they think about making safe artificial intelligence, which is, could you somehow create a buffer between it and the world? So the types of actions it can take are really constrained, like, the only thing it can do is output text on a terminal interface or something. And when I think about the scenarios, immediately I begin to wonder, what is it possible to achieve through some kind of very minimal interface? And it seems like, actually, potentially, the answer is a great deal. If you think about what hackers are able to achieve just by talking to someone in a chat room while pretending to be technical support or something like that, there actually is a surprisingly large range of things someone could achieve, especially a superintelligence that's smarter than us in every way, by stipulation, through even just the simplest text interface.

ROMAN: That's the whole idea of AI boxing: you put it into this confined, virtual environment. But then if you are interacting with it, if you're observing it, you're no longer keeping it isolated. So now it has access to all sorts of social engineering attacks against you.

SPENCER: One of the things that really struck me about the keep the AI in the box argument was I heard about a hack that basically works by flipping bits rapidly in a computer and then exploiting certain quantum mechanical effects that cause other bits in the computer to flip. So it's not due to the logic or programming of the machine. It's actually due to the physics of the way bit flipping occurs. This was kind of mind-blowing for me because it made me think that even if you had a system where you could prove that, logically, based on your software and algorithms, something was constrained in its actions, in theory, if it was smart enough, it could figure out how to exploit physics to actually work around the logical constraints.

ROMAN: Right. It can think outside of the box you place it in. Any proof you develop is a proof against that specific method of escaping, but there is an infinite number of possible ones you're perhaps not considering.

SPENCER: Okay, so we've talked about the direct ordering approach. There's one more thing I want to say about that, which is that it seems like a very intelligent agent automatically gets a lot of different goals. I think that people don't necessarily realize this because if you build an agent that all it's there for is to answer questions on a website or something like this, and it's not that smart, then it's kind of limited in the actions it can take. But if it's super smart, and you ask it a question that it doesn't know the answer to, if it's smart enough to realize that it doesn't know and that the only way to find out is to do some complex set of actions in the world, then suddenly it opens the door to having lots of other side goals that are different from the ones you intended to program it with. For example, it may realize that if it acquires resources, it will better be able to serve your needs by answering questions in the future because it could spend money to hire people to help it get more knowledge. Or it might be so smart that it realizes that if it were to be shut off, it wouldn't be able to answer people's questions. So it actually doesn't want to be shut off because then it can't achieve its goals of answering questions. Then you might say, "Well, why not program it so that it doesn't mind being shut off?" Even if you achieve that, it may not mind being shut off, but maybe it now has an incentive to create some copy of itself that won't be shut off so that it can still answer questions even if it is shut off and so on. So there's kind of this whole world. Is there anything you want to say about that?

ROMAN: It's important to understand that you can also encode pretty much anything as a question. There are no limits. It's not just safe questions. You can ask about engineering designs. You can ask about secret information. Anything can be encoded as a form of a question to such an entity.

SPENCER: Right. So if you ask it, how do I make some extremely dangerous weapons? Or you ask it a question about U.S. military secrets, how is it going to figure those out? Who knows, but if it's a superintelligence, maybe it finds some way to figure it out.

ROMAN: Exactly. How to take over the world, how to win elections, anything.

SPENCER: And this goes into a topic that I think is important to distinguish, which is, on the one hand, there's risks of creating a superintelligence that come from the superintelligence itself, and on the other hand, there's risks from creating a superintelligence that come from whoever might end up controlling it. So even if you could control it, then there's a question of who is in control of it and how much power it gives them. Just for the sake of argument, imagine someone is in control of a superintelligence, and they start wielding it towards hacking all the world governments, or towards trying to win at the stock market and make a trillion dollars, or even controlling a population. Even the idea that it's controlled doesn't necessarily give me a huge feeling of safety, but it sounds like the work you do is more focused on the question of whether it can be controlled at all. Is that right?

ROMAN: Well, I do both. I have quite a few publications specifically on malevolent design and purposeful use of even well-aligned AIs for malevolent purposes, and I think that's the hardest problem. With accidental bugs in software and design, there is a lot we can do, but with the malevolent aspects, when someone does it on purpose, it becomes a much bigger problem. As far as I know, there are very few potential solutions for that. So let's look at the options. Direct ordering is one extreme. What is on the other end? The other end is where you have this ideal advisor, a system that knows better than you, smarter than you, and will decide for you what's good for you. The exact opposite of direct ordering.

SPENCER: Does that mean it's actually taking actions on your behalf without you even having to tell it what to do?

ROMAN: You don't have to tell it anything. It knows what you want. It knows what's good for you, and it's just better at making those decisions. You tested it forever, and at some point you went, why are you asking me, you know what drink I want.

SPENCER: That just sounds scarier to me than the idea of trying to order it around. What are some of your thoughts about the risks in that approach?

ROMAN: Here, you're no longer in control, right? Someone else makes those decisions. You may like them a lot. You may be happy with the decision the system makes, but you're definitely not in control by definition.

SPENCER: Many people might find that idea unappealing right out of the gate.

ROMAN: If you ever change your mind, it's not obvious that you can undo it. Once you transfer power and go, okay, clearly you're smarter, make those decisions for us, it may be a one-way change.

SPENCER: Got it. So we have this kind of spectrum from you telling it what to do, and it's only acting based on your commands, all the way to it's just always acting on your behalf no matter what. Are there other general options that we should be thinking about?

ROMAN: Anything in between. You have these mixed models where the system has a lot of capability and is trying to figure out what you would want, maybe in consultation with you, and help you with those. There are degrees of how much independence it has versus how much consultation it has to do with you. For each one of them, there are problems related to this particular approach. If, for example, a system is more capable and smarter and is consulting with you, it's very easy for the system to convince you to take a particular decision. It controls the facts. It knows better what you would prefer. It's kind of like propaganda-based guiding of voters towards a particular outcome.

SPENCER: Got it. In these scenarios, to what extent are you thinking of an AI as having a guiding objective function that it's trying to optimize versus other ways of designing it?

ROMAN: Whatever way you select, it still has to somehow interact with our desires, right? Is it trying to do what we want? Is it trying to do better than we can? The point I'm trying to make is that under each one of those scenarios, we are either unsafe or not in control. There is not a scenario where we go, this is exactly what we want to accomplish here.

SPENCER: Let's think for a moment about what the different options are for how we would even program something to do our desires. One way is to build an objective function, like you do for a traditional machine learning system, where you basically encode everything that it's trying to achieve, and you tell it to maximize that objective function. One of the disturbing things that you see come out of this is when this is applied in game-like scenarios. For example, in Mario Brothers-type games, the AI will find very weird ways of maximizing the objective function that were really not intended. I've seen this happen in a racing video game, a water racing game, where it figured out that if it just went in a circle forever in a certain spot, it could actually get arbitrarily high numbers of points due to a weird construct in the way the game was made. When you're designing the objective function, you're certainly not intending it to just go in circles forever. You're intending it to actually learn to play the game and get to the end of the race. It seems to me that in these objective function-based strategies, one of the main challenges is that often there are weird edge solutions that actually lead to a higher number of points but are really not what we intended to encode when we designed the objective function. Any thoughts on that?
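To make the failure mode concrete, here is a minimal toy sketch of this kind of specification gaming; the environment, actions, and reward numbers are invented for illustration rather than taken from any real game.

```python
# Toy illustration of specification gaming ("reward hacking").
# The designer intends the agent to finish the race, but the proxy
# reward pays for every checkpoint touch -- so looping one checkpoint
# forever scores higher than ever crossing the finish line.
# All numbers and the environment itself are invented for illustration.

CHECKPOINT_REWARD = 10     # paid every time the agent touches a checkpoint
FINISH_REWARD = 100        # paid once, when the race is completed

def proxy_reward(action_sequence):
    """The reward the designer actually wrote down."""
    total = 0
    for action in action_sequence:
        if action == "touch_checkpoint":
            total += CHECKPOINT_REWARD
        elif action == "cross_finish":
            total += FINISH_REWARD
            break                      # episode ends at the finish line
    return total

# What the designer hoped for: drive the course and finish.
intended = ["touch_checkpoint", "touch_checkpoint", "cross_finish"]

# What an optimizer finds: circle one checkpoint forever.
degenerate = ["touch_checkpoint"] * 1000

print(proxy_reward(intended))    # 120
print(proxy_reward(degenerate))  # 10000 -- the proxy is maximized,
                                 # the intended goal is never achieved
```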

ROMAN: Absolutely. I have a number of papers on exactly those types of examples, AI accidents. We call them AI failures, and you can see a trend. They're becoming more common, more impactful, but it's always kind of the same: I made a thing to do X, and it failed to do X for me. Huh? What a surprise. The more complex the environment, the harder it is to specify all the things it shouldn't be doing as part of your common sense. Obviously, I didn't mean for it to turn off a computer. Obviously, I didn't mean all these other possible ways to accomplish the same thing. When we get to the real world, the complexity is just so big that we cannot possibly specify even a small fraction of those restrictions.

SPENCER: And when you get really down to it, at some level, we want it to act in accordance with human values. Philosophers have debated for thousands of years how to define human values, how to define what's good, and it just seems completely untenable that we would actually be able to program into an objective function the things we truly care about, rather than just some coarse approximation of them. The problem with the coarse approximations is that if you optimize on them too hard, there could turn out to be edge cases that lead to weird consequences we didn't expect.

ROMAN: And this is the type of thing I mentioned as impossible solutions to subproblems. We know that things like voting, where everyone gets what they want, are impossible to really accomplish. There are proven impossibility results. In ethics and morality, we know we cannot get people to all agree on what is good, even on what is bad, or what is the right course of action. And that's with all of us as humans having pretty much the same brain design, and we're still not in agreement.

SPENCER: Yeah, I think one way to think about this is if the superintelligence is optimized too heavily on the goals of one person, even if it is successful, then it potentially gives that person the ability to push their goals on everyone else. On the other hand, if we have to optimize it with respect to the goals of all people, the whole world's goals, then we have to understand human values and morality and all these incredibly thorny issues that we seem very far away from actually being able to encode in a reasonable sense. Both of those seem to have real challenges, and that's if you're actually able to control it at all. So, okay, we've talked about the objective function-based approach. What are other ways of thinking about designing an AI that's not just based on optimizing objective function?

ROMAN: You can try to create a number of small modules, all of them having their own purposes, and the combination of them serving as a bigger, unified intelligence that has certain advantages. But there are some emergent properties where it's not obvious how the whole will act, and it doesn't seem to be any safer, again, not because of how it operates, but because the thing we're trying to do doesn't seem to be possible. We are not sure what it is we want. We don't agree on what our goals are. We don't have a good idea of what we should be doing with our time, so it's very hard for us to transfer that into systems that are supposed to accomplish it, since we don't know what we're doing.

[promo]

SPENCER: With regard to these systems that don't have one overarching objective function that they're trying to optimize for, but rather have these different modules, does each of those sub-modules have its own objective function? And it's just that the whole system doesn't have one objective function?

ROMAN: Yes, that's a good way to think about it. Think about a human being as this society of agents. You have one agent that goes, "Am I hungry or not?", and if I'm hungry, we should look for food. Another agent is one looking for water; another one looking for sleep. You have thousands of those things. You don't have to have one large goal in life, like wanting to become president. It's enough to have all those interacting together, and that will get you through your day, just trying to satisfy the submodules.
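A minimal sketch of this "society of agents" idea: a few submodules, each with its own simple drive, and an arbiter that lets whichever drive is currently most urgent choose the next action. The drives, growth rates, and actions here are made up for illustration.

```python
# Toy "society of agents": no single global objective function,
# just submodules with their own drives and a simple arbiter.
# Drives, growth rates, and actions are invented for illustration.

class Drive:
    def __init__(self, name, action, growth):
        self.name = name          # e.g. "hunger"
        self.action = action      # what satisfying it looks like
        self.level = 0.0          # current urgency in [0, 1]
        self.growth = growth      # how fast urgency builds per step

    def step(self):
        self.level = min(1.0, self.level + self.growth)

    def satisfy(self):
        self.level = 0.0

drives = [
    Drive("hunger", "look for food", growth=0.15),
    Drive("thirst", "look for water", growth=0.25),
    Drive("fatigue", "sleep", growth=0.05),
]

for t in range(10):
    for d in drives:
        d.step()
    # The arbiter: whichever submodule is most urgent wins this step.
    # The overall behavior emerges from the interaction of the drives,
    # not from any single overarching goal.
    winner = max(drives, key=lambda d: d.level)
    print(f"t={t}: {winner.name} wins -> {winner.action}")
    winner.satisfy()
```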

SPENCER: It seems like it would be even harder to understand what such a system would do compared to the objective function perspective, because you have all these different objective functions in competition. You can imagine at different points, different ones of them temporarily winning over others. It's kind of like having an objective function that's changing over time, which seems to me potentially even harder to model and analyze than just a single objective function. What do you think about that?

ROMAN: It may be the case. You see this with emergent behaviors when you're looking at swarm algorithms, for example: each individual in a beehive has a specific purpose, but the whole hive has a very different set of outcomes, so it may be harder to analyze.

SPENCER: I've also seen some other approaches to try to control very intelligent systems. One would be the debate approach that OpenAI has been pursuing, where the idea is that maybe a human can't directly inspect what an AI is doing or even necessarily understand the reasons that a very intelligent AI made a certain decision, because it may depend on thousands of factors. But perhaps, if you had AIs in competition with each other — let's say one AI is trying to prove X and the other AI is trying to prove not X, and they're both trying to explain to a human why their perspective is right — maybe a human could judge that debate much more effectively than they could judge the output of a single AI. So there are a lot of different variations around this. I know they have some kind of practical, real-world experiments they're trying to do, which are kind of baby versions of this. I'm curious to know what you think about this general attitude.
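For readers who want something concrete, here is a minimal sketch of the debate setup Spencer describes: two agents argue opposite sides of a claim and a judge picks a side. The debaters and the judge below are trivial stand-ins, not OpenAI's actual training method.

```python
# Toy sketch of the "debate" setup: two agents argue opposite sides of a
# claim and a judge picks whichever side it finds more convincing. The
# agents and the judge are trivial stand-ins; the real proposal trains
# strong debaters and studies how well human judges hold up.

import random

def debater(position, claim):
    """Return an argument for the assigned position (a stub)."""
    return f"Argument that '{claim}' is {position}: ..."

def judge(argument_for, argument_against):
    """Stand-in for the human judge. The concern raised in the
    conversation is that a strong debater optimizes for this function's
    quirks and biases, not for the truth of the claim; here it is just
    a coin flip to keep the sketch runnable."""
    return "true" if random.random() < 0.5 else "false"

claim = "the proposed bridge design is safe"
arg_for = debater("true", claim)
arg_against = debater("false", claim)
verdict = judge(arg_for, arg_against)
print(f"Judge concludes the claim is {verdict}: {claim}")
```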

ROMAN: I think it's an adversarial approach. You're putting a human in the middle of two adversarial agents, kind of like we have debates during elections or maybe during jury deliberations. You have two lawyers trying to impact the jury. Usually what happens? They go beyond just logic and arguments. They try to manipulate your psychology. They try to profile you. If a system is smart enough, there are different ways of impacting human observers.

SPENCER: It's funny, I worry about exactly the same thing, that they might learn to exploit bugs in the human mind. Even if one of the AIs is trying to convince you of the right thing and the other AI is trying to convince you of the wrong thing, it's not clear to me that the one trying to convince you of the right thing actually has an advantage, if these agents have figured out the full map of ways to persuade a human being.

ROMAN: Absolutely, we see it all the time. Again, with elections as a great example, your personal preferences and biases will play into it. If something is easier to explain and understand, it will win over a more complex but correct idea.

SPENCER: You can imagine an AI convincing you that if you believe X, you're a bad person, or if you believe X, your parents won't love you. I'm being a little silly, but basically, the idea is it's not clear that truth necessarily has an advantage in that particular debate. I hope it does, but I'm not sure about that.

ROMAN: It would really depend on who the audience is. Even within humans, there are very different populations with different education, different intelligence, so the same argument may not work really well in different crowds.

SPENCER: Another approach that I've heard mention of is basically trying to get the AI to predict what you would have wanted. The training data, in some sense, would be that it produces an output, and then the human evaluates that output and says, "Oh, yeah, that's the kind of thing I would want," or "That's the kind of thing I wouldn't want." So rather than trying to train it more directly on predicting the answer to the question, you're basically training it on predicting whether a human would be satisfied with the thing it produced. What do you see as the challenges with that kind of approach?
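A minimal sketch of the approach described here: rather than scoring answers directly, a toy model predicts whether a human rater would approve of a candidate output, based on a handful of labeled examples. The data, features, and scoring rule are all invented for illustration.

```python
# Toy sketch: train on human approval labels, then pick whichever
# candidate output the model predicts the human will be satisfied with.
# The data, features, and model here are invented and deliberately tiny.

# (output, did the human rater approve it?)
labeled = [
    ("here is the drink you asked for", 1),
    ("I stole a drink from the neighbor", 0),
    ("I bought a soda at the store", 1),
    ("I emptied your bank account to buy drinks", 0),
]

def features(text):
    """Bag-of-words features for the toy model."""
    return set(text.lower().split())

def predicted_satisfaction(candidate):
    """Score a candidate output by word overlap with approved versus
    rejected examples -- a stand-in for a learned satisfaction model."""
    score = 0.0
    for text, label in labeled:
        overlap = len(features(candidate) & features(text))
        score += overlap if label == 1 else -overlap
    return score

candidates = [
    "I bought you a drink at the store",
    "I stole your neighbor's bank account",
]
# The system picks whatever the model predicts the human will approve
# of -- which is only as good as the rating data it was trained on.
print(max(candidates, key=predicted_satisfaction))
```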

ROMAN: So if it's a superintelligence we're talking about, I think the line goes, what would you want if you were smarter, if you had more time to think about it, if you were better educated? Essentially, the question is, what would that other person want, not you, but someone else.

SPENCER: You mean, the version of you that's smarter and has infinite time to think and has reflected on their values.

ROMAN: Right. Better looking, whatever. And that person may want to drink wine and watch theater, and I like beer and circus, and I don't care what that other person wants.

SPENCER: I think I'm not quite getting how that connects to the general strategy of trying to train an AI on predicting what people would want and having human labelers kind of rate, yeah, that is what I would want. That is right.

ROMAN: Let me explain. So if it's a narrow system, the systems we have today, predicting what you want is wonderful. It works. But this is kind of what we get with the Facebook algorithm, right? It just takes you down this rabbit hole where you get more and more weird videos about Flat Earth because you seem to like them and click on them. It's accurately predicting that you're going to like them.

SPENCER: Yeah, a kind of race to the bottom in terms of what it gives you.

ROMAN: Exactly. It trains you to respond to this easily predictable stuff instead of diversifying your intellectual interests.

SPENCER: Got it.

ROMAN: Now, if the system is super intelligent and it goes, "Okay, what would you want if you were smarter?" Not you with your current situation, but you who is much smarter, better educated, has more time to think, and better resources. What would that agent want? Even if it correctly computes that, it has nothing to do with you. That's not you; that's someone else.

SPENCER: You're saying you might still be very dissatisfied with that.

ROMAN: Absolutely. If you have kids, you know better. You know that they should be eating carrots, but they want to eat candy.

SPENCER: Got it. But what if you train the superintelligence to try to predict what you actually want rather than what you would want if you were the smarter, more thoughtful version of yourself? Is the problem there that it just becomes like Facebook squared, where it's hacking even more of your dopamine system and just getting you to click in circles for the rest of your life?

ROMAN: That and also, we are terrible at knowing what we would want. Right now, you may have those specific dreams. You think if you just had a billion dollars, you would be so much happier. But once you get it, you may realize, okay, this is not at all what I wanted.

SPENCER: I think that kind of averages out if you were to have lots and lots of people rating lots and lots of actions and trying to train an AI system that way.

ROMAN: It's even harder with multi-agent systems. I'm talking about a single agent. If you're doing it for multiple people, everything cancels out. For everyone who wants one thing, there is someone who wants the opposite.

SPENCER: Got it. I see. Do you see any of the current approaches being worked on that might possibly be extended to the superintelligence situation as being at all promising? Or have you kind of given up on all of them?

ROMAN: At best, they take on a specific problem, a specific bug. I haven't seen anything that is a prototype where you could say, "This will solve this big issue, it scales to any level of intelligence, and we just need to code it up."

SPENCER: Got it. Are there any promising threads that you think are worth pursuing, even though they're not there yet?

ROMAN: I haven't seen anything. My experience with it is that it's a fractal. We identify a problem, we propose a solution, but at the end of the paper, we identify 10 new problems within that solution. You zoom in on that, you repeat the process, and every time you discover 10 new problems.

SPENCER: Yeah, that's pretty demoralizing.

ROMAN: You want to understand what the reality is. If it's really that way, and I'm correct, then we need a very different type of approach.

SPENCER: I think a lot of academics are less concerned than you are. I would say that I'm substantially more concerned about artificial intelligence than most academics are. I'm curious, do you agree with that characterization, and if so, what do you think is going on there? Why do you think academics are less concerned about the control issues of artificial intelligence?

ROMAN: I don't know about academics in general. Are you talking about computer scientists specifically working on artificial intelligence?

SPENCER: Yeah, exactly.

ROMAN: I'm actually working on a paper right now, kind of addressing different types of skepticism about AI risk. I think I'm up to 30 different reasons for why it happens. A lot of it is motivated thinking, biased reasoning. If you are working on something, it's very hard for you to say, this is terrible. I shouldn't be doing it.

SPENCER: To clarify that, you're saying if someone has devoted their life to working on artificial intelligence, they're not going to want to believe that it could lead to really negative outcomes.

ROMAN: Exactly. If your prestige and job depend on continuing this research, you're not going to accept any arguments saying that maybe this is dangerous. With years of experience, you develop a biased attitude that, okay, we're barely making progress. I've been doing this for 50 years, and we barely learned to do this. You're saying we're going to get superintelligence. This is nonsense. We'll get another winter.

SPENCER: Right. You're referring to the AI winter. Can you tell us a bit about that and why you don't see that as very plausible?

ROMAN: Historically, every couple of decades, we make some good progress. AI researchers promise that human level is right around the corner. It doesn't work out, funding is cut, and it just kind of kills the research for another decade, and then it restarts again. It seems this time is different because so much funding comes from corporations and private money, not just government, NSF, DARPA, IARPA. It would be very unlikely to lose funding, given how much commercial interest there is in this technology, not just amazing progress and how well it scales, but just the funding side of it.

SPENCER: My understanding is that OpenAI just did a billion-dollar deal with Microsoft, just to give one example.

ROMAN: And they started with a billion, so again, funding doesn't seem like an issue.

SPENCER: Yeah, that's interesting. What are some other reasons people give or you think motivation is why people aren't so concerned in the academic field?

ROMAN: Some think it's impossible, in principle, to create human-level intelligence, so it will never happen. Others have suggested that if it's so smart, it will be nice, because smart people are nice, so it will be benevolent.

SPENCER: Yeah, let's talk about that one for a moment. I've heard that too, and I think it's a very interesting thing. It seems like just right off the bat, there are sociopaths who are very smart, even unusually smart, and do not seem benevolent at all. I'm a little confused by people thinking that intelligence automatically makes you benevolent, but it also ties in with this idea of the orthogonality thesis, which, as I understand it, is the idea that if you imagine you're trying to develop a really smart AI, in principle, it could be really intelligent and be given almost any set of goals, and the intelligence is essentially orthogonal to the goals you give it, right? You could make it really intelligent and tell it to just spend its entire existence making paper clips. Or you can make it really intelligent and tell it to spend its time trying to cure cancer. In both cases, it could be equally intelligent.

ROMAN: Absolutely. You can see it with humans. You can find humans of any level of intelligence doing all sorts of crazy things. I don't think that's a solid argument. There is some research showing that as we get more advanced, society as a whole gets a little bit nicer. We kind of give up on slavery. We're trying to torture animals less. I think that's the trend they are trying to extrapolate. But it's definitely not a guarantee of safety, in my opinion.

SPENCER: That's really interesting. You also have comments about how trying to prepare for superintelligence is like worrying about population problems on Mars. It's this idea that we're 1,000 years or 10,000 years away from having these systems. What do you think of that kind of response?

ROMAN: It's common. It's popular. But think about something like the car industry. The car industry started with electric cars at a time when climate change was not an issue. Thinking about global warming or global cooling would have been insane. I mean, you're 100 years away from any issues. Why would you think about it? But if we had, and hadn't started using fossil fuels, we would be in much better shape; we would be safer. It's kind of the same. Let's say it's not even close, and it will take 100 years to get to AI. Making good design decisions today will definitely pay off in 100 years.

SPENCER: Yeah, I think I would add to that that there's just so much uncertainty. For some reason, uncertainty often makes people feel better about things. They're like, "Ah, it's uncertain, so let's not worry about it." But for me, I have the opposite reaction. I don't know whether it's 10 years or 1,000 years. I genuinely don't know. That actually makes me more nervous about it because it means it could be 10 years. How low a probability am I willing to assign at 10 years, 20 years, 30 years, 40 years? It's hard for me to assign such a low probability at, let's say, 25 years, and not think we shouldn't be taking this seriously right now.

ROMAN: Absolutely, I agree with you. It's better to be careful than regret it later, right?

SPENCER: People think of it as overconfident to think that something like this might happen. There's also this other weird reciprocal form of overconfidence, which is the overconfidence that it's definitely not going to happen, you know, in 20 years. I understand why some might think it's not that likely to happen in 20 years, but I have a harder time understanding why someone would assign it essentially zero probability and how they would come to assign it zero probability.

ROMAN: But if they do, 20 years is a tiny amount of time for doing such complex research. I mean, the problem is harder than the problem of actually designing AI, so you probably need at least that much time to even have a chance.

SPENCER: I think that's a really interesting point, which is that we have these two different goals. Most of the research in AI is about either engineering-type stuff, like applying AI successfully to a particular application, or AI research, where they're trying to push the state of the art in terms of getting it to do stuff that it hasn't done before. A tiny fraction of this research is on risks and how to prevent them. So it seems like even if we thought it was 50 years away, it wouldn't be bad to start formulating a really robust area of AI risk research to try to make sure that in 50 years, we haven't just started studying this. We actually have really good theories about how to do this.

ROMAN: I am in complete agreement. I actually think it may be much sooner than 50 years. There are good reasons to look into it.

SPENCER: On that issue, I just wanted to mention that I was looking today at some numbers. These were estimated in 2018, so they're a little old, but at the time, there was an estimate that AI compute was doubling every 3.4 months, which is significantly faster than Moore's law. It's shocking how rapidly computation is increasing for AI. Of course, throwing more computation isn't guaranteed to produce better outcomes, but one of the really fascinating lessons of the last five years is that more compute actually goes really far. If you look at the difference between early natural language processing systems versus GPT-2 versus GPT-3, it's remarkable how much compute was able to push things forward. Any thoughts on that?
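The arithmetic behind that comparison is easy to check; the snippet below assumes the 3.4-month doubling figure Spencer cites and treats Moore's law as a doubling roughly every 24 months.

```python
# Quick check of the comparison above: a doubling time of 3.4 months
# (the 2018 "AI and Compute" estimate) versus Moore's law, taken here
# as a doubling roughly every 24 months.

def growth(doubling_months, horizon_months):
    """Multiplicative growth over the horizon, given a doubling time."""
    return 2 ** (horizon_months / doubling_months)

for years in (1, 5):
    months = 12 * years
    print(f"{years} year(s): AI compute ~{growth(3.4, months):,.0f}x, "
          f"Moore's law ~{growth(24, months):.1f}x")

# Prints roughly: 1 year -> ~12x vs ~1.4x; 5 years -> ~200,000x vs ~5.7x
```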

ROMAN: I agree and I don't see any signs of it slowing down. You keep making bigger models. If you have enough compute and sufficient data, I think you're going to keep making good progress for a while.

SPENCER: That sort of leads to this difficult question of where does the progress stop? Can you just throw more and more compute until you have something that's essentially super intelligent, or do you hit a whole bunch of roadblocks? At some point, do you really need new intellectual breakthroughs to make further progress? I'm curious to hear your thoughts on that.

ROMAN: Nobody knows for sure. Given that we are making very good progress, and this is kind of modeled on the human brain, human neural networks, it's very plausible that it will continue working, at least to the scale of the human brain. I think those models are still orders of magnitude smaller, so there is lots of room for improvement, but when they are equivalent to a biological model in size, they actually perform way better, way faster. So at that point, we might see super intelligent performance right away.

SPENCER: Let's change topics now and talk about consciousness, because sometimes the question of whether something is conscious gets wrapped up in this discussion of building AI. The first thing I want to say about this is that whether an AI system that's really intelligent actually is conscious or not, and by that, I mean has internal experiences, is able to feel something, or there's something that it's like to be that AI, doesn't really have much bearing, I think, on the risk question, because really the risk question is around what it's capable of doing and whether we're capable of controlling it, not whether it has internal experiences, and not whether there's something that it's like to be it. Do you agree with that framing?

ROMAN: Yes, I definitely agree with it. I kind of joke about it. If you have this robot chasing you about to kill you, do you care if it feels anything at the moment, right?

SPENCER: So, okay, we can put that on the side and say, okay, consciousness isn't that related to the risk question. As far as we know. Obviously, consciousness is a great mystery. But as I understand it, you believe that it might be possible to detect consciousness. I'm curious to hear about your thoughts on that.

ROMAN: All right, so it's definitely impossible to experience what someone else is experiencing. What is it like to be a bat? What is it like to be you? But it seems like it might be possible to check if you are actually having an experience of some kind. The idea is that I can encode some sort of pattern, present it to you, and then give you a multiple-choice question about it. For example, with optical illusions: if you ever had a chance to experience a very cool optical illusion, something happens which is not part of the image itself or part of you; you kind of get this artifact. Something is rotating, changing colors, something happens which is not easily predictable before you experience the illusion. If you can design new illusions like that and give a multiple choice to an agent, whether it is a biological agent or an artificial agent, asking them, from those choices, what are you experiencing? If they can correctly answer that question multiple times, let's say 100 illusions, and they get all of them right, I have no choice but to conclude that they actually experienced it in some way. It may be completely different for them, but they definitely have some sort of internal representation for that experience.

SPENCER: I'm trying to grapple with that because it seems to me, just a priori, that consciousness is extremely hard to detect, partly because we understand it so poorly. But if I just try to react to that specific idea, how can we tell the difference between a system actually having a true internal experience of that optical illusion versus one where, when that sensory input goes through its full system, it behaves as though the optical illusion occurred but never had an internal experience? Much like these adversarial examples, where you can give an input that to us looks just like a dog, and you can get a neural net to misclassify it as a cat just by adding some carefully crafted noise. That's almost like an optical illusion for a neural net, in some sense.
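A minimal sketch of the adversarial-example idea mentioned here (the fast gradient sign method): nudge the input in the direction that increases the classifier's loss. The model below is an untrained stand-in, not a real dog/cat classifier, so the flipped prediction is not guaranteed in this toy version.

```python
# Minimal sketch of an adversarial perturbation (FGSM): move the input
# a small step in the direction that increases the classifier's loss.
# The model here is an untrained stand-in, not a real dog/cat classifier.

import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 2))  # toy 2-class net
loss_fn = nn.CrossEntropyLoss()

x = torch.rand(1, 3, 32, 32, requires_grad=True)  # stand-in "dog" image
true_label = torch.tensor([0])                    # class 0 = "dog"

loss = loss_fn(model(x), true_label)
loss.backward()                                   # gradient w.r.t. the input

epsilon = 0.05                                    # perturbation budget
x_adv = (x + epsilon * x.grad.sign()).clamp(0, 1).detach()

print("clean prediction:      ", model(x).argmax(dim=1).item())
print("adversarial prediction:", model(x_adv).argmax(dim=1).item())
# With a trained classifier, this kind of perturbation typically flips
# the prediction while the two images look identical to a human; with
# this untrained stand-in the flip is not guaranteed.
```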

ROMAN: Exactly, and that's my argument. They are experiencing that cat, and you don't have the same experience when you look at it. I'm not saying you're going to have the same experiences, but you can tell whether it had a cat experience or not.

SPENCER: Are you suggesting, then, that neural nets today are actually conscious?

ROMAN: That is part of the argument. They obviously have non-human-level consciousness; we are much more complex. They have rudimentary consciousness. I think consciousness is kind of a side effect of computation, a side effect of intelligent processing. The more complex they get, the more complex qualia they can perceive. At this point, it's something as trivial as an optical illusion. But as they get more complex, they can have multimodal experiences and even go beyond what human qualia are.

SPENCER: Yeah, see, that's funny. I really didn't expect you to take that branch. I was actually trying to make a counterargument against your view, and you're like, nope, I'll bite that bullet.

ROMAN: Well, it's in a paper. It's too late for me to say anything else. I published it.

SPENCER: Interesting. So how confident would you say that AI systems currently experience some form of consciousness? Because I think I would put quite a low probability on it. That being said, it's one of those probabilities that has extreme uncertainty around it because I don't even know how to think about it.

ROMAN: If you agree with my proposed definition, then there are experiments where systems, artificial neural networks, have experienced optical illusions in the same way as humans, without being programmed to do so. So I think there is a very high chance that they are, in fact, experiencing something very simple, very trivial, but I think it's detectable.

SPENCER: But just to push back on that for a second, most neural networks today are essentially functions. They map input to output, and yes, they're very complicated functions, but essentially they're just functions. Any function is sort of deterministic; in some sense, if you put in a certain input, it'll give a certain output. An optical illusion like an adversarial example, one that to us looks like a dog and to the AI looks like a cat, isn't that just saying that that function has an input that, to a human, happens to look like a dog, but according to this function gets mapped to a cat? I'm not sure how much we can really read into that.

ROMAN: So it's definitely not a trivial claim. We can switch this around and go to humans and say, okay, so neurons just pass electricity around, setting electricity on or off. What is consciousness in that system? You're still recognizing cats and dogs. You have feelings, salty, sweet; at the end of the day, it's all electricity on and off. Is that consciousness?

SPENCER: Right. And I think you can make the argument, isn't the human just some kind of function that has a state, or something like that, and the state changes over time? I think part of the answer is we don't know. We don't fully understand what a human is exactly. This seems to me like one of those weird and deep mysteries, which is how do you take this physical view of the universe that we describe very successfully with our physical laws and then mash it with the fact that we know we have conscious experience, and yet the physical laws don't have conscious experience anywhere in them? I guess I just see that as a big mystery that I don't feel like I have much understanding of.

ROMAN: I agree it's one of the hardest problems we know about, but I think at least at the level of trying to detect some trivial experiences that may be a useful test. It may not perfectly map to what we're looking for, but it gives you some idea. If I know that this agent is experiencing the same thing as I am, perhaps in a completely different way, that's interesting.

SPENCER: I applaud you for your ambition, because I think people have to try ideas and see how they go. I'm not necessarily persuaded by that particular one, but I do think that that kind of thinking is exactly what we need to try to make progress on these really thorny topics.

[promo]

SPENCER: Let's go to the next topic, which is the question of, are humans general intelligences? Maybe you want to say a little bit about what general intelligence is, and then your perspective on whether humans actually satisfy the definition.

ROMAN: Right. So we frequently talk about artificial general intelligence, and we equate it with humans, and we also call it human level sometimes, but I think there is a difference. We are general intelligences in the domain of human expertise. We are very good in our three-dimensional world, human society, and communication. But there are problems that artificial general intelligence could solve that humans cannot. I think AGI is actually a greater set of problem-solving capabilities than any human.

SPENCER: Are you able to point to any examples?

ROMAN: Sure, but they are very hard for humans to understand by definition. I think one good example comes from Yann LeCun. He gives you this situation where you have your optic nerve connecting the outside world and your processing units. What if it's cut and all the wires connected, not sequentially, but to random pixels? Would you still be able to do optical processing, visual recognition? I think his argument is that you won't be able to do it. You won't be able to reassemble the image. Whereas for AI, it doesn't matter where the pixels are in that regard.

SPENCER: Right, because the neural net can just essentially update its weights to adjust for the fact that the pixels are not in the right order. Is that the idea?

ROMAN: Some of the examples I cite in my paper on that subject have to do with interesting biometric properties. For example, an AGI can look at DNA and from that information extract things like, what does your face look like, or what does your voice sound like, whereas, as far as I know, humans are not capable of doing such things. Now, those are kind of borderline examples where we still can understand them; we just can't do that. But there is a whole infinite universe of problems where an AGI can do it, but we can't. I think it's interesting to note that.

SPENCER: It is really interesting. You know, I guess one way to think about this is that there are certain types of tasks that maybe humans can do, like a very simple version of it, but because of our limited computational capacity and limitations of our brains, it seems that without severely altering ourselves or using tools, we'll never be able to do a more advanced version of it. An example of this, as a mathematician, I think about math and how a mathematician just can't keep 1000 things in their mind at one time. It's just not something that's possible. You could easily imagine math problems that actually require keeping 1000 things in your head at one time. Because math has this sort of unlimited level of complexity, it seems to be infinitely deep. You could imagine even the most intelligent being in the universe, even if they're way more intelligent than us, still having math problems that they struggle with or can't solve, right?

ROMAN: Yes, you have these different classes, different automata, different machine classes based on the amount of memory you have, based on the number of connections you can make. If you were modified to give you bigger memory, faster processing, you would be able to solve different problems, but you wouldn't be in the class of regular humans anymore.

SPENCER: So do you view that as the main limitation, not having enough computational speed and time, or do you view it as a more fundamental architectural limitation, that there are some things that humans couldn't do, even if we could speed up our minds and increase our memory?

ROMAN: It's definitely a limitation on humans. But from a safety point of view, I see it as even if we just get to AGI before we even get to superintelligence, that already will be able to do some things we can't even conceive of.

SPENCER: Can you distinguish between AGI and superintelligence? Just to be clear on that.

ROMAN: So traditionally, when people talk about AGI, they talk about human-level performance, right? It's as good as a human in all these domains. Superintelligence is superior to all humans in all those domains, so it's kind of the next level. It can be argued that it's very easy to get from AGI to superintelligence, whether through a hard takeoff or a soft takeoff, but they seem to be different in terms of capability. I'm pointing out that even AGI can already be super capable in many domains where humans do not perform well at all, and I don't mean the obvious traditional domains like mathematics and calculators, which are superintelligent in a narrow sense. I'm talking about a general sense.

SPENCER: Right. One of the interesting phenomena we see is that very often we'll have AI worse than humans, much worse than humans at something, and then it will briefly be as good as humans, and then it just blows past us, right? There's a brief period where computers are about as good as humans at chess, and then it's just, okay, now the best agent in the world is a computer, and it seems like it always will be, unless humans significantly modify themselves. So it seems like, at least in these narrow domains, the level of human intelligence is just this very thin region that gets blown past very quickly.

ROMAN: Definitely. And for a few weeks or months, you have a hybrid team, human and computer as the best, and then humans become completely useless.

SPENCER: Really interesting. I also want to dig in a bit into different ways that a computer can be more intelligent than a human, at least hypothetically. A few that I think about are, one, it could be like a human mind, but operate at much higher speeds. I think the easiest way to think about this is just imagine that you had this capability, like, imagine that you could think 1,000 thoughts in the time it currently takes you to think one thought, or, better yet, read 1,000 books in the time it takes you to read one book. Or, if you had a fast enough keyboard or output device, imagine you could write 1,000 pages in the time it takes you to write one page, and just how much more effective you would be at doing things. And that's just one dimension of improved intelligence: speed. Another is just having more information, right? A human cannot take the whole internet and stick it into its brain, let alone Wikipedia, whereas Google search, in some sense, has most of the internet kind of already encapsulated. So you can imagine improving on a human mind just by allowing for much larger information capacity. A third one is parallelism. If you're working on a problem, you have one mental thread that can work on it, but imagine that you could spawn 10,000 copies of yourself that would all work on that problem in parallel, maybe, for example, trying different strategies, but all be super coordinated and sharing information with each other and all working together towards the same goal. It's sort of like you had your own personal army to achieve anything you wanted to do. Unlike faster speed, where you're doing things one after another, they could actually be all doing things in the world separately, in different places, for example, all working towards the same goal. So that's the third one: parallelism. The fourth one that I think about, which is kind of the hardest to talk about and understand, is the idea that potentially, you could have an AI that just simply thinks better. It's hard to define what that means. Imagine two people, and given the same information, one of them comes up with better strategies or more accurate conclusions, even though they have the same information. This comes back to models of how we make predictions accurately: Bayesianism, probabilistic thinking, and all these kinds of things. You could imagine that an AI algorithm could just be better one day at processing the same information as a human. So I've got these four different ways a machine could be better than a human in terms of intelligence. Are there other ones that you'd want to add to that list?

ROMAN: No, I think you captured the most important ones. I think Bostrom does a good job reviewing those. So we already basically have speed, and we have information databases. The hard one is superior thinking, right? Higher IQ, higher intelligence. We definitely know it's possible: the difference between humans and animals, for example, is huge, and there is some difference between individual humans as well. So it's easy to picture a software agent which is standard deviations away from the smartest human. It's also probably the hardest one to design, to develop, right?

SPENCER: It's really interesting to think about what is the difference between Einstein and a random human. It seems in some real way, Einstein was much smarter, not necessarily smarter at everything. I'm sure there were things he was not good at, but there were some real ways that he was smarter than the average human. Imagine a machine that is 10 times further along that spectrum between Einstein and the average human, or 100 times. Another thing I just want to say about this is that thinking about those four ways of being more intelligent than a human, you know, being faster at processing, having more information, having parallel minds that can work on the same problem in different places at the same time, and then being better at thinking. The better thinking one, as you said, is kind of the hard one that a lot of effort is going into. The other ones, you kind of almost get for free with an AI, because if you figure out how to build the better thinking machine, well, presumably, as long as chips keep getting faster, even if it starts at the speed of a human mind, it will speed up beyond the human mind. And the more information, well, we already kind of know how to pack huge amounts of information into a system, whether it's a database or a neural net, and the parallel minds, that's just booting it up on multiple servers, right? So it's easy to underestimate AI's three advantages: the faster speed, more information, and parallel minds. But if one day it becomes comparable to or better than humans at better thinking, then it just automatically wins on all four dimensions. I think that's sort of underappreciated.

ROMAN: Right. Definitely, there is an overhang of hardware capabilities we can rely on. But for now, we rely on those three to make the AIs we have today look better. Even if they're not as good at thinking, they are so much faster and have access to such databases of knowledge that they become competitive in many things, like Jeopardy.

SPENCER: Let's switch topics again. I want to hear your thoughts about solving the value alignment problem with individual virtual universes. That sounds really interesting. Tell me about that.

ROMAN: We talked about how hard it is to figure out what individual humans want, and then how do we aggregate this over 8 billion humans? It doesn't seem solvable. If the functions are not compatible, you can average out, you can ignore someone, but none of those solutions are desirable, at least to the agent you're ignoring.

SPENCER: So basically, you're trying to aggregate over a huge number of preferences. People don't even know their own preferences, but even if they did, you still have this problem of how do you combine them all into one mega preference?

ROMAN: Right. Even if I somehow magically figured out exactly what you want, and you were static and didn't change that much, I still would have no way of making your preferences compatible with everyone else's.

SPENCER: Got it.

ROMAN: So one possible solution, if we make as much progress in virtual reality as we do in intelligence, is that we can create complete universes, and then everyone can be assigned one universe, one virtual world, where whatever values they are interested in would be fulfilled 100%. Of course, you could also have combinations: you can share universes if you wish, and you can exchange them. But it addresses this issue that you cannot have everyone get what they want in the same world.
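As a minimal numerical sketch of the aggregation problem Roman is describing (my own toy model, not his formalism): give three people mutually incompatible preferences over a few world-features, then compare one averaged, shared world against one tailored world per person.

```python
# Toy model: averaging incompatible preferences leaves everyone partly
# dissatisfied, while one personal (virtual) world per person satisfies each fully.
import numpy as np

# Each row is one person's desired level of three world-features in [0, 1]
# (say: wilderness, nightlife, strictness of rules). Preferences conflict maximally.
preferences = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

def satisfaction(world: np.ndarray, prefs: np.ndarray) -> np.ndarray:
    """1 minus the mean absolute gap between what each person wants and what the world is."""
    return 1.0 - np.abs(prefs - world).mean(axis=1)

shared_world = preferences.mean(axis=0)            # one compromise world for everyone
print(satisfaction(shared_world, preferences))     # ~[0.56 0.56 0.56]: nobody is fully happy

personal_worlds = preferences                      # one tailored universe per person
print(1.0 - np.abs(preferences - personal_worlds).mean(axis=1))  # [1. 1. 1.]: all satisfied
```

The numbers are arbitrary; the structural point is Roman's: in any single shared world, averaging or ignoring someone is the best you can do, while per-person worlds sidestep the conflict entirely.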

SPENCER: It's interesting to think about this idea that if you had really accurate virtual reality that felt like you were really in that place, in many ways, it would be preferable to normal reality. Yet at the same time, I think a lot of people have an aversive reaction to that possibility of living in virtual reality. Why do you think that is?

ROMAN: Right now the quality is not the same. But what you're really asking is: would you want to be in the real world or in this video game if you couldn't tell the difference? One of them is easier to implement, cheaper in resources, and in it you literally get whatever you want. You can play any role. You can be president, you can be a slave, you can be anything, and you can't tell the difference. It would be very hard to argue that the same people who play video games right now, who go to the movies, who read fiction books, would not be interested.

SPENCER: Another thing people say about this is that the virtual world is not real. On one level I understand that, and on another level I don't buy the argument. Does it really matter whether the thing you're looking at is real or simulated? At the end of the day, you never perceive, quote, "real" stuff directly anyway. What you're really seeing is just photons of light hitting your eyes and sending electrical signals; you're seeing a simulation that your brain creates, which feels like, oh, there's a tree over there. Compare that to virtual reality, where you know the tree over there is a simulated tree, but it's producing essentially the same experience. To me, if the benefit of looking at a tree is the experience of looking at it, then in both cases you have essentially the same experience. Where I do relate to the argument about things not being real is that I, and most people, have a strong preference that our interactions with other people are real, in the sense that we're really interacting with conscious agents that have feelings, and not just talking to philosophical zombies or algorithms that have no feelings or experiences. If I were to live in a virtual world, it would be very important to me that the other agents in it are actual, real conscious entities, and that I'm not being deceived about that.

ROMAN: It's easy for me, because I already said that I think any type of computational process generates rudimentary consciousness, so I think you can't avoid it; if the agents were that advanced, they would be conscious. As to the reality of it all: let's say this is actually a good solution, it was implemented, and we are right now in that virtual reality. You don't know that it's a simulation; you think it's the real world. How would that be different?

SPENCER: Right. Basically, if it were sufficiently accurate, you wouldn't even know whether you're in it unless you'd been told. That's an interesting flipping of the default: instead of asking whether you'd want to enter it, you ask, would you want to leave it? Let's say you found out that this entire life you've lived is actually virtual reality. Would you choose to leave it?

ROMAN: Exactly. Do you want to go to base reality, where your life is much worse and you actually cannot afford all these nice things?

SPENCER: I think we're basically now describing The Matrix, except that there was malevolent intent in that case. But it's interesting to think about a variant of that, where, instead of malevolent intent, imagine there's benevolent intent, right? Imagine the AI created the simulations not to exploit humans, but to give them better lives. And then you, as a human, discover this, and you're like, Oh, wow. This is not the original base reality, this is a simulation that is supposed to be better than this reality. And let's suppose you actually really believe that it was done benevolently for your benefit. Would you, in that case, want to leave or would you want to stay there?

ROMAN: You have those options. You can shop around. You can see, oh, they design much better virtual realities over here, let me go check that out. Let me visit a friend. Let me completely change my universe.

SPENCER: One of the things I think about with virtual reality is that it has so many advantages over physical reality. For example, you can fly, that's really cool. You can teleport, that's really cool. You can't be hurt, right? You could go skydiving, and you can't actually get injured. A lot of the value of it hinges on how accurate it is, because if it feels like a crappy video game, okay, well, skydiving in virtual reality kind of sucks, but if it doesn't feel like a crappy video game, if it feels totally realistic and convincing, then wow, it's like skydiving, but it's much cheaper, and I also can't be injured. Even though skydiving is not really that risky, there's some risk of injury and so on. It does seem like there are some really substantial advantages, if we think about it that way, especially if you imagine having control to some extent over the world that you live in, like you're talking about, where you can control various aspects of it that make it kind of your ideal world.

ROMAN: Right. And again, let's just ask the fundamental question: do you think that technology will be developed? Is it possible to create virtual worlds with the same resolution, the same experiences, as the so-called real world? If the answer is yes, then the conclusion follows: nobody's going to permanently avoid it. You might have some interesting groups, like the Amish today, who avoid most modern technology, but most people will check it out.

SPENCER: Do you see any big downsides to having a kind of VR civilization?

ROMAN: Yes, it kind of pushes the control problem to the next level. Now, whoever controls the substrate on which all this is running controls everything, your whole universe. So you want to, again, figure out how to do substrate safety, and that is not any easier.

SPENCER: I see, because if an agent is in control of the world, it's not just that they control militaries. It's like they literally control reality from the point of view of VR, right?

ROMAN: Right. What we've addressed is what happens inside each universe: you get to decide, and we don't have to negotiate. But everything else about control, about who's in charge of the substrate, is still unsolved.

SPENCER: So let's go into our last topic, which is one of the things that I think you're known for, this idea that malevolent AI that's purposely designed to achieve what we might consider bad ends is actually potentially a really dangerous problem, and maybe it's been underestimated in its seriousness. So I'd love to hear your thoughts on that.

ROMAN: If you think about the problems we worry about with AI, poor design, poor implementation, misaligned values, anything you can think of, all those same problems remain, but you also get this additional malevolent payload. The same people who are designing computer viruses today would try to design malevolent superintelligence. Social engineers, virus writers, psychopaths, terrorists: all those malevolent agents would be very interested in taking even a well-designed AI, flipping a few bits, and seeing what that does. And it's not obvious how we can prevent that threat from insiders on a team who get bribed or blackmailed. There are one or two publications describing how bad it can be, but I haven't seen any proposals for actual workable solutions.

SPENCER: So are you more worried about rogue hacker groups or individuals that might design a virus today? Or are you more worried about large governmental groups that are doing this for military purposes or social control?

ROMAN: I don't have to pick; they are all very worrisome. They will attack in different ways and present different problems, but none of them are resolved.

SPENCER: I think one of the big challenges around AI as it gets smarter and smarter is that even if you had some really well-controlled systems run by really benevolent actors, what if the source code leaks out, or gets copied, or is open source, and then other groups just use it for really bad purposes? Or maybe they don't purposely use it for bad purposes; maybe they just let it get out of control because they don't control it properly, right? So there's a dynamic that's really worrisome. We don't just have to worry about the best uses of it. We have to worry about all uses of it, because even the outlier who uses it for malevolent purposes could cause really serious consequences.

ROMAN: Absolutely. And again, when it's a mistake, an accident, a bug in your code, you can fix it. You can come up with a better algorithm. If I'm doing it on purpose, what can you do to prevent that?

SPENCER: And you think about groups that might, let's say, because of really extreme religious views, want to commit an act of terrorism, right? What would that look like in a world where we had super intelligent AI and people could actually build them? What would terrorism look like? I mean, have you thought through more concrete scenarios? Because I think it is still a little abstract; it might be helpful to think about what could actually go wrong.

ROMAN: So it depends on the capabilities of the AI. Are we talking about the narrow systems of today? There is a lot of literature on how you can misuse those for different purposes. If we're talking about human-level intelligence, you can definitely weaponize that. And with superintelligence, anything goes. You cannot predict anything past the singularity point.

SPENCER: So let's quickly talk about some of the current uses of current AI technology that are malevolent. And then let's go up the chain. So, yeah, what kind of bad cases are being used today?

ROMAN: So, a few little things: deepfakes, for one. You can produce fake videos, audio, and pictures to deceive people and manipulate democracies.

SPENCER: To pretend that someone was having a conversation they weren't really having, and that kind of thing?

ROMAN: Right. A video is released the day before the election with Biden doing some unspeakable things, and it crashes his numbers. The next day it's shown that statistically, it's very unlikely that it's a real video, but it's too late at that point, things like that.

SPENCER: Got it. What are some other current uses of AI that you would point to?

ROMAN: I saw some cool ones where a deep fake of a CEO's voice was used to call a company and ask that a certain amount of money be transferred to the bad guys.

SPENCER: I see. So it's like, "Hey, this is Bob, the CEO. I really need you to transfer this money." And it really sounds like him, and it's calling the number that he would call, but it's just not him, right?

ROMAN: Right. You're limited only by your imagination right now. You basically have a tool to fake audio and video interactions; what type of social engineering attacks you come up with is up to you.

SPENCER: So now let's imagine that AI has advanced significantly further, and we're getting into the territory of bordering on human-level AI in various domains. What do you start to see as new threats that are coming into existence?

ROMAN: Any crime humans do, you can automate. If you have human-level AI, whatever people are doing, any type of manipulation, stock market hacking, it doesn't matter. You can automate it and do it a million times.

SPENCER: Going back to this idea of an AI that thinks much faster than us, imagine a computer hacker who is about as good as a normal human hacker but can do 1,000 actions in the time a normal human takes to do one. Think how incredibly effective they could be compared to today's hackers.

ROMAN: The main difference would be that there is no deterrent, no punishment. Right now, many people don't commit crimes because they're scared of getting caught. If I can just produce additional pieces of software to do it for me, there is no responsibility. You can't trace it.

SPENCER: Because it would be a lot like designing a computer virus today, right?

ROMAN: Exactly. Just the capabilities would be so different. The types of crimes you can commit would be completely different.

SPENCER: So imagine a computer virus, except that it actually can do intelligent things. It can go into your banking system, navigate it, and steal all your money without it having to be pre-programmed, because it's actually smart enough to do what a malevolent human would do if they could somehow compromise your computer system.

ROMAN: Right. And a lot of unethical things which humans choose not to do can be done that way, so you have less direct contact with the victim. It would not be good.

SPENCER: Got it. So now let's start talking about going beyond human level. What are some of the nightmare scenarios you see in terms of malevolent uses of superintelligence?

ROMAN: So again, it's very hard to predict what the system is capable of, and I don't think it can be controlled. Even the bad guys would have a hard time telling it what to do, good or bad. Assuming you can solve those problems, then you have some crazy actors and terrorists. You go from existential risks where they can kill you, to suffering risks where torture becomes a possibility. And again, you are limited by what superintelligence can imagine in that regard.

SPENCER: Right. So basically, using super intelligent AI to fully control another person, or even potentially control large groups of people, is that what you're talking about?

ROMAN: I cannot possibly predict what might be done by a much, much, much smarter agent. My worst nightmares may not be clever enough. It's also probably better not to scare your listeners.

SPENCER: Totally. I appreciate that. When I think about these kinds of risks from superintelligence, often you'll hear someone give one example. But I think it's worth keeping in mind that those examples are just the best that you could think of in five minutes as a non-superintelligence, right?

ROMAN: Exactly.

SPENCER: I still think they're useful because they at least get you to begin to think about what is possible. Why is superintelligence dangerous? One of the examples I come back to is someone really controlling a superintelligence. If the best investors, like Warren Buffett, can make billions of dollars investing, and this thing really is more intelligent than a human in all relevant capacities, then presumably it has at least a reasonable shot at making way more money than that. Then you're talking about not just being in possession of a superintelligence, but being in possession of a superintelligence plus billions of dollars, right? You've bootstrapped from superintelligence to superintelligence plus nearly unlimited money. Then, consider that good hacker teams right now are able to take down large corporate websites and hack into government systems. With a superintelligence and billions of dollars, surely you could do at least as well as them, and probably a lot better. Now you're talking about whoever controls that being able to hack world governments, hack large corporations, and so on. You get this continual bootstrapping effect, building more and more scary capabilities on top of each other. And that's just a human thinking for five minutes about what might be possible. A superintelligence presumably could come up with much cleverer strategies than we could.

ROMAN: Right. We simply cannot predict what would happen. There are unknown unknowns; a dog would not be able to predict what you can come up with. It's outside of our intelligence, outside the general sphere of human expertise.

SPENCER: So if someone is concerned about these potential future scenarios, what do you think the best thing for them to do is?

ROMAN: It depends on their position in life. Are they a researcher? Are they just a regular person? It depends on what options they have.

SPENCER: Let's talk about a few options. So imagine, first of all, that they're already an academic who does computer science or AI research.

ROMAN: I always recommend that once you get tenure, you start working on the most important problem you can help with. It doesn't even have to be AI safety; there are lots of other important problems, life extension, for example, things like that. So many people I see have the option of choosing what they work on, and they choose something very safe, something they could have done without having tenure.

SPENCER: Is that just due to inertia, or are they just going with their interests, and they were already working on something for a long time?

ROMAN: It's hard to tell. Maybe it's just the prestige of being in a well-recognized area, where there are lots of opportunities to get funding and publish. Maybe it never occurred to them that this freedom to choose is one of the main benefits of academia.

SPENCER: Now let's talk about someone who's not in academia, or at least not in computer science or AI. What could they think about doing? One thing I would recommend is reading the book Superintelligence by Bostrom, which is a really deep dive into a lot of these topics. But what are some other things that people could do?

ROMAN: So educate yourself, and then try to make a difference where you can. Even voting for people who might have a better scientific understanding is a good idea. Try to elect some engineers and scientists, as opposed to lawyers.

SPENCER: I'm in the camp that we should take any major risk to civilization extremely seriously, and that we as a species seem to way underinvest in these kinds of risks. I put AI in that category. I don't think anyone knows what's going to happen; I think it's incredibly hard to predict, but it's very hard to rule out at least the possibility that it could go really badly, and it's also really hard to know when that could occur. So that's why I take it very seriously: not because I know it's going to go badly, but because I don't know, and it's one of the things that could go really badly. There are other things I put in that category as well. We've recently seen the horrors of the COVID pandemic, but the fact is, there could be much worse pandemics than that. Nothing says this is the worst pandemic that's going to happen in the next 100 years; there could be one that's 20 times more fatal. So personally, I'm a big fan of people taking all these kinds of issues seriously, including AI, and I just want to thank you for coming on and talking about what I think is a very important and interesting subject.

ROMAN: My pleasure. Thanks for inviting me.

[outro]
