August 17, 2023
Can we really deeply change who we are? Can we choose our preferences, intrinsic values, or personality more generally? What are some interventions people might use to make big changes in their lives? Why might it be harder to be a generalist than a specialist? What are some of the most well-known "findings" from the social sciences that have failed to replicate? Do some replications go too far? Should we just let Twitter users take over the peer-review process? Why hasn't forecasting made major inroads into (e.g.) government yet? Why does it seem like companies sometimes commission forecasts and then ignore them? How worried should we be about deepfakes?
Gavin Leech cofounded the consultancy Arb Research. He's also a PhD candidate in AI at the University of Bristol, a head of camp at the European Summer Programme on Rationality, and a blogger at gleech.org. He's internet famous for collecting hundreds of failed replications in psychology and for having processed most of Isaac Asimov's mid-twentieth-century nonfiction to score his predictive performance.
SPENCER: Gavin, welcome.
GAVIN: Happy to be here.
SPENCER: I met you pretty recently and it immediately struck me that you're a polymathic sort of person who's just thinking about all kinds of different interesting ideas. I right away thought, "Oh, this person would be a great podcast guest," so I'm really excited for this conversation. Let's start with an interesting topic, one that people don't talk about very much, which is, can you choose who you are or is who you are essentially fixed? Are we just dealt some genes and maybe some childhood experiences, and that just determines who we are? Let's start there.
GAVIN: Yeah, exactly. I can only be hand-wavy here, for two reasons. One is that I'm not a psychologist. The other is that, even within psychology or the psychology of adjustment, it's not really that well covered, which is why your organization, Spark Wave, is able to do so many new and cool things. It interests me a lot. Adults often talk and act as if they were already fixed. But I've got a few really interesting case studies of people changing quite radically, which suggests it's interesting in itself as a scientific question: is personality fixed? Are preferences fixed? Probably not. And in terms of how you live your life, it's a vital question as well. Most people can see that really dramatic interventions work. One example: when I went to uni, I intentionally left my computer at home with the idea that I would go cold turkey on video games. I would be forced to go to the library and I'd read more and stuff. It's easy to see why such a thing would have large effects on your behavior, at least, if not your personality and self-concept.
SPENCER: What happened to you when you did that?
GAVIN: Indeed, I didn't play video games for five years, which was, at the time, completely unprecedented. I spent large amounts of time in the library, mostly wasting time in much the same way: not focused study, just more Wikipedia walks and reading random stuff. That sort of launched this intellectual exploration and made it part of my life. And yeah, it worked extremely well.
SPENCER: Yeah, I collect a long list of interventions that are occasionally completely life-changing, and the funny thing about them is they're very idiosyncratic. For any given person, they probably won't change your life. But this is the set of things where occasionally someone does them and they're like, "Holy shit, my life will never be the same." Examples would include doing a ten-day meditation retreat, or going to Burning Man, or having a near-death experience (not that you'd want to have that on purpose, but you know).
GAVIN: Right. One nice lens I like to use on this whole topic is hobbies and preferences and media, and what you consume. When I was a young man, when I went to university, I came up with this argument (extremely hand-wavy) that says something like: what you like constitutes a decent chunk of who you are. Your preferences are adaptive: if you do stuff, sometimes you grow to like it. You can choose what you do. Therefore, if your preferences really do adapt to your behavior, you can in some sense choose part of who you are. That's the hand-wavy argument I used at the time.
SPENCER: I like that. It reminds me of how I think about wine. I don't like the taste of wine. And I thought, "Okay, I know that I could come to like the taste of wine. If I just drink it regularly, I definitely would eventually at least grow in my appreciation of it." And I thought about, well, would I want to do that? I'm like, "Absolutely not." It's super expensive. It runs a risk (at least a slight risk) of alcoholism and so on. And I'm like, "Of all the hobbies I could choose, that sounds like a terrible one to choose." So then I've kind of just avoided learning to like it.
GAVIN: Yeah. That's an example where it's health conflicting with your other preferences. But I think a lot of the time with this, it's like an identity problem. You view yourself as the sort of person who doesn't do that thing. I didn't exercise throughout my teens and 20s, and now I'm really into weightlifting. And part of this was just shifting the idea of myself as an intellectual who rises above the body (the merely material or something).
SPENCER: How did you make that identity shift?
GAVIN: Well, in my case, I needed a pretext. So I looked up all of the amazing evidence and anecdotes about the psychological benefits of weightlifting and the other health benefits, particularly your bones and joints and stuff. So I had this pretext.
SPENCER: The nerdiest possible way to convince yourself to weightlift.
GAVIN: Yes, I know, I know. [laughs] And then I found a friend who was also a massive nerd and nerded out with me, so yeah, I eased myself into it. And now my self-concept is larger or something. It includes both the intellect and the body.
SPENCER: It's funny how it seems like sometimes what we benefit from most are the things most outside of our sense of what we would do. Nerds probably could benefit a lot from more exercise, whereas for a jock, more exercise on the margin is probably not gonna do that much, just as an example. Sometimes I think about things like psychedelics, where you look at the people who do lots of psychedelics and you're like, "Really, dude, you don't need any more psychedelics."
GAVIN: Yeah, this actually interests me. I looked for studies — How many hobbies do people in general have? How long do they stick at them? Are they lifelong projects? — these sorts of things. I tried to find some actual survey data; I couldn't really find much. But my impression is that most people have two or three and just stick to them. Maybe in midlife, they pick up carpentry or they pick up sailing, or they pick up one thing when they retire or something. I'm sure there is some research on this somewhere but I couldn't really find it. And yes, it seems like people do double down on their single comparative advantage rather than trying to branch out.
SPENCER: We've talked about how one way to define yourself, or choose who you are, is by trying to expand your sense of identity, or maybe purposely doing things that go against what your identity says. I remember an anecdote, I think from Eliezer Yudkowsky, who said that he wanted to try drinking alcohol once (and I don't know if he got drunk or not) just because he had never done it. And he's like, "Well, I don't want to not do it just because I'd never done it. That's a really stupid reason to never do something." And I think there's something to that. But what are some other ways that we can choose who we are besides playing with a sense of identity?
GAVIN: Yeah, it's a good one. Another one is the social aspect. If there's a group of people you want to understand, then go and hang out with them, see what they do, and just try what they do. One of the reasons I love this meta hobby of trying things out, trying to like more things and get into more things, is that it helps you understand massively more people and massively more types of people. I don't know which comes first: whether you go and hang out with them, or you get interested and then go and hang out with them.
SPENCER: Yeah, it feels like one of the strongest intervention forces is to put yourself in a social group that really cares about a certain thing X, and then there's a good chance that you're gonna start caring about it more and that you're going to change your behavior so that you'll be more that way. I've seen this happen in rationalist circles, where people become friends with a bunch of rationalists and then the way they talk about things starts changing rapidly. They start using probabilities, [laughs] they start caveating what they're saying, and so on.
GAVIN: Yes, and there we probably need to pour a little bit of cold water, which is that this can be used against you. Famously, high-demand groups or cults do this kind of thing in order to change your personality against your own interests. But all else equal (filtering out people who are preying on you), it's better to like more things. It's better to be able to speak more cultural languages or something.
SPENCER: But that also speaks to the importance of choosing the group wisely. Because some groups are actually just [inaudible] ones, and other groups are going to bring out maybe new sides of you that you want to cultivate, you want to become more like.
SPENCER: You mentioned personality. I'm curious about that because my understanding is that, with the Big Five personality test, for example, it gets pretty stable in adulthood. Let's say you were to take a Big Five personality test when you're 25, and then you take it again when you're 26. Those results will typically be pretty strongly correlated, maybe something like 0.6 to 0.7 correlation between the two results, something along that kind of level.
SPENCER: But that doesn't mean that we can't change at all, right? First of all, a 0.7 correlation is not a correlation of one; it's not a perfect correlation. Some of the gap is gonna be measurement noise, but some of it will be real drift. And we also know that, across the lifespan, personality tends to change somewhat. I'm curious about that.
GAVIN: Yeah, the big point I'd make here is that this sample will contain very few people who've done the sort of dramatic interventions I described earlier: not having a computer, not having a phone, intentionally depriving yourself or immersing yourself in this or that. Things happen to people, particularly over the 60-year course of some of the really cool longitudinal studies. But I imagine most people in the sample producing that 0.7 just won't be mixing up very much.
SPENCER: Right. So even if, on average, people's personalities don't change very much, it doesn't mean that you couldn't change a lot more if you're really strategic about it.
GAVIN: Yeah, the big age effect that I know from the big 63-year follow-up study was, there's this narcissism construct, which is just: Do you think of others? Are you greedy? Do you get out of the way? This kind of thing. There's an entire standard deviation decrease in that between 20 and 65.
SPENCER: And is that over the lifespan? Is that the same people being tracked or is that comparing generations?
GAVIN: Exactly. It's an age effect, yeah. So that's quite striking. I'm not aware of too many effects of that size. In general, yes, Big Five personality is mostly stable in samples of people going through an ordinary amount of turmoil in their lives and an ordinary amount of intervention.
SPENCER: What about neuroticism? I've heard that that Big Five trait tends to go down as people age.
GAVIN: I vaguely remember that, but I don't really recall. And then maybe, along with the narcissism thing I mentioned, that's probably related to agreeableness, which I think also goes up a bit. Yeah, I'm not sure about neuroticism.
SPENCER: Looking at one paper of how the Big Five changes across the lifespan, there's a paper, "Age Differences in the Big Five Across the Lifespan: Evidence from Two National Samples," and it's interesting. It finds that extraversion and openness were negatively associated with age, so that means they would go down with age, whereas agreeableness was positively associated with age, it would go up. And then I think they found inconsistent results for neuroticism. I think that, in one data set, it went up and, in one, it went down, but they were slight effects. Are there any other big interventions you'd point to for someone who really wants to change themselves and wants to get a much bigger change than people typically do?
GAVIN: I don't know. The only strong examples I have are mostly the hobbies stuff and the interests stuff. But one friend who was, again, a lifelong nerd is now a really huge football (soccer) fan and knows everything about Liverpool FC: every team lineup of the last 100 years, and so on. Now, as Scott Alexander noted recently, this kind of sports fandom looks a lot like nerdiness. It just happens to be on a topic which we don't associate with the nerdy sociological group or something. Another anecdotal one might be starting a business, or starting any incredibly challenging project which demands everything you've got and requires you to perform all kinds of unprecedented tasks. Yeah, I don't want to foreground anything in particular because they're all just anecdotal. It's all just what makes sense.
SPENCER: Yeah, I agree with that example though; I think being a founder really pushes you in all kinds of directions. It forces you to do all kinds of things you've never done before. And some of them are just stressful, but some of them are stressful and also meaningful and change you as a person and give you greater feelings of capability and agency and so on. That seems like a good one. Another one that comes to mind is therapy. It doesn't work for everyone but some people I think are radically improved by it, if they get the right therapist and they're working on the right problem. For someone with addiction, it can literally just completely transform them if they can get over an addiction. But even things like anxiety disorder, where they're avoiding a lot of things, or depressive disorder, where they tend to just sit at home alone all day long and feel shitty, it can be just absolutely life-changing. Did you have any other subcategories of how we can self-invent?
GAVIN: Yeah, another one which interests me is research interests. If you're an academic, you are more or less required to list two or three hyper-specialized little niches that you work in, and this forms your position in the market. And departments basically have job specs: we need a natural language processing guy, we need a Bayesian filtering guy, these kinds of things. For the academics I've seen, all the incentives point toward picking a couple of things, publishing on them all the time, keeping up with these incredibly small, focused areas, and then working on them till you die. [laughs] And this has always unnerved me a little bit because I can't really say what my two or three research interests are. We've been speaking about psychology for 15 minutes, and I have no particular psychology background; it just interests me. My field is nominally machine learning, and I'm quite interested in machine learning, but I definitely don't think about it every day. I definitely don't think there's anything that I think about every day. So the pointer here is something like generalists or polymaths or dilettantes (as I'd prefer to call myself), and the academic incentives pull against such people. But the question is, is there anything distinctively valuable about them? Is it a shame that the incentives point away from such people?
SPENCER: Yeah, it's interesting, because it does seem like there's strong pressure to go down a standard path, like if people are funding a certain thing, or your thesis advisor works on a certain thing or whatever. Whereas if you're dabbling in different fields, maybe it's harder to find a path to latch yourself onto. On the other hand, it does seem that a lot of invention or discovery comes from mixing things, bringing an idea from one thing into another, so it may give you a creative advantage to be that way.
GAVIN: Yeah, the standard justification would be cross-pollination: bring a method from field A into field B, and that's how you justify your existence in academic terms or something. Another shortcut, another cheat code I found, is that there are these fields which I call the methods fields: statistics and mathematics and philosophy and computer science. Knowing these gives you a license to work on a massive variety of things, because you know the methods, the actual underlying principles that the object-level scientists and researchers are working with. John Tukey, the statistician, has this great line, which is, "Be a statistician. It lets you play in everyone else's backyard." That is, in fact, the strategy I've taken. I have a good grasp of stats and a good grasp of programming and mathematical modeling, and this makes you quite welcome in a lot of different fields. So that's a weak sense of generalist: you understand a very common method, and then you're welcome.
SPENCER: I'm really drawn to those areas as well, the areas that give you powerful methods that can then be cross applied in many, many different things. Machine learning is a great example, or even certain kinds of thinking strategies could be an example of that, like critical thinking is one of the most broad kinds of tools.
GAVIN: We've got a professional philosopher on staff at my company who just reliably spots sloppy thinking. I don't know. I've got a philosophy degree. I'm relatively cynical about the field in some ways. But going through a philosophy PhD does indeed sensitize you to sloppy arguments, to ambiguities, to overloaded words and things. So I hope that my company's outputs are slightly freer of sloppiness as a result.
SPENCER: Let's change topics now. Let's jump to a topic that I know is at the intersection of our interests, which is the replication crisis. And I know that you've had some kind of personal experience related to the replication crisis. Do you want to tell us about that?
GAVIN: If I'm internet famous for anything, it's this really basic list of replication failures in psychology I put together. I was procrastinating on my qualifying exams for my PhD one January, so I just put together all of the famous things that I'd heard did not, in fact, replicate, or where the effect was much smaller than we thought. Famously, things like age priming, where you say words related to old age and then people are supposed to walk slower down a hallway after hearing these words. Or ego depletion, the idea that making decisions tires you out through a muscle-like mechanism, where glucose replenishes willpower and things like this. And then another one is mindset, like grit and these sorts of things, where it turns out that we can find the effect, but it's 20 times smaller than originally claimed.
SPENCER: Can you unpack that last one? Are you talking about grit's predictive ability, like people accomplishing their goals? Or what do you mean?
GAVIN: Exactly. There are two things. First, let's define the construct: growth mindset. There are two versions of this effect. One is just the predictive ability: if you have the belief that people can improve (which is closely related to what we were talking about earlier), what does that predict about your academic performance later? The second is the claim about going to a class, a single one-hour session about the importance and truth of growth mindset: what effect does that have on your academic performance? And certainly, the second turned out to be 20 times smaller than we thought when we did this enormous study with, I think, tens of thousands, if not a hundred thousand, participants.
SPENCER: Yeah, growth mindset is such an interesting example. It's an intuitive concept that, if you believe that, when you fail at something, that just means you need to learn more and work harder, to eventually succeed at it, versus if you believe that, when you fail, it means you're dumb, and you're never gonna get it. Intuitively, it makes a lot of sense that the first would be a more helpful coping strategy. And then a bunch of evidence came out that this was a really useful thing and it predicted performance. And then they did the intervention studies; they were trying to teach people growth mindset, and they still do seem to find some effect but, yeah, just really, really small. But then now you get into a debate, okay, so it's a really small effect but maybe it's still worth it. What is the average effect of an hour or two in school? Probably essentially nothing, right? So maybe, compared to other modules in school, it's actually a good use of time.
GAVIN: I think the problem here is just that there are thousands of these things. The prior on growth mindset being a good healthy thing for a young person to have, I agree, it's pretty high, it's probably quite harmless. If you're literally just taking an hour out of school to talk about it, the cost seems pretty low. The problem comes in more generally, if we don't have a filter, if we just allow intuitive priors to determine these things, well, then the floodgates are open. There are hundreds and hundreds of interventions, pseudo-interventions, ideas that we could be throwing, and this could quickly eat up large portions of your education with questionable effects.
SPENCER: I thought for growth mindset, though, we have more than priors. I thought we had pretty good evidence that it does have an effect. It's just that the effect size is pretty small. Yeah, I think you had the Cohen's d at 0.08.
GAVIN: That is right.
SPENCER: So in non-math speak: suppose we're talking about academic performance, and we measure it in units of its standard deviation, the typical amount by which people differ from one another. Then this intervention moves average performance by eight percent of that typical variation. But if it's only an hour, I still feel like that's pretty good. If it takes two weeks, then maybe it's more questionable.
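To make the 0.08 figure concrete: Cohen's d is the difference between two group means divided by their pooled standard deviation. The sketch below is a minimal illustration under the usual textbook assumptions (normally distributed outcomes, equal variances in both groups); the function names are hypothetical, not from any particular package. The second function translates d into where the average treated student would land in the control group's distribution.

```python
from statistics import NormalDist, mean, stdev


def cohens_d(treated, control):
    """Standardized mean difference, using the pooled sample standard deviation."""
    n1, n2 = len(treated), len(control)
    pooled_var = ((n1 - 1) * stdev(treated) ** 2 + (n2 - 1) * stdev(control) ** 2) / (n1 + n2 - 2)
    return (mean(treated) - mean(control)) / pooled_var ** 0.5


def percentile_of_average_treated(d):
    """Percentile of the control distribution reached by the average treated person,
    assuming normal outcomes with equal variance in both groups."""
    return 100 * NormalDist().cdf(d)


print(round(percentile_of_average_treated(0.08), 1))  # about 53.2
```

Under those assumptions, d = 0.08 moves the average student from the 50th to roughly the 53rd percentile: real, but modest, which is why the cost side (an hour versus two weeks) matters so much.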
SPENCER: What was your overall experience putting these together? Did it change the way you think about science? And I'm also curious how people reacted to it.
GAVIN: Yeah, not necessarily putting together the list, but certainly, the replication crisis was quite hard for me because, throughout my teens and early 20s, I was like this classic science fan type — reading deGrasse Tyson and Carl Sagan and Dawkins and so on, and indeed, Kahneman and Daniel Gilbert and people — really quite devoted to pop science, and did base some of my life decisions on these kinds of things. So it was quite difficult. Now, realizing that all science — and social sciences maybe in particular — are just really difficult, and that the methods we're using aren't really up to the task, or we haven't powered the experiments properly, or we're literally misusing the null hypothesis significance testing paradigm, or these sorts of things, that's a really crucial part of becoming a scientist. You need to understand just how insufficient the average standards are in your fields. And so it's good that I know, but yeah, it was quite personally difficult. I was very pleased by the reaction. In fact, my original list has been taken up by this volunteer group of junior psychologists and PhDs who have expanded it enormously. The original list was like 60, and we now have 400 effects. And it looks like we'll be keeping this updated, more or less indefinitely, which is just amazing. And so I definitely don't want this to become like the classic punching bag thing, because there are many, many, many people in psychology who are working incredibly hard, without necessarily many rewards, to try and clean things up.
SPENCER: That's really awesome that it's become a crowdsourced thing now, to build this out and keep it up to date. I love that. It's funny when I go through the list, I sometimes disagree with the analysis. I usually agree with it, but I sometimes disagree. I guess I have this feeling that sometimes replicators go a little too far. I've noticed this phenomenon where there'll be something that I think is real, studies will find it's real; replications will fail to find it and then people will conclude it's not real when I think in fact, it is real but has a small effect size. And I think this is pretty easy to do because, unless your replication is really high-powered, it can be really easy to miss a small effect size.
GAVIN: Yeah, a classic mistake is, 'it didn't replicate, therefore, it's not there,' where it didn't replicate is often just not reaching significance, [laughs] which is literally an abuse of the method. So yeah, replications which just default to null just return us to where we were before, we fall back to our prior. Yeah, I agree that certainly, exactly the same misinterpretations that applied to the original studies also sometimes happen in the replications.
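Both the worry that replications can miss small-but-real effects and the warning that "didn't reach significance" is not "not there" come down to statistical power. Here is a rough sketch using the normal approximation to a two-sided, two-sample test at α = 0.05; the 80% power target is the common convention, and the 100-per-group replication is a hypothetical example, not any specific study.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate participants needed per group to detect effect size d
    (normal approximation to a two-sided, two-sample t-test)."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    z_beta = Z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


def approx_power(d, n, alpha=0.05):
    """Approximate chance of a significant result when the true effect is d and
    there are n participants per group (ignores the negligible opposite tail)."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(d * math.sqrt(n / 2) - z_alpha)


needed = n_per_group(0.08)
small_study = approx_power(0.08, 100)
print(needed, round(small_study, 2))
```

With the growth-mindset effect size from earlier (d = 0.08), this approximation asks for roughly 2,450 people per group to reach 80% power, while a hypothetical replication with 100 per group would detect the effect only around 8% of the time. A null result from the smaller study mostly returns you to your prior.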
SPENCER: Yeah, and I have different opinions of different ones. I think a lot of the social priming stuff, priming people with words related to being old and then watching how fast they move, I think those just don't exist; they're just not effects. But then take something like power posing, where originally they showed power posing works, then it gets critiqued. And because there are all these flaws with the original study, people are convinced power posing doesn't work. But actually, I think the reality is that doing power poses does make you feel more powerful. It's just a very small effect [laughs], just not TED Talk-worthy, right?
GAVIN: Yeah, I remember your follow-up on this. I will defend some of the criticism because the original claims were about hormones. It's about the effect on your cortisol and testosterone.
SPENCER: Yeah, and those just don't seem to replicate.
GAVIN: No, they just don't. So we've got this nasty interaction between scientific questions and sociological stuff about, "Oh, that didn't deserve a TED talk. Let's bite back on this person in particular." But no, it's an incentive question, it's a method question which also, unfortunately, gets tied up with sociological stuff.
SPENCER: When you started learning all about this replication crisis stuff, did you feel like you had to unlearn a lot of the things you thought you knew?
GAVIN: Yeah. In fact, that is the origin of my list: "Oh, no, my brain is full of stuff which I keep telling people excitedly at parties. I do not want to do that anymore." First of all, though, my work is completely derivative of the work of the actual replicators, who put in hundreds of hours per effect, where I put in maybe half an hour.
SPENCER: Oh, we were just talking about unlearning. Was there anything else you wanted to say about that?
GAVIN: Yeah, so it's been viewed maybe 30,000 times. And I hope that the unlearning that I went through has helped other people in the same way.
SPENCER: Yeah, you have one of the best unlearning resources out there. [laughs]
GAVIN: That's right. That's actually another question, which is, why did I have to do it? Why did it take five years after the replication crisis became big news? Why was a random dilettante... and I don't want to be too cynical here. The null hypothesis has to be that most things don't get done: intellectual space is really large, people are busy, there's no agenda here. But I think there's also an incentive thing. I have slightly annoyed maybe a couple hundred psychologists, or at least, in principle, I potentially annoyed a couple hundred people. They'll never be on my hiring committee. But if I was in the field, that might be on my mind. And yeah, there's just no real glory. There are no academic rewards for this kind of distillation work, really.
SPENCER: Have you experienced that annoyance, like have people actually expressed it to you?
GAVIN: Yeah. If you're going for comments, [inaudible] post, there's a few people — oh, some of it's fair — some of it is people pointing out that, actually, it's a more limited effect which this replication targets rather than the general thing I claimed, but yeah, certainly, if you look on Twitter, there's a bunch of people wondering why this outsider feels entitled to crap on everything.
SPENCER: Yeah, I had an interesting experience with this. I have this idea of importance hacking. It's when researchers use methods to push a paper through peer review by making it seem important or valuable or interesting when it's not, when it's actually really uninteresting. And it only works if they sort of trick peer reviewers because, if peer reviewers thought the result was totally uninteresting, why would they accept it? When I posted about this, I got a lot of positive feedback. But from some insiders, I got negative feedback, where they're like, "This isn't a thing. This isn't real," and then other people jumped in, saying, "Listen, they're saying it's not real, and they're insiders, so who are you to say that it's real?" That's very interesting to me because, when I hear someone say, "It's not real," I would be like, "Okay, give me your arguments. I want to hear your actual arguments. Well, what about this case? What about that case? And what about that case?" But there's definitely a feeling like, "Oh, well, you're not an expert. Who are you to say that this thing exists? If people who are experts are saying it's not, then clearly you're wrong," which I get, and I think it's a reasonable heuristic. If a physics professor is claiming something about physics and some random person claims something else about physics, yeah, you're gonna go with the physics professor. But it works so differently from the way my mind works, where I'm like, "Okay, but give me the arguments. I don't care. You might be an expert but, if you can't explain your reasoning to me, then why should I believe you?"
GAVIN: Yeah, I think it's not a coincidence that you're a mathematician. Noam Chomsky has this wonderful line. Obviously, Chomsky has published unbelievably important theorems alongside his mainline linguistics work, and he says something like, "I've never had a problem publishing in mathematics. People check my arguments extremely rigorously and then publish them. But none of my work in political philosophy has ever gone unmolested, has ever made it through," something like that.
SPENCER: That's funny. Yeah. But here's the thing I found when I was debating some of these people: when I made it clear that I was listening really carefully to experts, that softened them a lot. Something like, "Look, I've read what the experts said. Here's point by point what they said, and here's what I think they might be missing." By showing that proper deference, people were like, "Okay, you really listened to the experts," and then they were more willing to listen to me. I thought that was interesting: you can accidentally trigger a sort of insubordination reaction if you're not careful.
GAVIN: Yeah. And I think, as a time saving thing, if you're a physics professor, I understand that you get lots of emails from people who have a new theory of everything. And if you stopped to really understand what was wrong with all of them, then you'd never get anything done. I've heard this from Sabine Hossenfelder and Scott Aaronson and so on. As a time saving thing, as a heuristic strictly, I guess there's this necessary evil or something. But no, my intuitions are with you on, ideally, we should listen to each other and actually analyze.
SPENCER: Yeah, maybe part of it is a question of how quickly you update. In a vacuum, if one person is a physics professor and one is some random person, I'm gonna trust the physics professor. But let's say the physics professor is saying a bunch of stuff where I'm like, "Based on everything I know about physics, that just seems completely wrong and off-base," and the other person is making really good points. Then I'm gonna quickly update and be like, "Okay, that physics professor may technically have a PhD, but maybe they weren't... [laughs]" or something. Whereas I think other people are much slower to update on the arguments.
GAVIN: Yeah, another line of thought here goes back to the peer review thing. I did a bunch of outreach on Twitter about epidemiology work: you publish your paper during COVID and then you immediately talk about it, because it might be important. And people really don't understand what peer review is. Or maybe somehow they've gotten the idea that all scientific peer review is like mathematical peer review. As I understand it, in mathematics, reviewers do actually check line by line. It takes a very long time, ideally something like a year, to review a lot of mathematical papers, whereas in most other fields I've published in or seen, it's really nothing like that. So people use "peer reviewed" to mean that the work is very unlikely to be wrong, when it means nothing like that. It just says this paper is no worse than average for this field. "This paper is not obviously absurd" is what I think peer review actually means in most fields.
SPENCER: I've had this funny experience of explaining how peer review works to certain lay people, and then being completely shocked, because they're like, "Wait, that's what peer review is?" because people's conception of what it is, is so different. So, for example, a lot of times when I publish, the reviewer knows who I am, which people are like, "Why would they know who you are? That's so biased. Shouldn't it be blinded to who the submitter is?" which I agree with. And I think there are some fields where you're blind to who the submitter is, but in a lot of fields, you know the submitter, which just seems strange. And another thing that I've mentioned to people — and they've been really shocked — is that they generally don't even try to run your code or look at your data at all.
GAVIN: Yeah, it's never happened. No one has ever run my code. No academic.
SPENCER: It'd be fun if you just put as the first line of code like, "Error! If you're seeing this, good on you." [laughs]
GAVIN: We could talk briefly about ways to fix this, or reform proposals people have had. A classic one is post-publication review instead of pre-publication review. Instead of a reviewer spending one hour trying to decide whether something should be published, everything goes up, and then we review and re-filter it afterwards, with public reviews and so on. Another one I haven't heard mentioned much is the idea that we should just unleash Twitter or something. Twitter is one of the best places on the internet for scientific criticism. It happens within days of the preprint going up, and you get back-and-forth between the authors and the critics in the threads. It's just incredibly good. And if we could incentivize this, if we just paid the trolls, basically, then I'd feel much, much better about peer review.
SPENCER: Yeah, some people have talked about this idea of, well, what if every paper had a comment thread underneath it, and people could critique it there? But I think the common reaction to that is then you get a bunch of nasty comments that are maybe off-base, but they could hurt people's reputations and maybe it just doesn't have that sort of scientific feel to it where people are just talking about the arguments. People do just say nasty things.
GAVIN: That's true. Yeah, I once daydreamed about designing a site for this, and you just have two comments sections: one is people who have demonstrated good faith in some way — whether that's by having the credential or just having a good track record on the site — and then there's the id, the unfiltered stream of consciousness section. That was one idea I had.
SPENCER: Yeah, I do like the idea of a comment thread beneath each paper where the commenters have tied their comments to their identity, so you can see, "Ah, this is a professor in this area," things like that. And maybe there are anonymous commenters, but they get downweighted, so their comments don't count as much or something. There's obviously a role for anonymous comments, too. Sometimes people have a really good point but they're scared to say it because of political things or whatever. So what do you think peer review is doing? I have the impression that it does screen out a lot of really bad things. If we didn't have peer review, there would just be real shit getting published, so at least it's stopping the obviously bad stuff. I'm wondering if you agree with that.
GAVIN: Yeah. We have the experiments already. We have medRxiv, and PsyArXiv, and arXiv (and also, what's it called, viXra, the archive for things that don't get onto arXiv). And indeed, in my specific small part of COVID science, you see some really, really egregious claims about how aerosols work, where it's almost unphysical. I find myself not too worried by this. I do have a relatively naive marketplace-of-ideas view here where, in fact, when this aerosols paper went up, it was very quickly exposed as nonsense, and everyone, at least in the field, caught the error and told everyone that it was an error. Did it end up misleading large numbers of people? Yeah, I usually feel that attempts to fix these things upstream (say, an attempt to not have this paper even added to the preprint server) are something we currently don't have the resources to do well. I'd much rather try to improve the downstream filtering, the ability to sort things which are already available, rather than attempt to clean the flow upstream.
SPENCER: It seems to me like an ideal system would be one where anything can get put up. But then there's a series of credentialing systems, some of them really, really stringent, like, "We not only had three experts read this, and they all said it was good, but we also checked the data," and whatever, from the very highest level all the way down to, "Yeah, we just put it up. Maybe it's crap, but nobody looked at it."
GAVIN: Even just what I described peer review as actually being, which is, somebody who knows something about this has looked at this and doesn't think it's absurd, that's a great badge. I think that's an extremely valuable service. I definitely want that to exist in the glorious peer review 2.0 system. But yeah, it's like the first badge that your paper gets, "Somebody has looked at this."
SPENCER: And it seems like in practice, journals, part of what they're doing is a peer review thing. But another part of what they're doing is they're just saying, "This is a meaningful result," or "This is a result that's worth reading," sort of like you can choose what news outlet to read things from and, if the New York Times publishes something, you're like, "Oh, that's more impressive than at some local paper." It's probably more interesting to a general reader than what the local paper's publishing about what's happening at the library or whatever. So when Nature publishes something, it's giving it attention, and it feels like that also is a valuable role. There should be some set of people deciding what gets attention. But then obviously, it doesn't mean that the current system is a good way to decide that.
GAVIN: Yeah, my experience with what Andrew Gelman calls the tabloid journals (because they publish from all kinds of fields and they publish all kinds of stuff) is that it's incredibly high variance. The best review I've ever had on any of my papers was at one of these big journals: the reviewer read the derivation of our estimator and refactored it for us, and made boatloads of incredibly thoughtful suggestions for how to firm things up and how to try to falsify our results. But the worst review I've ever had was also at one of these big journals. I don't fully understand why that is.
SPENCER: I don't know about you. But my experience with peer review, where I'm on the other side of it submitting my papers, is so varied. It ranges from extremely thoughtful commentary that really helps you make your paper better, to extremely annoying, pointless, time-wasting requests that you feel like you have to do. They don't really make any sense, but then you are worried you're not gonna get published if you don't do them.
GAVIN: One of my reviewers did just admit that they had never mathematically modeled anything. It was a modeling paper. That was quite striking.
SPENCER: What do you mean by that?
GAVIN: We had an epidemiology paper and the reviewer, in the course of the review, just openly said that they had never modeled anything. They're not an epidemiologist.
SPENCER: Seems like the wrong choice for reviewer for that. [laughs]
GAVIN: Well, yeah.
SPENCER: All right. Let's switch topics again. Let's talk about forecasting. More and more, I think people are coming to learn about things like prediction markets and the Good Judgment Project, which identified these superforecasters. But you point out that there's a problem that may be even more difficult than making good forecasts: getting anyone to listen to you when you make a forecast, even if you have a good track record. I'd love to hear your thoughts on that.
GAVIN: Yeah. The forecasting problem is estimating probabilities well. And then the greater forecasting problem is, as you say, getting anyone to care. A large part of my company's work is judgmental forecasting in the Philip Tetlock superforecasting mold. And it does confuse me, because we do have clients, and they do commission forecasts, and they're interested in them and use them. But more broadly, the industry is still incredibly small. It's been ten years since the ACE experiments, the big Tetlock experiments pitting superforecasters against CIA analysts, and it's still a really small industry: not that much uptake in the media, not necessarily that much uptake in policy. And it's confusing, because it is better information (we know, in expectation, that this is better information), and that seems incredibly valuable if you're making policy decisions that spend billions of dollars, where making them even one percent better is worth a lot. It seems like a slam dunk. So I have been thinking about why the uptake is so low. One thought is that the advantage isn't that large. We recently reviewed the evidence on superforecasters (the top two percent of forecasters) versus experts, and it seems like a three to ten percent sort of advantage, rather than the much larger numbers you sometimes read about. Also, having better probabilities doesn't necessarily guide your actions. It doesn't tell you what to do. It doesn't help you mechanistically understand your problem. That was true in COVID. Dominic Cummings, a very powerful policy person in the UK, was very big on forecasting, but much, much more interested in having SEIR models (the basic mechanism of how disease spreads in a population) guiding policy. Another thing is, you're upsetting the existing order. You have this extremely ancient system of experts who have careers riding on being the person people turn to. 
I don't know of that many explicit cases of guild behavior, of the forecasters being intentionally attacked or shut down or anything, but presumably some of that's in the background.
SPENCER: Yeah, it seems like COVID is a perfect example where forecasts could have really helped. Think about small businesses and how many of them had no idea what was going to happen to their business, or how long they were going to be closed for. Just having estimates around that could have helped millions of small business owners at least have a better sense of how to make decisions.
GAVIN: Early on, even hospitals. My friends ran a pro bono consultancy offering tailored case load estimates, and major American hospitals were amongst the people coming to them — we're talking April, May 2020, so very early on — but the New York hospital system didn't necessarily know how many cases they should expect next week.
SPENCER: It also seems to me that policymakers, at least in an ideal world, would be relying on forecasts a lot, especially conditional ones. If we were to implement such and such policy, what would the effect be on GDP or unemployment or whatever? But I wonder whether there's a resistance to that due to political issues. If you're pushing for a policy, you kind of want to give all the arguments why it's good, and none of the arguments why it's bad, and try to refute anyone who criticizes it, just from a political point of view, to push it through and get everyone excited about it. Whereas a forecast might come in and say, "Well, actually, it's probably not going to help with this thing. Or it has a 10% chance of making this thing worse." And while that's epistemically good, maybe it feels like it's practically working against your interest or something. What do you think?
GAVIN: I have a kind of edgy thought here, which is about innumeracy. There was this horrifying study in 2012 of British MPs. They asked them to give the probability of two coin flips both coming up heads, and most of them got it wrong. I think something like 57% of British MPs got it wrong. That's a horrifying reason why giving people probabilities might not actually improve policy. As for arguments as soldiers, the attitude might be, "No, I don't necessarily want to know the pros and cons. I just want retroactive justification for the policy I've already chosen." I expect that to be very explanatory.
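For reference, the question the MPs were asked has a one-line answer. Assuming a fair coin and two independent flips, the individual probabilities multiply:

```latex
P(\text{H then H}) = P(\text{H}) \times P(\text{H}) = \tfrac{1}{2} \times \tfrac{1}{2} = \tfrac{1}{4} = 25\%
```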
SPENCER: Right. There's a window of time when you're deciding on the policy, and maybe right at that moment, you would want different forecasts. But then once you're all in — you're throwing your weight behind it and people associate the policy with you — you don't want to know the evidence, you want to just know good things, don't want to know any bad thing.
GAVIN: Yeah. I have a piece on exactly this question, how policymakers can use forecasts, and even then (they can still just privately receive the forecasts), it still behooves them to know what's going to happen, to have a better idea of what's going on. It's not like they want to blind themselves. The argument against this would be that, even internally, you want to have people lining up behind the policy. You don't want to give excuses for even backroom dissent or something. I'm not sure how cynical to be there.
SPENCER: Someone I know who worked in the White House used to make estimates, and his boss would take the error bars off of his estimates before showing them to the politicians, and would justify it by saying, "Look, they don't want the error bars. They just want to know what's gonna happen." And it was interesting because it wasn't like this person had an agenda to make things go a certain way. They were just reacting to the local incentives. Politicians would be like, "Wait, you're saying you don't know? Why are we listening to you?" So the attitude was, "Drop the error bars, and now we know."
GAVIN: Yeah. One more answer to the greater forecasting problem of why people don't listen, maybe institutions are just really slow. It's only been ten years. It's still relatively new on the scales of governments and on the scales of epistemic institutions more generally. Maybe we'll see it pick up. I should just know the growth rate of the forecasting industry; I just should know that, but I don't.
SPENCER: I've heard about a lot of initiatives to do small internal forecasts, like companies having forecasting systems internally and stuff. I get the sense that they usually die out after a while, but maybe there's some that have succeeded. Do you know of any that have persisted?
GAVIN: No. We have a survey of this as well. We looked at Google's internal prediction market, and some other companies'. And yeah, the answer is what we keep seeing: you need incentives, you need volume, you need people to be spending a lot of time updating these things for them to remain relevant. And even Google, which is something like (I don't know) 160,000 people, isn't large enough: not enough volume, not sustained. And indeed, I think it has now shut down its internal prediction market.
SPENCER: Something that I think doesn't exist but I think would be really cool is, if one of these prediction websites — whether it's Metaculus or Manifold or what have you — if they would implement a thing where, if you pay X dollars, we'll guarantee you this many predictions on this topic. And obviously, they'd have to figure out what they could guarantee with that. But I think that would be really interesting, because it would give businesses and nonprofits and stuff, a very clear way to find something out. Let's say, you're like, "Oh, well, we really need to know if consumer demand for this thing is going to fall." And then you're like, "Well, we know if we pay $2,000, we're gonna get this many forecasts," and then you can exchange money for information in a very clean way.
GAVIN: Yeah, doing that scalably is the interesting part. It'd be nice if the — is it the SEC? It's one of the regulators — would stop shutting down perfectly good, large prediction markets, which are already doing this kind of service in a distributed way.
SPENCER: As soon as money gets involved, it gets tricky, because then you can get into issues of, well, is it just betting or is it investing in securities? And then it's really complicated, whereas Manifold does it for play money, which avoids all those issues, but then creates other issues, like, well, how do you incentivize people properly? Given that they're using play money, they've done a remarkably good job of getting a lot of people interested. But it still is not the same as, "Oh, I'm going to do this as a part-time job."
GAVIN: They had this wonderful program where you could convert your play money into charity donations, which, in theory at least, is also not gambling: you're not profiting off of the situation. But that, too, is being shut down. That, too, was a little bit too close to gambling for the regulators, unfortunately.
SPENCER: Let's jump to our last topic before we wrap up. I want you to tell the audience, why were you hesitant about coming on this podcast? I think it's a very interesting reason.
GAVIN: It's the first long excerpt of my voice on the internet. You only need a minute or so of audio now to spoof someone's voice using machine-learning speech generation. There are already horrifying stories about spear-phishing, about people's accounts and identities being stolen. There's a horrifying story of a woman who gets a phone call, and it's from her daughter, who's screaming, with a horrifying voice in the background saying that she's been kidnapped. She's getting ready to transfer ransom money to this person when her daughter, who's away on a ski trip, finally gets through to her on the phone. This is already happening. Because it only takes a minute of audio, I feel that anyone who really, really wants to target me could already do it relatively easily. But yes, the future of privacy and this kind of thing was on my mind.
SPENCER: It's very relevant because, just yesterday, someone released a remix of the Lex Fridman podcast with Eliezer Yudkowsky, except that Eliezer, instead of arguing against AI capabilities, was arguing in favor of AI capabilities in order to maximize the chance of creating cat girls, which is obviously ridiculous; it was just done as a silly thing. But it was like, "Oh, wow, if they'd actually wanted to spread misinformation, that would have been incredibly easy," because the voices sound really quite convincing.
GAVIN: The most ridiculous part of that exercise was Lex Fridman pushing back on one of his guests, which he never does.
SPENCER: [laughs] Yeah, one thing I wonder about: Adobe Photoshop has been around for multiple decades, and people have been able to fake photographs with it quite easily. It doesn't take that much skill to make people look like they're in the same scene when they're not, at a level that's very hard for a normal person to detect. Maybe an expert could tell the difference, but most people couldn't. And yet we don't seem to have this massive problem of people faking photographs and manipulating others that way. It makes me wonder whether these concerns are overblown. Maybe we'll just need to learn, "Oh, yeah, video is now fakeable," so, just as you can't be sure a photograph wasn't doctored, we'll have to treat video the same way? Or is there something fundamentally different about video or voice, where it's not going to be like photographs and it's a much, much bigger deal that they can be faked?
GAVIN: Yeah, it seems slightly more visceral; it seems slightly easier to have a bad reaction in the moment. If somebody sends one of my friends a clip of me in distress, even if skepticism kicks in within a minute or two and they realize it's fake, it's still a deeply unpleasant minute or two. There are actually cameras which digitally sign their images: they've got a module inside which holds a private key, basically, and certifies that this image was taken on this device and has the following hash, which will be different if it's been photoshopped. So I feel like we have the solution to this. It's public key cryptography, which...
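The signing cameras described above follow a general hash-then-sign pattern. The sketch below illustrates only that pattern, under loud assumptions: it uses textbook RSA with toy, deliberately insecure parameters (two publicly known Mersenne primes, no padding), and the function names `camera_sign` and `verify` are hypothetical. It is not any camera maker's actual scheme.

```python
import hashlib

# Toy RSA key pair, for illustration only: P and Q are the publicly known
# Mersenne primes 2**61 - 1 and 2**89 - 1, so anyone could factor N.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, stays inside the "camera"


def image_digest(image_bytes: bytes) -> int:
    """Hash the image with SHA-256 and reduce the result into the RSA range."""
    return int.from_bytes(hashlib.sha256(image_bytes).digest(), "big") % N


def camera_sign(image_bytes: bytes) -> int:
    """Hypothetical in-camera step: sign the image hash with the private key."""
    return pow(image_digest(image_bytes), D, N)


def verify(image_bytes: bytes, signature: int) -> bool:
    """Anyone holding only the public key (N, E) can check the signature."""
    return pow(signature, E, N) == image_digest(image_bytes)


original = b"raw sensor data from the camera"
sig = camera_sign(original)
print(verify(original, sig))               # True
print(verify(original + b" edited", sig))  # False: edited bytes hash differently
```

The design point is the asymmetry: only the camera needs the private exponent `D`, while a verifier needs just the public pair `(N, E)`, so a third party can check an image without trusting whoever forwarded it.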
SPENCER: Does anyone use that though? I mean, in the multi-decades of Photoshop, I feel like people just look at the photo and just like, "Ah, it's probably made up or something," you know?
GAVIN: Yeah, I'm not sure. It worries me more than people putting my head onto a goat or whatever people did in the noughties. But yeah, I'm not sure. I do expect there to be a deepfake international incident at some point, just because (I don't know) there's sometimes profit in reacting to something as an insult, even if you don't think it's real. Pretexts, basically. Yeah, I'm not sure how alarmed to be in general.
SPENCER: Maybe one thing that makes video really different from a photograph is that you can tell a more complete story. Sure, you could have a photo of someone allegedly cheating on their wife or whatever. But with a video, you can have them say extremely specific things, so you can craft a narrative. I could certainly imagine a video being spread in such a way that it really convincingly seems like it was recorded by some big news outlet, or that some big news outlet got their hands on it. And then you can imagine it spreading so quickly around the internet that, by the time people realize what's happened, millions of people have already seen it and assumed it was legit because they thought it came from somewhere legitimate.
GAVIN: It does force us to slightly downgrade our confidence in most video evidence and most audio evidence. Does that practically flip the outcome of any case? Maybe it makes some legal cases harder. Maybe eventually juries just become more generally skeptical of this evidence. I think that is an effect. Of course, I don't know, legally.
SPENCER: Well, it seems like what happens with photographs is, a lot of times people just don't deny that they're real when they are real. All these people have photos of them with Jeffrey Epstein which obviously makes them look bad, but I've never heard a single person be like, "No, that was Photoshopped." [laughs] And maybe it's a similar thing where, with real video, people will just usually not deny it and then, when it's fake, people will be like, "That video never happened." Maybe in practice, that's how things will go.
GAVIN: That's actually surprisingly optimistic, like lying is not the default or something, is what you're pointing out there.
SPENCER: Well, it's really interesting because there is a segment of the population that just has no qualms about lying, is just willing to say anything, and that segment of the population, it's real. However, I think it's a small fragment, maybe it's like one percent of people or something like this. So most of the time, you're not dealing with a group that's just willing to say or do absolutely anything to get at an outcome. And so there's some set of social norms still binding people where there's literally a photograph of them doing the thing and they're like, "Okay, yeah, that was me." They're not just like, "What?! That's ridiculous. That's not me!"
GAVIN: Yeah. Something more interesting than scams or trolling is simulation. That's me not just being spoofed and saying things I wouldn't say, but being simulated, with an AI model saying things from the distribution of things I would say. There's a really interesting example; it's quite involved, but I'll see if I can do it from memory. There's this LessWrong user called Wei Dai who has decades of text on the internet, was one of the very early cryptocurrency people, and is just an internet-famous kind of guy. When GPT-3 came out, he went on LessWrong and tried to warn people: "Look, if you have any text under your name on the internet, it will soon be relatively easy to simulate you, to produce a fine-tuned language model which speaks as you speak, which knows things about you, which people can talk to, sort of against your will." And the first comment underneath this is somebody trying this out on GPT-3 and asking, "What would Wei Dai say?" And the output is, "This user has deleted his comment." So GPT-3, not fine-tuned, already knew enough about this random internet-famous guy to know that he would not approve of being simulated in this way. That's extremely striking. This kind of simulation is much more expensive: it's pennies to fine-tune a speech generation model on me, but probably $100 or more to get a really good fine-tuned version of me from my text, of which there is also an enormous amount. This is much more interesting to me. It's a really deep kind of privacy (I'm not gonna say invasion), much deeper than tricking somebody into thinking I said something. You've created a model of me and then you're interacting with it without me being involved, which is interesting.
SPENCER: There's a fairly famous person who I wanted to come on the podcast, but I knew that he had said, as a blanket rule, that he will never come on podcasts. And so I reached out to him. I said, "Hey, would it be okay if I train a model on everything you've written and interview that instead? And I'll make it super clear that it's not you, just the AI." And he's like, "No! Please don't do that." [laughs] Which I get because, even though it's not you, people might assume, "Well, the model of you said that, so maybe you believe that thing." Even though you've never been asked that question, you've never responded to that thing, maybe the model has inferred from your other beliefs that you would believe that, and maybe it's right.
GAVIN: I'm not even pointing at anything as concrete as that. It's just this slightly eerie thing, in the same way as seeing a hologram of Tupac performing years after his death: I mean, it's cool, but it's also people losing control over their own likeness. You die, and your employer or your license holder can continue to make use of you. I'm not claiming this is bad. I'm not claiming this is even negative in any way. But it's definitely eerie, and it opens up a new part of social space that we're going to be exploring in the near future.
SPENCER: The idea of having a chatbot of me that has been trained on my writing, and people can ask it questions, I find that maybe somewhat appealing as long as it's not conscious. I don't want that AI to be able to suffer or something. But now I'm imagining, well, what if people start asking it incredibly controversial questions? "Here's what simulated Spencer says about XYZ." And I'm like, "Ah, I don't like that idea. What if it says something I totally disagree with?"
GAVIN: At the same time, here's another strange thing I sometimes think about. I started blogging in 2010, which is nine years before language modeling was very good. So I have this kind of cryptographically signed proof that I can write, that I can think, because I was writing and thinking before it was possible to simulate that, or really to cheat on it, except by spending thousands of dollars paying somebody and passing their writing off as your own. So I sometimes think of the GitHub repo for my blog as a kind of credential of being a human being who can think and write.
SPENCER: Yeah, that's interesting. Though in the future, maybe you'll just get an AI to write your whole blog: you tell it what impression to create through your blog posts, and it writes the whole thing. That's a weird idea. Speaking of which, would you be okay with me asking GPT-4 how you would wrap up a podcast episode?
GAVIN: Go ahead. Go ahead.
SPENCER: All right, give me a second. All right. So I asked GPT-4, "How would Gavin Leech end the podcast episode in order to wrap up a discussion about how AI can be used to simulate specific people?" Here's what it said: "Alright, folks, that brings us to the end of this fascinating episode, where we dove deep into the world of AI and its uncanny ability to simulate specific people. We've had a great discussion about the potential ethical concerns and the impact of this technology. Before I sign off, I'd like to leave you with some food for thought. As these AI simulations become increasingly realistic, we must really remain vigilant and mindful about how we use and interact with them. We have the power to shape the future of technology. It's up to us to ensure its benefits to humanity while respecting privacy and individuality." I think that's a pretty good place to wrap up. What do you think?
GAVIN: Yeah, more of a politician than me, but yeah, it does the job.
SPENCER: Yeah, it definitely feels like a more optimized-for-public-facing, nice-sounding words, as opposed to actually thoughtful and trying to figure out the truth. But give it a year or two and maybe it'll sound just like you.
SPENCER: Thanks, Gavin. This was a really fun conversation.
GAVIN: Thanks, Spencer.
JOSH: A listener asks, "How can I keep up with the news without spending a lot on subscriptions?"
SPENCER: I find that the amount of news you actually need to stay up on the important stories is not very much. So I use a service that just sends me one news story a day. The one I use in particular is called Flipside and I like it because it summarizes the news from the point of view of the left and from the point of view of the right. And I like seeing those kind of two different perspectives represented. Sometimes they also have a libertarian point of view as well. So sometimes you get three points of view. I feel like that's about the right amount of news because it's unlikely that there's multiple really, really important stories every day. In fact, I think most of the news is just not very interesting and in a month it will be irrelevant. And so it's just important to focus on the major things that happen and not the noise that's constantly bubbling all around us.