Episode 141: How can we make science more trustworthy? (with Stuart Ritchie)

January 20, 2023

How can we make science more trustworthy? When scientists break into factions around a particular topic, whom should we trust, and why? Why did trust in science as an institution plummet drastically during COVID? What is the state of the evidence for the effectiveness of hydroxychloroquine, ivermectin, or vitamin D against COVID? Why is controlling for variables so difficult? What evidence is there for how well IQ represents intelligence and predicts useful things about people's lives? There's the famous quip that "IQ tests only measure how well people do on IQ tests", but we also all seem to know that some people are smarter than others; so can't that disparity be captured in a single number, or even in a small set of numbers?

Stuart Ritchie is a Lecturer at the Social, Genetic, and Developmental Psychiatry Centre, King's College London. He received his PhD in psychology from The University of Edinburgh in 2014. Since then, he's been researching human cognitive abilities like how our mental abilities age and how education can improve intelligence. His other interests are in the subject of Science Fictions: the problems with the scientific system and how we might fix them to improve the quality of research. Learn more about him at sciencefictions.org or follow him on Twitter at @StuartJRitchie.

JOSH: Hello, and welcome to Clearer Thinking with Spencer Greenberg, the podcast about ideas that matter. I'm Josh Castle, the producer of the podcast, and I'm so glad you've joined us today. In this episode, Spencer speaks with Stuart Ritchie about making science trustworthy, controversies in research and models, and tests of intelligence.

SPENCER: Stuart, welcome.

STUART: Hi, thanks for having me on.

SPENCER: Often you hear these days that we should trust science. One side is telling us that, and then there's another side that says, "Actually, we should be skeptical of scientific experts, who are often wrong." It seems like a really tough conversation to navigate. On one hand, science has given us answers like nothing else before, right? There are so many different problems that science has made progress on, and it really has been our best method for making progress. On the other hand, science sometimes gets things wrong, and science sometimes gets political in ways that can make it unhelpful. So I'm really excited to dig into this with you today, and really get into the questions of when we should trust science, how we make science more trustworthy, and so on.

STUART: Yeah, these are scary questions. I think we've learned a lot in the last couple of years, during the pandemic, about what makes science trustworthy and what makes it untrustworthy. It's almost like a microcosm of the meta-scientific principles for deciding what to trust.

SPENCER: Yeah, that's a good example. I think a lot of faith in science was lost very rapidly during the COVID period, partly for political reasons, but also partly because some people overstated claims beyond what the science of the time could support. People saw that and had a negative reaction to it. I'm curious, just on the COVID front, what is your interpretation of what happened there?

STUART: Well, I think a lot of the science during COVID was good. A lot of the vaccine trials, and the treatment trials of some of the steroids that we worked out actually helped with COVID recovery, were good because they followed the principles we've been talking about for the past 10 years in the context of the replication crisis. That is, they were very open and transparent about their plans before they even did the study. They published the registrations, and I remember a few times people questioning the registrations before the trial had ever happened: "Are you definitely going to analyze it this way? Are you definitely going to use this outcome? Have you changed the outcome from what you initially said? And why are there multiple registrations?" There were good questions going on before the trials even began, and the lower-quality science has been the stuff that hasn't followed that. So there have been low-quality trials with unjustified methods and small sample sizes, done in places where there's no transparency about exactly how the procedure was carried out. I'm thinking of things like hydroxychloroquine, ivermectin, and so on. So I think you have both sides of the coin here: the really good stuff and the really bad stuff. I'm just glad that the really good science was applied to the vaccines that actually got us out of the pandemic.

SPENCER: But this creates a real issue: if people say, "Trust the science," which science? There's good science, and there's bad science, and it's not so clearly delineated. It's not like all bad science is published in bad journals and all good science is published in good journals. It actually takes some nuance to tell which is which.

STUART: Yeah, absolutely. You can't just rely on the journal. Look at what happened in May or June 2020 in The Lancet and the New England Journal of Medicine, two of the top medical journals in the world, the ones you would want to implicitly trust when you open them up. They published papers that I don't think have officially been called fraudulent; there hasn't been an investigation that came out and said, "These papers are fraudulent." But I'm pretty sure they were based on incorrect data from Surgisphere. They were on hydroxychloroquine, and they said hydroxychloroquine was actually bad, when there had been a few months of studies saying that hydroxychloroquine was good for COVID. They had to be retracted about two weeks after they were published, because the scientists who published them (these were Harvard researchers) hadn't actually looked at the data. They hadn't checked it. They were simply handed this dataset by a company, Surgisphere, on the survival of patients who had taken hydroxychloroquine across different countries. And it was only after the papers came out that everyone read the studies. It was a hot topic because Donald Trump had mentioned hydroxychloroquine, and it was a big political thing at the time. It was only when people looked at them that they found impossible numbers, ridiculous effect sizes, just absurd things. Then there was this delayed response where The Lancet said, "Well, in future we're going to make sure that people who review our papers actually have expertise in data science and things like that." And it's like, well, why wasn't this the case before? So you had these big institutions that you would hope to be able to trust, during the pandemic, something really serious, and they were acting in this totally untrustworthy way.

SPENCER: Yeah, you mentioned ivermectin briefly, and I'm curious to dig into that a bit more with you. It's one of the most polarizing things we've seen in terms of COVID treatments. You have a huge number of people who say, "No, ivermectin doesn't work." You have people taking an even more extreme view, saying, "Oh, it's a horse tranquilizer, and it's killing people." On the other side, you have people saying, "If we had given everyone ivermectin, the COVID pandemic would have been over by now," and that it's being suppressed, and so on and so forth. So what's your interpretation of what's happening there?

STUART: I was always fairly skeptical of ivermectin, right from the very beginning. I wrote an article in May 2021, which is when people were first discussing this, saying, "Look, no pun intended, hold your horses. There's no good evidence that this actually works." There were a few scattered studies, as there often are (I find this a lot when I look into an area of research): just a few tiny, scrappy studies that you can't draw any strong conclusion from, and we needed to do some actual trials. There didn't seem to be any particular theoretical rationale for why an anti-parasitic medicine would work for COVID. Now, that's not to say there couldn't be; it is possible for medicines to have different off-label uses. That happens all the time. But there wasn't really a good theoretical mechanism.

SPENCER: Well, if it helped with parasitic worm infections, and those made people worse from COVID, that could have been their assumption.

STUART: Sure. There's that potential secondary effect. But the question is whether it would work in the developed world, where there aren't many parasitic worm infections. I also got a sense of the slightly swivel-eyed nature of its proponents: they were really, really pushing it in this almost religious way, right at the very start, when the evidence for it was basically nil. I saw something similar, by the way, with vitamin D. People were going on and on about how, if everyone took vitamin D supplements, they wouldn't catch COVID. It's a very similar sort of thing. My reading on both of those treatments (and hydroxychloroquine too) is that when the larger, higher-quality trials were done, the signal just wasn't there. There really wasn't an effect. And at the same time, a lot of the original studies that had been put up, and sometimes included in meta-analyses, turned out to be fraudulent: literally made-up studies, or at least studies where there was no good idea of the provenance of the data. So the level of advocacy versus the actual quality of the data was totally off.

SPENCER: Yeah, I want to talk about both vitamin D and ivermectin a little more with you, because I think they're both really interesting case studies. We've been told that vitamin D helps with just about everything, right? You can find studies claiming almost every benefit you can imagine from vitamin D. And yet, when people go and do really high-quality studies, generally they don't find that it helps.

STUART: Yeah.

SPENCER: I find that really fascinating. My suspicion about what's going on is that many, many bad things that happen to you are correlated with low vitamin D levels. I don't know why that is biologically, but if you do a correlational study, people who are doing badly often have low vitamin D. That gets people really excited. But then when you actually give people vitamin D, it doesn't solve the problem. So it's a correlate rather than a causal mechanism, with perhaps one major exception around bone health. The evidence I've found most persuasive on vitamin D is that it may help older women, through mechanisms in bone health like preventing fractures. I don't know whether that's settled, but it does seem more promising than the other claims.

STUART: I have these huge arguments with some of my friends who are big into taking lots of supplements. A lot of the evidence comes, as you just mentioned, from correlational, observational studies, which are very interesting as far as they go, but are often totally contradicted when proper, high-quality randomized controlled trials come along. And it goes to show this really scary thing about controls. When you say you're controlling for something in an observational study, maybe you've controlled away some of the variance that's due to, say, socioeconomic status or education or some other thing you're trying to partial out. But it's very hard to partial it out entirely, and you can still have confounding effects even when you've done your very best to control. There's an interesting paper by [inaudible], a neuroscientist/psychologist who can be critical of psychology and this kind of research, arguing that controlling for things is much harder than you think. Scientists rarely go as far as he recommends in really controlling for things like socioeconomic status. They'll often have one measure of income that they control for in their observational studies. But he's saying what we want is multiple measures, creating latent variables and controlling for those, to make sure we've partialled out as much as possible of the variance that might be related to the confounding factor. So I really think that when your entire evidence base relies on observational studies like that (especially one-off, snapshot, cross-sectional studies), you're setting yourself up for a really unfortunate surprise when the randomized controlled trials come around.
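
To make the point about noisy controls concrete, here is a small simulation in Python (standard library only). It is a hypothetical setup, not data from any study: one unobserved confounder ("SES") drives both an exposure and an outcome, the exposure has no real effect, and each observed SES indicator is the true value plus noise of equal variance. Averaging several indicators is used as a crude stand-in for the latent-variable approach Stuart describes, not an implementation of it.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (n - 1)

def coef_on_x(x, y, control=None):
    """OLS coefficient on x from y ~ 1 + x, or y ~ 1 + x + control if a control is given.
    Uses the closed-form solution of the two-regressor normal equations."""
    if control is None:
        return cov(x, y) / cov(x, x)
    vx, vc = cov(x, x), cov(control, control)
    cxy, cxc, ccy = cov(x, y), cov(x, control), cov(control, y)
    return (cxy * vc - cxc * ccy) / (vx * vc - cxc ** 2)

def simulate(n=50_000, n_indicators=5, seed=1):
    rng = random.Random(seed)
    ses, x, y, single, composite = [], [], [], [], []
    for _ in range(n):
        s = rng.gauss(0, 1)                  # true socioeconomic status (never observed directly)
        x.append(s + rng.gauss(0, 1))        # exposure, e.g. supplement use: driven by SES, no effect on y
        y.append(s + rng.gauss(0, 1))        # health outcome: driven by SES only
        items = [s + rng.gauss(0, 1) for _ in range(n_indicators)]  # noisy SES indicators
        ses.append(s)
        single.append(items[0])
        composite.append(sum(items) / n_indicators)
    return {
        "no control": coef_on_x(x, y),
        "one noisy SES measure": coef_on_x(x, y, single),
        "average of several measures": coef_on_x(x, y, composite),
        "true SES (impossible in practice)": coef_on_x(x, y, ses),
    }

results = simulate()
for label, b in results.items():
    print(f"{label:>34}: {b:+.3f}")
```

Under these assumptions the expected coefficients on the exposure are about 1/2 with no control, 1/3 with one noisy measure, 1/7 with the average of five measures, and 0 only when the true confounder is available: better measurement shrinks the spurious "effect" but does not remove it.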

SPENCER: Yeah, so to unpack that point about controlling. If you find that low vitamin D levels correlate with something like (let's say) bad cases of COVID, or some other negative health outcome, what researchers want to do is say, "Okay, maybe this is due to some confounding factor like age. Maybe older people just have lower vitamin D levels, and also happen to have worse cases of COVID." So maybe it's not low vitamin D causing worse COVID; maybe it's just age acting as a confounder. So they'll go and add age as a control. There are different ways to do this. One is to balance your groups: you say, "We've got a control group, and we've got this group that got COVID, and we're going to match them on age." Another way is to use linear regression: you add age as a variable in your regression to do a statistical control, right? But what if it's not age? What if it's gender? What if it's something else? So you keep adding more and more of these. Controlling like this tends to be an improvement over not trying to control for anything, but I see some problems with it. The first problem is that you may just not have all the factors, right? You can only control for something you've measured. If you didn't measure it, you can't control for it.
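
A minimal sketch of the age example, assuming made-up numbers: in this hypothetical population, older people are more likely to have low vitamin D, severe COVID depends only on age, and vitamin D has no effect at all. Comparing like with like inside each age group (stratification, a coarse form of matching) is one of the approaches Spencer mentions; regression adjustment is the other.

```python
import random

def simulate(n=200_000, seed=2):
    """Each row is (is_older, low_vitamin_d, severe_covid) for one hypothetical person."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        older = rng.random() < 0.5
        low_d = rng.random() < (0.7 if older else 0.3)      # older people more often have low vitamin D
        severe = rng.random() < (0.30 if older else 0.05)   # severity depends on age only, not vitamin D
        rows.append((older, low_d, severe))
    return rows

def risk(rows, low_d, older=None):
    """Share with severe COVID among rows in the given vitamin D group (and age group, if given)."""
    group = [r for r in rows if r[1] == low_d and (older is None or r[0] == older)]
    return sum(r[2] for r in group) / len(group)

def crude_difference(rows):
    return risk(rows, True) - risk(rows, False)

def age_stratified_difference(rows):
    """Compare low vs normal vitamin D within each age group, then average the
    within-group differences, weighted by each age group's share of the sample."""
    total = 0.0
    for older in (False, True):
        share = sum(1 for r in rows if r[0] == older) / len(rows)
        total += share * (risk(rows, True, older) - risk(rows, False, older))
    return total

rows = simulate()
print(f"crude difference in severe-COVID risk: {crude_difference(rows):+.3f}")
print(f"age-stratified difference:             {age_stratified_difference(rows):+.3f}")
```

With these settings the crude comparison shows low vitamin D "raising" the risk of severe COVID by about 10 percentage points, while the age-stratified comparison is close to zero. The method only works because age was measured; an unmeasured confounder would leave the crude gap untouched.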

STUART: Yes.

SPENCER: Second, even if you did measure it, there's some chance that there's a nonlinear effect. When people talk about controlling, they're usually doing a pretty simple linear control, so that could be an additional problem. Are there any other problems you would point to in terms of the difficulty of controlling?
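
The nonlinearity problem can be illustrated with a hypothetical confounder z whose influence is U-shaped: both the exposure and the outcome depend on z squared, and the exposure has no effect. The sketch fits ordinary least squares with a small hand-written normal-equations solver (standard library only; adequate for a few well-scaled columns, not a replacement for a statistics package) and compares a linear control for z with one that also includes z squared.

```python
import random

def ols(columns, y):
    """Least-squares coefficients for y ~ 1 + columns[0] + columns[1] + ...,
    solved from the normal equations by Gaussian elimination with partial pivoting."""
    X = [[1.0, *row] for row in zip(*columns)]
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    M = [[sum(r[a] * r[b] for r in X) for b in range(k)]
         + [sum(r[a] * yi for r, yi in zip(X, y))] for a in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, k):
            factor = M[i][col] / M[col][col]
            for j in range(col, k + 1):
                M[i][j] -= factor * M[col][j]
    beta = [0.0] * k
    for i in reversed(range(k)):
        beta[i] = (M[i][k] - sum(M[i][j] * beta[j] for j in range(i + 1, k))) / M[i][i]
    return beta

rng = random.Random(5)
n = 50_000
z = [rng.gauss(0, 1) for _ in range(n)]      # hypothetical confounder with a U-shaped influence
z_sq = [v * v for v in z]
x = [v + rng.gauss(0, 1) for v in z_sq]       # exposure: depends on z**2, has no effect on y
y = [v + rng.gauss(0, 1) for v in z_sq]       # outcome: depends on z**2 only

b_linear = ols([x, z], y)[1]            # coefficient on x, controlling for z linearly
b_flexible = ols([x, z, z_sq], y)[1]    # coefficient on x, controlling for z and z**2
print(f"linear control only: {b_linear:.3f}")
print(f"with squared term:   {b_flexible:.3f}")
```

Under these assumptions the linear control leaves the spurious coefficient near 2/3, the same as no control at all, because z and z squared are uncorrelated when z is symmetric around zero; adding the squared term brings it to about 0. In real data the true shape is unknown, which is why analysts sometimes use splines or fine-grained strata instead of a single linear term.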

STUART: It's not just that. The thing I was getting at a moment ago is that there's measurement error in the controls as well. Something like age, sure, that's pretty easy to measure, and sex is pretty easy to measure. Although even with age, measuring it in years is less precise than measuring it in months, which is less precise than measuring it in days, in terms of capturing all the detail. But when you're controlling for something like education, do you put the number of years of education someone has had into your linear regression? Or do you put in the specific qualifications they have? How are you measuring it? Are people just trying to remember how many years of education they had? Are they going to make mistakes sometimes? Is there going to be error around that? As I mentioned, socioeconomic status: what actually is that? How do you measure it? People measure it in different ways. Some people think it's just education; you read papers that say they controlled for socioeconomic status, and literally all they've done is put years of education into their regression. You read studies where they use income only. You read studies where they use the number of cars the family owns, or whether they rent or own their house. These are all different ways of measuring the same concept. So when someone says, "We controlled for socioeconomic status," you've got to ask, "What specifically did you do, and how reliable were the measures you used?" So yeah, exactly: you might have missed a factor; there might be nonlinear effects in the things you've measured; and the possible confounding factors you have measured might themselves be subject to the usual measurement error we have in these kinds of studies.
So all of that adds up to this: merely saying "we controlled for something," and merely throwing variables into a linear regression, is, as you say, "better than nothing" in many cases, but it's not the be-all and end-all. One more thing: there's also the concept of over-controlling, which has been described as "Everest regression": controlling for height, London and the top of Mount Everest have the same air pressure. Well, yes, that's true, but you've controlled away the very thing of interest. You often find this in studies to do with intelligence, for instance, which I'm sure we'll talk about in more detail. People control for education without realizing that intelligence often causes education: people who are more intelligent go further in school. So if you control for education, you're taking away a big chunk of the variation that might actually be due to intelligence. So when you find that some third factor, like income, isn't that strongly correlated with intelligence after controlling for education, well, no wonder: you've over-controlled. There are all these conceptual problems with controlling that I think scientists don't think about enough. And certainly when you hear people talking about research and saying, "Oh yeah, that was controlled for," they don't talk about it in anything like the detail they should.
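
Over-controlling can be sketched the same way. In this hypothetical model (made-up coefficients, not estimates from any study), intelligence raises education, and both raise income. Regressing income on intelligence alone recovers the total effect; adding education as a control leaves only the direct path, so intelligence looks about half as important.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (n - 1)

def coef_on_x(x, y, control=None):
    """OLS coefficient on x from y ~ 1 + x, or y ~ 1 + x + control (closed form)."""
    if control is None:
        return cov(x, y) / cov(x, x)
    vx, vc = cov(x, x), cov(control, control)
    cxy, cxc, ccy = cov(x, y), cov(x, control), cov(control, y)
    return (cxy * vc - cxc * ccy) / (vx * vc - cxc ** 2)

rng = random.Random(6)
n = 50_000
iq = [rng.gauss(0, 1) for _ in range(n)]           # standardized intelligence
edu = [v + rng.gauss(0, 1) for v in iq]             # intelligence -> education
income = [0.5 * v + 0.5 * e + rng.gauss(0, 1)      # intelligence -> income, education -> income
          for v, e in zip(iq, edu)]

total = coef_on_x(iq, income)              # total effect: 0.5 direct + 0.5 * 1 via education
direct_only = coef_on_x(iq, income, edu)   # "controlling for education" keeps only the direct path
print(f"income on IQ:                  {total:.2f}")
print(f"income on IQ, education held:  {direct_only:.2f}")
```

Expected values under these assumptions are about 1.0 and 0.5. Neither number is wrong in itself; they answer different questions (total versus direct effect), and the mistake is reporting the second as if it were the first.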

SPENCER: Yeah, I've seen this happen quite often: people want to control for something, but the thing they want to control for could be caused by the variable of interest. In your example, if you're studying IQ, that's the variable of interest. Well, if education can be caused by IQ, you have to be very careful about controlling for it, because controlling for such variables can actually make your model less accurate rather than more accurate. So it gets really subtle. I do just want to add, on the vitamin D topic, that there are some people who probably should be taking vitamin D. For example, my understanding is that vegans tend to have low vitamin D levels. People with dark skin, especially those living in cold climates, can also have low vitamin D. So we're not talking here about people who actually have low vitamin D levels; if you do, you should take vitamin D. We're talking about whether a healthy, normal person should consider taking vitamin D.

STUART: Who's getting a balanced diet and all that sort of stuff? Do they need extra supplements of vitamin D on top of their standard intake of it? Yeah.

SPENCER: So I do want to dig into the topic of intelligence with you, because I know you've done a lot of research on that. But before we get there, let's talk about ivermectin a little bit more. I find it really fascinating that at this point a whole bunch of different studies have been done, and yet there still seems to be a lack of consensus about its effectiveness. Would you say that among scientists there is a consensus, and the lack of consensus is really coming from a movement outside of science? Or how would you put that in perspective?

STUART: So I think, right from the start of this debate, you've seen an advocacy group for this, which should worry anyone who's evaluating a treatment. Obviously, there are advocacy groups that have been completely right and spot on about particular treatments, or in advocating for people with a particular condition, and so on. So that's totally possible. But you should always be a bit wary, because there are clearly biases here. Right from the very beginning you had the Front Line COVID-19 Critical Care Alliance (FLCCC) pushing this. I remember seeing presentations they gave at various places, including, I think, the World Health Organization at one point, and certainly to politicians in the UK and the US, advocating for this in ways the evidence just didn't support. They had some people who had meta-analyzed some of the original studies. And then, when you took out the studies that turned out to be fraudulent and re-ran the meta-analysis, the evidence was a lot less strong; in fact, it didn't really show any effect. They got huge publicity from appearing on things like Joe Rogan's podcast, and you had Bret Weinstein and people like that really heavily pushing this on their podcasts and online. It's a fascinating kind of patchwork, isn't it? Because these people weren't saying COVID wasn't serious; they took COVID very seriously. And you had other people also pushing dodgy science who were saying COVID was not a particularly big deal, and so on. I find it really interesting that at every possible level of taking COVID seriously, you find fans of not-very-good research.
So yeah, I really think this was an outside force pushing this, while studies from the standard places, Oxford University and so on, came out and found that there's not much of an effect, if any. Their overall conclusion was that this is not worth giving to people.

SPENCER: I suspect that something that happened with ivermectin in particular is that the initial evidence base looked really good. In part that might be because there were some fraudulent studies in the mix with mind-blowing effect sizes. And in part it's because early studies tend to look good for things, right? They tend to be lower quality and less rigorous. So if you read the studies and become convinced that ivermectin is amazing, not just okay but amazing, you start pushing that idea, and then the evidence starts tilting the other way. Now you have all these other forces, right? You've been publicly saying it's good, you've convinced yourself it's good, you're taking it, you've given it to your friends, etc. That's going to create a pretty strong psychological momentum.

STUART: Bret Weinstein took it live on air, on his podcast. I remember watching the little video of him taking it. Once you've done that, you're going to look very silly when it turns out the studies were largely fraudulent, or at least a few of the big ones were; the biggest one, I think, turned out to be fraudulent, and several others were low quality to the point of being really unreliable. And you've got to remember the decline effect. There was a really famous article, probably about 10 years ago, that caused a lot of discussion: a piece on the decline effect in the New Yorker by Jonah Lehrer, the science writer. He was an amazingly fluent, brilliant writer who turned out to have plagiarized material. I think he made up some quotes from Bob Dylan in one of his books, a really unnecessary, really weird thing to do. He was disgraced, and his career never really recovered. But he wrote this article about the decline effect: how, initially, studies often show a really, really powerful effect, and then it gets weaker and weaker as new studies come in. That can often be because people are tightening up the methods and getting around the issues in some of the earlier studies. And it's not just COVID; this is a general thing. It's partly what's happening in the replication crisis: there were these studies with big effects, on things like social priming or whatever you want to call it in psychology. Then new studies come along and either can't find the effects at all (they can't find statistically significant results in the same direction), or they find statistically significant results, but much smaller ones. In a lot of these studies the effect sizes are often about 50% smaller even when the replication is successful, which I think is a slightly under-discussed aspect of the replication crisis. So the decline effect is real.
Incidentally, the first person to talk about the decline effect was J. B. Rhine, a parapsychologist who was trying to explain why the psychic effects he found in some of his initial experiments got much smaller later on, especially when skeptics ran the experiments. There were all sorts of theories of the decline effect, including that skeptics are so desperate not to believe it that their own psychic powers nullify the psychic powers of those around them, and so on, which is an amazingly...

SPENCER: That's why psychic powers don't work around me, because I'm just too skeptical.

STUART: Exactly [laughs]. It's such a circular way of reasoning; you've got to respect someone who reasons themselves into such an absurd argument.

SPENCER: Yeah, the decline effect is really interesting and important. I would also add that there are statistical reasons for it. The way publication often works, if you don't have a result with p less than 0.05, you can't publish the study. But a lot of studies use relatively small sample sizes (which is especially true for early studies on a topic; they're not using huge samples in the first studies), so they're often underpowered. That means that unless the effect you measured happened to be larger than the real effect, you probably wouldn't have been able to detect it in your study at all, right? So you get a bias where the effects that get detected tend to be larger than the real effect size.

STUART: Yeah, it's the winner's curse idea as well, and this happens a lot. So there are all these things that conspire to make the initial studies on something seem to have a much bigger effect than the later ones. That's something to bear in mind when you're looking at a new finding. It was especially relevant in the pandemic, but in any area of science, effect sizes are probably going to end up smaller than in the initial studies. Now, maybe an ivermectin fan would say, "Well, the effect sizes were absolutely dramatic in the initial studies. Even if they're 50% of that size, I'll still go for it." But under normal circumstances, the effects you're getting are not going to be dramatic, unless you're studying, say, what happens to people's psychology when they get a bullet in the brain. That's going to have big effects. But in general, in psychology and so on, the effect sizes are pretty small, so when the replication studies come along and cut them by 50% or so, they become even less relevant, less interesting, and maybe less clinically useful.
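
The significance filter Spencer describes, and the winner's curse Stuart mentions, can be simulated directly. The settings below are hypothetical: a true standardized effect of 0.2, 20 participants per group, an outcome SD known to equal 1 (so a simple z-test stands in for a t-test), and "publication" only when the result is significant in the expected direction.

```python
import random
from statistics import fmean

def run_studies(true_effect=0.2, n_per_group=20, n_studies=20_000, seed=7):
    """Simulate many small two-group studies and keep only the 'publishable' ones."""
    rng = random.Random(seed)
    se = (2 / n_per_group) ** 0.5          # SE of a difference in means when the outcome SD is 1
    estimates, published = [], []
    for _ in range(n_studies):
        treated = [rng.gauss(true_effect, 1) for _ in range(n_per_group)]
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        d = fmean(treated) - fmean(control)
        estimates.append(d)
        if d / se > 1.96:                  # significant (two-sided p < 0.05) in the expected direction
            published.append(d)
    return fmean(estimates), fmean(published), len(published) / n_studies

all_mean, published_mean, power = run_studies()
print(f"true effect 0.20 | mean of all estimates {all_mean:.2f} | "
      f"mean of 'published' estimates {published_mean:.2f} | power {power:.0%}")
```

With these settings, power is only around 9%, the average of all estimates stays near the true 0.2, and the average "published" estimate is roughly 0.77, close to four times the true effect. Larger samples shrink this exaggeration, which is one purely statistical route to the decline effect.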

SPENCER: You mentioned people who want to control for a hundred things, or a thousand things. But I think one of the big challenges is that you might accidentally be throwing things into the pot that are caused by the variable of interest, and that might actually make your model less accurate rather than more accurate.

STUART: There are ways of trying to map these things out. There's a whole world of directed acyclic graphs in epidemiology, which are just box-and-arrow diagrams where you try to write down all your assumptions. You're trying to show how all the variables relate to each other, and what you are implicitly saying causes what in your model. And when you do that, you often realize, "I don't know whether variable X causes variable Y, whether it's the other way around, or whether they reciprocally cause each other at different points in people's lives." Take education and intelligence, for instance: I totally accept that intelligence causes education, but I also think education likely causes intelligence to some extent as well. There's some evidence on the improvement in intelligence people get when they stay in school for longer. Taking that into account when you're doing a standard snapshot, cross-sectional study is almost impossible. Sometimes just mapping out your assumptions can be really helpful for saying, "Actually, we have to be a lot more circumspect about what we take from this, because we don't actually know the causal structure of the data, and we're making a lot of assumptions, even if they weren't obvious to us initially."
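
Part of the "draw the diagram first" advice can be mechanized. The hypothetical sketch below encodes a small DAG as a Python dict (the variable names and arrows are illustrative assumptions, not a model from any study) and flags proposed control variables that are descendants of the exposure. This checks only one condition of Pearl's backdoor criterion; a full check also requires that the adjustment set blocks every backdoor path.

```python
from collections import deque

def descendants(dag, node):
    """Every variable reachable from `node` by following arrows (cause -> effect)."""
    seen, queue = set(), deque([node])
    while queue:
        for child in dag.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

def downstream_controls(dag, exposure, proposed_controls):
    """Proposed controls that the exposure itself causes (directly or indirectly).
    Adjusting for these can strip out part of the effect being estimated."""
    return sorted(set(proposed_controls) & descendants(dag, exposure))

# Hypothetical DAG: each key points to the variables it is assumed to cause.
dag = {
    "family_ses": ["intelligence", "education", "income"],
    "intelligence": ["education", "income"],
    "education": ["income"],
}

print(downstream_controls(dag, "intelligence", ["family_ses", "education"]))  # ['education']
```

Reciprocal causation of the kind Stuart describes cannot be drawn as a cycle in a DAG; one common workaround is separate nodes for each time point (for example, education at 16 and intelligence at 18).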

SPENCER: Yeah, this relates to the idea of the "crud factor" that is sometimes talked about in psychology, where almost all variables correlate with almost all other variables, at least to a small degree. So you actually have this massive web of causal structures, with everything sort of related to everything else, and it's really hard to untangle that web. That's what makes randomized controlled trials so beautiful. You randomize people, saying, "This group is going to get ivermectin, and that group is not," and because chance decides who's in which group, on average there's no systematic difference between the groups. That lets you really say whether ivermectin causes the difference in outcomes.
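
A quick simulation shows why randomization is so powerful. In this hypothetical setup, age drives outcome severity and the drug does nothing; the two groups are formed either by self-selection (older people are more likely to take the drug) or by a coin flip.

```python
import random
from statistics import fmean

def group_gaps(randomized, n=100_000, seed=3):
    """Return (age gap, outcome gap) between treated and untreated groups."""
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(n):
        age = rng.uniform(20, 90)
        if randomized:
            takes_drug = rng.random() < 0.5               # a coin flip decides the group
        else:
            takes_drug = rng.random() < (age - 20) / 70   # older people opt in more often
        severity = 0.05 * age + rng.gauss(0, 1)           # depends on age; the drug has no effect
        (treated if takes_drug else untreated).append((age, severity))
    age_gap = fmean(a for a, _ in treated) - fmean(a for a, _ in untreated)
    outcome_gap = fmean(s for _, s in treated) - fmean(s for _, s in untreated)
    return age_gap, outcome_gap

for randomized in (False, True):
    age_gap, outcome_gap = group_gaps(randomized)
    label = "coin flip" if randomized else "self-selected"
    print(f"{label:>13}: age gap {age_gap:+.1f} years, outcome gap {outcome_gap:+.2f}")
```

Under these assumptions the self-selected groups differ by roughly 23 years in average age, so the useless drug appears to worsen outcomes by about 1.2 units; with the coin flip both gaps are close to zero. Randomization balances age here without anyone having measured or adjusted for it, and it would balance unmeasured variables in the same way.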

STUART: We're taught from day one of statistics class that randomized controlled trials are the gold standard. It's easy to forget how powerful and important randomization is, because it's just drummed into you: "Oh, yeah, randomization, that's what you want to do." But it's a really remarkable tool, and so are all the clever ways people have of introducing random factors into things like natural experiments, using observational datasets to get access to random variation. There are ways of doing that using genetics. There are ways of doing it using random events that show up in economic datasets, like oil price changes, or a new law that's enacted at random across different parts of a country, for instance. These are super important, and that's why economists are so keen to get hold of these random events and use them in their analyses. But the basic randomized controlled trial is such a beautiful thing, and I think we underrate just how important it is.

SPENCER: Right. For listeners who haven't heard of natural experiments before: essentially, instead of a scientist randomizing things, the world has randomized things. So maybe politicians roll out some new policy that doesn't apply to everyone, and it's effectively random whom it applies to. A scientist can then take that data and treat it like an experiment, even though no scientist ran it. And there are things that are quasi-random, where they're not perfectly random, but they're random enough to let you make similar kinds of inferences.

STUART: One really nice example of that, just to back up what I said a minute ago about education causing intelligence, is a really good study from Norway published in 2012. In Norway in the 1960s, the government added an extra year, or sometimes a couple of years, to compulsory schooling. The authors go to some lengths in the paper to show that this was rolled out essentially at random with respect to county or municipality. So it didn't happen in all the rich places first and the poor places later, or vice versa; it happened fairly randomly. Another nice thing from an experimental perspective is that the men in Norway do military service, and they do a whole bunch of tests when they go into the military, including an IQ test. So you can use this random change: when they did their military service, at around 18 or 19 (maybe 20 at the latest), some of these people had been forced by chance to stay in school longer than they otherwise would have, and some hadn't. So you can test the difference in IQ between the people who had been forced to stay in school longer and the people who were able to leave as early as they possibly could. And they found that there was an effect: the people who had been forced by this policy to stay in school longer had slightly higher intelligence scores on average. There are complications to that. There's this thing economists call the local average treatment effect. There are people who were actually forced to stay in school longer by the law: say you would have left school at 16, and this new law made you stay until 18. But there are also people who were going to stay in school until 18 anyway, right?
Regardless of whether the law was enacted, they would have stayed in school longer, so you can't really say they were affected by the experiment. Just to come back to your point, this is where a natural experiment isn't quite as good as a proper randomized controlled trial that we as humans have randomized, because you have all these issues like local average treatment effects. Really, it's only the people who otherwise wouldn't have stayed in school longer that you're learning about. So it only applies to some portion of the population, but it's still a beautiful thing. And there are all these historical events where economists, psychologists, and so on have jumped on natural experiments to learn something when you couldn't ethically run a randomized controlled trial on this kind of thing. Or maybe not ethically; maybe it would be ethical in the long run, but you probably couldn't get people to sign up for "your kid is going to be forced to leave school early" versus "your kid is going to be forced to stay in school longer."

[promo]

SPENCER: One thing I wonder about with ivermectin in particular is why this [inaudible] looks the way it does. If you actually had something that had no effect, that was completely useless, you might find that the initial studies would show an effect due to lots of biases in the studies, or the studies not being that rigorous, and so on. But then you would expect that other research teams doing more rigorous studies, with no ax to grind, would consistently find zero effect, right? But my sense is, with ivermectin, there are still quite a few studies that do show some effect. And it makes me wonder whether maybe what's going on here is that ivermectin just helps a little bit. And because it helps just a little bit, you will get a smattering of studies that find some effect, even as they get more rigorous. Or maybe you think the evidence really shows that all the good studies show no effect, and it's not true that they're still finding a smattering of positive effects.

STUART: That's a really interesting question. I don't think anyone can know yet. And there are still studies going on, by the way. My friend just got COVID a couple of days ago and she got an invitation from a study at Oxford University that is doing a randomized trial of ivermectin right now. So there's still research going on, and you have to take into account the different variants of COVID and all the other relevant stuff. But I think someone's going to be able to take a step back and do some kind of metascientific research on this exact question. I think it's totally fascinating, both from a sociological perspective and in trying to understand how the dynamics of stuff on the internet work and how these kinds of beliefs spread through networks. And I'd be the first to say that a lot of the anti-ivermectin stuff was completely over the top as well: the horse paste stuff, and the made-up stories of people who had taken an ivermectin overdose and died, where you look into it and it never happened. All this kind of stuff. It became a political issue, right? So you got the usual stuff that comes alongside a political issue, which is that people make stuff up. Even on your side, people are making stuff up all the time. By the way, I think if I were doing a critical thinking class for young people, it would be this, every day, just on repeat: people make stuff up, even people you like. People are constantly making stuff up, even people you like. That's the bottom line of critical thinking for me.

SPENCER: But I assume you don't mean that people are lying, more that people are self-deceiving?

STUART: Well, I think a lot of people are lying too. I mean, look at the fraudulent studies, and a lot of the horse paste stuff. Those reports of people who took overdoses and died or whatever were originally made up, right? They didn't just spring from someone's misunderstanding. There were some made-up stories (I can't remember all the details) that spread around, and everyone was jumping on them because ivermectin was a bad thing to believe in. You've got to trace these stories back to their original source, and in this case it turned out there wasn't one. I do think there's more nefarious, deliberate making stuff up in science and elsewhere than we would want to believe.

SPENCER: Yeah, ivermectin is so interesting to me in particular, not because of the biology or the medical aspects or anything like that, but because it seems like a case of everyone going insane. Some people going insane saying it's a 100% cure.

STUART: Yeah.

SPENCER: Pretty much. Some people going insane saying people are taking horse dewormer and killing themselves, right? And the reality is, it's a boring drug that maybe helps a little bit [laughs]. Humans have been taking ivermectin for a really long time. Calling it a horse dewormer? Yes, it's true, it's given to horses sometimes. But I don't know, it just seems silly in all directions. I just think it's an example of science getting maximally politicized.

STUART: Oh yeah, absolutely. Even if, eventually, after all these studies, we come back and say, "Actually, yes, there is an effect of ivermectin," you've got to line that up with the claims that were being made. People were saying that it was 100% effective, a perfect medicine, all sorts of stuff. We're talking about people like Bret Weinstein, who have these massive platforms and are supposed to be rational. He became famous because he was responding to a kind of irrational attack on him at his university by student protesters and so on. Yet he's making these outrageously strong claims that it definitely works 100%, and all this kind of stuff. So even if it did turn out there was a small effect, it was still a gross overstatement by the proponents. And people even set up all these organizations to urge medical systems and hospitals and whatever to use ivermectin. There was just no basis for that at the time, or at least the basis for it was extremely flawed. I don't want to make the mistake that's been criticized on Slate Star Codex of saying there's no evidence. I always have to correct myself and say there was "no good evidence." Overall, the evidence did not point in that direction.

SPENCER: Yeah, that phrase "no evidence" is a bit of a pet peeve of mine [laughs], when scientists say there's no evidence for a thing. You're like, "No, actually, there's a lot of evidence. There's just not a randomized controlled trial," right? There's a difference there. What you mean is there's not a randomized controlled trial, but a lot of decisions in our lives we have to make without [chuckles] a randomized controlled trial.

STUART: Totally.

SPENCER: When Scott of Astral Codex Ten/Slate Star Codex analyzed ivermectin, he went through all the different studies and the problems with them, and you realize just how freaking complicated it is. There are so many studies. Which ones do you throw away? Which ones are fraudulent? Which ones are not fraudulent, but badly designed? And some find effects and some don't. And he kind of concludes that, yeah, maybe what's happening is that it helps with parasitic worm infections, and maybe that actually does help people with COVID in areas where they're infected with worms. So yeah, the whole thing is super complicated.

STUART: I believe a meta-analysis came out that backs that up. I've not read it, so I don't know if it's any good. But I believe there was a meta-analysis that came to a similar conclusion: if you stratify the data by countries where there are parasitic worms and where there aren't, ivermectin seemed to have more of an effect in the countries with worms. I think it would be fun if that was true, but I don't know how high quality it is. And of course, just because a meta-analysis exists does not mean that the studies in it are good quality.

SPENCER: I'm just waiting for the meta meta-analysis that will roll up all the meta-analyses.

STUART: I've seen a few of them, actually: meta-analyses of meta-analyses. Yeah.

SPENCER: This reminds me a little bit of the case of power posing. People may be aware of this idea that holding certain postures can have positive effects: it can make you feel more powerful and improve your mood. Maybe it can also increase your risk-taking and change your cortisol levels. The original study, after it became the basis of an extremely popular TED Talk, came under serious fire from scientists saying the study was flawed. And that swung things in the other direction, where people were like, "Power posing is bullshit," and it became sort of symbolic of the bullshit coming out of the social sciences. But the funny thing is, I wasn't actually sure it was bullshit, because if you look carefully at the replication studies, while they don't tend to replicate things like the cortisol effects or risk-taking, half of them did replicate the feelings of power. People felt more powerful and had a higher mood. So I was like, that's strange, because that's not what you'd expect from something that doesn't work at all, right? If the thing doesn't work at all, you'd expect only a few percent of the studies to find an effect just by chance (around 5% at the conventional p < 0.05 threshold), but that's not what was happening.

STUART: I should add, one of the scientists who criticized the original study was the first author of that study. Dana Carney at Berkeley, I think, came out and said, "You know, I don't believe this stuff anymore. Our study was not very good," which I think is an amazing example of scientific integrity. She came out and said, "We had 42 people in that experiment. We ran tons and tons of different measures, and then only published the ones that were statistically significant," and so on. She wasn't the one who made it famous; that was Amy Cuddy. She wasn't the one who had the best-selling book and, I believe, the second most watched TED talk of all time. But yeah, you're right, the subjective power effect does seem to replicate. However, I believe there is a deeper criticism, which is: is the effect about having an expansive power posture versus nothing? Or is it about the control condition, where people were slouched and hunched over? In other words, is the effect coming from not being slouched, or from standing up in a power pose? And there's some discussion about whether it actually might be due to not being slouched. I believe Marcus Credé has critiqued it on interpretational grounds rather than on the grounds that the studies don't all point in the same direction. He's actually saying, "Well, even if they did, let's just take a step back: what do we actually learn from these control conditions?"

SPENCER: Yeah, so we actually ran a really big replication of this. I think it was the largest one ever conducted, something like 800 or a thousand people. We had them take different postures, including contractive postures, neutral postures, and power posing postures. And we actually pre-tested different power postures to see if we could find one that works better, to give it the best shot. Funnily enough, we replicated that power posing works [laughs]. The contractive postures actually seemed to depress people's mood somewhat, and the neutral postures were kind of in the middle. But the thing is, the effect sizes are tiny, right? Really small effect sizes. And this is why I bring this up as an example related to ivermectin: there's this thing that can happen in science. I think this is a recurring pattern, where something comes out and people get really, really excited about it. There are all these claims about its amazing benefits. And then there's a backlash against it, and it gets a terrible name. But the backlash might even go too far. Maybe the truth is it just works a little bit. It's not that exciting. And this is my suspicion. This could be true for ivermectin; we'll see in the final accounting of the evidence. But I also think it's what's true of power posing: yeah, it makes you feel slightly more powerful. That's a neat little trick, but not life-changing.

STUART: There's a similar story for quite a few other big psychology books that have been published. The classic example that springs to mind is Mindset, growth mindset, which was originally written up as if it was literally life-changing. Your kids' lives are gonna change, your life is gonna change, your relationship with your partner will be different if you have a growth mindset. They were publishing papers in Science, the second best scientific journal in the world, on how growth mindsets could solve the Israel-Palestine peace process.

SPENCER: That's pretty intense [laughs].

STUART: Enormous claims were being made about growth mindset. They had a couple of experiments with, like, p = 0.04 and all that kind of thing. And then there was the backlash. This obviously became super popular in schools. I assume it's similar in the US, but in the UK, almost every school has a growth mindset strategy. That book is on the shelf in every school. Mindset by Carol Dweck is everywhere. It's a massive, massive, massive thing. And then the actual evidence is, it's not that it's completely wrong. When you do the experiments and randomize kids to get an intervention that gives them more of a growth mindset versus one that gives them more of a fixed mindset, the kids who get the growth mindset do a little bit better, but it really is a little bit. It's a small amount better compared to what they would otherwise have done. And it just doesn't justify the enormous, life-changing claims that were being made. Actually, I think Carol Dweck has, in the last few years, rolled back a little on the size of her claims and brought them slightly more in sync with the evidence, but it's exactly the same story as power posing. And if it turns out ivermectin works, it will be the same story: making those massively outrageous claims is such a hostage to fortune. You're going to look bad even if the thing you're advocating for does work. You're still going to look bad, and you're going to make people more suspicious of it. So trying to be circumspect is probably quite good.

SPENCER: Maybe this is the hype cycle in science, like in technology, where some new technology comes out, people get way over-excited, then the whole thing collapses [chuckles]. But then out of that collapse emerges some really good stuff, sometimes. Not always, but sometimes. We saw this with the internet in the late 1990s. People were just over the moon about the internet. Every internet stock went up to a billion dollars, even when the company was making no revenue, and then the whole thing collapsed. But then, okay, the internet is actually a really good thing, right? It's not always the case. Sometimes the stuff is just fraud or bullshit. But yeah, maybe this is just some deeper truth about human nature, where we have this hype-then-contraction kind of phenomenon.

STUART: Yeah, I think it's the decline effect again. Maybe it's time to start talking about the decline effect again, 10 years after that guy who turned out to be a plagiarist wrote about it. Maybe it's time to resurrect it, and maybe I'll write a Substack post on that.

SPENCER: Okay, so another controversial science topic that I want to go into with you is IQ. You've pointed out how this is such a controversial topic, but maybe it shouldn't be. So yeah, I'd love to hear some of your thoughts on that.

STUART: Yeah, we've been talking about areas of science where people are too open-minded. They're accepting things when the evidence really isn't there. And that's a big part of the replication crisis in psychology: having quite low standards for what we accept as evidence in many areas. Obviously, that can be informed by our political biases, and it can be informed by our biases towards wanting results in our experiments, and so on. But I think there are also areas where people are too skeptical, and that can also come from political biases, or just received wisdom. One of those areas, and this is why I find it so fascinating, is intelligence research, IQ research. There are lots of different reasons why it's controversial, some of which are completely fair, and some of which are unfair. And I think you also see these massive, overblown claims when people talk about this, like "Oh, that's been completely debunked," or "IQ tests only tell you how good you are at doing IQ tests," or "This is all just pseudoscience." People who see my book, "Science Fictions", which is obviously about bad science, have asked me, "But wait a minute, you do IQ research? Isn't that the most fictional science of all?" So there's this kind of weird disconnect, I think possibly more in the US than in the UK, where it's slightly less controversial. But obviously, there are just these cultural differences. But yeah, I do. I think, given the state of the evidence, if you just look at what the evidence is saying, the classic claim that IQ tests only tell you how good you are at doing IQ tests is just completely wrong. It's totally wrong, misleading in every possible respect. And anyone who honestly looks at the literature that's been published would not hold that view.
Now, you might hold all sorts of views about whether we should use IQ tests, like what the practical uses would be and what the effects are of using them in job selection and schools and whatever else. We can have all these conversations. And certainly when it comes to talking about group differences, or about whether it relates to genetics and all that stuff, we need to be super careful about how we talk about these things, because they can be misleading. But to deny that IQ correlates with important things in your life, and that when measured at an early age it is predictive, to some useful extent, of how people are going to do, is, I think, given the state of the evidence, quite perverse.

SPENCER: Yeah. So what do you think is really going on?

STUART: So I think a huge amount of it is conformity. I think people get told things by authority figures. Stephen Jay Gould was super popular in the US as a science educator and educated a whole generation of scientists, in many ways very well, I think. If you read "The Mismeasure of Man," which is his book about intelligence testing, the history of scientific racism, and all that stuff, there's loads of good stuff in there that's well worth knowing. There are some really good critiques of some of the early scientific attempts to investigate intelligence and of the biases of the people who were proponents of IQ at the time, who were also proponents of things like eugenics and sterilization and all that stuff. So it's well worth criticizing. But I think the general impression he gave in that book was that all IQ testing was worthless, and any attempt to link it to the brain was not interesting. And I think there's an entire generation who have just, through conformity, absorbed that view. I think IQ tests are generally seen as being discordant with the ideas of equality that we all want to have, that everyone is equal and everyone has equal potential and so on. So people are scared to say, "Well, actually, there are some people who are more intelligent (however you define it, or if we define it on this IQ test) than others." I also think, by the way, we should be self-critical. There are lots of scientists who are really interested in IQ who have not done the field any favors because they have overstated it. They are themselves politically biased, possibly, or in some cases definitely; they have gone out and said outrageously over-the-top things about how important IQ is, and are themselves probably quite racist and probably sexist, and all this kind of stuff. So I think we have to be self-critical of the field, and I've tried my best when talking about IQ to chart a moderate course.
On one side, people say just ignore this stuff, it's all nonsense. On the other side, people say it's the most important thing, it's the only thing we should talk about, and so on. I've tried to chart a moderate course of just talking about what the evidence actually shows. And it's obviously an incredibly difficult thing to talk about without getting into trouble in any field. But I think people aren't aware that there are loads of studies in mainstream journals that are high quality, with large sample sizes, sometimes entire population samples of millions of people, that use these measures. They're routinely used in all sorts of educational settings, neuropsychological settings, and so on. So they're not as controversial as they appear in the public eye.

SPENCER: One thing that really confuses me about people's relationship to thinking about IQ is I think almost everyone agrees that they know some people that are smarter than other people, right? And if you think of IQ as just an attempt to quantify that thing, then it seems like most people would be willing to say, "Oh, yeah, some people are smarter than other people. I can tell that. I know some people that I think are extra smart compared to others." But maybe where some of the resistance comes in is whether that particular number is quantifying the thing that they observed in the world. So do you have a comment on that?

STUART: There are a few different things to say about that. First of all, I think academics who are critical of IQ are often responding to a restriction of range problem. By which I mean, the types of people academics encounter are all pretty smart, right? There's been this really extreme selection: people going to university have pretty high IQs on average. You're selecting them on their results from school, and sometimes you're selecting them on actual tests that are given. So the actual range of IQs that you see is quite small, and smaller individual differences can make a big difference there. So there are some people who really are amazingly good at math compared to their English writing skills. And so I think you can confuse yourself about IQ by just not encountering people who are further down, at the other end of the spectrum. So that's one thing. But I also think people have cultural stuff, from observing famous people and so on. They have this idea that there isn't just one thing, that you can be intelligent in lots of different ways. And it's been helped along by the cultural power of things like multiple intelligences, for instance. I think people have this conception that IQ is an outdated measure: we know better now, there are actually loads of different skills that can be classed as intelligence. What hasn't really got across is that, on average, these are correlated together. People who are better at one thing tend to be better at all of them. People who are better at even basic tasks, like pressing a button when the light goes on, are more likely to have a larger vocabulary than the average person, are more likely to have a better memory, and are more likely to be better at doing puzzles: reasoning puzzles, visual-spatial reasoning puzzles.
So I think that's a weirdly counterintuitive thing for a lot of people, who like to think, this person has this skill, that person has that skill, and there couldn't really be a world where that person could have gone down a slightly different path and used their skills to be good at a different thing, to specialize in a different thing. When actually, that's probably what happens: people go down that kind of train track of one particular skill, they specialize in that, and that becomes their thing. But had the historical contingencies in their life been different, they would have done something else. People who are generally intelligent can probably apply their skills in lots of different areas and could potentially have specialized in lots of different things. So I think people's life experience doesn't quite fit with the IQ thing, just because they're not seeing every possibility that could have happened in the kind of multiverse of possibilities.

SPENCER: Well, if I imagine someone with 100 IQ, who spent 10,000 hours playing chess versus someone with 140 IQ, who just learned the rules 10 minutes ago, I'm definitely gonna bet on the 100 IQ person beating the 140 IQ person. There's a real skill of chess that you learn over time. IQ helps you learn it faster, maybe it could help you get to a higher level, ultimately. But it's not going to be a replacement for skill, right? You're going to have to practice. And so I'm curious to hear your thoughts on the relationship between IQ and skill building and how much they interrelate.

STUART: Yeah. I think what you said there about learning it faster is the key here. And I think, again, something people don't quite think about in the right way is that if you put someone with a 100 IQ and someone with a 140 IQ on the same task, the one with the 140 IQ would potentially learn to do that task quicker and get to whatever level you want, say, being able to defeat a person with an Elo score of whatever. They would get there quicker. That's on average; there are obviously going to be other differences. Maybe the other person is more conscientious or something that compensates for their lower cognitive abilities. But yeah, skill acquisition is a matter of speed of acquisition. That's what this is all about. IQ is meant to be your potential to pick up on things, learn things, and learn new skills. So the combination of skill learning and a high IQ is going to take you places a lot quicker. But it doesn't mean that you can't learn those skills if you have a lower IQ; it'll just take you longer.

SPENCER: I wonder if that framing makes the idea more palatable. It doesn't absolutely prevent you from learning something. It may just be that a lower IQ is a barrier, and a higher IQ may get you there faster. But another question related to this is peak performance. Do you think your IQ sets a limit on your peak performance? For example, could someone with a 100 IQ become the world champion of chess? Or are they going to eventually cap out, with their IQ becoming a limitation?

STUART: I think with something like chess, yeah, there's going to be a cap. Being able to think however many moves ahead, having this kind of working memory thing where you're thinking ahead of your opponent and ahead of where you are, and all that sort of stuff: that's a lot of cognitive load, and I think some people are just gonna hit a ceiling on how much cognitive load they can handle in planning their moves. Also, there's being able to draw on the reservoir of historical games of chess that you can compare your game to, the moves that were made, and so on. I'm not a chess expert, by the way, but I believe there's a lot of knowledge of historical moves that comes into it. So I think there will probably be a cap at certain levels. But I think there are lots of things that we do in our daily lives where IQ can be compensated for by other skills. For instance, I've met people who I don't think are super bright, in the sense that they're going to be the next Stephen Hawking or something, but they're bright enough, and they're extremely conscientious. They go on to get a PhD, right? Because they have their life organized in such an amazingly productive way, it can make up for the fact that it takes them a bit longer to learn things and that they're a bit slower to catch on. Maybe it takes them longer to learn how to code, or how to set up an experiment, or to really read and digest and absorb a scientific paper, or whatever it is. I've seen this multiple times: people with very high conscientiousness making up for the fact that they perhaps have lower cognitive horsepower, or whatever you want to call it. It has made me wish very strongly that I had higher conscientiousness, because I can see how well it works for them. It acts as a sort of scaffolding for what they do.
They're setting up times when they'll do stuff, setting up a calendar for the day with all these really organized times when they're working and times when they're not, and all that. They're taking copious notes in meetings, they're not wasting time on frivolous tasks, frittering away their time, and all that stuff. That level of conscientiousness is something a lot of us should aspire to. So what I'm trying to say is, people find the idea of IQ distressing a lot of the time, but I think there are other factors that are important as well. Another reason IQ is controversial is that the impression has been given that it's the be-all and end-all: you're screwed if you don't have a high IQ, you're never going to make it in high-prestige areas like academia or science or whatever. But I think to some extent you can make up for it with other personality traits. And I can imagine similar stories for people who have very high charisma, high extraversion, and all that sort of stuff: they can compensate by having different arrangements of personality that also help them get on.

SPENCER: I think there are other factors too, even beyond personality. If you think of the skill of building mental models, or the skill of learning how to learn, things like these can be accelerants. Imagine one person who's trying to get good at chess, and they go and learn about the most effective ways to learn chess and the most effective ways to practice. That's gonna get them there a lot faster than someone who just makes up their own thing as they go along and is very unsystematic about it. So I just think there are many things we can do to accelerate our own abilities, even if we can't change our IQ.

STUART: Yeah, that's kind of a lightbulb moment when you realize something. I remember as a kid playing video games, and I never once thought about the meta, right? What is the way to break this game, to be the most effective? Really thinking about what the rules of the game are, and then working out how to maximize your success, knowing what those rules are. But at some point in adolescence, it clicks that games have these rules: they're just these setups, and if you know what the rules are, you can arrange your game in whatever way takes advantage of them and be much, much better than you otherwise would. Similarly, with an IQ test, if you sit people down and tell them (and there's actually a study on this that came out a couple of years ago), you can boost people's scores on something like Raven's Matrices. In case people don't know, that's where you're shown a series of little shapes and patterns (sometimes they're colored, sometimes they're not), and you have to choose from among different options for the next one in the sequence. So it's the classic kind of IQ puzzle you'd have seen. If you just sit people down and say, "By the way, there is a rule here," or you show them an example of what the rule might look like (the little dots, if they're there twice, disappear in the third panel, but if they're only there once, they remain in the third panel), then by telling people at the meta level that there's a rule they have to follow, you're basically telling them how to do the test, if not the actual answers. They get much higher scores when you do that. So you can level the playing field in some respect by telling people how to do it. All I'm doing is echoing what you said about learning how to learn being a super important thing as well.
There are loads of teachers in schools who could be given all these learning strategies that we know, from cognitive psychology, really work. And they should be giving their students those tools to learn more efficiently, regardless of what level of intelligence (or whatever you want to call it) the students have.
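The dot rule in Stuart's example (a dot that appears in both of the first two panels vanishes from the third, while a dot that appears in only one of them carries through) is an exclusive-or. The sketch below is a minimal illustration under that assumption: panels are modelled as sets of hypothetical dot positions, and the function names are made up for this example. Real Raven's items use many different rules, and this covers only the one described.

```python
def apply_dot_rule(panel_1, panel_2):
    """Predict the third panel of a row under the dot rule from the example.

    Panels are sets of dot positions. A dot shown in both earlier panels
    disappears; a dot shown in only one of them carries through. That is
    the symmetric difference (exclusive-or) of the two sets.
    """
    return set(panel_1) ^ set(panel_2)


def choose_answer(panel_1, panel_2, options):
    """Return the label of the option matching the rule's prediction, if any."""
    predicted = apply_dot_rule(panel_1, panel_2)
    for label, option in options.items():
        if set(option) == predicted:
            return label
    return None


# Hypothetical row: dot positions are just labels for grid cells.
first, second = {"top-left", "centre"}, {"centre", "bottom-right"}
options = {"A": {"centre"}, "B": {"top-left", "bottom-right"}, "C": set()}
print(choose_answer(first, second, options))  # B
```

Once the rule is known, the item reduces to a lookup; the hard part of the real test is discovering which rule applies, which is the part a meta-level hint gives away.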

SPENCER: We're actually preparing a really big replication study now on IQ and intelligence, where we've implemented about 60 different intelligence tasks. And we have over 50 different claims from the intelligence literature that we're going to attempt to replicate in one giant study. So I'm really curious to see how this is gonna pan out. I suspect that we're going to replicate a bunch of the claims. But I'd also be fascinated if we didn't. [laughs] That would also be interesting.

STUART: It's often said that it's the soft, weak areas of psychology, like social psychology, that don't replicate, but that areas like IQ research definitely do replicate. Well, let's test it. And I think you're doing a great thing there.

SPENCER: Yeah, so I'm really interested to see how this is gonna pan out. But, I suspect that you and I disagree about something regarding IQ. And I want to try to see if we can pinpoint this point of disagreement if we do disagree.

STUART: Okay.

SPENCER: So as a mathematician, the way I think about IQ is very simply that it's the claim that there is a single number you can write down about a person that then predicts many life outcomes. Additionally, this number can be calculated by giving people a bunch of random intelligence tests. It doesn't really matter very much which intelligence tests they are, as long as they're a bunch of different ones; you can calculate the same number from them. So if I give people Raven's Progressive Matrices, or a vocab test, or reaction time tests, or any kind of random combination of intelligence tests, I get pretty much the same number. And then that number predicts life outcomes. So, do you accept that framing of the fundamental claim about IQ?

STUART: I would finesse it slightly. The different measures do correlate together, but on a scale of -1 to 1, they may only correlate at around 0.3 to 0.4. So what I would say is that it's the single number you extract from a wide variety of tests. And this is what Charles Spearman, who was one of the initial IQ psychology people at the very start of the 20th century, talked about. His phrase was "the indifference of the indicator," which means that you can basically measure IQ with any kind of cognitive test to some extent. But to get a good measure of what we would call general intelligence, which is the thing that all the IQ tests have in common, you've got to give a wide variety of tests. So you've got to give the Raven's Matrices test and the speed test and the vocab test and the memory test. You've got to give them all, and take what they have in common. Then I would agree with the claim that the thing you take out, what's in common among those, will be almost the same regardless of what the tests in the battery are. So take the tests in battery one and extract the general factor; correlate that with the general factor from the different, but still varied, tests in battery two; and you'll get a very high correlation. There are two papers that I can point to that show exactly that.

SPENCER: Yeah, I agree. And that is exactly what I was intending to say; I obviously didn't say it very clearly. You do need a bunch of different tests to get the score, but it doesn't really matter what they are, right?

STUART: Okay, we do agree on that aspect, so far.

SPENCER: We agree on that aspect. So then we can kind of boil it down to these two claims. One is that you can get this number using any of a variety of different tests, as long as you use a bunch of tests, right? And the second claim is that that number then predicts a whole bunch of life outcomes. And I expect that will replicate; I think, most likely, we're going to replicate that. But here's what maybe you and I disagree on. We'll see. I highly suspect that there is important other information beyond that one number, if you extract it the right way. The reason I think this is because of a few things. One, if you look at people with some of the highest IQs in the world, they do not seem to be the most intelligent, at least by how I would define intelligence. And I've gone and looked at a bunch of these people's lives and tried to understand what they're doing. So it seems like if you go just by that one number, there's some really important element you're missing. Two, another piece of evidence I would point to is that I know people who are really, really smart, but they just seem to be crazily bad at certain sub-intelligence tasks, where they're maybe even below average, even though they're phenomenal thinkers in many different ways. So it seems to me that there has to be a hidden signal beyond that one number. So yeah, curious to hear your thoughts on that.

STUART: Well, again, if you go back to Charles Spearman, he talks not just about the general factor, which was the new claim in his work and is why people have focused on it, but also about the s factor, the specific factor. So I could totally see how, in a specific profession, it might be really good to be particularly good at mathematical thinking, for instance, if you have specific skills in arithmetical thinking or other kinds of abstract mathematical thinking. I can totally see there being extra information you could gain from looking at those specific tests, or even at the domain level. Sometimes we talk about general intelligence, and then there are broader domains of verbal ability and mathematical ability and spatial ability and whatever else, and then we talk about the individual tests themselves. Maybe taking information from the domain level can predict, but I think it would be a more specific prediction. So being particularly good at math wouldn't help you in lots of professions, whereas it would help in professions like math or physics or some other kind of creative profession where those kinds of thinking skills were really relevant. So I don't disagree that there's extra information. It's just that when you're talking to employers, for instance, even in the very famous meta-analyses of IQ tests predicting job outcomes (there was a very famous one in 1988, and it's been updated in various ways ever since), the broad finding was that the best overall predictor of how people were going to do in their jobs is what they call general mental ability, that is, general intelligence. But a better thing for specific jobs is to just get people to do the task that they're going to do in their job. So actually give them a specific test of whether they're doing accounts or whether they're doing some kind of scientific analysis. Whatever it is, get them to do that specifically.

SPENCER: Isn't IQ an incremental predictor to some degree on top of that, though?

STUART: Sure, absolutely. But the claim is that if you only had one test to do, then the best thing would be to just give people a test of the actual thing they're doing.

SPENCER: Right.

STUART: But that's often quite impractical and difficult, and people's jobs often can't be reduced to a 30-minute task. So that's why the IQ, the general mental ability thing, was kind of the best outcome in those studies. So yeah, I don't think we necessarily disagree. I mean, the question of people being really bad at certain tests is interesting, and I wonder why that is. Because generally, you would expect that if someone's at a very, very high level overall, they would be at a high level on each test, outside of things like specific learning disabilities. Take dyslexia, for instance: at least one of the ways you can define dyslexia is that someone's IQ is in the generally normal range, but they score very, very low on reading tests, and they seem to have that specific problem. Same with dyscalculia, or some of the other specific language disabilities and things like that. So outside of those things, it is surprising. And I don't have an answer to why an individual person would be bad at a particular test, aside from what I said a few moments ago: maybe they weren't told how the test works and just misunderstood it. And if they were told how the test works, then they could apply their general abilities to doing better on that test.

SPENCER: To me, there's a bit of a mysterious thing here, which is: okay, so we have this very empirical claim about how you can compute this number and use it to predict things. But what is the right philosophical interpretation of this number? There are a bunch of different theories one could have about this, right? One theory would be that you could think of IQ like the hardware of a computer. You can have a really amazing computer with great hardware, or you can have a crappy computer. But there's an additional thing, which is: what's the software running on it? So you could think of IQ as the hardware of the brain. I don't know what it would actually correspond to neuroanatomically; it could be how fast neurons fire, or something like this. And you could have really good hardware, but in practice, still not be running very good software on it. You don't have good thinking skills, or you haven't used your brain very much, or whatever. That's one possibility, right? Another possibility, which my friend Sam Rosen brought up, is: how do we know for sure that IQ isn't something like strength? As Sam puts it, we can definitely say that some people are stronger than others, right? Nobody would disagree with that. Furthermore, if someone's strong, they tend to be strong in multiple ways, right? It's theoretically possible for someone to have incredibly weak legs and really strong arms; this has happened. But usually, there are all these things in real life that link these together. If someone's strong in their arms, they are probably strong in their legs; part of it is genetic, part of it is because maybe they're an athlete. So how do we know IQ is not like that? There's a whole bunch of correlations, but the abilities are not inherently causally connected.

STUART: That, I think, is the other interpretation of the general factor of intelligence. If you go back to Charles Spearman again, his interpretation was that there is this thing called IQ in your brain that is your mental horsepower. He talked about "mental energy"; I think that was the phrase he used. That's not a use of the word "energy" I'm keen on, because I don't really know what it is; it's the kind of thing people end up talking about in New Age thinking. But the alternative to that was Godfrey Thomson's theory. Sir Godfrey Thomson was a Scottish psychologist in the first part of the 20th century. He's largely unknown now, even though he was very famous at the time, and he put forward what he called the bonds theory of intelligence. I'm not sure it's a very descriptive name, but the idea is that you have mental abilities 'A', 'B', and 'C', and those are actually in your brain. You have those different abilities, and they can be completely uncorrelated. Yet when you do some kind of intelligence task, it involves a little bit of ability 'A'. Maybe ability 'A' is memory, so it involves a little bit of memory. But it also involves a little bit of verbal skill, and that's ability 'B'. In your brain, those are uncorrelated. But no intelligence test completely isolates a single ability. Now, people write as if a given test is a memory task, but a memory task involves a little bit of verbal stuff as well, and it can often involve some mathematical stuff, too. So there are all these different mental abilities being used. Thomson's idea was that you could potentially be fooled by performance on the tests: they could be using uncorrelated abilities, yet appear correlated because they draw on overlapping sets of those abilities.
And the data doesn't distinguish between the g theory, where there's general intelligence and that causes the tests to be correlated together, and this bonds theory, where there are all these uncorrelated abilities and the tests only appear correlated because they share them. You can't actually see the difference between those theories in the data that we have currently. Until we have a much better idea of what's actually happening in the brain, we won't be able to answer this question. And there have been a few more modern takes on this theory, which I encourage people to look at if they're interested in this question. There was the Mutualism theory, which is really about how these correlations develop as we mature. And then there's Process Overlap Theory, which is a more recent one, and which I would say is really a restatement, an updating, of the bonds theory. I think it has a few things that it doesn't answer, but we're in a state where we don't really understand. We know that these tests correlate positively together; that is as replicable as you could ever get. You will not fail to replicate that. I would bet large amounts of money on you not failing to replicate that. But as for what it means that they correlate positively together, we're really no further forward in understanding that than we were at the start of the 20th century. And that's because answering that question would require a much more detailed understanding of how the brain is working, and how abilities and skills are instantiated at the level of the brain. And we're just miles away from that right now.

SPENCER: I would add that there could be a third theory, which is maybe more analogous to the strength metaphor I used before: you could have distinct capacities of the brain that are causally unrelated. You could be really good at one, really bad at another, and so on. But due to circumstances about the way the world works, they become correlated. An example of this, with strength, is that someone who works out their arms probably also works out their legs. They don't have to be causally related, but because of the way the world works, they do become related. And the way this could work with intelligence is, maybe if there's lead consumed by the mother, that has a negative effect on fetal development, and it affects multiple brain regions simultaneously, so that would correlate them. Or another example is, maybe someone who spends a lot of time as a child learning and reading actually exercises multiple brain regions, and again, they become correlated. So you could have things that become correlated, not because they're necessarily linked, but because of the way things in reality tend to correlate them.

STUART: It's been a while since I've read it, but I think doing things while you're growing up that affect multiple different abilities might be consistent with the mutualism idea: you're training your memory, and that helps other abilities develop. But that wouldn't have been the case if you hadn't had this environmental input along the way, whether it's reading books or something else. So I think this is a really interesting question. And it's something that psychologists should be trying to work out, and they've expended a lot of theoretical energy on it. There's a famous article called "g, a Statistical Myth" by Cosma Shalizi. Everyone references it all the time; it's the go-to thing for saying, "Well, IQ is a lot of nonsense." And he asks this theoretical question about what it means that there's this thing called general intelligence. His point is that the tests correlate positively together, but we don't really know why, and he restates the bonds theory that Godfrey Thomson had in the early 20th century. And you get the impression from that that he's writing off IQ tests. But the thing is, the predictive ability of IQ tests exists regardless of what the interpretation of the general factor is. There's predictive validity when you just have these tests and see what predictions you can make. Trying to understand why that's the case is a different question. And I think it will be practically important once we work out how to test it. But at the moment, we don't really have a way of testing these theories, as far as I can tell.

SPENCER: Why couldn't they be tested, though, by designing intelligence tasks that operate on as narrow a brain function as possible? And then if you can design those tasks, and show that those tasks have very low correlations with each other, wouldn't that be evidence against the single g factor theory?

STUART: I suppose I might just dispute the premise there or at least highlight the premise there. If you could design intelligence tests that could isolate specific cognitive functions, then you could test the theory. Sure. But good luck designing those tests. It turns out to be extraordinarily difficult to design those very, very simple tests. I think that's actually a good thing for people to do to try and design those tests to isolate the specific cognitive abilities, because that's what people want to do. They want to isolate specific abilities. They want to isolate specific areas of the brain, for instance, when they're doing neuroimaging studies. And a lot of these studies are written up as if they already do that. They're written up as if it's only this particular region of the frontal lobe that's active when you're doing this task. But it's just simply not the case.

SPENCER: Right. Even if you could just do it in a few cases. So, even if we don't know how to isolate most brain functions, if we had a few of them that were isolated, and we show that they had no correlation to each other, that would at least begin to provide evidence for one of the theories, right?

STUART: It's really, really tough. When you give someone a task and they're in an fMRI scanner, the activation you observe is everywhere, all over the brain; all regions are working in concert, really, when we do anything. But when you do an IQ-type task in the scanner, I guess you would want to try to isolate it down as much as you possibly can. And I'm not aware of anyone really trying that to any serious degree.

[promo]

SPENCER: Is your internal experience that your IQ captures the level of your intelligence well? Because it's just funny to me: my internal experience is that my ability actually differs a lot across different, relatively basic tasks. I'm not talking about skills I've practiced, but more like things I can do with my brain. For some of them, it just seems like I can do much better than other people, and for some of them, I can't do as well as other people, rather than it feeling more uniform.

STUART: But I don't necessarily think it has to be uniform. As I say, the correlations among the tests are not super strong. Again, on a scale from -1 to 1, they're not at the 0.8 to 0.9 kind of level. The general factor of intelligence explains somewhere between 40 and 50% of the overall variation among all the different tasks that you give people. So you wouldn't expect to internally experience a uniform level of intelligence across all the different tasks; you would expect to see a good amount of variation. But I think when you meet people across a very, very wide range of intelligence, from all different backgrounds, you start to see that, even though there's quite a lot of variance in how I do tasks, I'm going to do them substantially better than someone who has a much lower overall intelligence, and I'm approaching them in a completely different way that helps me do better across all these different tasks. I think a lot of that comes from just experiencing the full range, which is a lot wider than people think and a lot wider than people experience. I mean, people assortatively mate really strongly for education; that is, their partners tend to have the same level of education, for whatever reason, whether that's because they seek them out, or because they meet them at college, or whatever. People's friends tend to be at the same kind of level of intelligence. And as I mentioned earlier, for academics, there's strong selection on the people they tend to encounter in their everyday jobs, because those people have been selected for education. So I think maybe it's worthwhile getting out and having more... I'm not saying that you should get out more.
But what I mean is that having more experience of the full range, I think, answers a lot of these questions, because your subjective experience often doesn't quite line up with what the full range of the evidence would show you.
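One way to see how modest pairwise correlations (0.3 to 0.4) fit with a general factor explaining 40 to 50% of the variation is to compute the share of total variance captured by the first principal component of a correlation matrix. This is a minimal sketch, assuming eight tests that all correlate equally with one another; both the test count and the equal correlations are hypothetical simplifications. The first principal component is only a rough proxy for g, and a fitted factor model would give somewhat different numbers.

```python
import math


def first_component_share(corr, iterations=500):
    """Share of total test variance captured by the first principal component.

    Uses power iteration on a correlation matrix (a list of lists). Because
    the diagonal of a correlation matrix is all ones, total variance equals
    the number of tests k, so the share is the leading eigenvalue / k.
    """
    k = len(corr)
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
    leading_eigenvalue = sum(vi * wi for vi, wi in zip(v, w))  # Rayleigh quotient
    return leading_eigenvalue / k


def equicorrelated(k, r):
    """Correlation matrix for k tests that all correlate r with each other."""
    return [[1.0 if i == j else r for j in range(k)] for i in range(k)]


for r in (0.3, 0.35, 0.4):
    print(r, round(first_component_share(equicorrelated(8, r)), 3))
```

For equal correlations the share works out to (1 + (k - 1) r) / k, which for eight tests gives about 39% at r = 0.3 and about 48% at r = 0.4. So tests that individually correlate only moderately can still share a general component of roughly the size Stuart describes, while leaving plenty of room for individual variation across tasks.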

SPENCER: Yes, you mentioned that general intelligence, or that single number, seems to capture maybe 40 or 50% of the variance. Obviously, some of that variance is going to be just noise, right? On a particular day, someone might have misread a question, so we wouldn't expect even a perfect model to capture 100% of the variance. But still, there's a bunch of variance left over. So what do you attribute what's left over to, the part that's not just noise but also not captured by the IQ score?

STUART: That's the more specific domains that are correlated with general ability. And this is where you get into more complicated models. In some models, people make the domains uncorrelated and see how that works; in others, they make them correlated and see how that works. There are different ways of modeling the same data. But one conception, the classic conception, is a hierarchy, a higher-order model, where there's g at the top, and then there are domains that explain a little bit more variance. Then there are the specific tests, and those specific tests have their own specific abilities that you could potentially vary on. But also, there's just noise. As you say, someone was looking out the window when they did a test, or whatever. So there's always going to be this error that you can't get away from. But yeah, the full model of intelligence includes the more specific domains as well. And those are things which you can potentially vary on, and which are also potentially trainable in their own specific ways. I could imagine that you could increase your verbal ability if you read the dictionary every day for a year, as a theoretical thing. In theory, you could do that. Whether anyone would want to do that is another question.
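The hierarchical picture (g at the top, domains below it, then test-specific abilities and noise) implies a simple, checkable pattern: tests from the same domain should correlate more strongly than tests from different domains. This is a minimal sketch, assuming a textbook higher-order structure in which g, the domain factors, and each test's residual are uncorrelated with unit variance. The test names, domain labels, loadings, and class name are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CognitiveTest:
    """Hypothetical standardized test in a simple higher-order model.

    score = g_loading * g + domain_loading * domain + residual, with g, the
    domain factors and the residual all uncorrelated with unit variance.
    The residual lumps together the test-specific ability and plain noise.
    """
    name: str
    domain: str
    g_loading: float
    domain_loading: float

    def __post_init__(self):
        if self.residual_variance() < 0:
            raise ValueError("loadings imply more than 100% explained variance")

    def residual_variance(self) -> float:
        return 1.0 - self.g_loading ** 2 - self.domain_loading ** 2


def implied_correlation(a: CognitiveTest, b: CognitiveTest) -> float:
    """Correlation between two different tests implied by the model."""
    shared = a.g_loading * b.g_loading  # every pair of tests shares g
    if a.domain == b.domain:
        shared += a.domain_loading * b.domain_loading  # same domain shares more
    return shared


vocab = CognitiveTest("vocabulary", "verbal", 0.6, 0.4)
similarities = CognitiveTest("similarities", "verbal", 0.6, 0.4)
rotation = CognitiveTest("mental rotation", "spatial", 0.6, 0.4)
print(round(implied_correlation(vocab, similarities), 2))  # 0.52
print(round(implied_correlation(vocab, rotation), 2))  # 0.36
```

With these made-up loadings, g accounts for 36% of each test's variance, the domain for another 16%, and the remaining 48% is the test-specific part plus noise, which is where the leftover variance Spencer asks about would live in this kind of model.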

SPENCER: Well, AJ Jacobs did spend quite a while reading the encyclopedia for one of his books. So maybe that comes close.

STUART: Fair enough, fair enough. And then there's the brain training stuff, the working memory training that caused a lot of interest. Actually, there was a paper just the other day on that which claims positive results, which I'm surprised by, because I felt that was not the consensus. But it is interesting to read, because you can show that you can improve one ability; people talk about near transfer, right? So there might even be near transfer. If you train your working memory by doing all these really difficult working memory tasks over and over and over again, you can get better at those working memory tasks, and also at other tasks that involve working memory. So that domain of working memory has been improved. But then they talk about the lack of far transfer, which is that you would struggle to improve your general ability from just training working memory. And that implies lots of interesting theoretical things. It implies that working memory is not the engine of general ability. Some people, particularly in the early 2000s, thought that the real bottleneck of intelligence was your working memory ability. But given that training your working memory doesn't transfer up to improve intelligence, that doesn't seem to be true, which is interesting, because it's like closing off a theoretical perspective, unless all that research is wrong and this new study I mentioned is actually right. So that's another thing to look up. But that's an interesting example of where some experimental research can actually have some purchase on a higher-level theory of what intelligence is. But yeah, I think the rest of the variation is in the domains and in the specific abilities.

SPENCER: Yeah, on what you're mentioning about the domains: I've looked at a handful of different models of the subdomains of intelligence, and there are different ways of cutting it up, right? You've got fluid versus crystallized intelligence. You've got math versus verbal intelligence; there are different models here. In a way, I feel like there's something very optimistic about this, which is: okay, so we don't really know a way to change your "IQ score." But I think most people agree that whatever particular domain you're interested in, you can get better at it, right? You can go practice it. So then why do you need to change your "IQ score"? Figure out what you want to get better at, practice that thing, and you'll likely get better at it.

STUART: It could be the case that there's this really interesting idea of g, and it's a really important variable to have in your psychological studies and all this kind of stuff, but we've given it much more cultural cachet than it perhaps deserves. And you're quite right, we could easily end up in a situation where people focus more on the specific skills, and I can totally see that, maybe, if the working memory training really does become useful. What I have never seen is this: people are desperate to influence general ability, so they're desperate for the working memory training to go upstream and influence general ability. But does working memory training help you in your everyday life? If it does, then that's great; that's a thing we should perhaps do. Although, I don't know if you've ever done the dual n-back task, which is the task that's often been used for it. Not a pleasant or fun task. Just for anyone who doesn't know, the idea is that you're looking at a screen and you're seeing numbers or letters come up, and you have to say whether the number or letter you're looking at right now is the same as the one that was N letters back in the sequence. That's really easy if it's just two letters ago; you can just remember that. Okay, two letters ago there was a G, so I'm gonna do that one again. Two letters ago there was an X, or whatever. But if it becomes six or seven, it becomes incredibly difficult to hold all those letters in mind as it's advancing through and you're seeing more and more and more letters, and you're having to make the decision every single time: what letter was seven letters back in the sequence? And then, just to make things worse, you're doing it in an auditory format as well. You're wearing headphones, and someone's speaking numbers out to you, and you have to say whether the number you just heard is the same as the one N back in the sequence. That's why it's dual n-back.
So it's incredibly hard, and it gets worse and worse and worse as you go on. But apparently doing that for long periods of time makes you a little bit better at working memory tasks. So the question is, does that improve your life in any way? I would like to know. I don't think I've seen a study on that.
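The scoring logic of the n-back task described above is simple to state, even though holding the sequence in mind is hard for a person. This is a minimal sketch of the answer key only, assuming one item per trial in each stream; the function names and example sequences are hypothetical, and it says nothing about how the task is presented or how training programs adapt N.

```python
def n_back_targets(stream, n):
    """Positions where the current item matches the item n steps earlier."""
    return [i for i in range(n, len(stream)) if stream[i] == stream[i - n]]


def dual_n_back_targets(visual_stream, audio_stream, n):
    """Dual version: the two streams are scored independently, trial by trial."""
    if len(visual_stream) != len(audio_stream):
        raise ValueError("both streams need one item per trial")
    return {
        "visual": n_back_targets(visual_stream, n),
        "audio": n_back_targets(audio_stream, n),
    }


print(n_back_targets("GXGXAG", 2))  # [2, 3]
```

The difficulty Stuart describes comes from N: a correct response at every trial requires remembering the last N items in each stream at once, so raising N from 2 to 6 or 7 multiplies the load while the computer-side check stays a single comparison.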

SPENCER: Yeah. And the area that I'm really interested in is this: okay, so you've got this one number, IQ, and we know that it predicts a lot of things about a person, like income and education. It's very rare that a single-variable model outperforms a multivariable model, right? So if we were going to add more variables about people's cognitive ability, what would be the second variable you'd want to add? Is there some particular sub-domain of intelligence that actually enables you to make better predictions? It seems to me that our intuitive experience when we go around the world is that one number doesn't capture everything there is about intelligence. But can we formalize that? What would the second number be, and the third, that actually let you predict things that are interesting and important?

STUART: I suppose I would slightly cop out and say that it depends on what you're predicting. I think for some professions, different variables will make better predictions. There's evidence, for instance, that spatial ability might be particularly relevant to artistic or creative professions. Having that specific spatial ability, regardless of someone's general intelligence, might be really useful there, but it will be less useful in other, perhaps less creative, jobs. So I think it depends on what you're wanting to predict overall. But there will be an answer to that overall, regardless of intelligence: what is the next variable you add in to predict whether someone earns more money, or gets a higher degree, or whatever it is? And I often don't quite know what the answer to that would be, because the studies with really big datasets really only have one or two variables of intelligence. They merge them together: they get one fluid measure and one crystallized measure, take an average of them, and that's the intelligence variable. And then there's nothing really else on intelligence in the dataset, because it was done as part of a massive study where they measured dozens and dozens and dozens of other variables. So that can be quite tricky to test in a really big sample, but that's where you'd want it to be.
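Spencer's question about a second number is a question of incremental validity: how much extra outcome variance a second predictor explains once general ability is already in the model. For two standardized predictors, the variance explained follows from the three pairwise correlations. This is a minimal sketch using that standard two-predictor formula; the correlations below are invented for illustration and are not estimates from any study mentioned in the conversation.

```python
def r_squared_two_predictors(r_y1, r_y2, r_12):
    """R^2 of a linear regression of y on two standardized predictors.

    Uses the standard closed form in terms of the three pairwise correlations:
    (r_y1^2 + r_y2^2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12^2).
    """
    if abs(r_12) >= 1:
        raise ValueError("predictors must not be perfectly correlated")
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


# Hypothetical correlations, chosen only for illustration:
r_outcome_g = 0.5         # general ability with some outcome
r_outcome_spatial = 0.35  # a specific ability with the same outcome
r_g_spatial = 0.5         # the specific ability overlaps with g

r2_g_only = r_outcome_g ** 2
r2_both = r_squared_two_predictors(r_outcome_g, r_outcome_spatial, r_g_spatial)
print(f"g alone: {r2_g_only:.3f}, g + spatial: {r2_both:.3f}, "
      f"gain: {r2_both - r2_g_only:.3f}")
```

With these made-up numbers, adding the specific ability raises the explained variance from 0.250 to about 0.263. A second variable that overlaps heavily with g adds little unless it carries outcome-relevant variance that g does not, which is consistent with Stuart's point that the useful second number depends on what you're predicting.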

SPENCER: Stuart, thank you so much. This is a really fascinating conversation.

STUART: Great pleasure. Thanks so much for having me on the podcast.

[outro]

JOSH: It'd be great if everyone could really try hard to seek the truth and not just confirm their own beliefs or support their own team. But aside from automatically somehow magically making everyone have that attitude, if you could just snap your fingers and apply some intervention to things like the internet or to the media ecosystem or something like that, what intervention would you apply?

SPENCER: It's tough to think about off the cuff. But one thing that comes to mind is that if you at least knew when something being claimed is controversial, that would be useful, right? You're reading an article and it's like, "Oh, wait, that sentence they're claiming is highly controversial." At least you'd know there's another perspective, that a lot of people disagree with it. I think that would be quite interesting: just being aware when you're not just getting the straight facts as presented.

JOSH: Do you think Facebook's and Twitter's efforts to flag misinformation for users were a useful step in that direction, or did they miss the mark?

SPENCER: I think it really suffers from a problem, which is: who are the trustworthy sources of information? You have to make a decision. If you're going to flag misinformation, then you have to decide who decides what's misinformation. And I think these organizations want to outsource this to some extent, but then the question is, who do you outsource it to? Part of the problem is that it's very hard to decide. They're just going to get a bunch of attacks saying, "Oh, those people are political, those people have a bias," and so on. Additionally, beyond the question of whether they're unbiased, whether they're political, or whether they'll be accused of that, there's the question of whether they're actually giving the correct information. And so you have these really interesting examples where some information that was blocked on social media was actually valid information. That's infuriating to people and also just kind of destroys trust in the system. So I don't know, overall, whether it was a good thing. It might have been a good thing overall, but it's hard to say. But I think it just suffers from this extreme difficulty of choosing who the arbiter of truth is, and nobody has a really good answer to that.

JOSH: So your idea is something closer to doing sentiment analysis and saying like, "This statement is very inflammatory or very intense in some particular direction," rather than estimating the truth or falsehood of particular statements?

SPENCER: Well, I'm giving an off the cuff answer, so this may not be a good answer. But what I was proposing is just that it flags, "Hey, there are a lot of people that disagree with this statement." This is not just an objective fact that everyone agrees on. It might be true, it might be false, but it's controversial. And I think one thing that's interesting about that, is it is much easier to flag things that are controversial than it is to say which group is right.

JOSH: Yeah, on Reddit, if a post gets really highly upvoted and downvoted, they put a little red cross or x or something next to it to indicate that it's controversial.

SPENCER: Yeah, exactly. It just kind of raises your awareness. Oh, hey, maybe I shouldn't just immediately adopt this and assume it's true because it's coming from a media source I like. Maybe I should give it a second thought because clearly a lot of people disagree with this. And I just need to give it a little bit more skepticism before deciding.
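The kind of flag discussed here needs no judgment about who is right, only about whether people are split. This is a minimal sketch of one hypothetical rule, not any platform's actual algorithm: flag an item when it has drawn enough votes and the smaller side is at least a given fraction of the larger side. The thresholds `min_votes` and `min_balance` are arbitrary illustrative choices.

```python
def is_controversial(upvotes, downvotes, min_votes=100, min_balance=0.6):
    """Hypothetical rule: flag items that draw many votes, split fairly evenly.

    balance is the smaller vote count divided by the larger one, so 1.0 means
    a perfect split and values near 0 mean near-unanimous agreement.
    """
    total = upvotes + downvotes
    if total < min_votes or min(upvotes, downvotes) == 0:
        return False
    balance = min(upvotes, downvotes) / max(upvotes, downvotes)
    return balance >= min_balance


print(is_controversial(500, 450))  # True: many votes, close split
print(is_controversial(900, 20))   # False: near-unanimous
```

Note that the rule only measures disagreement: it says nothing about whether the flagged claim is true, which is the property Spencer argues makes this easier than deciding which group is right.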

Staff

Spencer Greenberg — Host / Director
Josh Castle — Producer
Ryan Kessler — Audio Engineer
Uri Bram — Factotum
Janaisa Baril — Transcriptionist
Miles Kestran — Marketing
