January 7, 2025
Which of the world's hardest problems can be solved merely by gathering more data? Why are social problems harder to solve than biological problems? How should power between business owners and unions be balanced? Why do people so often misinterpret research results — sometimes even to the point of concluding exactly the opposite of what the results show? What heuristics should people use when reading research papers? How are culture wars affecting the reliability of research? Should there be any limits on what can be researched and published? When might left-leaning researchers actually end up causing harm to the groups whose causes they're trying to defend? How much does it matter that we know what sorts of traits are heritable? How important is IQ for predicting life outcomes? In what kinds of situations is skill more important than IQ, and vice versa? Is there value in knowing your own IQ?
Learn more about Cremieux at their website, cremieux.xyz.
SPENCER: Cremieux, welcome.
CREMIEUX: Hi. Thanks for having me.
SPENCER: So my understanding is that you believe that a lot of complex problems in our world are actually much simpler than they seem, especially if we were to use data in the proper way. Could you tell us about that?
CREMIEUX: Sure. So I actually came to believe this pretty recently, after seeing a wonderful set of preprints released on drug discovery. It's always been a big problem. How do you find new drug targets? How do you find ones that are very likely to work? There seem to be a lot of wonderful methods for doing that nowadays, but they require you to have a lot of data. As it turns out, we're gathering a significant amount of data now. For instance, 23andMe has sequenced the genomes of over a million people. They have a huge amount of phenotyping data they gather through their online platform. They have lots of wonderful stuff they can use for drug discovery, them and others, of course. As more and more of this comes online, they're finding more and more targets, and it just seems like the problem of finding drug targets that are likely to work seems to have been pretty much resolved by data gathering. I'm not going to say it's entirely resolved; that's a bit of a misstatement, but it definitely seems to have become easier to find plausible hits. A lot of problems are like this, but we have a big issue in that there is a lot of fraud and a lot of very poor methodology out there for dealing with these problems. For example, recently, it was revealed that Parkinson's and Alzheimer's research has probably been set back considerably by fraud at the National Institute on Aging. The head of their neuroscience division, Eliezer Masliah, it turns out, had faked a lot of his results over the years, and a lot of the clinical trials that had gone forward for Parkinson's, Alzheimer's, whatever, relied on inferences from his research, which turned out to be fake, and that might explain why they've been failing. The really hard problem of finding drugs for Alzheimer's or Parkinson's might just be the really hard problem of not gathering enough data to use in a more appropriate way, and having too many frauds running about mucking up the ecosystem of thought and matter.
SPENCER: So is your thesis that a lot of complex problems, if we could just gather enough data and analyze it in a truth-seeking manner, we could actually make serious progress on that?
CREMIEUX: Yeah, absolutely. It was believed, based on some publications in JAMA, that epidurals increased the risk of autism for the exposed kids; the kids that are exposed to epidurals during childbirth and afterwards through breast milk and whatnot. As it turns out, all it took to disprove this was getting a sufficiently large data set from Scandinavia, because they have these large population registries, and using that data to check in the same way that everybody else had checked it. Suddenly, it turns out the results are null, and everybody was screwing up because they were selectively publishing things that appeared to affirm the hypothesis that epidurals caused autism. Among other things, there have also been claims that they cause depression and ADHD. It turns out that all of these are false. We just had to have a large enough data set to actually check. These ideas persisted for years, basically on the back of not having enough data.
SPENCER: My understanding is that one of the ideas behind it was that with this giant database of a million or more people's genetics, plus lots of surveys from those people and other health data, they would be able to enable drug discovery, but that hasn't panned out so well. As far as I know, no drugs have come out of their research, but maybe other companies have found ways to leverage data more effectively. Do you know about that?
CREMIEUX: Yeah. In fact, Regeneron stands out as a wonderful example. Through both their acquisitions and their discovery strategy, which is sequencing a million Americans' whole exomes and then phenotyping things through their electronic health records, their EHRs, they found and developed several drug targets. They've been really successful. It's been kind of amazing, and they're only going to be more successful in the future. They are currently sequencing and phenotyping more people, and it's just becoming better and better. There have actually been considerable numbers of drugs found through genetically assisted discovery. It's kind of been like a flood in terms of the companies. I don't quite know them all. A lot of them are small companies, and many of them end up getting acquired by larger companies like Regeneron. 23andMe has published some on the impact of genetic evidence on clinical success, and they're aware that they have a lot of really useful data, but it seems like due to executive mismanagement, they don't put it to use as they should. It's really a wonder that they don't. It's not that there's nothing there. There's a ton there. I'll actually send you a paper on this. You can look at it. For some reason, they do not use it. It's kind of a mystery.
SPENCER: My understanding is that in 2023 they had a monoclonal antibody that they've been working on in-house. I'm not sure what has happened to that, but my sense is it is part of their plan to develop drugs, but maybe it hasn't panned out so far.
CREMIEUX: They have a lot of people in-house who are supposed to be working on this project, but they just don't seem to be doing much. It seems like they're really missing the low-hanging fruit because several other groups, like Regeneron, for example, are doing this. They have plenty of room to do it. We know they can do it. Their data is very desirable for that purpose; they're just not doing it. There are a lot of problems like this, where the fruit is really low-hanging. The epidural autism thing that I mentioned is really low-hanging too, and the reason it wasn't immediately resolved is that if you want to access these large population registers, you have to have certain credentials. You have to go through a lot of forms and bureaucracy. Friends have related to me that it takes an inordinate amount of time to actually get a proposal through, even if it's relatively simple. In the case of the epidural autism thing, they just wanted to do a sibling control analysis. Basically, they wanted to do a regression with sibling fixed effects. That took, I think, more than a year to get through, which is absurdly long. But they could have dissolved an entire literature in moments if the data were just a little more open.
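[Editor's note: for readers unfamiliar with the method mentioned here, a sibling-control analysis is, mechanically, a regression where each sibling group gets its own intercept, which is equivalent to demeaning the exposure and the outcome within each family. A minimal sketch with invented data; all names and numbers below are hypothetical illustrations, not from the study discussed:]

```python
from collections import defaultdict

# Hypothetical records: (family_id, exposed_to_epidural, outcome_score).
# These numbers are made up purely to show the mechanics.
records = [
    ("fam1", 1, 0.9), ("fam1", 0, 0.8),
    ("fam2", 1, 0.4), ("fam2", 0, 0.5),
    ("fam3", 1, 0.7), ("fam3", 0, 0.6),
]

# Group siblings by family.
by_fam = defaultdict(list)
for fam, x, y in records:
    by_fam[fam].append((x, y))

# Demean exposure and outcome within each family (this is what the
# family "fixed effect" does), so only within-sibling variation remains.
xs, ys = [], []
for fam, rows in by_fam.items():
    mx = sum(x for x, _ in rows) / len(rows)
    my = sum(y for _, y in rows) / len(rows)
    for x, y in rows:
        xs.append(x - mx)
        ys.append(y - my)

# OLS slope on the demeaned data is the sibling-fixed-effects estimate.
beta = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
print(beta)
```

With real registry data one would use a dedicated fixed-effects estimator, but the within-family demeaning is the core of the idea: any factor shared by siblings, genetic or environmental, drops out of the comparison.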
SPENCER: Yeah, I could see why people might think, "Yeah. Sure. In medicine, it's just a data problem. If you have enough data, you can answer medical questions." But do you think this applies more broadly to complex problems in society?
CREMIEUX: Absolutely. I think it applies very well to, for example, archaeology. Let's talk about a recent paper, this one in the Proceedings of the National Academy of Sciences. A week or two ago, there was a really incredible paper that came out on the Nazca Lines. The Nazca Lines are those big lines that most people are familiar with, the ones that look like they were made by aliens or whatever. The History Channel used to do all those funny shows on, "Oh, they were made by ancient aliens." None of that is true.
SPENCER: And you have to have looked at them from way above. If you fly over on a plane, you can see these pictorial diagrams on the surface of the earth.
CREMIEUX: Yeah, that's the thing. Some of them, yes. Others, no. And for all of them, if you find the right vantage point in the Pampa where they're located, the answer is no. You can see them from certain vantage points really well, and those vantage points have significance. So the first thing to know about these lines is that there are a lot of small lines. The small lines you can see while you're walking along a trail. If you see a certain line, it's like a funny picture that looks kind of like an alien, or a picture of guys with really big butts, or a picture of a big animal, or all sorts of weird things. If you see that, you can recognize it if you're one of the hikers, and you can walk down the path that it marks. They're path markers. They mark areas where people would go or want to turn to get to a certain destination if they cross them at that moment. And then the big ones, everybody's more familiar with. Those ones were made after a state appeared in the region, and you could see all of them from the top of this huge temple complex in the southern part of the Pampa. It's really kind of incredible. But these small ones we didn't know too much about. Most of them were discovered recently by AI, by computer vision software that pinpointed where they were in all these images, and then people went out with the Peruvian government, and they looked at them, and they were like, "Yep, that's real. There really is a line here. We didn't notice it before, but AI showed it to us." And it was these small lines that gave us the indication that they're trailheads, that they're older than the big lines, and that the state co-opted the practice of making these lines for its ritual purposes.
SPENCER: So this was a huge mystery for a long time. All kinds of speculation about aliens making them and so on. It was just this discovery through AI of these smaller lines that unlocked the whole thing. Is that right?
CREMIEUX: Yeah. We knew about a lot of the smaller lines, but there weren't enough of them in the data set to figure out that, "Whoa, wait, look, every single one of them is right near a trailhead, where the trail turns. There are trails carved in right near each of these little lines. It's kind of incredible." All it took was gathering more data, and then we could easily figure out and piece together the history here.
SPENCER: Are there any big societal problems you'd point to where you feel data has helped us unlock it or might help us to unlock it in the future?
CREMIEUX: There are a lot of societal problems that we might be able to use larger data sets to address. Drug discovery is a very obvious one, and treatments for rare diseases; about 10% of the population has some rare disease. They're not really rare overall; it's just that each individual disease itself is rare. All of those can be addressed just by gathering more data, it seems. But if I had to think of a big societal problem, I think a lot of problems can be aided by gathering a lot more data and using better methods. Figuring out how to best distribute welfare, for example, is something that could be assisted through more data. Figuring out how poor people are, and thus what they need, is also something that can be assisted. But I don't think those things are cured per se. Fixing social phenomena is actually a lot harder than fixing biological phenomena.
SPENCER: What do you attribute that to?
CREMIEUX: Social phenomena are incredibly complex, and people are really invested, and nobody wants to be wrong about the things they feel so strongly about. There could be a social problem due to a group's religion. For example, a group such as the Amish or the Mennonites who resist technology, so they're not getting vaccinated, they're dying at high rates, or something like that. There could be all sorts of things where people are heavily invested in the conclusion, and they're not allowing policy to move. They're not allowing people to act. They're not allowing things to develop. They're actively impeding progress in one area or another. For example, environmentalism. The Sierra Club is ostensibly an environmental organization, but they file huge numbers of injunctions, hundreds of millions of dollars worth of injunctions every year against new builds. For example, they block meaningful percentages of all solar developments, which is kind of wild.
SPENCER: Why would they do that?
CREMIEUX: They don't want stuff built in their backyard. They don't want stuff built where some people in the organization might see it; they don't want stuff built, whatever. They're a concerned interest group. And through doing what they do, they impede everything. It's really unfortunate. One of the things that they help to make worse is, for example, forest fires. Forest fires are probably something you know how to handle and prevent; you just clear the underbrush and whatnot. You do your regular forest cleans. But as it turns out, if the Forest Service wants to do this, they have to go through environmental review per NEPA. Or if you're in California, they probably have to go through CEQA too. These are the National Environmental Policy Act and the California Environmental Quality Act, and because of this, they never get to actually clear the forests. So a lightning strike comes down, some brush somehow catches fire spontaneously, or an arsonist has a really lucky day. Because of environmental review, people are forbidden from actually doing anything, and as a result, the environment gets a lot worse. There are a lot of routes through which environmental review can be pushed onto an organization or a group trying to build things. Some areas don't even allow you to hook up solar panels over your balconies, for example. It's not even allowed in a lot of areas in the US. It's really bizarre; you would think that wouldn't be regulated, but it's just not allowed because it's considered ugly. Now, sorry, I guess I want to bring it back to the idea that social problems are harder than biological problems.
SPENCER: And so what do you see as sort of the key aspects of why social problems are harder than biological problems, in particular?
CREMIEUX: Social problems are really difficult because people are really invested in what they want to happen. They don't have a great causal understanding of social phenomena. They think that they can get social fixes to everything, but they can't. They are really deluding themselves a lot of the time. A really good example might be Christian Scientists. They pray for terminal illness to go away. That does not work. If you get your whole congregation to pray, you will not be curing somebody's leukemia. Whereas, if you have that person go through using a bunch of different cancer drugs, and if they maybe undergo chemotherapy or whatever is required, they might actually live. That's a very easily addressed biological problem. Medicare costs a lot. Medicare is very, very expensive. It's a social problem because we have to handle this to make the budget work, and we have to budget for people's care and all that. There are a lot of constraints we have to deal with. It's a complex issue, and nobody wants anybody to touch their Medicare budgeting or to stop their care, or whatever it may be. One way to make Medicare work is to simply address the underlying disease issues. If you can, for example, cure obesity, which we now seem able to do, then you'll be able to save considerable amounts in the long run, if the price of the drugs like semaglutide comes down. If that can happen, then you have effectively helped to remedy a social problem regarding the cost of Medicare and the cost of healthcare provisioning through biological means, or a quasi-biological means. I don't know what you would call GLP-1 RAs, but I would call them a biological fix.
SPENCER: It seems to be that one of the big challenges of social problems, and I think this is what you're alluding to in part, is that if you want to implement a social solution, almost certainly some people will be harmed by that social solution, even if it's really good on average. Even if it's much better for society, someone's going to lose out because you're replacing whatever is currently benefiting them, or there's some kind of externality you're creating, and so you're going to get significant pressure pushing back against the solution, even if it's a really good one.
CREMIEUX: Yeah, exactly. Now, I actually think the ILA, the International Longshoremen's Association, is a really good example of this. A little background for your listeners — I'm not going to assume everybody has heard of them — but a few weeks ago, they were threatening to close a lot of ports on America's East Coast and Gulf Coast, and then they backed off. They said they were going to delay the strike and do it in January, after the election and after the holiday season, when they can hold the country hostage. And the reasoning is they want to be paid more money, they want higher wages, they want cushier jobs, they want more safety, they want more benefits of all sorts. And they talk about the fact that robots don't pay for your family's dinner, your family's rent, or your family's mortgage, or whatever it may be. A lot of their shtick is that they're trying to protect relatively high-paid blue-collar workers from losing their jobs. And the irony of this is that by doing so, they are creating a regime where they're protecting relatively small numbers of blue-collar workers. But if their ports were automated and made more efficient, and trade volumes could increase substantially, and they were basically cleared out of the way, what would happen is America would have a much easier time reindustrializing, and it would be able to create a lot of high-paying, relatively low-skill blue-collar jobs in manufacturing, for example. But America can't do that so long as its ports are some of the world's worst. It's an ironic position they put themselves in.
SPENCER: One way to analyze these situations is that you can think about kind of four interest groups. You've got the big companies, you've got the unionized workers, you've got the non-unionized workers, and then you've got end consumers. And I think unfortunately, there are often trade-offs between them. If the big corporation gets more power, they may be able to rip off customers because maybe they have a monopoly and people don't have a choice where to buy. They might be able to rip off their employees because maybe there's nowhere else to work. So that's the danger of the company getting too powerful. On the other hand, if the union gets too powerful, they can actually make things a lot worse for consumers by basically raising prices. They can also make things worse for the non-unionized workers, where it's great if you're in the union, but you're kind of screwed if you're not in the union. So each of these groups, if it has too much power, it feels like it can have negative effects on the other groups. But that creates this internal struggle between these different groups trying to gain an advantage relative to the other groups.
CREMIEUX: Definitely. One of the things I've been reading about recently is how they manage this in Japan. They also manage this pretty well in Sweden, though their results are actually very similar to what happens in America; I'll talk about that next. In Japan, employees expect that their employment will be maintained at the firm level. They expect that if you're hired by some big company as a salaryman, you are basically going to be part of a family, and as part of a family, they're going to maintain you, they're going to train you, they're going to help you acquire new skills and whatnot. It's not a very efficient model for the service sector, but it is a model that allows them to keep up with America and sometimes even get ahead in the manufacturing sector. A lot of the worry of people in the manufacturing sector is that they'll be replaced. And so they resist automation. They resist new robot arms coming in to make widgets instead of having humans make the widgets on those assembly lines. They resist these engines of material progress. But in Japan, workers embrace them wholeheartedly. In Japan, they just go, "Oh, of course, we want new mechanical arms to come in, because it's safer for the mechanical arm to do it, and I'm not going to lose my job. The firm will just retrain me." And so they manage to adopt all this new technology at a much greater rate than the U.S. does, because nobody fears that they're going to be knocked out into doing something else. It's a wonderful little social model. It's, again, not efficient, but it seems to be a good way to get technology adopted. It might actually be better in the long run.
SPENCER: It's really interesting. It reminds me that I recently heard an account of a school teacher in Japan who quit their job, and they documented what happened after they quit. It was so intense; this committee met with them, and they had to explain themselves to the committee about why they were leaving. They went through a really difficult process to leave their job, whereas in the US, people quit, give their two weeks' notice, and they don't even have to do that. They just do that as a courtesy, and then nobody thinks twice about it. But in Japan, it's like, "Well, you're leaving. That's like a betrayal." There might actually be social consequences for leaving.
CREMIEUX: Yeah, absolutely. It's a known thing that in Japan, if you leave certain positions, don't expect to get hired again. If you separate from a company where you are a salaryman, then you're considered to be damaged goods, and in all probability, you'll never be offered another salaryman position. You may get contract employment, but the terms for contract employment are worse materially. They have lower social status, and they don't have the sort of job security that a salaryman position does have. It's a really incredible economy in Japan.
SPENCER: You mentioned that something interesting occurs in the Swedish market as well. What were you going to say there?
CREMIEUX: In Sweden, they have very high rates of unionization, but they don't have a lot of resistance towards the introduction of new technologies that can make work safer, more productive, and more automated. The reason being, these unions hope to ensure high levels of employment in the affected industries and to ensure wages remain high, even if new technologies are introduced, which is sort of similar to what happens in Japan, except it's not at the firm level; it's sectoral. They do a lot of sectoral bargaining that makes technology more a complement to workers than something that's viewed as a replacement, and that helps them adopt technology as well. It's really nice. But the thing is, even though they have this, and this is sort of a belief that a lot of their labor leaders have, the outcomes of automation in Sweden and the US are not very different at all; practically, the effects on net employment are the same. To clarify what I mean: a replacement effect knocks a worker out of a job, and a reinstatement effect puts workers back into jobs due to increased demand and productivity after robots and such are installed. Together those yield a net employment effect. The effects on net employment and income are all incredibly similar between Europe and the US, and only really different for Japan. So I think maybe the Swedish model of using sectoral bargaining to ease the introduction of new technologies doesn't really have much of an impact, even if it makes it rhetorically easier to get new technologies introduced or eases the way towards their introduction. But the Japanese model of firm-level introductions actually does seem to be better in that sense.
SPENCER: I was in Finland recently, and I was quite fascinated by the society, because it has extremely high life satisfaction scores, but it's really different from the US in many ways, which interested me. One of those ways is that my understanding is that something like 70% of people there who are working are in unions. Not only that, but the unions are even stronger than that sounds, because they tend to team up with each other. If one union goes on strike, others join in solidarity. I thought that was incredible. It must give the workers so much power over the companies. I was talking to someone I know who lives there, and I asked him about the startup scene. I said, "Suppose one day you come home and tell your family, 'Oh, I want to start a startup.' How would people react? Would they be supportive? Would they be nervous for you?" He said, "Oh, that would never happen here. Nobody would ever go home and say, 'I want to start a startup.'" I found that shocking as an American, just the different attitudes. Another Finnish person told me that there's a much greater sense of working so that you can do other things. You get your work done, and then you spend time with your family or go on a nice vacation, rather than thinking of work as a fundamental aspect of your identity.
CREMIEUX: Absolutely. I actually think that Europeans would, if their laws were different, come more around the American way. This is going to sound like a stretch, but Americans and Israelis kind of see it similarly. We have a lot of startups and a lot of impetus to work. Israel does not have a chill reputation, but I still think that the whole "work like an American and live like an American" thing is alive and well in Israel, even if it's emulating rather than actually deriving the same satisfaction from work organically. I think that a lot of European countries would become more like America if their laws were just a little bit different. In Italy, it's very hard to start a startup, but a lot of Italians really want to start one. They just can't because they have a lot of obstacles, like a tax on unrealized income, which is pretty devastating.
SPENCER: I feel like this really cuts both ways; there are good and bad aspects of government efficiency. There is silly paperwork to fill out that doesn't help anybody. But there are also fundamental value trade-offs. For example, a friend of mine who works in academia moved to Europe from America, and she felt like she got a lot of benefits in terms of her sense of self. In America, she felt that her sense of self-worth was based on what she was publishing and how productive she was. When she moved to Europe, she felt that people appreciated her for who she is, and they didn't think about her through the lens of her work. She found that to be a big relief. Working hard and trying to achieve great things can be wonderful, but there can also be a big trade-off. It's not for everyone, and there is a real cost to that.
CREMIEUX: Absolutely, there definitely is. But I do wonder how long Europe is going to put up with being poorer than the US. The way I see it is that there are two growth frontiers we need to think about when it comes to the Western sphere of influence: Japan and Korea, plus Europe, plus Australia, New Zealand, the US, and Canada. There's the European growth frontier, which is a steady state that is increasing, but very slowly, relative to the American growth frontier, which is kind of getting away from it. If everything continues at this pace, and the European growth frontier doesn't experience some acceleration, then in 10 to 20 years, Europeans will seem really poor to Americans. They will be so much worse off materially that I wonder, "Will Europeans be okay with it? Will they really want to live in a Europe that gives them an enhanced sense of meaning right now, once Europe really is super far behind materially?"
[promo]
SPENCER: Another topic I wanted to talk to you about is this idea of people quoting sources or drawing conclusions from sources without carefully investigating what those sources are really saying or qualifying the statements properly to address the limitations of those sources. Could you go into that a bit?
CREMIEUX: Yeah, so this is actually really common. This is a huge problem, and I think it's led to a lot of people being misinformed. Part of it has to do with people not being able to read studies, and it's hard to teach them how to do that. But a lot of it has to do with the fact that they simply do not read what they cite. For example, I just read a relatively new study in the Proceedings of the National Academy of Sciences on differences in how often cops will pull over black versus white motorists in the US, specifically in Chicago. This produced a really evocative graph that appeared to show that there was extremely good evidence for discrimination by police officers who are on the beat, looking to pull over cars. What it showed was that when officers pulled people over, they pulled over a really excessive number of black motorists and a minimal number of white motorists. When it was speed cameras giving out the tickets, that led to much more even rates and linearity in the relationship between the probability of a member of a particular group being stopped and their share on the road. But you crack open this study, and it turns out that linearity is entirely built into how they've constructed the data. The share on the road, the independent variable in this regression, is constructed from neighborhood demographics, and the probability of being stopped is also imputed from neighborhood demographics. It is a one-to-one relationship built on a tautology of their methods; they've created this one-to-one relationship themselves. It's really remarkable that this was considered evidence for discrimination, but it shouldn't have been. There's really no reason to think it was. The other thing is that the officers, when they pull people over, are pulling them over for more than speeding. At a glance, this data shouldn't have even given anybody the idea that it demonstrated discrimination.
They're comparing apples to oranges, and the result is something just intensely unfavorable to meaningful inference.
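[Editor's note: the circularity described above is easy to make concrete. In the sketch below, all numbers are invented; both the "share on the road" and the "probability of being stopped" are derived from the same neighborhood demographic figure, so a near-perfect linear relationship is guaranteed by construction, no matter what anyone actually does on the road:]

```python
import random

random.seed(0)

# Hypothetical underlying demographic figure for 200 neighborhoods.
neighborhood_black_share = [random.random() for _ in range(200)]

# Both regression variables imputed from the SAME demographic figure:
share_on_road = [s for s in neighborhood_black_share]               # "independent" variable
prob_stopped = [0.3 * s + 0.01 for s in neighborhood_black_share]   # "dependent" variable

# Pearson correlation; because prob_stopped is a linear function of
# share_on_road, this is exactly 1 up to floating-point error.
def corr(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

print(corr(share_on_road, prob_stopped))
```

The printed correlation is essentially exactly 1, and it would be no matter how the slope and intercept were chosen: the "finding" of linearity is an artifact of the construction, which is the tautology being described.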
SPENCER: I think that there's this idea that you either don't trust studies, that studies are bullshit, or you trust studies and are pro-science. But actually, I think both of those ideas are really unhelpful. Rather, you have to think about some papers being reliable and some being unreliable, and it's really a spectrum. Just because a paper says something doesn't mean that it actually provides any evidence for that thing. We see this all the time. We have a project called Transparent Replications, where we replicate new papers coming out in top psychology journals. Some of the papers are excellent, and we learn a lot; some of the papers have fundamental flaws and don't even mean what they claim to mean; and some of them don't even replicate. It runs the full gamut from actually providing no evidence at all for the thing claimed, to providing some evidence, to providing very strong evidence. Unless you're willing to make that distinction, either position — science is bad or science is good — is extremely inadequate.
CREMIEUX: I actually ran into this very recently, where I got a paper retracted because, okay, the paper is really funny. About a dozen different news organizations, at the time I looked, had reported on it and said that high ceilings — and I'm not talking about the ceiling of the test; I'm talking about the ceiling of the room in which students take tests — reduce their test scores.
SPENCER: Oh, my God, my alarm bells are immediately going off.
CREMIEUX: This is a little weird. It's just a funky idea: the claim that high ceilings make students score lower on tests. Why? Because, and this is really crazy, they're saying it was something about embodiment in the room. They had this diagram showing that people consider themselves differently in a room with different dimensions. It's a little hokey. So I look at this, I read through the tables, I look at the little replication package they have on OSF, I look at their pre-registration, and I'm just seeing problem after problem. The main one is that the conclusion they reported was exactly the opposite of what they found: their own data showed that higher ceilings were related to higher test scores. Now, mind you, this isn't a causal study anyway; they were just making strong causal claims on the basis of no real causal evidence.
SPENCER: How do they misinterpret it so badly?
CREMIEUX: That's a good question. I think they really wanted high ceilings to be related to lower test performance, and they just weren't. So they decided to go ahead with their conclusion, even though their tables showed very clearly that something was amiss. The study is now retracted. I wrote a long thread on it that went fairly viral, got more than a million views, and there was some back and forth; a former editor of the journal reached out and said he didn't know how it got published in the first place, or how the reviewers missed that it didn't actually support the conclusions it claimed to.
SPENCER: Now, did you have to analyze the data to figure this out? Or just reading the paper, were you able to figure this out?
CREMIEUX: You could have figured this one out from the paper, but for a lot of the other details, you would have had to go look into the data and into the pre-registration. Looking into pre-registrations is actually something that's pretty rare. What I found in the pre-registration was that they substantially deviated from it. They did not stick to what they pre-registered they were going to do; they did something very different. And the model they chose for estimating the effects of ceilings is really absurd if you take it literally: if students are positively impacted by higher ceilings, you should put them at the bottom of a space elevator or put them outdoors; if they're negatively impacted, you should cram them into a cave. They didn't include a non-linear term. They didn't even think about it.
SPENCER: So when you're reading academic papers, what are you looking for? If you're just doing a quick skim, and you're like, "Can I trust this? Is this reliable?" What are you picking up on?
CREMIEUX: There are a lot of heuristics that I use that I don't know if I've ever explicitly written down. One thing that people will often see me talk about is that there are a bunch of marginal p-values in papers everywhere. For example, there are a lot of papers from before the replication crisis was really well known, purporting to show amazing coincidences of effects across data sets and all sorts of evidence for X, Y, Z, different things that the authors really wanted to be true. For example, in the Proceedings of the National Academy of Sciences, there was a multi-study paper in which they showed that rich people are really selfish, and almost all of the p-values for the focal tests are between 0.05 and 0.01. They're basically all in the margins.
SPENCER: So just to clarify that for listeners: you've probably heard the phrase "p-value." A p-value basically tells you how likely you are to get a result at least this extreme if, in fact, there's no effect. So if you're flipping a coin and trying to figure out whether it's a fair coin, and you flip it 100 times and get 75 heads, the question the p-value answers is: how likely are you to get a result this extreme, 75 or more heads, if it actually is a fair coin, one that lands on heads 50% of the time? So that's what we're getting at. Many journals, especially in the social sciences, use the heuristic that if a p-value is less than 0.05, you've found an effect and it's potentially publishable, whereas if it's above 0.05, they'll say you didn't find an effect. This is a very arbitrary cutoff; there's really nothing special about 0.05. But because journals use it as a cutoff, it creates a weird incentive to push your p-value just below 0.05 if it comes out just above. Say you got 0.06: now you're incentivized to throw away an outlier, or try analyzing the data a slightly different way, to see if you can get it down to 0.04 so you can try to publish.
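Spencer's coin example can be computed exactly from the binomial distribution. A minimal sketch (the one-sided version, counting only "too many heads" as extreme):

```python
from math import comb

def p_value_heads(n: int, k: int) -> float:
    """One-sided p-value: chance of k or more heads in n flips of a fair coin."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

# 75 or more heads out of 100 is wildly unlikely under a fair coin
p = p_value_heads(100, 75)
print(f"p = {p:.2e}")  # far below the 0.05 cutoff
```

For 75 heads the p-value is tiny, on the order of one in millions, which is exactly why such a result would make you doubt the coin is fair.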
CREMIEUX: Yeah. And what's funny about finding all the p-values between 0.05 and 0.01 is that it means the authors were probably p-hacking. They were trying to get all of their test statistics to land right in that conveniently significant range, with the implication being that the results really weren't significant on their own; they had to do something to make things significant, because the journal editors wouldn't have accepted the results otherwise. So that's one of my heuristics: when you see a paper with a bunch of these p-values right along the edge, that's a good sign something is amiss. And you can do this with whole literatures, too. I have a blog post from a while back where I looked at how many p-values were in this range across different literatures, because a lot of these p-values have been gathered up in several papers. It turns out the proportions are pretty huge depending on the field. There are some fields where strictly the majority of p-values are right under 0.05 and right above 0.01. It's that dubious; something is really a mess.
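Cremieux's heuristic is easy to mechanize: given the p-values reported for a paper's (or a literature's) focal tests, check what fraction land in the "just significant" band. A sketch with made-up numbers:

```python
def marginal_fraction(p_values: list[float]) -> float:
    """Fraction of p-values in the suspicious 0.01 <= p < 0.05 band."""
    in_band = sum(1 for p in p_values if 0.01 <= p < 0.05)
    return in_band / len(p_values)

# Hypothetical focal-test p-values from one multi-study paper
reported = [0.049, 0.032, 0.041, 0.011, 0.044, 0.038]
print(marginal_fraction(reported))  # every value hugs the threshold: a red flag
```

A single marginal p-value is unremarkable; it's the clustering of many focal tests just under 0.05, which is improbable if effects are real and well-powered, that signals trouble.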
SPENCER: What about sample size, meaning the number of participants in the study? Is that something that you home in on when you're thinking about the reliability of a paper?
CREMIEUX: I think you have to contextualize it. For example, if you wanted to test an absolute claim, like "Can X and Y differ?", then if you can find one example of X and Y differing, bada bing bada boom, your sample size was sufficient. If you're testing a systematic effect of a medication on patient outcomes, and you find that it didn't work, but your sample size was 10 and the effect size was large and just wasn't significant, then of course your study should not be treated as evidence against the medicine working. If you keep it in context, then yeah, I definitely do use sample size as a heuristic. But there are some cases where it is completely fine to have a small sample, and others where it is very, very bad. For example, in international regressions, where you're looking at determinants of growth, say changes in the Freedom House index over many years, your sample of countries is only going to be a couple hundred. There aren't that many countries. So your sample size is necessarily limited, but it's totally fine to use it, because that's all you really can have; that's a complete sample. Or take a small country: if you had everybody in a register in that country and wanted to see how different things were related, and the country was small enough, you might not get a lot of significant results. But that shouldn't actually matter, because p-values are for sample statistics, not population statistics, and in that case you have the whole population, so you can eschew p-values entirely.
SPENCER: I think the distinction you're drawing here is that often, when we do statistics, what we're really interested in is generalizing to a larger population. We want to know something about all Americans, but we can't ask every American. So we take a sample of 300 random Americans, ask them, and use that to infer something about the entire American population. That's really common; in fact, I would say that's almost always what researchers are doing. But sometimes you're just interested in a particular group. Let's say you're just interested in your own classroom that you teach. You could just survey every person in the classroom; you have the entire population. There's no need to generalize to some other group. You just care about the kids in your classroom.
CREMIEUX: I do look at sample size in a lot of places because even when you want to control for something, sometimes if you don't have a large enough sample size, you run into finite sample problems where the adjustment isn't sufficient. The fact that you didn't have a large enough sample means you're not going to be able to get an adequate control; you're going to have a lot of sampling error. You're going to leave more measurement error on the table than if you had enough people to bring it down to what it really should have been after the adjustment. There are a lot of papers that explain this probably better than I just did right there. But the gist of it is, if you really want to test something and you want to be very sure about it, you're going to need a big sample.
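The sampling-error point can be seen in a few lines: the spread of a sample mean around the truth shrinks roughly like 1/sqrt(n), which is why small samples leave so much error on the table. A sketch with simulated unit-variance noise:

```python
import random
import statistics

def sample_means(n: int, trials: int = 2000, seed: int = 0) -> list[float]:
    """Draw `trials` samples of size n from N(0, 1) and return each sample's mean."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.gauss(0, 1) for _ in range(n)) for _ in range(trials)]

for n in (10, 100, 1000):
    se = statistics.stdev(sample_means(n))
    print(f"n={n:>4}: standard error of the mean ≈ {se:.3f}")  # shrinks like 1/sqrt(n)
```

Going from 10 to 1,000 participants cuts the noise on the estimate by a factor of about ten, which is the quantitative content of "you're going to need a big sample."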
SPENCER: Yeah, I find that something like a sample size of 20 is so small that there's almost nothing to be learned from it. The exception would be it could be good for a pilot study or a feasibility study, that kind of thing. But if you're trying to really draw inferences, basically, if I see a study has 20 people, I generally think, yeah, this is garbage. Would you agree with that?
CREMIEUX: It depends. There actually is a really small study whose result I find almost certainly true, where people rated the difference in taste between pudding flavored like feces and pudding flavored like chocolate, and they rated the chocolate-flavored pudding as much, much better. Small sample, but I totally believe it.
SPENCER: Right. But is that just because you have a high prior? Did you even need the study at all?
CREMIEUX: Yeah, it does, but I think you've hit on something very important there. Did we need the study at all? Absolutely, I think we did. The reason being, it can help us tune our understanding of other effect sizes, and it can help us check whether our methods actually work. Because imagine if people had rated the feces pudding better than the chocolate one. Your prior would be shot. What would you even say at that point? There would be some confusion. We do have to test things that are obvious in order to make sure that our methods work, that our theories work, that we're able to build theories properly, and sometimes the obvious things are wrong. There is a very famous example of this. Sir Archie Cochrane, the guy whose name is behind the Cochrane Review, was a huge fan of RCTs, and one of the things he did was run an RCT to see whether patients recovered better after heart attacks in a specialized cardiac unit or at home. When he went to present his findings to a bunch of doctors, he said, "Gentlemen, we have preliminary results. They may not be significant, but we have something. And it turns out you're right and I'm wrong, so we're stopping the trial. It's dangerous for patients to recover from heart attacks at home; they should be in the hospital." That seems obvious. The doctors erupted, saying, "Of course we're right." They pounded the table. They went, "Archie, you are so unethical. I can't believe you did that, Archie. I just don't know why you would ever want to do these RCTs, because we know what works." So then he goes, "All right, gentlemen, very interesting response. When I gave you the results, I swapped the column names around. It turns out hospitals are killing people; patients should be at home. Do you want to close down the trial, or should we continue until we have more robust results?"
SPENCER: Wow.
CREMIEUX: Yeah, it seems obvious, but it just isn't.
SPENCER: It seems especially useful to study quote "obvious" things when they're current practices. If everyone's doing things a certain way, it's worth double-checking that it's actually good. But there are lots of other things that just aren't going to move the needle one way or another, even if you find out it's not true.
CREMIEUX: Yeah. So the example I was thinking of, from Arthur Schlesinger, was about World War II soldiers. He presented a series of findings about their experiences and asked people how commonsensical they were, how true they seemed, and so on. And everybody said, "Of course they're true." One finding was that better-educated men showed more neurotic symptoms than those with less education when they were deployed in World War II. The explanation provided was, "Oh, the mental instability of the intellectual compared to the impassive psychology of the man in the street is well known." Another was that men from rural backgrounds were in better spirits during army life than soldiers from city backgrounds; after all, everybody knows they're more accustomed to hardships. Another was that Southern soldiers were better able to stand the climate in the hot South Sea Islands than Northern soldiers; of course, Southerners are more accustomed to hot weather. Another was that white privates were more eager to become non-commissioned officers than black privates; everybody at the time agreed black soldiers lacked ambition. Another was that Southern blacks preferred Southern to Northern white officers; of course, they live amongst each other, so they must be friends. The last one was that men were more eager to go home while the fighting continued than after the Germans surrendered. People agreed with these things then, and people still agree with them now. They think, "Oh, this just makes total sense. The explanations are right. Everybody's aware of this." But they're all wrong. Every single one is backwards; he described them all backwards. It's the exact opposite of what was actually found. Better-educated men were actually less neurotic; they held up better in the war. Men from city backgrounds were in higher spirits. Men from the North and the South were equally able to stand the climate, maybe the Northerners even better. Black privates were actually really eager to become non-commissioned officers. Southern blacks preferred Northern white officers, not Southern ones. And men were fine continuing the fight even after the Germans surrendered. All of this common sense was wrong, and a lot of findings are like this, so we have to test.
SPENCER: I think you're also pointing out something, which is that it's so easy to create a kind of just-so story. Once you have a hypothesis in mind, you can come up with an explanation for it. And then if your brain doesn't do the extra work to try to reverse it and say, "Well, could I come up with an equally satisfying explanation for the opposite?" Then you end up really convinced. And so, there's such a useful mental habit of, it seems like, "A implies B or A causes B. Let me try to reverse it for a second and see if I can come up with an explanation the other way around."
CREMIEUX: Yeah, absolutely. Or even just come up with an explanation for the null because they're unrelated.
SPENCER: Changing topics a bit: to what extent do you think that kind of culture war is impacting the reliability of scientific work?
CREMIEUX: Really, really badly. There's a lot of unfortunate culture warring that holds back our ability to gather data, our ability to get studies published. There are some issues that I don't really care about too much, like, for example.... Do you mind if I actually broach some culture war topics?
SPENCER: I don't mind. I'll just say, as a side, I'd rather not talk about race and IQ, but I'm happy to talk about other ones.
CREMIEUX: I was actually going to talk about that. But okay: I think it was a New York Times or Washington Post piece yesterday about the researcher who did the trans puberty-blocker study, saying she couldn't get it published. This person had done a very well-funded study on the reversibility of puberty blockers and had a hard time getting the work published. They said there was resistance to publishing the results because they were negative: they showed that puberty blockers didn't have reversible effects, which is a big talking point for a lot of people promoting their use. This happens all the time. There are absolutely bundles of topics where it is just impossible to publish certain conclusions, even if they're correct. Now, we don't know the results of that particular study yet because, of course, it hasn't been published. We only know the hearsay we've heard from one of the authors, and maybe we'll know what it says at a later date. Maybe it will get published; maybe the publicity about it being unpublishable will help it get published, I don't know. But there are a lot of things where you just can't publish anything, and a lot of things where you cannot even access the data needed to test something that could be pretty useful. For example, there are people who wish to study the relationship between educational attainment and body mass index. They want to figure out how these things are related, why they are related, how they are differently related across eras and groups and whatever. This turns out to be a contentious thing to study, so when they've applied to receive access to some of the data sets in the NIH's dbGaP, the database of Genotypes and Phenotypes, they've been rejected out of hand.
There are tons of data sets in dbGaP that could be used to investigate the relationship between education and body mass index, and they're just not able to get access. They're being blackballed, denied, told, "Go somewhere else," and so all this research will never happen. In other cases, people who want to study contentious things do get access to the data, and when they go to see whether they're allowed to publish what they applied for the data to do, they're told no. The authorities who control access tell them, "We don't like the study now; you've already done it, but you can't upload it: no preprint, no paper, no submission, no anything, or else we'll deny you access in the future to any other data sets in here." It's common and bad, and a lot of our privacy regulations on access to data also hold us back. Those come more from the conservative side: a lot of Americans really, really dislike the idea of somebody having access to their data, of somebody using their data to produce any sort of scientific result, because they think it might be used against them in some way. There's a lot of this bizarreness going on where people have polarized in ways that prevent us from actually doing good scientific work, and ironically, this makes the reputation of science worse. Americans' trust in scientific institutions (journals, universities, publishers, scientists themselves) has really plummeted recently, and unfortunately, that has also made it harder to do anything, because people are less willing to support investment in efforts to make things better, to make things different, to do more. I don't know, I can go on, but if you want to get specific, please ask.
SPENCER: What do you think is really motivating this? Is it that people think there's certain information that would be harmful to have out there in the world, and so we've got to block it from getting out there? Or do you think it's more that people think, "Oh, someone's researching this topic at all? They're probably a bad person, and so we can't trust what they say. They might be manipulating the evidence." I'm just trying to take the point of view of someone who's trying to block this work. What do you think their motivation is?
CREMIEUX: I actually think we have some pretty good work on this now showing that people are hyper-vigilant about the supposed harms of scientific evidence, and there's an irony to it. I'm going to mention a study from 2023; Cory Clark was the lead author and Philip Tetlock was the senior author, with some other authors whose names I don't recall at the moment, because I never read the middle authors' names. What they did is they presented samples of almost a thousand participants, if I recall correctly, with excerpts from five real, peer-reviewed, published scientific studies, plus a discussion-section excerpt that was made up for the study. They presented people with the finding that female protégés benefit more when they have male rather than female mentors in academia. They presented evidence of an absence of racial discrimination against ethnic minorities in police shootings. They presented evidence that priming people with Christian concepts increases racial prejudice. They presented evidence that the children of same-sex parents are no worse off than the children of opposite-sex parents. And they presented evidence that experiencing child sexual abuse does not cause severe, long-lasting psychological harm for all victims. Then they presented the made-up one, which was something about how conservatives are bad in certain ways, or, swapped for some participants, how liberals are bad. Then they gauged people's reactions, and they also had them predict how other people would react. A lot of people thought that everyone else's reactions to these findings would be harmful; they had something like a nearly 50% expectation that people would react in really bad ways that would hurt people.
But when you look at how they reacted, I think it's like less than a third said they had a harmful reaction to those findings, like that they would then say, "Oh, well, women should only have male mentors," or something like that. Very large numbers of them gave helpful reactions, saying, "Oh, well, we should help women with mentorship in more ways because there might be some gender-related harms we should think about." Or they said, "Oh, well, I guess that means same-sex parents are okay in the overwhelming majority of cases." But they expected far fewer people to have such helpful reactions. Another finding was that most people called for more research to be done after hearing these findings, but they thought relatively fewer people would want to call for more research.
SPENCER: So the idea is basically that the people reading these studies have pretty reasonable reactions themselves, but they're afraid that other people will react unreasonably. So they're scared of the research being published, or scared of it being conducted at all.
CREMIEUX: That's exactly right. And it's funny, one of the things they tested in this was that they wanted to see how often people wanted to thwart research, how often they wanted to stop it in its tracks. The actual number was something like a fifth of the people wanted to stop the research, but they thought nearly double that number of people would want to ban the research, which is a pretty terrible reaction, but everybody had a pessimistic view.
SPENCER: How do you think about studies done on topics where there does seem to be a reasonable chance of people misinterpreting it and drawing negative conclusions, like negative stereotypes? Do you think that it's better if just the evidence is out there and we have to trust in the long run that truth wins out? Or do you think that there should be some limit where certain types of evidence are just better not to have out in the world because the danger of misinterpretation is so high?
CREMIEUX: I think there should be basically no limits on what we research. The relevant distinction here is between frontlash and backlash. Everybody knows what backlash is: a finding comes out, there's a backlash to it, somebody gets punished, somebody gets harmed, some group gets targeted, whatever. But after 9/11, there was a frontlash, meaning that opinions about Muslim groups actually got better. Pew and Gallup, I think, both found an increase in positive opinion towards Muslim groups in the US. That doesn't mean Muslim groups became safer just because opinion about them improved in general, because a few people will still go out of their way to harm these groups. They'll attack them because they're associated with a group they now dislike, even if those attackers are a minority.
SPENCER: Why did the opinion get better? It's kind of counterintuitive.
CREMIEUX: Yeah, I think it has to do with people anticipating that there will be backlash. It's the same sort of thing with the research.
SPENCER: So they're kind of preempting it and saying, "Look, people are going to treat Muslims unfairly. They're going to blame them for this thing that obviously the vast majority of Muslims had nothing to do with." And so they're kind of like, it gives them more positivity towards Muslims because of that.
CREMIEUX: Yeah, absolutely, it's something like that. Frontlash happens all the time, and with the research we just talked about, a lot of the responses seem to be of the frontlash variety: "Let's do something better because of these findings," not "Let's do something worse." Maybe the reason is that everybody expects there to be backlash; everybody expects a bad reaction, even though they themselves have good reactions. When it comes to contentious findings, it's basically never the case that they actually lead to any sort of harm. It's incredibly, incredibly rare, the harms are generally very minor, and when they do happen, they're met by a better response; when something bad happens, generally something good also happens. So one of the examples I often tell people about: it was thought for a long time, especially by Christian conservatives (not to target them; I love Christian conservatives, they're fine), that LGBT families are bad. A lot of Christian conservatives are very much against families where you have two mothers, two fathers, two non-binary people, one non-binary and one binary person, whatever. Their reasoning was based on their biblical prescriptions, but people have a funny way of finding evidence for the things they already believe, which, in this case, meant their biblical prescriptions. So some studies came out saying, "Oh, gay parents are bad because of all these outcomes," and the response was, "No, let's do a bunch more research." And basically every study since then has shown that gay parents don't actually seem to do much bad. They don't really do any bad, unless you have a very loaded sense of what's good or bad.
The only differentiated thing between homosexual and heterosexual parents is that the kids of homosexual parents are more likely to themselves express an atypical sexual identity; they're more likely to be, for example, lesbians if they have lesbian parents. But that's not a bad outcome unless you believe that being gay is bad, which is very loaded. Anyway, what I'm trying to say is that the response to this literature was to produce better literature: to critique it, to figure out where the methods went bad, and to find that things are actually a lot better than people feared. That's just generally how things go.
[promo]
SPENCER: If we take an issue like health care for trans youth, and what kind of procedures or what kind of treatments might be helpful to them, one perspective on it is a sort of pro-trans, anti-trans perspective, which is the culture war perspective. Anything that makes youth gender medicine look good is on the pro-trans side. Anything that makes it look bad is on the anti-trans side. Another completely different perspective, which I think is the scientific perspective, is that it's valuable to know more about the topic to better understand how well these treatments work, who they help, when they help, and when they are not helpful, and so on. The cultural perspective that anything that says the medicine works less well is anti-trans can lead to a worse evidence base that harms trans people, because it's harder to tell which treatments are helpful and when they're helpful.
CREMIEUX: I have a funny perspective on the trans issue especially, which is that perhaps the culture war is a huge blessing for trans people. Not right now: they might feel discriminated against, they might feel their healthcare is threatened, or healthcare for their kids, or for parents of trans kids, whatever it might be. But the culture war is a blessing because it gives impetus to researchers to do the biotech research to make real gender transition possible, transition of a kind that has never been possible. I'm talking about actually turning someone from male to female, not just using cosmetic surgeries to emulate going from male to female. It creates a motivation in the heads of people who might want the culture war to end, or to be decided in their favor, to actually go and do something. The bias to action here is unidirectional: nobody can do research that stops real gender transition from becoming possible, but you can definitely do research that makes it possible. If the culture war is seen as a motivation to make womb transplants possible, or to figure out how to make puberty work in a different way, then it might be helpful in the long run.
SPENCER: Do you think in the short run, it's making it harder to help trans people because it's harder to understand the evidence?
CREMIEUX: I think it makes it harder insofar as it acts as an impetus for legislation that makes it harder for them to access the healthcare they might need. An example: in some states, it's become quite hard for people to get the hormones they wish to have at certain ages. There's a very high suicide rate in the trans community, especially, I believe, on the MTF side of things, because it is hard for them to appear female without a huge number of surgeries, lots of vocal training, stuff like that, if they go through a typical male puberty. And the options we have for puberty blocking right now are really not all that great. They don't actually give you a female puberty; what they're doing is preventing development, which probably has a lot of downsides. Because these downsides are recognized by a certain number of people, and because it's recognized that this is an inferior option to really fixing the issue with biotech, we are getting people doing research into making it better, into making all this possible. A lot of the research you hear about in the trans debate is people circling the drain of do-nothing research, whereas what's really going to end up resolving this culture war is biotech. Once it's actually possible to turn someone from male to female or vice versa, obviously the debate ends. You can't then deny that a person who was born male but is now fully female really is female, or that a person born female but now fully male really is male.
SPENCER: You don't think that people who are anti-trans are then going to just move the goalposts and say, "No, you actually have to be born with a certain body. It doesn't matter; even if you can fully medically transition in every way, it doesn't count"?
CREMIEUX: No, I think they'll just lose. I think that the pro-trans side is going to win because they're on the side that can embrace technology, and technology has a way of making it possible to do things that become undeniable. IVF is a good example of this. Prior to the advent of IVF babies, it was looked down upon by the majority. It was a strong majority of bioethicists who were against IVF. And now that you have IVF babies walking around, many of whom are esteemed individuals, you can't deny it anymore.
SPENCER: Do you think it's just that people get accustomed to things and their views just sort of naturally change? They stop resisting them.
CREMIEUX: Yeah, I think they're forced to be accustomed to it. They're forced by the circumstances to be accepting, because you can't just tell someone to their face, "I don't accept your existence; it was immoral for you to be born." At least, most people can't. There are, of course, some people who will stick to that line forever, but they're going to become a dwindling minority, especially as generations advance and we come to live with our technological changes. Technology has a way of reforging our cultural intuitions about things.
SPENCER: Another topic I wanted to ask you about is heritability, and this is one that I'm actually pretty confused about myself. If you look at twin studies, typically ones done with identical twins raised in the same household, and compare their traits to fraternal twins of the same gender raised in the same household, you'll find really high heritabilities for all kinds of traits. Everything from personality traits to depression to IQ and so on has really high heritabilities. But then if you look at DNA-based methods, where you can measure people's DNA and try to predict their traits from that, you get heritability estimates that are much lower. Initially, people thought the DNA-based methods were newer, and they didn't have great data, and the methods weren't that great, and they weren't capturing everything. Just a technological limitation. But as the technology has improved, yes, the heritability estimates have gone up, but they still aren't anywhere close to the twin study heritabilities. It leaves this kind of what they call the missing heritability problem: Why is there this huge gap? Is it that the twin studies are more reliable, and indeed, lots of things are super heritable, or is it that the DNA methods are more reliable, and we've overestimated the heritability of all kinds of things, and things are not nearly as heritable as we thought? I'm just curious to hear your thoughts on: how heritable are important traits?
CREMIEUX: Yeah. The estimands for these methods differ considerably. Briefly for the audience: the estimand is the quantity you're trying to estimate, the estimator is the procedure you use to estimate it, and what that procedure actually produces is your estimate. The estimand for SNP-based heritability computation is different from the estimand for twin studies, which capture quite a bit more than just what you would find from SNPs. There's non-SNP heritability galore. Additionally, one of the big problems with these non-twin SNP heritability methods is that you need variability that you're going to have a hard time capturing. There are reliability issues with the actual genotyping, and you need big samples. You need big, diverse samples, and it can be hard to get all that together. There are several questions related to the heritability of different traits that cannot be answered with our existing samples because they're not large enough. They don't have good enough measures. What we find with twin studies is that they're just a more powerful method for determining the amount of genetic influence on things because they capture all the genetic influence. A variance decomposition gives you a lot more power than attempting to estimate effects one at a time across 300,000 SNPs. To put this in a different perspective, consider height. We're getting pretty close to the twin heritability of height. The thing about height is that it's very easy to measure. Everybody knows their height. It's objective, and it's incredibly easy to just go and measure people. You can get an exact read with very little measurement error. It is very hard to give 500,000 or a million people a high-quality IQ test or a high-quality personality measurement. It's incredibly hard to actually measure things well at scale, and you need scale to make these methods really give you a complete picture. Because of that, we're just nowhere near where we need to be for a lot of traits.
SPENCER: Is your opinion that indeed, lots of traits are very highly heritable? Many traits ranging in the 40% to 60% heritable, and that actually, the twin study methods are kind of the best estimates we have, and the DNA methods just haven't caught up yet?
CREMIEUX: Yeah, for the most part, I'd say that's correct. The other issue is that we don't really know how to handle assortative mating yet. Nobody has actually figured out how to handle this. It has different effects across multiple generations; it can pump up or down the genetic variance of a trait, and some things have changed considerably, like educational attainment. Its heritability has changed over time, and this presents a total mess. It's actually somewhat amazing that it's even been possible to find all these educational attainment-related hits in GWAS, just because of how terrible of a phenotype it is. We know that it's drifted. It correlates with different things differently across generations, sometimes linearly, sometimes non-linearly, sometimes with policy. It's very weird. It's really lucky that we've been able to do anything with it, and if we want to do better, we have to get better measures.
SPENCER: So on this assortative mating point, to illustrate it for the listener: imagine that people who are highly educated want to marry people who are highly educated. What that means is that, if there are genes that affect education, their children are more likely to inherit more of those genes, because the parents are going to be more correlated in those genes than you'd expect from random chance. This kind of creates a problem for these twin studies because it actually screws up the heritability estimates. Did I get that right?
CREMIEUX: Yeah, it definitely screws them up a lot. We still actually don't know how to work out what happens when assortative mating is increasing or decreasing across generations, when it's out of equilibrium, when it's not just at a constant level for a very long time. There are people actively working on this — Alex Young is working on this, and there are Icelandic and Danish groups working on it — but we don't know how to deal with it yet. One of the ridiculous things that happens with assortative mating is that it makes it so some twin study heritability estimates are above one when you've corrected for it properly, which is absurd. Alex Young has a really great paper illustrating this, and it just causes a bunch of headaches. It's really not clear what to do yet, and we won't know what to do until we've gathered a huge amount more data, which is very unfortunate. You can't just innovate your way out of this methodologically. You have to actually go out and sequence more people, measure more people's traits, and really we need a lot more family-based sampling of the biobanks. We need millions and millions of families to get sequenced together and measured together, or we're never going to figure this out.
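[Editor's note: to make the bias discussed above concrete, here is a toy sketch of Falconer's classic twin-study formula, h² = 2 × (r_MZ − r_DZ), using purely illustrative correlations, not figures from any real study. Assortative mating makes DZ twins more genetically alike than the random-mating assumption expects, so r_DZ rises and the naive estimate moves.]

```python
def falconer_h2(r_mz, r_dz):
    """Falconer's formula: heritability estimate = 2 * (r_MZ - r_DZ)."""
    return 2 * (r_mz - r_dz)

# Under random mating, DZ twins share roughly half the additive genetic
# variance, so r_DZ is expected to be about half of r_MZ:
baseline = falconer_h2(0.80, 0.40)

# Assortative mating pushes r_DZ up toward r_MZ (DZ twins become more
# genetically similar than expected), so the uncorrected estimate shrinks:
with_assortment = falconer_h2(0.80, 0.50)

print(round(baseline, 2), round(with_assortment, 2))
```

The point of the sketch is only that the formula's assumptions matter: feed it correlations generated under assortative mating and the number it returns no longer equals the estimand it was derived for, which is why the corrections Cremieux mentions can behave so strangely.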
SPENCER: What's the value of knowing how heritable traits are? Does it really matter, at the end of the day, whether something's heritable or not?
CREMIEUX: Scientifically, yeah, it does matter. If you were doing breeding programs, if you want to know the true breeding value, then you would use a method like Alex Young's RDR, relatedness disequilibrium regression.
SPENCER: You mean if you're having farms and you're breeding cattle for certain traits?
CREMIEUX: Yeah, absolutely. If you want to do embryo selection, or if you want to affect the genetic variants in some other way, then you also should know the heritability. It can help you to calibrate your expectations.
SPENCER: So this would be couples that have in vitro fertilization, and they want to choose the embryo that's most likely to be healthy, that kind of thing.
CREMIEUX: That's right. If we want to give them accurate predictions, we need to know the heritability. We need to know it correctly. And that's a pretty good reason.
SPENCER: Do you think it matters at a societal level? Whether, let's say, take personality traits like dark triad traits, such as manipulativeness, which are known to correlate with criminality. Let's say they turned out to be more heritable or less heritable. Does that really matter? Does it have societal implications, how heritable the traits actually are?
CREMIEUX: Sure. It can definitely have societal implications. If something that is going wrong in society or something that's going well in society is driven by the genes of the population, then that's something we should probably know. It can be hard to act on it. In fact, it can be immoral to act on it, as basically everybody agrees, because of the specter of eugenics in the room. But it should definitely be known. We should have the ability to adjudicate between theories properly. If we want to act based on what a given theory of criminality says, then we need to know what the causes are, and knowing what the causes are can give us insights into how to effect change in the traits. Here's a great example: schizophrenia. We have wonderful drugs for schizophrenia. We have drugs that lead to much reduced levels of criminal offending, that lead to longer lifespans, greater ability to hold a job. If we can get schizophrenics these medications early, we can achieve better effects; they'll be able to stay in the workforce for longer periods of their life and thus move up more. They'll be more likely to have a family and hold a family together, and they'll be less likely to abuse people or go homeless or whatever it may be. If we can predict this early on, great, we can allocate drugs better. We can also discover drugs better. If we know the heritability of some problematic trait is very high, we can target that specifically for the discovery of drugs and interventions, like giving people access to embryo selection if they're extremely high risk and they want to have a child. There are all sorts of ways we can use that information to make society better, and I just think schizophrenia is a pretty good example of that because of how badly it increases homicide rates, for example.
SPENCER: My understanding is that schizophrenia drugs are quite effective, although they make people feel kind of crap. But there might be some newer drugs in development that might have fewer side effects.
CREMIEUX: There are actually some wonderful drugs in the pipeline. Some of them are depot medications, meaning that you could give somebody a shot and it lasts for an extended period of time, like three months, six months, a year, maybe more, with minimal side effects.
SPENCER: I've often heard it disputed that schizophrenia is linked to criminality. Is there high-quality evidence on that topic?
CREMIEUX: Oh yes, absolutely. There is tons of high-quality evidence on this point, and some of the best evidence actually comes from Scandinavia, where you have population registers and free healthcare. People are getting diagnosed at very high rates. They're getting referred by the police, because the police are somewhat more progressive there. They are getting diagnosed in jail. They're getting put on medication, and the medication data is available to researchers. We can see that, for example, when one member of a sibling pair has schizophrenia and the other does not, the one with schizophrenia is at a vastly elevated risk of committing a violent crime.
SPENCER: Yeah, I could definitely understand why people would want to resist that, even if it's true, because you don't want people to be stigmatized, but on the other hand, it speaks to the importance of getting people the treatment they need, and also obviously not assuming that just because someone has schizophrenia they're going to commit a crime. Obviously, the vast majority of them are never going to commit a crime.
CREMIEUX: One of the interesting things is, I think that stigma kind of goes away when you have the ability to fix the issue, which is part of why resistance to fixing it is so bizarre to me. IVF is an example of this, as I mentioned. IVF was highly stigmatized for a little while, and now there's no real stigma attached to IVF unless you're in a very weird, very conservative circle, and that's just because it's normal now, it's fine. Nobody really judges anybody over IVF anymore. And if they do, it's quite rare. We can make a lot of diseases like that.
SPENCER: The topic I want to ask you about before we wrap up is IQ. People are really split on IQ. Some people see it as a very important metric to understand things in society. Other people see it as either meaningless, a kind of pseudoscientific measure, or just that whatever it's measuring is something very uninteresting. Do you think it's measuring something that's important?
CREMIEUX: Absolutely. It definitely measures something quite important. It measures a very general aptitude that is definitely not tied to specific test content. It is independent of the tests. You can infer it from a lot of things. There's a paper that used data like personality measures and psychopathology measures to train a machine learning model, and on the held-out test set, the model's predictions correlated at greater than 0.8 with measured IQ. That's pretty profound, that you can find that sort of thing without a lot of overfitting. It definitely measures something profound and important. It measures something that has implications for everyday life and impacts the structure of society. Society is heavily cognitively stratified, and you can think of this as unfortunate or fortunate. To some extent, it means that meritocracy is real. For example, when you see that within a family, the smarter sibling is more likely to move up than the less intelligent sibling, or the less intelligent sibling is more likely to move down, that is pretty good. That tells you that society is at least somewhat fair. It's rewarding people who have high levels of a trait that can allow them to go out and do things. If that weren't the case, I think we'd be a little worse off. Maybe that would mean that either society is unfair or it would mean that IQ doesn't really measure very much and it's not important. Because it does predict things within families, we know that it is important and we know that it does suggest that society is at least a bit fair. Over time, it seems to have become increasingly fair too. The heritability of many traits has increased as people's familial background has become less important for those traits. For example, educational attainment has increased in heritability in a lot of cohorts. It has increased in general over time, with the increase accelerating around when education really started to become much more widespread.
That suggests that where you come from now matters a lot less than what you have to offer.
SPENCER: It seems to me that IQ is predictive of a lot of things, but there are also a lot of other traits that are predictive, such as conscientiousness. For example, if someone is organized and they work hard towards their goals, they're much more likely to achieve them. Do you view IQ as just one metric among a bunch of metrics, all of which help predict someone's life path?
CREMIEUX: Certainly, but I consider it to be one of the most limiting metrics. Unfortunately, somebody who is not cognitively capable, let's say they're on the lower end of capability, with an 85 IQ in a population with a mean of 100, they're not going to excel in a field. They're not going to push the limits on scientific research in a complicated domain, even if they put in a lot of work. It's an unfortunate fact that you really do need to be cognitively capable to excel in many ways. You can get really far with a high IQ. I have a guest post incoming that should be published shortly, depending on how quickly the person writes (it's taken them a while so far), that looks into occupations where people have relatively low IQs and high compensation, or where they have relatively high IQs and low compensation. Some people want more money, so they pursue higher incomes at all levels of IQ. Some people don't care as much about money; they pursue less. People have different goals, and unfortunately, IQ seems to be a limiter on where those goals can be. You can do anything if you have a high IQ, but you can't do everything if you have a very low IQ. But I don't believe in hard cutoffs either.
SPENCER: So what you're saying is that for certain kinds of tasks, having a lower IQ makes it less likely you'll be able to be high achieving in that task.
CREMIEUX: Right. Yeah, unfortunate, but it gives us all the more impetus to boost society's IQ so everybody can do more of the things they really want to do.
SPENCER: And IQ has been rising over the decades?
CREMIEUX: This is contentious. IQ scores have been rising, not totally consistently, not totally reliably, sometimes non-linearly. And the gains don't seem to be due to actually rising intelligence. They seem to be due to people getting better at tests. As education has become much more universal, more and more people are familiar with test content. For example, a lot of old IQ test questions are now games that kids play. As that sort of thing has come about, the meanings of IQ tests have drifted, and the Flynn effect is primarily about the drift in the meaning of IQ tests, rather than an increase in the intelligence of the population. It doesn't actually represent an increase in capability. What has genuinely changed is something else: in the past, cognitive stratification was greater because it was education-gated, and as it turns out, with increasing levels of education, we've learned that society doesn't have to be so occupationally cognitively stratified. Doctors today are less intelligent than doctors in the past, but they're still able to do their jobs. They might not do them as well on average, but we still have more doctors. We are still able to produce doctors who can do what is generally required of them, even if there's been this dilution of talent within the area of being a doctor.
SPENCER: It seems to me a very important factor is that virtually anyone can improve at virtually anything with proper training. If you have someone with an incredibly high IQ who plays chess against someone with, let's say, an IQ of 100, but the person with IQ 100 has spent a thousand hours playing chess, and the person with the really high IQ is just starting, obviously, the person with the higher IQ is going to get obliterated. There's just no question, because you can get better no matter what you do. To me, this is a really wonderful fact about life: the human brain can improve at just about anything through practice, especially through good training methods. On the other hand, all else equal, if you just had to predict who's going to do better after a thousand hours of practice, the person with 100 IQ or the person with 140 IQ, you would probably have to predict the 140 IQ person. The 140 IQ person is going to be better after equal amounts of practice. In that sense, IQ is something like that: it's not preventing people from getting better, and it's not a guarantee of how good you're going to be, but maybe it's something about how quickly you can learn or how high you can rise. How would you think about that?
CREMIEUX: It's definitely related to those things. One of the wonderful examples that I give people to understand training and IQ and its relationship to skill is reaction time measures. When you start people off, you give them their first round of a reaction time test, their reactions are relatively slow compared to where they're going to end up, but they also asymptote after a certain level. After a bunch of training, they don't tend to get much faster. They've hit a limit. The limit for individuals with lower IQs is generally a considerably slower reaction time, a greater lag. But if you were to compare somebody with a relatively low IQ who's had 20 rounds of training against somebody with a relatively high IQ who has had no training, you'll see that the high-IQ person will start off doing worse, but they will eventually overtake the trained person.
SPENCER: Another aspect of this is that we seem to each have some unique capacities that are sort of independent of our IQ. We tend to be better at certain things than you'd expect from our IQ, and worse at other things. Just to give an example, I'm absolutely terrible at anything involving unscrambling words. If you gave me an IQ test that was all just word scramble problems, I would test way below what I would test if you gave me an IQ test based on other factors, let's say doing math or something like that. It seems to me there's a piece of aptitude that IQ is not capturing. It may be very idiosyncratic; it may depend very much on specific tasks, but there may be someone, for example, who learns much faster than you'd expect from their IQ because of some natural predisposition in their mind to being good at that sort of task. What do you think about that?
CREMIEUX: Yeah, absolutely. I think a lot of this actually has to do with interests. People are invested in different things in their daily life; some people will read more books, and so they're better at vocabulary tasks than you would expect from their more general IQ. They do better at certain things because they've been interested in different things in the past, and it's given them different experiences that build their specific abilities and skills rather than their general ability. I had the opportunity to once participate in a study of this at a conference. They had a table, and they had several other tables where people were taking tests and doing little problems, and the idea was to get better at arithmetic. Before people started, they did a little vocabulary test and some verbal tasks, and they were also asked how good they were at arithmetic and how much they felt they could improve, among other questions. It turned out that people didn't really think they could improve very much. They thought they were pretty bad at arithmetic, but what determined their ability to solve arithmetic problems very quickly after a few days of the conference was their initial IQ level more than anything else, more than their experiences or even their initial arithmetic scores. It was very enlightening to see that.
SPENCER: Final question for you, do you think there's any value in people knowing their own IQ scores?
CREMIEUX: No, not really. I consider the obsession with IQ scores, like knowing your own IQ, knowing other people's IQ, to be really boring. I often think of it as a sign that somebody is not really serious about understanding the subject. It's unfortunate because such interest should indicate that they're more serious, but it generally isn't. It's a self-aggrandizing thing. I saw this post recently where somebody was talking about how people with 120 IQs are all social and outgoing, and people with 135 IQs are crazy creeps, all sorts of weird. For one, how would you know this? You're probably not a 135 IQ; that's quite rare. And there's just no data to support these inferences. But people have this sort of quasi-scientific view of IQ. It's a distraction. It makes the rest of any research to do with IQ seem bad because it seems related to this nonsense. It's not that I'm just telling people, "No, you don't get to know"; it's that knowing isn't worth much.
SPENCER: Cremieux, thanks so much for coming on.
CREMIEUX: Thank you for having me.
[outro]
JOSH: A listener asks, "Do you think there are situations where forming an opinion about something becomes a moral obligation? In other words, when not having an opinion is harmful?"
SPENCER: Absolutely. I think if you have lots of evidence that something is really, really bad, and you refuse to form an opinion, and if having an opinion would actually get you to take action, like taking action against that thing, rejecting it, or keeping it away, then, yeah, I think it can become a moral issue. For example, let's say you have someone in your life who's constantly causing all kinds of harm to others. Maybe, in an extreme case, they are sexually assaulting other people, and you refuse to form an opinion on this. Well, that can be just a cop-out for you to continue feeling good about this person, not jeopardizing your relationship with them, not causing yourself anxiety while maintaining that relationship, and potentially kind of enabling this person, especially if you're inviting them to events where they might commit sexual assault. So I do think that sometimes our epistemics actually intersect with morality.