December 12, 2024
What have we learned about UBI from recent, large-scale studies? What factors contribute to differential attrition in (especially long-term) studies? How much does it cost to run large UBI studies? Where else in the world have major UBI studies been run? What's the difference between "guaranteed income" and UBI? How do people in cash transfer studies tend to spend their money? Should restrictions be placed on what people can spend their study money on? How long does it take to see various effects of UBI or guaranteed income on a large scale? How does guaranteed income affect the nature of work in recipients' lives? How does guaranteed income affect a person's net worth in the long run? What are the effects on well-being? How does topical knowledge affect prediction accuracy in a given area? How good are subject-matter experts at making predictions about the outcome or utility of a study? How can such predictions in aggregate be used to shape future research? To what extent should researchers express uncertainty when making proposals to policymakers? How much of an effect does the publishing of academic papers have on the world? What kind of person should try to build a career in academia? How can non-experts assess the rigor and significance of academic papers?
Eva Vivalt is an Assistant Professor in the Department of Economics at the University of Toronto. Dr. Vivalt's main research interests are in investigating stumbling blocks to evidence-based policy decisions, including methodological issues, how evidence is interpreted, and the use of forecasting. Dr. Vivalt is also a principal investigator on three guaranteed income RCTs and a co-founder of the Social Science Prediction Platform, a platform to coordinate the collection of forecasts of research results. Find out more about her on her website, evavivalt.com.
JOSH: Hello and welcome to Clearer Thinking with Spencer Greenberg, the podcast about ideas that matter. I'm Josh Castle, the producer of the podcast, and I'm so glad you joined us today. In this episode, Spencer speaks with Eva Vivalt about research on universal basic income (UBI), improving data collection on UBI studies, and Eva's work at the Global Priorities Institute.
SPENCER: Eva, welcome.
EVA: Thank you so much for having me.
SPENCER: Many of us have heard a lot about UBI, universal basic income, the idea that if you give people some money every month, that may be a way to improve people's lives, especially as technological progress causes more inequality, or AI makes it harder for people to have jobs. We might be able to solve that by giving people UBI. Now you've done one of the most amazing studies ever conducted on the topic. So do you want to tell us a little bit about that study?
EVA: Sure, thanks. So we conducted a randomized controlled trial of a cash transfer program that provided $1,000 a month unconditionally for three years to 1,000 people in the treatment group, with another 2,000 people receiving $50 a month serving as the control. We are considering the impacts of income on a wide range of outcomes through a combination of administrative records, enumerated surveys conducted in person or over the phone, online surveys, and even data collected through a custom mobile phone app. This was part of a collaboration with Elizabeth Rhodes, Alex Bartik, David Broockman, and Sarah Miller, and together, we have a bunch of papers, each studying different outcomes. My main focus so far out of the papers we've released has been on the employment outcomes.
SPENCER: Great. Can you tell us a little bit about why you chose that design? So you have some people that are randomized to get this larger amount of money, some randomized to get a very small amount of money.
EVA: Yeah. So we wanted to look at the effect of the transfers, the large transfers, a thousand dollars a month. But it was actually really important for us to give the control group some money as well, and this is because we want to minimize differential attrition. Now, differential attrition, that's where, suppose your treatment group is getting a large benefit and they want to continue participating. And your control group, say they're getting nothing, they would just say, "Oh, well, see you. I'm never responding to your surveys. I never want to see you again." What we did instead was we set things up such that, as people were being recruited to the study, they were actually recruited to a program in which they would receive $50 a month or more, and nobody knew that there was some potential they could receive a thousand dollars a month. Everybody, as they were getting enrolled, started receiving $50 a month. Then, after we did randomization, a few months later, the treatment group was informed, "Hey, by the way, you're going to get $1,000 a month from here on out for the next three years." The control group was similarly told, "Okay, for three more years, you'll get $50 a month." But the control group never realized that they were the control group. They were receiving something, and we did get amazing response rates, and people continued to be really involved in this. We had a 97% response rate at midline and a 96% at endline. Considering that endline was three years after the study began, that was actually quite a high rate; we're quite happy about that.
SPENCER: It's a phenomenal rate. I think people don't really get how important this issue of attrition is. It's very common in a long-term study to lose 20%, 30%, maybe even 50% of participants. If it's completely random who drops out, that may not be such a big deal. But if the people that drop out are not random, for example, if those getting the treatment are less likely to drop out, but those not getting the treatment may have a different group of them drop out, then you could really distort your results.
EVA: Absolutely. We did a few things to try to address this. We thought that people, if they were going to attrit, would attrit early on, and we had this really long baseline period so that at the time we did randomization, we could make sure we were balanced on the attrition up to that point. We could also pre-specify that if we later have some attrition, we can restrict attention to the group of people who responded regularly to the early surveys before randomization. There is a group of people who are responding to every survey regardless, and they've been doing this for a few months before the treatment assignment. We can be confident that those people are more likely to continue to respond later on as well. Since this was based on pre-randomization, pre-treatment characteristics, it wouldn't bias the estimates in any way. We did a lot to try to address this and add in robustness checks.
SPENCER: This might be saying the obvious to many people, but I think it's so important that it's worth talking about, which is the power of randomization. It's so easy to run a survey where you ask people about how much money they make, and you can find correlates of making more money correlated with happiness or life satisfaction, or all these different things. Many such studies are done, but it's a much harder study to run where you actually randomize people and give them money. It also tells you so much more because now you can actually look at the causal effect. Not only is this correlated with the outcome, but does the money actually cause a different outcome, which is almost impossible to get at in most study designs.
EVA: Absolutely. That was really critical to us, and also having a large enough sample size. You'll often see very small pilots, and we really wanted to be well powered for the outcomes we were considering.
SPENCER: You can imagine why people don't run these studies because they're so expensive. What's the total cost of this study?
EVA: Yeah, it was really large. The project overall cost over $60 million, including the transfers, incentive payments, control group payments, as well as the research itself.
SPENCER: I also just want to get into where this was conducted and how big a deal this amount of money was. Because you might say, "Okay, thousand dollars a month, that's a decent amount. But is that actually enough to make a difference in people's lives?"
EVA: Absolutely, great question. So this was in two areas in the US: in Illinois and Texas. Basically think of a two-hour radius around Chicago and Dallas. These areas were selected in part because if it's around a major city, then it's easy for enumerators to reach you. We also had a mix of urban, suburban, and rural areas, and the people who were targeted were relatively low income. This was based on different bins of the federal poverty level, which is a guideline whose thresholds depend on how many people you have in your household. But roughly, you can think of this as the average household earning a little less than $30,000 per year in the year before the study began in 2019. So yeah, these are relatively low-income people in those areas.
SPENCER: So if they're earning $30,000, this is actually a pretty substantial increase in their income; it's like a 40% increase.
EVA: Absolutely, yep, 40%.
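As a quick check of that figure, using the roughly $30,000 baseline household income mentioned above:

\[
\frac{\$1{,}000 \times 12 \text{ months}}{\$30{,}000 \text{ per year}} = \frac{\$12{,}000}{\$30{,}000} = 40\%.
\]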
SPENCER: So hopefully that's a big enough increase in percentage income to really make a difference in people's lives.
EVA: Yeah, hopefully. I think there are some cases where it is going to be hard even with a thousand dollars a month. Suppose you are a single parent of a severely disabled child. There are some really tough life circumstances where even a thousand dollars a month may not go that far. But yes, the hope was that this would be enough to make a real difference in people's lives.
SPENCER: Now, most of the studies in this area that I've ever heard of, maybe all of them, have been conducted in low-income areas, much lower income areas. In rural parts of Africa, for example. How do you think about your study in relation to those?
EVA: Yeah, so the US is a very different context, and a lot of people are more familiar with cash transfers in these other low or middle-income countries. There was actually a really nice meta-analysis of these other settings where they found really positive effects. In the US, you might expect the impacts to not be as large. First, it's really easy to improve things from a low base. For example, if really few people are graduating from high school, it would be easier to increase high school graduation rates than if almost everybody is already graduating from high school at baseline. There can also be some differences. I would make an analogy to microfinance programs. In microfinance programs, you lend people a little bit of money, and there was a while that everyone was really excited about these programs, but then they were found to not have these great effects that people initially thought they would. The reason was that you needed to have really specific conditions in order for them to be helpful. You needed things to be screwed up enough that people couldn't get access to finance, but not so screwed up that they couldn't do productive things with the money if they were to get it. I think it's at least possible that in lower-income contexts, a bit of money can go a lot farther, and people are really capital constrained. In our context, some people did have really deep problems, and it isn't clear that, depending on their circumstances, even a thousand dollars a month will necessarily completely change that. But again, that's just my intuition from some of the qualitative findings. There are really a lot of different kinds of problems people face. Maybe if you're relatively low income in the US, you face different kinds of issues than if you're low income in a lower-middle-income country.
SPENCER: Do you think that your study more directly tells us about these kinds of UBI programs that some people have proposed?
EVA: Yeah. So it's not UBI; it is a guaranteed income. I think that's actually good for the sake of policy relevance because it's probably more likely that something that looks more like a guaranteed income, like we did, would be an actual policy proposal.
SPENCER: Could you explain the distinction you're drawing there?
EVA: Yeah, so for a universal basic income — well, first of all, it has to be universal, so everybody has to get it — and ours was not universal. That means we're not going to be able to capture any kind of general equilibrium effects, like inflation, that you might expect with a larger, more comprehensive study.
SPENCER: If everyone got more income, that could cause more spending, for example, it should have societal effects?
EVA: Absolutely. So we're not going to capture that dimension. Also, when people talk about universal basic income, they are typically thinking of transfers that last forever. In our context, our transfers last for three years, which is actually a really long time to receive the transfers. But you could argue that, since it's not forever, maybe people would take different actions than they would if it lasted longer. For example, since they know the transfers are going to end in three years, maybe they start to plan, "Okay, what's something I could do in the next few years to set myself up well for finding a good job when the transfers end?" That's not going to be the same kind of consideration if it's a forever transfer.
SPENCER: It models the early phase of UBI, where they don't know for sure they're going to keep getting it in the next few years if the president changes, or that kind of thing.
EVA: That's true. That's possible. There's definitely a lot of policy changes with different administrations.
SPENCER: All right, so let's get into some of the results. First, let's talk about what did people actually do with the money?
EVA: So there are two ways of thinking about how people are spending their money. You can think people first make a decision about how much to spend today versus how much to spend in the future, and then out of how much they're spending today, what to spend it on. This is implicitly getting at the idea that you can save your money for the future, invest it in various ways for the future, or spend it today. If you're spending it today, you're spending it on things like consumption, such as food and housing, those big categories. You can also spend it, in a manner of speaking, on leisure, meaning you could work a little bit less, so you're effectively spending that money on leisure. We do observe people spending a lot of money on leisure and consumption, with very limited savings. This is maybe not super surprising, because savings in the US are not very high. We saw people spending a lot on consumption and a lot on leisure, essentially from the reduction of work hours that we observed.
SPENCER: Okay, consumption and leisure. Could you tell us a little bit more about what that means? Is consumption things like food, or what are we talking about there?
EVA: So actually, pretty much all the main categories that people typically spend money on, such as housing, food, and some other smaller categories. One thing that people are often concerned about is vice goods, such as spending on the lottery, smoking, or drinking. We didn't actually see any real increases there. Maybe something like $10 a month more, so maybe people were getting a beer a month more. But that's not really very much, and we didn't see any indication of problematic drinking or negative effects that some people might have expected. Broadly speaking, people increased whatever they were already buying by a fairly proportionate amount overall.
SPENCER: Now some people might say, "Okay, that seems like a good thing. Maybe people are going to live happier lives if they spend more on leisure." But other people might say, "Well, that's kind of disappointing. Wouldn't we want people to use the money in a way that's going to benefit them in the future, not just consume it, but make their life better in a more sustained way?"
EVA: Yeah, and I think there is a policy question here too, because a policymaker might genuinely care about how people are using their time. There's a sense in which you give somebody money, and how they choose to spend that money is an indication of what they truly value and what they get the most utility from. You can think, "Who would we be to be paternalistic and say how they should spend it? How they spend it is actually what they really want the most." On the other hand, the policymaker could care how people spend their money and how they spend their time because there are some uses of time that might be more productive from a societal standpoint, like going back to school or taking care of kids or elder care. So what that leisure actually consists of is really important.
SPENCER: Another thing you often hear about when it comes to giving people money is that they might use it to be entrepreneurial. So what did you find in that regard?
EVA: While we did see leisure increase overall, we saw some signs of more productive uses of the money. One of them was with regards to entrepreneurship. Now, for entrepreneurship, we had three different kinds of outcomes we were considering. We had entrepreneurial orientation: "How willing are you to take financial risks?" We had entrepreneurial intention: "Do you plan to start a business?" And then entrepreneurial activity: "Have you actually started a business?" We saw really large increases in willingness to take financial risks, entrepreneurial orientation, and entrepreneurial intention, but we didn't see any significant impacts on entrepreneurial activity. It's possible that the study just isn't long enough yet to see those effects, because they were seemingly trending upwards a bit over time; by year three, it was almost there. Overall, I would say that this was an area that was kind of promising. At the same time, you can't expect all that many people to become entrepreneurs. It's actually very hard to go and become an entrepreneur, and maybe not that many people want to do that.
SPENCER: So would you say we kind of don't know what the effects on entrepreneurship would ultimately be? We need to have an even longer study.
EVA: I think we can continue to follow up with these people for longer, and we do plan to do that. If you're going to start a business, then it's maybe going to take you a little bit of time to actually do it. So we're going to keep on following these people and see what happens.
SPENCER: In a few years, we'll see basically, does it turn into actual entrepreneurship, or was it really intention?
EVA: Yes.
SPENCER: But it is interesting that when people have more financial safety, they seem to be willing to take more financial risks.
EVA: Absolutely. Maybe I should say that the kinds of small businesses people were thinking of are not necessarily the next big app or something like that. It was more like people were buying a vending machine for their apartment block or a machine for screen printing T-shirts or something like that, yeah.
SPENCER: And kind of small businesses?
EVA: Yeah, but it could still help with job flexibility, etc.
SPENCER: Another interesting question is: do people work less when they have more money? Because you might expect, when you have more money, that there's less need to work or money from your job is a little bit less valuable to you.
EVA: Absolutely. So we did see moderate effects on labor supply, both on the intensive and extensive margins. That means fewer people were working at all, and those who were working were reducing their hours a little bit. Now, not everybody works a job that allows them to reduce their hours, but still, on net, the average hours worked per week declined, and they declined for other people in the household too, I should say, which was somewhat unexpected.
SPENCER: So you were giving the money to the individual, not the household.
EVA: Yes, the transfers were going to the individual, but we saw large spillover effects on the rest of the household too.
SPENCER: Yeah. I guess if you have shared finances, you might think, "Okay, well maybe it doesn't matter that much whether it goes to the individual or the household." So it sounds like that's evidence in favor of that idea. Do you view it as a bad sign that people work less? Because that's always one of the critiques of a program that gives people money, "If you give them money and then they work less, you're kind of undercutting your own goals in a certain sense."
EVA: I would say that how you observe people spending their time is indicative of what they value. It's not necessarily a bad thing. Imagine that AI does really take off, and a lot of people can't find work. Maybe that's fine if there are other things that they actually want to do instead. Also, towards the very end of the program, we did see some suggestive signs that people were recognizing the transfers were going to end, and they may have been starting to find work a little bit more again. So they were dynamic about it in a kind of rational way.
SPENCER: What about better jobs? Because you might think that with more money, people could be a little more flexible. Maybe they could take longer to find a job they like or quit a job they hate. So, they can then go find a new job that's better for them.
EVA: Absolutely. So this was one of the hypotheses we were really interested in. As you say, you might think that if people have more of a cushion, they can search longer for work, and they can find better-fitting jobs. On the other hand, there's actually a debate in the literature right now, because you could think that the longer people search, the more their skills degrade, and the harder time they might have finding a good job. To really know what happens to the quality of employment, you need really detailed data, and our study is uniquely positioned to look at that. What we found was really no effect overall on the quality of employment. We had so many different dimensions that we considered; we went really in depth, looking at not just the hourly wage but the adequacy of the employment. A lot of low-income jobs actually don't give you enough hours; people want to work more hours. We looked at the adequacy of employment. We looked at things like training and benefits that were available. We looked at opportunities for advancement and quality of work life on a day-to-day basis: "Does your manager treat you fairly?" All sorts of different kinds of things, dozens of different questions, and there was just nothing; it was flat across the board.
SPENCER: What about total net worth? You're giving people extra money; you might think that at the end of three years, their net worth would be higher than it would have been otherwise. What did you find there?
EVA: We actually found a decrease in net worth, although it's worth mentioning that this was very imprecisely estimated. It's still possible that people saved a few thousand dollars over the course of the study, but the point estimate was something like negative one or two thousand dollars. The transfers also weren't increasing earnings; actually, earnings fell net of the transfers. And they didn't increase the hourly wage; we can really precisely reject any real changes in the hourly wage. So earnings weren't rising, either through the quality of employment improving or otherwise, and overall people were working less. So income fell.
SPENCER: I see the hourly wage did not go up, but people worked a bit less. So overall, they earned a little bit less money, basically.
EVA: Yeah.
SPENCER: Got it. And what about debt? Were people taking on more debt?
EVA: Yeah, so we were looking at debt in the Experian Credit report data, and it did seem that people were taking on a little bit more debt, especially educational debt and car loans. So we'll see what happens now that the transfers have ended, whether they're still able to sustain their payments or what happens at this point. I don't have a good answer on that yet, a final answer on the overall effects. Will people be able to service it? But we're definitely following up to look into that.
SPENCER: I see, but the net worth change was close to zero. That means that the additional debt they took on was somehow compensated for by an asset for the most part?
EVA: Yeah.
SPENCER: So they've taken on somewhat more debt, but they have assets to compensate. So that's at least good.
[promo]
SPENCER: Another reason you might think it could be good to give people money, especially young people, is that it might allow them to get better educations. Did you find something in that regard?
EVA: So we did see some tentative signs that, for younger people, education did increase, in that more people went back to school and enrolled in a post-secondary degree. But this is a little bit tentative; it's suggestive because this is the result of a subgroup analysis focusing on the younger individuals. Oftentimes, for subgroup analyses, you say, "Of course, if you split things down finely enough, you'll find some little set of the overall sample that responds positively." But for these people, for education in particular, we actually pre-specified it. We wrote in our pre-analysis plan, prior to doing the experiment, that we would look at education specifically amongst younger people as a group of particular interest. So we're not specification searching or anything here. This is a group we identified before we had any data, because we think that education changes more for young people. It doesn't make sense to go back to school if you're fairly old, because you have fewer years left in your working life in which to recoup the costs through increased earnings. So you do think that younger people are the ones who could benefit from education, and we do see them going back to school a little bit more.
SPENCER: Yeah, regarding subgroup analysis with a giant study like this, $60 million, you of course want to squeeze every last drop of value you can, and you collected a huge range of outcomes, which makes perfect sense, but it does raise the question, "How do you avoid false positives?" While you have a reasonably large sample size with a thousand people in the intervention group and 2,000 in the control group, that's pretty large, but still, if you do enough analyses, you're likely to get some false positives. So how do you think about that?
EVA: So that's definitely a concern: there are multiple hypotheses we're testing, and we need to do something about that. There are two things that we did. First, we grouped the outcomes into indices. Think of a tree where you've got an index, a family, at the top, like, say, quality of employment; that's one family of outcomes. Within that family of outcomes, you have different components. There are many dimensions of quality of employment, like I mentioned earlier: adequacy of employment or your hourly wage. So you've got all these different components. Within each component, you might have a number of items that are trying to measure that thing. For example, for adequacy of employment: "Are you able to get enough hours in your main job?" Questions like that. We have this kind of tree-like structure. In addition to reporting the raw units of each of these items, we also report these indices in standard deviations, where we combine all the items into components and the components into the families, and then we do these false discovery rate adjustments, and that prioritizes the things that we really care about the most. The index-level values are adjusted against the fewest other values. As you go down, you adjust the component-level estimates according to how many components there are, plus the high-level family estimate; and when you go down to the item level, you adjust by the number of items, the number of components, and the family. It's kind of like a tree, and it just prioritizes the ones that we care about the most.
SPENCER: That's really interesting. So essentially, you're using the statistics in such a way that you're less likely to get a false positive for the things you care about most. As you go down to less and less important hypotheses, you get more false positives, but you also care less about those false positives. Is that right?
EVA: Yeah, that explains things pretty well. For example, we want to do some heterogeneity tests. Those are even sort of lower than the item level. Those are just subgroup analyses. So those are really deprioritized. You might have some false positives, but you don't want to not do subgroup analyses. You still want to know what happened there, at least. You want to run those tests, but you don't want running those tests to mess up all your other estimates.
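To make the tree-and-adjustment idea concrete, here is a minimal sketch of a Benjamini-Hochberg false discovery rate adjustment applied level by level to a toy family/component/item structure. The function, the tree, and the p-values below are illustrative assumptions for exposition, not the study's actual procedure, outcomes, or data (the real analysis builds standardized indices rather than taking a minimum).

```python
import numpy as np

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values) for one level of the tree."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)                        # sort p-values ascending
    ranked = p[order] * m / np.arange(1, m + 1)  # p_(i) * m / i
    # enforce monotonicity from the largest rank downwards, cap at 1
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1].clip(max=1.0)
    out = np.empty(m)
    out[order] = adjusted
    return out

# Hypothetical tree: one family ("quality of employment") with two components,
# each containing a few item-level p-values.
tree = {
    "quality_of_employment": {
        "adequacy_of_employment": [0.03, 0.20, 0.45],
        "hourly_wage": [0.60, 0.01],
    }
}

for family, components in tree.items():
    # adjust component-level p-values together (few tests, so a mild penalty)
    comp_names = list(components)
    comp_p = [min(components[c]) for c in comp_names]  # crude stand-in for a component index p-value
    for name, q in zip(comp_names, bh_adjust(comp_p)):
        print(f"component {family}/{name}: q = {q:.3f}")
    # item-level estimates compete with many more tests, so they are deprioritized
    all_items = [p for c in comp_names for p in components[c]]
    print("item-level q-values:", np.round(bh_adjust(all_items), 3))
```

The point of the structure is the one described above: estimates near the top of the tree are adjusted against only a handful of other tests, while item-level and subgroup estimates compete with many more tests and so are effectively deprioritized.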
SPENCER: I think when people learn about the replication crisis and become really concerned about it, they sometimes come to the view that, "Oh you should collect fewer outcomes." Because then you're going to avoid all these issues of false positives and adjusting or hypothesizing after the fact and so on. I think that's an understandable lesson, but it's completely the wrong lesson. The more outcomes you collect, the more you learn about the phenomenon; you just have to be very careful about how you use the outcomes to not come to false conclusions. It sounds like you put a lot of effort into making sure to do that. Another reason you might want to give people money is that you think it will make them happier, make their lives better. What kind of data did you collect on how happy people are, and what were the findings there?
EVA: We did care a lot about well-being, and some of these measures we're still writing up, and we'll have a paper out soon. One thing I wanted to flag was the overall pattern that we see. You can actually already see the overall pattern just by looking at the mental health measures. Essentially, in year one, everybody is doing great, everybody is pretty happy, and their mental health seems to be improving. Then in years two and three, it goes back down to how it was before, basically, and this would be really consistent with the literature on hedonic adaptation, that people sort of adapt to whatever their circumstances are. In our case, it's also possible that people, especially towards the end of the study, were maybe getting a little bit stressed out. The transfers are going to end soon. Overall, we saw this big effect in year one, not so much anything after that. It also points to the importance of doing a longer study because if we had just looked at year one, we would have concluded something very different.
SPENCER: That's really fascinating because there are those classic studies where they looked at lotto winners versus people that didn't win the lotto, and they found that the winners seemed a lot happier at first, but then their happiness fell back down. As you mentioned, this whole idea of a hedonic treadmill emerged, that maybe these things just temporarily change your happiness. It does sound like your evidence supports that idea.
EVA: Yeah, very much in line with that.
SPENCER: Did you find the same thing with mental health?
EVA: Yeah. This was very similar. So the same kind of patterns emerge across multiple measures here.
SPENCER: One thing people often complain about with the US is that healthcare can be super expensive, and so a very natural question is whether people's health improves when you give them more money.
EVA: As you say, this is an area where the US spends a lot of money; think of Medicaid. Improvements in health could help to offset the costs of such a program. We didn't actually see major changes in health overall. I had mentioned we did see very temporary changes in mental health and improvements in stress, but those sort of go back to nothing, and most of the health results are null results. Now, we did see maybe there were some kinds of care that increased. In particular, people went to the dentist more. But they also, surprisingly, had a bit more stress around their healthcare expenditures. Our interpretation of this is that maybe you are getting a little bit more healthcare, but you're also getting diagnosed with more things. That looks not necessarily very positive, but it may be because, say, you hadn't gone to the dentist for a long time, and you went to the dentist and they said, "Hey, you have a bunch of cavities," and now you're like, "Oh no, I have to treat them." So, it was a bit of a mixed bag.
SPENCER: Thinking about the study overall, what were you most surprised by in the results?
EVA: It's a good question. I was pretty surprised that the household effects were as strong as they were. We had pretty much a one-to-one decrease in your work hours and your partner's work hours. We might have thought, based on the literature, that your partner's work hours might adjust somewhat, but probably not by quite as much as your own. That was kind of interesting. One thing we did was collect forecasts ex ante from experts as to what they thought we would find. We were like, "We're doing this really big study. It's kind of once in a lifetime. We want to get as much out of it as we can. We want to know what is surprising." And you can only do that by asking people beforehand what they think is going to happen. Who did we ask? We asked people on the Social Science Prediction Platform, which is this forecasting platform that I set up with Stefano DellaVigna that enables researchers to collect forecasts of what their studies will find. We also asked NBER affiliates. This is the National Bureau of Economic Research, and these are senior economists, mostly at a variety of top schools. We asked them what they thought, and overall, they weren't too far off on the effects on labor supply. They thought we would find somewhat smaller effects than we found, but it's within the confidence intervals; we can't reject it. But then for these other outcomes, they were pretty wildly off. For example, they thought there would be some positive increases in the hourly wage. We saw imprecisely estimated but slightly negative impacts on the hourly wage at endline. We asked all of our questions about what people thought would happen at the endline. People thought participants would search for work less. Actually, they searched for work more, maybe because more of them were not employed. They thought people would be more likely to enroll in a post-secondary program at endline. At endline, we saw, across the whole sample, precisely zero effect. It was quite different. They also were quite different for health and all the other topics that we asked about.
SPENCER: I want to come back to forecasting more in a moment. But before I do, with regard to the question of UBI in society, "Is this a good thing? Is this going to help solve societal problems in a world where it's harder to get employment because of technology and things like that?" What do you think the major takeaways are from your study? What should we learn from your study that will inform the topic of UBI?
EVA: I think it's not going to solve all the problems that people thought. It's not saying don't do it, but just do it for the right reasons. If you're going to do it, maybe it does improve people's financial situation temporarily and gives them a little bit more flexibility in terms of how they spend their time. If you want to do it for those reasons, great. But people had also been arguing that it's going to improve the quality of employment, and no, it's probably not going to do that. It's still important to know what all these other effects on productivity are. One thing to note is that different people use the transfers in very different ways. There is quite a lot of heterogeneity here. The cash transfers could potentially have a larger impact on the individual things that specific people cared about, which we can't actually see when looking at the aggregate. Suppose some people use the money to get a car, and some people use the money to go back to school. Overall, we don't see that many people doing either of those two things. But if we ask people ex ante — and this is what we're doing in two other studies we're conducting right now in Chicago and Cook County — "What kinds of things do you value? What are your top priorities?" then we can say, "Look, you said that these are the things that are most important to you," and we can combine the estimates that we get later to see whether the things that are most important to you are the things that move, and whether things move more when we look at the things you said mattered. The money gives you more flexibility and more ability to make decisions. But the effects are going to be driven by the different things people care about, and the effect on any one particular thing, like if you're trying to improve healthcare or education, is probably going to be weaker than a more targeted transfer or targeted kind of intervention.
SPENCER: One reason that your research is important is because we want to see if the effects of these kinds of income transfers are similar or different to what you get in much lower income areas, like in rural Africa. When you compare what you found to these other studies, did it tend to be similar, or were there significant differences?
EVA: There were differences. I would say that it's much easier to get positive effects; you see much more positive effects looking at these lower-income contexts than you do in the US. The money also goes a lot farther, of course, in lower-income countries. So yeah, it is certainly somewhat disappointing compared to that.
SPENCER: So going back to the question of forecasting, you mentioned your platform where social scientists can make predictions about the results of different studies. What do you see as the value of that kind of research?
EVA: First, let me say a little bit more about what's special about the platform. We are really focused on collecting causal forecasts: what's the causal effect of various things? That's a little bit different from some of the other platforms out there that are more focused on what are called state forecasts, or sometimes conditional forecasts, but not quite causal forecasts. Let me explain the difference a little bit here. A state forecast is something more like, "Okay, what's the likelihood that Trump is going to be elected?" That's a question about a fact of the world that will resolve, and you'll see what state it's in. Then you can have conditional forecasts, like, "What's the likelihood that Trump is going to be elected if he goes on this podcast?" And then you have causal forecasts, like, "What's the causal effect of going on this podcast on Trump's chance of being elected?" You could think those are really different things. Going on this podcast is probably not going to have a huge effect causally, but if he were to come on this podcast, it would indicate some kind of state of the world in which things are a little bit different. So you have all these different kinds of forecasts. The Social Science Prediction Platform is really focused on causal forecasts, and we're hoping that can help to improve the accuracy of forecasts over time, as people learn to better calibrate their own forecasts. And if we have this database of forecasts of RCTs, maybe we can start to say something about situations in which we don't have RCTs, because we don't have all the evidence we would want; we never have exactly the RCT in the exact setting with all the details that we want. That's a long-term dream. Forecasts can be biased and have all sorts of problems, but if there's some kind of signal there, then hopefully, over time, we can learn to extract it a bit better, and it can at least support policy decisions, even if it's only one piece of the puzzle.
SPENCER: I imagine it probably varies a lot how accurate people are depending on the kind of causal forecast. But what can you say in general about the accuracy of experts at making these kinds of predictions?
EVA: Yeah, that's a good question. The jury is still out overall. There have been a couple of attempts that are still in the early stages. I don't think anything has been formally put out yet with regards to causal forecasts. It seems, overall — my own bet, at least, for how this will shake out — that people tend to expect there to be more effects of things than actually occur in practice. Within the guaranteed income team, we would always joke amongst ourselves about being on Team Nothing Works. I think that does tend to be true. People tend to be optimistic.
SPENCER: Fascinating, yeah. Because I think a lot of times when you think about an intervention, you can see immediately a story as to why the intervention would work, but it's sort of harder to see why that wouldn't work. What's going to be the barrier to that?
EVA: Absolutely. There could be a lot of things that need to go right in order for it to work.
SPENCER: Right. There's this causal chain that you're kind of implying, that A has to happen, and that leads to B, and that leads to C, and that leads to D. But then it's like, "Well, you have no idea where this is going to break down. Do I find out that for some reason the link between C and D breaks down, and then the thing just has no effect?" We put out a little program working with 80,000 Hours where people predict the results of charitable interventions, and it's really hard to predict. Until someone does a large, well-controlled trial, you really are just guessing whether it's going to work. I also wonder about experts. You might think, "Well, they're experts in these topics, so they should be able to predict these things." But expertise in the subject matter is different from expertise in forecasting. You have, for example, the Good Judgment Project that has these expert forecasters who have been trained in forecasting and have practiced it a lot. It's maybe a unique skill that's distinct from subject matter expertise.
EVA: Absolutely. There's actually this paper by Stefano DellaVigna and Devin Pope looking at who knows what. They look at whether people who are more senior (think of tenured faculty versus grad students) are any better at providing forecasts than the grad students. And their answer is no; it doesn't actually seem to matter. So there are lots of different kinds of expertise, as you say, but at least that kind of expertise didn't seem to matter at all.
SPENCER: We ran a study where we had academic psychologists make predictions about the correlations between things. So let's say you have a statement like "I am organized," and you have another statement, like "I worry a lot." We had them predict the correlation between agreeing to those two statements. We had empirical data, so we knew what the answer was. We also tested laypeople, and we built our own AI called Personality Map to make these predictions. We trained it on over a million real correlations, and what we found is that academic psychologists were pretty good at making the correlation predictions. They very much beat lay people at making the predictions, but they got trounced by our AI Personality Map. I think it beat 99% of the experts at making these predictions, and it got me thinking about the fact that the experts were often able to say what direction the correlation would go — would it be positive, about zero, or negative — but they weren't necessarily very good at identifying the strength of the correlation. That's really why the AI did so much better than them.
EVA: That's interesting. Actually, that's pretty much in line with one study that I worked on as part of a large collaboration. We were looking at whether people could build and aggregate models of COVID mortality. We had this model challenge where we asked people to submit models of what they thought would predict COVID mortality, both across countries and within a few states. After we gathered these models, we then asked people to predict which of the models would perform well. Very similar to what you're saying, a very simple automated approach actually did better than the vast majority of people, both at building models and at putting weight on them. First, just the lasso would do better than most people in building models. Second, in terms of aggregating and putting weight on how models might perform, unfortunately, people didn't do so well either.
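For anyone unfamiliar with the lasso, here is a minimal sketch of the kind of simple penalized regression being referred to, fit on made-up data. The predictors, outcome, and coefficients below are purely hypothetical; they are not the inputs or results of the COVID model challenge.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)

# Hypothetical predictors (e.g., country-level covariates) and a made-up outcome
# in which only the first two predictors actually matter.
X = rng.normal(size=(200, 20))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

# LassoCV picks the penalty strength by cross-validation; the L1 penalty
# shrinks most coefficients to exactly zero, yielding a sparse, simple model.
model = LassoCV(cv=5).fit(X, y)
print("nonzero coefficients:", np.flatnonzero(model.coef_))
print("selected penalty alpha:", round(model.alpha_, 4))
```

The L1 penalty zeroes out most coefficients, which is roughly why such a simple, sparse model can be hard for hand-built models to beat.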
SPENCER: Yeah, this is kind of a famous finding that goes back. I think it was Paul Meehl's work, if I'm not mistaken, looking at simple forecasting algorithms versus humans. In our case, with Personality Map, we actually had to put a huge amount of work into getting the model to be really good and training it on a huge data set. But for many of these forecasting tournaments where humans would get pitted against machines, really simple algorithms — just the most simple, even just take 20 factors and you get a point for each factor, or do a simple linear regression — would beat humans. I think often it's because the algorithm can be more calibrated. You could train the algorithm so that at least when it says there's an 80% chance, then it really is an 80% chance. Humans really struggle with that a lot. Do you think that's part of what's going on here?
EVA: I think that is a good description of it.
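As an illustration of what "calibrated" means here, this is a minimal sketch of a reliability check: bin a set of probability forecasts and compare each bin's average forecast with how often the event actually happened. The forecasts and outcomes are simulated purely for illustration; real human forecasts typically would not line up this neatly.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated probability forecasts and outcomes drawn so the forecasts are
# roughly calibrated; the point is the check, not the data.
forecasts = rng.uniform(0, 1, size=5000)
outcomes = rng.uniform(0, 1, size=5000) < forecasts

bins = np.linspace(0, 1, 6)  # five bins: 0-0.2, 0.2-0.4, ...
which = np.digitize(forecasts, bins[1:-1])
for b in range(5):
    mask = which == b
    print(f"forecast ~{forecasts[mask].mean():.2f} -> observed {outcomes[mask].mean():.2f}")
```

A calibrated forecaster's bins fall close to the diagonal: when the average stated probability in a bin is 80%, the event happens about 80% of the time.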
SPENCER: So let's take these forecasting studies that you're running. Do you think that the accuracy is high enough for the human experts that we could actually use that to make decisions and say, "Oh, look, we're going to have people forecast before we decide what study to run or to help us inform the study prior to running it?"
EVA: I think that while the accuracy is off, it could still have uses in figuring out your study design. For example, suppose you are a Nudge Unit within a government, and you have 10 different interventions you could run. You might really want to know, "Okay, what do people think the effects of these things are?" so that you can try to focus on whichever intervention has the highest value of information. That could actually be really important because the people you're trying to convince at the end of the day, especially if you get their priors, could help you figure out what's going to be most informative to them. You really want to maximize the value of your study. Sometimes people say, "Well, hey, I don't have 10 interventions. I just have one intervention. I know the intervention I'm going to run. What's the use of this for me?" I would say, "Well, you probably have many outcomes that you could study, and maybe some of those outcomes have a higher value of information than others." And you don't necessarily know that before asking people what they think.
SPENCER: One thing that I suspect is true is that social scientists, whether they're psychologists or economists, are often good at telling what's going to happen when there's one dominant, clear force. For example, if there are two people who like each other, what does that entail? I think psychologists do a pretty good job of predicting some of the things that entails. In economics, if the supply of a good has gone up, what do you think is going to happen? But I think in real-world situations, you often have forces going in opposite directions. Our theories are often very bad at telling us what the net effect will be or which force will dominate when you have multiple forces going in opposite directions. What do you think about that?
EVA: That's entirely possible. It's worth noting that how much confidence you need in order to make your decision is going to depend a bit on the whole structure of the decision problem too. How informative this is will also vary: there are some things you really want to be completely confident about, and then, okay, just do the RCT. But there may be other things where it's still better than nothing to try to get some kind of sense of this. For example, maybe you already have an RCT, so you can try to really calibrate people very well on the results of the RCT. Then you say, "Well, look, I want to know about this particular group that is just outside of where the RCT was done. What do you think the effects are going to be for them?" You might be able to stretch your RCT a little bit further, in a way. Again, that's a little bit speculative, and I don't want to oversell forecasts by any means, but I do think it's important to study because, if nothing else, we are all going around with forecasts and priors in our heads all the time; it's just that forecasting makes them explicit.
[promo]
SPENCER: Changing topics a little bit. If you think about this giant study you're running on giving people unconditional cash, or maybe even this idea of having experts make predictions about what should be studied, at the end of the day, a lot of the theory of change, as I understand it, is that policymakers might take the information into account. I'm curious to know, what do we know about what policymakers actually do? Do they actually take evidence into account?
EVA: There has been a burgeoning area of research on this. I think a lot of people don't, but some people do. It has a non-zero value, and it has been shown in multiple studies, including some of my own work, that it does affect real-life decisions. If you show somebody the results of an impact evaluation and you ask them to make some kind of allocation of resources, they are going to take the results of that impact evaluation into account. Does everybody do that? No.
SPENCER: And what do you attribute that to? Is it just that some politicians are more evidence-minded? Is it more part of their worldview? Or is there some incentive at play as to why some of them take into account some evidence and others do not?
EVA: Yeah. So actually, what I meant by that was more that the studies I've seen that have focused on this have tended to look at samples where you might think people care more about evidence; they're kind of like the best-case scenario. For example, in some of my own work with Aidan Coville at the World Bank, we looked at bureaucrats from various lower-income and middle-income countries, various government officials in certain line agencies. These are people who came to these impact evaluation workshops where, over the course of a week, they would be matched with a researcher to help them design some kind of study relating to the program they had of interest. These are people who are particularly selected for caring about evidence; there are going to be other people who didn't even show up. Okay, kind of the best-case scenario. Similarly, in some other work that's out there, it tends to focus on audiences of people who are generally interested in impact evaluation or evidence of some kind.
SPENCER: I might imagine that, if you're thinking about politicians' incentives, they often have a clear incentive to get reelected, to be liked, or to be written about in the news. Sometimes evidence can help them do a better job at achieving their goals; they want a certain outcome, and then the evidence tells them how to get that outcome. Great. Other times, though, the evidence might actually go against their incentives, where they want to implement a certain policy because they think it will improve their likability or electability, but the evidence might suggest that it actually doesn't work. Should they really care about that from their own point of view if they're just optimizing for their own incentives? I'm wondering, do you think that sometimes they have a strategic reason to not want to look at the evidence, and that kind of undermines the use of evidence?
EVA: There could certainly be selective search or confirmation bias, as well as biases in interpreting evidence. I haven't worked on anything like that or seen something recently relating to that for policymakers specifically. But I have looked at a number of other behavioral biases that policymakers might have. On the one hand, you might think, "Okay, policymakers are human, and we know humans have behavioral biases. Policymakers probably also have behavioral biases." But they could also be special in some way. There's a literature on CEO biases, where they can be very overconfident, and so maybe policymakers are selected in a certain way that leads to them having certain biases. In some work with Coville, we looked at asymmetric optimism, which is kind of like a good news-bad news effect. For example, if I ask you, "What do you think the effect of a cash transfer program is on enrollment rates in a low-income country?" And you say, "I think it's going to improve enrollment rates by three percentage points," and I say, "Great, okay, well, here I randomized the evidence I show you," I say, "Either it's five or it's one." If I say it's five, you're like, "Great, it's five." If I say it's one, you're like, "Hmm, maybe it's two." You're actually gravitating towards the more positive results relative to your priors. That's a little bit different from confirmation bias because, in this example, we are looking at amounts that are equally above or below your prior beliefs. We're not looking at things that accord versus don't accord with your prior beliefs. Still, we do see some evidence of asymmetric optimism. We also look at variance neglect, which is basically whether people are paying attention to confidence intervals in the way that a Bayesian would.
SPENCER: The asymmetric optimism question, why would people be asymmetric like that? Is there a theory?
EVA: Yeah, this actually stemmed from thinking about what we might observe. I used to work at the World Bank in Washington, DC, and had lots of interactions with a great variety of people, both on the operational side of the World Bank, as well as with low-income country government agencies. I remember a conversation with somebody who is a former MP who told me, "Look, policymakers are constantly having to sell their work, so if you have any good news, they're going to be really excited." I also remember when I was making graphs for reports, people would say, "Don't put confidence intervals on your graphs. People won't understand them. They'll just clutter it up." These are biases that stood out to me as potentially important. But that's not to say that you can't study other things as well, but there does seem to be this sense in which people are really excited when things go well.
SPENCER: One thing I wonder about with regard to epistemics. When you believe things, there are really multiple reasons you have to believe a thing. One is because it accords with reality. Having beliefs about reality helps you achieve your goals in all kinds of ways, or maybe you just really value believing the truth. But another reason to believe a thing is because it makes you feel good. People seem to form beliefs because it actually feels good to have that belief. A third reason is that there can be actual instrumental reasons to believe a thing because it helps you achieve your goals. For example, having a belief that accords with your friends and social group helps you bond with them, even though it might not be true, but it might help with the bonding. Do you think that's kind of what's going on here? They're getting some kind of benefit, either feeling good or some kind of social benefit from the belief, even if it's not in accordance with the truth?
EVA: Yeah. I could imagine people thinking, "Hey, this is great. This is a program maybe that I could use," and so definitely, there could be some kind of positive feeling towards it.
SPENCER: On confidence interval neglect, I've heard the same thing from an economist I know who used to work in the White House. He told me that many years ago, they would take the confidence intervals off of his charts before they showed them to the politicians, which annoyed him. The pushback he would get is that they don't want to know, they don't want to see that. They just want to be told what the answer is. They don't want to deal with the nuance of the uncertainty of it all. It seems like there are multiple incentives going on that are very wacky in terms of people not wanting to deal with that nuance.
EVA: Absolutely. Regularly, you will not have confidence intervals on any kind of policy report. I had an RA go through the World Development reports. These are flagship publications that the World Bank puts out every year. They went through them from 2010 to 2016 and counted how many times some kind of evidence was accompanied by either a p-value, confidence intervals, standard errors, or anything that says something about the uncertainty inherent in the result. Across the time periods they were looking at, there were thousands of studies cited, but only eight times was there any information about precision.
SPENCER: That's terrible. One thing I think a lot of people don't realize is that almost any time you're quoted an average, there's uncertainty in that estimate. It's a really healthy habit, when you see an average, to think to yourself, "Plus or minus what?" For example, "2% of people believe this." Okay, 2% plus or minus what? If it's 2% plus or minus 2%, that's a big difference versus 2% plus or minus 0.1%. People just aren't used to thinking that way; they think, "Oh, a scientist computed this number. It must be correct."
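[As a concrete illustration of the "plus or minus what?" habit, here is a minimal Python sketch, not from the episode: it computes a rough normal-approximation 95% confidence interval for a reported proportion at two sample sizes. The 2% figure and the sample sizes are purely illustrative.]

```python
# Minimal sketch: the "plus or minus what?" habit for a reported proportion.
# Normal-approximation 95% CI for "2% of people believe this" at two sample sizes.
# Note: the normal approximation is crude for rare proportions and small n
# (the lower bound can even dip below zero), but it shows how the margin shrinks with n.
import math

def proportion_ci(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation confidence interval for a proportion."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of the estimate
    return p_hat - z * se, p_hat + z * se

for n in (100, 10_000):
    lo, hi = proportion_ci(0.02, n)
    print(f"n={n:>6}: 2% -> 95% CI [{lo:.3%}, {hi:.3%}]")
```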
EVA: Yeah, absolutely. People are also paying attention to different kinds of evidence. One thing that struck me from having encountered this in the past is that people tend to put a lot of weight on what other people say, particularly local expertise. In one study, we conducted discrete choice experiments. We asked people to pick repeatedly between two programs, Program A and Program B, which had a bunch of attributes associated with them. For example, Program A may have been evaluated by an RCT, whereas Program B was evaluated by an observational study. Program A had an impact evaluation done in the same country, whereas Program B had an impact evaluation done in another region, but maybe Program B was recommended by a local expert and Program A wasn't. Which of these two programs do you pick? They also had different results associated with them, and different confidence intervals on those results. We could see what characteristics policymakers and policy practitioners were really valuing. It turned out they put a lot of weight on either local evidence, meaning evidence from a study done in your country, or evidence from a local expert. This local dimension really seemed to matter a lot.
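[For readers who want to see the mechanics, here is a hedged sketch of a forced-choice experiment of the kind described. The attributes, the "true" weights, and the sample size are all made up for illustration; this is not the actual design or analysis from the study.]

```python
# Illustrative sketch (not the actual study design): a forced-choice experiment where
# respondents pick between two programs whose attributes are randomized, and a
# logistic regression on attribute *differences* estimates the weight on each attribute.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_choices = 5_000

attrs = ["is_rct", "same_country", "local_expert_rec", "effect_size"]

def draw(n):
    """Randomly draw attribute bundles for a program."""
    return np.column_stack([
        rng.integers(0, 2, n),   # evaluated by an RCT?
        rng.integers(0, 2, n),   # evidence from the same country?
        rng.integers(0, 2, n),   # recommended by a local expert?
        rng.uniform(0, 5, n),    # estimated effect (e.g., percentage points)
    ])

A, B = draw(n_choices), draw(n_choices)

# Hypothetical "true" preference weights used only to simulate choices.
true_w = np.array([0.5, 1.2, 1.0, 0.4])
utility_diff = (A - B) @ true_w + rng.logistic(size=n_choices)
chose_A = (utility_diff > 0).astype(int)

# Recover the weights from the observed choices.
model = LogisticRegression().fit(A - B, chose_A)
for name, coef in zip(attrs, model.coef_[0]):
    print(f"{name:>18}: {coef:+.2f}")
```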
SPENCER: If you think that that's rational at all, could it be that the local evidence is more relevant because the data from other places wouldn't generalize? Or do you think that that's kind of a bias we have where we trust local information more?
EVA: Yeah, that's a great question. So it could be entirely rational. We can't say that it's not. It could be that people are thinking, "Oh yeah, evidence doesn't really generalize." Interestingly, they were willing to trade off basically the entire effect that was estimated in order to have local evidence.
SPENCER: Maybe a little too much.
EVA: Yeah.
SPENCER: Those who are thinking about impact might wonder, "Does writing academic papers really cause impact?" I remember I had this funny experience talking to an academic who worked on a topic that they felt was impactful, and they were saying to me that impact is really a core driver of why they do it. I was like, "Oh, that's really interesting. What's your theory of change? How do you create impact?" They were like, "Oh, I publish papers in journals." I was waiting for them to say something else, but they didn't. That was literally the complete theory of change. I was like, "Okay, but who uses that? Who's reading that? What kind of impact is it having?" So how do you think about it: okay, you write papers, but is there more to it in order to actually drive impact from that work?
EVA: Okay, that's pretty funny. So, really good academic work can have a big impact, I think. First of all, there can be positive spillovers to other kinds of work going on. You can develop methods and models that people use outside of academia. Sometimes you might be lucky and a particular concept opens up entirely new fields of inquiry and inspires a lot of other great research. I think there is a path through publishing and then that research getting in front of a policymaker somehow, but it is generally long and circuitous. I definitely think that if you want to have a quick impact, don't go into academia. Academia moves on slow time scales, proceeding step by step, incrementally building up an evidence base and building up concepts that enter our day-to-day lives. There are lots of concepts that came from academia initially, including some around behavioral biases, say, that we now use regularly. It's a little bit hit and miss. It's the kind of profession where a few things do very, very well and a lot of things don't. That's just the name of the game.
SPENCER: There's no question that some academic papers, just the mere fact of publishing them, have a tremendous impact on the world, like scientific discoveries and so on. What I'm wondering is, do you think that academics who want to have an impact should work more to do things like market their work or build relationships with people that might use their work to increase the probability that someone actually uses that impactful work and it doesn't just languish in an academic journal and get 10 citations?
EVA: Yeah, absolutely. One of the things that people can do, especially if they have the opportunity to work with a policymaker early on, is to design their studies to give the kind of information that people need to know. For example, for these cash transfer studies that we're currently doing in Chicago and Cook County, we had some meetings with the city and the county to try to learn about their interests and what kinds of considerations would move the needle for them, what would really determine whether this was something that was going to lead to a renewal of the program or not. To design the study appropriately, you need that kind of early buy-in and early discussion, so that the study actually gives you the answers people care about. Otherwise, I think there's a big risk that you'll just go off and design something and nobody's going to care at the end of the day, because it wasn't answering the question that they cared about.
SPENCER: That's really interesting. So you kind of want to have a sense of the people that might use this, or the stakeholders, what are the exact questions they're interested in before you run the research, so that it actually addresses those real questions.
EVA: Yeah, exactly.
SPENCER: Who do you think should go into academia? Let's suppose someone's interested in having an impact and considering a career in academia. What makes someone a good fit versus not a good fit?
EVA: Yeah, so I think you ideally have to be smart, but also persistent and able to work well on your own, because academia is an area where you're a little bit like an entrepreneur. Nobody's going to breathe down your neck or closely monitor you. You're kind of independent, and you have to be able to work well on your own. When I say smart, that includes being able to generate ideas, not just take tests. Those are actually somewhat different skills, and people can be very surprised when it comes to actually doing original research, which is a little bit different from just taking a test. Ideally, you should also be a little bit entrepreneurial. That matters especially if you're working on applied topics and maybe less for more theoretical topics, but in applied work it certainly helps to be entrepreneurial.
SPENCER: Final question for you, before we wrap up. When people are reading academic papers, let's say they're not a scientist, not a specialist in that exact area, but they're reading papers in that area. What do you think they should keep in mind? Or what do you think they should look for that might not be on their radar but might be important for helping them interpret the paper or thinking about how reliable or robust it is?
EVA: That's a good question. I think they should definitely bear in mind considerations around the internal validity of the study: think of response rates and attrition, and who is attriting. I also think about things like, was the study powered to see an effect in the first place?
SPENCER: So were there enough participants in the study to actually measure the effect that they tried to measure, is that what you were saying?
EVA: Yeah, exactly. It's like, "Is their sample size too small, or is it actually reasonable for the question they're answering?" It's always, of course, a lot better if you have multiple studies on the same thing; that gives you a lot more comfort in trusting the results. Don't trust the results of any one study too much. On the other hand, and this is something that surprised me because it's a little bit counterintuitive, one time I actually did a value-of-information kind of calculation just for myself, because I've always said, "We should do replications all the time and never trust the results of any one study." But if you're thinking about it from a value-of-information perspective, under reasonable assumptions about what the true inter-study heterogeneity in effects actually is, the drop-off is pretty steep. It all depends on the decision problem: if it's really important for you to get it really confidently right, then sure, do lots of replications. But from a value-of-information perspective, maybe it's actually better to study multiple things that you're interested in and do more replications only of the ones where you really need that extra confidence.
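[As a rough illustration of the "were they powered?" question, here is a small Python sketch using statsmodels to back out the sample size per arm needed to detect a given standardized effect with 80% power in a two-group comparison. The effect sizes are illustrative and not tied to any particular study.]

```python
# Rough power check: how many participants per arm does it take to detect a given
# standardized effect size (Cohen's d) with 80% power at alpha = 0.05?
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for d in (0.1, 0.2, 0.5):  # small-to-medium standardized effects, chosen for illustration
    n_per_arm = analysis.solve_power(effect_size=d, alpha=0.05, power=0.8)
    print(f"d = {d}: ~{n_per_arm:.0f} participants per arm")
```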
SPENCER: Eva, thanks so much for coming on.
EVA: Thanks so much for having me here.
[outro]
JOSH: A listener asks, "Can you explain the personality trait called need for cognitive closure?"
SPENCER: The way that I think about it is that it's a trait that says, "Are you okay with ambiguity or uncertainty in an answer, or there not being a right answer, or do you have a kind of need for cognitive closure where you need there to be an answer, you need to know what the truth is on the matter, or you need to have everything settled?" I think it's an interesting trait because the reality is, if we really have accurate views of the world, a great deal of things are uncertain. There's so much uncertainty about almost everything in life, about our future, about what's healthy or unhealthy, about the best way to accomplish our goals, and so on. If you really desperately need there to be an answer, a clear answer, a lack of ambiguity, it can be a struggle, and it can force you into a false or artificial answer just to kind of have that cognitive closure. I think especially given the complexity of the world, it's important to be able to embrace ambiguity and not need an answer all the time.