July 5, 2021
Why has there been such an explosion of progress in genomics recently? What's the right way to think about how genes affect the likelihood of experiencing certain health outcomes? How can people mitigate genetic risks for their potential children? What sorts of moral obligations (if any) do parents have to mitigate potential genetic risks for their children? How does Orchid's focus differ from other companies in the same space? What is "junk" DNA? What percentage of our genes are identical to our siblings, to other humans, and even to other animals?
Noor Siddiqui is the Founder and CEO of Orchid, a reproductive technology company. Prior to Orchid, Noor was an AI researcher at Stanford where she worked on applications of deep learning to genomics with Anshul Kundaje and computer vision applied to medical imaging with Sebastian Thrun. Noor has spoken internationally about her work at the intersection of technology and medicine at events like Milken's Global Conference, WebSummit, and Kaiser Permanente's Executive Leadership Summit. Her work has been covered by The Washington Post, Forbes, and TechCrunch, among other outlets. Noor is also a recipient of the Thiel Fellowship, a grant program spawned by PayPal founder and Facebook board member Peter Thiel, supporting breakthrough technology companies. Noor earned her M.S. and B.S. in Computer Science from Stanford University. Follow her on Twitter, connect with her on LinkedIn, visit her website, or email her at firstname.lastname@example.org.
JOSH: Hello, and welcome to Clearer Thinking with Spencer Greenberg, the podcast about ideas that matter. I'm Josh Castle, the producer of the podcast, and I'm so glad you've joined us today. In this episode, Spencer speaks with Noor Siddiqui about growth in genetics research, the measurement of high-complexity genetic factors, and trends in drug development, regulation, and testing.
SPENCER: Noor, welcome. It's great to have you on.
NOOR: Yeah, it's great to be here, Spencer. Thanks for having me.
SPENCER: So, can you tell us about the revolution that's happening right now in genetics and why is now a special period for this work?
NOOR: Yeah, so I think it might help to tell a little bit of an analogous story of sort of the rise of AI, and why we see this massive inflection point in 2012 that we didn't see in the 1980s or 1970s when these algorithms for backprop and gradient descent already existed. Why was it that a lot of this progress in AI took off in 2012 rather than in the 1990s or 1980s? I think that there's actually a pretty parallel structure between the revolution in AI and the revolution in genomics. So for AI, I think that the main ingredient that led to that inflection point in 2012 was the drastically dropping cost of compute and storage. So I think in 1985, it was about $100 per million transistors in terms of the cost of compute; compared to that, in 2012 we're looking at like five cents per million transistors. And if you look at the cost of storage in 1995, it was around $600 per gigabyte compared to 10 cents in 2012. So we're just talking about a really meteoric drop in the cost of compute and storage. And what that led to was the aggregation of data and the scaling of the ability to label and structure that data. And similarly, obviously, GPUs came on the scene, which allowed us to massively accelerate the types of computations that you need to do for training a lot of these models. So I think those are the sort of ingredients that led to that inflection point in 2012 for AI. And I think that for genomics, it sort of is a similar, slow set of milestones that led to where we are today. So I guess, backing up to the beginning, 1989 is when the US made this $3 billion investment in actually sequencing the first human genome. So our knowledge of genomics back then was comically bad. I mean, the best scientists in the world thought we had 100,000 genes in our genome, and the actual answer was 20,000. So it's just kind of humbling, I think, to look back that not very long ago we literally didn't even know how many genes were in our genome.
So basically, there was a huge amount of excitement over this 15-year period to actually assemble that first sequence. And in 2000, Bill Clinton came famously on the record saying, "The human genome will revolutionize the diagnosis, prevention, and treatment of most, if not all, human diseases." And I think that we had this huge investment in dollars and scientific energy around this work. And the result was very, very high expectations of what the actual output of that would be. And fast forward 10 years later to 2010, everyone was super disappointed. I think the New York Times and Scientific American had these disappointed headlines talking about the revolution postponed, or how all of this effort didn't really lead to the payoff that they were expecting. And I think that's really analogous to what was happening with AI in the 1990s. It was just the fact that we didn't have the compute and storage that we needed for AI to hit that inflection point. And I think that the parallel for genomics is that we just hadn't sequenced enough people. So in 2010, there were sort of fewer than 20 million people sequenced. Compare that to 2020: fast forward to today, and there have been over 150 million people sequenced. And the impact of that is that we can finally figure out which pieces of the genome are important. So we're 99.9% the same. There's 45 million sites called snips, or single nucleotide polymorphisms, where we differ. And that leads to this enormous and beautiful amount of diversity between individuals, with everything from height and skin color to things that are important to us, such as disease propensity. So what's really exciting about this moment in genomics is that this goal of genomic medicine for over 30 years has finally been realized, which is that we can finally measure genetic susceptibility to the most common conditions.
So things like heart disease, schizophrenia, Alzheimer's, diabetes, and conditions that affect the vast majority of the population but aren't simple genetic diseases in the sense that they're driven by a single gene. Instead, they're driven by the cumulative effect of thousands or millions of genetic factors, which required the aggregation of these really large data sets in order to build accurate models.
SPENCER: So my understanding is that for some disorders (the kind of classic genetic disorders), there'll be just one gene where, if it has a mutation, someone will have the disorder. But as you're pointing out, with things like heart disease or maybe things like depression or schizophrenia, it might be more of a kind of a large number of genes that are all contributing factors. Is that right?
NOOR: Yeah, exactly. So I think sort of the way I talk about it is old school genetics versus modern genetics. So old school genetics is, I think, what most people are familiar with: things like seeing if you have the gene for Huntington's or Tay-Sachs or cystic fibrosis. Each is sort of a single gene, and old school genetics can find it because it didn't require huge databases of individuals to suss out that mutation. You can sequence a small number of families (sequencing can still be really expensive), and you can still discover those really highly penetrant single gene diseases. So I think it's really incredible that for those 4000 or so really rare diseases, we can find a definitive cause. But I think that the reason why most people haven't seen genomics affect their lives is because this new capability to measure genetic susceptibility for common conditions has just very recently come online. And I think that, obviously, just has a much larger set of folks who are going to be able to benefit from that information, as opposed to just people who have these really rare genetic mutations.
SPENCER: So what's the right way to think about the way that genetics affects something like, let's say heart disease? Is it that you have hundreds of genes, each kind of increasing or decreasing your probability of getting heart disease by a little bit?
NOOR: Yeah, I think that's definitely a way to look at it. So the studies that actually recover risk are called genome-wide association studies, and the concept is actually really simple. You have a set of cases of people with heart disease and a set of controls, so people who are healthy who don't have heart disease. What you're looking for is an association: variants that the people who have the disease possess that are not present in the healthy individuals. And then you're just doing statistical hypothesis testing and correction to make sure that the snips or variants that you're finding are actually truly correlated with the disease and not just noise. And basically, each of these variants can have either a protective or a deleterious effect on whether or not you're going to be more susceptible to the disease.
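As a rough illustration of the case-control comparison described here, the following Python sketch runs a chi-square test on allele counts for three variants and applies a Bonferroni correction. The variant names, the counts, and the assumption of one million tests are all made up for illustration; real studies also adjust for ancestry and other confounders, which this sketch ignores.

```python
import math


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Returns (chi2, p_value).
    """
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    rows = {"case": case_alt + case_ref, "ctrl": ctrl_alt + ctrl_ref}
    cols = {"alt": case_alt + ctrl_alt, "ref": case_ref + ctrl_ref}
    observed = {
        ("case", "alt"): case_alt,
        ("case", "ref"): case_ref,
        ("ctrl", "alt"): ctrl_alt,
        ("ctrl", "ref"): ctrl_ref,
    }
    chi2 = 0.0
    for (row, col), count in observed.items():
        expected = rows[row] * cols[col] / total
        chi2 += (count - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical allele counts: (case_alt, case_ref, ctrl_alt, ctrl_ref).
variants = {
    "variant_A": (600, 1400, 500, 1500),
    "variant_B": (800, 1200, 500, 1500),
    "variant_C": (505, 1495, 500, 1500),
}

# Bonferroni correction, assuming one million variants tested genome-wide.
N_TESTS = 1_000_000
threshold = 0.05 / N_TESTS

results = {}
for name, counts in variants.items():
    chi2, p = allelic_chi_square(*counts)
    results[name] = p
    print(f"{name}: chi2={chi2:.2f} p={p:.3g} significant={p < threshold}")
```

With these made-up counts, variant_A looks significant at the usual 0.05 level but does not survive the correction, which is the "not just noise" check in miniature.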
SPENCER: Yeah, that makes sense. And are you essentially doing something like linear regression to try to predict whether someone has this disease, using all of these snips as the independent variables in that regression?
NOOR: Yeah, exactly. So basically, these polygenic risk scores are actually pretty simple. They're just weighted linear models where you have these effect sizes that you learn from these studies. And then in terms of actually being able to estimate risk, what you do is you just take the percentile. So let's say you're in the 99th percentile or 98th percentile, and you look at the corresponding prevalence rate in that cohort. So essentially, people who are in the 99th or 98th percentile for a lot of these polygenic risk scores are, depending on the disease, at anywhere from two to five times the population average for getting the disease. So I think that one of the things that's really interesting about these polygenic risk scores is that they're sort of nonlinear in how the prevalence changes with your percentile. So basically, at these highest percentiles of risk, there's a really, really, really strong signal. But there's not a huge amount of difference if you're moving from the 50th to the 80th percentile. There's sort of an inflection point at which your genetic risk has a really outsized impact on your susceptibility to the disease.
SPENCER: Got it. So you kind of do this linear regression based on all the snips to predict someone's likelihood of heart disease, maybe using logistic regression or something like this. And then you're saying, you take the score that model outputs — and those are the percentiles you're talking about, right? Like by the 98th percentile in that score, 99th percentile on that score, or what have you. Is that correct?
NOOR: Yep. Yeah, exactly.
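The scoring step just discussed can be sketched in a few lines of Python: a weighted sum of risk-allele counts, a percentile against a reference cohort, and the disease prevalence within a percentile band. Every variant name, effect size, and cohort value below is invented for illustration, and the ten-person cohort is far too small to mean anything; this is a sketch of the idea, not any particular published score.

```python
from bisect import bisect_right


def polygenic_risk_score(genotype, effect_sizes):
    """Weighted sum of risk-allele counts (0, 1, or 2 copies per variant)."""
    return sum(weight * genotype.get(variant, 0)
               for variant, weight in effect_sizes.items())


def percentile(score, sorted_reference):
    """Percent of the reference cohort scoring at or below `score`."""
    return 100.0 * bisect_right(sorted_reference, score) / len(sorted_reference)


def prevalence_in_band(scores, diagnosed, lower_pct, upper_pct):
    """Share of diagnosed people whose percentile is in (lower_pct, upper_pct]."""
    ranked = sorted(scores)
    flags = [d for s, d in zip(scores, diagnosed)
             if lower_pct < percentile(s, ranked) <= upper_pct]
    return sum(flags) / len(flags)


# Hypothetical effect sizes; positive raises risk, negative is protective.
effect_sizes = {"variant_1": 0.12, "variant_2": -0.05, "variant_3": 0.30}
person = {"variant_1": 2, "variant_2": 1, "variant_3": 0}
score = polygenic_risk_score(person, effect_sizes)

# Toy reference cohort: scores plus diagnosis flags (1 = diagnosed).
cohort_scores = [-0.3, -0.2, -0.1, 0.0, 0.05, 0.1, 0.15, 0.2, 0.4, 0.9]
cohort_diagnosed = [0, 0, 0, 1, 0, 0, 0, 1, 1, 1]

person_pct = percentile(score, sorted(cohort_scores))
low_band = prevalence_in_band(cohort_scores, cohort_diagnosed, 0, 50)
top_band = prevalence_in_band(cohort_scores, cohort_diagnosed, 80, 100)
print(f"score={score:.2f} percentile={person_pct:.0f}")
print(f"prevalence: bottom half={low_band:.2f}, top fifth={top_band:.2f}")
```

Comparing prevalence between bands, rather than reading the raw score, is what turns the linear score into the percentile-based risk estimate described above.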
SPENCER: Do you have any examples of roughly how many snips would end up being used in one of these models? Like, would it be 1000 snips that end up getting used, or 100, or 50?
NOOR: It really varies by disease. So something like type one diabetes, where a large proportion of the risk score is driven by the HLA type, has a much smaller PRS score. I think that those include less than 100 snips. But for something like cardiovascular disease, many of those PRS scores include sort of three or four million snips.
SPENCER: Really? So three or four million are all influencing that risk score, basically.
NOOR: Yeah, exactly. So it just really varies based on the disease. Typically, the rarer the disease, the fewer variants that you can assign to it. But to make the situation even more complex, for every disease where there's a polygenic risk score, there's often also a monogenic risk. So for something like Alzheimer's, 1991 was actually the first time they were able to avoid a case of early onset Alzheimer's via IVF, and that's because there's a single gene cause that they've identified. Whenever there's a single gene cause, it usually results in an earlier onset, more severe manifestation of the disease compared to the polygenic risk score, or basically, a more complex or common architecture. That sort of makes sense from an evolutionary perspective, because if it were to affect an individual before reproductive age, it's probably going to be less severe than if it were to affect someone after reproductive age.
SPENCER: That makes sense. So my understanding is that there was a period where a lot of these genetic association studies were just basically mining noise that there were tons of false positives, but that the field kind of got its act together and was able to overcome this. Can you tell us a little bit about that?
NOOR: Yeah, I guess, tell me more when did you feel like there wasn't a lot of signal and when did you feel like that changed?
SPENCER: I'm not an expert in this. That was just my understanding that there was a period in genetics where people were finding all these gene associations, but basically, the sample sizes were just small — as you kind of pointed out, we just didn't have that much data. And we're looking at millions of snips trying to find correlations and with small sample sizes, that's just a recipe for a lot of false positives. But basically, as the amount of data increased dramatically and the sort of statistical methodology became more rigorous, a bunch of those old associations were discarded and said, "Actually, they're maybe not real." But now we actually have the ability to find these real associations.
NOOR: Yeah, exactly. So I think it really just boils down to dataset sizes. I mean, in 2010, when a lot of folks had diminished expectations around how useful polygenic risk scores were going to be, I think it was just a function of the fact that we just didn't have these cohorts of hundreds of thousands of individuals who had been sequenced and had physician verified diagnoses. And I think that as those datasets grew and came on the scene in the last couple of years, the studies that have come out have just been much more promising. We sort of newly have this ability. A single gene cause explains a very small proportion of folks with a condition, as compared to the number of people that can benefit from these polygenic risk scores. There are so many more people who can discover and stratify their risk compared to just the very small population of people who have these rare variants, which was the only thing that genetics could give us before.
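To make the dataset-size point concrete, here is a small Python sketch that tests the same modest difference in allele frequency at two cohort sizes. The frequencies (32% in cases, 30% in controls) and the cohort sizes are assumptions chosen for illustration, and 5e-8 is used as the conventional genome-wide significance threshold.

```python
import math


def two_proportion_p_value(alt_cases, n_cases, alt_controls, n_controls):
    """Two-sided z-test for a difference in allele frequency between groups."""
    freq_cases = alt_cases / n_cases
    freq_controls = alt_controls / n_controls
    pooled = (alt_cases + alt_controls) / (n_cases + n_controls)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_cases + 1 / n_controls))
    z = (freq_cases - freq_controls) / std_err
    # Two-sided normal tail probability.
    return math.erfc(abs(z) / math.sqrt(2))


GENOME_WIDE = 5e-8

# Same hypothetical effect in both runs: 32% vs 30% allele frequency.
# Each person carries two alleles, so 500 people give 1,000 alleles.
p_small = two_proportion_p_value(320, 1_000, 300, 1_000)
p_large = two_proportion_p_value(32_000, 100_000, 30_000, 100_000)

print(f"500 people per group:    p={p_small:.3g} passes={p_small < GENOME_WIDE}")
print(f"50,000 people per group: p={p_large:.3g} passes={p_large < GENOME_WIDE}")
```

The effect is identical in both runs; only the larger cohort has enough data to separate it from noise at a genome-wide threshold.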
SPENCER: Stepping back, there's something really amazing about all this, which is that in theory, you could for a given person estimate the chance that all kinds of bad things might happen to them, whether it's getting heart disease, or ending up being diagnosed with schizophrenia or getting Alzheimer's and so on. So do you want to talk a bit about what sort of ramifications we have and how can we use this technology?
NOOR: I think that the most exciting and useful use of this technology is actually in the reproductive setting. So for you and me, our genetics are fixed. We can make lifestyle changes, we can do preventative screening to sort of stall out the onset of a lot of these diseases. But when you have a child, you have this really unique chance to reduce the incidence rate of a disease that's affected your family. So I think for me, personally, I've sort of seen the way that a health issue can hijack a family member's life. And I think that sort of fundamentally just stuck with me from a really young age of, "Okay, this is just super unfair. Why is it that for this family member or for that family member, just because they lost this sort of genetic lottery of good health at birth, their whole life and all their dreams are sort of cast to the wayside to deal with this struggle of a condition that they've been diagnosed with?" So I think that couples have this really unique opportunity, which is something that we're building with Orchid: to actually measure the genetic susceptibility of their future child before they conceive. So the test that we offer is kind of like 23andMe for couples. So couples just submit a saliva sample. And then they can discover their future child's risks for conditions that affect a lot of people. So things like diabetes, heart disease, cancers, schizophrenia; there's a bunch of conditions that we measure. And then, unlike most genetic tests that don't really have any action or mitigation associated with them, with Orchid's tests, you can actually take action. So on the other side of this preconception screen that's saliva-based, a couple can actually elect to go through embryo screening. So we can sequence each of their embryos through IVF, and then determine the propensity of each of those embryos for a condition that they're concerned about.
So just as an example, if a couple discovered on the preconception screen that they're really at high risk for schizophrenia, and they want to mitigate that risk, they can go to do the embryo testing, and then elect to transfer the embryo with the lowest risk for schizophrenia. And that's sort of a unique mitigation option that has just recently come online because before now, there has been no other strategy for couples who know that they have that type of complex disease running in their family. It just removes a little bit of uncertainty from the equation of having children.
SPENCER: My understanding of what you're doing at Orchid is if I want to have a child, me and my partner could get saliva tests done, and you'll then use our genetic information to make predictions about how likely a child is to have different diseases. And if I understand correctly, you kind of do simulations: when a child is born, they get a mix and match from the two parents, so you're gonna like do simulated mixing and matching of our two genomes, and from this ensemble you'll be able to calculate the probability of that child having different diseases? Is that correct?
NOOR: Yeah, exactly. So basically siblings share about 50% of DNA between each other. And there's quite a bit of variance for each of these diseases. So yeah, we run a simulation that models how segments of DNA between the maternal and paternal genome are going to recombine so that parents have sort of an estimated prediction or projection of "Okay, this is the range of genetic risks that's possible between the two of us." And that's very personalized and special between each couple, as you can imagine, because, yeah, sort of the question that's always asked is "Okay, how much risk is there between us, specifically?" And you can't really give averages for that. You have to know for each individual couple.
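The kind of simulation described here can be sketched with a deliberately simplified toy model in Python. It assumes a single chromosome, a fixed crossover probability between neighboring variants, and randomly generated parents and effect sizes; none of this reflects Orchid's actual method, which the conversation does not detail.

```python
import random


def make_gamete(hap_a, hap_b, switch_prob, rng):
    """Copy alleles from one parental haplotype, occasionally crossing over."""
    haplotypes = (hap_a, hap_b)
    current = rng.randrange(2)
    gamete = []
    for position in range(len(hap_a)):
        if position > 0 and rng.random() < switch_prob:
            current = 1 - current  # crossover: continue on the other haplotype
        gamete.append(haplotypes[current][position])
    return gamete


def simulate_child_scores(mother, father, effects, n_children,
                          switch_prob=0.05, seed=0):
    """Return simulated polygenic scores for `n_children` possible children."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_children):
        egg = make_gamete(mother[0], mother[1], switch_prob, rng)
        sperm = make_gamete(father[0], father[1], switch_prob, rng)
        scores.append(sum(w * (e + s) for w, e, s in zip(effects, egg, sperm)))
    return scores


# Hypothetical data: 200 variants, random effect sizes and parental haplotypes.
setup = random.Random(42)
N_VARIANTS = 200
effects = [setup.uniform(0.0, 0.1) for _ in range(N_VARIANTS)]


def random_parent():
    """Two haplotypes, each a list of 0/1 risk-allele indicators."""
    return tuple([setup.randrange(2) for _ in range(N_VARIANTS)] for _ in range(2))


mother, father = random_parent(), random_parent()
child_scores = sorted(simulate_child_scores(mother, father, effects, 1000))
print(f"lowest={child_scores[0]:.2f} median={child_scores[500]:.2f} "
      f"highest={child_scores[-1]:.2f}")
```

The spread between the lowest and highest simulated scores is the couple-specific range of risk; a different pair of parents would give a different spread, which is why averages do not substitute for running it per couple.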
SPENCER: You can't say that Bob has this risk and Sheila has that risk as individuals, because actually, it depends on the ways their genes are combined, right?
NOOR: Yeah, exactly. Especially if you think about a lot of these polygenic conditions. That's the reason why, with carrier screening, old school genetics today only screens one parent: they're looking only at single gene recessive conditions. And then if that parent is negative for everything, you don't even need to screen the other parent.
SPENCER: Right, because you would need both parents to have that gene to have the danger essentially.
NOOR: Exactly. So basically, that's why I think that old-school genetics is so limited. It's because we're only looking at these really rare conditions, and we're only looking at conditions where both partners have to have the gene in order for there to be even the 25 or 50% chance for the child to be affected, versus with Orchid’s testing, it's looking at millions of variants across the entire genome. So these old-school genetic tests are sequencing about 2% of the genome as compared to Orchid, we're sequencing 100% of the genome. So there's just a lot more that you can detect if you have all of that data. And you're looking at these complex conditions, as opposed to these simple conditions.
SPENCER: So let's say two people come in, you give them saliva tests, and you uncover that they have a higher risk of their child having schizophrenia. So then, as I understand it, what they could choose to do if they wanted to, is they could do IVF, and get a bunch of different eggs. And then you could actually tell them from these different eggs which one is the one that's least likely to have the child develop schizophrenia. And then they could fertilize that one. Is that correct?
NOOR: So yeah, there's a minor point there. So the eggs are extracted and then fertilized with the male partner's sperm sample, so the embryos are the things that get tested.
SPENCER: Okay, okay. The embryo.
NOOR: But that's a super minor point. But yeah, the basic idea is correct. There's something I think is super interesting that a lot of people don't know about IVF, which is that IVF has already been treating the diseases that gene therapy will one day treat, for $20,000 a case as compared to millions of dollars a case. So there's this technology called PGT-M. So essentially, it's embryo testing for monogenic or single gene conditions. So for those folks who identify with a carrier screen that they're both carriers, they can elect for testing for, say, cystic fibrosis on their embryos. So I think that that's a really exciting technology because a lot of these gene therapies cost millions of dollars. Then there's, obviously, all the suffering that the child goes through before actually receiving the treatment, if it actually even works for them. So I think that most people don't realize that IVF has already had this really impressive place in mitigating disease for rare conditions. And it's basically not unprecedented for IVF to be used this way to essentially help parents mitigate the genetic component of risk for a disease. It's just that the exciting thing that's come online now is that you can both measure risk for the complex conditions and actually now sequence embryos with high fidelity. So before, the type of data that was available to read out of embryos was really coarse. So if you think about looking at chromosomes on embryos, you can sort of think of it as like looking at the chapters in a book, or a single page, as opposed to reading the entire book. And the complexity there on the single cell sequencing side is just that you have a really small amount of DNA in embryos compared to a blood sample or a saliva sample, which has become a commodity sequencing process.
So basically, a bunch of development had to go into figuring out the chemistry there to actually be able to get a reliable readout of an embryo's DNA off of that really small sample of five cells from an embryo. But now that both of those pieces are together, I just think this is a really fundamentally new and exciting capability for couples to be able to have a little bit more control in a really uncertain and scary process of what conception can feel like for people who have a family history of these conditions.
SPENCER: I think it sounds like an amazing technology because it can basically reduce risks before they happen. And also, you can actually roll it out in the real world. It's actionable and I think one of the problems with a lot of predictive technologies is that even if you know there's a risk, even if you can predict it, you can't really do much about it. So I really liked that this is actionable, that they can actually use it to reduce the risk of their child having one of these diseases. I have a weirdly technical question about this, but I think it's important. How many of these embryos do you need to actually reduce the risk substantially? Like if you go from one to two to three, how does that risk fall if you're trying to pick the best one among those?
NOOR: If you're interested in the theoretical side, there have actually been quite a few papers published sort of modeling it in the general sense, but sort of the entire reason why we built out the preconception report was to answer that question. Because you don't want to go through this expensive and time-consuming process of IVF without having some barometer or estimate of "How much can I actually mitigate risk?" So in the preconception report, we're modeling and estimating based on the couple producing less than eight embryos. So comparing basically the 20th versus 80th percentile embryos. So we're trying to give bounds based on sort of a realistic outcome of a few cycles of IVF. So, between one and three cycles is usually normal for folks going through IVF and creating embryos. I don't want to give a non-answer, but the answer is that it's very specific, depending on the disease a couple is interested in mitigating and the number of embryos they can produce.
SPENCER: So you basically give them a meta-analysis telling them, "Look, if you guys were able to get this many embryos, here's how much on average we'd expect to be able to reduce risk," that kind of thing. Right?
NOOR: Yeah, exactly. So it's basically just giving them an estimate of how much mitigation would be possible. And I guess even before that, whether there is risk present, because I think that's also a big relief for a lot of people. Like with old school genetics, all you had was, "Okay, can I just screen for one or two variants, like BRCA (everyone talks about that for breast cancer risk)?" And you can have this vague specter of "Oh, I have a family history. And like, my sister and my mom had this condition." But you can do this old-school genetic testing, where they'll just exclude these very narrow ranges of genetic factors. But it's missing this much broader set of genetic factors that could confer risk. So I think that's something that's useful to people as well, just "Okay, I know I have this running in my family. But I want to quantify exactly how high my personal risk is." So, we report on each individual partner. But also, "How much is that going to contribute to my child's risk?" There are certain cases where a partner who's low risk is able to completely mitigate or neutralize that risk in the child. And there's obviously the case where both partners discover that they're at high risk for the same condition. And that's kind of where mitigation can come into play.
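On the earlier question of how the benefit grows with the number of embryos, a toy Monte Carlo sketch in Python gives the general shape. It assumes that embryo risk scores within one family are normally distributed around the family mean, measured in units of the sibling standard deviation; that distributional assumption is for illustration only, and turning a score difference into an absolute risk reduction depends on the disease, as noted above.

```python
import random


def expected_best_score(n_embryos, n_trials=20_000, seed=0):
    """Average lowest risk score among `n_embryos`, in sibling-SD units."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        # Draw one score per embryo and keep the lowest-risk one.
        total += min(rng.gauss(0.0, 1.0) for _ in range(n_embryos))
    return total / n_trials


gains = {n: expected_best_score(n) for n in (1, 2, 3, 5, 8)}
for n, gain in gains.items():
    print(f"{n} embryos: best score averages {gain:+.2f} SD from the family mean")
```

Under this assumption the returns diminish: the step from one embryo to two is the largest, with the best score averaging roughly 0.56 SD below the family mean for two embryos and roughly 1.16 SD below for five.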
SPENCER: So thinking about this, the number of embryos is to some degree a limiting factor, right? You can still get a lot of benefits even if you only have maybe five embryos. But obviously, if we could magically have a million embryos to choose from, that would give a lot more flexibility. And my understanding is — I think you've mentioned this to me before — that there's, in theory, at least some way to do this using sperm, even if the technology is not there yet. Do you want to talk about that a bit?
NOOR: Yeah, I think you're 100% correct. I mean, in terms of the mitigation potential here, it's entirely limited by the number of embryos that are able to be generated, which is limited by the retrieval process and the hormone injections that the female partner needs to take in order to actually get those eggs that you need to create the embryos. But I think that there are super exciting technologies along the pipeline. One is called IVG: in vitro gametogenesis is the ability to create egg cells from stem cells. So quite a while ago, there was this big breakthrough called the Yamanaka factors. So the ability to figure out "Okay, here are the transcription factors that we need in order to take any cell in the body and return it to its pluripotent state, to basically turn a regular cell into a stem cell." But basically, the part that we don't know yet is when we have a stem cell, how do we direct its fate? How do we direct it to become a heart cell or lung cell or skin cell or, what's most exciting in the reproductive field, turn it into an egg cell? So basically, the ability to do that reproducibly and reliably is something that, unfortunately, I don't think enough people are working on. There's about five labs in the world that are doing interesting work in this space. And yeah, I think that some people think it's really far-fetched. And there's a lot of nuance about transcriptional regulation and erasing and adding methylation marks and things like that. It's just a lot of technical detail that needs to be worked out. But some folks in the field think it's maybe on the 10- or 15-year horizon.
SPENCER: So what would that involve exactly? You take any cell on a person's body, and you can just generate a whole bunch of egg cells from that?
NOOR: Yeah, exactly. Theoretically, you could have a same-sex couple have the ability to have a child that's genetically theirs: turn a skin cell into an egg cell and a skin cell into a sperm cell and then have a child that way. But obviously, this is theoretical at this point. But it's theoretically possible.
SPENCER: Got it. So that's an interesting application. And also, in theory, it could let you create huge numbers of embryos, and then this whole selection process would just be much more effective, right?
NOOR: Yeah, exactly. Right now, IVF is, unfortunately, invasive. So I think that it would really open up the floodgates if there was a possibility to create embryos and eggs without having to endure the injections and retrieval process. And it's not the worst. I mean, it's two to three weeks of injections and hormone therapy and a 15-minute surgery to extract the eggs. But I'm not excited about it. I'm gonna do it, obviously, but the process is not seamless. Just taking a skin cell and directing its fate would be way better.
SPENCER: To get a little bit sci-fi for a moment, suppose that we get this technology that allows the creation of a huge number of embryos, my understanding is that not only would that make the kind of technology you're working on to reduce the risk of disease much more effective, but it would also open the door to people really doing designer babies in the sense that people have talked about it for a long time. Is that right? Like, basically, insofar as people understand what genes lead to what, they could essentially craft their children to be more towards whatever target we understand genetically.
NOOR: Oh, yeah. I think Orchid specifically is focused on disease mitigation. We're not doing designer babies or any sort of enhancement stuff.
SPENCER: But of course, yeah.
NOOR: I mean, theoretically, parents or governments could decide how they wanted to regulate the space and what they wanted to do there. But from a purely theoretical perspective, yeah, that's obviously possible.
SPENCER: Now, one argument people have made is that even if some countries ban research like that, other countries that decide not to ban it could essentially start trying to create groups of people with whatever qualities they want, whether it's trying to raise people's IQ, or trying to make people who are more compliant, or really, the sky's the limit. Do you foresee that in 20 years, that technology is going to exist? And that it will actually be possible for countries to do if they choose to?
NOOR: It's kind of funny to try and comment on hypothetical, geopolitical scenarios. But I could talk a little about some things that have happened in the past. I mean, I think maybe most people don't know that Yao Ming was sort of created by the Chinese government, right?
SPENCER: The basketball player, right?
NOOR: Yeah, exactly. So the Chinese government wanted the tallest man and woman to have a child so that they could have this super tall basketball player that would make the country proud. So I think that there's obviously regimes that have already been interested in doing — as a sort of meeting is kind of, I guess, the level that you could talk about it. But I don't find that particularly exciting. I think that the most exciting application is for disease mitigation. I think that basically, there's just this unfairness at birth, where there's this lottery ticket that you either get or you don't get, which is, do you get to live a life uninterrupted by disease or do you have to live a life where this errant genetic program just hijacks your life? It makes it about a chronic health issue, as opposed to being about what you actually want to do. And yeah, it's just kind of crazy, what's the scope of it today. So I think right now in the US, over 100 million Americans — so I think it's on the order of 45% of Americans — are living with a chronic disease. And they're sort of leashed to drugs for life that are treating symptoms rather than root causes. And we all know about Moore's law, the fact that the number of transistors on a circuit doubles every two years. But there's also this much worse law (which is the reverse of Moore's law), Eroom’s Law, which is that the drug development speed declines and the cost doubles every nine years. So I think the state of treatment today is just really sad. We're spending huge amounts of money, billions of dollars on the order of decades to bring these drugs to market and they're just treating symptoms or not treating root causes. So it's really interesting and worthwhile to be able to push a different lever, which is for a couple that's choosing to have a child being able to say, "Okay, this disease has run in my family for however many generations, but it doesn't have to affect my child." 
So I guess for me, personally, that's the future that I'm most excited about.
SPENCER: You're not excited about future dictatorships using this to control populations? [laughs] No, I'm just joking. Obviously, what you're working on is really socially beneficial. But I also think it is interesting to speculate what is going to happen with technology in 20-30 years.
NOOR: What do you think will happen?
SPENCER: I think it's very likely that some countries will allow people to control the genetics of their children, not just to alleviate disease, but to try to craft the children to be the way they want them to be. And that could go to very weird and potentially very dark places, but also potentially beneficial places. Like, you could imagine it would be better if there was more altruism in the world, so insofar as genes affect altruism, that's not necessarily a bad thing. So, I don't think it's all evil. But I think that it's a mixed bag, and it's something we have to be very cautious about.
NOOR: Yeah, so I guess, just commenting on what the correlation of these genetic risk scores is between each other: basically, if you have a really high risk for a certain condition, does that mean you're trading off risk for another condition? So it depends on how you set your threshold. So at Orchid, we have really pretty conservative thresholds. So we're talking about the people who are above the 97th or so percentile being alerted as high risk for specific conditions. But there has been quite a bit of research on how certain categories of genetic risk scores correlate. So as you can imagine, diseases around brain health have many shared SNPs or shared variants that are included in these risk scores. So if we were to expand the set of diseases that we offered to the dozens or hundreds, then we would see this trade-off between certain types of correlated conditions. But just at the stage that we're at right now, where we've only sort of hand-selected the diseases that have the highest signal, where we think there's the largest variance between embryos that a couple would produce, we're not necessarily seeing that trade-off.
SPENCER: So I'm guessing the fewer SNPs that matter, the more variation you find. Is that right?
NOOR: It sort of depends on how those blocks of DNA are inherited. Are they inherited in blocks together? Are they inherited separately? Generally, if you have more variants that are inherited independently, you're gonna have more variance than if you have fewer variants that are inherited independently. But yeah, it just depends on the correlation structure between those variants. The way that you can think about the genetics of traits, generally, is that there's broad-sense heritability, and there's narrow-sense heritability. When I say broad sense, it means: what are all of the genetic factors? And these studies are looking at narrow-sense heritability, specifically the SNP heritability of a given trait. So basically, what is the total phenotypic variance? And then what is the genetic variance that's contributing to that phenotype? So that's another thing to keep in mind: these risk scores are measuring something very specific, and they're not necessarily capturing all of the genetic factors.
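The heritability terms in this answer can be written compactly. What follows is a sketch in standard quantitative-genetics notation; the symbols are assumptions brought in for illustration, not something used in the conversation, and the decomposition ignores gene-environment covariance and interaction.

```latex
% V_P: total phenotypic variance; V_G: genetic variance; V_E: environmental variance
% V_A: additive genetic variance; V_D: dominance variance; V_I: epistatic (interaction) variance
\begin{align}
V_P &= V_G + V_E, \qquad V_G = V_A + V_D + V_I \\
H^2 &= \frac{V_G}{V_P} \quad \text{(broad-sense heritability)} \\
h^2 &= \frac{V_A}{V_P} \quad \text{(narrow-sense heritability)} \\
h^2_{\mathrm{SNP}} &= \frac{V_{A,\,\mathrm{measured\ SNPs}}}{V_P} \;\le\; h^2 \;\le\; H^2
\end{align}
```

Read this way, a score built from measured SNPs can at best capture the $h^2_{\mathrm{SNP}}$ share of the variance, which is why it need not reflect every genetic factor.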
SPENCER: Got it. And maybe that's a good time to talk about how the kind of testing you do differs from what 23andMe does.
NOOR: Yeah, that's an awesome question. So, Orchid versus 23andMe, what are the main differences? There are three main categories. The first is, are you interested in understanding your future child's risk? So 23andMe provides individual reports, but it's not going to take you and your partner's DNA, recombine them and let you know what your future child's risk is going to be for diseases like heart disease, diabetes, cancer, schizophrenia, things like this. So Orchid focuses on that reproductive decision for you and your partner: What are your individual risks? But in addition to that, how is that going to recombine in your future child? And what mitigation strategies might you be able to consider given that information? The second major factor is how much data you want to have analyzed. So for a company like 23andMe, they're looking at a really small fraction of your genome, so less than 2% of it. In contrast, Orchid is sequencing 100% of not just one partner’s, but both partners' genomes in order to return your results. So the way that impacts specific risk scores is that for any of these diseases (like heart disease, or breast cancer) Orchid’s risk scores include millions of variants. So in contrast, 23andMe’s and other companies’ results are pretty incomplete. They're just looking at a handful or fewer SNPs to give you an indication of whether or not you're at increased or decreased susceptibility. And that's an incomplete picture because we know that there are actually millions of variants that are driving risk, not just a handful. And the third main differentiator is how much support you actually want during the testing process. So for a company like 23andMe, they're considered truly direct-to-consumer. So there's no physician oversight, there's no genetic counselor, and there's no one involved in helping you understand your report.
So in contrast to that, Orchid is a physician-approved test: there's a board-certified genetic counselor that does a video walkthrough of every single report. And they're also available synchronously to discuss the results with you. So if you want to get support, if you want to get expert guidance, that's something that Orchid does that a lot of other companies don't do.
SPENCER: How do they choose that 2%? Is it because that's the 2% that they kind of expect to be most interesting or something like that?
NOOR: Yeah. So it's basically based on studies that came out at the time that they were making that decision about which technology to use. So essentially, the number of markers they're looking at for like earwax type, or the types of traits that they're including, they're just trying to specifically grab the markers that are going to give them an indication for their report.
SPENCER: Right, so that they can tell you interesting facts about yourself based on the SNPs that they measure.
NOOR: Yeah, exactly. And then, sort of in contrast, with Orchid, we're specifically focused on this reproductive decision of "Okay, how much risk are we going to transfer to our child?" And then basically, in order to do phasing, basically figuring out what the maternal and paternal contributions are for each partner, we need to just do whole genome sequencing, and we need to do a higher level of coverage in order to return those results. How are the pieces of DNA going to combine between each partner in order to report on these complex disease risks? So that's probably the third difference: 23andMe just isn't doing these polygenic risk scores. So they're not looking at millions of variants for each disease, they're looking at a very small number, because of, again, the technology that they're using. Their method of reading your DNA is array-based, as opposed to NGS or whole genome sequencing, which is what Orchid is using. And then, I think the final component is just the actionability. So there's something that you can actually do with Orchid’s results, which is you can choose to mitigate risk via embryo screening, which we offer, or you can just be alerted of these risks ahead of time. So if you identify high risk for type one diabetes, just as an example, people often go on these diagnostic odysseys where they think that their child has one condition, but it's actually another, and basically being alerted and aware of that tends to lead to better outcomes than reactively responding to symptoms after they've emerged and already become a problem.
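The polygenic risk scores described here are, in their simplest textbook form, a weighted sum over variants. The sketch below is a toy illustration under that assumption, not Orchid's actual method: the function names, the three made-up weights, and the reference population are all hypothetical, and a real score would use millions of variants with weights estimated from association studies. The 97th-percentile cutoff echoes the threshold mentioned earlier in the conversation.

```python
from typing import Sequence


def polygenic_risk_score(dosages: Sequence[int], weights: Sequence[float]) -> float:
    """Weighted sum of risk-allele counts (0, 1, or 2 copies per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def percentile_rank(score: float, reference_scores: Sequence[float]) -> float:
    """Percentage of the reference population scoring strictly below `score`."""
    below = sum(1 for s in reference_scores if s < score)
    return 100.0 * below / len(reference_scores)


# Hypothetical per-variant effect sizes (illustrative values only).
weights = [0.5, -0.25, 0.125]
# Hypothetical genotype: copies of the scored allele at each variant.
person = [2, 1, 0]

score = polygenic_risk_score(person, weights)

# Hypothetical reference population of 100 scores: 0.00, 0.01, ..., 0.99.
reference = [i / 100 for i in range(100)]
percentile = percentile_rank(score, reference)

# Flag only the extreme tail, mirroring a conservative threshold.
flagged = percentile >= 97.0
print(f"score={score}, percentile={percentile}, flagged={flagged}")
```

The design point is that the raw score means little on its own; it only becomes a "high risk" call once it is placed against a reference distribution and compared with a cutoff.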
SPENCER: One thing that I don't really understand is how you define a gene. How do you know where one gene stops and the next one starts? Is it clear-cut or is it not clear-cut?
NOOR: It's actually kind of a complicated question. But basically, there are these things called start codons and stop codons. Basically, it's just defined by where RNA starts processing and creating a protein. There's actually a whole category of genes called pseudogenes where we're not actually sure how they're transcriptionally regulated, how they're processed, and it's actually difficult to determine where the gene starts and ends. But yeah, that's sort of like the conventional definition.
SPENCER: So most of the time, it's pretty clear cut based on the stop codons, but sometimes it's kind of a little fuzzy where the gene ends.
NOOR: Yeah, that's why when I say that they discovered that the human genome has around 20,000 genes, well, that's why it's not a definitive number. It's, again, very humbling that we think that there's around 20,000 genes, but we're not exactly sure of the exact number because of this whole issue of pseudogenes and how exactly you should define whether a gene exists. So yeah, it's not a dumb question. It's still an active area of research.
SPENCER: So why did they think there were 100,000?
NOOR: I don't know, hubris, right? I mean, we thought that humans are way more complicated than a yeast genome or any other genomes, so we just thought that since humans are so great that we had to be really different and have way more genes than anything else that we've sequenced before.
SPENCER: I see. So what's the deal with junk DNA? When you talk about this 20,000, does that not include junk DNA?
NOOR: I think that most of the time, when people refer to junk DNA, they're talking about the regulatory regions of DNA. So basically, we have 3 billion bases in our genome. A shockingly small portion, only about 2% of that, is coding, meaning it creates proteins, meaning it's those 20,000 genes that we're talking about that we know what the output of it is. But the vast majority of DNA is regulatory, meaning we used to call it junk DNA, but now we're sort of realizing that it has a purpose, it has an impact on which genes are transcribed, when, and basically how genes are differentially expressed. (It's super interesting.) Around 2010, when we were really disappointed with the results of the Human Genome Project, we were basically like, "Oh, all of this is junk. It has no purpose." And now there's like a flurry of studies around trying to understand this “junk DNA” and how exactly it impacts what genes are transcribed, when, and what influence they have on disease and longevity and a whole number of other factors.
SPENCER: So when you do full DNA sequencing, do you measure the junk DNA or do you not?
NOOR: Yeah, so in this sense of measuring, we read out all of the DNA bases, so yeah.
SPENCER: I see. But that's not going to come into your analysis, because we don't know how to use it to make our predictions better. Is that right?
NOOR: Another really interesting component of these genetic risk scores is that basically, we thought that these pieces of DNA that weren't in genes were useless. But we discovered through genome-wide association studies and a bunch of other experiments that actually these regions are correlated with disease. And these do have an effect on a whole set of downstream things, whether it's a disease or a target for a drug. Basically, this regulatory DNA is not useless. It has a function that we don't yet understand.
SPENCER: Got it. So there's like 2% of the genome we understand. And then there's the 98% that we're beginning to understand. But yeah, that's super interesting.
SPENCER: Another thing that confuses me is that sometimes, as a sort of shorthand, geneticists and other people will say, "Oh, yeah, you share 50% of your DNA with your brother and sister." But at the same time, they'll say things like, "Oh, yeah, humans and chimps share more than 90% of their genes." And so, at face value, this is totally contradictory. And I know that that's just because people are using shorthand. Do you want to unpack that a little bit? Like, what does it mean that we share so many genes in common with chimps yet only 50% in common with brother and sister? I know that we're not really talking about the same genes there. So can you explain that?
NOOR: Yeah, of course. So basically, you have these 3 billion bases and 99.9% of that is the same between every human. So when we're talking about the 50% that's different between siblings, we're talking about those 4 million to 5 million sites, these single nucleotide polymorphisms, that are different between individuals. So we're not talking about the DNA that's the same between everyone. Let's slice out the portion that's actually different between people: how different is one sibling from another?
SPENCER: Right. So when you say that most of our genes are the same for any two humans, right?
SPENCER: Does that mean we literally would both have an A in that exact spot, and all humans essentially have an A in that spot?
NOOR: So basically, the genes can really vary in size. So genes could be anywhere from 100 bases, 1,000 bases, 10,000 bases, they really vary in size. So what we're talking about is sort of specific points. So a gene might be a thousand letters long. So when we're talking about very specific points, like, okay, at position 305 (I just made that up), what letter does Spencer have versus what letter does Noor have? Does that make sense?
SPENCER: I see. And so you're saying almost all humans have exactly the same letter in exactly the same places, except for these few million spots?
NOOR: Yeah, there's basically a few million sites across the entire genome where these letters differ. And that's what people are interested in measuring. It's kind of remarkable, right? How is it that you have 3 billion letters and then only 4 to 5 million sites lead to Yao Ming being seven feet tall, and then someone else being four feet tall? How is it that just 4 million or 5 million (and obviously, it's not all of them, as well) of these 3 billion letters are driving such a huge disparity in something like height, and obviously, as well, in something like disease propensity? So, I think it's super remarkable, and it's a really interesting thing to reflect on. Also, just thinking about what you're saying about chimps and insects, like the amount of DNA that's conserved across species is incredible. How is it that so many of these biological processes are conserved? How is it that the same genes are required for life? I think that's also super remarkable.
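The figures quoted in this exchange can be checked with a line of arithmetic. This is a rough sketch using only the round numbers from the conversation (3 billion bases, 4 to 5 million variable sites); the function name is made up for illustration.

```python
GENOME_BASES = 3_000_000_000  # round figure quoted in the conversation


def variable_fraction(variable_sites: int, genome_bases: int = GENOME_BASES) -> float:
    """Fraction of genome positions that differ between individuals."""
    return variable_sites / genome_bases


for sites in (4_000_000, 5_000_000):
    fraction = variable_fraction(sites)
    print(f"{sites:,} variable sites is about {fraction:.3%} of the genome")
```

Both values come out between 0.1% and 0.2%, the same order of magnitude as the "99.9% the same" figure; the two numbers are loose estimates, so they agree only approximately.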
SPENCER: So one thing I really enjoy is programming languages. And to make a weird analogy, I think of genetics as a programming language, where essentially, it's programming a creature. I'm curious to get your reaction, because I feel like that's a useful reframe. If you ask the question, "Well, how could it possibly be that just a few million sites are all it takes to make the difference between all the different humans that are alive?" then, well, if you think in terms of a programming language, that means that DNA as a programming language has to be extremely high level. The gene can't code for what is exactly in a particular skin cell. It has to code for really high-level features, like how tall you are, or maybe not quite that general. But I've also heard of genes that will do crazy things, like massively change a creature's teeth, because one gene flipping may revert them back to some ancestral form of their teeth, and you realize that this program is just at such a high level of abstraction. I’m curious to hear your thoughts on that.
NOOR: Yeah. So there's actually a professor at Stanford, his name is Gill Bejerano, and he really dives deep on this programming analogy. So he has a bioinformatics lab, and he's basically really good at recruiting CS students into his lab because he talks about all of these programming analogies around: “Okay, this is sort of the largest distributed system in the world. This is the operating system everyone is running. Don't you want to understand the code for life?” So, there's definitely a lot of really solid programming metaphors there. I think one that's sort of most obvious is like, you can think of a gene like a function. It's sort of like, "Okay, you repeatedly call this function or you repeatedly transcribe this gene and protein gets spit out or this function gets run." So I think there's really quite a bit there. And I think that to the extent that that gets really talented computer scientists interested in genomics, I'm all for it. I guess the counterargument there is (I'm sure you're familiar with Marvin Minsky) he talks a little bit about how you really truly master a subject. And I think that for adults, we're really excited about reasoning by analogy. I think the canonical example he uses in his book is, if you have an adult try and learn the piano, they'll think about it like typing on a keyboard, and they'll get to a certain plateau of proficiency. That way is a little faster than a toddler's would be, since a toddler doesn't have that analogous experience of typing on a keyboard. They're just learning how the piano works. And they're fully immersed in that. So I think that there are basically just trade-offs. I think reasoning by analogy gets you up to speed if you have this really strong CS toolkit and you're kind of transferring it over: “Where are the analogies and how does genomics work?” But I think that there are just some challenges in understanding the specific nuance of this new system when sometimes I think people try and force analogies where they might not exist. But I think that there actually are a lot of sincere analogies. So I don't want to be too negative on DNA as a programming language. I think that that's actually a super solid analogy.
SPENCER: Great. One way I think about analogies is they're excellent teaching tools. Because if someone already understands the thing that you're making an analogy to, it can help them get up to speed really fast. But they're really bad arguing tools. Like you can't be like, "Oh, because DNA is like a programming language, therefore…” and then say something that's true about programming to assume that's going to apply to DNA.
NOOR: Yeah, exactly. I'm curious, what else do you think is really interesting or surprising about DNA?
SPENCER: Another one that I wanted to ask you about is epigenetics. I feel like some people make it sound like it's a super big deal because it means that whereas we think of a creature as sort of just being fixed at birth by its genetics, maybe it's not, and maybe creatures actually change in a genetic way. Do you want to unpack that a bit? What's the big deal about epigenetics? How big a deal is it? What does it really mean?
NOOR: So for epigenetics, what it's talking about concretely is methylation patterns. So basically, the way that you discover epigenetic marks is you do something called bisulfite sequencing. So it's a different type of sequencing, where you're looking at the methylation patterns on top of DNA. And sort of the core mechanism behind it is what genes are turned on and off and why. So it gets into this (what we were talking about earlier), which is like the regulatory aspect of DNA. So what's interesting about epigenetic regulation is that you have this core DNA sequence underneath. But on top, you have these marks that are telling, basically, which pieces of DNA should be expressed when. So I think that it's super remarkable. So it basically makes a creature more resilient. It's sort of the idea that, "Okay, you have this underlying genetic code that you can run but on top of that genetic code, based on how your environment is changing, I can turn certain genes on or off that might be more beneficial to you." So when that model system is starving, certain genes are turned on that are different from those that would be utilized when that organism has a surplus of food. And that's just obviously the most simple example. But that's sort of been recapitulated in many, many other types of challenges from the environment: basically, you have a bunch of DNA and your organism is trying to figure out which piece of DNA is going to be most useful or effective given the situation. And yeah, I think it is super interesting and exciting. And another piece of it that I think fewer people know about is that there's actually a transfer of epigenetic information from parents as well. So it's called imprinting, really. There are basically epigenetic marks that are transferred from parents onto their children. I can’t, unfortunately, describe the mechanism. But that is the case, and it's a really interesting area of research.
SPENCER: It's insane because it kind of suggests that Lamarck was not completely 100% wrong, right? In the great debate between Lamarckism and Darwinism, it turns out Darwin was mostly right, but then actually, creatures seem to be able to pass on some element of their environment to their children, which is kind of amazing.
NOOR: Unfortunately, I think that the way that we discovered this was that we looked at the effect on women who were pregnant during the Great Depression or other periods of scarcity, and unfortunately, the long-term effect that that had on their children. So it's sad, obviously, in that context, but there's obviously protective epigenetic marks that I guess I just don't have the anecdotes for. But I'm sure that they exist. But I think that the original way that we discovered it was through some of these more unfortunate situations.
SPENCER: Oh, another genetics question for you. Will we one day be able to change our genes after the fact? Like, will there someday be a virus we can inject ourselves and our eyes go from blue to brown and that kind of thing?
NOOR: Yes, I think genetic engineering is super interesting. So I think the technology that most people have heard about is CRISPR. So this basically allows you to cut and paste DNA. I think CRISPR is really (I don't want to say it's overhyped, because I think it's a really incredible technology) something that people are expecting to impact clinical care, I think, a little bit sooner than is actually realistic. So I think it's a really interesting research tool in the sense that before it was really difficult to create these cell lines, create these model organisms that had these very specific genetic changes that allow you to interrogate mechanisms more closely. I think, unfortunately, in order to translate into humans, you have this problem of delivery. So basically, you have viral vectors, of which there are a bunch of different types like lentivirus and others, and they have different amounts of payload that they can actually carry. So when I say payload, I mean basically genetic material that they can inject into your genome. So these viral vectors are difficult to engineer. And they're difficult to introduce to different cell types, they have safety concerns, they have efficacy concerns, there are off-target effects. So I think that obviously with gene therapy, there's been more than a handful that have actually already been approved in the US for these very specific single gene mutations. But something like eye color, for example, there's many genes involved. I don't know the exact number, but I think somewhere between 50 and a thousand (too large of a range). But basically, for the vast majority of traits, whether those are aesthetic or disease, like I said, there's many, many, many genes that are involved. So it's already hard right now to make a single gene edit. It requires probably 10 years of work just to design that viral vector, and then to do the clinical trial and make sure that people aren't getting these immune reactions.
So basically, there's the cytokine response where sometimes those viral vectors are rejected by the immune system of the individual that gets the treatment, and then basically they try to continue to modify that delivery mechanism so that there isn't that immune response. So, unfortunately, I think that it's pretty far out to be able to engineer humans in that way. It's certainly still in the realm of sci-fi, but I think that there's gonna be a lot of progress in that direction and that we're gonna learn a lot over the next decade or two.
SPENCER: How does that work now? So is the idea that they'll take someone's DNA, they'll snip out one gene, and replace it with a different gene. And then they insert that into a virus, and then they put the virus in the person, and then that virus spreads and basically like copies that change throughout their whole body?
NOOR: Yeah, it's actually pretty much exactly like that. So basically, the core idea behind gene therapy is that you've identified a defective gene, you have the corrected copy of that gene, and then you figure out a way to introduce that into a set of cells that carry that mutation.
SPENCER: So that's customized for that person, basically.
NOOR: Let me give a quick background of SMA. SMA is a rare genetic disease caused by a mutation in the survival motor neuron 1 (SMN1) gene. So what this gene does is it encodes the survival motor neuron (SMN) protein, a protein found throughout the body, which is critical for the maintenance and function of specialized nerve cells that are called motor neurons. So motor neurons in the brain and spinal cord control muscle movement throughout the body. So if there's not enough functional SMN protein, the motor neurons die, which leads to debilitating and often fatal muscle weakness. And this muscle weakness is often so severe that these children don't survive even the first couple of years of life. So they typically die due to respiratory failure. So SMA is caused by a mutation in the SMN1 gene and it's generally classified into a couple of subtypes based on the age of onset and severity. So the infantile-onset SMA is the most severe and most common subtype. So Zolgensma, which is the gene therapy that was approved in 2019, is indicated for the treatment of children who are less than two years of age with SMA. And what the product is, concretely, is an adeno-associated virus vector-based gene therapy that targets the cause of SMA. So the vector delivers a fully functional copy of the human SMN gene into the target motor neuron cells. And just this one-time injection results in the expression of the SMN protein in the child's motor neurons, which improves muscle movement and function and survival for these children who previously had a pretty awful prognosis. I think, as of 2020, there were about 700 plus children who were successfully treated with this gene therapy, which is a pretty sweet outcome.
SPENCER: Oh, I see. So they can design one virus that just changes that one spot on your DNA, regardless of what the rest of your DNA looks like.
NOOR: Yep, exactly. It's kind of like grepping for a specific sequence. It's sort of trying to seek and find a specific sequence, and looking for that region to cut and paste. And basically, the whole idea behind off-target effects is kind of, let's say you have a Google Doc, and you're searching for cat, so basically, off-target effects occur when there's like a substring that matches the string you're looking for. (I can't think of a good example.)
SPENCER: You've got the word category, and you choose to change all the instances of cat, but then your category gets modified to dogatory or whatever.
NOOR: Yeah, exactly. So that's what's going on. Basically, they have to really specifically design these vectors so that they have long enough sequences where there's a really low chance they're gonna go cut and paste other substrings that they don't want to affect. And there's actually a huge amount of progress in this space. I don't want to put a wet blanket on that. There are people who are doing really, really impressive work around making CRISPR super targeted and able to deliver these reliably. So I don't want to do any fear-mongering. I think that it's definitely getting there. But that's just one of the problems.
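The search-and-replace analogy from this exchange can be made concrete with plain string matching. This is only an illustration of the analogy, not a model of how CRISPR targeting works biologically; the sentence, the search strings, and the function name are all made up.

```python
from typing import List


def find_matches(text: str, guide: str) -> List[int]:
    """Return every start index where `guide` occurs in `text`."""
    positions = []
    start = text.find(guide)
    while start != -1:
        positions.append(start)
        start = text.find(guide, start + 1)
    return positions


document = "the cat sat in the category aisle"

# A short search string also hits the unintended "cat" inside "category".
short_hits = find_matches(document, "cat")

# A longer, more specific search string (with surrounding spaces) hits only the intended site.
long_hits = find_matches(document, " cat ")

print(short_hits)                      # two sites: the word and the substring
print(long_hits)                       # one site: the intended word only
print(document.replace("cat", "dog"))  # the naive edit also mangles "category"
```

The longer the search string, the less likely it is to match somewhere unintended, which is the point made above about designing long enough sequences.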
SPENCER: I see why it doesn't work for the kind of stuff that you're doing, where there might be thousands or millions of genes implicated, because they have to target super-specific genes in order to be able to change them.
NOOR: Yeah. I think that our approach (despite, I think, what some people might think) is actually pretty humble. Right now, during IVF, the embryo prioritization process is purely a beauty contest. So they're looking at the morphology. So they're just looking at how the embryo looks, and they're saying, "Okay, this one has more or less potential to implant." And I'm not saying that morphology is bogus, but I mean, it just objectively is a beauty contest. And understanding the genetic propensity for a specific disease, I think, for a lot of parents is super important. And I think that it's low risk, in the sense that it's sort of a natural process. These are the embryos that were naturally produced that would otherwise have been created via spontaneous conception. And we're just providing more information during that otherwise random process of selecting which embryo to implant into the mother.
SPENCER: So, Noor before we wrap up, I wanted to ask you, what do you feel about the moral obligation parents have in terms of reducing risk for their children?
NOOR: I think that Jonathan Anomaly, who's a bioethicist at UPenn, actually has a really interesting framework to think about this. So he talks about the idea of deserved versus undeserved bad luck. So basically, deserved bad luck would be bad luck that happens to people who could have done something to prevent it. So essentially, when bad luck happens to a risk seeker. So someone who's doing base jumping, and then they end up breaking their leg or something as a result of that, as opposed to undeserved bad luck, which would be you get diagnosed with cancer and end up having to go through chemotherapy and everything that's associated with that. So I think that it's a really interesting situation. I think that couples, right now, are in a place where they sort of have this newfound capability to forecast what their future risks will be. And they have to sort of decide what they want to do with that information. And I think that for some couples, they're going to feel that they want to basically mitigate the chance of undeserved bad luck and to reduce the amount of susceptibility variants that their child inherits. But I think other couples are going to view it differently, with the idea of rolling the dice, that sort of classic thinking of "Well, it worked for me, it was good enough for me, so it'll be good enough for my child." So I think that parents are gonna see it really, really differently.
SPENCER: So if people want to purchase the service you're offering at Orchid, how would they do that?
NOOR: Yeah, so you just go to Orchidhealth.com and you can sign up for the waitlist.
SPENCER: Awesome, Noor. Thanks so much for coming on. It was a lot of fun.
NOOR: Yeah. Thanks for having me.