In addition to the usual disclaimers (i.e., this is an unpolished blog, written perhaps a bit more stream of consciousness and repetitive than some (all?) readers would like because this is a free time activity to which I am not devoting time to edit because I have football to watch), I will add one more. This turned out longer than usual. Brevity has never been my strong suit. Warning: unpolished and meandering.
At this point, I would guess that Harden’s book The Genetic Lottery, a rapid review of which was the focus of my last blog post, has been a success in terms of sales and stimulating thought and attention. I’m not a fan at all, and I’m still thinking about it. Yet, in terms of its actual contribution–in its logic and argumentation–I remain utterly perplexed by the plaudits the book has received. I have been thinking (and now writing) more about the issue I raised in my previous review about Harden’s claim that incorporating genetics has much to offer social science and policy, so much so that it is not just helpful but necessary to include genetics in social science models. I noted toward the end of my rapid review: “Given that Harden agrees that genetic engineering is unwise, what, exactly, does she think this [the incorporation of genetics into policy] offers us? She doesn’t tell us in this book.”
The (lack of an) answer to this question has continued to nag at me, as has the question of its value for (social) scientific understanding. Thus, I re-read (well, more like skimmed) the book and reviewed some other relevant works to try to smack myself in the face with their arguments for ‘why this is valuable’, so valuable, to remind the reader, that Harden avers that ignoring genetics in social science research and models is the moral equivalent of bank robbery [again, I think that she meant to use a burglary as an example, but still].
Here, I wanted to quickly outline what I understand to be their arguments for the value-added, even necessity, of incorporating genetics into research on human social behaviors/social science. In so doing, I aim to clarify what I see as the problems with her/their logic. First, I discuss social science and then policy.
Why incorporating genetics into social science models is valuable, even necessary: Harden and colleagues’ justification (reader, please note that this is my summary of what I understand to be their arguments):
(1) All complex social behaviors are heritable. As Harden (2021) notes, twin studies have demonstrated for decades that for all complex social behaviors of interest to social scientists, heritability is substantial (an average of 50% heritability).
(2) Heritability estimates indicate causal effects of genetic differences on individual differences in traits. [Again, this is a contested viewpoint, but this is her view.]
(3) Ergo, individual differences in all complex social traits are partially caused by genetic differences, and, as Harden (2021) notes, we have known this for years. [At least twin studies showing non-trivial heritability of complex social traits have existed for years.]
(4) When examining the influence of social forces on complex social outcomes and behaviors (e.g., educational attainment, crime), leaving out genetic differences creates a problem of omitted variable bias. That is, if we do not incorporate genetic differences, which may partly cause putative environmental causes, these uncontrolled genetic influences can and will masquerade as causal environmental effects. (Only partly, because it should be remembered that for most outcomes the twin-study heritability is <70%, and for many reasons this is a high-end estimate.)
(5) Consequently, the estimate of the environmental effect will be less precise (and potentially inflated in some cases) if genetic influences are not parsed out.
(6) Therefore, to get more precise point estimates of environmental influences, we must incorporate genetic influences.
(7) Harden et al. recognize that methods that can provide (roughly) genetically unconfounded estimates exist, such as instrumental variable methods and fixed effects models, in which individual differences are held constant to assess change over time in response to changing circumstances, but these require expensive panel data.
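For readers who want to see the omitted-variable logic of points (4) through (7) in action, here is a minimal simulation sketch. It is not anyone's actual analysis: the effect sizes, sample size, and variable names are all hypothetical, and a single normally distributed number stands in for "genetic differences". It compares a naive estimate of an environmental effect, an estimate that controls for a perfectly measured genetic value, and a within-person (fixed effects) estimate that never uses genetic data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_people, n_waves = 5000, 2

# All values below are hypothetical, chosen only for illustration.
true_env_effect = 0.30   # causal effect of the environment on the outcome
gene_on_env = 0.50       # genetic differences partly shape the environment
gene_on_outcome = 0.50   # genetic differences also affect the outcome directly

# One time-invariant "genetic" value per person; environment and outcome vary by wave.
g = rng.normal(size=(n_people, 1))
env = gene_on_env * g + rng.normal(size=(n_people, n_waves))
outcome = (true_env_effect * env + gene_on_outcome * g
           + rng.normal(size=(n_people, n_waves)))
g_long = np.broadcast_to(g, env.shape)


def env_slope(y, x, *controls):
    """OLS coefficient on x when regressing y on an intercept, x, and controls."""
    columns = [np.ones(y.size), x.ravel()] + [c.ravel() for c in controls]
    coef, *_ = np.linalg.lstsq(np.column_stack(columns), y.ravel(), rcond=None)
    return coef[1]


# (a) Ignore genetics entirely: the genetic path is absorbed into the slope.
naive = env_slope(outcome, env)
# (b) Control for the (here perfectly measured) genetic value.
adjusted = env_slope(outcome, env, g_long)
# (c) Fixed effects: subtract each person's own mean, which removes anything
#     that is constant within a person, genetic or otherwise.
fixed_effects = env_slope(outcome - outcome.mean(axis=1, keepdims=True),
                          env - env.mean(axis=1, keepdims=True))

print(f"true effect:   {true_env_effect:.2f}")
print(f"naive:         {naive:.2f}")
print(f"adjusted:      {adjusted:.2f}")
print(f"fixed effects: {fixed_effects:.2f}")
```

Under these assumptions the naive slope overstates the environmental effect, while both the genetically adjusted and the fixed effects estimates land near the true value. The sketch assumes the genetic value is measured without error and that the only confounder is constant within a person; neither holds automatically in real data.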
For the non-quantitatively oriented science reader, I will note a few things. First, to point #7, federally funded rich panel data already exist, and using such data, numerous fixed effects models have shown that changes in environments produce changes in the things that we care about. Second, and more importantly, precise point estimates are generally not of major interest to social scientists. Nearly all of our measures, including our outcome measures, are noisy (contain error), even biased. In general, what we want to know is whether more of something (education, parental support) is associated with more (or less) of something else (income, education) that we care about, ideally with some theoretical orientation. Frequently the scale used to measure social influences is somewhat arbitrary anyway, such that the precise point estimate (e.g., weeks of schooling) associated with a 1-point increase in the ‘social support scale’ is inherently vague.
Third, for reasons that are out of scope, genetic measures (such as polygenic scores, PGSs) are always noisy and biased as well, due to the measurement of only some of the genetic variants we possess, imputation (informed guessing) for some of the unmeasured variants, and the presence of environmental confounding, known as population stratification, which can be understood more simply as the effects of shared culture or social structures in the form of distant relatedness. That is, people who live in geographically proximate areas and thus share more similar physical and social environments also tend to mate with each other, creating greater genetic similarity within groups that also share more similar social and physical environments. Controls for population structure are included, but except in the case of sibling difference studies, these are inadequate.
PGSs created from the estimates from sibling studies are the rare exception, such that most PGSs are tainted with population confounding (i.e., pick up sociocultural and physical environmental differences). Therefore, PGSs capture both ‘genetic’ differences that influence the trait through biological mechanisms (usually distantly) and sociocultural differences that are associated with allele (genetic variant) frequency differences across subgroups. For a quick example, due to the greater UV rates at lower latitudes, if we were to do a GWAS of the Eastern US population for skin cancer, we would likely pick up, in addition to any genetic predispositions to skin cancer (e.g., genes involved in cellular repair), genetic variant (allele) frequency differences between those in the US South and those in the Northeast, due to the fact that people in the South have tended to mate with each other and are more closely related, and so too for the North. Some unknown number of these genetic differences, however, will have nothing to do with the biology of skin cancer and everything to do with being randomly differentially distributed between the South and North, the former of which is exposed to higher UV rays and more sun. (These random differences that develop over time are known as genetic drift through assortative mating based on sociocultural and physical environmental proximity.) In short, some of the genetic variants identified will be a function of random genetic differences and associated with the outcome for purely environmental reasons that have nothing (biologically) to do with the outcome.
For all of these reasons, the ‘greater precision in estimates of the environment’, at the current moment, is debatable. Perhaps we’ll see a little greater precision for some environments, but that is a very meager and not very useful benefit. Thus, leaving out genetics is, in my view, akin to jaywalking (at worst), not bank robbery.
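The skin cancer example can be made concrete with a toy simulation. This is a sketch under loudly stated assumptions: two regions labeled "south" and "north", one genetic variant with no biological effect on the outcome at all, and made-up allele frequencies and risk levels (real GWAS involve many variants and real risk differences are nothing like these round numbers).

```python
import numpy as np

rng = np.random.default_rng(1)
n_per_region = 50_000

# Hypothetical values. The variant has NO biological effect on the outcome;
# its frequency differs by region by chance, and risk differs by region
# only because of sun exposure.
allele_freq = {"south": 0.70, "north": 0.30}
region_risk = {"south": 0.12, "north": 0.04}

south_flags, genotypes, outcomes = [], [], []
for name in ("south", "north"):
    south_flags.append(np.full(n_per_region, name == "south"))
    # Number of copies (0, 1, or 2) of the variant each person carries.
    genotypes.append(rng.binomial(2, allele_freq[name], n_per_region))
    # Outcome depends on region only, never on genotype.
    outcomes.append(rng.random(n_per_region) < region_risk[name])

is_south = np.concatenate(south_flags)
genotype = np.concatenate(genotypes).astype(float)
outcome = np.concatenate(outcomes).astype(float)


def slope(x, y):
    """Simple regression slope of y on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


pooled = slope(genotype, outcome)                          # both regions mixed
within_south = slope(genotype[is_south], outcome[is_south])
within_north = slope(genotype[~is_south], outcome[~is_south])

print(f"pooled association:       {pooled:+.4f}")
print(f"within-south association: {within_south:+.4f}")
print(f"within-north association: {within_north:+.4f}")
```

In the pooled sample the variant looks associated with the outcome even though it does nothing biologically, whereas within each region the association is close to zero. Comparing people who share the same environment is the same basic idea behind the sibling designs mentioned above.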
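To illustrate why a noisy genetic measure buys less than a perfect one would, here is a small sketch, again with hypothetical numbers throughout. A "score" is built as the true genetic value plus random error, and `reliability` (an illustrative parameter, not an estimate of any real PGS) is the share of the score's variance that reflects the true value. The question is how much of the bias in the environmental estimate is removed when we control for the score.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
true_env_effect = 0.30   # hypothetical causal effect of the environment

# Hypothetical data-generating process with a genetic confounder g.
g = rng.normal(size=n)
env = 0.5 * g + rng.normal(size=n)
outcome = true_env_effect * env + 0.5 * g + rng.normal(size=n)


def env_slope(controls):
    """OLS coefficient on env, given a list of control variables."""
    X = np.column_stack([np.ones(n), env] + controls)
    coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return coef[1]


results = {"no control": env_slope([])}
for reliability in (1.0, 0.5, 0.1):
    # score = g + noise, scaled so that var(g) / var(score) equals reliability
    noise_sd = np.sqrt(1.0 / reliability - 1.0)
    score = g + noise_sd * rng.normal(size=n)
    results[reliability] = env_slope([score])

for label, estimate in results.items():
    print(f"control = {label!s:>10}: estimated env effect = {estimate:.3f}")
```

With a perfectly reliable score the estimate recovers the true effect; as reliability falls, the estimate drifts back toward the uncontrolled one. The sketch only models random measurement error. It does not model the environmental confounding of the score itself, which is a separate problem.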
Why incorporating genetics into social science models is important for policy—Harden et al.’s view:
I must first admit I found this even less clearly articulated in Harden’s book, which is why I wrote that ‘she doesn’t tell us why in this book’. I don’t think she does, clearly. However, if one looks at some of her other co-authored work, especially with Koellinger (whom she references as a high-quality cook in her book), one can see more clearly their ‘logic’.
For example, consider a paper called “Genetic Fortune: Winning or Losing Education, Income, and Health” (similar to the ‘lottery’, with a somewhat awkward subheading: what does ‘losing education’ mean?). This study, designed and overseen by Koellinger, co-authored by Harden, and published online in November 2020 (though I have the updated December 2020 version), spells out why they believe this work is important to policy. I disagree with this reasoning, as you can imagine. Notably, all of the below are quotes from Kweon, Burik, Linnér, de Vlaming, Okbay, Martschenko, Harden, DiPrete, and Koellinger (2020), unless otherwise noted.
(1) “The origins, extent and consequences of income inequalities” differ across contexts,
(2) “a universal fact is that parents influence the starting-points of their children by providing them with family-specific environments and by passing down a part of their genes [1 of each chromosome from each parent, plus an X from mom and an X/Y from dad, all going well]. This phenomenon creates individual-specific social and genetic endowments that are due to luck in the sense that they are exogenously given rather than the result of one’s own actions [here we can see the luck language].”
(3) “Thus inequalities of opportunity can partly arise from the outcomes of two family specific “lotteries” that take place during conception—a “social lottery” that determines who our parents are [?social status], and a “genetic lottery” that determines which part of their genomes our parents pass on to us.”
(4) “Inequalities in opportunity restrict the extent of intergenerational social mobility and limit how much credit people can claim for achievements such as their education or income.” [Okay, inequalities in opportunity affect a lot of things; curious that these are the two accentuated here.]
(5) **Key** “The relative importance of social and genetic luck has policy relevance because the extent to which people are willing to tolerate or endorse inequality partially depends on whether they perceive that disparity originates from differences in effort and choice (e.g., working hard) or from differences in circumstances that are outside of one’s control (e.g., luck in the social or genetic lotteries). The empirical results suggest that inequality that can ultimately be traced back to luck may be perceived as unfair and people may favor redistributive policies more strongly *if* inequality is the result of luck rather than agency” (bold and asterisk emphasis added).
Phrased alternatively, their argument here is that the more people perceive that disparate outcomes/inequalities are due to luck rather than to personal agency/hard work, the more inclined they may be to favor redistributive policies. (The word perception jumps out at me here; I will return to that later.)
(6) “*If* the outcomes of the genetic or social lottery influence economic outcomes, it [confusing pronoun usage] can challenge common intuitions about the relative importance of luck and agency” (asterisk emphasis added).
(7) “It is important for science and policy to understand the extent to which genetic and social fortune[s] contribute to inequality, the mechanisms that are at work, and whether and how the consequences of exogenously given endowments can be altered.”
This argument pulls in some new facets and combines them with the ongoing logic. Understanding “the extent to which genes and social fortune contribute to inequality” is important, they argue, because presumably that shapes people’s perceptions about the importance of such factors and therefore their willingness to support redistributive policies. Here they also add in a focus on mechanisms and changeability, presumably due to their obvious policy relevance, but genetics isn’t needed to examine changeability.
Therefore, the Kweon et al. (2020) paper “makes progress in this regard by using large-scale molecular genetic and family data to test the influence of genetic and family-specific endowments on income inequality and its consequences for health.” Health, you say; where did that come from? I agree. (For those not in the know, NIH funding for health outcomes in the USA is high, less so for non-health social outcomes.)
A few points, of course.
First, I must make note of the importance of the perceptions bit. Of course, facts and empirical patterns don’t determine action. Whether such facts are known and how they are interpreted/given meaning shapes action. Again, Harden and colleagues note that to the extent that people perceive that inequalities are the result of luck (genetic or social), the more inclined they tend to be to support redistributive policies. This makes sense.
However, I have a few issues. First, the extent to which people’s perceptions of the sources of inequality are reflected by empirical evidence/facts is not entirely clear. Indeed, I am sure that a twin heritability study would likely reveal that a non-trivial portion of the variance in perceptions of sources of inequality would be found to be heritable. To be clear, from that finding, I do not personally conclude that this means that this same portion of variance across individuals is caused by genetic differences, given the inflationary biases of twin study heritability estimates. My point is, by their own logic, much of this variance in perception of the source of inequality is due to, even caused by, genetic differences. There is, in a sense, an infinite regress or circle in behavior genetic models when everything is ~50% heritable (perceptions, beliefs, attitudes, traits, and behaviors); every non-randomly assigned, allegedly ‘environmental exposure’ is approximately half caused by genetics, on this view, which makes everything a bit awkward.
Second, and more importantly, we already know both from common sense and a wealth of social research that both one’s family-specific environment (income, wealth, status, race/ethnicity, religion, beliefs) and genetic endowment shape one’s life trajectory, including social outcomes like educational attainment and income. We know this from a wealth of social science and behavioral genetic research and basic observation of reality. Harden tells us this repeatedly in her book and Kweon et al. (2020) mention it in their paper. WE KNOW THIS. Thus, their repeated ‘*if* this matters, then so and so’ framing is a misrepresentation of the literature. We already know these matter. We do not need yet another study to show that these things matter, and it is somewhat disingenuous to highlight that these things matter, you all, so much so that if you exclude them from your models you are the moral equivalent of a bank robber, while turning around and publishing papers questioning whether these things matter. At least, that’s my opinion.
Third, social and genetic endowments are inextricable. We also know this. We are not going to be able to parse them out because human development and behaviors emerge from a biological system that always operates in response to social environments. Humans are organisms, which are processes in constant flux in response to internal and external input. Environments up- and down-regulate genes; initiate cascades of action and reaction; shape our motivations, desires, and behaviors, among other things. Likewise, our genetic differences can shape both the amount and the nature of protein products produced and other gene products in a manner that can and does affect how we look, perceive the world, and react to it.
All of this is why the results of their study: “We show that the well-known gradient between socioeconomic status and health is partly rooted in exogenously given genetic and social endowments. Furthermore, we demonstrate that a substantial part of genetic luck for income and its link with health appears to operate via educational attainment and its accompaniments, i.e., environmental factors that are in principle malleable through policy interventions” can be met with a “duh”. Of course, this is the case. How this advances policy relevant knowledge one iota I cannot say (because I do not think it does).
Heritability says nothing about malleability, as Harden and Kweon et al. remind us on multiple occasions, and a wealth of social scientific evidence on traits that have been found to be substantially heritable has already shown that high heritability says nothing about responsivity to social influences. Indeed, that is part of their argument: that genetic causes operate through malleable environmental pathways. But here, they throw out “does it matter” as an unknown and then answer it as if they are offering something new. That’s double dipping, which is kind of like jaywalking but dirtier.
Broadening our lens, consider the well-known fact (again repeated by Harden in her book several times) that high heritability of differences in social outcomes, which she interprets to mean and thus include ‘genetic causes’, says nothing about malleability to policy influences. This brings into stark relief the question of why, then, we care about the extent to which these differences may be partially rooted in genetic differences between people (partially because, again, even the upper bounds of heritability are almost always less than 70%). Given that this is not relevant to changeability, I am left to believe that this is meant to go back to “perceptions”, and that changing perceptions is the intended policy-relevant import of this work, in Harden’s view, and indeed, perhaps the purpose of her book.
Phrased more clearly, this would make the aim of Harden’s book, in my view, to try to change people’s perceptions about the causes of inequalities from personal responsibility and hard work to ‘luck’, with an emphasis on genetics. Doing so, in her mind, will perhaps promote the redistributive policies she clearly publicly supports.
This angle is also how she distances herself from the ‘bad’ (according to Harden: eugenic, racist) arguments of infamous hereditarian scholars before her: the Bell Curve types who, it might be noted, focus less on genetic differences, in general, and more on IQ, which they view as largely genetic and, crucially, mostly fixed, if not at birth, then certainly after childhood. Harden, for her part, recognizes that genetic differences do not imply fixity (although she has used the language of ‘genetic predisposition’, so this is in some ways a difference in degree rather than kind, but a notable one nonetheless). However, that recognition does not, contrary to what Harden argues, somehow make a consideration of genetics useful for social science or policy. From the fact that genetic differences do matter for development and individual social behaviors, it does not follow that incorporating (noisy, environmentally confounded) measures of genetic differences will advance social science models or policy since, as we have noted ad nauseam, heritable or genetic does not mean unchangeable. Whether I’m naturally a bad singer or failed to get the proper socialization about singing from my parents is irrelevant to the question of whether or not singing lessons (or parenting lessons for singing support) can make me a better singer. If they can work for me, we do not need to know the extent to which genetic and social endowments shape the extent to which I am a bad singer. That is, if we can teach or support parenting skills to make me a better singer, who cares about the why, especially given that we cannot really answer this question anyway, given gene-environment inextricability and context-specificity? (Though replace singer with student, since we don’t care about my singing... well, you might if you had to listen to it. I’m serious.)
In addition to being irrelevant to policy, there remains a potential dark side to even benign, explicitly anti-eugenic work, and not only if these results are used by those with whom we disagree (e.g., the billionaire insurance person with whom Harden and colleagues had dinner at the fancy French restaurant). Indeed, even work with ‘good intentions’ from Harden and colleagues to understand the sources of inequalities can reveal the potentially harmful effects of genetically-informed policies.
Specifically, in co-authored work, Harden focuses on using PGSs as a molecular tracer in school achievement. Although more detail is out of scope, for the reasons mentioned above, including but not limited to their environmental confounding, PGSs do not measure genetic potential for educational attainment; they capture differences among those who attain education under current systems. And again, potential for educational attainment does not reside in our genomes, but emerges from a complex biosocial system in which genes are important resources but not the drivers of development. Yet, that is often missed or ignored despite the professed environmental awareness of those conducting this work.
Additionally, Harden et al. recognize that PGSs are not useful for individual prediction but that they are useful for aggregate prediction and can be used, for example, to assess school performances. They suggest that we might, for example, control for the average PGS of schools when comparing their performances, and/or devote extra resources to schools with more students with lower ‘education-related genetic variants’. However, this quite clearly, in my view, begins to solidify the view of these PGSs/measures of ‘education related genetic differences’ as “education potential”, which is problematic. And while the outcome might be positive in some cases, even if misguided, e.g., by directing resources at the lower SES schools, given the environmental confounding, there are other potential negatives, including labeling and lower expectations.
Equally important, why in the world would we use noisy, problematic measures of ‘genetic potentials’ to direct resources and evaluate schools, rather than more easily measurable and more accurately predictive measures of past achievement, SES, and even standardized test scores? As others have noted, a much better predictor of future school performance is past and current performance (see Morris et al. 2020). We can easily see which students struggle and ascertain which students are getting support and extra help in the home and which students aren’t and respond accordingly with more sensitivity and specificity. No need for genetics whatsoever.
Genetic measures are not only less accurate predictors but also potentially a source of differential treatment. Sure, Harden wants this information to be used to good ends, but why in the world do we think that it will be? Best intentions gone awry is perhaps the most obvious lesson from all of history, and there is no reason to think that many actors agree with us on the proper courses of action. Other potential implications could be lowering expectations and standards for schools/individuals with lower ‘education-related genetics’ (estimates). One could easily see how we could treat such measures as potential, think this is the best we can get from individuals, and set the bar low for them, despite the fact that, to repeat ad nauseam, even if educational attainment is strongly genetically influenced, it is strongly socially influenced and both genetic and social influences are potentially malleable.
In sum, the policy value of this work for education systems is, in my view, minimal to nonexistent and potentially harmful to students and society. I might also note that almost all of the samples used to create these large-scale education-related (and income-associated) “genetic propensities” are older samples of people who grew up in the pre-internet era. One of the younger samples is the Add Health sample, which includes my age category (I’m 42); when we were growing up, Google, YouTube, and other concomitants of the growth of non-dial-up internet access were not prevalent (at least not until I was in graduate school). Today’s students grow up in a connected world of easily accessible information online. These technological innovations have transformed the learning environment in non-trivial ways, further altered by the pandemic. Does anyone want to argue that the individual characteristics shaping educational achievement and income are the same today as they were for children born in the 1970s and early 80s, much less the 1940s-1960s? I don’t (even as I would agree many are the same). Distractibility when working on a computer is, in my view, exponentially greater than in pre-internet times, when we worked with pen and paper and with books (hard copies).
In sum, for real this time, even if genetic differences have a substantial influence on educational attainment, income, and via those pathways, health outcomes, given that these social influences are malleable regardless of their origin, there is no obvious policy-relevant value added to incorporating genetic differences given our measurement and knowledge limitations. If genetic meant unchangeable, then a case could be made; but it doesn’t. There are potential harms. For all these reasons, I find these arguments to be flimsy at best, and some of the recent framing of well-established facts as unknowns (*if* this is shaped by social and genetic endowments) strikes me as almost disingenuous.
One can believe that genetic differences exist and shape differences in complex human social outcomes without being a eugenicist or racist or sexist, etc. But it does not follow that study of the effect of genetic differences on these social outcomes from an anti-eugenicist or anti-racist perspective is thereby useful and important (or that it can’t then be used to support racist or eugenic ideologies and policies, a fact that, to be fair, Harden and others recognize very clearly on multiple occasions). Clearly recognizing a danger and acting to prevent it are, however, two different things, which is how I once broke both my hands.
If you made it this far, bless you, ha. Before closing, I want to reiterate what this long-winded verbosity was and wasn’t about. I am *not* arguing that perceptions about the sources of inequality do not matter for how we address (or whether we tolerate) inequality. I agree that perceptions about the sources of inequalities shape individuals’ views about whether and how to address inequalities. I am saying that we have a wealth of research over decades that shows that genetic and social endowments at birth profoundly shape our economic and social outcomes. We do not need more data showing this if the aim is to change perceptions; we do not need genetic data, for example, to show more clearly that environments matter for our life trajectories. Doing more research to compile yet more facts that this matters is not, in my view, necessary or efficient to change perceptions, in other words, if that is the goal.
To summarize, too, my thoughts on policy relevance: my argument is not that genetics don’t matter or that genetic data can’t tell us anything. To use again Harden’s example about controlling for schools’ average student PGSs for educational attainment in order to assess school performance net of individual differences, this is both unnecessary and limiting. Net of environmental confounding and shared family environments, the educational attainment PGS explains not quite 3% of the variance in education. And without controlling for such confounding, these PGSs explain 12-14% of the variance in educational attainment. Why would we control for a noisy measure that explains less than 15% of the variance when we could use already available standardized test scores from previous years or past school performance, which would explain more than 50% of the variance in student performance? The only reason that I can think of for doing that is if we assume that the genetics measure is actually a measure of ‘genetic potential’ or ‘ability’, an assumption which Harden et al. disavow. This doesn’t mean genetic differences don’t matter. It’s like bringing pliers for a wall nailing job; sure, you can do it, but why would you when you could bring and use a hammer?
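The pliers-versus-hammer comparison can be sketched numerically. Everything here is hypothetical: the simulated variables `pgs`, `prior`, and `later` and the coefficients linking them were chosen only so that the two predictors land roughly in the ranges quoted above (about 13% and a bit over 50% of variance explained). The small incremental figure that comes out is therefore a consequence of these assumptions, not an empirical finding.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# Hypothetical standardized variables.
pgs = rng.normal(size=n)                                          # noisy genetic score
prior = 0.40 * pgs + np.sqrt(1 - 0.40**2) * rng.normal(size=n)    # past test score
signal = 0.70 * prior + 0.08 * pgs                                # assumed structure
later = signal + np.sqrt(1 - signal.var()) * rng.normal(size=n)   # later performance


def r_squared(y, *predictors):
    """Share of variance in y explained by an OLS fit on the predictors."""
    X = np.column_stack([np.ones(y.size)] + list(predictors))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    return 1.0 - residuals.var() / y.var()


r2_pgs = r_squared(later, pgs)
r2_prior = r_squared(later, prior)
r2_both = r_squared(later, prior, pgs)
incremental = r2_both - r2_prior   # what the PGS adds once prior scores are known

print(f"R^2, PGS alone:              {r2_pgs:.3f}")
print(f"R^2, prior score alone:      {r2_prior:.3f}")
print(f"R^2, both together:          {r2_both:.3f}")
print(f"added by PGS beyond prior:   {incremental:.3f}")
```

In this made-up setup, most of what the genetic score predicts is already contained in the prior score, so adding it changes the explained variance very little. How much it would add in real school data is an empirical question this sketch cannot answer.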
Perhaps I’ll read the book one more time in my “for fun” time to make sure I haven’t missed anything. Feel free to add your thoughts; if I’ve missed something important, I’d particularly like to hear about it.
Citing from Harden’s book required many more words, as she sprinkled arguments throughout larger discussions with examples and personal stories, which is good for a trade book but makes for more difficult quoting. However, she makes similar claims. For example, Harden (2021, pp. 181-182) writes: “First, environmental experiences—whether it be having a sexual relationship at a certain point in one’s adolescence or receiving a certain type of parenting or living in a certain type of neighborhood—can be correlated with life outcomes but not be causes of them. Second, policies that are built on a flawed understanding of which environments are truly causal are wasteful and potentially harmful [why–since malleability is not related to heritability?]…. Third, genetic data—whether it be comparisons of identical twins or comparisons of people with similar polygenic indices—help researchers solve the first problem and, in so doing, avoid the second problem. Genetic data gets one source of human differences out of the way, so that the environment is easier to see.”
Curiously, in what is likely a common instantiation of our arbitrary citing practices and unclear protocols and expectations, Kweon et al. (2020) cite Rietveld et al. (2013) twice in one paragraph at the bottom of p. 5, after a statement about what twin studies show. Yet, Rietveld et al. (2013) is a GWAS, known as EA1 (educational attainment GWAS 1). I’m not sure if they cite Rietveld et al. because Rietveld et al. also noted what twin studies show, but that is confusing, in my view. When someone (perhaps most commonly the authors in their prior work) makes a similar statement about existing research, this does not warrant citing the prior research for making that not-so-unique claim. If the aim is to point to a similar paper (in this case, I have no idea why Rietveld et al. was chosen), I think it more helpful to say “see summary in xx”. Just me, perhaps.