I’m reading a lot more continental philosophy lately than I ever have before (I know — I know. I know!) and I’ve been having an interesting thought about epidemiology and the practice of science that I want to share — mostly because I’m genuinely not sure what I think yet, and on top of that, I don’t really know enough to begin to determine what I think. The basic idea is this: epidemiology as it is taught and learned is basically a deconstructive project. This is distinct from how epidemiology functions in the social and political spheres. I will try to illustrate with an example that I am actually capable of handling — Derrida is like Beetlejuice: I’m afraid to invoke him directly too many times for fear he’ll jump-scare materialize behind me in a striped suit.
The best part of science education is the venerable institution of journal club. In my opinion, we don’t do journal clubs nearly enough, which is to say, we don’t do nearly enough close reading of scientific texts. My original home discipline is biology; journal clubs in biology are a lot of fun. Journal clubs in epidemiology are much rarer (not surprisingly — it’s a very boring managerial discipline lurching hard into Vox-style smug irrelevance post-2020), but when they’re good, they can be really good, for a really specific reason.
The way I usually like to think about things in my field is: “data are compatible with multiple states of reality.” Developing the intuition necessary to imagine what these states of reality might be and how they might have converged to produce the data collected and analyzed in a given article is the real work of journal club — epidemiologic critique as tacit acknowledgment that there is no interpretation of a given piece of data analysis that closes the interpretive act for good, or brings the process of interpretation to a definitive end. Iterative and open-ended critique of a given journal article is supposed to excavate, like dusty buried artifacts, not only the investigator decisions (implicit or explicit) and values structuring the analysis, but also the stuff lurking out there in the “real world” that may explain a given set of findings better than the authors’ interpretation or cut against the interpretation supplied. (From the Wikipedia page on deconstruction: “Derrida refers to the — in his view, mistaken — belief that there is a self-sufficient, non-deferred meaning as metaphysics of presence. Rather… a concept must be understood in the context of its opposite.”)
There’s a lot to deconstruction and I am simply not going to pretend I know it. I don’t know shit. Just want to be very clear about that, because a lot of what follows is over my head and maybe sounds even more addled than the rest of this post. But if we take deconstruction in the technical sense, we also have to consider what the binary opposition at stake in journal club is. To my mind, it’s something like “real/not real” or “true/not true,” pertaining to the findings or the overall declarative statement about what a particular paper “shows.” This opposition can never be definitively resolved or synthesized, and in fact the two sides of the binary are sort of mutually haunted by their opposites, as illustrated by the sheer number of epidemiologic findings that seem to be both true and not true simultaneously — for example, is coffee supposed to be good or bad for you? Is wine? What’s the consensus this week?
This is particularly fun in epidemiology because it’s often like shooting fish in a barrel — the interpretations authors give their own work are usually so basic, 101-level, and rote that one can have a lot of fun searching for others. This is not a personal criticism at all; it’s a categorical-disciplinary one. We are trained to do all this wonderful work of deconstructing epidemiologic texts, and then trained even more to disregard it and follow very strict formulas for how we conceptualize, discuss, and communicate our own findings. The ghost of frequentist statistics haunts this rigid metaphysics of presence: we so often have to talk about things backwards, in a hypothesis-testing way, or exercise extreme caution lest anyone think we’re inferring causal relationships from statistical associations.
To illustrate very briefly, I plucked one article from the current issue of AJPH to look at. (No particular reason, this one was just easy to find and seems like a perfectly standard and illustrative epidemiologic cohort analysis.) The article is called “Airborne Lead Exposure and Childhood Cognition: The Environmental Influences on Child Health Outcomes (ECHO) Cohort (2003-2022)” (Gatzke-Kopp et al., 2024).
The objective of this paper is to do a lot of complicated data analysis to determine “whether a previously reported association between airborne lead exposure and children's cognitive function replicates across a geographically diverse sample of the United States.” This is an interesting one because the findings are both true and not true and there’s no way to make a final, dispositive reading of this article to resolve that opposition.
On the one hand, borrowing from a lot of background knowledge about lead exposure, the finding is obviously true. We know lead is a neurotoxin; we further know that the neurotoxic properties of lead transcend space and time, so there’s not really any reason to believe that this relationship would not hold (or “replicate”) in a more (?) geographically diverse sample of the US population than has been analyzed in other papers. (We might observe geographic differences in airborne lead exposure itself — in fact, I’m almost certain that such differences exist — but the research question here pertains, I think, to some mechanistic aspect of the exposure-outcome relationship.)
At the level of the analysis itself, the picture is a lot less clear. For one thing, there are the layers of abstraction involved (not uncommon for epidemiologic analyses). Consider the text of the Methods part of the abstract:
Residential addresses of children (< 5 years) were spatially joined to the Risk-Screening Environmental Indicators model of relative airborne lead toxicity.
Right off the bat, we are abstracting from living under-5 children to static residential addresses, which are then joined to another abstraction, the EPA’s Risk-Screening Environmental Indicators (RSEI) model, a mathematical model that “incorporates information from the Toxics Release Inventory (TRI) on the amount of toxic chemicals released or transferred from facilities, together with factors such as the chemical’s fate and transport through the environment, each chemical’s relative toxicity, and potential human exposure.” How many kids are being exposed to airborne lead, and how much — the exposure half of the exposure-outcome relationship under investigation — is thus constructed from a teetering tower of mathematical assumptions, informed by some data that likely has plenty of imperfections of its own. (A rough sketch of what this whole pipeline might look like in code follows below.) The outcome side of the relationship is similarly abstracted:
Cognitive outcomes for children younger than 8 years were available for 1629 children with IQ data and 1476 with measures of executive function (EF; inhibitory control, cognitive flexibility). We used generalized linear models using generalized estimating equations to examine the associations of lead, scaled by interquartile range (IQR), accounting for individual- and area-level confounders.
The outcomes are IQ and measures of executive function, both abstract (though reified) statistically derived indices. The abstracted exposure and abstracted outcome measures, along with some likely similarly abstracted characteristics of the children under study and their residential neighborhoods, were fed into some statistical architecture: generalized linear models fit with generalized estimating equations, a technique for analyzing data with repeated or clustered measurements. Does this tell us whether the finding is “true”? Does this tell us whether airborne lead exposure is associated with adverse cognitive outcomes in children?
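Before trying to answer, it may help to make that statistical architecture a little more concrete. Here is a minimal sketch of what this kind of pipeline might look like in code. This is emphatically not the authors’ code: every file name, column name, covariate, and the clustering variable below is a hypothetical placeholder, and the RSEI surface is stood in for by a generic polygon layer.

```python
# A minimal sketch of the kind of pipeline the Methods text describes -- not
# the authors' code. All file names, column names, and covariates below are
# hypothetical placeholders.
import geopandas as gpd
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# 1. Abstract children to points: geocoded residential addresses.
kids = pd.read_csv("echo_children.csv")          # hypothetical: one row per child
kids = gpd.GeoDataFrame(
    kids,
    geometry=gpd.points_from_xy(kids["lon"], kids["lat"]),
    crs="EPSG:4326",
)

# 2. Spatially join those points to the modeled airborne-lead surface
#    (here, a stand-in polygon layer carrying an RSEI-style toxicity score).
rsei = gpd.read_file("rsei_lead_grid.gpkg").to_crs("EPSG:4326")
df = gpd.sjoin(kids, rsei[["lead_score", "geometry"]], how="left", predicate="within")

# 3. Scale the modeled exposure by its interquartile range, so the coefficient
#    reads as "change in IQ per IQR increase in airborne lead."
iqr = df["lead_score"].quantile(0.75) - df["lead_score"].quantile(0.25)
df["lead_iqr"] = df["lead_score"] / iqr

# 4. Generalized linear model fit with generalized estimating equations:
#    children are clustered (e.g., within cohort sites), and GEE adjusts the
#    standard errors for that within-cluster correlation.
model = smf.gee(
    "iq ~ lead_iqr + child_age + maternal_education + area_deprivation",
    groups="cohort_site",
    data=df,
    family=sm.families.Gaussian(),
    cov_struct=sm.cov_struct.Exchangeable(),
)
result = model.fit()
print(result.summary())   # coefficient on lead_iqr and its 95% CI
```

The point is less the code itself than how many constructed objects — geocoded points, a modeled toxicity surface, an IQR-scaled score, an assumed correlation structure — have to hold up for the final coefficient to mean what we want it to mean.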
Well… sort of. Here’s what the authors say:
An IQR increase in airborne lead was associated with a 0.74-point lower mean IQ score (b = -0.74; 95% confidence interval = -1.00, -0.48).
This finding is taken to be “real” because it is licensed by the procedures of statistical hypothesis testing: because the confidence interval does not encompass the “null” value of no effect (zero in this case), the finding is “statistically significant” and therefore real/true — unlike some of the findings about executive function outcomes also reported in the paper. But how true is true, in this sense? There are lots of ways to get confidence intervals that don’t cross the null value that have nothing to do with the relationship under investigation (having a large sample size, for example). And what does a 0.74-point decrease in IQ score really mean?* Do partial, fractional points mean anything in the context of a discrete scale that spans 140-plus points? Moreover, do they translate to meaningful biological, cognitive, or functional differences?
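To make that point concrete, here’s a toy simulation (nothing to do with the paper’s data; all numbers are invented) showing how a huge sample turns a negligible association into a “statistically significant” one:

```python
# A toy simulation, entirely made up: with a big enough sample, even a
# negligible association yields a confidence interval that excludes zero.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)

n = 500_000                  # a very large sample
true_slope = -0.1            # a tenth of an IQ point per IQR: trivially small
exposure = rng.standard_normal(n)                        # IQR-ish scaled exposure
iq = 100 + true_slope * exposure + rng.normal(0, 15, n)  # IQ-like outcome, sd 15

fit = sm.OLS(iq, sm.add_constant(exposure)).fit()
low, high = fit.conf_int()[1]      # 95% CI for the slope
print(f"slope = {fit.params[1]:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
# At this sample size the interval essentially always excludes zero, so the
# association is "statistically significant" -- even though a tenth of an IQ
# point is meaningless in any practical sense.
```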
Then we can even get into lots more epi critique stuff. As just one example, children’s lead exposure is indexed to their residential address. This is a notoriously crude (but unfortunately common and often necessary) way of assessing exposure. For one thing, operationalizing the exposure this way assumes that children under 5 spend most of their time at their residential address. But what if they don’t? We don’t know! We shouldn’t even expect the number and characteristics of children who do spend most of their time at home to remain constant over the study period; residential address is probably a more faithful indicator of toxic exposures in 2020-2022 than at the beginning of the study period in 2003.
This is going on way too long, so I’m going to wrap up here in a moment, but this is all just intended to hopefully demonstrate the circularity and endlessness of this process, and the peculiarity of these kinds of inductive inferences: that they can be, and often are, both obviously true and endlessly vulnerable to falsification and attack, simultaneously and irreducibly. The process of learning how to do epidemiologic critique is (or should be, I think) the process of getting deeper into these weeds and deeper into the weirdnesses of trying to make truth claims about the world based on studies like this. I think it is a really fruitful exercise that stands in contrast to the way that statistical indicators and products of data analysis function as code in public life and media… a topic for another day soon.
* I do not believe in IQ as a concept and think it’s bogus — just to be clear.