Verso books, man. Why are they like this?
I picked up a copy of Justin Joque's 2022 book Revolutionary Mathematics: Artificial Intelligence, Statistics, and the Logic of Capitalism -- sounds promising, right? Wrong.
I wanted to like it. Obviously! This book is way up my alley, judging from the title alone -- in fact, at first glance I was worried that it had sort of scooped the idea for my own (glacially progressing) book project. But the book is simply, where the thread of argument is briefly intelligible, like a dark street illuminated by a flash of lightning, wrong. At least, that's my take three turgid chapters (of eight) in -- and I still have absolutely no fucking clue what this book is trying to argue about statistics and its relationship to capitalism. A few of my stray thoughts are below.
Joque describes a "revolution" in statistical inference which seems to refer to... a move to Bayesian inference? This is hardly a revolution. For one thing, the move to Bayesian inference techniques over the more familiar midcentury frequentist techniques is far from total. I guess many so-called AI models (which are really "machine learning," which is to say, statistical learning models) use some kind of Bayesian methodology, but if you look around, frequentist statistics are very much the norm (hah hah) in many other fields. I work as a statistician facilitating (really subpar) clinical research; I reside in, have my mail forwarded to, frequentist-land. I spend my days arguing with pompous surgeons about why a p-value doesn't mean what they think it means (I do not want to be doing this, so please, if you've got any job leads, or are looking for writers or book reviewers for hire, please reach out).
And for another, in the longue durée of the history of statistics, the turn to Bayesian statistics is actually a reversion. Bayesian statistics correspond much more closely to the "classical" probability of the mighty Laplace and, well, just about everybody working on probability and statistics prior to R.A. Fisher in the 1920s and 30s. Fisher's frequentist "inference revolution" would actually be a blip (if we take Joque's word for it that we have transitioned wholesale to a world of Bayesian inference, which we haven't...) in a much longer history of concern with so-called "inverse probability." This was the central problematic of statistics and probability calculus before Fisher dubbed inverse probability "Bayesian," after the hapless and long-dead Rev. Thomas Bayes. This is a bit of a misnomer, because it's really Laplace who cracked the inverse probability problem for the first time, after but independently of Bayes.
What the hell am I even talking about here? "Bayesian" and "frequentist" statistics are two heterogeneous schools of technique as well as philosophy, and two different ways of treating the same problems. Using a coin-flipping example, I'll follow this great blog post illustrating the differences. In the simplest terms, the frequentist treats the probability of getting heads as an empirical fact about the data: the long-run frequency of heads. Flip the coin twice, ten times, two hundred times, and write down the number of heads; as the number of flips goes to infinity, the proportion of heads goes to 50%. The maximum likelihood estimate of the probability of getting heads is (# heads / # flips) -- a single value, and a property of the data. Bayesians, rather than treating the probability of getting heads as a single value, treat it as a probability distribution -- the parameter a Bayesian is trying to estimate is the probability of getting heads given the data of however many coin flips. Put another way, frequentists tend to look at the problem as "what is the probability of the data given an assumed hypothesis?" This is the idea behind the uneasy hybrid techniques of "null hypothesis significance testing" (which are nevertheless widely used today): the correctly maligned p-value is supposed to summarize the probability of observing data at least as extreme as the data one has, given a "null hypothesis" of no effect. Bayesians look at the problem in terms of: what is the probability of the hypothesis given the data? This is the problem of "inverse probability," which we'll return to later (later as in, perhaps in another post), and it's a much more intuitive one -- inferring causes from effects corresponds much more neatly to the types of things we want to know, whereas the frequentist approach, as it is mostly used, simply computes the likelihood of the observed effects under an assumption of no operational cause.
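For the concretely minded, here's a minimal sketch of the two approaches applied to the same coin-flip data -- the data (7 heads in 10 flips) and the flat Beta(1, 1) prior are my own toy choices, not anything from the blog post or from Joque:

```python
import numpy as np
from scipy import stats

# Toy data: 1 = heads, 0 = tails (7 heads in 10 flips, chosen for illustration).
flips = np.array([1, 1, 0, 1, 1, 0, 1, 0, 1, 1])
n, heads = len(flips), flips.sum()

# Frequentist: the maximum likelihood estimate is a single number,
# a property of the data -- (# heads / # flips).
mle = heads / n
print(f"MLE of P(heads): {mle:.2f}")

# Bayesian: the unknown probability of heads gets a whole distribution.
# With a uniform Beta(1, 1) prior, conjugacy gives a Beta posterior.
posterior = stats.beta(1 + heads, 1 + (n - heads))
print(f"Posterior mean: {posterior.mean():.2f}")
print(f"95% credible interval: {posterior.interval(0.95)}")
```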
I am far from an expert in Bayesian methods and don't know too much about them. I'd like to learn more and I'm working on it, but I'm gonna stop getting into the weeds here -- I think this is enough for my purposes. Joque's book makes the (in my view) error of equating frequentist statistics with "objectivity" and Bayesian statistics with "subjectivity" (although in the book both are, confusingly, related to a process of "objectification" as described by none other than Karl Marx). There is a bit of truth to this, because frequentist statistics deal mostly in long-run objective frequencies of events, and Bayesian statistics deal mostly in attempting to quantify subjective uncertainty about a given hypothesis -- in slightly more mathematical terms, attempting to characterize the probability distribution of an unknown parameter. But the very concept of probability has internalized these two facets from the jump, as extensively chronicled and excavated from boring old documents by Ian Hacking in his histories of probability and statistics. In the old sense, probability meant approvability or ratification by some kind of authority, usually clerical; Hacking charts how a concept of internal evidence emerging from the "low" epistemic sciences of alchemy and medicine fused the objective and subjective modes of probability into one concept. The color or cloudiness of a patient's urine was read as a sign, written by the Author of Nature, pointing to some obscure cause of the patient's ailment. Reading these signs backwards to likely causes was the work of early probabilistic thinking, and the stability or conjunction of certain signs with certain causes over time -- stable long-run frequencies of events -- was the revelation of internal evidence.
I'm a little off path here. As I said before, the frequentist school came to be with R.A. Fisher and the Neyman-Pearson-Wald (NPW) school, roughly 1920-1950. In a 1986 paper, Bradley Efron asks "why isn't everyone a Bayesian?" His answer (Efron, a former president of the American Statistical Association, is -- I think -- an avowed frequentist) is that while Bayesian techniques are "philosophically superior" in their treatment of uncertainty and inductive reasoning, frequentist techniques are way, way easier to apply, and are even automatic. With frequentist statistics, you can get a pretty good answer to your question (provided it is formulated the right way) without having to really think too much or know too much about the situation at hand. Kurt Danziger's work charting the uptake of mathematical statistics in psychology and psychiatry through the 20th century confirms this; Danziger's argument is that mathematical statistics (by which he means the frequentist methods of Fisher and NPW) automate an "objective" process of inductive reasoning, eliminating subjectivity at two levels -- the subjectivity of the research subject, whom mathematical statistics aggregate into anonymous groups characterized by statistical parameters, and the subjectivity of the researcher, whose "subjective" beliefs ostensibly disappear into the mathematics of applied statistics. (Of course, the subjective beliefs don't disappear; mathematical methods of any kind encode serious epistemological and ontological assumptions, though these are totally ignored in the actual practice of applied statistical analysis because, again, a huge impetus for adopting them is the desire for an objective mode of reasoning free from human influence.)
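To make the "automatic" point concrete, here's a toy sketch of the kind of recipe Danziger describes psychologists adopting -- the two groups and their numbers are simulated by me, purely for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
treatment = rng.normal(5.2, 1.0, size=30)  # hypothetical outcome scores
control = rng.normal(4.8, 1.0, size=30)

# The frequentist recipe is essentially one line: no prior, no model of the
# particular situation, just feed two columns of numbers into the machine
# and read off a p-value.
t_stat, p_value = stats.ttest_ind(treatment, control)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")

# A Bayesian treatment of the same comparison would force explicit choices
# the t-test hides: priors on the group means and variances, a likelihood,
# and some way of computing the posterior.
```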
In Bayesian inference, you do have to know a fair amount about the particular problem at hand. The most obvious example of this is the "prior" -- that is, the prior probability distribution of the parameter you're trying to estimate. (Called "prior" because it's prior to data collection. If you don't know anything, you can use an "uninformative prior.") People who don't do statistics for a living talk about "priors" as a mathematization of subjective belief. But funnily enough, in practice, priors are usually taken from frequentist empirical probabilities. (Need a prior probability of the risk of heart attack among mid-life white men in America? Look at a bunch of published research papers and see what they say, then pick something from those that seems reasonable.) Joque, rather puzzlingly, seems to view this as a weakness? It's hard to tell, but the book seems to be trending in the direction of "Bayesian statistics = capitalism = bad," which would be really stupid if that is indeed where it's going. (In any case, it's bad that I'm nearly a hundred pages into this thing and can't tell what the hell argument he's trying to make.)
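Here's what that looks like in practice, as a hedged sketch -- every number below (the published risk figure, the prior's weight, the new study's counts) is invented; the point is just that the "subjective" prior gets assembled out of frequentist summaries:

```python
from scipy import stats

# Suppose published studies put the ten-year heart-attack risk around 10%.
# Encode that as a Beta prior worth roughly 100 prior "observations".
prior_risk, prior_n = 0.10, 100
a0, b0 = prior_risk * prior_n, (1 - prior_risk) * prior_n  # roughly Beta(10, 90)

# A new (hypothetical) cohort: 18 events among 150 patients.
events, patients = 18, 150

# Conjugate update: the posterior is Beta(a0 + events, b0 + non-events).
posterior = stats.beta(a0 + events, b0 + (patients - events))
print(f"Posterior mean risk: {posterior.mean():.3f}")
print(f"95% credible interval: {posterior.interval(0.95)}")
```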
There's a big unanswered question here that is perhaps what Joque is fumbling for. This aspect of Bayesian inference -- the need to know a lot about the situation, and to have a lot of data, in order to characterize the posterior distribution of a parameter -- means that Bayesian methods tend to be a good deal more computationally and labor intensive than frequentist methods. This is sort of what Efron is getting at. At the end of the day, you can do "machine learning" with just about any statistical technique -- Joque misses the fundamental continuity of all these statistical methods with the big data/AI/machine learning turn. They are all just optimization problems, run for purposes of prediction rather than inference as in experimental statistics. There has been no breakthrough in artificial intelligence technologies. The reason a large language model like ChatGPT can't do math is that it is a powerful brute-force computer (memorizer) of tons and tons and tons of data. Literally, ChatGPT cannot manipulate the numbers 2 + 2 to give you the answer "4"; instead it has memorized from its vast data banks that statements of "2+2 = " are vastly more often followed by the character "4" than not. The unanswered question is: whence this increase in computing power, and especially, whence this increase in training data? This brings us into much more fertile territory for political-economic analysis. These models are so powerful because of the vast quantities of data (and fossil fuels) made available to them, data that is extracted via surveillance of our phones, via gamified attention-sucking social media platforms, via the myriad "smart" devices we now have, and so on. In short, the power of these models isn't due to any metaphysical weakness in Bayesian reasoning as compared to frequentist inference, but to very clear and intelligible political-economic trends.
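A toy caricature of what I mean -- the "corpus" below is invented, and real language models are vastly more sophisticated than a lookup table of counts, but the epistemic point is the same: the answer comes from the statistics of the training text, not from doing arithmetic:

```python
from collections import Counter, defaultdict

# Invented "training data": strings in which 2+2 is usually, but not always,
# followed by the right answer.
corpus = ["2+2=4", "2+2=4", "2+2=4", "2+2=5", "3+3=6", "3+3=6", "1+1=2"]

# Count which answer follows each "prompt" (everything up to and including '=').
continuations = defaultdict(Counter)
for line in corpus:
    prompt, answer = line.split("=")
    continuations[prompt + "="][answer] += 1

def predict(prompt: str) -> str:
    # No arithmetic happens here: return the most frequent continuation.
    return continuations[prompt].most_common(1)[0][0]

print(predict("2+2="))  # "4" -- not because anything added, but because something counted
```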
This is a big weakness with Joque's book: the mapping of mathematical problematics onto "metaphysical" assumptions, and subsequently onto "the logic of capitalism," is 1:1 and too just-so. There's an extremely fraught analogy between the "ideal coin" as money -- the value form -- and the "ideal coin" as the classic example used to illustrate the combinatorial mathematics underlying probability theory. (To which I'll have to be pedantic and say: there's an interesting question in here about so-called equipossibility -- where Bayes tried to provide an argument for why it was possible to assume that all events are equally likely, Laplace took this as a given and derived his probabilistic math from it; there are interesting problems here, Ian Hacking has written about some of them, and I have fallen asleep on the couch a few times trying to read them. And besides, many of the original examples in probability mathematics were actually derived from dice, or from the analogy of drawing tickets from urns, not from coin flipping.)
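For the record, the classical, equipossibility-based setup that Bayes and Laplace were working in is easy to state: a probability is just the count of favorable cases over the count of equally possible cases. Here's a toy enumeration using the stock two-dice example (my choice of example, not Joque's):

```python
from itertools import product

# Laplace's classical definition in miniature: with two fair dice there are
# 36 equally possible outcomes, and a probability is favorable / total.
outcomes = list(product(range(1, 7), repeat=2))
favorable = [o for o in outcomes if sum(o) == 7]
print(f"P(sum = 7) = {len(favorable)}/{len(outcomes)}")  # 6/36 = 1/6
```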
Flipping (hahaaaaa) ahead in the book, I also want to note something that really shocked me. Joque asserts that Fisher's experimental statistical techniques were those suited to an "agrarian" society, while Pearson/Neyman's were those of an "industrial" society. The quote is here:
"We will turn to more specifics shortly, but we should note that, if Ronald Fisher modeled a statistics for an agrarian society, and Jerzy Neyman and Egon Pearson modeled one for an industrial society, Bayes—through his modern interpreters—has provided a statistical theory for the information age." (p.115)
This is, to put it bluntly, insane. Fisher worked and developed many of his techniques at the Rothamsted Agricultural Station, where he began working in -- get this -- 1919. The Station itself was established as a private research institute in 1843, the very beating heart of the industrial revolution, by -- get this -- an industrialist named John Bennet Lawes, whose claim to fame is patenting the first artificial fertilizer. The types of experiments Fisher worked on at Rothamsted were those suited to industrial agriculture -- applying different types of chemical fertilizers to plots of land and seeing which ones increased crop yields. In fact, Fisher corresponded extensively with a man named William Gosset, aka Student (of t distribution fame), the head brewer for Guinness. Gosset contributed tremendously to the development of techniques for small-sample statistical analysis, aimed at -- you guessed it -- making industrial-scale agriculture more efficient and profitable. Working on problems of agriculture does not an "agrarian" make, and Joque dangerously overidentifies the method with the setting -- a type of vulgar correlationism, if you will. In any case, please be fucking serious. How did this make it to press?
Some interesting themes are here, and I wish they were developed more rigorously and thoughtfully -- questions of objectification, reification, and abstraction in the "science and technology studies" treatment of the development of mathematical techniques. There is a bit analogizing markets to machine learning algorithms that is interesting and pretty much correct -- and not new. Friedrich von Hayek figured it out first: he sketched something very like the artificial neural net, not as a model of the human brain or its neuronal architecture, but as a way to model how price information moves through a market, the oldest kind of distributed cybernetic network. Joque explains how artificial neural nets work in commendably accessible language, which seems to be a major strength of his, but he doesn't talk about this (yet, at least).
As I've already said, the thrust of the book's argument -- something about subjectivity, objectivity, and objectification -- is incoherent. I'm not a bad reader of texts. It simply is incoherent. Buried under a bunch of disjointed examples and Marxist technobabble are some fairly basic insights -- that knowledge is socially produced, and that under capitalism, capitalist logics, sometimes instantiated through mathematics and its epistemological assumptions, affect that social production. (Joque repeatedly invokes the need for a "revolutionary mathematics," which has a rather anemic aim -- just to "show how this stuff works.") It amounts to the reformist call issued in other books on this topic -- let's just do "objectification," whatever that actually means in the text, better. In a more Marxist way. Again, the thought I'm left with is: please be serious.
How political economy shapes knowledge production is actually right here, instantiated in this book and in the general nature of books that presses like Verso churn out. A collage of references, with no deep tissue of argument to connect them in a coherent way. Which is serviceably fine, because people expect that "Marxist" books will be unintelligible (charitably) or sloppy (uncharitably); nobody close reads anymore, nobody cares, and this "knowledge" has no use except as a niche product for "knowledge workers," increasingly proletarianized as digital peons of the algorithmic economy whose attention spans are wrecked by social media platforms and relentless social and economic pressures. It is enough, for these books, that they reach some appropriately serious length (most clock in right around 200 pages), that they cobble together some interesting references, and that the language used to describe the thesis is sufficiently indecipherable and equivocal to prevent anyone from really taking it seriously -- positively or negatively. To invoke Crass, "what question is left, and is anyone asking?" Readers may detect a note of bitterness here. There is a note of bitterness here. I would like to write a real review of this book... on my own time, my own dime, for my own free newsletter and small number of friends and friendly readers, because the kind of intellectual work I want to do is not the kind that has a place in our political economy. I don't know how to do the sophisticated fakery it takes to write a 200-page Verso book, not that I would if I did -- it's simply financially untenable to think or write anything original in 2024. Why would Bayesian inference do this to me? Of course, it wouldn't. Bayesian inference didn't do this to me. The political economic fabric, of which statistics in general is a part, did.