Statistics & complexity
- It's been a whirlwind few weeks: on the micro level we did a house move, on the macro level, Oh America! Now that this period has concluded with the bittersweet combination of a Biden victory, Guy Fawkes fireworks from a new view, and a second lockdown in the UK, let me sum up what I've been up to.
- My last email was about questions. It might be that every week is about questions!
- How you formulate a question is everything, and here is how I learnt that lesson again.
- I was on a statistics course to learn Stata - a statistics package that can do very complicated maths in seconds (rather it than me).
- While statistics is the discipline that allows for interpretation of data, and may sound like a magical framework of truth-mining, most of the work in the course was around preparing data.
- This means understanding the numbers and answers in the spreadsheet, understanding the relevance of different variables in a problem and recognising the limitations of what we have.
- Remember when I was talking about asking a question of my data and there not being an answer?! It's similar.
- Imagine, for example, wanting to understand the relationship between smoking and obesity, with both assessed by participant-completed questionnaires. You may look at the data and find it a bit strange. Some questions will be left unanswered: being asked about your weight may be considered offensive, and so may being asked about your smoking habits. Imagine participants who know what the researchers are after and feel jaded by the endless efforts to stigmatise smoking as a factor in all kinds of health conditions that they're already well aware of. (That said, smoking IS bad for you, but it's often more complicated.) Other questions will be answered with unrealistic figures (weight: 10kg - was it down to how the question was interpreted, or...?!), which is when dealing with real people in person is easier. How to account for these people's responses?
- For an election-themed example: in my small village in Northern Italy, after an election for the village mayor, one of the vote counters complained that someone had wrapped a slice of salami in their blank ballot paper and the count had become a very smelly affair. While the person who did this did not tick a box, the gesture still said something about their attitude to the messy political situation.
- Of course you need to understand your measures too. For example, Body Mass Index (BMI), a measure calculated from height and weight that indicates whether a person is overweight or not, is not an accurate measure of fat for people who are very muscular (hence high weight) but lean.
- Another learning point is that there is no such thing as a simple question.
- Taking the smoking & obesity example again, a simple question might be: do people who smoke end up being overweight? The statistic that I'd be interested in in this case is a prediction statistic (a regression).
- From walking in the street, it is clear that there are thin, large and average-sized smokers, just as there are overweight people who are healthy, or smokers, or ... To predict something requires a full awareness of the problem at stake and the variables involved.
- The trajectory from A) smoking to B) overweight might hold for some people but not others, which means there is much more to it than smoking. Contributing factors (covariates) could be a genetic propensity, or the amount of exercise they do, or some stress that might be leading them to smoke, or the fact that they belong to a community where smoking and feasting are commonplace - it could be hundreds of things!
- So... what's the point?! We cannot account for everything, but we can make informed decisions.
- There may be evidence from a previous study suggesting that people with prolonged levels of stress tend to accumulate weight at midlife (I'm riffing), and its research methods sound particularly convincing. Fine, that might help me decide which variables I think are contributing or confounding.
- These may be biological factors or factors that have to do with a person's culture, customs, and behaviours. Or both. Less Or and more And.
- The real summary is that formulating a question is an exercise in humility. It involves embracing the work of the hundreds of people and groups who have tried to ask that or a similar question before, and not necessarily in 'science' alone. It means shedding the ego of the scientist and taking on the more meandering and questionable role of the researcher.
- Once this is done, the statistical test is easy.
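
The data-preparation step described earlier can be sketched in a few lines. This is only an illustration in Python rather than Stata: the questionnaire rows are made up, the field names are hypothetical, and the plausibility window for weight is an assumption that a real study would have to justify. It sorts responses into missing, implausible and usable, then computes BMI (weight in kilograms divided by height in metres squared) for the usable ones.

```python
from typing import Optional

# Made-up questionnaire rows (hypothetical field names).
# None stands for a question the participant left unanswered.
responses = [
    {"id": 1, "weight_kg": 82.0, "height_m": 1.75, "smoker": True},
    {"id": 2, "weight_kg": None, "height_m": 1.68, "smoker": False},
    {"id": 3, "weight_kg": 10.0, "height_m": 1.80, "smoker": True},
    {"id": 4, "weight_kg": 95.0, "height_m": 1.82, "smoker": None},
]

# Assumed plausibility window for adult weight, chosen only for this sketch.
MIN_WEIGHT_KG = 30.0
MAX_WEIGHT_KG = 300.0


def classify(row: dict) -> str:
    """Label a response as 'missing', 'implausible' or 'usable'."""
    if any(row[key] is None for key in ("weight_kg", "height_m", "smoker")):
        return "missing"
    if not MIN_WEIGHT_KG <= row["weight_kg"] <= MAX_WEIGHT_KG:
        return "implausible"
    return "usable"


def bmi(weight_kg: float, height_m: float) -> float:
    """Body Mass Index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2


# One label per participant, so nobody is dropped silently.
statuses = {row["id"]: classify(row) for row in responses}

# BMI only for the rows that survived the checks.
usable_bmi = {
    row["id"]: round(bmi(row["weight_kg"], row["height_m"]), 1)
    for row in responses
    if statuses[row["id"]] == "usable"
}

print(statuses)
print(usable_bmi)
```

Labelling the awkward rows instead of deleting them keeps the decision visible: the blank answer and the 10kg answer are still information about the participants, and what to do with them remains a judgment call rather than something the code settles.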