429: quantum of sollazzo
#429: quantum of sollazzo – 6 July 2021
The data newsletter by @puntofisso.
What, is it July already? I’m still stuck in Winter 2019, and I hear that many others are too… ah well. Look below for this week’s interview with data hero Alex Homer, a journalist at the BBC who got into working with data in his career without having a data background. He’s one of those amazing folks whose work and career I find incredibly inspiring.
Also, let me wish my friends at 360Giving a very happy sixth birthday! They have achieved so much in terms of openness of grants data over such a short time span.
I made an appearance in one of my favourite podcasts, the Why I care about data… series by Swirrl. Sarah Roberts and I discussed Why I care about data quality, where my definition of data quality is a slightly unorthodox take on a full understanding of the context in which the data is generated and used – a Cruyff-style Total Data approach. You can listen to the podcast here and on a number of other platforms.
I found this brilliant Observable notebook that analyses and visualizes (with a cell-based plot) the ratings of “The Simpsons” over time, showing how they took a downward turn after only a few series. I just managed to plug in data about “Friends”, showing how consistent its ratings were. But what matters to us data folks is how easy it was for me to simply fork and edit the notebook, plug in the Friends data coming from Kaggle, and just run it. What a great product Observable is!
There are some delicious data jobs available:
- at the New Statesman, with David Ottewell stepping down and Patrick Scott taking over, the data team will be hiring for a few roles including a Data projects editor and a Pharma data journalist;
- Martin Stabe at the Financial Times is looking for a data journalist;
- the Royal Automobile Club Foundation’s data guru Ivo Wengraf is looking for a data analyst (and, believe me, you’ll want to work from their office!)
’Til next week,
Giuseppe @puntofisso
Six questions to... Alex Homer
Alex Homer is a Senior Journalist at the BBC Shared Data Unit.
What is your daily data work like and what tools do you use?
I'm a journalist first, so my work starts with an editorial meeting discussing a tip or hypothesis I want to explore, either independently or in a small team. Then it's sourcing the data I need, whether that's finding open data, asking data holders for the data I need, making a request under the Freedom of Information Act or creating a dataset of my own. I speak to experts in whatever field to understand what the data show and any limitations or caveats I need to understand, and then I do analysis - usually in Excel or Google Sheets, rarely but sometimes using R - and then carry out interviews to report on those findings.
Tell me about a data project that you're proud of...
I felt there was a big public interest in my most recent investigation, where I reported that cases where people claiming benefits died or came to serious harm had led to more than 150 internal reviews by the Department for Work & Pensions (DWP) since 2012. I worked on this report independently for months, including creating a dataset of press reports of individuals who had died after some interaction with the benefits system. Debbie Abrahams MP wrote in The Times RedBox it was a “watershed moment. Family members of other claimants who had died contacted me from all over the country and asked me to add their names to the list; they thought the deaths of their fathers, mothers, sisters, brothers, sons or daughters were isolated cases. Sadly, they are not.”
...and a data project that someone else did and you're jealous of.
I really admire a lot of projects by The Pudding but this was absolutely brilliant about women's pockets - the choice of subject, the scrollytelling presentation, the exploration of the reasons, the tone of voice - everything about it.
If I say "dataset", you think of...
A source or interviewee if you like because journalists ask questions of datasets just as they would any other person.
Give someone new to data a tip or lesson you wish you'd learned earlier.
I wish I had appreciated sooner what I could have done if I'd learned to code when I was younger.
Data is or data are...
This is tough! I'd say data are because I am a pedant when it comes to grammar/verb conjugation etc.
Topical
‘Tornado Alley’ is expanding: Southern states see more twisters now than ever before
USA Today takes a look at an obvious feature of climate change: the increased frequency of tornadoes in the South of the US, and the problems this causes in a part of the country that has many mobile homes and is unprepared for extreme weather events.
“I kissed a girl” to “Call me by your name”
“A story about hearing yourself represented with same-gender lyrics for the first time”, by Jan Diehm for The Pudding.
Food Apartheid in Washington, D.C.
An analysis of community boundaries that takes into account how certain types of food outlets are distributed, controlling for race, income, and geography.
Name Infrastructure
An interesting look at the issue of non-Western names not being correctly recognised (e.g. Asian people in Europe are often incorrectly addressed by their surname, as it normally appears first in writing).
DeJoy’s USPS slowdown plan will delay the mail. What’s it mean for your Zip code?
I don’t know much about the changing delivery regime in the US other than that it’s causing some controversy, but as a postal nerd I can’t help but link to it.
“The logistical challenges, for example, of getting a letter from Maine to the Grand Canyon — where the agency famously delivers mail from a sack on a mule — won’t change.”
(via Fair Warning)
Lord of the Roths
“How Tech Mogul Peter Thiel Turned a Retirement Account for the Middle Class Into a $5 Billion Tax-Free Piggy Bank” – a brilliant example of data-driven investigative journalism, by ProPublica.
Also: the US Tax System is… somewhat weird.
How ranked-choice voting could change the way democracy works
“Detractors say it confuses voters. Supporters say it better represents the will of the people.” And both might be right…
It includes a handy simulator.
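For a feel of what a simulator like this has to compute, here is a minimal sketch of instant-runoff counting, one common form of ranked-choice voting. It is not the code behind the article's simulator: the ballots and candidate names are made up, and the alphabetical tie-break for eliminations is an assumption made purely to keep the example deterministic.

```python
from collections import Counter


def instant_runoff(ballots):
    """Return the winner of a simplified instant-runoff count.

    Each ballot is a list of candidate names, most preferred first.
    """
    remaining = {candidate for ballot in ballots for candidate in ballot}
    while remaining:
        # Each ballot counts for its highest-ranked candidate still in the race.
        tally = Counter({candidate: 0 for candidate in remaining})
        for ballot in ballots:
            for choice in ballot:
                if choice in remaining:
                    tally[choice] += 1
                    break
        total = sum(tally.values())
        leader, votes = max(tally.items(), key=lambda kv: (kv[1], kv[0]))
        # A strict majority of the ballots still counting wins outright.
        if votes * 2 > total or len(remaining) == 1:
            return leader
        # Otherwise drop the weakest candidate and recount.
        # Ties are broken alphabetically (an assumption, not a real rule).
        loser = min(tally.items(), key=lambda kv: (kv[1], kv[0]))[0]
        remaining.discard(loser)
    return None


# Hypothetical election: A leads on first preferences but has no majority.
EXAMPLE_BALLOTS = (
    [["A", "B", "C"]] * 4
    + [["B", "C", "A"]] * 3
    + [["C", "B", "A"]] * 2
)

if __name__ == "__main__":
    print(instant_runoff(EXAMPLE_BALLOTS))
```

In this made-up election A would win a first-past-the-post count, but once C is eliminated its ballots transfer to B, who then holds a majority. That kind of reversal is exactly what supporters and detractors argue about.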
Tools & Tutorials
How The Economist collected data about 1.65m studies
In The Economist’s latest Off the Charts, there’s an extraordinary piece by data journo Matt Lerner, where he describes step by step how he collected data for 1.65m scientific studies using data from CrossRef and other sources.
Place-based carbon calculator
“PBCC is a free tool which estimates the per-person carbon footprint for every Lower Super Output Area (LSOA) in England.”
CronyConnect
CronyConnect is another excellent tool by Harvard researcher Sophie Hill.
“The purpose of Crony Connect is to bring together a few important databases in one place and search them simultaneously: Companies House’s company registrations, the Electoral Commission’s political donations database, and the UK Parliament’s register of MPs’ financial interests”.
Oh, the beauty of the pun in the name…
Understanding p-values Through Simulations
“An Interactive Visualization”.
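The idea of getting at a p-value through simulation can also be sketched in a few lines of code. The example below is not taken from the linked visualization; it is a minimal illustration that assumes a fair-coin null hypothesis and a made-up observation of 60 heads in 100 flips. The function name and parameters are hypothetical.

```python
import random


def simulated_p_value(observed_heads, n_flips, n_sims=10_000, seed=0):
    """Estimate a two-sided p-value for a coin-flip count by simulation.

    Null hypothesis: the coin is fair. The p-value is the share of
    simulated experiments whose head count is at least as far from
    n_flips / 2 as the observed count.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    expected = n_flips / 2
    observed_gap = abs(observed_heads - expected)
    extreme = 0
    for _ in range(n_sims):
        # One simulated experiment under the null hypothesis.
        heads = sum(rng.random() < 0.5 for _ in range(n_flips))
        if abs(heads - expected) >= observed_gap:
            extreme += 1
    return extreme / n_sims


if __name__ == "__main__":
    # Made-up observation: 60 heads out of 100 flips.
    print(simulated_p_value(60, 100))
```

The estimate should land near the exact binomial answer (a little under 0.06) and will wobble slightly with the seed and the number of simulations. That wobble is a useful reminder that a p-value is itself just a statement about what repeated experiments under the null would look like.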
A gorgeous, accessible color system
“An open-source color system for designing beautiful, accessible websites and apps.”
GPX.Studio
“gpx.studio is a free online GPX viewer and editor”. It’s quite handy if you work or wish to work with GPX traces (e.g. those coming from your smartwatch or Strava), as it doesn’t just visualize them but also allows you to easily edit them, add or reduce waypoints, extract segments, etc.
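If you'd rather script this kind of edit, a GPX trace is just XML. The sketch below is unrelated to how gpx.studio works internally; it assumes a GPX 1.1 file with track points (`trkpt`), uses a tiny made-up trace, and "reduces waypoints" in the crudest possible way, by keeping every n-th point. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by GPX 1.1 documents.
GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}

# A tiny made-up trace, standing in for a smartwatch or Strava export.
SAMPLE_GPX = """<gpx version="1.1" creator="example"
     xmlns="http://www.topografix.com/GPX/1/1">
  <trk><trkseg>
    <trkpt lat="51.5000" lon="-0.1000"/>
    <trkpt lat="51.5005" lon="-0.1005"/>
    <trkpt lat="51.5010" lon="-0.1010"/>
    <trkpt lat="51.5015" lon="-0.1015"/>
    <trkpt lat="51.5020" lon="-0.1020"/>
  </trkseg></trk>
</gpx>"""


def read_track_points(gpx_text):
    """Return the track points of a GPX 1.1 document as (lat, lon) tuples."""
    root = ET.fromstring(gpx_text)
    return [
        (float(point.get("lat")), float(point.get("lon")))
        for point in root.iterfind(".//gpx:trkpt", GPX_NS)
    ]


def thin_points(points, keep_every):
    """Keep every n-th point, always retaining the final point."""
    thinned = points[::keep_every]
    if points and thinned[-1] != points[-1]:
        thinned.append(points[-1])
    return thinned


if __name__ == "__main__":
    points = read_track_points(SAMPLE_GPX)
    print(len(points), "points read;", len(thin_points(points, 2)), "kept")
```

Real simplification tools usually pick which points to drop based on how much they change the shape of the track, so treat the every-n-th rule here as a placeholder.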
Data thinking
Three simple ideas for better election poll graphics
“Reporting on election polls, however, is often misleading” must be the biggest understatement in history ;-) Simon at Datawrapper shows a few alternatives for better poll reporting.
The Beginner’s Guide to the Modern Data Stack
“A curated list of blogs, books, newsletters, podcasts, and communities for all things modern data stack”, by Prukalpa Sankar.
Dataviz & Interactive
Average colors of the world
Created by the awesome Erin Davis by averaging Sentinel-2 satellite photos (with step-by-step instructions and source code very much available on the page). Europe is very green!
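To make "averaging" concrete, here is a minimal sketch of the arithmetic: take the mean of each colour channel across a set of pixels. This is not Erin Davis's pipeline (her page has the real instructions and code); the pixel values are made up and a plain per-channel mean of RGB values is an assumption.

```python
def average_color(pixels):
    """Return the mean colour of (r, g, b) pixels as a hex string."""
    count = len(pixels)
    if count == 0:
        raise ValueError("need at least one pixel")
    # Average each channel separately, then round to the nearest integer.
    red = round(sum(p[0] for p in pixels) / count)
    green = round(sum(p[1] for p in pixels) / count)
    blue = round(sum(p[2] for p in pixels) / count)
    return "#{:02x}{:02x}{:02x}".format(red, green, blue)


# Three made-up greenish pixels, standing in for a satellite image tile.
EXAMPLE_PIXELS = [(0, 128, 0), (34, 139, 34), (0, 100, 0)]

if __name__ == "__main__":
    print(average_color(EXAMPLE_PIXELS))  # prints #0b7a0b
```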
The stuff nightmares are made of
Amazing Hokusai-inspired data visualization of growing levels of CO₂, with all details and data in this blog post.
(via Lucilla Piccari)
AI
A deep dive into natural language processing and speech to text systems
Not an article but a podcast, by Stack Overflow. (It’s sponsored by Rev, so don’t expect an entirely impartial take.) A transcript is linked on the page.
Become a GitHub Sponsor. It costs about the price of a coffee per month, and you’ll get an Open Data Rottweiler sticker (and other stuff). Or you can Buy Me A Coffee.
quantum of sollazzo is also supported by ProofRed’s excellent proofreading service. If you need high-quality copy editing or proofreading, head to http://proofred.co.uk. Oh, they also make really good explainer videos.