#504: quantum of sollazzo – 7 February 2022
The data newsletter by @puntofisso.
Hello, regular readers and welcome new ones :) This is Quantum of Sollazzo, the newsletter about all things data. I am Giuseppe Sollazzo, or @puntofisso. I've been sending this newsletter since 2012 to be a summary of all the articles with or about data that captured my attention over the previous week. The newsletter is and will always (well, for as long as I can keep going!) be free, but you're welcome to become a friend via the links below.
It was very nice to see this article in the Financial Times using my Parli-N-Grams platform :)
The most clicked link last week was The Economist's look at dictatorships and economic growth.
We have more sponsored content by Ed Freyfogle, organiser of location-based service meetup Geomob, co-host of the Geomob podcast, and co-founder of OpenCage, who has offered to introduce a set of points around the topic of geodata. His first entry, on geocoding as a first step, starts a few paragraphs below.
'till next week,
Giuseppe @puntofisso
Become a Friend of Quantum of Sollazzo from $1/month → If you enjoy this newsletter, you can support it by becoming a GitHub Sponsor. Or you can Buy Me a Coffee. I'll send you an Open Data Rottweiler sticker. You're receiving this email because you subscribed to Quantum of Sollazzo, a weekly newsletter covering all things data, written by Giuseppe Sollazzo (@puntofisso). If you have a product or service to promote and want to support this newsletter, you can sponsor an issue.
✨ Topical
Religion in the UK
"The ONS have just dropped their Census2021 religion by age data and it's a really great read."
A couple of brilliant charts by the Telegraph's Ben Butcher.
The Greek wiretapping scandal on Twitter: The course of the conversation, polarization and the role of the media
Very interesting look at this story that I hadn't really heard of, via the European Data Journalism Network: "The journalistic revelations about the Greek wiretapping scandal have intensified the user engagement in the digital conversation. About 70% of the relevant tweets are posted by accounts that tend to mostly follow parties and members of the political Left than the Right. In any case, this conversation continues to polarize the two groups. iMEdD Lab and Datalab present a research collaboration analyzing the course of the conversation about the wiretapping scandal on Twitter over the last nine months."
There are a few Datawrapper-powered charts in it, but this network visualization below is what really got me.
For the poor, retirement is short(er)
"25% of the poorest French men are already dead before reaching retirement age". Datawrapper's Rose Mintzer-Sweeney makes a chart showing it.
UK’s Poorer Regions Fall Further Behind in Blow to Sunak’s ‘Levelling Up’ Pledge
"Bloomberg UK’s Scorecard shows that just 6% of constituencies have made an overall improvement since May and the vast majority continue to struggle"
In fairness, change takes time. And, of course, the moment a policy like this one gets announced, with some rather obvious metrics attachable to it, it will have data folks looking at what happens to the defining metrics. If I were the PM I would... CENSORED.
See the evolution of lies in George Santos’s campaign biography
The Washington Post showing off their ability using diff.
🛠️📖 Tools & Tutorials
SQL should be your default choice for data engineering pipelines
Robin Linacre, whose profile we featured in issue 486: "SQL should be the first option considered for new data engineering work. It’s robust, fast, future-proof and testable. With a bit of care, it’s clear and readable. A new SQL engine - DuckDB - makes SQL competitive with other high performance dataframe libraries, making SQL a good candidate for data of all sizes."
From Zero to Research Scientist full resources guide.
"Detailed and tailored guide for undergraduate students or anybody want to dig deep into the field of AI with solid foundation."
It's under construction.
apitable
"APITable, an API-oriented low-code platform for building collaborative apps and better than all other Airtable open-source alternatives."
Website here, and source code here.
=GPT3()
Pandas Illustrated: The Definitive Visual Guide to Pandas
A (long) Medium article which does what it says on the tin.
AI Tools for Local Newsroom Database
Journo Samantha Sunne has created this helpful Airtable of AI tools useful to journalists.
Geocoding is just the first step
Most data projects involve tedious cleaning and enriching before the data can actually be "used".
At OpenCage, we are firm believers that laziness is one of the virtues of a great developer. We’ve thought a lot about making geocoding with open data dead simple, but also about how to simplify the whole journey to using the data. Our geocoding API returns "annotations": extra information about the location that developers might find useful, thus saving work.
An example is EU NUTS codes, standard codes commonly used for linking datasets and statistical analysis. Looking up the relevant codes for a region is not particularly complex, but it is the kind of small task that needs to be done correctly (and maintained) in a larger data processing project. So, as a simplification for our users, we already return the correct codes as an annotation.
As an example, a request to the OpenCage geocoder for 52.387, 9.733 (in northern Germany) returns the annotation:
"NUTS": {
    "NUTS0": {
        "code": "DE"
    },
    "NUTS1": {
        "code": "DE9"
    },
    "NUTS2": {
        "code": "DE92"
    },
    "NUTS3": {
        "code": "DE929"
    }
},
We also return many other types of information, for example: the local timezone; calling code; currency information; other reference systems like geohash, what3words, MGRS, US FIPS codes, and Maidenhead; the time of sunrise and sunset; the qibla angle; and much more.
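To make the NUTS example concrete, here is a minimal sketch of how the annotation could be read out of a parsed response and turned into a join key for another dataset. The "NUTS" object is copied from the example above; the surrounding "results" and "annotations" nesting, and the helper names nuts_codes and join_key, are assumptions made for illustration, so check the API documentation for the exact response shape before relying on it.

```python
import json

# Sample response. The "NUTS" object matches the example in the text;
# the "results" / "annotations" wrapper around it is an assumption.
SAMPLE_RESPONSE = json.loads("""
{
  "results": [
    {
      "annotations": {
        "NUTS": {
          "NUTS0": {"code": "DE"},
          "NUTS1": {"code": "DE9"},
          "NUTS2": {"code": "DE92"},
          "NUTS3": {"code": "DE929"}
        }
      }
    }
  ]
}
""")


def nuts_codes(response):
    """Return {level: code} for the first result, or {} if there is none.

    Hypothetical helper: it only assumes the nesting described above.
    """
    results = response.get("results") or []
    if not results:
        return {}
    nuts = results[0].get("annotations", {}).get("NUTS", {})
    return {
        level: entry["code"]
        for level, entry in sorted(nuts.items())
        if "code" in entry
    }


def join_key(response, level="NUTS3"):
    """Pick a single NUTS level to use as a key when linking datasets."""
    return nuts_codes(response).get(level)


if __name__ == "__main__":
    print(nuts_codes(SAMPLE_RESPONSE))
    print(join_key(SAMPLE_RESPONSE, "NUTS2"))
```

Choosing the level explicitly matters because statistical datasets are usually published at one NUTS level only, so the key has to match the granularity of the table being joined.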
Have a project that will need geocoding? See our geocoding buyer's guide for an overview of all the factors to consider when choosing between geocoding services.
🤯 Data thinking
When the heck do you actually need a data catalog? The one metric you should know
Some good insight from Austin Kronz at Atlan.
📈 Dataviz, Data Analysis, & Interactive
Algorithms Tour
"How data science is woven into the fabric of Stitch Fix".
No, I'm not linking this for the obvious pun, and this is one of those articles for which my newsletter categories entirely fail! This is an interactive exploration of the way Stitch Fix, an online personal styling service, uses data science in its business.
Where are rents rising post COVID-19?
"Rents rose in 58% of all counties nationwide between 2020 and 2023. Data shows that rents rose and dipped between neighboring zip codes."
Making travel plans? Southwest’s holiday meltdown may be a sign of air travel drama to come
Among all other considerations, I really like the interactive visualization of cancellations in this article.
🤖 AI
The AI Crowd is Mad
"In general, I think the current LLM discourse needs more nuance. On podcasts and in blog posts I always seem to identify one line of reasoning and it is too optimistic for my taste. So here are some points that I find under-developed in the discussion."
Not All Rainbows and Sunshine: The Darker Side of ChatGPT
Some reflections on ChatGPT in this series. "Part 1: The Risks and Ethical Issues Associated with Large Language Models"
quantum of sollazzo is supported by ProofRed's excellent proofreading. If you need high-quality copy editing or proofreading, head to http://proofred.co.uk. Oh, they also make really good explainer videos.
Supporters*: casperdcl and iterative.ai, Jeff Wilson, Fay Simcock, Naomi Penfold
[*] this is for all $5+/month GitHub sponsors. If you are one of those and don't appear here, please e-mail me.