#420: quantum of sollazzo – 4 May 2021
The data newsletter by @puntofisso.
Hello, new readers. Quite a few of you have subscribed in the past couple of weeks, so welcome – if you've joined for the data stuff, you'll probably stay for the geeky links too.
So, let me start with a service announcement: today sees the launch of "Six questions with...", a new section of Quantum of Sollazzo featuring brief interviews with great people who work with data. I have some true stars lined up, starting today with the Wall Street Journal's Soph Warnes.
If you speak Italian and want to hear yours truly blabbering in Italian about Open Data, I will join DataNinja on their live session this Wednesday, May 6th, alongside Giorgia Lodi and Maurizio Napolitano, two leaders of the Open Data movement in Italy. We'll be comparing notes about the state of Open Data in Italy and the UK.
Join on Youtube at 6pm Rome time/5pm London time.
Trivia: I first met Giorgia when she taught me at university (in a galaxy far far away), while my first meeting with Maurizio involved a beer garden in Berlin (we were there for WhereCamp EU).
The UK Open Government Network is launching thematic groups in order to allow civil society and civil servants to meet and discuss specific policy themes for the UK's next Open Government National Action Plan, to be published later this year.
"Over the next couple of months, Thematic Groups will bring together civil servants and civil society to discuss and co-create proposals around these different topics. Your participation will help shape government policy pledges. With stories about transparency at the highest levels of government dominating the news, this has never been more important."
If you wish to take part, please fill in this form.
Links are below the interview.
My latest monthly notes are here.
Till next week,
Giuseppe @puntofisso
Six questions to... Soph Warnes
Soph Warnes, Strategy Editor, WSJ tweets at @SophieWarnes. She also authors Fair Warning, a pretty good data newsletter (with a pretty good pun as its title).
I used to do a lot more data journalism type work, but now I work more with analytics data. I use BigQuery (and thus SQL), Data Studio, Google Sheets, Excel. I miss using R a lot, and even Python. I just don't have good excuses to use them anymore, everything has a GUI now.
Tell me about a data project that you're proud of...
I am inordinately proud of "Which occupations are at the highest risk of being automated?", which I worked on while I was at ONS. It was a difficult project: I had to persuade everyone to go with my vision and keep everyone's nerves steady when people wanted to kill it off. It's the civil service: they don't do cool things. But I was so pleased with the end result. The statisticians came to me and said "hey we think this might appeal to a broader audience" and my colleague and I had been looking for an excuse to create an editorial chatbot. The idea of having a bot explain automation to people really tickled me. It was just perfect. It was really successful for us. I ended up going to Paris and talking about chatbots at the OECD. Automation risk was discussed in Parliament in the UK and I'm so sure it's because of this crazy unexpected thing that got a lot of attention. People were like "The ONS did this!?". We won awards for it. It's just so cool and satisfying that it had such a wide impact.
...and a data project that someone else did and you're jealous of.
My favourite fun thing of all time is probably the Wall Street Journal's explanation of Hamilton and how rapping and rhyming works. It just gets to the heart of why the musical is so incredibly smart, and for me it nailed why I found the songs so compelling. For a few months it felt like when I listened to the OST, I would hear something new and think, "God, that's so clever". Each rhyme, every pacing change, is very specific and deliberate. It's just such a great explanation and it helped me appreciate Lin-Manuel Miranda's writing so much more. And it also comes from such an unexpected place.
If I say "dataset", you think of...
Jeremy Singer-Vine. He writes a whole newsletter about them and everything.
Give someone new to data a tip or lesson you wish you'd learned earlier.
OK so I am chronically disorganised except when it comes to data projects that I do on my laptop, because I have a very specific file and folder structure that I rigidly stick to. See also: commenting. I thought comments were for losers. Until I went back to a project years later and was like... "I DID THIS!?!" I comment the hell out of everything now, even small things. And to this end I would also say that keeping basic 'snippets' of code that perform a specific function saved somewhere will help you so, so much.
Uh, that's three things:
- Comment
- Create and stick to a file/folder structure
- Save reusable snippets in one file
Strictly speaking, Data Are. But language evolves over time, and we all need to accept that we are on the Data Is train whether we like it or not.
Topical
How Big Tech got so big: Hundreds of acquisitions
"Amazon, Apple, Facebook and Google — known as the Big 4 — now dominate many facets of our lives. But they didn’t get there alone. They acquired hundreds of companies over decades to propel them to become some of the most powerful tech behemoths in the world."
A year in lockdown trends
I almost decided not to link this one due to the lack of credits on the page, which makes it odd, but it turns out it's by Google Trends and Polygraph (see here).
They created this dashboard that showcases in a calendar "the hobby that saw the greatest growth in search queries, compared to one year prior, for each day of the pandemic's first year".
Global house prices
The Economist's "interactive guide to housing data across the world".
If you're curious how they collected and used the data for this article, their "Graphic Detail" newsletter has some of it.
Deepfake satellite imagery poses a not-so-distant threat, warn geographers
"Geographic deepfakes could be used for misinformation and much more".
Just as if we had nothing else to worry about...
Whose inflation is it anyway?
"A mathematical explanation for why inflation rates feel too low."
This is an entertaining and well-articulated read by Tom Neill (the guy who rose to fame for the Suez boat website, but has quite a bit more to offer, if you follow his newsletter).
“Red zones” in Sicily: a story of civic hacking
"If you live in Sicily and want to know if your city is currently red, i.e. under specific COVID-19 restrictions, all you need to do is read 197 pages of PDF files published by the regional government."
Or maybe not. Good job by my friends at Open Data Sicilia.
(via Paola Masuzzo)
Tools & Tutorials
R CHARTS
"In this site you will find code examples of R graphs made with base R graphics, ggplot2 and other packages. Feel free to contribute suggesting new visualizations or fixing any bug via GitHub."
What Is Your Model Hiding? A Tutorial on Evaluating ML Models
Another good article from Evidently AI, discussing "how to explore the performance of classification models before production use."
TwoTone
TwoTone is a very easy-to-use online (and open source) tool to turn data into sound.
SQLite the only database you will ever need in most cases
We covered this topic previously: in many quarters, SQLite is being hailed as a more "enterprise" solution than we commonly thought, to the point where it's being used as the backend of high-performance web services (assuming you can do without a client-server approach, i.e. no replication etc.).
Is it crazy? This article argues that no, it's not: "I have run SQLite as a web application database with thousands concurrent writes every second, coming from different HTTP requests, without any delays or issues. This is because even on a very busy site, the hardware is extremely fast and fully capable of handling that."
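If you want to poke at the claim yourself, here is a minimal sketch of several writers sharing one SQLite file, using only Python's standard library. Everything specific in it is my own assumption for illustration rather than something from the article: the `hits` table, the worker and row counts, the 30-second lock timeout, and the choice of WAL journal mode. SQLite still lets only one writer commit at a time, so this shows writers queueing politely, not a benchmark.

```python
import os
import sqlite3
import tempfile
import threading


def write_hits(db_path, worker_id, n_rows):
    """Insert n_rows rows from one worker, one transaction per row."""
    # Each thread opens its own connection. The timeout makes a writer wait
    # (up to 30 s) for the write lock instead of failing straight away.
    conn = sqlite3.connect(db_path, timeout=30)
    try:
        for seq in range(n_rows):
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO hits (worker, seq) VALUES (?, ?)",
                    (worker_id, seq),
                )
    finally:
        conn.close()


def run_demo(n_workers=4, n_rows=50):
    """Run n_workers concurrent writers and return the final row count."""
    db_path = os.path.join(tempfile.mkdtemp(), "demo.db")

    # Set up the database; WAL mode is a hypothetical choice for this sketch.
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("CREATE TABLE hits (worker INTEGER, seq INTEGER)")
    conn.commit()
    conn.close()

    threads = [
        threading.Thread(target=write_hits, args=(db_path, w, n_rows))
        for w in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    conn = sqlite3.connect(db_path)
    total = conn.execute("SELECT COUNT(*) FROM hits").fetchone()[0]
    conn.close()
    return total


if __name__ == "__main__":
    print(run_demo())
```

If every insert lands, `run_demo()` returns `n_workers * n_rows`. A real web service would have many more moving parts, so treat this as a toy to experiment with rather than evidence either way.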
Basic Music Theory in ~200 Lines of Python
Just for fun, but with so much data sonification, this brief tutorial might be useful.
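To give a flavour of the kind of thing involved, here is a tiny sketch of my own (not the tutorial's code) that builds a major scale from its pattern of semitone steps and converts a note to a frequency. It assumes twelve-tone equal temperament with A4 tuned to 440 Hz, and it spells every note with sharps only, so a scale like F major will show A# where a musician would write Bb.

```python
# The twelve pitch classes, spelled with sharps only (a simplification).
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# A major scale climbs by: whole, whole, half, whole, whole, whole, half
# (a whole step is 2 semitones, a half step is 1).
MAJOR_STEPS = [2, 2, 1, 2, 2, 2, 1]


def major_scale(root):
    """Return the seven note names of the major scale starting on root."""
    idx = NOTES.index(root)
    scale = [root]
    # The last step only leads back to the root an octave up, so skip it.
    for step in MAJOR_STEPS[:-1]:
        idx = (idx + step) % len(NOTES)
        scale.append(NOTES[idx])
    return scale


def frequency(semitones_from_a4):
    """Frequency in Hz of the note this many semitones above (or below) A4."""
    # In equal temperament each semitone multiplies the frequency by 2**(1/12).
    return 440.0 * 2 ** (semitones_from_a4 / 12)


if __name__ == "__main__":
    print(major_scale("G"))
    print(frequency(12))
```

Mapping data values onto the notes of a scale like this, rather than onto arbitrary frequencies, is one simple way to make a sonification sound more musical.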
Introducing Scale-a-Tron
Whether it's putting containers on a map, or checking the true relative size of continents, scaling things right on a map is tricky, due to the different projections used. Scale-a-Tron is a tool by cartographic experts Stamen that allows you to draw a polygon on a map, and then move it around while it is rescaled to keep its true size. A blog post explains it all.
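Why is this tricky? A rough sketch of the arithmetic, assuming a spherical Mercator projection (the one most web maps use; I have not checked which projection Scale-a-Tron itself relies on): a small shape drawn at a given latitude is stretched by a factor of 1/cos(latitude) in each direction. The function names below are mine, made up for illustration.

```python
import math


def mercator_scale_factor(lat_deg):
    """Linear stretch of a small shape at this latitude on a Mercator map."""
    return 1.0 / math.cos(math.radians(lat_deg))


def apparent_area_ratio(lat_from, lat_to):
    """How many times larger (in drawn area) a shape of fixed real size
    appears at lat_to compared with lat_from."""
    k_from = mercator_scale_factor(lat_from)
    k_to = mercator_scale_factor(lat_to)
    # Area scales with the square of the linear stretch.
    return (k_to / k_from) ** 2


if __name__ == "__main__":
    print(mercator_scale_factor(60))
    print(apparent_area_ratio(0, 60))
```

At 60 degrees of latitude the linear stretch is about 2, so a shape with the same real-world area is drawn roughly four times larger than it would be at the equator. A tool that lets you drag a polygon around has to undo exactly this kind of distortion as you move it.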
A new R package for exploring the wealth of information stored by Wikidata: tidywikidatar
Good tutorial on the tidywikidatar R library by Giorgio Comai of OBC Transeuropa/EDJNet: "What does Wikidata know about members of the European Parliament? Let’s find out using our new R package tidywikidatar."
Visual
Ferrari 1000 GP
"On September 13, 2020 Scuderia Ferrari, the most famed team in the history of Formula 1 and the only one to compete in each season since the sport was born in 1950, reached the milestone of its 1000th World Championship race."
Andrea Giambelli of VisualEyed created this beautiful interactive timeline visualizing each of the races.
Data thinking
Data communities need to come before data catalogues
Interesting piece about energy data, which could just as well apply to a different context. Guy Lipman argues that data catalogues are doomed because they can't meet all data needs, and suggests, as an alternative, the development of living platforms to enable shared documentation and enriched data. These are not without risks or issues, but it is interesting to see community building placed centre stage in a data context.
Run Your Data Team Like A Product Team
An interesting take on the data as a service vs data as a product models (my 2p: I'm never too sure things can work if you go 100% one way or another, without an iterative mindset).
I've found the root cause of the problem in what3words
This Twitter thread has been circulating, alleging that there is an intrinsic issue in the way what3words geocodes locations, as described in their patent. I only managed to follow it up to a point, but the geonerds among you might find it stimulating.
Become a GitHub Sponsor. It costs about the price of a coffee per month, and you'll get an Open Data Rottweiler sticker (and other stuff). Or you can Buy Me A Coffee.
quantum of sollazzo is also supported by ProofRed's excellent proofreading service. If you need high-quality copy editing or proofreading, head to http://proofred.co.uk. Oh, they also make really good explainer videos.