501: quantum of sollazzo
#501: quantum of sollazzo – 17 January 2023
The data newsletter by @puntofisso.
Hello, regular readers and welcome new ones :) This is Quantum of Sollazzo, the newsletter about all things data. I am Giuseppe Sollazzo, or @puntofisso. I’ve been sending this newsletter since 2012 as a summary of all the articles with or about data that captured my attention over the previous week. The newsletter is and will always (well, for as long as I can keep going!) be free, but you’re welcome to become a friend via the links below.
The most clicked link last week was LearnGPT. I think it’s the first time an AI-related link has been the most clicked in Quantum.
We have some great sponsored content this week: Ed Freyfogle, organiser of the location-based services meetup Geomob, co-host of the Geomob podcast, and co-founder of OpenCage, has offered to introduce a series of key points about geodata. His first entry, a few paragraphs below, covers why open geodata matters and the difference between open and closed data.
Till next week,
Giuseppe @puntofisso
Become a Friend of Quantum of Sollazzo from $1/month: if you enjoy this newsletter, you can support it by becoming a GitHub Sponsor, or you can Buy Me a Coffee. I'll send you an Open Data Rottweiler sticker.
You're receiving this email because you subscribed to Quantum of Sollazzo, a weekly newsletter covering all things data, written by Giuseppe Sollazzo (@puntofisso). If you have a product or service to promote and want to support this newsletter, you can sponsor an issue.
✨ Topical
The Westminster Accounts
“Although most of this money must technically be disclosed to the public, the way that information is reported, stored and displayed almost guarantees the records will not be widely scrutinised.”
As others have noted, it’s great that the data is searchable, but if the problem was that the data wasn’t accessible, it’s somewhat disappointing that this project hasn’t made it so.
(Disclaimer: just choosing the highest-ranking parliamentarian, no politics, etc.)
Tech Layoffs Are Happening Faster Than at Any Time During the Pandemic
“Areas that were largely spared in 2020 are now among those with the largest numbers of job cuts.”
The Wall Street Journal takes a look at Silicon Valley redundancies, which are worryingly happening faster than at any point during COVID, a bad sign for the state of the economy.
118th Congress has a record number of women
Analysis by the Pew Research Center shows growing representation.
Lula’s cabinet has a record number of women, but no parity
Speaking of gender ratios, here’s a look at the percentage of women in Lula’s cabinet in Brazil.
How each House member voted for speaker in 15 ballots
The Washington Post looks at the House Speaker election.
The list of 2022 visualization lists
Infographic guru Maarten Lambrechts’ own list.
Locked up: Covid-19 and prisons in Europe
In the first episode of the Uncharted Territory podcast by the European Data Journalism Network, we learn about the state of prison life during the pandemic.
“Data collected by 12 newsrooms in the European Data Journalism Network, coordinated by Deutsche Welle, shows that the effort to keep the infection under control in detention institutions came at a high cost. Prisoners found themselves more isolated than ever: visits and education activities were suspended, vaccination campaigns were delayed, while overcrowding put the most vulnerable at risk.”
🛠️📖 Tools & Tutorials - part 1
Lisa Hornung’s no code data and design tools directory
Quantum Six Questions graduate Lisa Hornung is an open source and open data star. She has now created a directory of no-code data and design tools.
graphic-walker
“Graphic Walker is a different type of open-source alternative to Tableau. It allows data scientists to analyze data and visualize patterns with simple drag-and-drop operations.”
You can try it here with a very basic UI, or implement your own.
From Data to Viz
Extracting, converting, and querying data in local files using clickhouse-local
A tutorial on how to use clickhouse-local, a command-line tool that runs SQL queries directly on local files such as CSV or Parquet, without setting up a database server. It comes from ClickHouse, a column-oriented database that is open source, with a commercial cloud offering alongside it.
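clickhouse-local itself isn't reproduced here. As a rough, standard-library-only analogue of the same idea (SQL over a flat file, with no database server to run), the sketch below loads CSV text into an in-memory SQLite table and runs a single query. This is a swapped-in technique (SQLite, not ClickHouse), so it won't match clickhouse-local's speed or its Parquet support; the function name `query_csv` and the sample data are hypothetical.

```python
import csv
import io
import sqlite3


def query_csv(csv_text: str, table: str, sql: str) -> list:
    """Load CSV text into an in-memory SQLite table, run one query, return all rows.

    Every column is stored as TEXT, so cast in SQL where numbers are needed.
    Table and column names are interpolated directly: use trusted input only.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # first row supplies the column names
    conn = sqlite3.connect(":memory:")
    try:
        columns = ", ".join(f'"{name}" TEXT' for name in header)
        conn.execute(f'CREATE TABLE "{table}" ({columns})')
        placeholders = ", ".join("?" for _ in header)
        # The remaining rows stream straight from the reader into the table.
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    data = "city,population\nLondon,8900000\nLeeds,800000\nYork,210000\n"
    rows = query_csv(
        data,
        "cities",
        "SELECT city FROM cities WHERE CAST(population AS INTEGER) > 500000 ORDER BY city",
    )
    print(rows)  # [('Leeds',), ('London',)]
```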
Why is open geodata important? What’s the difference between open and closed data?
Proprietary geodata from private services like Google are widely used, but come with licenses that severely restrict how you can use the data. Typically, these licenses:
- don’t allow storing (caching) beyond a certain time period, and require deletion when you stop being a customer
- limit which maps you can use to display the geodata
- require a significantly higher fee to use the data behind a firewall or in desktop software
- give no clarity on when or if the data will be refreshed or corrected
Open data, like that returned by the OpenCage geocoding API, means you can:
- store data as long as you like
- display on any map
- use publicly or behind a firewall
- fix errors when you find them
As a final bonus, because the data is free (no cost), our service is also much more affordable.
Have a project that will need geocoding? See our geocoding buyer’s guide for an overview of all the factors to consider when choosing between geocoding services.
🛠️📖 Tools & Tutorials - part 2
Microsoft Road Detections
“This week I heard about the Microsoft Road Detections dataset, published as open data under the ODbL licence, and thought I’d take a look.
Turns out it’s a slightly unusual format: a tsv file containing a column for country codes and a column containing GeoJSON objects. Also, each file is pretty huge and contains data for multiple countries.
So… it’s not an easy drag & drop into QGIS.
But it’s the holidays, I’ve got some time on my hands, so I spent a bit of tinkering time figuring out how to pull out just the data for one country and convert it to a friendlier format.
I wrote a short 3 line bash script that downloads the Oceania data, extracts 4.6M records for Australia, and converts it to a GeoPackage so it works nicely in QGIS.
Please feel free to use the script, which should be easy to repurpose for any country covered by this data. Source: Microsoft data.”
From this LinkedIn status update.
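The script itself isn't reproduced in the post. Here is a minimal Python sketch of the same filtering step, assuming each line is `<country code><TAB><GeoJSON>` as described above and that Australia's rows use the code `AUS` (an assumption: check the actual file). It streams line by line so a very large file never has to fit in memory, and it writes GeoJSON rather than GeoPackage. The function names are hypothetical.

```python
import io
import json
from typing import Iterable, Iterator, TextIO


def iter_country_features(lines: Iterable[str], country: str) -> Iterator[dict]:
    """Yield GeoJSON Features for rows whose first column equals `country`.

    Assumes each row is "<country code><TAB><GeoJSON>". A bare geometry in the
    second column is wrapped in a Feature so the output is uniform.
    """
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        code, _, payload = line.partition("\t")
        if code != country:
            continue
        obj = json.loads(payload)
        if obj.get("type") != "Feature":
            obj = {"type": "Feature", "properties": {}, "geometry": obj}
        yield obj


def write_feature_collection(features: Iterable[dict], out: TextIO) -> int:
    """Stream features into a GeoJSON FeatureCollection; return how many were written."""
    count = 0
    out.write('{"type": "FeatureCollection", "features": [')
    for feature in features:
        if count:
            out.write(",")
        json.dump(feature, out)
        count += 1
    out.write("]}")
    return count


if __name__ == "__main__":
    # Tiny in-memory stand-in for the real TSV download.
    sample = io.StringIO(
        'AUS\t{"type": "LineString", "coordinates": [[151.2, -33.9], [151.3, -33.8]]}\n'
        'NZL\t{"type": "LineString", "coordinates": [[174.7, -36.8], [174.8, -36.9]]}\n'
    )
    out = io.StringIO()
    n = write_feature_collection(iter_country_features(sample, "AUS"), out)
    print(n)  # 1
```

Against a real download you would pass an open file handle (for example `open("oceania.tsv", encoding="utf-8")`, a placeholder filename) instead of the in-memory sample. QGIS opens GeoJSON directly, and GDAL's `ogr2ogr -f GPKG australia.gpkg australia.geojson` converts it to a GeoPackage if you want the same end result as the original script.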
The Illustrated Machine Learning website
“Our goal is to provide a visual aid for students, professionals, and anyone preparing for a technical interview to better understand the underlying concepts of Machine Learning.”
Annotated Forest Plots using ggplot2
A quick recipe.
Why parquet files are my preferred API for bulk open data
The MOJ’s Robin Linacre explains why “bulk open data is best served as statically-hosted parquet files, with csv equivalents. It’s faster, easier to use and cheaper to host than alternatives such as custom APIs.”
arXiv Xplorer
A “Semantic Search Engine for ArXiv.”
PirateWeather
“A Free, Open, and Documented Forecast API. An unprocessed weather forecast API, built to be fully Dark Sky compatible.”
It’s based on NOAA models and is run by a researcher, who will hopefully receive enough donations to keep it running.
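As a hedged sketch of what "Dark Sky compatible" means for a client: Dark Sky's API took requests of the form `/forecast/<key>/<lat>,<lon>` and returned JSON with blocks such as `currently`, so a client builds that URL and reads fields like `currently.temperature`. The base URL and the `units` parameter below are assumptions drawn from that compatibility claim, not checked against PirateWeather's own documentation; the helper names are hypothetical, and a real request needs your own API key.

```python
import json
import urllib.request
from typing import Optional, Tuple

# Assumed base URL, following Dark Sky's /forecast/<key>/<lat>,<lon> scheme.
BASE_URL = "https://api.pirateweather.net/forecast"


def forecast_url(api_key: str, lat: float, lon: float, units: str = "si") -> str:
    """Build a Dark Sky-style forecast URL (the units parameter is an assumption)."""
    return f"{BASE_URL}/{api_key}/{lat},{lon}?units={units}"


def current_conditions(payload: dict) -> Tuple[Optional[str], Optional[float]]:
    """Pull the summary and temperature out of a Dark Sky-style 'currently' block."""
    current = payload.get("currently", {})
    return current.get("summary"), current.get("temperature")


def fetch_forecast(api_key: str, lat: float, lon: float) -> dict:
    """Perform the HTTP request and decode the JSON body (needs network access)."""
    with urllib.request.urlopen(forecast_url(api_key, lat, lon), timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Offline example with a hand-written response fragment, not real API output.
    sample = {"currently": {"summary": "Clear", "temperature": 4.2}}
    print(forecast_url("YOUR_KEY", 51.5, -0.12))
    print(current_conditions(sample))
```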
3D in CSS
This is a great tutorial on how to represent 3D in pure CSS, and the rest of the website is equally amazing at introducing different CSS concepts.
These are the most hearted Pens of 2022
As in the most favourited code snippets from CodePen.
📈Dataviz, Data Analysis, & Interactive
My 2022 in reading
Another personal dataviz by Quantum graduate Erin Davis, generated using Kindle data.
NYC Slice
“Starting in 2014, I logged every slice of pizza I ate in New York City on the Instagram account NYC Slice. The results shown below are collected from 464 slices. Over an eight-year period the average price of a plain slice increased from $2.52 to $3.00. This calculation excludes dollar slices.”
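As a quick check on those figures, the change in the average plain-slice price works out as below. The per-year figure assumes the increase compounded evenly over the eight years, which the source doesn't claim.

```latex
\frac{3.00 - 2.52}{2.52} = \frac{0.48}{2.52} \approx 0.190 \quad (\approx 19\%\ \text{overall})

\left(\frac{3.00}{2.52}\right)^{1/8} - 1 \approx 0.022 \quad (\approx 2.2\%\ \text{per year})
```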
Happiness and Meaning in What We Do
One of those nice, simple, powerful interactive charts that Nathan Yau is famous for.
[OC] I tracked every hour of my life for 5 years.
🤖 AI
A college student created an app that can tell whether AI wrote an essay
“Edward Tian, a 22-year-old senior at Princeton University, has built an app to detect whether text is written by ChatGPT, the viral chatbot that’s sparked fears over its potential for unethical uses in academia.”
What is AGI-hard
“At this point, knowing what AI can’t do is more useful than knowing what it can”.
quantum of sollazzo is supported by ProofRed’s excellent proofreading. If you need high-quality copy editing or proofreading, head to http://proofred.co.uk. Oh, they also make really good explainer videos.
Supporters*: casperdcl and iterative.ai, Jeff Wilson, Fay Simcock, Naomi Penfold
[*] This is for all $5+/month GitHub sponsors. If you are one of those and don’t appear here, please e-mail me.