quantum of sollazzo

September 21, 2021

#440: quantum of sollazzo – 21 September 2021

The data newsletter by @puntofisso.


Hello, regular readers, and welcome, new ones :) This is Quantum of Sollazzo, the newsletter about all things data. I am Giuseppe Sollazzo, or @puntofisso. I've been sending this newsletter since 2012 as a summary of the articles with or about data that caught my attention over the previous week. The newsletter is, and always will be (well, for as long as I can keep going!), free, but you're welcome to become a friend via the links below.

·

Recently I've been looking at the Fitzpatrick Scale, a classification of human skin colour. It was originally developed in dermatology to describe skin shades, and I've been reading about it while researching how to address bias in data for AI applications. This raises some obvious questions, such as "if we need to collect images on which to build a machine learning model, what is the best unbiased distribution of images on the Fitzpatrick Scale?".

In other words, is it best to have an equal number of images per step of the scale (a uniform distribution), or should images be allocated according to some other distribution? And this assumes that the best approach is to build a single model, as opposed to, for example, a separate model per step of the scale. All of this is very abstract, of course, and it may be necessary to know the full context of the application to really understand the right thing to do.
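
To make the question concrete, here is a minimal Python sketch that splits a fixed image budget across the six Fitzpatrick types (I to VI), either uniformly or in proportion to a set of weights. The budget of 1,000 images and the proportional weights are hypothetical placeholders, not real population figures, and largest-remainder rounding is just one way of keeping the counts summing to the budget. It doesn't say which distribution is "unbiased"; it only shows how the two options differ.

```python
from fractions import Fraction

FITZPATRICK_TYPES = ["I", "II", "III", "IV", "V", "VI"]

def allocate(budget, weights):
    """Split `budget` images across types in proportion to `weights`,
    using largest-remainder rounding so the counts sum exactly to `budget`."""
    total = sum(weights.values())
    exact = {t: Fraction(budget * w) / total for t, w in weights.items()}
    counts = {t: int(q) for t, q in exact.items()}  # floor, since all values are non-negative
    leftover = budget - sum(counts.values())
    # Give the remaining images to the types with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda t: exact[t] - counts[t], reverse=True)
    for t in by_remainder[:leftover]:
        counts[t] += 1
    return counts

# Uniform: every step of the scale gets (almost) the same number of images.
uniform = allocate(1000, {t: 1 for t in FITZPATRICK_TYPES})
# Proportional: hypothetical weights, e.g. mirroring some target population.
proportional = allocate(1000, {"I": 5, "II": 20, "III": 30, "IV": 25, "V": 12, "VI": 8})
print(uniform)
print(proportional)
```

With 1,000 images, the uniform split can't be exactly equal across six types, so four types get one extra image each.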

Interestingly, the Fitzpatrick Scale is also the basis for emoji skin modifiers.

The Fitzpatrick scale and the risk of skin cancer (CC BY 3.0; John D’Orazio et al., "UV Radiation and the Skin", Int. J. Mol. Sci. 2013, 14(6), 12222-12248; doi:10.3390/ijms140612222)

·

My "Six Questions" series is taking a few weeks' break after 14 issues while I prepare the next series. This gives me the opportunity to ask: who would you like to see featured in the next series of interviews? I'm just a few interviews away from completing it, so please do shout.

Also, in case you're in the mood for some GCP and don't have an account... "Whenever someone signs up for a Google Cloud free trial using your referral link, they'll get US $350 worth of credits — that's US $50 more than the standard free trial offer. When any of your referrals become a paying customer, you will earn US $100 free credits which will be deposited automatically into your account. There's no cap on how many people you refer or how many rewards you receive. GCP Referral Code."

·

'Til next week,
Giuseppe @puntofisso


Become a Friend of Quantum of Sollazzo →

If you enjoy this newsletter, you can support it by becoming a GitHub Sponsor. Or you can Buy Me a Coffee. I'll send you an Open Data Rottweiler sticker.
Quantum of Sollazzo will always be free.

You're receiving this email because you subscribed to Quantum of Sollazzo, a weekly newsletter covering all things data, written by Giuseppe Sollazzo (@puntofisso). If you have a product or service to promote and want to support this newsletter, you can sponsor an issue.


Topical

When will Europe phase out coal?

Lisa Charlotte Muth of Datawrapper, a well-known figure to our readers, looks at how countries in Europe and beyond are trying to move from coal to renewable sources of energy.

How common is long COVID? That depends on how you measure it

The Office for National Statistics' Daniel Ayoubkhani explains the work behind their estimate that "643,000 people in private households in the UK could be experiencing activity-limiting long COVID symptoms."

How The New York Times changed after September 11

"An analysis of 1,574,957 articles investigates changes in the NYT's evolution between September 2001 and December 2020", with all source code available. By Francesco Piccinelli.

COVID cases hold steady while hospitalizations, deaths rise

According to Axios, things aren't going too well in the US: "While the overall COVID case rate increased less than 1% over the last two weeks, hospitalizations increased 4% in the same time frame, and the seven-day rolling average of deaths rose 29%."
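
For readers less familiar with the metric, a seven-day rolling average smooths out day-of-week reporting effects by averaging each day with the six before it, and the "rose 29%" figure compares two such averages. A minimal sketch of both calculations, using a synthetic series (not real COVID data) and hypothetical helper names:

```python
def rolling_mean(values, window=7):
    """Trailing mean: output entry k averages values[k : k + window]."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [sum(values[i - window + 1 : i + 1]) / window
            for i in range(window - 1, len(values))]

def pct_change(old, new):
    """Percentage change from `old` to `new`."""
    return (new - old) / old * 100

# Synthetic daily counts for three weeks (illustrative only, not real data).
daily = [10] * 7 + [12] * 7 + [14] * 7
avg = rolling_mean(daily)
# Compare the first full seven-day average with the one two weeks later.
change = pct_change(avg[0], avg[14])
print(f"7-day average went from {avg[0]:.1f} to {avg[14]:.1f} ({change:+.0f}%)")
```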

Israel’s Rising Covid Cases Aren’t So Scary This Time

"While the news may look grim, and the government has been scrambling to respond, the reality is more prosaic: Israel hasn’t defeated the virus, but it has probably redefined what success against the virus looks like. It’s a messy reality, but not an intolerable one."

Data suggests you should get the COVID-19 vaccine

At least in New Zealand, according to this well-illustrated data analysis.

Tools & Tutorials

skimpy

"skimpy is a light weight tool that provides summary statistics about variables in data frames within the console. Think of it as a super version of df.describe()."
A useful tool for Exploratory Data Analysis (EDA) created by the ONS' Arthur Turrell.
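
I haven't reproduced skimpy's own API here. As a rough illustration of the kind of per-column summary df.describe() gives for numeric data (count, mean, standard deviation, min, quartiles, max), here is a standard-library-only sketch for a single column. The describe helper is hypothetical, and the quartiles use linear interpolation ("inclusive" method), which I believe matches pandas' default.

```python
import statistics

def describe(values):
    """Rough stdlib analogue of the numeric summary pandas' describe() prints."""
    data = sorted(values)
    # Three cut points split the data into quarters, interpolating linearly.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "count": len(data),
        "mean": statistics.fmean(data),
        "std": statistics.stdev(data),  # sample standard deviation (divides by n - 1)
        "min": data[0],
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": data[-1],
    }

print(describe(range(1, 11)))
```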

GlueSQL

"GlueSQL is a SQL database library written in Rust. It provides a parser (sqlparser-rs), execution layer, and optional storage (sled) packaged into a single library. Developers can choose to use GlueSQL to build their own SQL database, or as an embedded SQL database using the default storage engine."

Open-Meteo

"Open-Meteo offers free weather forecast APIs for open-source developers and non-commercial use. No API key is required."
Yes, non-commercial use only, which should be OK for most users. But I'm a little puzzled as to why they have chosen this licence, and where the data comes from (they say National Weather Forecasters).
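
Since no API key is needed, a request is just a URL. A minimal sketch, assuming the /v1/forecast endpoint and the latitude, longitude and hourly=temperature_2m parameters described in Open-Meteo's documentation at the time of writing (check the current docs before relying on them); forecast_url and fetch_forecast are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint, per Open-Meteo's docs at the time of writing.
BASE_URL = "https://api.open-meteo.com/v1/forecast"

def forecast_url(latitude, longitude, hourly=("temperature_2m",)):
    """Build a forecast request URL; multiple hourly variables are comma-separated."""
    params = {"latitude": latitude, "longitude": longitude, "hourly": ",".join(hourly)}
    return f"{BASE_URL}?{urlencode(params)}"

def fetch_forecast(latitude, longitude):
    """Download and parse the JSON forecast (no API key; non-commercial use only)."""
    with urlopen(forecast_url(latitude, longitude), timeout=10) as resp:
        return json.load(resp)

# Central London, roughly; this only prints the URL and makes no network call.
print(forecast_url(51.51, -0.13))
```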

FullFact Content API

Fact-checking organisation FullFact is now granting access to its API (though there's an authorisation process to go through). "You can use this API to retrieve fact checks and fact check metadata."

zinggAI

ZinggAI offers "scalable fuzzy matching for data mastering, deduplication and entity resolution". It includes connectors to most of the big platforms.
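
Zingg's own matching is model-based and runs on large data platforms; I'm not reproducing it here. As a much simpler illustration of the underlying idea of fuzzy matching for deduplication, this sketch swaps in the standard library's difflib.SequenceMatcher to score pairs of normalised names and flag those above a threshold. The 0.85 threshold and the sample names are hypothetical, and comparing every pair is quadratic, which is exactly why dedicated tools exist for real datasets.

```python
from difflib import SequenceMatcher
from itertools import combinations

def normalise(name):
    """Lower-case, drop full stops and collapse whitespace before comparing."""
    return " ".join(name.lower().replace(".", "").split())

def similarity(a, b):
    """Similarity ratio in [0, 1] between two normalised strings."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()

def candidate_duplicates(records, threshold=0.85):
    """Return every pair of records whose normalised names look alike."""
    return [(a, b) for a, b in combinations(records, 2)
            if similarity(a, b) >= threshold]

names = ["Acme Ltd.", "ACME Ltd", "Acme Limited", "Globex Corp"]
print(candidate_duplicates(names))
```

Note that "Acme Limited" scores below the threshold: string similarity alone misses abbreviations, which is one reason matching tools learn from labelled examples instead.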

Mockachino

Mockachino is a website allowing users to create and run "a mock JSON API in 10 seconds."

How percentile approximation works (and why it's more useful than averages)

Although this article is set in the context of TimescaleDB, some of the chunkier bits are about understanding percentiles and how to use them in PostgreSQL, so they might be useful more generally.
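
The core argument is easy to see with a few numbers. This sketch computes exact percentiles with Python's statistics module (not the approximation algorithms the article discusses) on a synthetic set of response times, where a handful of slow requests pulls the mean far from what a typical request experiences. The data and the summarise_latency name are hypothetical.

```python
import statistics

def summarise_latency(samples):
    """Compare the mean with exact percentiles (no approximation sketch)."""
    # 99 cut points: cuts[49] is the 50th percentile, cuts[98] the 99th.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(samples),
        "p50": cuts[49],
        "p99": cuts[98],
    }

# Synthetic response times in ms: most requests are fast, a few are very slow.
latencies = [10] * 95 + [500] * 5
print(summarise_latency(latencies))  # {'mean': 34.5, 'p50': 10.0, 'p99': 500.0}
```

No request actually took 34.5 ms: the median describes the typical request and the p99 describes the tail, while the mean blurs the two. PostgreSQL's built-in percentile_cont ordered-set aggregate computes exact percentiles in SQL; approximation becomes attractive when the data is too large to sort.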

Data thinking

Possession Is The Puzzle Of Soccer Analytics. These Models Are Trying To Solve It.

"In 2011, a software developer named Sarah Rudd got fed up with conventional soccer stats. It was easy to count how many passes a player attempted and completed, but not all passes are equal. “We know passing percentage is a terrible metric for evaluating how good of a passer you are,” she told me. What were those actions really worth?"
Great piece by FiveThirtyEight: not all data/analytics matter.
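
To see why completion rate misleads, here's a toy comparison in the spirit of the possession-value models the piece describes, not a reproduction of any of them. Each pitch zone gets a hypothetical probability that possession there leads to a goal (the numbers are invented), and a completed pass is credited with the change in that value. A real model would use many more zones, learn the values from data, and penalise turnovers; here failed passes simply score zero.

```python
# Hypothetical probability that possession in each zone leads to a goal.
ZONE_VALUE = {"defence": 0.01, "midfield": 0.03, "attack": 0.12}

def pass_completion(passes):
    """Fraction of attempted passes that were completed."""
    return sum(p["completed"] for p in passes) / len(passes)

def value_added(passes):
    """Total change in zone value from completed passes (failed passes count as 0)."""
    return sum(ZONE_VALUE[p["end"]] - ZONE_VALUE[p["start"]]
               for p in passes if p["completed"])

# A "safe" passer who only moves the ball sideways at the back...
safe = [{"start": "defence", "end": "defence", "completed": True}] * 10
# ...and a "risky" passer who tries to break lines, failing 3 times out of 10.
risky = ([{"start": "midfield", "end": "attack", "completed": True}] * 7
         + [{"start": "midfield", "end": "attack", "completed": False}] * 3)

for name, passes in [("safe", safe), ("risky", risky)]:
    print(name, f"completion={pass_completion(passes):.0%}",
          f"value added={value_added(passes):.2f}")
```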

Why data scientists shouldn’t need to know Kubernetes

"This post is to argue that while it’s good for data scientists to own the entire stack, they can do so without having to know K8s if they leverage a good infrastructure abstraction tool that allows them to focus on actual data science instead of getting YAML files to work."
Yes. And no. :-)
Maybe I'm relating this too much to my own experience, but in my case, having fun at work and being productive required both some degree of specialist knowledge and a good generalist approach. And I don't just mean now, when I'm doing very little technical work: this was even more true when I had a data centre with 300 servers running the most disparate services and platforms, and a total team of 3 to run it.

Dataviz & Interactive

Who would win if a general election was held tomorrow?

"Make your prediction and see the latest forecast with Britain Predicts, the New Statesman's election calculator": the latest interactive polling forecast by Patrick E. Scott and team at the New Statesman.

What did Earth look like ... million years ago?

Basically, find your home on Pangaea.

Traffic filters etc

"Traffic filters and low-traffic neighbourhoods have been the target of some toxic discourse recently. So we decided to set the record straight. We did some door-knocking around the country and we’ve got some heartening stats for everyone."
An interesting Twitter thread from environmental charity Possible, with illustrations by Duncan Geere.

Heritage Connector visualisation

By researcher Kalyan Dutia at the Science Museum, "this is a visualisation of a knowledge graph created from the Science Museum Group's online collection, blog and academic journal using machine learning. It's part of the Heritage Connector project."
The project is described here.

AI

AI Skunkworks projects

Sorry to blow my own trumpet again ;-) But we've now published a web page with summaries of all our ongoing projects, which we'll keep up to date as we complete them and onboard new ones. Suggestions and feedback are much appreciated.


quantum of sollazzo is supported by ProofRed's excellent proofreading service. If you need high-quality copy editing or proofreading, head to http://proofred.co.uk. Oh, they also make really good explainer videos.
