444: quantum of sollazzo
#444: quantum of sollazzo – 26 October 2021
The data newsletter by @puntofisso.
Hello, regular readers and welcome new ones :) This is Quantum of Sollazzo, the newsletter about all things data. I am Giuseppe Sollazzo, or @puntofisso. I’ve been sending this newsletter since 2012 to be a summary of all the articles with or about data that captured my attention over the previous week. The newsletter is and will always (well, for as long as I can keep going!) be free, but you’re welcome to become a friend via the links below.
This week, the newsletter is slightly longer than usual, as I skipped last week’s.
I travelled to Iceland last week, seeing recent volcanic eruptions and feeling the heat still coming out of the lava, walking inside a glacier, and bathing in natural hot springs. But one of my favourite trips was to see the famous DC-3 plane wreck, something that’s been on my list for years and that I wanted to see before it fatally crumbles – it’s literally dusting away as the years pass.
But… the wreckage also offers some reflections, as in my tweet below. There were 7 people on board that plane when it crash-landed, and they all survived. However, at least 2 people have died in connection with it: tourists trying to visit the wreckage while ill-equipped or not fully informed. Risk assessment is highly contextual. My picture below captures what is – by Icelandic October standards – a glorious day. But weather in Iceland is highly changeable, even more than in Britain. A snowstorm is always a possibility. High winds could make walking hard for people who are not at the top of their fitness. Would you believe that a flat 2.8km walk could kill you and that crash-landing in the middle of a desert could not?
Every week I include a six-question interview with an inspiring data person. This week, I speak with Prukalpa Sankar of Atlan. Some of you might know her as the data infrastructures and platforms guru whose enlightening blog posts I’ve often featured in this newsletter.
David Kane and others, including the NCVO – the organisation that groups together charities and other voluntary organisations – have launched a new UK Charity Classification, with a view to reducing the far too many “catch-all” categories. They say: “we took a sample of over 4,000 registered charities and manually classified each one, creating new tags as we went along and encountered different types of charities. This sample could then be used to generate and test keyword-based rules for automatic classification of charities, as well as training machine-learning models.”
All the code and methods are openly available.
As you might or might not know, yours truly has been advising a group of academics – some of whom are former colleagues at St George’s, University of London – on an interesting research project that is trying to use knowledge graphs to detect “hubris” in leaders.
The project is now looking for its first PhD Students with a fully funded scholarship. If you are interested or know someone who could be, please direct them to the PhD scholarship advert and the Knowledge4Hubris project page.
‘till next week,
Giuseppe @puntofisso
Six questions to...
Prukalpa Sankar
Prukalpa is co-founder at Atlan.
What is your daily data work like and what tools do you use?
Unfortunately, I don’t get to actually work with data as much as I’d like in my current role, because I’m spending most of my time running Atlan, which helps other data teams around the world. But the data stack has become so amazing in the last few years. If I got a chance to get my hands dirty again, I’d love for my stack to be a data warehouse like Snowflake; dbt for data transformation; Looker, Mode, and Sigma on analytics; and Atlan, of course, as the collaborative workspace.
Tell me about a data project that you're proud of...
Our work with the Government of India in bringing clean cooking fuel to 80m women below the poverty line. By leveraging dynamic monitoring and analytics, it became the first government program in India to surpass its annual target by 22%. Part of this project was opening 10,000 new fuel distribution centers, which was complex because we were balancing two conflicting questions — profitability and accessibility (i.e. a center within 10 kms of every home). We ended up mapping India’s 600k+ villages, layering 600+ datasets, and running a geo-clustering algorithm to identify the best locations.
...and a data project that someone else did and you're jealous of.
There’s so much happening these days, but I want to call out the data projects that researchers, scientists, and analysts have been doing to fight COVID. They’ve been creating amazing research, models and, of course, vaccines faster than anyone imagined. A few recent examples are the Terasaki Institute’s AI system to detect COVID from a single lung scan, the Mayo Clinic and nference’s NLP work to predict COVID complications, and Andrew Barber & Jeremy West’s analysis that Ohio’s vaccine lottery saved over $60m in averted healthcare costs.
If I say "dataset", you think of...
The Census. I find it fascinating that the oldest data projects in history trace back to governments and even kings wanting to create a better society (and collect taxes).
Give someone new to data a tip or lesson you wish you'd learned earlier.
This is one of the rare technical fields where what you studied doesn’t really matter, because no one really studied data science in school! The only thing that matters is learning and curiosity.
Data is or data are...
Data is :)
Become a Friend of Quantum of Sollazzo → If you enjoy this newsletter, you can support it by becoming a GitHub Sponsor. Or you can Buy Me a Coffee. I'll send you an Open Data Rottweiler sticker. You're receiving this email because you subscribed to Quantum of Sollazzo, a weekly newsletter covering all things data, written by Giuseppe Sollazzo (@puntofisso). If you have a product or service to promote and want to support this newsletter, you can sponsor an issue.
Topical
COVID threads
The Financial Times’ voice of rationality, data journalist John Burn-Murdoch, has published two brilliant Twitter threads about COVID. The first explains population statistics when things look odd (e.g. what “>100% of the population is vaccinated” actually means) and is ultimately about the perils of using data without fully appreciating the methodology that created/processed that data in the first place. The second covers the often misinformed debate about cases, hospitalisations and deaths.
IATI COVID-19 Funding Dashboard
“This dashboard was created to visualize, explore and analyze all of the published IATI (International Aid Transparency Initiative) data that is related to the coronavirus pandemic. It was developed by the OCHA Centre for Humanitarian Data.”
How Much Carbon Dioxide Are We Emitting?
“Visualizing the Quantities of Climate Change”.
Flooding could shut down one-quarter of America’s critical infrastructure
According to this article on Grist, “hospitals, airports, and other public services are at risk.” The numbers are scary.
(via Mobility Matters Daily)
Russian election tampering
The Economist’s data journalist Sondre Ulvund Solstad explains the thinking behind their visualization of Russian election voting patterns, which shows, according to the author, that the election has some fraudulent results even though the chart cannot show exactly where.
Corona Memories
“What are the measures & policy responses connected to data about the pandemic?
Since the start of the pandemic about 641 days ago, we are confronted with charts about new cases or even deaths. What are the events behind the numbers?”
A catchy and interactive data visualization by researchers at the University of Applied Sciences Potsdam.
Tools & Tutorials
Lisa Hornung’s Thursday Tools
Data Analyst Lisa Hornung shares her favourite mostly data-related tools in a Twitter thread every Thursday. Keep an eye on Lisa: she’s one of data Twitter’s must-follows.
Area Explorer
“Six Questions” graduate and census dataviz guru Ahmad Barclay has created a brilliant prototype of a Census Area Explorer for the Office for National Statistics.
Why “Python” is the best coding language for data journalism
Ah, those trolls at The Economist ;-) After convincing everyone that R was the best coding language for data journalism in their previous newsletter, they have now done the same with Python. Write-up by Dolly Setton, one of their Data Journalism team.
D3 Charts
“A guide to D3’s reusable example charts.”
NOAA Historical Hourly Weather Data by U.S. City
“An interface for quickly downloading historical weather data from NOAA”, fully functional in this Observable notebook.
(via Jeremy Singer-Vine’s Data is Plural)
Google Chrome Hidden Features Every Developer Should Know
Interesting set of features I wasn’t really aware of. The CSS Overview tool is pretty.
Map Projection Playground
We’ve seen plenty of map projection tools before, but this Observable notebook by geography lecturer Florian Ledermann is particularly easy to use. It’s for his course “Cartographic and Geodetic Foundations for Planners” at the TU Wien.
robservable
“This package allows the use of Observable notebooks (or parts of them) as htmlwidgets in R.”
The State Of Web Scraping in 2021
“In this post, we will cover:
What is web scraping?
What are the main programming frameworks for web scraping?
What are some of the main enterprise-level paid web scraping frameworks?
A Python web scraping example where we extract some information from a site with Beautiful Soup
A JavaScript (Node.js) example where we interact with Google Search using Puppeteer
The Do’s and Don’ts of Web Scraping”
Weather Spark
“WeatherSpark.com offers detailed reports of the typical weather for 145,449 locations worldwide.”
Dr Bahareh Heravi’s tools collection
“Every year I share a collection of useful tools for data journalism and data storytelling with my students. Sharing it with a wider world here.”
How to match data with VLOOKUP in Excel & Google Sheets
VLOOKUP is probably the single most useful function in spreadsheet applications. Here Lisa Charlotte Muth of Datawrapper shows how to use it.
Data thinking
Making data visualizations more accessible
“Researchers find blind and sighted readers have sharply different takes on what content is most useful to include in a chart caption.”
(via DataNinja)
Statistical problems found when studying Long Covid in kids
“Statistical tests need to be paired with proper data and study design to yield valid results. A recent review paper on Long Covid in children provides a useful example of how researchers can get this wrong. We use causal diagrams to decompose the problem and illustrate where errors were made.” Interesting read by fast.ai.
How Wavelets Allow Researchers to Transform, and Understand, Data
“Built upon the ubiquitous Fourier transform, the mathematical tools known as wavelets allow unprecedented analysis and understanding of continuous signals.”
Machine learning is not nonparametric statistics
A take on whether machine learning is statistics or not – the author strongly supports the view that the two are essentially different. I’m not entirely sure that the case is entirely well argued, but it’s a good starting point to this side of the debate.
Dataviz & Interactive
Where to pass the Great British driving test
Not all driving test centres are created equal. This Tableau visualization uses DVSA data in order to show where… you have better chances of passing.
(via Mobility Matters Daily)
The give and take of volcanoes
Anna Thieme at Datawrapper shows how to make a map of the volcano eruption in La Palma.
Visualizing Religious Diversity
A very good piece of student work that was longlisted in the Information is Beautiful Awards. “The visualizations presented here are designed in an abstract form of a networked mesh of the places of worship of different dominant faiths in the country.”
A view on despair
No image here, as this piece comes with a warning: it’s about suicide. If you feel impacted or have suicidal thoughts, please contact the Samaritans.
This article analyses suicide in the Netherlands in 2017.
Werner’s Nomenclature of Colours
“I recognize the system Werner devised isn’t as useful as it used to be when it was devised so many years ago but I enjoy breathing new life into classic works of art so I chose to recreate it online.
The result is something that’s hopefully interesting for those just discovering Werner’s guide and those that may already be familiar with it and want to discover it in a new light.”
This is just too pretty not to link to, it’s fully interactive, and it might trigger great questions on where colours come from.
AI
Image analysis of all 27,682 election campaign photos of the parties
“With the help of automatic image recognition, we analyzed all Instagram photos of the Bundestag candidates.”
Original German and automatic English translation.
Netflix thumbnails
“A Netflix user will browse the app for 90 seconds and leave if they find nothing. Thumbnail artwork is actually NFLX’s most effective lever to influence a viewer’s choice. A user will look at one for only 1.8 seconds, so NFLX spends huge to optimize them.”
A very interesting Twitter thread.
What does Google think the minimum wage is?
Terence Eden has written this pretty enjoyable (but grim) article about the dangers of Google’s automatic summarisation in its search info-box. In some cases, with potential risks to life.
quantum of sollazzo is supported by ProofRed’s excellent proofreading service. If you need high-quality copy editing or proofreading, head to http://proofred.co.uk. Oh, they also make really good explainer videos.