417: quantum of sollazzo
#417: quantum of sollazzo – 13 April 2021
The data newsletter by @puntofisso.
Ah, this tweet by Will Geary. Wars were started over much less. Does nobody really use dashboards? Or is it, maybe, that most dashboards are unusable and badly designed? There are a few interesting reflections both in the thread by Will and in the many replies to it and my quote-tweet of it.
Two concepts seem to emerge: first, that the only useful dashboards are those that tell you at a glance if something has gone wrong, as opposed to those that promise deeper insight about problem X; second, that dashboards can be meta: the due diligence required to build them will tell you if you have the right data infrastructure in place or if you’ve got work to do. Where do you stand?
Interesting post by Facebook’s Nick Clegg here on the role of his employer in keeping user behaviours in check. Needless to say, Clegg argues that Facebook is not to blame for users’ misbehaviour, specifically arguing that “the algorithm” just highlights existing thoughts and opinions. As much as that is true, as an AI practitioner I disagree strongly.
When using Artificial Intelligence, we must all reflect on our own responsibility. To put it simply, highlighting a behaviour comes with a very high risk of encouraging it: “the algorithm” influences behaviours by making the rare seem ordinary. I believe it is our role as AI practitioners to make sure we don’t lose track of this.
An interesting free workshop is happening on 27/4, run by GIS guru Prof Maria Antonia Brovelli of Milan Polytechnic: Satellite Data Analysis and Machine Learning Classification with QGIS, part 1. Part 2 will be on 11/5. The workshop “introduces how classification of satellite imagery can be done with QGIS by showing how to retrieve, process and classify satellite imagery, as well as how to assess performance of machine learning algorithms through error matrix and accuracy indexes.”
To close, here’s some Hansard joy for you:
Your links are below.
Till next week,
Giuseppe @puntofisso
At the European Data Journalism Network we’ve been producing a dozen articles per month since 2017, covering plenty of issues affecting European citizens.
We’ve been paying close attention to social issues, with a special focus on health matters (from mental health to the causes of death) and on individuals and communities living at the margins, i.e. people exposed to poverty, poor infrastructure and few opportunities.
The climate crisis was the focus of some of our most ambitious investigations to date: we’ve explored the impact of global warming in Europe, as well as the main sources of carbon emissions.
The state of democracy in Europe is another of our main concerns, leading to the coverage of infringements of EU law, threats to digital rights, and the influence of big tech companies.
We also have a weekly newsletter.
Topical
Following the Science
“A look at the global research effort to combat the coronavirus pandemic.”
This is one of those brilliant interactive uses of scrollytelling by The Pudding.
See How Rich Countries Got to the Front of the Vaccine Line
The New York Times illustrates global inequalities in the distribution of COVID-19 vaccines.
China’s Covid Rebound Edges It Closer to Overtaking U.S. Economy
“Since the 1970s, China has been racing to become the world’s largest economy. Its recovery from the pandemic means it could eclipse the U.S. this decade.”
I link to this article fundamentally for a single reason: the racetrack visualization.
Predicting FT Trending Topics
Adam Gajtkowski of the Financial Times shows how the team used time-series analysis and unsupervised machine learning to identify signals and, allegedly, “help journalists write more relevant stories”.
A People Map of the US
“…where city names are replaced by their most Wikipedia’ed resident: people born in, lived in, or connected to a place.” Methodology notes: data collected using the Wikipedia API and using the “People from X city” pages on Wikipedia. The top person from each city was determined by using median pageviews (with a minimum of 1 year of traffic).
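To make the ranking rule in the methodology notes concrete, here is a minimal sketch in Python. It is not the map authors' code: the function name, the monthly granularity and all the pageview numbers are made up for illustration. It shows why a median is a sensible choice, as a single viral month does not push someone to the top.

```python
from statistics import median

def top_person(monthly_views, min_months=12):
    """Return the name with the highest median pageviews.

    People with fewer than `min_months` data points are skipped, which is
    my reading of the "minimum of 1 year of traffic" rule (assumption).
    """
    medians = {
        name: median(views)
        for name, views in monthly_views.items()
        if len(views) >= min_months
    }
    return max(medians, key=medians.get) if medians else None

# Hypothetical monthly pageviews for three people linked to the same city.
views = {
    "Person A": [1000] * 12,            # steady interest all year
    "Person B": [100] * 11 + [50000],   # one viral month, otherwise quiet
    "Person C": [9000, 9500, 8700],     # popular, but under a year of data
}

winner = top_person(views)
print(winner)
```

With a mean instead of a median, the one-off spike would have put Person B ahead of Person A.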
Data v Science
“Everyone wants to do the model work, not the data work”: Data Cascades in High-Stakes AI
ACADEMIC PAPER KLAXON. “In this paper, we report on data practices in high-stakes AI, from interviews with 53 AI practitioners in India, East and West African countries, and USA. We define, identify, and present empirical evidence on Data Cascades—compounding events causing negative, downstream effects from data issues—triggered by conventional AI/ML practices that undervalue data quality.” By a Google Research team. Sorry, it’s a PDF. Take care of your data quality, folks!
The ghosts in the data
Related to and citing the paper above is this brilliant blog post by Vicki Boykis.
The basic block of labor of machine learning is cleaning data and setting up engineering pipelines, the detailed and tedious work of making all the pieces fit together. However, there is no way you can learn this.
Building a Data Platform in 2021
“How to build a modern, scalable data platform to power your analytics and data science projects.”
This is a good blog post articulating several aspects of building a data science platform through the steps of data integration, building a data warehouse, transformation, presentation, and transportation.
(via Guy Lipman)
Alternatives to a Log Scale
Yet another take on the use of log scale (which the pandemic has turned into the pie charts of these years), advocating the use of callout boxes.
Data scientists are predicting sports injuries with an algorithm
“Machine learning can tell athletes when to train and when to stop.”
You’re probably measuring your treatment effect incorrectly
“You’ve developed a new health program and you want to measure its impact. You run a randomized experiment where half of participants are allowed to get treatment, and half are not. Of those allowed to get treatment, 80% actually choose to get treated.
You would like to report the answer to a very basic question: How much did the treatment help the people who got treated? You might think that’s a simple question to answer. But you’d be wrong.”
Useful resource for journalists reporting about vaccines and therapies.
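A rough sketch of why the question is harder than it looks, in Python. I am assuming the standard textbook adjustment here (divide the "offered vs not offered" difference by the take-up rate), which only holds if the offer was randomised, nobody in the control group got treated, and the offer has no effect on people who declined. The linked post may use a different estimator, and the outcome numbers below are invented; only the 80% take-up comes from the quote.

```python
def effect_on_treated(mean_offered, mean_control, take_up_rate):
    """Return (intention-to-treat effect, estimated effect on the treated).

    mean_offered: average outcome of everyone OFFERED treatment,
                  including those who declined it.
    mean_control: average outcome of the group not offered treatment.
    take_up_rate: share of the offered group that actually got treated.
    """
    if not 0 < take_up_rate <= 1:
        raise ValueError("take_up_rate must be in (0, 1]")
    itt = mean_offered - mean_control
    # The whole difference is produced by the share who took the treatment,
    # so the per-treated-person effect is larger than the ITT.
    return itt, itt / take_up_rate

# Hypothetical outcome scores; 0.8 is the take-up rate from the quote.
itt, treated_effect = effect_on_treated(12.0, 10.0, 0.8)
print(itt, treated_effect)
```

Simply comparing those who got treated with those who did not would be misleading, because people who choose treatment can differ from those who decline it.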
Tutorials, Tools, Resources
How to vectorise the Environment Agency’s Reservoir Flood Map Outline data
If you’re interested in using the Environment Agency’s Risk of Flooding from Reservoir outline maps, or fancy a challenge in learning a GIS workflow, this blog post by Owen Boswarva explains “how to derive vector data from Web Mapping Service (WMS) layers using QGIS.”
Elevation Scan
This is a pretty and quirky idea: how to create a “scan-like” animation that shows geographical data variance over a map. It uses D3.
Dimension reduction 1
A step-by-step tutorial on Principal Component Analysis (PCA), a technique that allows you to reduce the number of dimensions of a multidimensional dataset and, as a consequence, to visualize datasets with many variables.
The same author is releasing subsequent courses in this series, so keep an eye out.
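For a feel of what PCA does, here is a minimal sketch of the smallest possible case, two variables reduced to one, in plain Python. It is my own illustration rather than code from the tutorial, which may well use other tools, and the toy data is invented. The idea: centre the data, build the covariance matrix, and project onto the direction with the largest variance.

```python
import math

def first_principal_component(points):
    """PCA for 2-D points: return (direction, scores, explained_ratio)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)

    # Largest eigenvalue of a symmetric 2x2 matrix (closed form).
    largest = (sxx + syy) / 2 + math.hypot((sxx - syy) / 2, sxy)

    # Matching eigenvector; fall back to an axis if the variables are uncorrelated.
    if sxy != 0:
        vx, vy = sxy, largest - sxx
    elif sxx >= syy:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    direction = (vx / norm, vy / norm)

    # One number per point: its position along the principal direction.
    scores = [x * direction[0] + y * direction[1] for x, y in centered]
    explained_ratio = largest / (sxx + syy)
    return direction, scores, explained_ratio

# Toy data: y is roughly 2x, with a small alternating wobble.
points = [(i, 2 * i + (0.1 if i % 2 else -0.1)) for i in range(10)]
direction, scores, explained_ratio = first_principal_component(points)
print(direction, explained_ratio)
```

Because the toy points sit almost on a line, one component keeps nearly all of the variance. With many variables the same steps apply, but you would use a linear algebra library to find the eigenvectors.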
SQLite is not a toy database
“Whether you are a developer, data analyst, QA engineer, DevOps person, or product manager - SQLite is a perfect tool for you.”
I would add: data journalist.
SQLite has been part of my arsenal since I first encountered it in 2011 while doing iOS development, and it surprises me every single time with its versatility and performance.
(By the way, here is the official, yet pretty honest “when to use SQLite” stance from its developers.)
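If you have never tried it, this is roughly what "no setup" means in practice: a minimal sketch using the sqlite3 module that ships with Python, with an in-memory database. The table, column names and figures are invented for illustration.

```python
import sqlite3

# Hypothetical rows: (city, year, number of reports).
rows = [
    ("Leeds", 2019, 120),
    ("Leeds", 2020, 130),
    ("York", 2019, 40),
    ("York", 2020, 50),
]

# ":memory:" creates a throwaway database; pass a file path to persist it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reports (city TEXT, year INTEGER, n_reports INTEGER)")
conn.executemany("INSERT INTO reports VALUES (?, ?, ?)", rows)

# Plain SQL aggregation, no server to install or configure.
totals = conn.execute(
    "SELECT city, SUM(n_reports) AS total "
    "FROM reports GROUP BY city ORDER BY total DESC"
).fetchall()
conn.close()

for city, total in totals:
    print(city, total)
```

Swapping ":memory:" for a filename gives you a single-file database you can email to a colleague.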
100m Posts Analyzed: What You Need To Write The Best Headlines
If you can excuse its clickbaity title, this article by BuzzSumo is pretty interesting.
Become a GitHub Sponsor. It costs about the price of a coffee per month, and you’ll get an Open Data Rottweiler sticker (and other stuff).
If you’re a supporter of this newsletter, thanks a lot for your support. Share this e-mail with a friend, or via social media.
quantum of sollazzo is supported by my GitHub Sponsors and Buy Me A Coffee supporters, and by ProofRed’s excellent proofreading service. If you need high-quality copy editing or proofreading, head to http://proofred.co.uk. Oh, they also make really good explainer videos.