#421: quantum of sollazzo – 11 May 2021
The data newsletter by @puntofisso.
One for you Italian speakers (I know who you are): last week, I took part in a great live conversation on Open Data, comparing the Italian situation with that of the UK and the rest of the world. Organised by DataNinja and moderated by Donata Columbro, this event was also a great opportunity for me to reconnect with two contacts from my past life who happen to be Open Data legends: Giorgia Lodi, who was one of my university lecturers, and Maurizio Napolitano, who’s responsible for getting me to draw maps with OpenStreetMap. You can find a recording here. Note the Brazilian-footballer-like name.
The Data4Good Festival is three days of more than 60 events, all on the use of data in charities. It is run on a not-for-profit basis by DataKind UK, itself a charity, in partnership with about 20 other support charities. Lots more info here: https://data4goodfest.org.uk/
Tickets are £45-95, depending on the size of your organisation. However, the organisers don't want money to stop anyone from attending: bursaries are available for those for whom the ticket price is a barrier to taking part. You can get in touch at festival@datakind.org.uk.
Dan Hon, in his excellent newsletter, writes a rather in-depth analysis of the Post Office scandal.
For those of you who don't live in the UK, this is the rather sad story of Post Office employees being convicted and spending years in jail for a crime they had not committed: stealing money. The issue? There was no crime. The IT system that ran the transactions had a bug, and management had failed to take the bug into account.
This story is interesting because, first and foremost, it challenges our reliance on technology, especially in a context where technology was politically charged. In an age where we're increasingly relying on algorithms and AI, it suggests a few directions of enquiry: how much of our decision-making should we hand over to machines? How much should we rely on them? How do we challenge the decisions they make or support? And what is the route to redress if something doesn't work?
For longer form thinking about similar topics to those of this newsletter, Dan’s newsletter is strongly recommended.
Links are below the interview.
‘till next week,
Giuseppe @puntofisso
Six questions to...
Jeremy Singer-Vine. Jeremy is Data Editor at BuzzFeed News and a fellow newsletter-er of Data Is Plural.
What is your daily data work like and what tools do you use?
My work typically involves some combination of data gathering, restructuring, cleaning, exploration, and analysis. I have two main workflows. For simple tasks and preliminary exploration, I'm typically using VisiData, xsv, curl, grep, vim, and other command-line tools. For more complex tasks and work intended for publication, I'm using Python and Jupyter, with pandas, matplotlib, seaborn, scipy, lxml, pdfplumber, and other Python libraries. Ultimately, I'm aiming for a workflow that's as reproducible as possible (a habit I find Makefiles handy for reinforcing), ideally version-controlled in git, with diff-friendly outputs.
Tell me about a data project that you're proud of...
I was only one of several data-conductors in the FinCEN Files orchestra (to slightly mangle a metaphor), and that's a big reason why I'm so proud of the project. Collaborating with ICIJ's data team (and the 100+ partner organizations) was invigorating and a real joy, and helped take the investigation to places that would have been impossible otherwise.
...and a data project that someone else did and you're jealous of.
This multidisciplinary, harrowing investigation, which used the absence of data (in the form of blank map tiles), architectural expertise, and extensive interviews to examine the growth of Xinjiang's prisons and internment camps.
If I say "dataset", you think of...
... a specific way of seeing, interpreting, organizing, and (inevitably, for better or worse) compressing the world.
Give someone new to data a tip or lesson you wish you'd learned earlier.
Tip: Embrace reproducibility. It might seem like a distraction or time-sink at first, but it pays enormous dividends. It took me a few years to appreciate the value of reproducible workflows, but now I can't imagine doing this work any other way.
Data is or data are...
Philosophically: _are_. Grammatically: _you do you_. Statistico-historically: kinda fascinating.
Topical
Joe Biden’s presidency – What America thinks
“Tracking public opinion with The Economist’s polls from YouGov”.
This tracker by The Economist’s data journalism team is one to keep an eye on over the next few months.
Surprise COVID trend: Doomscrolling moved to desktop
“New data from Chartbeat provided to Axios finds that working from home has pushed people to scroll deeper through article pages on desktop, and slightly less through articles on mobile.”
As the article suggests, this matters because readers now seem to prefer long-form content on a large screen. An effect of the move to working from home?
17 Metrics to Watch in the Biden Era
Not quite as polished as the Economist’s tracker above, but definitely a set of interesting data points to look at during the Biden presidency.
The U.S. Will Need a Lot of Land for a Zero-Carbon Economy
“The goal will require sweeping changes in the power generation, transportation and manufacturing sectors. It will also require a tremendous amount of land.”
The US doesn't lack land, of course, but it raises an interesting question of politics and economics, one that doesn't necessarily match the current consensus in the US. It will be interesting to see how it develops, if it does at all.
Antidepressant use in Europe continues to break records
There are quite a few interesting points in this article, shared via the European Data Journalism Network. Two caught my attention: first, that the use of anti-anxiety medication is stable, while that of antidepressants has gone up considerably; second, that Denmark seems to be the only country, among those for which data is available, where the use of antidepressants has decreased in the time frame considered.
Tools & Tutorials
Practical SQL for Data Analysis – What you can do without Pandas
“Pandas is a very popular tool for data analysis. It comes built-in with many useful features, it’s battle tested and widely accepted. However, pandas is not always the best tool for the job.”
This is a brilliant and accessible tutorial that will benefit data analysts and journalists exploring the tools they use.
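As a tiny illustration of the idea (not taken from the tutorial, which covers far more), here is a pandas-style group-by expressed in plain SQL. For convenience this sketch uses SQLite through Python's built-in `sqlite3` module; the `sales` table, its columns and the helper `totals_by_region` are hypothetical names made up for the example.

```python
import sqlite3


def totals_by_region(rows):
    """Sum amounts per region in SQL instead of pandas' groupby.

    `rows` is a list of (region, amount) tuples. The table and column
    names are hypothetical, for illustration only.
    """
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales (region, amount) VALUES (?, ?)", rows)
        # Roughly the SQL counterpart of df.groupby("region")["amount"].sum(),
        # with the result sorted from largest to smallest total.
        cur = conn.execute(
            "SELECT region, SUM(amount) AS total FROM sales "
            "GROUP BY region ORDER BY total DESC"
        )
        return cur.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(totals_by_region([("north", 10.0), ("south", 5.0), ("north", 2.5)]))
```

The point is less about SQLite itself than about pushing aggregation into the database, so that only the summarised result ever reaches your analysis code.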
Hosting SQLite databases on Github Pages
Another interesting article about using SQLite in ways I had not imagined. This time, it's about how to add a database to GitHub Pages.
Visualising risk: a modern implementation of the Risk Characterisation Theatre
Giorgio Comai of OBC Transeuropa/EDJNet explains how to use R to create a “risk theatre” visualization, which “takes a seating chart, such as the ones used for booking a place in a theatre, and obscu[res] a share of seats corresponding to the risk”. It can also be used as an online tool here. The Economist's risk estimator also makes an appearance.
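To make the idea concrete, here is a rough text-only sketch of a risk theatre. It is written in Python rather than R and is not Giorgio Comai's implementation: the function `risk_theatre`, the grid size, and the choice to scatter obscured seats at random are all assumptions made for this example. A risk of 5% in a 100-seat theatre obscures 5 seats.

```python
import random
from typing import List, Optional


def risk_theatre(risk: float, rows: int = 10, cols: int = 20,
                 seed: Optional[int] = None) -> List[str]:
    """Return a text seating chart where a share of seats equal to `risk` is obscured.

    Hypothetical helper for illustration: 'o' is an ordinary seat,
    'X' is an obscured seat (someone affected by the risk).
    """
    if not 0.0 <= risk <= 1.0:
        raise ValueError("risk must be a probability between 0 and 1")
    total = rows * cols
    # How many seats to obscure: the risk applied to the whole theatre.
    n_obscured = round(risk * total)
    rng = random.Random(seed)
    # Pick which seats are obscured, scattered at random across the room.
    obscured = set(rng.sample(range(total), n_obscured))
    chart = []
    for r in range(rows):
        chart.append("".join("X" if r * cols + c in obscured else "o"
                             for c in range(cols)))
    return chart


if __name__ == "__main__":
    # A 5% risk in a 5 x 20 (100-seat) theatre.
    for line in risk_theatre(0.05, rows=5, cols=20, seed=1):
        print(line)
```

Scattering the obscured seats, rather than filling them row by row, is a design choice here: it arguably conveys "one in twenty people" better than a solid block at the front.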
Dataviz
Visualised: glaciers then and now
The Guardian’s Nico Kommenda has put together an interactive database that shows, through their changing outlines, how glaciers have been affected by climate change.
Data is from the GLIMS database.
How many artists overshadow their band after going solo?
Fun dataviz by The Pudding, and – even better – all the data is available.
(via Soph Warnes’s Fair Warning)
Does a higher budget make a movie more successful?
“Only very expensive movies fail occasionally – extremely expensive ones don’t.”
Lisa Charlotte Rost of Datawrapper takes a look at the correlation between budget and success.
Interactive
Barabási
Albert-László Barabási is a Hungarian physicist who's mostly responsible for getting me on the wrong path of a PhD twenty years ago… Well, his papers at least :) He's also an utter genius who connects the study of natural or artificial networked phenomena with their visual representation. It is therefore rather apt that he's now venturing into physical art, with an exhibition that is also available interactively online. There's also a good write-up in his university's magazine.
(via Massimo Conte)
Fingerspelling
An interactive, AI-driven trainer for learning the American Sign Language alphabet.
Become a GitHub Sponsor. It costs about the price of a coffee per month, and you’ll get an Open Data Rottweiler sticker (and other stuff). Or you can Buy Me A Coffee.
quantum of sollazzo is also supported by ProofRed’s excellent proofreading service. If you need high-quality copy editing or proofreading, head to http://proofred.co.uk. Oh, they also make really good explainer videos.