January 25, 2022
454: quantum of sollazzo
#454: quantum of sollazzo – 25 January 2022
The data newsletter by @puntofisso.
Hello, regular readers and welcome new ones :) This is Quantum of Sollazzo, the newsletter about all things data. I am Giuseppe Sollazzo, or @puntofisso. I've been sending this newsletter since 2012 to be a summary of all the articles with or about data that captured my attention over the previous week. The newsletter is and will always (well, for as long as I can keep going!) be free, but you're welcome to become a friend via the links below.
Every week I include a six-question interview with an inspiring data person. This week, I speak with Erin Davis, whose amazingly artistic data visualizations I've often linked to.
Speaking of Six Questions interviewees, after a long time of sending each other links, I had a virtual chat with Jeremy Singer-Vine, interviewee #2, that served to remind me why I really enjoyed going to events like Hacks/Hackers and getting acquainted with the data journalism world. It's because data journalists are the community where I found a quasi-total overlap with the 3 areas of data I love: a geeky curiosity in learning something through data; using storytelling as the main way to communicate your findings; and a willingness to break the boundaries of technology through multiple forms of media or approaches. I miss attending Hacks/Hackers in person, by the way.
'Til next week,
Giuseppe @puntofisso
This week's edition is sponsored by OpenCage
OpenCage operates a highly available, simple-to-use, worldwide geocoding API based on open data like OpenStreetMap. With libraries for Python, R, MATLAB, Stata, and over 30 other programming languages, it's easy to dive in. Whether you just need to geocode one dataset or you have an ongoing need, we offer cost-effective, flat-fee packages and all the benefits of Open Data.
Try the API now on the OpenCage demo page.
Six questions to...
Erin Davis
Erin is a Data Visualization Specialist at 1point21 Interactive.
What is your daily data work like and what tools do you use?
I create data-driven content for the firm’s clients, mostly attorneys. We want to make pieces that are interesting to the general public but still in line with the clients’ areas of practice, which can be a bit challenging!
I rely heavily on public datasets, especially of car crashes, crimes, and injuries. Most of my data collection, cleaning, analysis, viz, etc is done in R. Quite a lot of my work is geospatial in nature, so I also use QGIS a ton.
Once I produce a visualization with R/QGIS, I add on the text (headers, citations, annotations) in Photoshop.
Tell me about a data project that you're proud of...
...and a data project that someone else did and you're jealous of.
There are SO many of these. There’s nothing like the “man, I wish I’d thought of that” feeling. My Twitter bookmarks is basically just a hall of things I wish I’d created myself!
Some that come to mind are:
If I say "dataset", you think of...
The thrill of new data! I hoard datasets like a dragon. Give me mooooore.
Give someone new to data a tip or lesson you wish you'd learned earlier.
It is very easy to get too close to a dataset. I’ve definitely made the mistake of producing graphics that I thought were simple and easy to understand but were actually confusing to newcomers. I only thought they were clear because I was intimately familiar with the data. If you can, it’s great practice to get other eyes on your work or to set it aside for a while and come back to it fresh.
Data is or data are...
Data is. Fight me.
Topical
More than 1,700 congressmen once enslaved Black people. This is who they were, and how they shaped the nation.
"The Washington Post has compiled the first database of slaveholding members of Congress by examining thousands of pages of census records and historical documents".
The New Normal
"How the Covid-19 pandemic is shaping our shopping searches" – a data visualization that uses Google Trends to assess which trends have changed. Art direction by the legendary Alberto Cairo.
More money, more COVID-19 vaccinations? Let’s look at the outliers with a bivariate map.
"Classes are always artificial", says Datawrapper's Lisa Charlotte Muth in this interesting article that shows how to produce a bivariate scatter plot and map to explore the relationship between GDP per capita and vaccination rates.
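To make the idea of bivariate classes concrete, here is a minimal sketch of one common approach: split each variable into three equal-count classes (terciles) and combine them into a 3x3 grid. This is not taken from the Datawrapper article; the tercile rule, the function names and the numbers are all assumptions made up for illustration.

```python
from bisect import bisect_right
from statistics import quantiles


def tercile_breaks(values):
    """Return the two cut points that split values into three equal-count classes."""
    return quantiles(values, n=3)


def bivariate_class(x, y, x_breaks, y_breaks):
    """Map one (x, y) pair to a cell (0..2, 0..2) of a 3x3 bivariate grid."""
    return bisect_right(x_breaks, x), bisect_right(y_breaks, y)


# Hypothetical values for six imaginary countries (NOT real statistics).
gdp_per_capita = [1_000, 2_000, 5_000, 12_000, 30_000, 60_000]
vaccination_rate = [78, 5, 40, 20, 75, 80]

x_breaks = tercile_breaks(gdp_per_capita)
y_breaks = tercile_breaks(vaccination_rate)

# One (gdp_class, vaccination_class) pair per country; 0 = low, 2 = high.
classes = [
    bivariate_class(x, y, x_breaks, y_breaks)
    for x, y in zip(gdp_per_capita, vaccination_rate)
]

for gdp, rate, cell in zip(gdp_per_capita, vaccination_rate, classes):
    print(gdp, rate, cell)
```

Each cell of the grid then gets its own colour on the map. The outliers are the countries far from the diagonal, such as the first one here (low GDP class, high vaccination class). Changing the break rule, for example to fixed thresholds, moves countries between cells, which is exactly why the classes are "artificial".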
Toxic Churn
"How the legacy of former industrial sites pollutes American cities today". We saw a similar article, based on the same EPA data, last week. This article on Grist takes a specific look at California.
Elezioni Quirinale 2022
Let's face it, unless you're Italian you won't be interested in the election of Italy's Head of State. However, the electoral system – which is based on supermajorities – is pretty famous for causing deadlocks and this simulator is quite good at exploring the chances of leading candidates.
Tools & Tutorials
GPS
Have you ever wondered how GPS works? This illustrated and interactive tutorial cracks it.
Log jam
"All of the authors’ analysis was performed on logarithms of wage data", and The Economist newsletter explains how that works.
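As a general illustration of why analysts work with logarithms of wages (this is not a reproduction of the analysis The Economist discusses), a difference between two log wages is a relative gap: for small gaps it is close to the percentage difference, and it can always be converted back to an exact one. The function names and wage figures below are hypothetical.

```python
import math


def log_gap(wage_a, wage_b):
    """Difference of log wages, i.e. the log of the ratio wage_a / wage_b."""
    return math.log(wage_a) - math.log(wage_b)


def pct_gap_from_log(gap):
    """Convert a log gap back to an exact proportional difference."""
    return math.exp(gap) - 1


# Hypothetical hourly wages, chosen only for illustration.
small = log_gap(101, 100)  # a 1% raise: the log gap is very close to 0.01
large = log_gap(20, 25)    # a 20% shortfall: the log gap overstates it

print(round(small, 4), round(pct_gap_from_log(small), 4))
print(round(large, 4), round(pct_gap_from_log(large), 4))
```

The practical upshot is that a coefficient estimated on log wages reads roughly as a percentage effect when it is small, but for larger values it should be converted with `exp(gap) - 1` before being quoted as a percentage.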
Intro to Agile
The very smart folks at Coefficient, a data consultancy, have publicly released their handbook on how they use Agile for data science/engineering/analytics projects.
AI
Machine Learning Coding: Essential algorithms, mathematical foundations and best practices
My friend Riccardo Di Sipio, an esteemed physicist who worked at leading research centres, including CERN, and then "defected" to the world of industrial machine learning, has published a book that looks like a good intro to using ML in real life.
Here's an excerpt from the abstract: "However, there is no such thing as Machine Learning without a good deal of coding. Despite the large overlap in background knowledge of mathematics, statistics, software engineering and a generic ability in problem-solving, there is quite a bit of difference between more research- and product-oriented flavours of ML. While the former is mostly focused on creating new state-of-the-art models and new ways of training, the latter’s main concern is to make it work in real life. At times, trade-offs have to be found, with a specific constraint on keeping systems efficient both in terms of uptime and speed of execution. This can hardly be obtained by only using off-the-shelf solutions. The book also represents a unique resource to prepare a job interview for Machine Learning engineering positions."
Dataviz, Data Analysis, & Interactive
Extracting Information from Historical Genealogical Documents
"How HTR (Handwritten Text Recognition) and Related Technologies Are Empowering Family Discoveries". I was recently working on something similar related to handwritten notes, and I bet this might be useful to some data journalists.
Watch Covid-19 cases sweep across the United States
Just because it's a nice (and scary) animation.
The World’s Troubling New Tempo of Temperature Records
Bloomberg does an excellent job of showing we might be doomed unless we act.
Sponsored content
The essence of the web, every morning in your inbox
Tens of thousands of busy people start their day with their personalized digest by Refind. Sign up for free and pick your favorite topics and thought leaders. Subscribe here.