#450: quantum of sollazzo – 14 December 2021
The data newsletter by @puntofisso.
Hello, regular readers and welcome new ones :) This is Quantum of Sollazzo, the newsletter about all things data. I am Giuseppe Sollazzo, or @puntofisso. I've been sending this newsletter since 2012 as a summary of the articles with or about data that captured my attention over the previous week. The newsletter is and will always (well, for as long as I can keep going!) be free, but you're welcome to become a friend via the links below.
Every week I include a six-question interview with an inspiring data person. This week, I speak with Giorgia Lodi, a technologist for the Italian Research Council who was, when she was a PhD student, one of the best lecturers I've ever had throughout my academic life. I was incredibly happy to see we'd both ended up as Open Data activists!
'Til next week,
Giuseppe @puntofisso
Six questions to...
Giorgia Lodi
Giorgia is a Technologist at the Institute of Cognitive Sciences and Technologies of the Italian National Research Council.
What is your daily data work like and what tools do you use?
I work on personal and open data, and, in general, knowledge representation using semantic web standards and technologies (e.g., OWL, RDF, SPARQL). The idea is to represent data with the highest level of data quality standards, so that data can be (re)used in complex applications/systems. I am currently involved in projects covering different domains, from fishery to procurement, to environment and health. The challenges are many but also recurrent: the same data from different organisations is organised in heterogeneous ways and with poor quality, which makes the task of integrating it and extracting value from it difficult (because the value of data comes out when you start linking it with other data!). The tools I am used to working with are ontology and vocabulary editors (e.g., Protégé, VocBench), SPARQL endpoints to query data (e.g., GraphDB and Virtuoso), graphical tools for ontology design (e.g., Graffoo), knowledge graph construction tools such as RML, and also software for Data Protection Impact Assessment such as PIA when I support my colleagues at the Institute in dealing with the personal data processed in their research projects.
Tell me about a data project that you're proud of...
This project about commercial fish names that my group started some years ago with DG MARE. It is a traditional software system and it probably also appears simple, but the data management we do in the back end is quite complex, with strong data reconciliation issues that we have tackled over the years in order to collect, in a single point of access, a variety of data coming from different data sources, including unstructured ones such as laws. You cannot imagine how many ways there are in the different EU Member States to simply publish a table with two columns, “commercial designations” and “scientific names”, of the fishery products they market!
...and a data project that someone else did and you're jealous of.
The ArCo Project (ITA), which is a project of my group at the Institute carried out in collaboration with the Italian Ministry of Cultural Heritage; unfortunately, I do not follow its development in person. The project is about opening up data of the vast Italian cultural heritage according to the linked open data paradigm. On top of this knowledge graph, which is constantly growing and expanding, nice research work is currently ongoing, involving virtual reality and artificial intelligence, among others. In essence, the power of having nice and interoperable data, interconnected with each other using consolidated standards, is that you can then exploit that data to create innovative systems!
If I say "dataset", you think of...
Cleansing and a shared semantic model.
Give someone new to data a tip or lesson you wish you'd learned earlier.
Try to do your best to curate data but remember that real systems, and the world, are not perfect: you will come to tolerate and manage some (hopefully low) level of “garbage” :-)
Data is or data are...
Since I am Italian, data are, but it should be data is!
Become a Friend of Quantum of Sollazzo → If you enjoy this newsletter, you can support it by becoming a GitHub Sponsor. Or you can Buy Me a Coffee. I'll send you an Open Data Rottweiler sticker. If you have a product or service to promote and want to support this newsletter, you can sponsor an issue.
Topical
Vaccinating Europe's Undocumented: A Policy Scorecard
"The scorecard assesses national policies on vaccinating the undocumented against COVID-19."
This is an important topic because, whether you think selfishly or in a more humanitarian way, the vaccine campaigns will only succeed if they vaccinate enough people, regardless of their status. By the Dutch collaborative journalism not-for-profit Lighthouse Reports.
Locked up: Covid-19 and prisons in Europe
"Data collected by 12 newsrooms in the European Data Journalism Network, coordinated by Deutsche Welle, shows that the effort to keep the infection under control in detention institutions came at a high cost."
This is a brilliant, big piece of data-driven reporting coordinated by Deutsche Welle and the European Data Journalism Network, which has put together data from 32 European countries. Some other data can be found here. Methodology, data, and code are all available on Deutsche Welle's GitHub.
On 15 December, 5pm CET, Kira Schacht, the main coordinator of the project, will discuss it live with other journalists on Zoom. Registration is free here.
Netflix Global Top 10
This is cool. Netflix releases a weekly Top 10 list which is pretty interactive. Take a look at the methodology. Interestingly, they will move to a new methodology later in the year (as illustrated in a recent letter to the shareholders: "we will shift to reporting on hours viewed for our titles rather than the number of accounts that choose to watch them.")
(via Massimo Conte)
World Inequality Database
An interactive income and wealth comparator which will make you think.
Tools & Tutorials
How Environmental Journalists Can Use NASA’s New Landsat 9 Satellite
NASA has recently launched its ninth Landsat satellite in 50 years. This article gives some tips on how to download and use its data.
To note: USGS, which runs the data programme for NASA, is considering switching to a paid-for model, so you'd better download what you can.
Which countries can you call?
OK, this might be a bit niche for this newsletter, but it's a very interesting use of data. In ham radio, there is something called the "Reverse Beacon Network" that uses data from international radio contacts to build a map of which places can "hear" one another. This is pretty useful to radio hams, because propagation is not a fixed quantity.
Chris Wraith, a technologist and my personal guide in the ham radio world, has created this handy web app, which uses beacon data to show which countries are reachable from any location, given the wavelength.
How to compile an index
Another really useful write-up from the Economist's Off The Charts team.
Indices are fundamentally lists of things, people, or places. "The popularity of such lists is unsurprising: most people take pride in where they live and want to see how it compares with other places, and there’s also a desire to “locate yourself within the data”. But how are these rankings created?"
Pandas Tutor
"Pandas Tutor visualizes how your Python code transforms dataframes"
Data thinking
What an opener
You'd never think that "Walter Plecker was an a***ole" would be the opening line of a book about information architecture. Yet it is, and it's a pretty good one (and I now want to read the rest of the book).
(via Guy Lipman)
How to make agile actually work for analytics
A very good blog post by Taylor Brownlow at Count, which contains a brilliant reworking of the four foundational principles of Agile for data analytics.
Statistical Imaginaries: An Ode to Responsible Data Science
"When statistics are understood as facts, the public expects exact knowledge. Precision. A census is supposed to be an enumeration of the public, a count of all the people. The Census Bureau produces numbers as facts, knowing full well that such a number is only the best number producible given the procedures the bureau uses. But just because the process worked does not mean that the result is as exact as the number implies."
Data is Wicked
"What does that mean for our tools?"
Dataviz, Data Analysis, & Interactive
The Obsessively Detailed Map of American Literature’s Most Epic Road Trips
American literature does have an obsession with road trips and someone decided to map it. "To be included, a book needed to have a narrative arc matching the chronological and geographical arc of the trip it chronicles." Interesting to see some of the famous excluded novels. (via Riccardo Di Sipio)
Some picks from the 30 Day Map Challenge
OpenStreetMap volunteer and data geek Harry Wood has published a list of his favourite maps from this year's 30 Day Map Challenge. Some are pretty cool!
And this one is a full list of Observable notebooks on the same topic.
UK Covid relief and recovery grants: data analysis
"This report uses data published by funders about the grants they made between March 2020 and October 2021 during the Coronavirus pandemic to understand how they and wider civil society responded during that time."
By the very good folks at 360 Giving.
The Schmidt sting pain index
A visual explanation of one of those datasets that I would never want to put together myself!
Shipmap
Created by a UCL-Kiln collaboration, this map allows you to see "movements of the global merchant fleet over the course of 2012, overlaid on a bathymetric map. You can also see a few statistics such as a counter for emitted CO2 (in thousand tonnes) and maximum freight carried by represented vessels (varying units)."
Quantum of Sollazzo is also supported by ProofRed's excellent proofreading service. If you need high-quality copy editing or proofreading, head to http://proofred.co.uk. Oh, they also make really good explainer videos.