#426: quantum of sollazzo – 15 June 2021
The data newsletter by @puntofisso.
There’s an interesting event series about Coding in the Open happening this week. I know it’s a bit late, but there are still three or four days to go, so please take a look:
“Want to learn about automation, improving processes and how to incorporate open tools like Python and R? Want to learn about what open might mean for you, whether you’re looking for tools, processes, principles or just general interest?”
Each seminar runs for an hour, from 1pm to 2pm (BST), until June 18th, and the series features an awesome set of people talking about different aspects of working openly and what the benefits are. All details can be found at http://codingedi.github.io.
Interesting episode of The Tip Off podcast on biased reporting and how data helped unearth the issue: “Sarah Turnnidge started her journalism career at local papers so it was there she first encountered press releases from police forces. But over time Sarah started to wonder - did they tell the whole story. In this episode Sarah talks through her meticulous data project which revealed a worrying disproportionality when it came to information put out about black criminals.” This story was first reported by the Huffington Post.
Any data engineers out there? The OCHA Centre for Humanitarian Data, part of the UN, is looking for one.
’Til next week,
Giuseppe @puntofisso
Six questions to... Hillary Juma
Hillary Juma is Common Voice Community Manager at Mozilla and was formerly Data Science Community and Engagement Manager at the Data Science Campus of the Office for National Statistics.
What is your daily data work like and what tools do you use?
I recently started my role at Mozilla Foundation to support the Common Voice (CV) Community. Voice technology doesn’t currently serve every language community in the world.
The CV community are helping us to build a publicly available, open source, multi-language dataset of voices that anyone can use to train speech-enabled applications.
Previously I was the Data Science Community Manager for the UK Public Sector Data Science Community. To support and evaluate community activities, data from meetups, Eventbrite pages and the community survey informed the design and delivery of the Data Science Community Programme. This helped ensure that what I did as a community manager was relevant to members.
For the Data Ethics and Society reading group, we (myself, Michael and Harriet) use a GitHub repo to organise and communicate with members. Big thanks to DataKind UK’s Data Ethics in Book Club repo, which we modelled our reading group on.
Tell me about a data project that you're proud of...
During my time at the Church of England, I worked with quantitative and qualitative data about clergy and ordinands (people on their way to becoming priests) to help provide insight for the ministry division’s Diversity and Inclusion strategy. It was my first real-world data project, and I learned a lot about stakeholder engagement and storytelling.
...and a data project that someone else did and you're jealous of.
Anything from The Pudding, in particular How Bad is your Spotify? As The Pudding describes it: “This is a satirical project and does not use real artificial intelligence, but a faux pretentious music-loving AI. The code creates a custom blend of jokes from our database paired with the insights found in the artist, album, genre, and track data from your Spotify.”
I like that it’s intentionally satirical and it made me expand my basic music tastes.
If I say "dataset", you think of...
A paint palette: you can create something perceived as beautiful or intriguing or even ghastly. Brushes dance with colours to create stories about the present, past and future.
Give someone new to data a tip or lesson you wish you'd learned earlier.
(Slightly biased) Join a community! Communities are spaces for you to connect and engage with like-minded people. You can learn so much from your peers and give back as well. Check out communities such as the Data Science Community (open to the UK public sector), or Open Heroines (open to women and non-binary people).
Data is or data are...
As long as the topic of data is/are accessible to the people you’re communicating with, I’m unbothered.
Topical
Cycling Levels of Stress
Originally about Ottawa, then also made for Trento by Maurizio Napolitano, these maps use the Level of Traffic Stress to visualize on OpenStreetMap how hard cycling is on road segments.
Building a Home in the U.S. Has Never Been More Expensive
“From lumber to paint to concrete, the cost of almost every single item that goes into building a house in the U.S. is soaring.”
Biden’s tax overhaul
An interesting (and visual) look by Reuters Graphics at the Biden Presidency’s spending plan and funding sources.
India: COVID-19 vaccination gender gap
According to Shadab Nazmi, BBC India’s statistician, more men than women are getting vaccinated in India. I can’t find more details about this, so the claim is pending fact-checking, but I thought it would be an interesting direction of enquiry for this newsletter’s readers.
CoVaxxy
“Visualizing the relationship between COVID-19 vaccine adoption and online (mis)information”
Tools & Tutorials
Street Networks
You will surely remember Geoff Boeing’s original paper that generated so many imitators, including my own London Borough’s Street Network visualization.
This tweet took a different approach, showing the street network in order of entropy levels (i.e. sorted by overall “order”).
(via Guy Lipman)
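For readers curious about what "sorting by entropy" involves, here is a minimal sketch, not the code behind the tweet or the original paper. It assumes each place is described by a list of street bearings in degrees, bins them into 36 equal sectors (an assumed choice), and computes the Shannon entropy of the resulting distribution. The place names and bearings are made up for illustration.

```python
import math
from collections import Counter


def orientation_entropy(bearings, n_bins=36):
    """Shannon entropy (in nats) of bearings binned into equal sectors."""
    width = 360 / n_bins
    # Map each bearing to a sector index in the range 0..n_bins-1.
    counts = Counter(int((b % 360) // width) % n_bins for b in bearings)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def rank_by_entropy(bearings_by_place, n_bins=36):
    """Return (place, entropy) pairs, most ordered (lowest entropy) first."""
    scored = [
        (place, orientation_entropy(bearings, n_bins))
        for place, bearings in bearings_by_place.items()
    ]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical sample data: a strict grid versus streets pointing everywhere.
sample = {
    "organic_town": list(range(0, 360, 10)),
    "grid_town": [0, 90, 180, 270] * 25,
}
ranking = rank_by_entropy(sample)
for place, entropy in ranking:
    print(f"{place}: {entropy:.3f}")
```

A perfect grid concentrates its streets in four directions, so it scores low. A network with streets spread evenly across all directions scores close to the maximum, log(36).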
New York Tax Bills Show Covid’s Lasting Damage to Real Estate
“Property assessments for the fiscal year starting July 1 detail the widespread hit to property in Manhattan”
Dataviz & Interactive
The Economist
This week, The Economist’s Off The Charts newsletter touches upon an interesting set of dataviz topics, including their Sybil tool to automatically generate charts in their style and how they’ve started using QGIS to make maps.
R for Public Health
A short list of “R resources that you may find helpful if you are seeking to increase your R skills with an eye toward public health applications”
Greggs vs Pret
New heatmaps by Owen Boswarva.
Interestingly, it’s also an opportunity to reflect on data quality: “About 22% of the FSA records were missing location coordinates. However for purposes of the heatmaps I reduced that to about 2% by adding postcode coordinates from the ONS Postcode Directory.”
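The gap-filling step described in that quote can be sketched roughly as follows. This is not Owen Boswarva's actual code: the record fields (`lat`, `lon`, `postcode`), the lookup structure and the sample values are all assumptions made for illustration, with a plain dictionary standing in for a postcode directory.

```python
def normalise_postcode(postcode):
    """Strip spaces and upper-case a postcode so lookups are consistent."""
    return (postcode or "").replace(" ", "").upper()


def fill_missing_coordinates(records, postcode_lookup):
    """Return new records, using postcode coordinates where lat/lon are missing."""
    filled = []
    for rec in records:
        if rec.get("lat") is None or rec.get("lon") is None:
            coords = postcode_lookup.get(normalise_postcode(rec.get("postcode")))
            if coords is not None:
                # Copy the record rather than mutating the caller's data.
                rec = {**rec, "lat": coords[0], "lon": coords[1]}
        filled.append(rec)
    return filled


def missing_share(records):
    """Fraction of records that still lack a latitude or longitude."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if r.get("lat") is None or r.get("lon") is None)
    return missing / len(records)


# Hypothetical sample: two records lack coordinates, one has a known postcode.
records = [
    {"name": "Shop A", "postcode": "AB1 2CD", "lat": 57.1, "lon": -2.1},
    {"name": "Shop B", "postcode": "sw1a 1aa", "lat": None, "lon": None},
    {"name": "Shop C", "postcode": "ZZ9 9ZZ", "lat": None, "lon": None},
    {"name": "Shop D", "postcode": "EF3 4GH", "lat": 53.4, "lon": -1.5},
]
postcode_lookup = {"SW1A1AA": (51.501, -0.142)}  # illustrative values only

before = missing_share(records)
after = missing_share(fill_missing_coordinates(records, postcode_lookup))
print(f"missing before: {before:.0%}, after: {after:.0%}")
```

Postcode coordinates are coarser than exact premises coordinates, which is probably acceptable for a heatmap but worth noting for finer-grained analysis.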
Animated diagram of the Earth’s Carbon Cycle and how it has changed over time.
Twitter thread by Robert Rohde, a scientist at Berkeley Earth, an NGO.
Summer is here! It’s the best time to eat fresh local fruits — and create tables
Datawrapper does its usual great job of showing how to visualize data and that amazing datasets can be found for almost any topic (especially in Germany).
How we ask Google for help
“In the midst of the pandemic, conversations about mental health have emerged in the media. But Google Trends data shows that we were always dealing with these issues—it’s just that no one was talking to each other about it.”
Roads to Rome
“‘Roads to Rome’ is a data visualization project that explores the idiom, ‘all roads lead to Rome’.”
(via Open Cage Data)
Mapping Indigenous history on the TTC
Another take on historic subway-like maps.
(via Riccardo Di Sipio)
AI
A recent paper in Nature looked at AI applications to COVID clinical imaging data, with terrible results that are summarised in this Twitter thread. I was involved in a few conversations.
Data thinking
This week we have two very interesting articles by Atlan’s Prukalpa Sankar.
The Rise of the Metadata Lake
“Introducing a new way of storing metadata for today’s limitless use cases like data discovery, lineage, observability and fabrics”. A long(ish) read with a fundamental truth: “Metadata is itself becoming big data.”
How TechStyle Used Agile Sprints to Roll Out a Modern Data Platform
“A tried-and-tested approach to democratizing tribal knowledge in TechStyle’s 50-member analytics team with Snowflake, Atlan, and Tableau”
I love reading how the industry approaches different problems in the data space.
Become a GitHub Sponsor. It costs about the price of a coffee per month, and you’ll get an Open Data Rottweiler sticker (and other stuff). Or you can Buy Me A Coffee.
quantum of sollazzo is also supported by ProofRed’s excellent proofreading service. If you need high-quality copy editing or proofreading, head to http://proofred.co.uk. Oh, they also make really good explainer videos.