553: quantum of sollazzo
#553: quantum of sollazzo – 20 February 2024
The data newsletter by @puntofisso.
Hello, regular readers and welcome new ones :) This is Quantum of Sollazzo, the newsletter about all things data. I am Giuseppe Sollazzo, or @puntofisso. I've been sending this newsletter since 2012 to be a summary of all the articles with or about data that captured my attention over the previous week. The newsletter is and will always (well, for as long as I can keep going!) be free, but you're welcome to become a friend via the links below.
We have some great new sponsored content: Ed Freyfogle, organiser of location-based service meetup Geomob, co-host of the Geomob podcast, and co-founder of OpenCage, has offered to introduce a set of points around the topic of geodata. Read a few paragraphs below about geocoding at scale.
The most clicked link last week was this catchy infographic of World Leaders approval rating, by Visual Capitalist.
A few weeks back I was a co-host of the "It's not all about the numbers" podcast. Listen here for a conversation with my old open data nemesis (:P) Mike Rose and his co-conspirator Chris Sargent, where we kick off a conversation about... toilets!
Why toilets? Well, if you know me, you know that toilets are a great starting point to discuss the link between open data and policy, data generation and release processes, user needs, and standards.
We also...
- ...gave a few spoilers about the next location of Open Data Camp UK
- ...chatted about Elon Musk
- ...spoke about my old article "The Open Data Delusion"
- ...debated the expectation vs the reality of open data
- ...mentioned NaPTAN, the Nirvana of public open data
- ...discussed why I pivoted from open data activist to public servant
- ...compared notes about board-level data literacy and our experience as trustees and advisors
- ...agreed that linking the siloes might be more sustainable than breaking them
- ...gave shout-outs to Gail Ramster, Robert Barr, and Ian Makgill for their data-driven achievements, and to data journalists as exemplars of evolving the data profession
- ...explored the evolution of professional labels for data wranglers
- ...had a go at linking Rugby and data (loosely).
'till next week,
Giuseppe @puntofisso
Before you go... DO YOU LIKE QUANTUM OF SOLLAZZO? → BECOME A SUPPORTER! :) If you enjoy this newsletter, you can support it by becoming a GitHub Sponsor. Or you can Buy Me a Coffee. I'll send you an Open Data Rottweiler sticker. You're receiving this email because you subscribed to Quantum of Sollazzo, a weekly newsletter covering all things data, written by Giuseppe Sollazzo (@puntofisso). If you have a product or service to promote and want to support this newsletter, you can sponsor an issue.
✨ Topical
National Anthem
"Please rise and enable your audio for a stirring analysis of the United States National Anthem."
Jan Diehm and Michelle McGhee have created an amazingly visual analysis of the Super Bowl performances of the Star-Spangled Banner for The Pudding.
By the way, The Pudding have opened an opportunity for a (paid) summer fellowship that might be of interest to a few of you (and me, if only I had time!).
How many moms are in the labor force?
USA Facts: "Over 24 million mothers of children younger than 18 are in the labor force. Nursing and teaching are the most common professions for working moms."
Geocoding at scale
In the final installment of our series about using open data for geocoding, we consider the challenges of geocoding at scale. What issues do you face when you have hundreds of thousands, or even millions, of coordinates or addresses to process every day? At OpenCage we serve numerous customers in this category, and a common question is whether an API-based solution can handle that kind of scale.
An API-based solution, managed by experts, is almost always the most reliable and most affordable way to run such an ongoing system; otherwise you will soon be spending a lot of valuable developer time keeping your geodata current. As anyone who has worked with software can confirm: “Building is easy, maintaining is hard”.
Nevertheless, there are challenges that come with depending on any external service, one of course being network availability. At OpenCage we have multiple, fully-redundant data centers, and the availability of our service is independently and publicly monitored by a third party (current and past operational status can be seen at status.opencagedata.com).
Still, even with a highly-available service, some customers worry about the “cost” of crossing the internet to an external service. The fastest API query is the one you don’t even make; a smart caching strategy can go a long way to reducing usage. Because our geocoding API is built on open data you can cache the results as long as you like, and we’ve published a few tips and points to consider.
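To make the caching idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not OpenCage's recommended implementation: `CachedGeocoder`, `normalise`, and `fake_lookup` are hypothetical names, and `fake_lookup` stands in for whatever function actually calls your geocoding API. Results are stored in a local SQLite table keyed by a normalised address, so a repeated address never triggers a second request.

```python
import sqlite3
from typing import Callable, Optional, Tuple

Coordinates = Tuple[float, float]


def normalise(address: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings share a cache key."""
    return " ".join(address.lower().split())


class CachedGeocoder:
    """Wraps any lookup function with a persistent SQLite cache (hypothetical helper)."""

    def __init__(self, lookup: Callable[[str], Optional[Coordinates]], db_path: str = ":memory:"):
        self.lookup = lookup      # the function that really calls the API (an assumption)
        self.api_calls = 0        # counts how often the wrapped lookup was invoked
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache (address TEXT PRIMARY KEY, lat REAL, lng REAL)"
        )

    def geocode(self, address: str) -> Optional[Coordinates]:
        key = normalise(address)
        row = self.db.execute(
            "SELECT lat, lng FROM cache WHERE address = ?", (key,)
        ).fetchone()
        if row is not None:
            return (row[0], row[1])   # cache hit: no network request needed
        self.api_calls += 1
        result = self.lookup(address)
        if result is not None:        # only successful lookups are cached
            self.db.execute(
                "INSERT OR REPLACE INTO cache VALUES (?, ?, ?)", (key, result[0], result[1])
            )
            self.db.commit()
        return result


def fake_lookup(address: str) -> Optional[Coordinates]:
    """Stand-in for a real API call; returns fixed, approximate coordinates for one address."""
    return (51.5034, -0.1276) if "downing" in address.lower() else None


if __name__ == "__main__":
    geocoder = CachedGeocoder(fake_lookup)
    print(geocoder.geocode("10 Downing Street, London"))
    print(geocoder.geocode("10  downing street,   LONDON"))  # served from the cache
    print("API calls made:", geocoder.api_calls)
```

Passing a file path as `db_path` instead of `":memory:"` keeps the cache across runs. How long cached entries may be kept depends on the terms of the service you use, so check them before adopting a "cache forever" policy with another provider.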
We hope you’ve enjoyed our series on the issues around geocoding with open data. While we’ve used our service as the example, we believe many of the concepts and considerations will apply regardless of the data processing tools and services you are building on. If you have questions regarding anything we discussed, please get in touch.
Have a project that will need geocoding? See our geocoding buyer's guide for an overview of all the factors to consider when choosing between geocoding services.
🛠️📖 Tools & Tutorials
Falsehoods programmers believe about time zones
Some of these I knew about or are pretty obvious, but some others will really puzzle you.
How To Center a Div
"The Ultimate Guide to Centering in CSS".
This is one of those things I always struggle with: I spend time researching it on Stack Overflow, recall the answer and think "oh, it was so easy", and then forget it again. Note that there's more than one way, mostly depending on what you're trying to center within.
EarthPy: Simplifying Geospatial Data Analysis in Python
"Among the myriad of tools available for such analysis, EarthPy emerges as a standout Python library, designed specifically to streamline the handling and visualization of spatial data."
There's also a readthedocs page.
explore
explore is an "R package that makes basic data exploration radically simple (interactive data exploration, reproducible data science)."
Guidelines for Brutalist Web Design
"Raw content true to its construction". I suppose we experienced baroque in the early 2000s...
🤯 Data thinking
The House of Lords could liberate the Postcode Address File if they back this amendment
This makes me nostalgic about that time of Open Data... Ah well, the story of PAF is famous in that it represented an intensely intellectual and political debate between those who thought that the state-run database should be released with an open licence, as it had already been "paid for" by the taxpayer, and those who believed that the state had a duty to maximise its revenue from it on behalf of the taxpayer. In the end, PAF was privatised alongside its hosting organisation (although it's not that simple – local government still plays a large role).
I thought that the argument on PAF had been settled, but it looks like there's something brewing in the House of Lords. Regardless of what happens, it's an interesting debate to witness, as it touches on deeply technical aspects of data operations and sharing.
Is the "Modern Data Stack" Still a Useful Idea?
"We continue to be in the deployment phase for the MDS. The modern data stack that we’ve all come to love over the past decade isn’t going anywhere; its categories are getting increasingly mature and increasingly well-integrated. Its technologies and best practices are getting more widely deployed, both to more companies and more broadly inside of companies.
This is the phase of any cycle where the real work gets done and where the real value gets created. It’s the phase for getting living in the trenches and solving real problems. The MDS was the future five years ago and it’s still the future today, but we actually have to roll up our sleeves to make the replatforming happen.
Over the last month this has been bugging me. I don’t know if you’ve ever had that nagging feeling after writing something that just doesn’t feel right, but writing this knocked something loose in my head. Since then, I’ve become a little obsessed with the question: what’s going on with the modern data stack?"
📈 Dataviz, Data Analysis, & Interactive
West Midlands Cyclotron
Chris Woods: "The West Midlands Cyclotron is a personal project inspired by the digital cycle counters on the Bristol Road. It provides near real time insight into the numbers of cyclists using city roads and cycle routes."
(via Daniele Bottillo)
Downs and ups in the crossword
I can't help linking to any n-grams-based analysis :-) Michael Do Thoi for Datawrapper: "I decided to dig into a dataset of clues and answers from all 1993–2021 New York Times crossword puzzles. I tracked the sudden rise of some now-ubiquitous terms, as well as the slow decline of other once-popular ones."
When and How Many Super Bowl Wins, by Team
Nathan Yau (Flowing Data): "The Kansas City Chiefs beat the San Francisco 49ers in Super Bowl LVIII. That’s three championships for the Chiefs in the last five years. How does that compare to teams who won previous Super Bowls over the past 58 years?"
quantum of sollazzo is also supported by Andy Redwood’s proofreading – if you need high-quality copy editing or proofreading, check out Proof Red. Oh, and he also makes motion graphics animations about climate change.
Supporters*
Alex Trouteaud
casperdcl
[*] this is for all $5+/month GitHub sponsors. If you are one of those and don't appear here, please e-mail me.