
Doodling Data

August 26, 2023

The data round-up #4 - August 2023

Well, it’s already the end of August. September is looming, and will hopefully bring a much-needed respite to those of you living in places battered by the heat. We in Scotland haven’t had much of a summer this year: there have barely been any days above 25°C, and there’s been lots of rain and gloom. But it’s been good too.

Newhaven harbour, Edinburgh, August 2023.

A newsletter

If you don’t know it, Ian Ozsvald’s newsletter “NotANumber” is excellent: it works both as a community-led job board and as a general resource on projects and ideas.

The AI of bad practices

Timnit Gebru is an AI/ML researcher fighting for a fairer world where machines are not used to cause harm. She is one of the authors of the now-famous “stochastic parrots” paper (a great read), which cost her her job at Google.

She was recently a guest on “Reimagining the Internet”, a podcast by the Initiative for Digital Public Infrastructure (University of Massachusetts Amherst), interviewed by Prof. E. Zuckerman. The two spoke about many interesting things, including how systems like ChatGPT & Co. may be breakthroughs, yet the way they were created violates some of the basic principles of doing science: sharing the data and sharing the methods. OpenAI notably hasn’t shared details of its system’s infrastructure, nor of the data used to train it. There is no transparency. Sure, this is business and not academic science, but the fact that these systems are entering everyone’s daily life should raise these concerns more widely. Anyway, I really recommend a listen (or read the linked transcript); it’s much better than any summary I could give.

A fellowship from the UN

The UNESCO International Research Center for AI (IRCAI) is offering, in partnership with Amazon, a grant to fund startup ideas operating in the climate space. It’s great to see.

Manipulating CSVs

For anyone working with data, CSVs can be a nightmare. This post by A. Borruso (it’s in Italian, but you can of course read it in translation) is a great overview of how to handle them better, using tools like DuckDB.
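Much of the pain comes from files that don’t follow the “comma plus dot-decimal” convention, which is common in Italian and other European spreadsheet exports. Tools like DuckDB try to auto-detect the delimiter, header and column types for you. As a rough, stdlib-only illustration of that idea (this uses Python’s built-in `csv.Sniffer`, not DuckDB), here is a minimal sketch that guesses the delimiter of a small, invented semicolon-separated file with decimal commas. The sample data and the helper names `read_messy_csv` and `to_float` are hypothetical.

```python
import csv
import io

# Hypothetical sample: semicolon delimiter and comma as decimal separator,
# as often produced by European spreadsheet software. Values are invented.
RAW = """name;count;area_km2
A;120;12,5
B;85;3,75
"""


def read_messy_csv(text: str) -> list[dict[str, str]]:
    """Guess the delimiter with csv.Sniffer, then parse rows into dicts."""
    sample = text[:1024]  # the sniffer only needs a sample of the file
    dialect = csv.Sniffer().sniff(sample, delimiters=",;\t|")
    reader = csv.DictReader(io.StringIO(text), dialect=dialect)
    return list(reader)


def to_float(value: str) -> float:
    """Convert a number written with a decimal comma, e.g. '12,5' -> 12.5."""
    return float(value.replace(",", "."))


rows = read_messy_csv(RAW)
for row in rows:
    print(row["name"], to_float(row["area_km2"]))
```

The naive `to_float` would mishandle values that also use a dot as thousands separator (e.g. `1.234,5`), which is exactly the kind of case where a dedicated tool with explicit options is the safer choice.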

Regulation: the EU Data Act

In late June, the EU reached an agreement on the EU Data Act, which essentially regulates the use of data generated by devices and services within the Union, giving consumers more control over their own data and making data sharing easier, including with the public sector.


Building footprints: new data

Both Google and Microsoft have freely released their own datasets of building footprints detected from satellite imagery. These can be used for many purposes, e.g. measuring population density, proximity to streams and rivers (for urban planning), impact assessment, etc.

Last but not least [not data]

I saw someone share this livecam of a pond in the Namib Desert, showing animals coming to drink. It’s just wonderful.

Thanks for reading Doodling Data! Subscribe for free to receive new posts and support my work.

This email brought to you by Buttondown, the easiest way to start and grow your newsletter.