readwrite

December 12, 2025

Edition 6 – The Curious Case of ChatGPT-share

Hi,

Hakan here. As a reporter, even if you've got a niche you're working in (with me it's cybersecurity), most of the time you're on the outside looking in. I'm no expert in most of the things I'm reporting on. Weirdly, this has a lot of upsides – when other people go 'duh', you go 'huh', because almost everything feels new or fresh. But one of the downsides is that it can be hard to understand what you're looking at.

If you don't need the context and just want to see how to parse the HTML, here's the link to the Jupyter Notebook. Be warned, though, the first sentence is going to encourage you to read this newsletter for context, so you might circle back. Here you go.

To give you one example: Back in 2020, when we published our investigation into Facebook and how freely hate speech spread even back then, I was looking at some HTML pages to see how they were built. I then noticed something that felt weird: There were multiple single-letter strings scattered throughout the DOM. Put together, they read "Sponsored". Basically, it was an ad label, but chopped into pieces. Our reporting interest lay elsewhere, so wondering was all I did. But, it turned out, this was one of the ways Facebook tried to evade ad blockers. I only realized this after I saw the discussion on Twitter.
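To make the trick concrete, here is a minimal sketch of how such a chopped-up label could be put back together. It uses only Python's standard library, and the sample markup is made up for illustration; it is not Facebook's real HTML, and the names `TextCollector` and `join_single_letters` are hypothetical.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect every non-empty text node in document order."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def join_single_letters(html):
    """Return the words formed by runs of consecutive one-character text nodes."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()

    words, run = [], []
    for chunk in parser.chunks:
        if len(chunk) == 1:
            run.append(chunk)
        else:
            # A longer text node ends the current run of single letters.
            if len(run) > 1:
                words.append("".join(run))
            run = []
    if len(run) > 1:
        words.append("".join(run))
    return words


# Hypothetical markup: one letter per element, followed by normal text.
sample = (
    "<div><b>S</b><b>p</b><b>o</b><b>n</b><b>s</b>"
    "<b>o</b><b>r</b><b>e</b><b>d</b></div>"
    "<p>Regular post text</p>"
)
print(join_single_letters(sample))
```

A filter that only looks for the whole word in a single text node would miss this label, which is presumably why splitting it up works against simple ad blockers.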

This edition of the newsletter is basically in the same spirit: not knowing what I'm looking at. But what is a newsletter for, if not to point out something you find interesting?

My initial idea was to write a two-parter on a.) getting data from the Wayback Machine (here's part one) and b.) parsing that data. But, for me personally, it got a bit weird. I'm not saying this as a "huh, there's a story here" kind of insinuation, it's just a 'huh, why is it being done this way?'

Please send any suggestion, feedback, and clues you might have to readwritenewsletter@proton.me.

One huge disclaimer: I have zero experience with modern-day frontend development. Not gonna lie: What I set out to do was to read some of the chats, just because I was curious. But when looking at the HTML and parsing it with a tool called BeautifulSoup (love the library, not a big fan of the name), I came across a string that looked interesting: cfConnectingIp. Right next to it was another keyword: userCountry. My initial thought was: Did ChatGPT store the IP addresses of its users within the conversation page, and was all that data now sitting in the Wayback Machine?
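A minimal sketch of how one might pull those two values out of a saved page. It assumes the keys appear as JSON-style `"key":"value"` pairs somewhere in the raw HTML, possibly with backslash-escaped quotes; that layout is an assumption, not something confirmed here. The function name `find_values` and the sample strings are hypothetical, the addresses come from the documentation ranges, and a plain regular expression stands in for BeautifulSoup so the snippet runs without extra packages.

```python
import re


def find_values(html, key):
    """Return all string values stored under `key` in JSON-like text.

    Matches both "key":"value" and the escaped form \\"key\\":\\"value\\".
    """
    pattern = re.compile(
        r'\\?"' + re.escape(key) + r'\\?"\s*:\s*\\?"([^"\\]*)\\?"'
    )
    return pattern.findall(html)


# Hypothetical page fragments, one plain and one with escaped quotes.
plain = '<script>{"cfConnectingIp":"203.0.113.7","userCountry":"DE"}</script>'
escaped = r'<script>"{\"cfConnectingIp\":\"198.51.100.9\"}"</script>'

print(find_values(plain, "cfConnectingIp"))
print(find_values(plain, "userCountry"))
print(find_values(escaped, "cfConnectingIp"))
```

If the real pages nest these keys differently, the pattern would need adjusting; the notebook is the place to check what the markup actually looks like.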

I reached out to my wonderful colleague Christo Buschek, who quickly did some checking: he shared a ChatGPT conversation and opened the link twice, first with his regular IP and then through a VPN. The IP address stored in the HTML changed accordingly. In other words: The IP address in the HTML belongs to whoever views the chat, not to the person who started it.

However, for the thousands of pages that I have, all of the IP addresses in there belong to the Wayback Machine, or at least that's what I think (145 addresses across the 8,000+ pages I've looked at so far, out of a total of ~15,000). This looks like an unintended side effect of capturing these pages, because the HTML also holds user agents, regions and much more. So what you seem to get in these chats are parts of the scanning infrastructure used by the Wayback Machine. (I've reached out to the Wayback Machine and asked whether they consider this to be an issue, and they don't.)
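Tallying addresses across many pages could look roughly like the sketch below. It assumes the candidate strings have already been extracted per page; the file names, the sample addresses (taken from the documentation ranges) and the function name `count_addresses` are all hypothetical. The standard-library `ipaddress` module is used to drop anything that is not a valid IPv4 or IPv6 address.

```python
import ipaddress
from collections import Counter


def count_addresses(values_per_page):
    """Count on how many pages each valid IP address appears.

    `values_per_page` maps a page name to the list of strings found in it.
    """
    counts = Counter()
    for values in values_per_page.values():
        for value in set(values):  # count an address once per page
            try:
                address = ipaddress.ip_address(value)
            except ValueError:
                continue  # skip strings that are not IP addresses
            counts[str(address)] += 1
    return counts


# Hypothetical extraction results for three archived pages.
pages = {
    "chat_a.html": ["203.0.113.7", "203.0.113.7"],
    "chat_b.html": ["203.0.113.7"],
    "chat_c.html": ["198.51.100.9", "not-an-ip"],
}

counts = count_addresses(pages)
print(len(counts), "distinct addresses")
print(counts.most_common())
```

A small number of addresses repeated across thousands of pages is what you would expect if the crawler, rather than individual users, made the requests.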

The weirdest thing, to me: ChatGPT stores hundreds of domains in the DOM. Some of them are labeled disabled; many belong to mail providers such as Proton, T-Online, Googlemail and so on. But there are also gaming companies, universities and publishing houses. There's a lot going on, which I did find fascinating.
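For a first look at which domains are in there, a rough sketch like the following could collect everything that looks like a quoted hostname. How the domains are actually stored in the page is not known here, so the sample string and the function name `find_domains` are hypothetical, and the pattern is a heuristic: it would also pick up quoted file names such as "main.js", so the result needs a manual look.

```python
import re

# One or more dot-terminated labels, then an alphabetic top-level label,
# all inside double quotes. Purely numeric strings such as IPs are skipped.
DOMAIN_PATTERN = re.compile(r'"((?:[a-z0-9-]+\.)+[a-z]{2,})"', re.IGNORECASE)


def find_domains(text):
    """Return the sorted, de-duplicated, lower-cased domain-like strings."""
    return sorted({match.lower() for match in DOMAIN_PATTERN.findall(text)})


# Hypothetical fragment of embedded data.
sample = (
    '["proton.me","t-online.de","Proton.me",'
    '"version","1.2.3","203.0.113.7"]'
)
print(find_domains(sample))
```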

As I said, I do not know why this has to be (or: is being done) this way. I've asked ChatGPT about it, though. Here's what it said.

[Image: chatgpt_verdict.png]

In the spirit of this whole endeavor, I've shared the entire conversation I had with ChatGPT; you can find the link in the notebook.
