What's the Most Expensive Software Per Byte?
Is it 1 dollar/mb? 10 dollars/mb? 100? 100,000?
The other day a friend asked me a question: "What's the most expensive commodity software you can buy in terms of dollars per byte?" It seemed like a fun question, so I went looking!
Ground rules: first, bounded costs only. Something like an AWS Lambda is unbounded in that you can spend an arbitrary amount of money on it. That's not interesting in the same way that Turing machines that run forever aren't as interesting as busy beaver machines. We want something where the price is total and finite.
Second, pricing structures for software can get very weird. So we're only looking at things that have a flat cost per license and aren't "contact us for a sales quote" where they just charge whatever they think you can afford. And it must be commodity software, aka you pay the price tag and you have it within a week. Otherwise the answer would be a 0day exploit or custom software written for NASA.
(Even with that restriction, we still have complications like "price per CPU core per year" or "are you paying for hardware or software". As you can probably tell, I'm not taking this very seriously.)
I'm also looking for rough ballpark figures and didn't really do thorough research on this. That's why this is a newsletter post and not a full website essay. I'm not interested in finding the difference between something that's one dollar per byte and something that's $1.50 per byte, but between one dollar per byte and $10 per byte.
Okay so here's what I found.
Expensive
The obvious place to start is looking up lists of highly expensive software. If you do this you tend to see a lot of the same stuff: AutoCAD, game engines, visual editing tools. These are usually aimed at companies that can pay through the nose. For example, I found a lot of people saying the Unreal engine cost $750,000.¹ That seems like it would have a high ratio, right?
The problem is that these are developer tools. They're expected to be run on systems with a lot of CPU, memory, and space. There's no constraint forcing these to become small programs. If we handwave that purchasing an Unreal engine license gets you 10 GB of software, then we're looking at approximately $75 per megabyte. Which is certainly more expensive than most consumer software. But we can do a lot better.
Small
There are two sides to maximizing the ratio: increasing the cost of the program and reducing the size of it. Let's look at the other extreme, the one my friend found: what are the smallest programs you can pay for?
Today that's a tricky question, but in the past there was one major candidate: Atari games. Based on some reading, Atari games cost between $20 and $40 per cartridge. But the smallest cartridges were only 4 kB. This puts us at approximately 5,000-10,000 dollars per megabyte… more if you adjust for inflation. We're already at roughly 100 times the ratio you'd get from workstation software. Nowadays of course games are nowhere near that expensive. Portal cost me about $10 and is approximately 10 GB, at less than one cent per megabyte. While Atari games were the most expensive per byte in the past, that's no longer true now. Even so, it sets a new goal. We want to top ten thousand dollars per megabyte.
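As a quick sanity check on that arithmetic, here is a minimal sketch. It assumes decimal units (1 MB = 1,000,000 bytes, 4 kB = 4,000 bytes), and the helper name `dollars_per_mb` is made up for illustration. The prices and sizes are the ballpark figures from the paragraph above.

```python
# Assumption: decimal megabytes (1 MB = 1,000,000 bytes).
BYTES_PER_MB = 1_000_000


def dollars_per_mb(price_usd: float, size_bytes: float) -> float:
    """Cost density: price divided by size, expressed per megabyte."""
    return price_usd * BYTES_PER_MB / size_bytes


# Atari cartridge: $20 to $40 for the smallest 4 kB carts.
atari_low = dollars_per_mb(20, 4_000)
atari_high = dollars_per_mb(40, 4_000)

# Portal: about $10 for roughly 10 GB.
portal = dollars_per_mb(10, 10_000 * BYTES_PER_MB)

print(f"Atari:  ${atari_low:,.0f} to ${atari_high:,.0f} per MB")
print(f"Portal: ${portal:.3f} per MB")
```

Using binary units (4 KiB, 1 MiB) would leave the ratio the same here, since both the cartridge size and the megabyte scale by the same factor.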
Small and Expensive
Is there something on both extremes, being both small and pricey? There's one domain I happen to know about. Let's talk about APLs.
APLs are a class of languages derived from (drumroll) APL. Right now the three extant APLs are J, Dyalog, and K. These languages are known for being both very fast and very, very terse. Here's an example of a J program that takes an array of points in an N-dimensional Euclidean space and returns the two points that are furthest apart:
f =: 3 : 0
t =. +/&.:*:@:-"1/~
i =. (-. #)@:(i."1 >./@:,)
(i t y) { y
)
That's 80 characters and nowhere near optimized. I even included useless whitespace!
APLs are not only very terse: they're also very fast (for an interpreted language). J Software makes a timeseries database called Jd. While free for noncommercial use, a commercial license can cost anywhere between $3,000 and $24,000. It's also just under 500 kB. That gives us a ratio of roughly $50,000 per megabyte!²
Very Small and Very Expensive
There is another industry that has extreme disk space constraints: embedded devices. Embedded devices also often have extreme performance requirements: they need to be fast, responsive, reliable, and deterministic. This means they need special operating systems. A real-time operating system, or RTOS, is an operating system designed to process all events in "hard realtime".
These extreme requirements mean that it costs a lot of developer time and money to produce a stable RTOS. And the result has to be small enough to fit in the memory of an embedded device, even a very small one. So we're looking at very small programs that are extremely expensive to produce. What ratio do we get from them?
This site estimates that the average cost of an RTOS is approximately $10,000 for a license. It also says that an RTOS can be as small as 2 kB. This suggests an upper bound of around $5 million per megabyte. At that point we're talking about the cost density as dollars per byte. Even if you bump the size of the RTOS to 100 kB, that's still a hundred thousand dollars per megabyte.
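To make the dollars-per-byte framing concrete, here is a small sketch of the same estimate. It assumes decimal units (2 kB = 2,000 bytes, 1 MB = 1,000,000 bytes), and the function names are hypothetical. The $10,000 license and the 2 kB and 100 kB sizes are the figures quoted above.

```python
# Assumption: decimal units throughout.
BYTES_PER_MB = 1_000_000


def dollars_per_byte(price_usd: float, size_bytes: float) -> float:
    """Cost density in dollars per single byte."""
    return price_usd / size_bytes


def dollars_per_mb(price_usd: float, size_bytes: float) -> float:
    """The same density, scaled up to a megabyte."""
    return price_usd * BYTES_PER_MB / size_bytes


LICENSE_USD = 10_000  # rough average license cost from the estimate above

tiny_per_byte = dollars_per_byte(LICENSE_USD, 2_000)   # 2 kB RTOS
tiny_per_mb = dollars_per_mb(LICENSE_USD, 2_000)
large_per_mb = dollars_per_mb(LICENSE_USD, 100_000)    # 100 kB RTOS

print(f"2 kB RTOS:   ${tiny_per_byte:.2f} per byte, ${tiny_per_mb:,.0f} per MB")
print(f"100 kB RTOS: ${large_per_mb:,.0f} per MB")
```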
This isn't a perfect comparison, as we're excluding all of the developer tools used to work with the RTOS: compilers, IDEs, etc. That could potentially add gigs to the "size" of the license. I'm willing to let that slide, because 1) saying something costs millions of dollars per megabyte is hilarious, and 2) there could plausibly be a vendor that makes the tooling free, and you just have to pay for the RTOS itself.
Small and Hideously Expensive
I thought "low hundreds to mid-millions" was the best we can do. I brought this all up on Twitter and nobody had any better ideas, either.
Then I saw this:
Stuff that's per-core licensed on a modern high core box can also have extremely high costs. SQL server is $15k per core, put that on a 256 core box and it's almost $4m/yr.
— Dan Luu (@danluu) April 1, 2020
I think SQL server is too big to win, but something with a similar license that's smaller could win.
SQL Server certainly couldn't win. But you know what could? An APL database that charges per core.
Which brings us to kdb+.
Kdb+ is a time-series database from Kx Systems. It's similar in size to Jd but charges per core instead of a flat fee. I can't find clear numbers on how much per core, but based on some Google searches I'm estimating between 10k and 20k. Call it 15k per core and a file size of 500 kB: that puts us at a little over 7.5 million dollars per megabyte for a 256-core box.
But why stop at 256 cores? Microsoft apparently makes servers with 896 cores! Maxing out a kdb+ license for that machine would be over 25 million USD per megabyte. Or just go hog wild and grab one of these 2048 core CPU chips, putting us at 60 million per megabyte.
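The per-core scaling is easy to tabulate. This is only a sketch under the guesses above: $15k per core is my estimate and not a published Kx price, 500 kB is the rough file size, megabytes are decimal, and the function name is invented for illustration.

```python
# Assumptions: $15k per core (an estimate), 500 kB binary, decimal megabytes.
BYTES_PER_MB = 1_000_000
PRICE_PER_CORE_USD = 15_000
KDB_SIZE_BYTES = 500_000


def per_core_dollars_per_mb(price_per_core: float, cores: int, size_bytes: float) -> float:
    """Total license cost for `cores` cores, divided by program size in MB."""
    total_cost = price_per_core * cores
    return total_cost * BYTES_PER_MB / size_bytes


# Core counts mentioned in the text: 256, 896, and 2048.
ratios = {
    cores: per_core_dollars_per_mb(PRICE_PER_CORE_USD, cores, KDB_SIZE_BYTES)
    for cores in (256, 896, 2048)
}

for cores, ratio in ratios.items():
    print(f"{cores:>5} cores: ${ratio:,.0f} per MB")
```

Because the size is fixed and the price is linear in cores, the ratio is linear in cores too: each extra core adds $30,000 per megabyte under these assumptions.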
Conclusion
Most of these numbers are incredibly unrealistic. At that scale everything is done as enterprise deals, and I doubt that Kx will charge you the same rate if you're running kdb+ on 2048 cores. And the cost-ratio of the RTOS is reduced because you're running the code on tons of devices. Nonetheless, even for a "realistic" analysis getting to 100k/MB is feasible, and 1M/MB isn't out of reach. Here's a silly table for the "unrealistic" analysis:
Software | Price (USD/MB) |
---|---|
Unreal Engine | 35-75 |
Atari Games | 5,000-10,000 |
Jd | 6,000-50,000 |
RTOSes | 100k-5M |
kdb+ | 5M-50M |
There's actually no point to any of this. It was just fun to research!
Update 2020-04-03
Since writing this, someone informed me of go.com, an old program that sold for £5. While the least expensive software on this list, it blows everything out of the water at infinity dollars per megabyte. That's because go.com was an empty file!³ Give the story a read, it's great.
¹ Obvious question: is that actually an accurate number? It's a number everybody repeated but nobody actually had a source for. The closest I could find was an old magazine article that threw that number out offhand, as well as this link to the old Unreal 2 licensing. That's only $350,000, which is still roughly the same ballpark though. Also, it charges additional licensing fees, which is probably what the companies actually negotiate over.
² It's only 50k/MB if you buy the largest enterprise offering, which scales to an unlimited number of licenses. If you only buy a single developer license, it's "only" 6k/MB, which is actually still less than the Atari game! I could try to normalize everything but ¯\_(ツ)_/¯ effort
³ Though if you're working under specific total logics, it's actually zero dollars per megabyte...