Ignore Previous Directions

August 30, 2025

Ignore previous directions 6: floating points

Nature


At this time of year the heather and gorse are flowering. View of Dunwich Heath. Once there was a port here that was the size of London at the time, but after the storms of 1286 and 1287, when hundreds of houses were washed away, it went into decline, and it was destroyed by the Grote Mandrenke flood of 1362, which drowned 25,000 people around the North Sea.

Floating points

I was reading Kevin Xu's piece The real Deepseek moment just arrived (essential reading to know what is going on in China), and it mentioned that Deepseek were adopting UE8M0 FP8, and that this was being supported in hardware by Cambricon, a leading GPU company in China.

Now, I have been following the incredible shrinking AI floating points for a while, but this one was new to me. Outside AI these formats are all a bit novel. When I worked in HPC everyone loved 64 bit floats and looked down on 32 bit floats, because their algorithms were not always numerically stable at lower precision. The Power and Z architectures added support for 128 bit floats, which have a 15 bit exponent and 113 bits of precision.

AI has pushed for shorter float formats for transport purposes (memory bandwidth is a limiting factor, so smaller storage formats are useful) and for compute reasons, as the amount of hardware you need for two 16 bit float operations is very similar to that for one 32 bit float; indeed you can build hardware that does both, reusing most of the circuitry. This is easiest to understand with integers: to split one adder into two independent halves, you just disable the carry between them, which is how vector registers support 8, 16, 32 and 64 bit operations on the same registers.
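The carry-disabling trick can be sketched in software. This is a minimal illustration, not any real ISA: two 16 bit adds performed with one 32 bit integer add, with a mask-and-fixup that stops carries crossing the lane boundary, the same idea a shared vector adder uses in hardware.

```python
def packed_add16(a: int, b: int) -> int:
    """Add two pairs of 16 bit lanes packed into 32 bit words."""
    # Add the low 15 bits of each lane first, so any carry stays
    # inside its own lane...
    mask = 0x7FFF7FFF
    partial = (a & mask) + (b & mask)
    # ...then add the top bit of each lane separately, without
    # propagating a carry out of the lane (XOR is carry-less add).
    top = (a ^ b) & 0x80008000
    return (partial ^ top) & 0xFFFFFFFF

# Lanes (0x0001, 0xFFFF) + (0x0002, 0x0001): the low lane wraps to 0,
# and crucially its carry does not leak into the high lane.
a = (0x0001 << 16) | 0xFFFF
b = (0x0002 << 16) | 0x0001
r = packed_add16(a, b)
print(hex(r >> 16), hex(r & 0xFFFF))  # -> 0x3 0x0
```

A plain 32 bit add of the same words would have produced 0x4 in the high lane, because the low lane's carry would propagate across.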

The first common format, bfloat16, which reached hardware first and is now widely available in CPUs and GPUs, just truncated fp32, keeping the 8 bit exponent but cutting the fraction from 23 bits to 7. This made it easy to use as a transport format even if the hardware could only compute on fp32: values could just be truncated or rounded. But 16 bits is still a lot, so we started getting fp8 and fp4 formats, and even in-between ones like fp6. For a while people were even working with 1 bit numbers.
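The truncation is literally just dropping the low half of the fp32 bit pattern. A minimal sketch of the conversion (the function names here are my own, not any library's API):

```python
import struct

def f32_bits(x: float) -> int:
    """Reinterpret a Python float as its 32 bit IEEE 754 bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_f32(b: int) -> float:
    """Reinterpret a 32 bit pattern back as a float."""
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]

def to_bf16_truncate(x: float) -> float:
    # Zero the low 16 bits: sign and 8 bit exponent survive intact,
    # only 7 fraction bits remain. Cheap enough for a transport format.
    return bits_f32(f32_bits(x) & 0xFFFF0000)

def to_bf16_round(x: float) -> float:
    # Slightly better: round to nearest, ties to even, then truncate.
    b = f32_bits(x)
    b += 0x7FFF + ((b >> 16) & 1)
    return bits_f32(b & 0xFFFF0000)

print(to_bf16_truncate(3.14159))  # -> 3.140625
```

The range of representable magnitudes is unchanged from fp32; only the precision drops, which is exactly the tradeoff the paragraph above describes.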

In all these cases the exponent tends to dominate, versus classical floating point where the precision of the fraction is what matters. And that is where UE8M0 comes in. That is unsigned, 8 bit exponent, 0 bit mantissa. So it is just an exponent, with an implicit 1 mantissa. The exponent can have an offset: without an offset it could represent numbers from 2^0 to 2^255, with an offset it would likely be 2^-127 to 2^128. Arithmetic becomes quite fun. Multiplication is addition of exponents, so to multiply in this format you just add the stored numbers, which is cheap int8 addition. Adding 2^n + 2^n gives you 2^(n+1), while 2^n + 2^(n-1) is rounded up to 2^(n+1), but 2^n + 2^(n-2) rounds down to 2^n. This operation is almost just max(), sometimes adding 1, so the operations are roughly add and max for * and +. Addition is potentially quite lossy, so it is often done in a higher precision format and then converted, or there are other techniques you might be able to use.
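The whole arithmetic above fits in a few lines of integer code. A sketch, assuming the offset of 127 described above (and clamping to the 8 bit range; the names are mine, not a real API):

```python
BIAS = 127  # assumed offset: stored e represents the value 2**(e - BIAS)

def ue8m0_value(e: int) -> float:
    """Decode a stored UE8M0 exponent to its real value."""
    return 2.0 ** (e - BIAS)

def ue8m0_mul(a: int, b: int) -> int:
    # 2**(a-B) * 2**(b-B) = 2**((a+b-B) - B): multiply is just
    # integer addition of the stored exponents (minus one bias).
    return max(0, min(a + b - BIAS, 255))

def ue8m0_add(a: int, b: int) -> int:
    # Round-to-nearest sum of two powers of two:
    #   2**n + 2**n     = 2**(n+1)        -> max + 1
    #   2**n + 2**(n-1) rounds up to 2**(n+1) -> max + 1
    #   2**n + 2**(n-k), k >= 2, rounds to 2**n -> max
    hi, lo = max(a, b), min(a, b)
    return min(hi + 1, 255) if hi - lo <= 1 else hi

print(ue8m0_value(ue8m0_mul(BIAS + 3, BIAS + 4)))  # 2**3 * 2**4 -> 128.0
```

So the hardware story checks out: multiply is an int8 add, and add is a comparison plus an occasional increment, nothing a GPU's integer units can't already do.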

As you might imagine, this only really needs integer hardware, so it is widely supported, including on Nvidia. It is currently largely a format for training rather than inference. Floating point has always been a slightly weird thing, not like the kinds of numbers that mathematicians like, but useful, and I do wonder how many more uses we will find for these kinds of numbers with different tradeoffs in precision.

And indeed, this particular construct has been around for some time: the semiring with the operations of minimum and addition is known as the tropical semiring, named after the Hungarian-born computer scientist Imre Simon, who lived in Brazil; the version with maximum and addition that UE8M0 approximates is called the max tropical semiring or, of course, the arctic semiring.

Anyway, the price of Cambricon went up 30%, on a bet that floating point formats can offset the process disadvantage the Chinese chipmakers have versus TSMC. This probably wasn't the right reason for the market to move: the big advantage China has right now, one that is spreading across their ecosystem, is open source.
