December 14, 2025

Ignore previous directions 9: hardware power

Nature

[Photo: View near Sion, Switzerland. Vineyards in the foreground (bare, as it is winter), then trees, and mountains in the distance.]

It has been a while since I wrote: some travelling, and then most recently a horrible flu (avoid!). Switzerland was lovely, just in the transition to winter.

Hardware and power

I thought I would write another piece about how modern hardware has changed, as hardware drives the form of software. I decided to take a look at a snapshot of the hardware ecosystem as it is now. I will probably follow up with some selected benchmarks of what hardware in these different categories can deliver.

The clearest way of understanding hardware today is to classify it by power consumption, as in the end we are turning electricity into useful work at various scales. I am not going to cover the sub 1W universe, the land of microcontrollers and so on, here, although I might cover them in another article; by number these make up most device shipments. We will only cover devices with an MMU, so they can run an OS like Linux or one of the other modern OSs. Although if you know of any sub 0.5W Linux capable machines, let me know! Mostly I will cover "servers", but at the bottom end we will talk about other devices. We will bucket things in powers of ten of watts, using peak power consumption as the main metric, so the category for 150W will roughly cover 50W-450W, then 1.5kW will start at 500W, and so on (a rough sketch of this bucketing follows the table below). This is very approximate, as modern CPUs and other components can vary their power consumption drastically, often by a factor of 10 or more, and you can often move the same device into a different category by allowing it to use more power.

Power    Category
1.5W     mobile phones and embedded devices
15W      high end phones, low power fanless laptops
150W     higher power laptops, desktops, low end servers, low end GPUs
1.5kW    large desktop with GPU, standard CPU servers
15kW     CPU rack, standard 8-way GPU server
150kW    small AI inference cluster, single AI rack
1.5MW    32 rack Nvidia DGX SuperPOD
15MW     Google large TPU cluster, largest current supercomputers
150MW    large building installation for AI, Colossus first stage
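
To make the bucketing concrete, here is a minimal sketch in Python of how the classification works. The category names are just the rough ones from the table, and rounding to the nearest power of ten is what gives the 50W-450W span for the 150W bucket.

```python
import math

# Rough categories from the table above, indexed by powers of ten above 1.5W.
CATEGORIES = [
    ("1.5W", "mobile phones and embedded devices"),
    ("15W", "high end phones, low power fanless laptops"),
    ("150W", "higher power laptops, desktops, low end servers, low end GPUs"),
    ("1.5kW", "large desktop with GPU, standard CPU servers"),
    ("15kW", "CPU rack, standard 8-way GPU server"),
    ("150kW", "small AI inference cluster, single AI rack"),
    ("1.5MW", "32 rack Nvidia DGX SuperPOD"),
    ("15MW", "Google large TPU cluster, largest current supercomputers"),
    ("150MW", "large building installation for AI, Colossus first stage"),
]

def categorise(peak_watts: float) -> str:
    # Rounding log10(watts / 1.5) puts 50W-450W into the 150W bucket,
    # 500W and above into the 1.5kW bucket, and so on.
    exponent = round(math.log10(peak_watts / 1.5))
    exponent = min(max(exponent, 0), len(CATEGORIES) - 1)  # clamp to the table
    label, description = CATEGORIES[exponent]
    return f"{label}: {description}"

print(categorise(65))      # typical laptop -> 150W bucket
print(categorise(12_000))  # 8-GPU server   -> 15kW bucket
```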

Let us look at these in more detail.

1.5W Mobile phones make up the largest category of device shipments and installed base. Battery life and cooling really limit them to very short peaks above a couple of watts, so this power category is likely to stay in widespread use. Drones, for example, need something like this, although again there is a range and some sit above it because of vision model processing. Computers in this category now have around 4-6 cores and 4-12GB RAM. They tend to have some GPU and NPU acceleration now, in growing amounts. They have on the order of 1Gb of external IO capacity for 5G, with some more for graphics and storage. This device class is also repurposed for things like small routers. My ballpark estimate is that the performance is probably similar to servers from 20 years ago, in 2005 or so, like the Dell 1850. You can't directly rent something like this in the cloud, although a low end shared burstable Arm core like the t4g series on AWS is probably similar. A lot of machines at this power level are still being produced on trailing edge semiconductor processes.

15W This is the power level for tablets, low power laptops like the MacBook Air I am writing this on, and other similar devices. High end phones now peak into this range, which is why they can get so hot under load. The average laptop is about 65W and falls into the next category, but we now have these nice lower end devices, usually fanless. These can be scaled-up phone chips, as with Apple, or powered-down PC chips, as AMD sells. They generally come with an integrated GPU on chip, as standalone GPUs draw too much power. This category has 4-12 cores and 8-32GB RAM. It can drive on the order of 10Gb of IO, often more: USB4 is 40Gb and is becoming more common, and some have PCIe4x4 SSDs, which are around 60Gb. This is roughly equivalent to servers from 15 years ago, in 2010 I think. At this power consumption you get around two amd64 cores in AWS, or more Arm cores, so a small VM. Apparently this is a typical VM size for many applications.

150W Laptops now go up to about 100W or even higher, and this category also covers many desktop machines, small business servers (which are more or less repackaged desktops), and some other smaller servers. A full power GPU, at 400-500W, will take you out of this category, but there are lower power options, as Apple shows, and as Nvidia is targeting with its DGX Spark (around 170W). This power level is very hard to cool passively without thermal throttling. RAM is up to about 128GB, with around 16 cores. These can easily drive 100Gb or more of IO, although many do not as there aren't the use cases, and a lot of the IO capacity is often the CPU to GPU connection. This is currently roughly the expected entry point for GenAI applications running small model inference, with Nvidia placing the DGX Spark and the DGX Thor robotics dev kit at this power level. In terms of RAM capacity these correspond to Dell servers from 2015, a decade ago, but the CPU and IO performance is going to be a lot better. Because IO was so slow on hard drives, servers then had relatively more RAM, so servers have changed structurally; this period is also interesting as it is when containers were introduced. This is a typical size for a decent sized database instance, say 16 cores, on the cloud.

1.5kW Large desktop workstations with high power GPUs sit in this group too, so it covers quite a variety of hardware. This is what a large CPU server now consumes, with the large server CPUs being 400-500W, still typically in dual CPU configurations, with up to 128 or more cores per socket and 768GB or more of memory per socket. These machines also have 100+ PCIe lanes, plenty for a couple of 400Gb ethernet connections in addition to storage. Cloud machines do not use most of the IO bandwidth that is available, with most machines on AWS limited to a total maximum of 100Gb of combined network and remote storage bandwidth; shared across a couple of hundred cores, that means each core gets well under 1Gb of bandwidth. This size is what the large cloud providers mostly use, although their Arm based servers are often single socket and might just fall into the previous power category. Server chips more or less scale up from the previous category to this, by adding more chiplets, memory channels and IO, so there is a range of sizes in between, with different thermal envelopes. The vast majority of these systems are partitioned into smaller VMs, rather than used to run large applications, although of course scale-out applications do exist.
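
As a rough back-of-the-envelope check on that per-core figure, here is the arithmetic using the ballpark numbers above (128 cores per socket, dual socket, a 100Gb combined network and storage cap, 768GB of memory per socket); these are figures from the text, not any specific instance type.

```python
# Back-of-the-envelope per-core resources for a dual socket cloud server,
# using the rough figures above rather than any specific instance type.
cores_per_socket = 128
sockets = 2
total_cores = cores_per_socket * sockets             # 256 cores

network_and_storage_gbit = 100                        # combined cloud cap
memory_gb = 768 * sockets                             # 1.5TB across both sockets

print(f"bandwidth per core: {network_and_storage_gbit / total_cores:.2f} Gb")  # ~0.39 Gb
print(f"memory per core:    {memory_gb / total_cores:.1f} GB")                 # 6.0 GB
```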

15kW This is what a rack of CPU servers, for example from Oxide, will consume. However, it is also the base unit of GPU compute: a system with a full high end server from the previous category, 8 GPUs connected with PCIe switches, and typically 8x400Gb networking (with 800Gb coming shortly). This is the smallest unit for serious AI work. The networking matters because these machines are the components of the larger systems we see below, which run large inference and training applications across multiple computers, so the interconnect is key. These machines currently run at around 12kW or so, but this is continually being pushed up with each generation.

150kW This is what you might use for a DeepSeek inference cluster, for example: a small number of racks of GPU servers. The Nvidia SuperPOD starts at 8 racks of 4 machines, at 320kW including the networking between the racks, so at the top of this range, but they are aiming for 600kW in the next generation, which will bump it out of this category. The Meta OpenCompute AI racks are around 140kW.
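
As a quick sanity check on those quoted figures, the per-machine power they imply is in the same ballpark as the 8-GPU servers from the previous category (this is just rough arithmetic on the numbers above):

```python
# Rough power arithmetic for the entry-level SuperPOD figures quoted above.
racks = 8
machines_per_rack = 4
total_kw = 320

machines = racks * machines_per_rack   # 32 GPU servers
print(f"{machines} machines, about {total_kw / machines:.0f}kW each")
# ~10kW per machine on average (the 320kW total also covers inter-rack
# networking), in the same ballpark as the ~12kW quoted for a single
# 8-GPU server above.
```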

1.5MW At this point you get a 32 rack Nvidia SuperPOD, which is a large inference cluster or a small training cluster.

15MW This range is where Google's TPU clusters top out, and also where the fastest supercomputer on the top500 list sits, at 40MW. Llama 3 was trained in this bracket. You can get a 1 exaflop supercomputer at 15MW, according to the announcements about the forthcoming Alice Recoque supercomputer. So this is about the limit of where we run single applications on computer clusters right now, and that limit is largely about networking scale, the usual constraint in supercomputing.

150MW This is the size of the first stage of the Colossus AI cluster, and where the very largest AI training applications are perhaps starting to reach, but most installations of this size are probably partitioned into smaller clusters at present.

So what can we learn from this? Modern GPU machines have raised per machine power consumption by about one power of ten, so we now have a range of around 10,000x for single machines, from 1.5W phones to 15kW GPU servers. AI has normalised what were much more unusual HPC architectures, and is trying to scale them up by another order of magnitude, limited really by network scalability, which is why there is so much demand for next generation switching right now. Many of the easier gains in AI compute, such as reduced precision, have now gone, and much of the growth now comes from increasing power, hence the rise of water cooling, or from increasing the size of the network.

At the bottom end, we are getting much better at varying power consumption for different use cases, so that while phones can use a lot of power at peak, they can also use very little at other times. Right at the bottom of the power range, ambient devices that scavenge power from the environment but can still drive wireless communications are starting to be rolled out, for example by Walmart, so we have devices with no net power consumption at all.
