How do you take the average of two timestamps?
A jaunt through affine measures and math loopholes
New Blog Post
A better explanation of the Liskov Substitution Principle. First new post since August! Patreon notes are here.
How do you take the average of two timestamps?
Let's send off the 6x era with something silly. Some physical units of measure, like timestamps, can be subtracted but not added. What does it mean to add 2 PM to 6 PM? That makes no sense. But you can subtract 2 PM from 6 PM and get 4 hours. The output we get is a different type entirely, a "timedelta", that we can add to either timestamp.
I've seen these kinds of units variously referred to as interval measures or affine measures. Other affine measures are things like map coordinates, test scores, and temperatures. All things where finding the sum isn't sensible but finding the difference is. You can think of the "point" and the "delta" as being different types, where operations with two points or two deltas produce deltas, while operations with a point and delta produce a point.
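Python's datetime types already enforce this point/delta split, so a rough REPL sketch (with arbitrary example times matching the 2 PM and 6 PM above) shows the rules in action:

>>> from datetime import datetime, timedelta
>>> two_pm = datetime(2023, 11, 9, 14, 0)
>>> six_pm = datetime(2023, 11, 9, 18, 0)
>>> six_pm - two_pm                  # point - point = delta
datetime.timedelta(seconds=14400)
>>> two_pm + timedelta(hours=4)      # point + delta = point
datetime.datetime(2023, 11, 9, 18, 0)
>>> two_pm + six_pm                  # point + point is a type error
Traceback (most recent call last):
  ...
TypeError: unsupported operand type(s) for +: 'datetime.datetime' and 'datetime.datetime'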
About that averaging
Today the sun rose at 6:32 AM and set at 4:35 PM. When was high noon?
The easiest way to find a midpoint between two numbers is to just do (a + b)/2, but we can't just add the two timestamps. Instead we have to do this:
from datetime import datetime
sunrise = datetime(2023, 11, 9, 6, 32, 0)
sunset = datetime(2023, 11, 9, 16, 35, 0)
>>> sunrise + (sunset - sunrise) / 2
datetime.datetime(2023, 11, 9, 11, 33, 30)
This way we're not adding timestamps to timestamps, we're adding timedeltas to timestamps. All the math checks out.
By contrast, here's what happens if we do things wrong:
>>> h, m = 16+6, 32+35
>>> h, m
(22, 67)
>>> h/2, m/2
(11.0, 33.5)
wait that's the same thing
Okay so here's why we have affine measures in the first place: we use them when we're not measuring from an objective zero point. For some measures, like dates, there is no objective zero, while for others there's a true zero that's too inconvenient for general use. In this context, addition is sensitive to our choice of zero, but subtraction is not— nor, curiously, is averaging.
This is easier to see with temperature measurements. Absolute zero in Kelvin is 0 K, while for Celsius it's -273° C. So Celsius is an affine measure: we get totally different numbers if we add two temperatures in °C than if we add them in K.
>>> K = lambda x: x + 273
>>> K(30 + 20)
323
>>> K(30) + K(20)
596
But we can meaningfully subtract two temperatures, and get the same results regardless of which measure we use.
>>> 30 - 20
10
>>> K(30) - K(20)
10
And in fact it doesn't matter which offset we use, either: if K = C - 123.46223, we'd still get the same result for K(30) - K(20), because the offsets cancel out.1
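A quick check, using K2 as a throwaway name for that made-up offset (the round() is just to hide floating-point noise from the decimal constant):

>>> K2 = lambda c: c - 123.46223
>>> round(K2(30) - K2(20), 9)
10.0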
The offset explains why addition is weird and subtraction makes sense: if f(x) = mx + b, then f(x) + f(y) = m(x + y) + 2b, while f(x) - f(y) is just m(x - y). This also explains why averages work:
mean(f(x), f(y), f(z))
= (mx+b + my+b + mz+b)/3
= m(x + y + z)/3 + b
= f(mean(x, y, z))
And again, it doesn't matter what m and b are: averaging is well-behaved for affine measures regardless of their "affinity".
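A concrete spot check, using the Kelvin conversion from above, Python's statistics.mean, and some arbitrary Celsius readings:

>>> from statistics import mean
>>> temps = [30, 20, 10]
>>> mean(K(t) for t in temps)   # average after converting
293
>>> K(mean(temps))              # convert after averaging
293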
But what about our poor computers
Just because we can mathematically average affine units doesn't mean computers can do it. I don't know of any languages where you can say "these units can't be added, unless it's part of taking the mean, then it's fine." So what if we avoided adding the units? Here's a different algorithm to find the average of N timestamps:
- Pick some arbitrary origin, like 1970-01-01 00:00:00 UTC.
- Subtract the origin from every timestamp, giving us a set of N timedeltas.
- Take the average of the timedeltas, giving us the mean timedelta x.
- Add x to the arbitrary origin.
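Here's a minimal sketch of that recipe in Python (mean_datetime is my name for it, and the Unix epoch is just one arbitrary choice of origin):

from datetime import datetime, timedelta

def mean_datetime(timestamps):
    origin = datetime(1970, 1, 1)                       # step 1: arbitrary origin
    deltas = [t - origin for t in timestamps]           # step 2: N timedeltas
    mean_delta = sum(deltas, timedelta(0)) / len(deltas)  # step 3: mean timedelta
    return origin + mean_delta                          # step 4: add it back

>>> mean_datetime([sunrise, sunset])
datetime.datetime(2023, 11, 9, 11, 33, 30)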
The cool thing here is that when I say an arbitrary origin, I mean arbitrary. It literally does not matter what you choose. And it just so happens that if I pick "12:00 AM today" as my origin, then the timestamp 6:32 becomes the timedelta "6 hours and 32 minutes". Then we're effectively subtracting 0 from every timestamp, taking the mean, and adding the mean to 0. Which is just adding the timestamps and dividing by N.
>>> origin = datetime(2023, 11, 9, 0, 0, 0)
>>> str(sunrise.time()), str(sunset.time())
('06:32:00', '16:35:00')
>>> str(sunrise - origin), str(sunset - origin)
('6:32:00', '16:35:00')
>>> noon = (sunset - origin + sunrise - origin) / 2
>>> str(noon)
'11:33:30'
>>> str(origin + noon)
'2023-11-09 11:33:30'
As for how we originally found the midpoint? That's just the special case where we pick origin = sunrise:
sunrise + ((sunset - sunrise) + (sunrise - sunrise)) / 2
= sunrise + (sunset - sunrise) / 2
I don't have any programming insight or language implementation advice based on all of this. I just think it's neat how adding affine measures is mathematically invalid but there's a loophole that makes averaging them valid. Computers might not be smart enough to use that loophole, but we can!
1. Things are a bit different when converting to Fahrenheit, since that has a scaling factor in addition to the offset.
If you're reading this on the web, you can subscribe here. Updates are once a week. My main website is here.
My new book, Logic for Programmers, is now in early access! Get it here.
Stumbled upon this article in your "ten pieces" archive email newsletter. Thank you!
JavaScript and other programming languages provide a function to get "milliseconds since the epoch" for any timestamp. So, we could simply call that function for each timestamp, add them up and find the average.
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Date/getTime
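In Python, the same idea goes through datetime.timestamp() and datetime.fromtimestamp(); roughly, reusing the sunrise and sunset values from above:

>>> sum_seconds = sum(t.timestamp() for t in [sunrise, sunset])
>>> datetime.fromtimestamp(sum_seconds / 2)   # assumes both fall under the same UTC offset
datetime.datetime(2023, 11, 9, 11, 33, 30)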
On a related note, calculating circular mean is also an interesting topic: https://en.wikipedia.org/wiki/Circular_mean