Your Tests Pass, Your Types Check, Your Code Still Breaks

        August 14, 2025

I deployed with confidence. Pyright and ruff were happy. Tests were green. For months, my type-safe match statement had reliably multiplexed scene data to the frontend of Mythia. Then I added two structurally identical classes to it. I ran the code in staging, and one scene never showed up.
The false security stack had betrayed me: static type checker ✅, unit tests ✅, pre-production deployment 💥. This is the story of how I learned that Python's type system is compile-time theater, and runtime has very different plans.
The "Bulletproof" Code That Wasn't
My match statement had been the reliable heart of our game's scene system. It was elegant, comprehensive, and handled a dozen different scene types that drive our interactive narrative:
def to_javascript_type(segment: SegmentType) -> JavascriptStrEnum:
    match segment:
        case Segment():
            return JavascriptStrEnum.Segment
        case Segment1():
            return JavascriptStrEnum.Segment1
        case Segment2():
            return JavascriptStrEnum.Segment2
        # ... many more cases ...
        case ActionCounterTimelineCollapseWarningSegment():
            return JavascriptStrEnum.ActionCounterTimelineCollapseWarningSegment
        case ActionCounterTimelineCollapseDangerSegment():
            return JavascriptStrEnum.ActionCounterTimelineCollapseDangerSegment
        case ActionCounterTimelineCollapseFinalSegment():
            return JavascriptStrEnum.ActionCounterTimelineCollapseFinalSegment
    raise ValueError("Unknown segment type")

This function is our scene serialization workhorse. It takes rich Python objects and converts them to simple string enums that our JavaScript frontend can switch over to render the correct HTML templates. It's the cheapest hack we threw together to get scene information flowing in the game, and it worked great. Pattern match over objects, inject string representations, let the frontend switch over them.
When I added the new timeline collapse segments, I added cases for them to the pattern match as well. Two of the new classes were structurally identical:
from pydantic import BaseModel, Field

class ActionCounterTimelineCollapseWarningSegment(BaseModel):
    class Config:
        extra = "allow"
    pct_exhausted: int = Field(..., description="Percentage of actions remaining when warning triggered.")

class ActionCounterTimelineCollapseDangerSegment(BaseModel):
    class Config:
        extra = "allow"
    pct_exhausted: int = Field(..., description="Percentage of actions remaining when danger triggered.")
    # Literally identical structure, just different class names

I assumed the class patterns in my match statement would distinguish them. Pyright was satisfied. My unit tests passed. I felt bulletproof.
The Mystery: Only Warning, Never Danger
Yet despite every pre-runtime signal showing green, staging started misbehaving. Testers only ever saw the 'warning' scene, never the 'danger' scene, regardless of their actual game state. I systematically checked the frontend hookup, the backend data structures, and the segment creation logic. Everything looked correct. I added more unit tests. They all passed.
Could it be the JavaScript serialization markers? I dove deep into our serialization pipeline, convinced I'd find some edge case in how we were handling the type mapping.
But the more I investigated, the more confused I became. The backend was creating the right objects. The logic was sound. The tests were comprehensive. Yet in staging, every single timeline collapse scenario was being serialized as a warning: warning was shadowing danger.
That's when an unsettling thought crept in: what if the problem wasn't my logic, but my assumptions? What if that core matching infrastructure wasn't matching reliably at all?
The Investigation: JSON Doesn't Lie
I needed proof. So I constructed both types of objects the same way the production code path does: generically, at runtime, not hand-crafted like my unit tests. Then I fed them through my match statement and watched what happened.
Every single object, regardless of whether it should be warning or danger, came out as ActionCounterTimelineCollapseWarningSegment. The first case was always winning.
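Mythia's construction code isn't shown here, so `build_segment` below is a hypothetical generic builder, not the real pipeline. It illustrates one general way "the first case always wins" can happen before a match statement ever runs. If objects are rebuilt from data by trying candidate classes in order, identical data always produces the first candidate. The two dataclasses are stand-ins for the pydantic models.

```python
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class WarningSegment:
    pct_exhausted: int


@dataclass
class DangerSegment:
    pct_exhausted: int


# Hypothetical generic builder: try candidate classes in order and return the
# first one whose field names match the incoming data.
CANDIDATES: tuple[type, ...] = (WarningSegment, DangerSegment)


def build_segment(data: dict[str, Any]) -> Any:
    for cls in CANDIDATES:
        if {f.name for f in fields(cls)} == set(data):
            return cls(**data)
    raise ValueError(f"No segment class has fields {sorted(data)}")


# The caller meant "danger", but nothing in the data says so.
segment = build_segment({"pct_exhausted": 75})
print(type(segment).__name__)  # WarningSegment
```

Any later check, whether a match statement or anything else, would faithfully report a warning, because by then the object really is a `WarningSegment`.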
But why? I serialized both objects to JSON and stared at the output:
// ActionCounterTimelineCollapseWarningSegment
{
  "pct_exhausted": 75
}

// ActionCounterTimelineCollapseDangerSegment  
{
  "pct_exhausted": 75
}

Identical. Completely identical structure. I expected that over the wire, but Python was supposed to discriminate between them beforehand.
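Here is the same comparison in runnable form. Standard-library dataclasses stand in for the pydantic models, so the sketch needs no third-party packages. The serialized payloads come out equal, and nothing in them names the class they came from.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class WarningSegment:
    pct_exhausted: int


@dataclass
class DangerSegment:
    pct_exhausted: int


warning_json = json.dumps(asdict(WarningSegment(pct_exhausted=75)))
danger_json = json.dumps(asdict(DangerSegment(pct_exhausted=75)))

print(warning_json)                 # {"pct_exhausted": 75}
print(warning_json == danger_json)  # True
print("Danger" in danger_json)      # False: the class name never reaches the wire
```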
It looked as if Python wasn't matching on type names at all, but on runtime shape. My type system was 'compile-time' theater.
The Deeper Truth: Python's Type Theater
Here's the uncomfortable reality about Python's type system: Python types are just hints. At runtime, structure trumps intention.
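Here is a minimal, standard-library illustration of "types are just hints". The annotations are stored on the function, and a static checker reads them, but the interpreter does not enforce them. `describe` is a hypothetical function.

```python
from typing import get_type_hints


def describe(pct_exhausted: int) -> str:
    # The annotations are stored as metadata for tools; Python does not check them.
    return f"{pct_exhausted}% of actions exhausted"


# Pyright or mypy would reject this call (str is not int); the interpreter runs it.
result = describe("seventy-five")  # type: ignore[arg-type]
print(result)                      # seventy-five% of actions exhausted
print(get_type_hints(describe))    # {'pct_exhausted': <class 'int'>, 'return': <class 'str'>}
```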
My match statement wasn't failing because of a bug in match; it was doing exactly what it does. A class pattern like case ActionCounterTimelineCollapseWarningSegment() is an isinstance() check against whatever object actually arrives at runtime, and the first case that accepts the object wins. My annotations never enter into that check. The class names, the careful inheritance from BaseModel, the thoughtful field descriptions: none of it protected me once the objects flowing through at runtime stopped being what my type hints promised. What I thought was sophisticated type discrimination was, in practice, just "first match wins."
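This is fundamentally different from languages with nominal type systems. In Go, two structs with identical fields but different names are different types, period; the same goes for two distinct Haskell data types. The compiler won't let you mix them up. At runtime, a Go type switch or type assertion checks the actual named type, so a failed single-value assertion panics instead of silently doing the wrong thing. Nominal typing is also why Go and Haskell offer explicit abstraction mechanisms for polymorphism (interfaces and typeclasses, respectively), while Python's duck typing gives you polymorphism without declaring anything. Go's interfaces are, fittingly, its structural exception: a type satisfies one implicitly, just by having the right methods.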
Python's duck typing sees structure and assumes compatibility. Your static type checker might be happy, and your unit tests might pass (if they feed in hand-built concrete objects whose type information hasn't been lost along the way), but runtime has its own rules.
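To make "Python sees structure and assumes compatibility" concrete, consider plain attribute access and a `typing.Protocol` marked `@runtime_checkable`. Both judge an object by the attributes it carries, not by its class. This structural isinstance() behavior is specific to runtime-checkable protocols. This is a minimal sketch with hypothetical dataclass stand-ins; `HasPctExhausted` and `remaining` are invented names.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass
class WarningSegment:
    pct_exhausted: int


@dataclass
class DangerSegment:
    pct_exhausted: int


@runtime_checkable
class HasPctExhausted(Protocol):
    pct_exhausted: int


def remaining(segment: HasPctExhausted) -> int:
    # Duck typing: any object with a pct_exhausted attribute works here.
    return 100 - segment.pct_exhausted


for seg in (WarningSegment(75), DangerSegment(75)):
    # A runtime-checkable protocol only checks that the attribute exists,
    # so both classes pass the same structural isinstance() test.
    print(type(seg).__name__, isinstance(seg, HasPctExhausted), remaining(seg))
```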
The Contrast: How Real Strong Typing Works
Imagine this same scenario in Go:
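package main

import "fmt"

type WarningSegment struct {
    PctExhausted int
}

type DangerSegment struct {
    PctExhausted int
}

// The type switch inspects the dynamic (named) type of segment, not its field layout.
func processSegment(segment any) string {
    switch segment.(type) {
    case WarningSegment:
        return "warning"
    case DangerSegment:
        return "danger"
    default:
        return "unknown"
    }
}

func main() {
    fmt.Println(processSegment(WarningSegment{PctExhausted: 75})) // warning
    fmt.Println(processSegment(DangerSegment{PctExhausted: 75}))  // danger
}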

This would work. Every time. Because Go's type system is nominal, WarningSegment and DangerSegment are different types, even if they have identical structure. The type switch examines the actual type, not the structural shape.
Or in Haskell, with algebraic data types:
data SegmentType = Warning Int | Danger Int

processSegment :: SegmentType -> String
processSegment (Warning _) = "warning"
processSegment (Danger _) = "danger"

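Warning and Danger are distinct constructors no matter what they carry, and GHC can check that the match covers every constructor (with -Wincomplete-patterns, which -Wall enables). There's no runtime guessing about structural similarity.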
Python gives you the illusion of this safety with type hints, but it's still the same dynamic, duck-typed language underneath. Your types are comments, not runtime guarantees.
The Solution: Working With Python's Nature
Stop fighting Python's nature and work with it. Python is runtime-focused, so you need good old-fashioned reflection. I replaced the clever match statement with a simple type map:
TYPE_MAP: dict[type, JavaScriptFriendlySegmentType] = {
    Segment1: JavaScriptFriendlySegmentType.Segment1,
    Segment2: JavaScriptFriendlySegmentType.Segment2,
    Segment3: JavaScriptFriendlySegmentType.Segment3,
    InputSegment: JavaScriptFriendlySegmentType.InputSegment,
    GameOverSegment: JavaScriptFriendlySegmentType.GameOverSegment,
    VictorySegment: JavaScriptFriendlySegmentType.VictorySegment,
    MilestoneSegment: JavaScriptFriendlySegmentType.MilestoneSegment,
    RoomChangeSegment: JavaScriptFriendlySegmentType.RoomChangeSegment,
    ActionCounterTimelineCollapseWarningSegment: JavaScriptFriendlySegmentType.ActionCounterTimelineCollapseWarningSegment,
    ActionCounterTimelineCollapseDangerSegment: JavaScriptFriendlySegmentType.ActionCounterTimelineCollapseDangerSegment,
    ActionCounterTimelineCollapseFinalSegment: JavaScriptFriendlySegmentType.ActionCounterTimelineCollapseFinalSegment,
    Segment: JavaScriptFriendlySegmentType.Segment,
}

def to_javascript_type(segment: SegmentType) -> JavaScriptFriendlySegmentType:
    try:
        # type() returns the object's exact runtime class, which is the dict key.
        return TYPE_MAP[type(segment)]
    except KeyError:
        raise ValueError(f"Unknown segment type: {type(segment).__name__}") from None

Now it works. Every time. Because type(segment) returns the actual class object—the one thing Python's runtime does track reliably.
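One consequence of keying on `type(segment)` is worth seeing concretely: the lookup is exact, whereas `isinstance()` also accepts subclasses. This is a minimal sketch with hypothetical dataclasses. The subclass relationship between the two is invented purely to show the difference and is not a claim about the Mythia models.

```python
from dataclasses import dataclass


@dataclass
class WarningSegment:
    pct_exhausted: int


@dataclass
class DangerSegment(WarningSegment):
    """Hypothetical: a subclass with exactly the same fields as its parent."""


TYPE_MAP: dict[type, str] = {
    WarningSegment: "ActionCounterTimelineCollapseWarningSegment",
    DangerSegment: "ActionCounterTimelineCollapseDangerSegment",
}


def to_javascript_type(segment: object) -> str:
    try:
        # Exact-class lookup: a subclass never falls back to its parent's entry.
        return TYPE_MAP[type(segment)]
    except KeyError:
        raise ValueError(f"Unknown segment type: {type(segment).__name__}") from None


danger = DangerSegment(pct_exhausted=75)
print(isinstance(danger, WarningSegment))  # True: isinstance() honors inheritance
print(type(danger) is WarningSegment)      # False: type() is the exact class
print(to_javascript_type(danger))          # ActionCounterTimelineCollapseDangerSegment
```

The flip side is that a subclass missing from TYPE_MAP raises ValueError instead of quietly inheriting its parent's tag. For this kind of serializer, that is usually the behavior you want.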
The solution is simple, explicit, and it embraces Python's runtime behavior instead of fighting against it.
The Hard-Earned Wisdom
This pre-production horror story taught me the most important lesson about Python development: Python types are just hints. At runtime, structure trumps intention.
Here's what every Python developer needs to know:
Trust but verify your tools. Pyright, mypy, and your IDE will give you confidence, but they're reasoning about a different version of your program than the one that actually runs. Static analysis catches obvious errors but can't save you from Python's runtime behavior.
Your tests need to match production. My unit tests passed because they used hand-crafted objects in predictable scenarios. Production used generic object construction with varied timing and data. The gap between test construction and production construction was where the bug lived.
When Python's type system seems too good to be true, it probably is. Complex pattern matching on structurally similar types, clever generic constraints, elaborate type unions—Python's structural typing will find ways to surprise you.
Sometimes the simple solution is the right solution. My type map isn't as elegant as type-pattern matching, but it's explicit and idiomatic Python, which is the best lesson here. Learning other languages teaches excellent techniques. But when you bring idioms from one paradigm (say, strongly typed functional languages) into another (a type-hinted scripting language), you must be careful. Code that "works" is a false friend: it may not be working for the reasons you believe.
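A type map can still fail in two ways. A class may never make it into TYPE_MAP, or two classes may share one tag, which is the warning-shadows-danger symptom again. A cheap guard is a test that checks the map against the union itself. This is a minimal sketch assuming SegmentType is a `typing.Union` of the segment classes; the classes and tag strings here are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Union, get_args


@dataclass
class WarningSegment:
    pct_exhausted: int


@dataclass
class DangerSegment:
    pct_exhausted: int


# Hypothetical stand-ins for the real union and map.
SegmentType = Union[WarningSegment, DangerSegment]

TYPE_MAP: dict[type, str] = {
    WarningSegment: "ActionCounterTimelineCollapseWarningSegment",
    DangerSegment: "ActionCounterTimelineCollapseDangerSegment",
}


def test_type_map_covers_segment_union() -> None:
    # Every class in the union needs an entry, and nothing else should have one.
    union_members = set(get_args(SegmentType))
    mapped = set(TYPE_MAP)
    assert union_members == mapped, (
        f"missing: {union_members - mapped}, unexpected: {mapped - union_members}"
    )


def test_type_map_tags_are_distinct() -> None:
    # Two classes sharing one tag would reproduce "warning shadows danger".
    assert len(set(TYPE_MAP.values())) == len(TYPE_MAP)
```

Under pytest these run as ordinary test functions, so adding a class to the union without mapping it fails the suite rather than a staging session.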
Now, when I see those green checkmarks from pyright and my test suite, I'm grateful for the safety they provide, but I understand much better where the boundary of that safety lies.
Your tests can pass, your types can check, and your code can still break.

                            Don't miss what's next. Subscribe to Quiescent Current:
