What Waymo's NHTSA investigation says about how far along autonomous cars are
NHTSA is investigating Waymo. They're investigating Zoox, too, and have reopened an investigation into Tesla after Tesla's recent Autopilot recall proved notably deficient. The Waymo investigation is the most interesting of the three. One of the truisms in autonomous cars, discussed in whispers by people who don't want to undercut their own interests by admitting it publicly, is that Waymo has an overwhelming lead in making autonomous cars that work. Cruise and Zoox, Mercedes and Ford: these companies are generally understood to be so far behind Waymo technologically as to be playing a different game entirely. Tesla is in its own category, deploying systems so undercooked and dangerous that nobody wants to be associated with them even by implication. Nobody building AVs would switch places with Tesla. Everybody wishes they were Waymo. So it's particularly notable to see NHTSA investigating Waymo, and the specifics of the events they're investigating are the best outside view we can get of the state of the art in autonomous vehicles. Where that state of the art is today can tell us a lot about how close these vehicles are to being deployed.
One of the things that constantly surprised us, at my company, was how far the technical progress at the AV companies we were attempting to work with lagged behind what we expected. We were selling a solution to a problem that we pitched—correctly, I think—as something like the biggest remaining challenge for self-driving vehicles. In order for it to be that, though, a bunch of much easier challenges had to be solved first. Our software was designed to allow AVs to understand whether a pedestrian they had detected was crossing the street. In order to do that, though, you had to know that a pedestrian was there. Over and over again we discovered that our customers—companies with fleets of test vehicles on the road!—could not detect pedestrians.
Waymo has probably (probably!) solved that one. One of the other problems we got asked about a lot, though, was dealing with people directing traffic, whether they be police, other first responders, construction workers, truck drivers, or somebody else. It's one of those challenges that fits squarely in the framework of Moravec's Paradox: something so easy for humans that it is hard for us to even reason about why it's so hard for machines. It's something we have no introspection about. That's a problem, because understanding those directions is exactly what a vehicle needs if it is going to navigate around crash scenes and construction sites without causing dangerous situations.
When Cruise was pulled off the road in San Francisco, the proximate cause was a vehicle doing the (very) wrong thing after a crash; it (most likely) engaged an automatic program to pull over, and ended up dragging a pedestrian trapped underneath it. The reason the vehicle had that automatic program, though, was as a fix for a bigger problem: Cruise vehicles would drive directly through accident scenes, oblivious to the emergency context and to first responders trying to get them to move. The Cruise vehicles would pull into the scene and then stop, blocking ambulances and foiling attempts to treat severely injured people. To stay on the road, Cruise needed a rapid fix, and seems to have gone for one that simply pulled out of the way, regardless of context. It went poorly for them.
Cruise had to deploy that fix because dealing with construction and emergency situations is phenomenally difficult for automated cars. The cars rely on incredibly detailed maps of what is where: lane boundaries, the locations of traffic signals and street signs, historical information about where cars are and are not. They can only work in the real world if the amount of novelty (unexpected things in unexpected places) is minimized, because machine learning, which underpins the computer vision and really the whole stack of perceptual and behavioral technologies in these vehicles, is completely unable to deal with novelty: it does not work on situations that it has not seen before.
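That reliance on prior maps is easier to see in a toy sketch than in prose. Here is a purely hypothetical Python fragment (the map fields, labels, and fallback behavior are all invented for illustration, and don't come from any real AV stack): anything perception reports that the prior map and a short list of expected object types can't account for gets kicked to a fallback behavior, which is roughly the shape of the quick fix Cruise reached for.

```python
# Toy illustration only: a planner that leans on a prior HD map and treats
# anything off-map as novelty it must hand off to a fallback behavior.
PRIOR_MAP = {
    # (x, y) grid cells the prior map says are drivable lane
    "drivable": {(x, 0) for x in range(100)},
    # mapped traffic-signal location (unused here, but part of the prior)
    "signals": {(50, 1)},
}

def plan_step(detections):
    """detections: list of (label, (x, y)) cells reported by perception."""
    unexpected = [
        (label, cell)
        for label, cell in detections
        if cell in PRIOR_MAP["drivable"] and label not in ("vehicle", "pedestrian")
    ]
    if unexpected:
        # Cones, flares, a flagger: nothing in the prior map accounts for
        # these, so the only "safe" move this logic knows is to stop or pull over.
        return ("fallback_pull_over", unexpected)
    # Otherwise, keep tracking the mapped lane geometry.
    return ("follow_mapped_lane", [])

print(plan_step([("vehicle", (10, 0))]))       # routine traffic: keep driving
print(plan_step([("traffic_cone", (12, 0))]))  # off-map object: bail out
```

Everything hard lives in the parts this sketch waves away: deciding what actually counts as unexpected, and what the safe behavior is when a firefighter is waving the car somewhere the map says it shouldn't go.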
That's a notable problem when it comes to emergency situations or road construction, because those situations are necessarily unique. Construction workers place traffic cones where they believe them to be necessary and sufficient to communicate to human drivers what those human drivers should do. They do not place traffic cones according to a rigorous system, or so that their meaning can be divined by automated analysis. Police officers waving traffic past an incident do not have a formal repertoire of gestures. They are communicating to the drivers of vehicles using a rich and undocumented library of informal, unspoken communication that may vary by region, time of day, and individual. The locations of construction zones can shift day by day and hour by hour—they cannot be kept up-to-date on system-wide maps—and crashes and emergency scenes can occur anywhere at any time. The more you dig into these situations from the perspective of automated cars, the more a simple-seeming problem ("just drive around where and when they tell you to") unfolds into an unfathomably complex set of behavioral reasoning tasks.
The primary thing we can learn from the NHTSA investigation is that Waymo has not solved those tasks. Not, based on the categorization of events in the write-ups of the investigation, even close. This is a problem for Waymo (as well, obviously, as for the people in those construction zones and at those crash scenes) because it is not a problem amenable to a quick fix. Cruise showed that. This problem—navigating around novel but absolutely inviolable road obstructions at the direction of humans outside the car—is not only central to the wide rollout of these vehicles but precisely the sort of unsolved engineering problem that cannot reasonably be addressed on a schedule, using standard engineering project management. Substantial breakthroughs at the frontiers of research will be required to address these issues at the level they need to be addressed.
Waymo can do that, probably. One of the reasons they are so far ahead of the rest of the industry is that they have far, far more resources to devote to the problem, and can afford to be far, far more patient. But the fact of these investigations is prima facie evidence that while Waymo may be closer to real deployments than any of the other players in the space, and may be closer to deployments than they have ever been before, the moment when these vehicles switch from an ongoing technology development effort to an attempt to prove that they can work as a business remains far in the future.
I've talked in this newsletter about one class of accidents mentioned, but the linked article mentions another: a Waymo drove slowly into a chain-link fence, almost as if it couldn't see it. That's because it probably couldn't. Chain-link fences, again, are a deceptively hard challenge. The way autonomous cars see the world is via collections of sensors including—most notably for our purposes—cameras and lidar. Cameras can be very good at understanding the world, but the algorithms for extracting semantic information from them are hard, and they're relatively low resolution: at the speeds cars need to go, features of interest in the surrounding world will very often be contained in areas with very few pixels, sometimes as few as a couple hundred. At that resolution it is very difficult to distinguish anything, particularly objects that don't have a strong shape. For this reason the primary sensor that most autonomous cars—and especially Waymo cars—use is lidar. Lidar is good in all kinds of light conditions and returns accurate sensor readings (the collection of laser reflections a lidar creates is called a "point cloud"). The problem with lidar is that it only works on solid objects; if the laser goes through an object, the lidar will simply never know it's there. Chain-link fences, for all that they are effective and ubiquitous barriers, are mostly empty space. To a lidar, they barely exist at all. Cameras aren't much better. This, again, is a problem that seems simple (and is, even in self-driving cars, probably easier than construction zones) but resolves into an almost infinite complexity the more you try to engineer around it. Autonomous driving, at this point, is nothing BUT this kind of problem.
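For a sense of scale, here's a rough back-of-the-envelope sketch in Python. All the numbers in it (camera field of view and resolution, lidar channel count and angular resolution, the wire fraction of a fence panel) are illustrative assumptions, not the specs of Waymo's sensors or anyone else's; the point is just how quickly the pixel and point budgets shrink with distance.

```python
import math

def camera_pixels(width_m, height_m, dist_m, img_w=1920, img_h=1080,
                  hfov_deg=60.0, vfov_deg=35.0):
    """Approximate pixel footprint of a rectangular object at a given range."""
    deg_w = math.degrees(2 * math.atan(width_m / (2 * dist_m)))
    deg_h = math.degrees(2 * math.atan(height_m / (2 * dist_m)))
    return (img_w * deg_w / hfov_deg) * (img_h * deg_h / vfov_deg)

def lidar_returns(width_m, height_m, dist_m, solid_fraction=1.0,
                  h_res_deg=0.2, channels=64, lidar_vfov_deg=30.0):
    """Expected laser returns from an object; solid_fraction < 1 models the gaps."""
    deg_w = math.degrees(2 * math.atan(width_m / (2 * dist_m)))
    deg_h = math.degrees(2 * math.atan(height_m / (2 * dist_m)))
    beams = (deg_w / h_res_deg) * (channels * deg_h / lidar_vfov_deg)
    return beams * solid_fraction  # beams that pass through the gaps never return

# A pedestrian-sized object (0.5 m x 1.7 m) seen from 100 m away:
print(round(camera_pixels(0.5, 1.7, 100.0)))
# A 3 m x 1.8 m chain-link panel 40 m ahead, roughly 15% wire by area:
print(round(lidar_returns(3.0, 1.8, 40.0, solid_fraction=0.15)))
```

With these toy assumptions the numbers land about where the paragraph above says they do: a few hundred pixels for a distant pedestrian-sized object, and on the order of twenty scattered lidar points for an entire fence panel.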
There's a jokey truism in software development that the first 90% of the problem takes 90% of the time, and then the last 10% of the problem takes the other 90% of the time. Autonomous driving is like the fractal version of that. Every last 10% takes 90% of the time, and every last 10% of that 10% also takes 90% of the time. Will Waymo eventually crack it, and get to the point of testing whether robotaxis are a viable business model? I am not sure I'd bet against it (they have all the money and all the time), but I'm certain I'd be unhappy if I had to bet ON it.