The problem is that the computer doesn’t know anything; it’s processing a massive amount of data using pattern-matching algorithms with various layers and strategies to attempt to identify objects. The variability of the input data is enormous, and researchers are just at the beginning stages of figuring this stuff out.
Input data that is just a little different from what was previously encountered can (certainly not always, but can) produce significantly different results.
Here’s an article about researchers altering images in a way that causes problems for neural networks but not humans:
Somewhere out there is a relatively recent paper showing that all neural networks have some significant errors (I don’t remember the details, or how “significant” was quantified, but it was a bunch of math showing that trained networks harbor hidden errors beyond whatever threshold the trainers consider acceptable).
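For anyone curious what that kind of image alteration looks like in practice, here is a minimal sketch of the fast-gradient-sign recipe that the adversarial-example literature describes; the `model`, `image`, and `true_label` objects are hypothetical stand-ins for a PyTorch classifier and its input:

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, image, true_label, epsilon=0.01):
    """Nudge every pixel a tiny amount in the direction that increases the loss."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), true_label)
    loss.backward()
    # The change is imperceptible to a human, but it can flip the network's
    # predicted class entirely.
    return (image + epsilon * image.grad.sign()).detach()
```

The specific trick isn’t the point; the point is that a perturbation far below anything a human would notice can swing the output, which is exactly the brittleness those papers are quantifying.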
Like some of you, I was relying on the Tempe sheriff’s account, but yes, this really does look like a classification failure, or a complete hardware failure. I wonder if the poor woman paused briefly in the left-hand lane and then moved, or if the system really failed to pick her up as a moving object, maybe because the shape was unfamiliar (and perhaps got assigned to some other part of the background movement)?
Depending on where I look, headlights are supposed to reach about 160 feet, and stopping from 40 mph, including reaction time, takes somewhere between 80 and 112 feet. That’s pretty tight for a human, but ideally the car picks that up without a problem, even if it’s only relying on headlights.
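For what it’s worth, the arithmetic is easy to play with yourself; the spread in published stopping distances comes almost entirely from what you assume for reaction time and deceleration (the values below are just illustrative):

```python
def stopping_distance_ft(speed_mph, reaction_s, decel_g):
    """Reaction distance plus braking distance, in feet."""
    v = speed_mph * 5280 / 3600            # mph -> ft/s
    reaction = v * reaction_s              # distance covered before the brakes bite
    braking = v**2 / (2 * decel_g * 32.2)  # v^2 / (2a), with a in ft/s^2
    return reaction + braking

print(round(stopping_distance_ft(40, reaction_s=1.5, decel_g=0.7)))   # ~164 ft, cautious assumptions
print(round(stopping_distance_ft(40, reaction_s=0.75, decel_g=0.9)))  # ~103 ft, alert driver, dry pavement
```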
A requirement for 100% reliability is obviously a logical impossibility, but if you are designating a computer to operate a vehicle on public roads, it should be better than a human driver, and most human drivers manage to go years or decades between major accidents despite driving in often poor conditions, exceeding the speed limit, and coping with all manner of distractions, from unruly passengers to YouTube videos of kittens in boxes. An autonomous pilot obviously doesn’t suffer from human distraction, fatigue, or anger, and should be programmed to operate within traffic laws, so just by default an autonomous vehicle with awareness and perception comparable to a human driver’s should already show a vast improvement. On top of that, 360-degree spatial awareness and the ability to perceive objects by active LIDAR scanning or observation into the near-infrared should improve its perceptiveness even further.
However, every researcher working in machine vision today will tell you that computers do not build three-dimensional representations from a video feed the way the human brain does from our eyes, and anyone working in cognitive neuroscience will tell you that, even though the human nervous system transmits data far more slowly than electronic sensors, the human brain is capable of anticipatory predictions that the best machine intelligence simply cannot make. We don’t even fully understand how the brain does it, other than that it goes beyond higher cognition in the neocortex and draws from the affective and autonomic systems in the cerebellum. Given that we don’t really understand how brains work as well as they do, making any quantitative comparison of performance between a human and an autonomous driving system really boils down to functional metrics against specific types or classes of situations; hence the need for a uniform standard of verification testing that demonstrates the ability of an arbitrary autonomous pilot to respond to potential hazards.
The comparison to the practical driving test that a human takes misses a fundamental and problematic point: we don’t actually test human drivers on their reaction speed or ability, and we do very little to test their judgment beyond a basic interpretation of simple traffic rules. The most difficult task human drivers are generally asked to perform on a basic driving test is parallel parking, and outside of tactical and performance driving courses, students are never deliberately placed in a situation that tests their reaction skill in a way that could result in damage. We assume that human drivers will respond correctly to a hazard, and perhaps surprisingly, even the worst drivers do so most of the time, in part because we’ve spent years using our senses and bodies to avoid falling down, running into objects, et cetera, and cars are designed (somewhat unconsciously) in a way that adapts those skills of perception and proprioception to driving. As humans, we have evolved to adapt and learn to use tools. Computers, on the other hand, can only follow algorithms, and even our best attempts at learning (heuristic) algorithms are about as capable as a toddler and not nearly as generally adaptable. So we cannot begin with the assumption that an autonomous piloting system is the equivalent of even an inexperienced human driver, because they don’t work using the same fundamental set of capabilities.
If I had to guess, I would say that’s it, right there. A person pushing a bike covered in plastic bags is probably not something that matches anything in its database.
The fortunate thing about this is that once we determine why it happened, we can update the software so that not only will this car not hit someone in this situation again, but no car running this software will either.
Training drivers requires that they each have their own learning curve and mistakes. Training computers means only one has to.
But the object not being in the database means the car has to make a decision: is this a harmless piece of trash going across the road, or is it a person, an animal, or something else it is supposed to stop for?
As it was a bike covered in plastic bags, I can understand that it would get confused and mark it as a harmless piece of debris.
The point is that it would have had to make some sort of categorization, and I’m saying it categorized the object wrong. When they play back the logs, I would not be surprised if there is a log entry for detecting a harmless object in the road just prior to the collision.
Even if it can’t be positively identified as a person, it’s still a 5+ foot by 1 foot moving object in the car’s path. There are no real circumstances where the answer to that obstacle is “run it over”.
That’s a human brain’s reasoning. It’s not the way the computer makes decisions. The article linked by RaftPeople above is instructive about the differences and how something very obvious to us is not to a computer.
It does need to be calibrated to limit false positives. It’s not going to go very well if it screeches to a halt every time something drifts out into the road. All I’m saying is that the bags on the bike probably made the computer think it was a bunch of bags going across the road. The bike frame itself would be mostly hidden.
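To be clear about what “calibrated” means here: somewhere there is effectively a threshold on a hazard score, and moving it trades phantom stops against missed hazards. A toy illustration, with made-up scores:

```python
# Hypothetical hazard scores: how strongly the perception stack flags each object.
windblown_bags = [0.15, 0.30, 0.40]
pedestrians    = [0.55, 0.70, 0.85]

for threshold in (0.2, 0.5, 0.8):
    phantom_stops = sum(score >= threshold for score in windblown_bags)
    missed        = sum(score < threshold for score in pedestrians)
    print(f"threshold={threshold}: phantom stops={phantom_stops}, missed pedestrians={missed}")
```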
It certainly is a problem that needs to be addressed, but it shouldn’t be that difficult to do so. And once that is done, all driverless cars are safer.
That’s really the takeaway. Nearly 6,000 pedestrians were killed by human drivers last year. Many of those drivers probably learned a valuable lesson and will never hit another pedestrian; unfortunately, that training does not transfer to any other driver on the road. In the case of driverless cars, it does. It should not be that difficult to see exactly what went wrong and fix it so that it doesn’t happen again.
This article shows the previous Uber self-driving car based on the Ford Fusion. It shows a top-mounted LIDAR as well as “LIDAR modules on front, rear and sides.” But this article shows the current Volvo-based car (same type involved in this crash), and it only mentions the top-mounted LIDAR, “360-deg radar coverage” and front-facing cameras. Makes me wonder if they eliminated the front/side/rear LIDAR modules on this iteration in favor of radar.
Nevertheless, I think bicycle wheels should be very visible on radar, and should also provide strong cues for the image-processing software to identify the object as a bicycle.
Also interesting to see there’s no mention of infrared cameras.
Computers make decisions any way you program them to make decisions. The size of something is not that hard to compute, and stopping before you run into something of a given size or bigger is not that hard to do either, before you even try to resolve what it is. All sorts of stuff comes off of trucks and cars around here: mattresses, chairs, car parts, chickens. Not accounting for these is just bad design.
This being Uber, though, that is not out of the question.
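Something like the rule described above would only take a few lines; the detection fields and size cutoff here are obviously hypothetical, and the hard part in a real system is producing reliable size and path estimates in the first place:

```python
from dataclasses import dataclass

@dataclass
class Detection:
    height_ft: float
    width_ft: float
    in_travel_path: bool
    label: str            # whatever the classifier guessed; possibly wrong

def should_brake(detections, min_size_ft=1.0):
    """Brake for anything big enough in our path, regardless of its label."""
    return any(
        d.in_travel_path and max(d.height_ft, d.width_ft) >= min_size_ft
        for d in detections
    )

# A bike shrouded in bags and mislabeled as debris still trips the size gate.
print(should_brake([Detection(5.5, 1.0, True, "debris")]))  # True
```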
Come to think of it, a standard Volvo XC90 (without the Uber self-driving equipment) has a radar-based automatic braking system. I think that by itself should have prevented this accident. This video shows similar systems working well in similar situations (in broad daylight, but that shouldn’t make any difference for radar).
My guess is, this will turn out to be the fault of poor software design - not a classification error made by the software, but just poor software development practices leading to unintended behavior.
There are streetlights along the entire length of road, and she was crossing directly under one of them. And while it is not a crosswalk, it’s a very poor street design: the median has what appear to be pedestrian walkways crossing it, and the woman entered the road from one of them. It’s not surprising people attempt to cross there.
Here’s an article from last fall about a lead engineer in the driverless vehicle division of Uber who wants to do away with lidar and rely on ordinary cameras.
I wonder how Waymo’s vehicles would have responded. Anyone in Mountain View want to grab their bike, some trash bags, and test it out?
There should be some kind of process log that shows the vehicle’s internal representation of its location and the objects around it, but the “decision-making process” (i.e., the operation of the internal algorithms used to assess hazards) is likely too complex for a person to read or interpret directly.
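A single frame of that kind of log might hold something like the structure below; every field name and value is invented purely for illustration, since Uber hasn’t published its log format:

```python
frame = {
    "t": 1521335482.731,                        # timestamp, seconds
    "ego": {"speed_mph": 38.0, "lane": 2, "heading_deg": 184.0},
    "tracks": [
        {
            "id": 41,
            "class": "unknown",                 # best guess from the classifier
            "confidence": 0.42,
            "position_m": [31.7, -2.4],         # ahead of / left of the vehicle
            "velocity_mps": [0.1, 1.3],         # drifting across the lane
        },
    ],
}
print(f"{len(frame['tracks'])} tracked object(s) in this frame")
```

Reading off where the system thought things were is the easy part; reconstructing why the planner did nothing with a given track is the part buried inside the trained models.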
While it is true that the heuristically refined algorithms developed through the experience of one vehicle should be transferable to any other vehicle using the same system, it probably isn’t possible to make them generically transferable to an arbitrary autonomous piloting system, any more than you can make a good, or even adequate, translation of Shakespeare into Mandarin. I get the impression that many people think the computer ‘sees’ objects, compares them to some discrete database of potential hazards, and then follows some logical sequence of state operations to avoid an accident, but that isn’t how these systems work. Machine vision at the current state of the art essentially recognizes general shapes or specific types of features and tries to integrate them into a pseudo-three-dimensional representation of the nearby world; it doesn’t feed incomplete data into different types of predictive systems and anticipate a hazard the way the brain does at a fundamental affective level, long before any full picture is built.
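To make that concrete, the raw output of a modern detector for a single region of the image is just a score per trained class, not a match against a catalogue of hazards; the classes and numbers below are made up:

```python
# Hypothetical per-class scores for one detected blob of shape/feature activations.
scores = {"vehicle": 0.05, "pedestrian": 0.18, "cyclist": 0.14, "debris": 0.44, "background": 0.19}

best = max(scores, key=scores.get)
if scores[best] < 0.6:
    # Nothing scores well: the shape resembles none of the training classes,
    # yet downstream planning still has to act on this weak, forced guess.
    print(f"low-confidence guess: {best} ({scores[best]:.2f})")
```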
If you’ve ever had someone throw a ball at you from the side and moved to dodge or catch it long before you actually realized what it was, you’ve experienced this effect. It is all based on conditioned instinct combined with inputs from a specific type of photoreceptor and movement-detection system that does not function the same way as the system that forms static images (and it is the thing that makes text dance at the edge of your peripheral vision on old CRT monitors). Computer vision systems just do not do this, because a camera and CCD do not function like an eye (which is actually an extension of the brain and does considerable preprocessing before signals ever get to the visual cortices), and expecting autonomous piloting systems to learn or work the way the brain does is not correct.
I’ve driven that section on Mill after dark on countless occasions, and between the streetlights and the ambient city lighting from the nearby Mill Avenue District it is pretty easy to see pedestrians and large objects several hundred feet in front of you, even on a cloudless and moonless night. There are also a lot of homeless people in that area because of the nearby parks and underpasses that provide shelter, as well as restaurants whose dumpsters homeless people rummage through for food. I can’t say that any driver would have avoided that accident, but a reliable autonomous pilot should have been able to at least detect her and attempt to swerve or brake before impact.