Thanks to advances in machine learning, computers have gotten really good at identifying what’s in photographs. They started beating humans at the task years ago, and can now even generate fake images that look eerily real. While the technology has come a long way, it’s still not entirely foolproof. In particular, researchers have found that image detection algorithms remain susceptible to a class of problems called adversarial examples.

Adversarial examples are like optical (or audio) illusions for AI. By altering a handful of pixels, a computer scientist can fool a machine learning classifier into thinking, say, a picture of a rifle is actually one of a helicopter. But to you or me, the image would still look like a gun; it almost seems like the algorithm is hallucinating. As image recognition technology is used in more places, adversarial examples may present a troubling security risk. Experts have shown they can be used to do things like cause a self-driving car to ignore a stop sign, or make a facial recognition system falsely identify someone.
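To make the idea concrete, here is a minimal sketch in Python of how a tiny change can flip a classifier's answer. Everything in it is a stand-in rather than a real system: the "classifier" is a toy linear model with random `weights`, the "image" is random noise, and the "rifle" and "helicopter" labels are arbitrary. It also nudges every pixel by a small amount instead of altering a handful of them, which is a related but different style of attack.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pixels = 10_000

# Hypothetical stand-ins: random "trained" weights and a random image in [0, 1].
weights = rng.normal(size=n_pixels)
image = rng.uniform(0.0, 1.0, size=n_pixels)

def classify(pixels):
    """Toy linear classifier: a positive score means 'rifle', otherwise 'helicopter'."""
    return "rifle" if weights @ pixels > 0 else "helicopter"

score = weights @ image
# Smallest uniform per-pixel nudge that pushes the score past zero, plus a 10% margin.
epsilon = 1.1 * abs(score) / np.abs(weights).sum()
# Move each pixel by epsilon in whichever direction works against the current score.
# (For simplicity the result is not clipped back into the [0, 1] pixel range.)
adversarial = image - epsilon * np.sign(score) * np.sign(weights)

print("original:   ", classify(image))
print("adversarial:", classify(adversarial))
print("largest change to any pixel:", float(np.max(np.abs(adversarial - image))))
```

Because the nudge is spread across thousands of pixels, each individual change can be very small while the combined effect on the score is large enough to change the label.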

Organizations like Google and the US Army have studied adversarial examples, but what exactly causes them is still largely a mystery. Part of the problem is that the visual world is incredibly complex, and photos can contain millions of pixels. Another issue is deciphering whether adversarial examples are a product of the original photographs, or of how an AI is trained to look at them. Some researchers have hypothesized they are a high-dimensional statistical phenomenon, or caused when the AI isn’t trained on enough data.

Louise Matsakis covers cybersecurity, internet law, and online culture for WIRED.

Now, a leading group of researchers from MIT has found a different answer, in a paper that was presented earlier this week: adversarial examples only look like hallucinations to people. In reality, the AI is picking up on tiny details that are imperceptible to the human eye. While you might look at an animal’s ears to differentiate a dog from a cat, AI detects minuscule patterns in the photo’s pixels and uses those to classify it. “The only thing that makes these features special is that we as humans are not sensitive to them,” says Andrew Ilyas, a PhD student at MIT and one of the lead authors of the work, which has yet to be peer-reviewed.

The explanation makes intuitive sense, but is difficult to document because it’s hard to untangle which features an AI uses to classify an image. To conduct their study, the researchers used a novel method to separate “robust” characteristics of images, which humans can often perceive, from the “non-robust” ones that only an AI can detect. Then in one experiment, they trained a classifier using an intentionally mismatched dataset of images. According to the robust features—i.e., what the pictures looked like to the human eye—the photos were of dogs. But according to the non-robust features, invisible to us, the photos were in fact of cats, and that’s how the classifier was trained—to think the photos were of kitties.

The researchers then tested the classifier on new, normal pictures of cats it hadn’t seen before. It was able to identify the kitties correctly, indicating the AI was relying on the hidden, non-robust features embedded in the training set. That suggests these invisible characteristics represent real patterns in the visual world, just ones that humans can’t see. And adversarial examples are instances where these patterns don’t line up with how we view the world.
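The logic of that experiment can be sketched with a toy simulation in Python. This is not the researchers' method or data. It assumes each "image" has one strong visible cue plus 30 faint hidden cues (the function `make_images` and all the numbers are invented for illustration), and it uses a simplified setup in which the visible cue in the training set is unrelated to the label, rather than consistently pointing to the wrong animal.

```python
import numpy as np

rng = np.random.default_rng(0)
N_HIDDEN = 30  # number of faint "non-robust" cues (arbitrary choice)

def make_images(labels, apparent):
    """Column 0 is a strong visible cue set by `apparent`;
    the remaining columns are faint cues set by `labels`."""
    n = len(labels)
    visible = 2.0 * apparent + rng.normal(0.0, 1.0, size=n)
    hidden = 0.1 * labels[:, None] + rng.normal(0.0, 0.2, size=(n, N_HIDDEN))
    return np.column_stack([visible, hidden])

# Labels are +1 ("cat") or -1 ("dog").
train_labels = rng.choice([-1.0, 1.0], size=2000)
# What each training image "looks like" is chosen independently of its label.
train_apparent = rng.choice([-1.0, 1.0], size=2000)
X_train = make_images(train_labels, train_apparent)

# Fit a plain least-squares linear classifier on the mismatched training set.
weights, *_ = np.linalg.lstsq(X_train, train_labels, rcond=None)

# Normal test images: the visible cue and the hidden cues agree with the label.
test_labels = rng.choice([-1.0, 1.0], size=2000)
X_test = make_images(test_labels, test_labels)
accuracy = float(np.mean(np.sign(X_test @ weights) == test_labels))

print("weight on the visible cue:", float(weights[0]))
print("accuracy on normal images:", accuracy)
```

In this toy setup the model puts almost no weight on the visible cue, yet still classifies normal images well, because the faint cues it learned from are also present in ordinary data.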

When algorithms fall for an adversarial example, they’re not hallucinating—they’re seeing something that people don’t. “It’s not something that the model is doing weird, it’s just that you don’t see these things that are really predictive,” says Shibani Santurkar, a PhD student at MIT and another lead author on the paper. “It’s about humans not being able to see these things in the data.”

The study calls into question whether computer scientists can really explain how their algorithms make decisions. “If we know that our models are relying on these microscopic patterns that we don’t see, then we can’t pretend that they are interpretable in a human fashion,” says Santurkar. That may be problematic, say, if someone needs to prove in court that a facial recognition algorithm identified them incorrectly. There might not be a way to account for why the algorithm thought they were a person they’re not.

Engineers may ultimately need to choose between building automated systems that are the most accurate and ones that are the most similar to humans. If you force an algorithm to rely solely on robust features, there’s a chance it might make more mistakes than if it also used hidden, non-robust ones. But if the AI also leans on those invisible characteristics, it may be more susceptible to attacks like adversarial examples. As image recognition tech is increasingly used for tasks like identifying hate speech and scanning luggage at the airport, deciding how to navigate these kinds of trade-offs will only become more important.

