1) Researchers have discovered ways to generate "fooling images" that can trick artificial intelligence systems like facial recognition and image classifiers.
2) These fooling images involve subtle patterns or perturbations invisible to humans that cause AI systems to misclassify images, for example making a classifier think a random pattern is a picture of a person.
3) While fooling images have limitations, they pose security risks if exploited and demonstrate weaknesses in current AI systems. Researchers are working to develop defenses but adversarial attacks continue to evolve and improve.
Illustration by William Joel / The Verge
MAGIC AI: THESE ARE THE OPTICAL ILLUSIONS THAT TRICK, FOOL, AND FLUMMOX COMPUTERS
by James Vincent @jjvincent Apr 12, 2017, 12:04pm EDT
Illustrations by William Joel
There's a scene in William Gibson's 2010 novel Zero History in which a character embarking on a high-stakes raid dons what the narrator refers to as "the ugliest T-shirt in existence," a garment which renders him invisible to CCTV. In Neal Stephenson's Snow Crash, a bitmap image is used to transmit a virus that scrambles the brains of hackers, leaping through computer-augmented optic nerves to rot the target's mind. These stories, and many others, tap into a recurring sci-fi trope: that a simple image has the power to crash computers.
But the concept isn't fiction, not completely anyway. Last year, researchers were able to fool a commercial facial recognition system into thinking they were someone else just by wearing a pair of patterned glasses. A sticker overlay with a hallucinogenic print was stuck onto the frames of the specs. The twists and curves of the pattern look random to humans, but to a computer designed to pick out noses, mouths, eyes, and ears, they resembled the contours of someone's face: any face the researchers chose, in fact. These glasses won't delete your presence from CCTV like Gibson's ugly T-shirt, but they can trick an AI into thinking you're the Pope. Or anyone you like.
Researchers wearing simulated pairs of fooling glasses, and the people the facial recognition system thought they were.
These types of attacks are bracketed within a broad category of AI cybersecurity known as adversarial machine learning, so called because it presupposes the existence of an adversary of some sort; in this case, a hacker. Within this field, the sci-fi tropes of ugly T-shirts and brain-rotting bitmaps manifest as "adversarial images" or "fooling images," but adversarial attacks can take other forms, including audio and perhaps even text. These phenomena were discovered independently by a number of teams in the early 2010s. They usually target a type of machine learning system known as a classifier: something that sorts data into different categories, like the algorithms in Google Photos that tag pictures on your phone as "food," "holiday," and "pets."
Image by Mahmood Sharif, Sruti Bhagavatula, Lujo Bauer, and Michael K. Reiter
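For readers who want to see what a classifier actually does, here is a minimal sketch of the kind of pipeline these attacks target, written with PyTorch and torchvision. The model choice, the preprocessing numbers, and the photo.jpg filename are illustrative assumptions, not details from the article.

```python
# A minimal image-classification sketch: load a pretrained network, preprocess
# a photo, and print the category the network is most confident about.
import torch
from torchvision import models, transforms
from PIL import Image

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # standard ImageNet statistics
                         std=[0.229, 0.224, 0.225]),
])

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.eval()

img = Image.open("photo.jpg").convert("RGB")   # hypothetical input image
x = preprocess(img).unsqueeze(0)               # shape (1, 3, 224, 224)

with torch.no_grad():
    logits = model(x)                          # one raw score per ImageNet category
    probs = torch.softmax(logits, dim=1)       # turn scores into probabilities

confidence, class_index = probs.max(dim=1)
print(f"predicted class {class_index.item()} with confidence {confidence.item():.2f}")
```

Everything the attacks in this article exploit lives inside that single model(x) call: the network maps raw pixels to category scores, and anything that nudges the scores can nudge the label.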
To a human, a fooling image might look like a random tie-dye pattern or a burst of TV static, but show it to an AI image classifier and it'll say with confidence: "Yep, that's a gibbon," or "My, what a shiny red motorbike." Just as with the facial recognition system that was fooled by the psychedelic glasses, the classifier picks up visual features of the image that are so distorted a human would never recognize them.

These patterns can be used in all sorts of ways to bypass AI systems, and have substantial implications for future security systems, factory robots, and self-driving cars: all places where AI's ability to identify objects is crucial. "Imagine you're in the military and you're using a system that autonomously decides what to target," Jeff Clune, co-author of a 2015 paper on fooling images, tells The Verge. "What you don't want is your enemy putting an adversarial image on top of a hospital so that you strike that hospital. Or if you are using the same system to track your enemies; you don't want to be easily fooled [and] start following the wrong car with your drone."
A selection of fooling images, and what an AI sees when it looks at them.
Image by Jeff Clune, Jason Yosinski, and Anh Nguyen

These scenarios are hypothetical, but perfectly viable if we continue down our current path of AI development. "It's a big problem, yes," Clune says, "and I think it's a problem the research community needs to solve."

The challenge of defending against adversarial attacks is twofold: not only are we unsure how to effectively counter existing attacks, but we keep discovering more effective attack variations. The fooling images described by Clune and his co-authors, Jason Yosinski and Anh Nguyen, are easily spotted by humans. They look like optical illusions or early web art, all blocky color and overlapping patterns, but there are far more subtle approaches to be used.
One type of adversarial image, referred to by researchers as a "perturbation," is all but invisible to the human eye. It exists as a ripple of pixels on the surface of a photo, and can be applied to an image as easily as an Instagram filter. These perturbations were first described in 2013, and in a 2014 paper titled "Explaining and Harnessing Adversarial Examples," researchers demonstrated how flexible they were. That pixely shimmer is capable of fooling a whole range of different classifiers, even ones it hasn't been trained to counter. A recently revised study named "Universal Adversarial Perturbations" made this feature explicit by successfully testing the perturbations against a number of different neural nets, exciting a lot of researchers last month.
On the left is the original image; in the middle, the perturbation; and on the right, the final, perturbed image.
Image by Ian Goodfellow, Jonathon Shlens, and Christian Szegedy
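The 2014 paper's best-known recipe is the fast gradient sign method, and the hedged sketch below shows the core idea: nudge every pixel one tiny step in the direction that increases the classifier's loss. Here model, image, and label are placeholders for a PyTorch classifier, an input batch, and its true class; the epsilon value and the assumption of pixel values in [0, 1] are illustrative.

```python
# A rough sketch of the fast gradient sign method (FGSM) from "Explaining and
# Harnessing Adversarial Examples": compute the gradient of the loss with
# respect to the input pixels, then step each pixel slightly along its sign.
import torch
import torch.nn.functional as F

def fgsm_perturb(model, image, label, epsilon=0.007):
    """Return (adversarial image, perturbation). Assumes pixels in [0, 1]."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)   # how wrong is the current prediction?
    loss.backward()                               # gradients now flow back to the pixels
    perturbation = epsilon * image.grad.sign()    # the barely visible "ripple of pixels"
    adversarial = (image + perturbation).clamp(0, 1).detach()
    return adversarial, perturbation
```

Shown the perturbed copy, the same network will often report a completely different label at high confidence, even though the two images look identical to a person.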
Using fooling images to hack AI systems does have its limitations. First, it takes more time to craft scrambled images in such a way that an AI system thinks it's seeing a specific image, rather than making a random mistake. Second, you often, but not always, need access to the internal code of the system you're trying to manipulate in order to generate the perturbation in the first place. And third, attacks aren't consistently effective. As shown in "Universal Adversarial Perturbations," what fools one neural network 90 percent of the time may only have a success rate of 50 or 60 percent on a different network. (That said, even a 50 percent error rate could be catastrophic if the classifier in question is guiding a self-driving semi truck.)
To better defend AI against fooling images, engineers subject them to "adversarial training." This involves feeding a classifier adversarial images so it can identify and ignore them, like a bouncer learning the mugshots of people banned from a bar. Unfortunately, as Nicolas Papernot, a graduate student at Pennsylvania State University who's written a number of papers on adversarial attacks, explains, even this sort of training is weak against computationally intensive strategies (i.e., throw enough images at the system and it'll eventually fail).
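In code, adversarial training usually amounts to generating attacks on the fly and mixing them into every training batch, so the classifier studies its own "mugshots" as it learns. The sketch below reuses the hypothetical fgsm_perturb helper from the earlier example; the model, optimizer, and data are placeholders rather than any specific system described in the article.

```python
# One step of adversarial training: train on each batch twice, once clean and
# once with adversarial perturbations generated against the current model.
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, images, labels, epsilon=0.03):
    model.train()
    # Craft adversarial versions of this batch using the current weights.
    adv_images, _ = fgsm_perturb(model, images, labels, epsilon)
    optimizer.zero_grad()
    # Penalize mistakes on both the clean and the perturbed images.
    loss = (F.cross_entropy(model(images), labels) +
            F.cross_entropy(model(adv_images), labels))
    loss.backward()
    optimizer.step()
    return loss.item()
```

As Papernot's point suggests, this only hardens the network against the kinds of attacks it has already seen; an attacker willing to spend enough compute can usually search their way past the defense.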
A number of images with perturbations applied to them, captioned with what the AI sees.
Image by Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, and Pascal Frossard

To add to the difficulty, it's not always clear why certain attacks work or fail. One explanation is that adversarial images take advantage of a feature found in many AI systems known as "decision boundaries." These boundaries are the invisible rules that dictate how a system can tell the difference between, say, a lion and a leopard. A very simple AI program that spends all its time identifying just these two animals would eventually create a mental map. Think of it as an X-Y plane: in the top right it puts all the leopards it's ever seen, and in the bottom left, the lions. The line dividing these two sectors, the border at which a lion becomes a leopard or a leopard a lion, is known as the decision boundary.
The problem with the decision boundary approach to classification, says Clune, is that it's too absolute, too arbitrary. "All you're doing with these networks is training them to draw lines between clusters of data rather than deeply modeling what it is to be a leopard or a lion." Systems like these can be manipulated in all sorts of ways by a determined adversary. To fool the lion-leopard analyzer, you could take an image of a lion and push its features to grotesque extremes, but still have it register as a normal lion: give it claws like digging equipment, paws the size of school buses, and a mane that burns like the Sun. To a human it's unrecognizable, but to an AI checking its decision boundary, it's just an extremely liony lion.
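A toy version of that mental map makes the point concrete. The sketch below fits a linear classifier to two invented clusters of 2-D "lion" and "leopard" feature points; the data, the features, and the extreme test point are made up purely to show that anything on one side of the line gets the same label, however exaggerated it is.

```python
# A toy decision boundary: two clusters on an X-Y plane and the line between them.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
lions = rng.normal(loc=[-2.0, -2.0], scale=0.8, size=(50, 2))    # bottom-left cluster
leopards = rng.normal(loc=[2.0, 2.0], scale=0.8, size=(50, 2))   # top-right cluster

X = np.vstack([lions, leopards])
y = np.array([0] * 50 + [1] * 50)   # 0 = lion, 1 = leopard

clf = LogisticRegression().fit(X, y)

# The decision boundary is where the two class scores are equal: w0*x0 + w1*x1 + b = 0.
(w0, w1), b = clf.coef_[0], clf.intercept_[0]
print(f"boundary: {w0:.2f}*x0 + {w1:.2f}*x1 + {b:.2f} = 0")

# An absurdly exaggerated "lion", far outside anything seen in training, still
# lands on the lion side of the line, so the model labels it a lion regardless.
extreme_lion = np.array([[-40.0, -45.0]])
print("classified as:", "leopard" if clf.predict(extreme_lion)[0] == 1 else "lion")
```

Because the model only learned where to draw the line, not what a plausible lion looks like, points far from any real example are still classified without hesitation, which is exactly the weakness Clune describes.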
As far as we know, adversarial images have never been used to cause real-world harm. But Ian Goodfellow, a research scientist at Google Brain who co-authored "Explaining and Harnessing Adversarial Examples," says they're not being ignored. "The research community in general, and especially Google, take this issue seriously," says Goodfellow. "And we're working hard to develop better defenses." A number of groups, like the Elon Musk-funded OpenAI, are currently conducting or soliciting research on adversarial attacks. The conclusion so far is that there is no silver bullet, but researchers disagree on how much of a threat these attacks are in the real world. There are already plenty of ways to hack self-driving cars, for example, that don't rely on calculating complex perturbations.
Papernot says such a widespread weakness in our AI systems isn't a big surprise: classifiers are trained to have "good average performance, but not necessarily worst-case performance, which is typically what is sought after from a security perspective." That is to say, researchers are less worried about the times the system fails catastrophically than how well it performs on average. One way of dealing with dodgy decision boundaries, suggests Clune, is simply to make image classifiers that more readily suggest they don't know what something is, as opposed to always trying to fit data into one category or another.
Meanwhile, adversarial attacks also invite deeper, more conceptual speculation. The fact that the same fooling images can scramble the "minds" of AI systems developed independently by Google, Mobileye, or Facebook reveals weaknesses that are apparently endemic to contemporary AI as a whole.

"It's like all these different networks are sitting around saying why don't these silly humans recognize that this static is actually a starfish," says Clune. "That is profoundly interesting and mysterious; that all these networks are agreeing that these crazy and non-natural images are actually of the same type. That level of convergence is really surprising people."
For Clune's colleague Jason Yosinski, the research on fooling images points to an unlikely similarity between artificial intelligence and intelligence developed by nature. He noted that the same category errors made by AI and their decision boundaries also exist in the world of zoology, where animals are tricked by what scientists call "supernormal stimuli."
These stimuli are artificial, exaggerated versions of qualities found in nature that are so enticing to animals that they override their natural instincts. This behavior was first observed around the 1950s, when researchers used it to make birds ignore their own eggs in favor of fakes with brighter colors, or to get red-bellied stickleback fish to fight pieces of trash as if they were rival males. The fish would fight the trash, so long as it had a big red belly painted on it. Some people have suggested human addictions, like fast food and pornography, are also examples of supernormal stimuli. In that light, one could say that the mistakes AIs are making are only natural. Unfortunately, we need them to be better than that.