This is a guest post by Alex Shepherd.
There is a growing body of research literature concerning the potential threat of physical-world adversarial attacks against machine-vision models. By applying adversarial perturbations to physical objects, an attacker may cause machine-vision models to misclassify images containing those objects. The potential impacts could be significant, and have been identified as risk areas for autonomous vehicles and military UAVs.
For this Three Paper Thursday, we examine the following papers exploring the potential threat of physical-world adversarial attacks, focusing on the implications for autonomous vehicles.
Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial examples in the physical world, arXiv:1607.02533 (2016)
In this seminal paper, Kurakin et al. report the findings of an experiment in which phone-camera photos of printed adversarial images were used as input to a pre-trained ImageNet Inception v3 image classification model. The methodology assumed a white-box threat model, with adversarial images crafted from the ImageNet validation dataset using the Inception v3 model itself.
All 50,000 images from the validation dataset were used to generate adversarial examples, using several attack methods (the fast gradient sign method and two iterative variants) and a range of perturbation magnitudes. The experiment used printouts containing pairs of clean and adversarial images, with QR codes to assist automatic cropping. A photo of each printout was taken with a phone camera, each image was cropped from the full photo page, and the cropped images were then fed to the classifier.
Classification accuracy was computed for all clean versions of the ImageNet validation images as a baseline, and for all adversarial versions. The findings showed that a significant proportion of the crafted adversarial examples retained their adversarial property after printing and photographing, and were misclassified, demonstrating the potential of adversarial attacks in the physical world.
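The simplest of the generation methods used by Kurakin et al. is the fast gradient sign method (FGSM), which nudges each input pixel a small step in the direction of the sign of the loss gradient. The following is a minimal sketch of the idea on a toy linear classifier, not the paper's Inception v3 setup; the weights, input, and epsilon value are purely illustrative.

```python
import math

# FGSM sketch: x_adv = x + eps * sign(dL/dx), shown on a toy
# linear classifier with logistic loss (illustrative values only).

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fgsm(x, y, w, eps):
    # For logistic loss L = log(1 + exp(-y * w.x)) on a linear model,
    # the input gradient is dL/dx = -y * sigmoid(-y * w.x) * w.
    s = sum(wi * xi for wi, xi in zip(w, x))
    coeff = -y * sigmoid(-y * s)
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + eps * sign(coeff * wi) for xi, wi in zip(x, w)]

w = [1.0, -2.0, 0.5]   # toy classifier: predicted label = sign(w.x)
x = [0.4, -0.3, 0.2]   # clean input, true label y = +1
y = 1

x_adv = fgsm(x, y, w, eps=0.6)
clean_score = sum(wi * xi for wi, xi in zip(w, x))
adv_score = sum(wi * xi for wi, xi in zip(w, x_adv))
print(clean_score > 0, adv_score < 0)  # True True: the prediction flips
```

Even this toy example shows the key property the paper relies on: a small, uniformly bounded change to every input component is enough to flip the model's decision.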
Kevin Eykholt, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Chaowei Xiao, Atul Prakash, Tadayoshi Kohno, and Dawn Song, Robust physical-world attacks on deep learning visual classification, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp 1625-1634.
In perhaps the best-known paper on physical-world adversarial attacks, Eykholt et al. conducted an experiment applying physical perturbations to traffic stop signs in an attempt to fool two CNN classifiers trained on publicly available traffic-sign datasets.
They generated their adversarial examples by using an image of a target stop sign as input to the Robust Physical Perturbations (RP2) attack algorithm. RP2 models physical dynamics such as varying distances and angles by sampling from a distribution of transformations, and uses a mask to confine the computed perturbations to a shape that resembles graffiti. The adversary then prints the perturbations as stickers or a poster and attaches them to the target stop sign.
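The two ingredients of RP2, averaging over sampled physical transformations and masking the perturbation to a sticker-shaped region, can be sketched as follows. This is an illustrative toy on a linear model, not the authors' implementation; the weights, mask, and the scaling-only transformation model are all assumptions made for brevity.

```python
import random

# RP2-style sketch (illustrative, not the paper's code): optimise a
# perturbation delta applied only inside a mask (the "sticker" region),
# averaging gradients over randomly scaled copies of the input to
# stand in for varying distance and lighting.

random.seed(0)
w = [1.0, -2.0, 0.5, 1.5]   # toy model: positive score = "stop sign"
x = [0.5, -0.2, 0.3, 0.4]   # clean input, correctly classified positive
mask = [1, 0, 0, 1]         # only these "pixels" may be perturbed

def score(v):
    return sum(wi * vi for wi, vi in zip(w, v))

delta = [0.0] * len(x)
for _ in range(50):
    grad = [0.0] * len(x)
    for _ in range(10):
        s = random.uniform(0.7, 1.3)  # sampled physical transform (scaling)
        # the model sees s * (x + delta), so d(score)/d(delta_i) = s * w_i
        for i in range(len(x)):
            grad[i] += s * w[i] / 10.0
    # gradient descent on the score, restricted to the masked region
    delta = [di - 0.05 * gi * mi for di, gi, mi in zip(delta, grad, mask)]

x_adv = [xi + di for xi, di in zip(x, delta)]
print(score(x) > 0, score(x_adv) < 0)  # True True: masked attack succeeds
```

The design point this illustrates is that the expectation over sampled transformations is what makes the perturbation robust to viewpoint changes, while the mask is what makes it physically realisable as a sticker.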
The two-phase experiment included a lab test classifying objects from stationary fixed positions, and field tests capturing objects with a phone camera mounted on a moving car. Adversarial examples included both sticker and poster attacks. The findings show that the targeted-attack success rate varied by attack method but was generally high; poster attacks proved the most successful, with a 100% targeted-attack success rate in both lab and field tests.
There is a reproducibility concern: both training datasets were altered from their original state without the specific changes being documented, which may undermine the credibility of the experiment's findings.
Jiajun Lu, Hussein Sibai, Evan Fabry, and David Forsyth, Standard detectors aren’t (currently) fooled by physical adversarial stop signs, arXiv:1710.03337 (2017).
In response to the above paper, Lu et al. tested the effectiveness of the adversarial examples generated by Eykholt et al. against common off-the-shelf object detection models (pre-trained MS-COCO YOLO9000 and Faster R-CNN). This experiment was conducted in cooperation with Eykholt et al.
Images and videos of the physical-world adversarial examples, taken at various distances and angles, were provided by Eykholt et al. and fed to both detectors. The findings show that the adversarial examples have a targeted-attack success rate of zero percent against both YOLO9000 and Faster R-CNN at all tested distances and angles. The authors also highlight issues in Eykholt et al.'s methodology and conclude that there is currently no evidence that physical-world adversarial examples can fool an object detection model. They note that this does not imply object detection models are immune to physical-world adversarial examples, and that future work is required either to create such examples or to disprove their existence.
Drawing on the findings of all three papers, physical-world adversarial examples appear possible in principle. While Eykholt et al.'s adversarial examples succeeded against specialist image classification models under specific conditions and environments, more credible evidence is required to support claims that these attacks pose an imminent threat. Future research may focus on developing robust adversarial attacks capable of fooling a broader array of off-the-shelf object detection models across a diverse range of conditions and environments. Extending such experiments to model ensembles would also be of interest, to discover whether the transferability of adversarial examples extends to the physical world.