Three Paper Thursday: Attacking Machine Vision Models In Real Life

This is a guest post by Alex Shepherd.

There is a growing body of research literature concerning the potential threat of physical-world adversarial attacks against machine-vision models. By applying adversarial perturbations to physical objects, an attacker may cause machine-vision models to misclassify images containing those objects. The potential impacts could be significant and have been identified as risk areas for autonomous vehicles and military UAVs.

For this Three Paper Thursday, we examine the following papers exploring the potential threat of physical-world adversarial attacks, with a focus on the impact for autonomous vehicles.

Alexey Kurakin, Ian Goodfellow, and Samy Bengio, Adversarial examples in the physical world, arXiv:1607.02533 (2016).

In this seminal paper, Kurakin et al. report the findings of an experiment in which adversarial images captured by a phone camera were used as input to a pre-trained Inception v3 ImageNet image classification model. The methodology assumed a white-box threat model, with adversarial images crafted from the ImageNet validation dataset using the same Inception v3 model.

All 50,000 images from the validation dataset were used to generate adversarial examples, using several attack methods and a range of perturbation magnitudes. The experiment was conducted using printouts containing pairs of clean and adversarial images, with QR codes to assist automatic cropping. A photo of each printout was taken with a phone camera, each image was cropped from the full photo, and the cropped images were then fed to the classifier.
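For intuition, here is a minimal sketch of the fast gradient sign method, the simplest of the attack families evaluated in the paper. It is illustrative only: it uses torchvision's pre-trained Inception v3 rather than the original TensorFlow setup, and assumes inputs are 1×3×299×299 tensors scaled to [0, 1].

```python
# Minimal FGSM sketch (one family of attacks evaluated by Kurakin et al.),
# illustrated with torchvision's pre-trained Inception v3 rather than the
# paper's original TensorFlow setup. Input normalisation is omitted for brevity.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.inception_v3(weights=models.Inception_V3_Weights.DEFAULT)
model.eval()

def fgsm(image, label, epsilon):
    """image: 1x3x299x299 tensor in [0, 1]; label: tensor of shape (1,)
    holding the true class index. Returns an adversarial copy of the image."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Take one step in the direction of the sign of the gradient, then clip
    # back to the valid pixel range.
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()

# Example: adv = fgsm(x, y, epsilon=8 / 255)
```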

Classification accuracy was computed for the clean versions of the ImageNet validation images as a baseline, and again for the adversarial versions. The findings showed that a significant proportion of the crafted adversarial examples retained their adversarial property when printed and photographed, and were misclassified, demonstrating the potential of adversarial attacks in the physical world.

Kevin Eykholt, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Chaowei Xiao, Atul Prakash, Tadayoshi Kohno, and Dawn Song, Robust physical-world attacks on deep learning visual classification, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 1625-1634.

In perhaps the best-known paper on physical-world adversarial attacks, Eykholt et al. conducted an experiment applying physical perturbations to stop signs in an attempt to fool two CNN classifiers trained on publicly available traffic-sign datasets.

They generated their adversarial examples by using an image of a target stop sign as input to the Robust Physical Perturbations (RP2) attack algorithm. RP2 models physical dynamics such as viewing distance and angle by sampling from a distribution of physical conditions, and uses a mask to confine the computed perturbations to a shape that resembles graffiti. The adversary then prints out the perturbations as stickers or a poster and attaches them to the target stop sign.
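As a rough illustration, the core of an RP2-style attack can be written as the optimisation loop below. This is not the authors' code: the model, the batch of sign images and the hyper-parameters are placeholders, and the distribution over physical conditions is represented simply by a set of photographs of the sign taken at different distances, angles and lighting.

```python
# Illustrative RP2-style optimisation loop (a sketch, not the released code):
# a perturbation delta, confined to a sticker-shaped mask, is optimised so that
# the classifier outputs a chosen target class across a batch of images of the
# sign captured under varied physical conditions.
import torch
import torch.nn.functional as F

def rp2_style_attack(model, sign_images, mask, target_class,
                     steps=500, lr=0.1, lam=1e-3):
    """sign_images: NxCxHxW tensor of the sign under varied conditions;
    mask: 1xHxW (or CxHxW) tensor with 1s where stickers may be placed."""
    delta = torch.zeros_like(sign_images[0], requires_grad=True)
    target = torch.full((sign_images.shape[0],), target_class, dtype=torch.long)
    optimiser = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        # Apply the masked perturbation to every sampled image of the sign.
        adversarial = (sign_images + mask * delta).clamp(0.0, 1.0)
        logits = model(adversarial)
        # Targeted loss averaged over the sampled conditions, plus a norm
        # penalty that keeps the masked perturbation small and printable.
        loss = F.cross_entropy(logits, target) + lam * (mask * delta).norm(p=2)
        optimiser.zero_grad()
        loss.backward()
        optimiser.step()
    return (mask * delta).detach()
```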

The two-phase experiment comprised lab tests, classifying objects from stationary fixed positions, and field tests, capturing objects with a phone camera mounted on a moving car. Adversarial examples included both sticker and poster attacks. The findings show that the targeted-attack success rate varied with the attack method but was generally high; poster attacks proved the most successful, with a 100% targeted-attack success rate in both lab and field tests.

There is a reproducibility concern, as both training datasets were altered from their original state without the specific changes being documented; this may call into question the credibility of the experiment’s findings.

Jiajun Lu, Hussein Sibai, Evan Fabry, and David Forsyth, Standard detectors aren’t (currently) fooled by physical adversarial stop signs, arXiv:1710.03337 (2017).

In response to the above paper, Lu et al. tested the effectiveness of the adversarial examples generated by Eykholt et al. against common off-the-shelf object-detection models (YOLO9000 and Faster R-CNN, both pre-trained on MS-COCO). The experiment was conducted in cooperation with Eykholt et al.

Images and videos of physical-world adversarial examples taken at various distances and angles were provided by Eykholt et al. and fed to both detectors. The findings show that the adversarial examples have a targeted-attack success rate of zero percent against both YOLO9000 and Faster R-CNN at various distances and angles. Issues in Eykholt et al.’s methodology are also highlighted, and the conclusion is reached that currently there is no evidence to support a claim that physical-world adversarial examples can fool an object-detection model. The authors do state that this does not imply object-detection models are immune to physical-world adversarial examples, and that future work is required either to create such examples or to disprove their existence.
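The style of evaluation is simple to sketch: run an off-the-shelf COCO-trained detector over photos or video frames of the perturbed sign and check whether a "stop sign" detection survives. The snippet below uses torchvision's Faster R-CNN purely for illustration; the original experiment used the YOLO9000 and Faster R-CNN releases available at the time.

```python
# Sketch of a detector-side check in the spirit of Lu et al.: feed frames of
# the perturbed sign to an off-the-shelf COCO-trained detector and test whether
# a stop sign is still detected. torchvision's Faster R-CNN is used here only
# for illustration.
import torch
from torchvision.models.detection import (FasterRCNN_ResNet50_FPN_Weights,
                                           fasterrcnn_resnet50_fpn)

weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
detector = fasterrcnn_resnet50_fpn(weights=weights).eval()
STOP_SIGN = weights.meta["categories"].index("stop sign")

@torch.no_grad()
def detects_stop_sign(frame, score_threshold=0.5):
    """frame: 3xHxW float tensor in [0, 1]. True if a stop sign is detected
    above the score threshold."""
    detections = detector([frame])[0]
    keep = detections["scores"] >= score_threshold
    return bool((detections["labels"][keep] == STOP_SIGN).any())
```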

Lessons Learned

Drawing on the findings of all three papers, it is apparent that physical-world adversarial examples can exist. While Eykholt et al.’s adversarial examples were successful against specialist image-classification models in specific conditions and environments, more credible evidence is required to support claims that these attacks pose an imminent threat. Future research may focus on the development of robust adversarial attacks capable of fooling a broader array of off-the-shelf object-detection models in a diverse range of conditions and environments. Expanding experiments to include ensembles would also be of interest, to discover whether the transferability of adversarial examples extends to the physical world; a simple form of such a check is sketched below.
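A basic transferability check is easy to set up: evaluate each photographed adversarial example against several off-the-shelf classifiers and record which of them are fooled. The model names and preprocessing presets below are illustrative, not taken from any of the papers.

```python
# Hypothetical transferability check: run one photographed adversarial example
# through an ensemble of off-the-shelf ImageNet classifiers and report which
# models misclassify it. Model choice and preprocessing presets are illustrative.
import torch
from torchvision import models

weight_sets = {
    "inception_v3": models.Inception_V3_Weights.DEFAULT,
    "resnet50": models.ResNet50_Weights.DEFAULT,
    "mobilenet_v3_large": models.MobileNet_V3_Large_Weights.DEFAULT,
}
ensemble = {name: (models.get_model(name, weights=w).eval(), w.transforms())
            for name, w in weight_sets.items()}

@torch.no_grad()
def fooled_models(adv_image, true_class):
    """adv_image: 3xHxW tensor in [0, 1] of the photographed adversarial object;
    returns the names of the models that misclassify it."""
    fooled = []
    for name, (model, preprocess) in ensemble.items():
        prediction = model(preprocess(adv_image).unsqueeze(0)).argmax(dim=1).item()
        if prediction != true_class:
            fooled.append(name)
    return fooled
```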

2 thoughts on “Three Paper Thursday: Attacking Machine Vision Models In Real Life”

  1. Thanks for discussing our work here! However, we wanted to clarify several fundamental misunderstandings in the discussion:

    1) For reproducibility, we release the checkpoints for the models we trained, along with the attack code, here: https://github.com/evtimovi/robust_physical_perturbations. With this information, anybody can recreate the attacks. Please note that the training set is not needed for reproducing any of our results. You can verify that the trained models perform in line with any other road-sign classification model available at the time on the standard test sets for road-sign classification.

    2) We never claimed these objects were adversarial to object detectors, and this is a misunderstanding we cleared up with the authors of “Standard detectors aren’t (currently) fooled by physical adversarial stop signs”. Our claims are clearly evident from the title of the paper (“On Deep Learning Visual Classification”), the abstract (“misclassifications”), and the text throughout. In this paper, we demonstrate physical objects can cause errors in classification – not detection. There is no reason to expect that our attack should work on detectors.

    3) Nonetheless, we later demonstrated that physical adversarial objects can also fool detectors. We did so in a subsequent paper published at the 12th USENIX Workshop on Offensive Technologies (WOOT 2018) and available here: Physical Adversarial Examples for Object Detectors https://www.usenix.org/system/files/conference/woot18/woot18-paper-eykholt.pdf

    Thank you for your thoughtful discussion! We look forward to seeing these confusions cleared up, for the benefit of your readers, who rely on a trusted and well-rounded source of scientific information on computer security.

    1. Thanks for raising these points, Ivan, and it’s great to have you joining the discussion!

      To address your points:

      1) This information was not in the paper, so I was unaware of it, but it’s great that you’ve made this publicly available.

      2) To clarify, it was not my intention to imply this was your claim, only to highlight the findings from Lu et al. (specifically in reference to the Discussions section of their paper). Re-reading what I wrote, I now recognise the sentence containing “the conclusion is reached that currently there is no evidence to support a claim that physical-world adversarial examples can fool an object-detection model” may be open to misinterpretation and I should have framed it more clearly.

      3) I was unaware of this paper, but thank you for sharing it and I look forward to reading it with great interest!
