Three Paper Thursday: Subverting Neural Networks via Adversarial Reprogramming

This is a guest post by Alex Shepherd.

Five years after Szegedy et al. demonstrated the capacity for neural networks to be fooled by crafted inputs containing adversarial perturbations, Elsayed et al. introduced adversarial reprogramming as a novel attack class for adversarial machine learning. Their findings showed that neural networks can be repurposed to perform tasks outside their original scope via crafted adversarial inputs, opening a new line of inquiry for AI and cybersecurity.

Their discovery raised important questions for trustworthy AI, such as where the unintended limits of a machine learning model's functionality lie and whether the complexity of its architecture can be turned to an attacker's advantage. For this Three Paper Thursday, we explore three key papers on this emerging threat in adversarial machine learning.

Adversarial Reprogramming of Neural Networks, Gamaleldin F. Elsayed, Ian Goodfellow and Jascha Sohl-Dickstein, in International Conference on Learning Representations, 2019.

In their seminal paper, Elsayed et al. demonstrated a proof of concept for adversarial reprogramming by successfully repurposing six pre-trained ImageNet classifiers to perform three alternate tasks via crafted inputs containing adversarial programs. Their threat model considered an attacker with white-box access to the target models, whose objective was to subvert the models by repurposing them for tasks they were never designed to perform. The adversarial tasks used for their hypothesis testing were counting squares in an image, classifying MNIST digits and classifying CIFAR-10 images.

Elsayed et al.’s method utilises crafted adversarial inputs, sized for the ImageNet classifiers, which embed a trainable adversarial program together with a designated mapping between ImageNet labels and adversarial-task labels. The authors note that, unlike many adversarial perturbations, their adversarial program is not specific to a single image and was applied to all images in their experiments.
Significantly, the architecture and parameters of the target models were left unmodified, in contrast to the closely related practice of transfer learning. The authors address the distinction between transfer learning and adversarial reprogramming in the paper and elaborate on it further in an OpenReview response. Elsayed et al. note that adversarial reprogramming can be seen as an example of DNNs behaving like weird machines, although they frame their method as a form of parasitic computing.
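To make the mechanics concrete, here is a minimal sketch of white-box adversarial reprogramming in the spirit of Elsayed et al.'s method, written in PyTorch. It is not the authors' code: the frozen ResNet-50, the 28x28 MNIST slot carved out of a 224x224 canvas, the tanh-squashed program and the re-mapping of the first ten ImageNet labels to digit classes are all illustrative choices, and ImageNet input normalisation is omitted for brevity.

```python
# Minimal sketch of white-box adversarial reprogramming (illustrative, not the authors' code).
# A single trainable program W is wrapped around a small MNIST digit embedded in the
# centre of an ImageNet-sized input; the first ten ImageNet labels are re-mapped to digits.
import torch
import torch.nn.functional as F
from torchvision import models, datasets, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"
victim = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).to(device).eval()
for p in victim.parameters():
    p.requires_grad_(False)                      # the target model stays frozen

W = torch.zeros(1, 3, 224, 224, device=device, requires_grad=True)   # adversarial program
mask = torch.ones(1, 3, 224, 224, device=device)
mask[:, :, 98:126, 98:126] = 0                   # 28x28 slot in the centre for the digit

mnist = datasets.MNIST(".", train=True, download=True, transform=transforms.ToTensor())
loader = torch.utils.data.DataLoader(mnist, batch_size=64, shuffle=True)
opt = torch.optim.Adam([W], lr=0.05)

for x, y in loader:
    x, y = x.to(device), y.to(device)
    canvas = torch.zeros(x.size(0), 3, 224, 224, device=device)
    canvas[:, :, 98:126, 98:126] = x             # grey-scale digit broadcast across channels
    adv_input = canvas + torch.tanh(W) * mask    # the same program is applied to every image
    logits = victim(adv_input)[:, :10]           # hard-coded label mapping: classes 0-9 -> digits
    loss = F.cross_entropy(logits, y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Only W is updated; the victim network is queried exactly as its designers intended, which is what makes the attack hard to distinguish from legitimate use.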

Elsayed et al.’s findings also demonstrate the potential for adversarial reprogramming samples to avoid detection. One experiment showed that adversarial programs concealed inside images from the ImageNet dataset could be nearly imperceptible. Further highlighting the risk presented by adversarial reprogramming, the authors emphasised the simplicity of their method in their OpenReview response, claiming that “one can perform this attack with ease”. While the authors did not release their code, several implementations are available on GitHub.

Adversarial Reprogramming of Text Classification Neural Networks, Paarth Neekhara, Shehzeen Hussain, Shlomo Dubnov, and Farinaz Koushanfar, in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019.

Neekhara et al. confirmed Elsayed et al.’s prediction that adversarial reprogramming is a cross-domain threat by successfully repurposing text classification models, including both RNNs and CNNs. Building on Elsayed et al.’s foundational work, the authors demonstrated the effectiveness of adversarial reprogramming in both white-box and black-box settings, reprogramming word-level classifiers as character-level classifiers and vice versa.

The adversarial program was trained with the Gumbel-Softmax trick in the white-box setting and with a REINFORCE-based optimisation algorithm in the black-box setting. The white-box approach outperformed the black-box approach in all of their experiments, and the authors identified reinforcement learning methods for sequence generation as a potential avenue for improving black-box training of adversarial programs.
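As a rough illustration of the white-box trick, the sketch below (my own, not the authors' code) uses PyTorch's Gumbel-Softmax to keep a discrete adversarial token sequence differentiable during training; the vocabulary size, sequence length and the toy embedding and classifier standing in for the frozen victim model are all placeholder assumptions.

```python
# Hedged sketch of Gumbel-Softmax training of an adversarial token program (illustrative only).
import torch
import torch.nn.functional as F

vocab_size, seq_len, emb_dim = 5000, 20, 128
embedding = torch.nn.Embedding(vocab_size, emb_dim)      # stand-in for the victim's embedding layer
victim_head = torch.nn.Linear(emb_dim, 2)                # stand-in for the frozen text classifier
for p in list(embedding.parameters()) + list(victim_head.parameters()):
    p.requires_grad_(False)

theta = torch.zeros(seq_len, vocab_size, requires_grad=True)   # trainable logits of the program
opt = torch.optim.Adam([theta], lr=0.1)
target_label = torch.tensor([1])                               # adversarial-task label to induce

for step in range(100):
    soft_tokens = F.gumbel_softmax(theta, tau=1.0, hard=False)  # relaxed one-hot per position
    soft_embeds = soft_tokens @ embedding.weight                # differentiable "soft" embeddings
    logits = victim_head(soft_embeds.mean(dim=0, keepdim=True)) # toy mean-pooled classifier
    loss = F.cross_entropy(logits, target_label)
    opt.zero_grad()
    loss.backward()
    opt.step()

adversarial_tokens = theta.argmax(dim=-1)   # discrete tokens recovered at attack time
```

The relaxation matters because tokens, unlike pixels, cannot be perturbed continuously; in the black-box setting, where gradients are unavailable altogether, the authors fall back on REINFORCE-style score estimates instead.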

Transfer learning without knowing: Reprogramming black-box machine learning models with scarce data and limited resources, Yun-Yun Tsai, Pin-Yu Chen, and Tsung-Yi Ho, in International Conference on Machine Learning, 2020.

In this paper, Tsai et al. introduced black-box adversarial reprogramming (BAR), a novel method for repurposing black-box models for transfer learning using only their input-output responses. The authors showed how a target model's feature-extraction capabilities can be leveraged by applying zeroth-order optimisation to iterative input-output queries, with multi-label mapping between source-domain and target-domain labels used to improve the method's performance.
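The key ingredient is gradient-free optimisation of the adversarial program. The sketch below (a simplified sketch of mine, not the paper's implementation) shows a one-sided, random-direction zeroth-order gradient estimate of the kind BAR relies on, where the loss can only be measured by querying the black-box model.

```python
# Hedged sketch of a zeroth-order gradient estimate for black-box reprogramming (illustrative).
import numpy as np

def zeroth_order_grad(loss_fn, W, num_dirs=20, mu=0.01):
    """Approximate dLoss/dW using only loss queries: average one-sided finite
    differences along random unit directions (no backpropagation through the model)."""
    grad = np.zeros_like(W)
    base = loss_fn(W)
    for _ in range(num_dirs):
        u = np.random.randn(*W.shape)
        u /= np.linalg.norm(u)
        grad += (loss_fn(W + mu * u) - base) / mu * u
    return grad / num_dirs

# Toy usage: in a real attack, loss_fn would wrap queries to the remote prediction API
# and apply the multi-label mapping between API classes and target-domain labels.
target = np.ones(10)

def loss_fn(W):
    return float(np.sum((W - target) ** 2))   # placeholder for an API-based loss

W = np.zeros(10)
for _ in range(300):
    W -= 0.05 * zeroth_order_grad(loss_fn, W)
print(loss_fn(W))   # the loss shrinks as W approaches the target
```

Because each gradient estimate costs only a handful of queries, the attack's expense scales with API pricing rather than with access to the model's internals, which is what keeps the cost so low.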

Tsai et al. demonstrated the efficacy of their approach by successfully repurposing three pre-trained ImageNet classifiers to perform medical image classification tasks, including Autistic Spectrum Disorder classification, diabetic retinopathy detection and melanoma detection.

Confirming Elsayed et al.’s prediction of risks for service providers, the authors also demonstrated the vulnerability of image classification APIs to adversarial reprogramming by successfully repurposing two cloud-based MLaaS toolsets to perform medical imaging tasks. Significantly, the authors remarked that the cost of reprogramming the two APIs was less than 24 USD.

Lessons Learned

On the balance of the findings from these papers, adversarial reprogramming can be characterised as a relatively simple and cost-effective way for attackers to subvert machine learning models across multiple domains. The potential for adversarial programs to evade detection and to be deployed in black-box settings further underlines the risk to stakeholders.

Elsayed et al. identify theft of computational resources and violation of the ethical principles of service providers as future challenges presented by adversarial reprogramming, using the hypothetical example of repurposing a virtual assistant as spyware or a spambot. Identified directions for future research include establishing the formal properties and limitations of adversarial reprogramming, and studying potential methods to defend against it.
