Robust Machine Learning

Improving the natural-robust accuracy tradeoff for adversary-resilient deep learning.

MERL Researchers: Ye Wang, Toshiaki Koike-Akino, Matthew Brand, Anoop Cherian, Kuan-Chuan Peng, Jonathan Le Roux, Kieran Parsons.
University Consultants: Prof. Shuchin Aeron (Tufts University), Prof. Pierre Moulin (University of Illinois Urbana-Champaign)
Interns: Adnan Rakin (Arizona State University), Xi Yu (University of Florida), Niklas Smedemark-Margulies (Northeastern University), Tejas Jayashankar (University of Illinois Urbana-Champaign)


Deep learning is widely applied, yet remains remarkably vulnerable to adversarial examples, i.e., virtually imperceptible perturbations that fool deep neural networks (DNNs). We aim to develop robust machine learning technology: practical defenses, grounded in a better theoretical understanding of the fragility of conventional DNNs, that yield deep learning-based systems resilient to adversarial examples.

We study the problem of learning compact representations for sequential data. To maximize the extraction of implicit spatiotemporal cues, we cast the problem in the framework of contrastive representation learning and propose a novel objective that maximizes the optimal transport distance between the data and an adversarial data distribution. To generate the adversarial distribution, we propose a novel framework connecting Wasserstein GANs with a classifier, providing a principled mechanism for producing good negative distributions for contrastive learning, which remains a challenging open problem. Our results demonstrate competitive performance on the task of human action recognition in video sequences.
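As a rough illustration of this objective (a minimal sketch, not the released ACOT implementation), the snippet below computes an entropy-regularized optimal transport (Sinkhorn) distance between embedding clouds and uses it as the repulsion term of a contrastive loss; the embeddings, the generator-produced negatives, and all settings are hypothetical stand-ins.

```python
import numpy as np

def sinkhorn_distance(X, Y, reg=0.1, n_iters=100):
    """Entropy-regularized optimal transport (Sinkhorn) distance
    between two point clouds, with uniform marginals."""
    # Pairwise squared-Euclidean cost matrix
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    # Gibbs kernel; regularization scaled by max cost for stability
    K = np.exp(-C / (reg * C.max() + 1e-12))
    a = np.full(len(X), 1.0 / len(X))
    b = np.full(len(Y), 1.0 / len(Y))
    u = np.ones_like(a)
    for _ in range(n_iters):          # Sinkhorn-Knopp iterations
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]   # approximate transport plan
    return (P * C).sum()

def contrastive_ot_loss(anchors, positives, adversarial_negatives):
    """Pull anchors toward their positives; push the anchor
    distribution away (in OT distance) from an adversarially
    generated negative distribution."""
    attract = ((anchors - positives) ** 2).sum(-1).mean()
    repel = sinkhorn_distance(anchors, adversarial_negatives)
    return attract - repel  # minimize attraction, maximize OT separation

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))                   # anchor embeddings
z_pos = z + 0.01 * rng.normal(size=z.shape)    # augmented positives
z_neg = rng.normal(loc=2.0, size=(8, 16))      # stand-in generator negatives
loss = contrastive_ot_loss(z, z_pos, z_neg)
```

In the actual framework the negatives come from a Wasserstein GAN coupled with a classifier rather than from a fixed noise distribution, and the OT term is backpropagated through the representation network.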

Figure 1: Architecture for contrastive representation learning employing adversarial noise generation (implemented via a Wasserstein GAN) and a joint optimal transport and representation learning formulation.

Various adversarial audio attacks have recently been developed to fool automatic speech recognition (ASR) systems. We propose a defense against such attacks based on the uncertainty introduced by dropout in neural networks. We show that our defense can detect attacks created through optimized perturbations and frequency masking against a state-of-the-art end-to-end ASR system. Furthermore, the defense can be made robust against attacks that are immune to noise reduction. We test our defense on Mozilla's CommonVoice dataset, the UrbanSound dataset, and an excerpt of the LibriSpeech dataset, showing that it achieves high detection accuracy in a wide range of scenarios.
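The detection idea can be sketched as follows (a toy illustration, not the paper's ASR pipeline): keep dropout active at test time, run the model several times on the same input, and flag inputs whose stochastic outputs scatter widely around their medoid. The `forward_fn` stand-ins and the noise scales below are hypothetical.

```python
import numpy as np

def mc_dropout_outputs(forward_fn, x, n_samples=30):
    """Run the model n_samples times with dropout left active at
    inference, collecting stochastic output embeddings."""
    return np.stack([forward_fn(x) for _ in range(n_samples)])

def medoid_uncertainty(outputs):
    """Mean distance of the stochastic outputs to their medoid
    (the sample minimizing total distance to all others)."""
    D = np.linalg.norm(outputs[:, None, :] - outputs[None, :, :], axis=-1)
    medoid = D.sum(axis=1).argmin()
    return D[medoid].mean()

def is_adversarial(forward_fn, x, threshold):
    """Flag an input as adversarial if its dropout-induced
    uncertainty exceeds a threshold calibrated on clean data."""
    return medoid_uncertainty(mc_dropout_outputs(forward_fn, x)) > threshold

# Toy stand-in models: dropout-induced variation is amplified
# when the input is adversarial, as observed in the paper.
rng = np.random.default_rng(1)
clean_fn = lambda x: x + 0.05 * rng.normal(size=x.shape)
adv_fn = lambda x: x + 0.5 * rng.normal(size=x.shape)
x = np.zeros(32)
u_clean = medoid_uncertainty(mc_dropout_outputs(clean_fn, x))
u_adv = medoid_uncertainty(mc_dropout_outputs(adv_fn, x))
```

In the real system the outputs are ASR transcription features rather than raw vectors, and the threshold is chosen from the clean-data uncertainty distribution (Figure 2).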

Figure 2: Visualization of the attack detection feature: the mean uncertainty, measured via the distribution of distances to the medoid, for original clean audio samples (left) vs. adversarial audio samples (right).

Robust machine learning formulations have emerged to address the prevalent vulnerability of DNNs to adversarial examples. Our work draws the connection between optimal robust learning and the privacy-utility tradeoff problem, a generalization of the rate-distortion problem. The saddle point of the game between a robust classifier and an adversarial perturbation can be found via the solution of a maximum conditional entropy problem. This information-theoretic perspective sheds light on the fundamental tradeoff between robustness and clean data performance, which ultimately arises from the geometric structure of the underlying data distribution and perturbation constraints.
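Schematically (with illustrative notation, not verbatim from the paper): under log-loss, the game between the robust classifier and the adversarial perturbation admits a saddle point characterized by a maximum conditional entropy problem,

```latex
\min_{Q_{\hat{Y}\mid Z}} \;\max_{P_{Z\mid X} \in \mathcal{A}}\;
  \mathbb{E}\!\left[-\log Q_{\hat{Y}\mid Z}(Y \mid Z)\right]
\;=\; \max_{P_{Z\mid X} \in \mathcal{A}} H(Y \mid Z),
```

where $X$ is the clean input with label $Y$, $Z$ is the perturbed input, and $\mathcal{A}$ is the set of perturbation channels allowed by the distance constraint. The right-hand side mirrors the privacy-utility (generalized rate-distortion) formulation, which is the equivalence illustrated in Figure 3.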

Figure 3: The robust machine learning and privacy-utility tradeoff problems are theoretically connected through a minimax equivalence result.

Figure 4: Model loss depends on the attack strength (in terms of perturbation distance) and the strength that the model was designed for, with mismatch leading to suboptimality.

Adversarial examples have recently exposed the severe vulnerability of neural network models. However, most existing attacks require some form of target model information (i.e., weights, model queries, or architecture) to improve the efficacy of the attack. We leverage the information-theoretic connections between robust learning and generalized rate-distortion theory to formulate a universal adversarial example (UAE) generation algorithm. Our algorithm trains an offline adversarial generator to minimize the mutual information between the label and the perturbed data. At inference time, our UAE method can efficiently generate effective adversarial examples without high computational cost. These adversarial examples in turn allow for developing universal defenses through adversarial training. Our experiments demonstrate promising gains in improving the training efficiency of conventional adversarial training.
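To see why an offline generator can be cheap at inference time, consider the linear-classifier case, where the loss-maximizing l-inf perturbation has a closed form that a trained generator would amortize for deep networks; maximizing the classifier's loss serves here as a crude surrogate for minimizing the label-data mutual information. This toy sketch (logistic model, hand-derived gradients, all settings illustrative) also shows warm-starting PGD from the generator's output, as in Figure 6.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pgd_attack(w, x0, y, eps, alpha=0.05, steps=20, x_init=None):
    """Projected gradient descent (l-inf ball) maximizing the
    logistic loss of a linear classifier w on (x0, y), y in {-1,+1}."""
    x = x0.copy() if x_init is None else x_init.copy()
    for _ in range(steps):
        grad_x = -y * w * sigmoid(-y * (w @ x))  # d(loss)/d(x)
        x = x + alpha * np.sign(grad_x)          # gradient ascent step
        x = np.clip(x, x0 - eps, x0 + eps)       # project onto eps-ball
    return x

def generator_perturbation(w, y, eps):
    """Closed-form loss-maximizing l-inf perturbation for a linear
    model; a trained generator amortizes this computation for DNNs."""
    return -eps * y * np.sign(w)

rng = np.random.default_rng(2)
w = rng.normal(size=20)      # stand-in classifier weights
x0 = rng.normal(size=20)     # clean input
y, eps = 1, 0.3
loss = lambda x: np.log1p(np.exp(-y * (w @ x)))  # logistic loss
x_gen = x0 + generator_perturbation(w, y, eps)   # one forward pass
x_pgd = pgd_attack(w, x0, y, eps, x_init=x_gen)  # PGD warm-started
```

The generator produces its perturbation in a single forward pass, whereas PGD iterates; warm-starting PGD from the generator output is what accelerates the attack in our experiments.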

Figure 5: Comparison of traditional substitute black-box attack versus our attack approach based on a trained universal adversarial example generator.

Figure 6: Projected gradient descent (PGD) attacks can be accelerated by initializing with our universal adversarial example generator.


Video 1: (ISIT 2021) Robust Machine Learning via Privacy/Rate-Distortion Theory
Video 2: (ITW 2021) Towards Universal Adversarial Examples and Defenses

Software Download

Adversarially-Contrastive Optimal Transport (ACOT), for studying the problem of learning compact representations of sequential data that capture its implicit spatiotemporal cues.

MERL Publications

  •  Cherian, A., Aeron, S., "Representation Learning via Adversarially-Contrastive Optimal Transport", International Conference on Machine Learning (ICML), H. Daumé and A. Singh, Eds., July 2020, pp. 10675-10685.
    BibTeX TR2020-093 PDF Software
    @inproceedings{Cherian2020jul,
      author = {Cherian, Anoop and Aeron, Shuchin},
      title = {Representation Learning via Adversarially-Contrastive Optimal Transport},
      booktitle = {International Conference on Machine Learning (ICML)},
      year = 2020,
      editor = {H. Daumé and A. Singh},
      pages = {10675--10685},
      month = jul,
      url = {}
    }
  •  Jayashankar, T., Le Roux, J., Moulin, P., "Detecting Audio Attacks on ASR Systems with Dropout Uncertainty", Annual Conference of the International Speech Communication Association (Interspeech), DOI: 10.21437/Interspeech.2020-1846, October 2020, pp. 4671-4675.
    BibTeX TR2020-137 PDF
    @inproceedings{Jayashankar2020oct,
      author = {Jayashankar, Tejas and Le Roux, Jonathan and Moulin, Pierre},
      title = {Detecting Audio Attacks on ASR Systems with Dropout Uncertainty},
      booktitle = {Annual Conference of the International Speech Communication Association (Interspeech)},
      year = 2020,
      pages = {4671--4675},
      month = oct,
      doi = {10.21437/Interspeech.2020-1846},
      issn = {1990-9772},
      url = {}
    }
  •  Wang, Y., Aeron, S., Rakin, A.S., Koike-Akino, T., Moulin, P., "Robust Machine Learning via Privacy/Rate-Distortion Theory", IEEE International Symposium on Information Theory (ISIT), DOI: 10.1109/ISIT45174.2021.9517751, July 2021.
    BibTeX TR2021-082 PDF Video Presentation
    @inproceedings{Wang2021jul,
      author = {Wang, Ye and Aeron, Shuchin and Rakin, Adnan S and Koike-Akino, Toshiaki and Moulin, Pierre},
      title = {Robust Machine Learning via Privacy/Rate-Distortion Theory},
      booktitle = {IEEE International Symposium on Information Theory (ISIT)},
      year = 2021,
      month = jul,
      publisher = {IEEE},
      doi = {10.1109/ISIT45174.2021.9517751},
      isbn = {978-1-5386-8210-4},
      url = {}
    }
  •  Rakin, A.S., Wang, Y., Aeron, S., Koike-Akino, T., Moulin, P., Parsons, K., "Towards Universal Adversarial Examples and Defenses", IEEE Information Theory Workshop, DOI: 10.1109/ITW48936.2021.9611439, October 2021.
    BibTeX TR2021-125 PDF Video
    @inproceedings{Rakin2021oct,
      author = {Rakin, Adnan S and Wang, Ye and Aeron, Shuchin and Koike-Akino, Toshiaki and Moulin, Pierre and Parsons, Kieran},
      title = {Towards Universal Adversarial Examples and Defenses},
      booktitle = {IEEE Information Theory Workshop},
      year = 2021,
      month = oct,
      publisher = {IEEE},
      doi = {10.1109/ITW48936.2021.9611439},
      isbn = {978-1-6654-0312-2},
      url = {}
    }