Machine learning models trained using gradient descent can be forced to make arbitrary misclassifications by an attacker that can influence the items to be classified. The impact of a misclassification varies widely depending on the ML model's purpose and on the systems of which it is a part.
This vulnerability results from using gradient descent to determine classification of inputs via a neural network. As such, it is a vulnerability in the algorithm. In plain terms, this means that the currently-standard usage of this type of machine learning algorithm can always be fooled or manipulated if the adversary can interact with it. What kind or amount of interaction an adversary needs is not always clear, and some attacks can be successful with only minor or indirect interaction. However, in general more access or more interaction options reduce the effort required to fool the machine learning algorithm. If the adversary has information about some part of the machine learning process (training data, training results, model, or operational/testing data), then with sufficient effort the adversary can craft an input that will fool the machine learning tool to yield a result of the adversary's choosing. In instantiations of this vulnerability that we are currently aware of, "sufficient effort" ranges widely, between seconds and weeks of commodity compute time.
Within the taxonomy by Kumar et al., such misclassifications are either perturbation attacks or adversarial examples in the physical domain. There are other kinds of failures or attacks related to ML systems, and other ML systems besides those trained via gradient descent. However, this note is restricted to this specific algorithm vulnerability. Formally, the vulnerability is defined for the following case of classification.
In the case where f(θ,x) is a neural network, finding the global minimizer θ* is often computationally intractable. Instead, various methods are used to find θ̂, which is a "good enough" approximation. We refer to f(θ̂, ·) as the fitted neural network.
If stochastic gradient descent is used to find θ̂ for the broadly defined set of f(θ,x) representing neural networks, then the fitted neural network f(θ̂, ·) is vulnerable to adversarial manipulation.
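The following is an added, self-contained numpy illustration of the statement above; it is not code from the note's authors. It fits a small two-layer network (the θ̂ of the text) to a constructed, linearly separable 2-D data set by plain full-batch gradient descent, then measures how the fitted model's predictions change when each input coordinate is moved by a small amount ε along the sign of the loss gradient with respect to the input. The data set, the network size, the step count and the learning rate are all constructed choices for the example; the point is the sensitivity check, which a team can run against its own fitted model as a pre-deployment measurement of how little input change is enough to move a prediction.

```python
"""
Added example (not part of the CERT/CC note): fit a tiny neural network by
gradient descent, then measure how sensitive its classifications are to
small, loss-gradient-aligned input perturbations. Uses only numpy; the data
set and model are constructed here.
"""
import numpy as np

rng = np.random.default_rng(0)

# Constructed training set: 2-D points in [-1, 1]^2, label 1 iff x1 + x2 > 0.
X_TRAIN = rng.uniform(-1.0, 1.0, size=(200, 2))
Y_TRAIN = (X_TRAIN[:, 0] + X_TRAIN[:, 1] > 0).astype(float)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(hidden=8):
    # theta = (W1, b1, W2, b2): a 2 -> hidden (ReLU) -> 1 (sigmoid) network.
    return {
        "W1": rng.normal(scale=0.5, size=(2, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(scale=0.5, size=(hidden, 1)),
        "b2": np.zeros(1),
    }


def forward(theta, X):
    # Returns p = f(theta, X) together with the intermediates backprop needs.
    Z1 = X @ theta["W1"] + theta["b1"]
    H = np.maximum(Z1, 0.0)
    logits = (H @ theta["W2"] + theta["b2"]).ravel()
    return sigmoid(logits), Z1, H


def gradients(theta, X, Y):
    # Backprop of the mean binary cross-entropy loss.
    # Returns gradients w.r.t. the parameters AND w.r.t. the inputs X.
    P, Z1, H = forward(theta, X)
    n = X.shape[0]
    dlogits = (P - Y) / n                       # (n,)
    dW2 = H.T @ dlogits[:, None]                # (hidden, 1)
    db2 = dlogits.sum(keepdims=True)            # (1,)
    dH = dlogits[:, None] @ theta["W2"].T       # (n, hidden)
    dZ1 = dH * (Z1 > 0)                         # ReLU mask
    dW1 = X.T @ dZ1                             # (2, hidden)
    db1 = dZ1.sum(axis=0)                       # (hidden,)
    dX = dZ1 @ theta["W1"].T                    # (n, 2): input gradient
    return {"W1": dW1, "b1": db1, "W2": dW2, "b2": db2}, dX


def fit(X, Y, steps=1000, lr=0.5):
    # Plain full-batch gradient descent: this yields theta-hat, not theta-star.
    theta = init_params()
    for _ in range(steps):
        g, _ = gradients(theta, X, Y)
        for k in theta:
            theta[k] -= lr * g[k]
    return theta


def predict(theta, X):
    return (forward(theta, X)[0] > 0.5).astype(float)


def accuracy(theta, X, Y):
    return float(np.mean(predict(theta, X) == Y))


def flip_rate(theta, X, eps):
    """Fraction of inputs whose predicted class changes when every coordinate
    is moved by eps in the direction that increases the loss for the model's
    own prediction (the sign of the input gradient). This is the sensitivity
    the note describes, measured on the fitted model."""
    clean = predict(theta, X)
    _, dX = gradients(theta, X, clean)
    perturbed = X + eps * np.sign(dX)
    return float(np.mean(predict(theta, perturbed) != clean))


THETA_HAT = fit(X_TRAIN, Y_TRAIN)

if __name__ == "__main__":
    print("clean accuracy:", accuracy(THETA_HAT, X_TRAIN, Y_TRAIN))
    for eps in (0.0, 0.05, 0.1, 0.3):
        print(f"eps={eps:.2f}  flip rate={flip_rate(THETA_HAT, X_TRAIN, eps):.2f}")
```

The clean accuracy is high, yet a perturbation of a few tenths of a unit (on inputs whose coordinates span two units) already changes a large share of predictions, because gradient descent fits a boundary and says nothing about how far any input sits from it. Replace `X_TRAIN`, `forward` and `gradients` with your own model and input gradient to obtain the same measurement for a real classifier.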
An attacker can interfere with a system which uses gradient descent to change system behavior. As an algorithm vulnerability, this flaw has a wide-ranging but difficult-to-fully-describe impact. The precise impact will vary with the application of the ML system. We provide three illustrative examples; these should not be considered exhaustive.
The CERT/CC is currently unaware of a specific practical solution to this problem. To defend generally, do both of:
2. Standard defense in depth. A machine learning tool is not different from other software in this regard. Any tool should be deployed in an ecosystem that supports and defends it from adversarial manipulation. For machine learning tools specifically designed to serve a cybersecurity purpose, this is particularly important, as they are exposed to adversarial input as part of their designed tasking. See CMU/SEI-2019-TR-005 for more information on evaluating machine learning tools for cybersecurity.
Other proposed solutions, which rely on either pre-processing the data or simply obfuscating the gradient of the loss, do not work when your adversary is aware that you are attempting those mitigations.
This advisory information is generic and does not describe any specific instances of this type of problem, so no vendors have been notified or listed here. There are neither CVE IDs nor a CVSS score.
See Papernot et al. (2016) Towards the Science of Security and Privacy in Machine Learning or Biggio and Roli (2018) Wild patterns: Ten years after the rise of adversarial machine learning for a brief history.
This document was written by Allen Householder, Jonathan M. Spring, Nathan VanHoudnos, and Oren Wright.
Date First Published: 2020-03-19
Date Last Updated: 2020-03-20 21:17 UTC