In Computer Vision an adversarial sample is defined as a perturbed image $x^{adv} = x + \delta$ that a classifier $f$ assigns to a different class than the source image $x$, i.e. $f(x^{adv}) \neq f(x)$, while the perturbation stays within a budget $\|\delta\| \leq \varepsilon$ under some norm. The most common norm taken is the $\ell_\infty$ norm, although $\ell_2$, $\ell_1$, and, more recently, Wasserstein constraints are also used.
Of course one can simply check misclassification, but since the goal of adversarial attacks is to remain unnoticeable to humans, it would be very convenient to have a similarity metric that aligns with human judgment. Here, we discuss the use of so-called perceptual similarity metrics for comparing adversarial attacks, and show that they are not suitable for this purpose because they can themselves be fooled by adversarial samples.
1 Perceptual Similarity Metrics
As motivation for the metric we use, we first quickly recall how three classical similarity metrics are built. The first, SSIM, aggregates rough pixel statistics globally. The second, MSSIM, does so in patches, to allow for some locality. The third, FSIM, looks at engineered low-level features. After these, a natural next step presents itself: to use learned features, as LPIPS does, or, going a bit further, an ensemble of them, as E-LPIPS does.
1.1 (Mean) Structural Similarity Index (SSIM)
One of the most popular perceptual similarity metrics is the Structural Similarity Index (SSIM) [Wan04I]. Because the human visual system (HVS) attends to the structural information in a scene, SSIM separates the task of similarity measurement into three comparisons: luminance, contrast, and structure.
The means of pixel values $\mu_x, \mu_y$, the variances $\sigma_x^2, \sigma_y^2$, and the covariance $\sigma_{xy}$ of the two images $x$ and $y$ are combined into

$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)},$$

where $C_1$ and $C_2$ are small constants that stabilize the division when the means or variances are close to zero. The Mean SSIM (MSSIM) applies the same formula over local windows and averages the resulting map, which gives the metric some sensitivity to where in the image a distortion occurs.
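As an illustration, here is a minimal NumPy sketch of this computation (the reference implementation of [Wan04I] uses a Gaussian-weighted sliding window; here we use plain non-overlapping patches and the commonly used constants for brevity):

```python
import numpy as np

def ssim(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """SSIM of two grayscale patches with pixel values in [0, 255]."""
    x, y = x.astype(np.float64), y.astype(np.float64)
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

def mssim(x, y, win=8):
    """Mean SSIM: average SSIM over non-overlapping win x win patches."""
    h, w = x.shape
    scores = [ssim(x[i:i + win, j:j + win], y[i:i + win, j:j + win])
              for i in range(0, h - win + 1, win)
              for j in range(0, w - win + 1, win)]
    return float(np.mean(scores))
```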
1.2 Feature Similarity Index (FSIM)
[Zha11F] argue that SSIM has the deficiency that all pixels have the same importance, but the HVS attributes different importance to different regions of an image. The authors argue that the HVS mostly pays attention to low-level features, such as edges or other zero crossings, and suggest comparing two sets of features:
Phase Congruency (PC) PC is a measure of the alignment of phase across different scales of an image and is a feature that is considered to be invariant to changes in brightness or contrast. It is believed to be a good approximation to how the HVS detects features in an image, because the latter is more sensitive to structure than to the amount of light. According to the authors: "Based on the physiological and psychophysical evidences, the PC theory provides a simple but biologically plausible model of how mammalian visual systems detect and identify features in an image. PC can be considered as a dimensionless measure for the significance of a local structure." [Zha11F]
The computation of phase congruency involves several steps (for details, see [Zha11F]): first a decomposition using wavelets or filter banks, then pixel-wise extraction of phase information, and finally a measure of how well the phases agree across scales.
Gradient magnitude (GM) Image gradient computation is a cornerstone of image processing, e.g. for edge detection (high luminance gradients). The GM of an image at a pixel is obtained from its horizontal and vertical derivatives $G_x$ and $G_y$ (computed, for instance, with Sobel, Prewitt, or Scharr operators) as

$$G = \sqrt{G_x^2 + G_y^2}.$$
For color images, PC and GM features are computed from their luminance channels. Given images $f_1$ and $f_2$ with phase congruency maps $PC_1, PC_2$ and gradient magnitude maps $G_1, G_2$, the features are compared pixel-wise,

$$S_{PC}(\mathbf{x}) = \frac{2\,PC_1(\mathbf{x})\,PC_2(\mathbf{x}) + T_1}{PC_1^2(\mathbf{x}) + PC_2^2(\mathbf{x}) + T_1}, \qquad S_{G}(\mathbf{x}) = \frac{2\,G_1(\mathbf{x})\,G_2(\mathbf{x}) + T_2}{G_1^2(\mathbf{x}) + G_2^2(\mathbf{x}) + T_2},$$

and combined into

$$\mathrm{FSIM} = \frac{\sum_{\mathbf{x}} S_{PC}(\mathbf{x})\, S_{G}(\mathbf{x})\, PC_m(\mathbf{x})}{\sum_{\mathbf{x}} PC_m(\mathbf{x})},$$

where $T_1, T_2$ are stabilizing constants and $PC_m(\mathbf{x}) = \max\big(PC_1(\mathbf{x}), PC_2(\mathbf{x})\big)$ weights each pixel by the significance of its local structure.
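To make the construction concrete, here is a small sketch of the gradient-magnitude feature and of the final combination, assuming the phase congruency maps have already been computed (e.g. with a log-Gabor filter bank); the constants follow the order of magnitude reported in [Zha11F]:

```python
import numpy as np
from scipy.ndimage import sobel

def gradient_magnitude(img):
    """Gradient magnitude of a grayscale (luminance) image via Sobel filters."""
    img = img.astype(np.float64)
    gx = sobel(img, axis=1)   # horizontal derivative
    gy = sobel(img, axis=0)   # vertical derivative
    return np.sqrt(gx ** 2 + gy ** 2)

def fsim_from_features(pc1, pc2, g1, g2, t1=0.85, t2=160.0):
    """Combine phase-congruency and gradient-magnitude maps into FSIM."""
    s_pc = (2 * pc1 * pc2 + t1) / (pc1 ** 2 + pc2 ** 2 + t1)
    s_g = (2 * g1 * g2 + t2) / (g1 ** 2 + g2 ** 2 + t2)
    pc_m = np.maximum(pc1, pc2)           # per-pixel importance weight
    return float((s_pc * s_g * pc_m).sum() / pc_m.sum())
```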
1.3 Learned Perceptual Image Patch Similarity (LPIPS)
[Zha18U] choose a different strategy for computing similarity. They argue that the internal activations of neural networks trained for high-level classification tasks correspond to human perceptual judgments, even across network architectures and without further calibration.
To get the similarity metric between two images $x$ and $x_0$, both are passed through a pre-trained network (e.g. VGG, AlexNet, or SqueezeNet), the activations of several layers $l$ are unit-normalized in the channel dimension to obtain $\hat{y}^l$ and $\hat{y}^l_0$, scaled channel-wise by weights $w_l$, and compared with a squared $\ell_2$ distance that is averaged spatially and summed over layers:

$$d(x, x_0) = \sum_l \frac{1}{H_l W_l} \sum_{h, w} \left\lVert w_l \odot \left(\hat{y}^l_{hw} - \hat{y}^l_{0hw}\right) \right\rVert_2^2.$$

The weights $w_l$ are calibrated on a dataset of human perceptual judgments collected in a two-alternative forced choice (2AFC) setting, so that small LPIPS distances correspond to image pairs that humans judge as similar.
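In practice the metric is distributed as the lpips Python package accompanying [Zha18U]; a minimal usage sketch (the package expects inputs in [-1, 1], or images in [0, 1] with normalize=True):

```python
import torch
import lpips  # pip install lpips

loss_fn = lpips.LPIPS(net='vgg')   # backbone: 'vgg', 'alex', or 'squeeze'

img0 = torch.rand(1, 3, 224, 224)  # images in [0, 1], shape (N, 3, H, W)
img1 = torch.rand(1, 3, 224, 224)

with torch.no_grad():
    d = loss_fn(img0, img1, normalize=True)  # rescales [0, 1] -> [-1, 1]
print(d.item())
```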
After tuning LPIPS, the authors evaluate several similarity metrics (including SSIM and FSIM) on a separate dataset and compute the agreement of each algorithm with all of the human judgments. To aggregate the information, if $p$ is the fraction of human judges that preferred the same patch as the algorithm, the algorithm receives a score of $p$ for that example; the reported 2AFC score is the average of these values over the test set.
Figure 1. Quantitative comparison across metrics on the test sets, reproduced from [Zha18U]. (Left) Results averaged across traditional and CNN-based distortions. (Right) Results averaged across 4 algorithms.
In Figure 1, we see that LPIPS achieves a higher 2AFC score than the SSIM or FSIM measures or the other low-level baselines, i.e. it agrees with the human judgments more often.
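As a small illustration of this aggregation (the function name and array layout are ours), given per-example distances to the two distorted patches and the fraction of judges preferring the second one:

```python
import numpy as np

def two_afc_score(d0, d1, human_pref_1):
    """Agreement of a distance metric with human 2AFC judgments.

    d0, d1:        metric distances from the reference to patches x0 and x1
    human_pref_1:  fraction of human judges who preferred x1, per example
    """
    algo_pref_1 = (np.asarray(d1) < np.asarray(d0)).astype(float)
    p = np.asarray(human_pref_1)
    # The metric earns p when it agrees with the judges who chose x1,
    # and 1 - p when it agrees with those who chose x0.
    return float(np.mean(p * algo_pref_1 + (1 - p) * (1 - algo_pref_1)))
```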
However, a problem with the LPIPS distance is that, since it uses activations of deep neural networks, it is prone to adversarial attacks itself. Hence, [Ket19E] introduced E-LPIPS, or Ensembled LPIPS. They transform both input images using simple random transformations and define

$$d_{E\text{-}LPIPS}(x, y) = \mathbb{E}_{T}\big[\, d(T(x), T(y)) \,\big],$$

where the expectation is taken over a family of random image transformations $T$ applied identically to both inputs. The authors claim that E-LPIPS is more robust while retaining the same predictive power.
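A rough Monte Carlo sketch of this expectation, with a deliberately tiny transformation family (the official E-LPIPS ensemble is richer, including e.g. random scaling and color transformations):

```python
import torch
import lpips

loss_fn = lpips.LPIPS(net='vgg')

def e_lpips(img0, img1, n_samples=8, max_offset=8):
    """Average LPIPS over random transforms applied identically to both images."""
    dists = []
    for _ in range(n_samples):
        t0, t1 = img0, img1
        if torch.rand(1).item() < 0.5:            # random horizontal flip
            t0, t1 = torch.flip(t0, dims=[-1]), torch.flip(t1, dims=[-1])
        dy, dx = torch.randint(0, max_offset, (2,)).tolist()  # random offset crop
        t0, t1 = t0[..., dy:, dx:], t1[..., dy:, dx:]
        with torch.no_grad():
            dists.append(loss_fn(t0, t1, normalize=True))
    return torch.stack(dists).mean()
```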
2 Problem Setting
As previously discussed, our main goal is to determine whether perceptual metrics can be used to compare adversarial attacks in image classification. We focus on LPIPS because of its popularity and because it aligns more closely with human ratings than the other metrics (Figure 1).
[Ket19E] showed that LPIPS is sensitive to adversarial samples, but used attacks specifically targeting it. If generic adversarial attacks for image classification transferred poorly to an LPIPS network, then LPIPS could still be considered for the perceptual evaluation of adversarial attacks.
Consider a target network $f$ (here VGG-16) and an attack that, for a source image $x$, produces an adversarial sample $x^{adv} = x + \delta$ with $f(x^{adv}) \neq f(x)$. We say that the attack transfers to LPIPS if the gap between the score distributions $d(x, x^{adv})$ and $d(x, x^{fake})$ is large enough, where $x^{fake}$ is a "fake adversary": an image carrying a perturbation of comparable magnitude that is still classified correctly (defined precisely in the next section). To compute this, we take the LPIPS score between each source image and its adversarial sample, and between each source image and its fake adversary, and compare the two empirical distributions, both visually and through their Wasserstein distance.
That is, the classification attack transfers to LPIPS successfully when LPIPS believes the adversarial samples are farther from the source images than comparably perturbed, non-adversarial samples are.
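A sketch of the comparison we run, assuming batched image tensors and the lpips package (the helper name is ours):

```python
import torch
import lpips
from scipy.stats import wasserstein_distance

def transfer_gap(src, adv, fake, backend='vgg'):
    """Distance between the LPIPS score distributions of adversarial and
    fake (non-adversarial) samples; src, adv, fake have shape (N, 3, H, W)
    with values in [0, 1]."""
    loss_fn = lpips.LPIPS(net=backend)
    with torch.no_grad():
        d_adv = loss_fn(src, adv, normalize=True).flatten().cpu().numpy()
        d_fake = loss_fn(src, fake, normalize=True).flatten().cpu().numpy()
    # A large gap means the attack "transfers" to LPIPS: the metric is itself
    # affected by the adversarial perturbation.
    return wasserstein_distance(d_adv, d_fake)
```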
3 Experimental Setup
To test the hypothesis we design the experiment following the description above. We construct adversarial samples $x^{adv}$ against a classification network, together with the fake adversaries $x^{fake}$ described below, and compare the resulting LPIPS score distributions.
We work with the Imagenette dataset [Fas22I], which is a subsample of 10 classes from the Imagenet dataset [Den09I]. For the attacks we consider a pre-trained VGG-16 architecture [Sim15V] and craft adversarial samples with Projected Gradient Descent (PGD) [Kur17A], the Wasserstein attack using the Frank-Wolfe method with a Dual Linear Minimization Oracle (FW + Dual LMO) [Wu20S], and the Improved Image Wasserstein attack (IIW) [Hu20I]. We use multiple tolerance radii $\varepsilon$ for each attack.
To create the fake adversaries we take perturbations at random from all those used for crafting the adversarial samples and add them to the source images:

$$x^{fake}_i = x_i + \delta_j, \qquad \delta_j = x^{adv}_j - x_j,$$

where $j$ is an index chosen at random for each $i$. We keep only those fake adversaries that the target network still classifies correctly, so that the perturbation is of comparable size to an adversarial one but has no adversarial effect.
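As an illustration, a plain $\ell_\infty$ PGD and the fake-adversary construction could look roughly as follows (a sketch only: the actual hyperparameters, input normalization, and filtering of unsuccessful samples are omitted or simplified):

```python
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """l_inf PGD: ascend the cross-entropy loss and project back into the
    eps-ball around x; images are assumed to live in [0, 1]."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0, 1).detach()
    return x_adv

def fake_adversaries(x, x_adv):
    """Add to each source image a perturbation taken from a different,
    randomly chosen image; samples that become misclassified are later
    discarded (filtering not shown)."""
    perm = torch.randperm(x.size(0))
    delta = (x_adv - x)[perm]
    return (x + delta).clamp(0, 1)
```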
The source code is open-sourced at github.com/bezirganyan/adversarial_arena, and a demo for interactively exploring the results is available at bezirganyan-aai-adversarial-arena-demo-main-58mbz2.streamlitapp.com.
4 Results
4.1 Measuring effectiveness of attack transfers
As a baseline, we explore whether samples built against the VGG-16 network transfer to LPIPS with a VGG-16 backend. Taking the LPIPS scores between the source images and their adversarial samples, and between the source images and their fake adversaries, we obtain the two distributions shown in Figure 2.
Figure 2. LPIPS score distributions for the PGD attack, with a VGG-16 LPIPS backend.
As expected, we see that the LPIPS scores on adversarial samples tend to be higher than on fake samples, and we conclude, quite unsurprisingly, that LPIPS with a VGG-16 backend cannot be used for comparing adversarial attacks against VGG-16 networks, as the LPIPS metric itself is affected by the adversarial perturbations. (For the adversarial samples we only take into account successful attacks, and for the fake adversaries only perturbed images that are not misclassified, i.e. successful fake attacks. When either set is small, we have fewer samples and hence wider histogram bins.)
However, what if we use LPIPS with other backend networks? Given the transferability of adversarial attacks across networks, we hypothesize that these metrics will be affected as well. To test this, we run the same experiment using AlexNet [Kri12I] and SqueezeNet [Ian16S] as backends for LPIPS; whether a perturbed image counts as adversarial or as a (correctly classified) fake is determined by these networks.
In Figure 3 we show the LPIPS scores with an AlexNet backbone. The two distributions are much closer, with a Wasserstein distance of 0.005 compared to 0.181 with the VGG backend. In Figure 4, with a SqueezeNet backbone, we observe a similar pattern.
From these plots we hypothesize that our initial assumption was incorrect, and that the PGD perturbations crafted against VGG-16 do not transfer strongly enough to the AlexNet and SqueezeNet backends to noticeably shift the LPIPS scores.
Figure 3. LPIPS score distributions for the PGD attack, with an AlexNet LPIPS backend.
Figure 4. LPIPS score distributions for the PGD attack, with a SqueezeNet LPIPS backend.
4.2 Can we use LPIPS to compare Wasserstein and $\ell_\infty$ attacks?
To make our experiments with the Wasserstein attacks comparable, we use an identical setup to the one described above.
Figure 5. LPIPS score distributions for the FW+Dual LMO attack.
In Figures 5, 6, 7, and 8 we can clearly see that the Wasserstein attacks on VGG-16 transfer successfully to other networks and hence affect the LPIPS score. This indicates that the LPIPS perceptual metric cannot be used for comparing Wasserstein adversarial attacks with other adversarial attack methods.
Figure 6. LPIPS score distributions for the FW+Dual LMO attack.
Figure 7. LPIPS score distributions for the IIW attack.
Figure 8. LPIPS score distributions for the IIW attack.
5 Conclusions
In this article we investigated whether adversarial attacks can be compared using LPIPS. We crafted adversarial samples against the VGG-16 network, as well as non-adversarial samples with similar perturbations that are still classified correctly. Comparing the distributions of LPIPS scores between the source images and the adversarial samples with those between the source images and the perturbed but non-adversarial images, we observed that the two distributions differ significantly. This suggests that although the perturbations were of similar magnitude, the adversarial perturbations change the network activations in a way that fools LPIPS as well, making it unsuitable for comparing adversarial attacks. Therefore, human surveys are still necessary for now.
We also saw that while the effect of the $\ell_\infty$ PGD attack largely disappears when LPIPS uses an AlexNet or SqueezeNet backend, the Wasserstein attacks (FW+Dual LMO and IIW) transfer to those backends as well, so switching the LPIPS backbone does not make the metric usable for comparing them.
Appendix A: Comparing FW+Dual LMO and IIW attacks
As we mentioned earlier, to the best of our knowledge there is no comparison between the FW+Dual LMO [Wu20S] and IIW [Hu20I] attacks, although both are improvements of the original Wasserstein attack [Won20W]. We report in passing a quick comparison of their duration and misclassification rate as a function of the budget $\varepsilon$.
Figure 9. Misclassification rates and time (sec/batch) of the Wasserstein attacks as a function of the budget $\varepsilon$.
In Figure 9 we see that FW+Dual LMO achieves a much higher misclassification rate, but IIW finds adversarial samples much faster for larger budgets, whereas the runtime of FW+Dual LMO varies comparatively little with the budget. On average the IIW attack is almost three times as fast (329 sec/batch) as FW+Dual LMO (902 sec/batch). From Figures 7, 8, 9, and 10 we can say that IIW makes smaller changes to the activations but also achieves a lower misclassification rate, while FW+Dual LMO makes bigger changes and achieves a higher misclassification rate.
Figure 10. LPIPS score distributions for the IIW attack.
Appendix B: Bi-modality in the distance distributions
An intriguing detail in Figures 7
and 8 is that the score distributions
on adversarial images (blue) have two modes: one very close to 0, and another
one beyond the mode of the fake adversaries (orange). To investigate this
behaviour further, we check the score distributions of the IIW attack under
different values of the budget $\varepsilon$.
Figure 11. Fitted
modes of the adversarial (blue) and perturbed (orange) distributions (using a
Gaussian Mixture Model, and a single Gaussian respectively), and
misclassification rate (green) for the IIW attack and VGG as LPIPS backend.
While at some point the misclassification rate plateaus, the mode of the
adversarial distribution keeps increasing.
This indicates that under low values of $\varepsilon$ the attack already fools the network on most of the images it will ever fool, and increasing the budget further mostly enlarges the perturbations, and hence the LPIPS scores, rather than producing new misclassifications.
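For reference, the modes plotted in Figure 11 can be extracted per budget roughly as follows (a sketch assuming 1-D arrays of LPIPS scores; the helper name is ours):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fitted_modes(adv_scores, fake_scores):
    """Means of a 2-component GMM fitted to the adversarial LPIPS scores,
    and the mean of a single Gaussian fitted to the fake-adversary scores."""
    gmm = GaussianMixture(n_components=2, random_state=0)
    gmm.fit(np.asarray(adv_scores).reshape(-1, 1))
    adv_modes = sorted(gmm.means_.ravel().tolist())   # [small mode, large mode]
    fake_mode = float(np.mean(fake_scores))           # MLE of a single Gaussian's mean
    return adv_modes, fake_mode
```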
Another check to substantiate our reasoning behind the two modes is to visually
compare images from each one. In Figures 12
and 13 we see samples from both parts of
the bi-modal adversarial distribution. We can see that the ones whose LPIPS distance from the source images is closer to the smaller mode (i.e. close to 0) are visually almost indistinguishable from their sources, whereas the samples from the larger mode carry clearly visible perturbations.

Figure 12. Adversarial samples and their source images, crafted with the IIW attack.

Figure 13. Adversarial samples and their source images, crafted with the IIW attack.