Adaptive Verifiable Training Using Pairwise Class Similarity
Verifiable training techniques have shown success in training neural networks that are provably robust to a certain amount of noise. However, these techniques, which attempt to enforce a single robustness constraint, have scaled poorly with dataset complexity. Classes that are similar (i.e., close in the feature space) increase the difficulty of learning a robust model, reflected by the reduction of the model's clean performance. For example, on CIFAR10, a non-robust LeNet model has a 21.63% error rate, while a model trained using CROWN-IBP, a state-of-the-art verifiable training technique, with a robustness region of 8/255 increases the error rate to 57.10%. Upon closer examination, we note that when labeling visually similar classes, the model's error rate is as high as 72%. Thus, while it may be desirable to train a model to be robust for a large robustness region, pairwise class similarities limit the potential gains. Furthermore, consideration must be made regarding the relative cost of mistaking one class for another. In security- or safety-critical tasks, similar classes are likely to belong to the same group, and thus are equally sensitive. In this work, we propose a new approach that accounts for inter-class similarity and enables verifiable training to create a robust model with respect to multiple adversarial constraints. First, we cluster similar classes using agglomerative clustering, used in prior work to provide explainability regarding a neural network's decisions. Next, we propose two training methods: (1) the Inter-Group Robustness Prioritization method, which optimizes a customized loss term and creates a single model with multiple robustness guarantees; and (2) the neural decision tree method, which trains multiple classifiers with different robustness guarantees and combines them in a decision tree architecture.
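As a rough illustration of the first step (not the paper's implementation), similar classes could be grouped by applying scikit-learn's AgglomerativeClustering to per-class feature centroids; the centroids and the choice of three clusters below are invented for the example:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Hypothetical per-class "centroids" in a 2-D feature space: two visually
# similar pairs (e.g. cat/dog, automobile/truck) and one isolated class.
centroids = np.array([
    [0.0, 0.0],   # class 0
    [0.1, 0.1],   # class 1 (similar to class 0)
    [5.0, 5.0],   # class 2
    [5.1, 4.9],   # class 3 (similar to class 2)
    [10.0, 0.0],  # class 4 (dissimilar to all others)
])

# Agglomerative clustering merges the closest classes first, so each of
# the two similar pairs lands in its own cluster; classes in the same
# cluster would then be treated as equally sensitive.
clustering = AgglomerativeClustering(n_clusters=3).fit(centroids)
print(clustering.labels_)  # e.g. classes 0/1 share a label, as do 2/3
```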
Our experiments on Fashion-MNIST and CIFAR10 demonstrate that our approach, which prioritizes robustness for dissimilar groups, improves clean performance by up to 10.93% and 28.13%, respectively. Furthermore, on CIFAR100, our approach reduces the clean error rate by 26.32%.