Criteria for Learning without Forgetting in Artificial Neural Networks
Abstract
Progressive task learning without catastrophic forgetting in artificial neural networks (ANNs) has demonstrated viability and promise. Owing to the large number of ANN parameters, a model already trained on a group of tasks can learn an additional task without forgetting the previous ones. Several algorithms have been proposed for progressive learning, including synaptic weight consolidation, ensemble methods, rehearsal, and sparse coding. One major shortcoming of such methods is that they fail to detect congestion in the ANN's shared parameter space, which would indicate that the network is saturated and can no longer accommodate new tasks through progressive learning. Detecting such saturation is especially important to avoid catastrophic forgetting of previously trained tasks and the accompanying loss of their generalization quality. In this paper, we address this problem and propose a methodology for ANN congestion detection. The methodology is based on computing the Hessian of the ANN loss function at the optimal weights for a group of previously learned tasks. Since exact Hessian computation is expensive, we provide a set of computationally efficient approximation heuristics. The algorithms are implemented and analyzed on two cloud network security datasets, namely UNSW-NB15 and AWID, as well as the MNIST image recognition dataset. Results show that the proposed congestion metrics give an accurate assessment of ANN progressive learning capacity across these datasets. Furthermore, the results show that models with more features exhibit higher congestion thresholds and are therefore more amenable to progressive learning.
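To make the idea concrete, the following is a minimal sketch (not the authors' implementation) of a Hessian-based congestion signal. It assumes a PyTorch model, a loss function, and a data loader over previously learned tasks, and it approximates the diagonal of the loss Hessian with the empirical Fisher information, a common surrogate for curvature; the `curvature_threshold` parameter is a hypothetical tuning knob, not a value from the paper.

```python
import torch

def congestion_score(model, loss_fn, data_loader, curvature_threshold=1e-3):
    """Fraction of parameters whose approximate curvature exceeds a threshold.

    High curvature at the consolidated optimum means a parameter is already
    "important" to earlier tasks; a large fraction of such parameters
    suggests the shared parameter space is congested.
    """
    fisher_diag = [torch.zeros_like(p) for p in model.parameters()]
    n_batches = 0
    for inputs, targets in data_loader:
        model.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        for buf, p in zip(fisher_diag, model.parameters()):
            if p.grad is not None:
                # Squared gradients approximate the Hessian diagonal
                # (empirical Fisher information).
                buf += p.grad.detach() ** 2
        n_batches += 1
    flat = torch.cat([(buf / max(n_batches, 1)).flatten() for buf in fisher_diag])
    return (flat > curvature_threshold).float().mean().item()
```

Under these assumptions, a score near 1.0 would indicate that most shared weights already carry high curvature for earlier tasks, i.e., the network is close to saturation and further progressive learning is likely to cause forgetting.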