Spatio-temporal anomaly detection by unsupervised learning has applications in a wide range of practical settings. In this paper, we present a surveillance system for industrial robots using a monocular camera. We propose a new unsupervised learning method to train a deep feature extractor from unlabeled images. Without any data augmentation, the algorithm co-learns the network parameters on different pseudo-classes simultaneously to create an unbiased feature representation. Combining the learned features with a prediction system, we can detect irregularities in a high-dimensional data feed (e.g., video of a robot performing a pick-and-place task). The results show that the proposed approach can detect previously unseen anomalies in the robot surveillance video. Although the technique is not designed for classification, we also demonstrate the use of the learned features in a more traditional classification application on the CIFAR-10 dataset.