Abstract
For many automated navigation applications, the underlying localization algorithm must continuously produce both accurate and stable results by using a spectrum of redundant sensing technologies. To this end, various sensors have been used for localization, such as Wi-Fi, Bluetooth, GPS, LiDAR and cameras. In particular, a class of vision-based localization techniques using Structure from Motion (SfM) has been shown to produce very accurate position estimates in the real world under moderate assumptions about the motion of the camera and the amount of visual texture in the environment. However, when these assumptions are violated, SfM techniques can fail catastrophically (i.e., fail to generate any estimate). Recently, a deep convolutional neural network (CNN) has been applied to images to robustly regress 6-DOF camera poses, at the cost of lower accuracy than SfM. In this work, we propose improving the image-based localization accuracy of a deep CNN by combining images with Bluetooth radio-wave signal readings. In our experiments, we show that our proposed dual-stream CNN regresses 6-DOF poses from images and radio-wave signals more accurately than a network using either sensing modality alone. More importantly, we show that when both modalities are used, the proposed deep CNN achieves localization accuracy comparable to that of SfM while being significantly more robust than SfM.