Computer vision-based localization systems have made great progress; however, many of them work well only in visually feature-rich environments. Feature-based matching techniques often struggle with scenes that contain few features or a large number of repetitive features, and in these situations vision-based localization may fail to estimate the camera pose or may yield large localization errors. We approach this problem from a systems perspective, where our goal is accurate localization of blind travellers through a smartphone app. In particular, we assume that the environment is already instrumented with Bluetooth Low Energy (BLE) beacons that provide rough proximity information, and we propose to integrate this signal with visual information to perform efficient structure-from-motion and camera localization. Compared to traditional baseline approaches, our multi-modal sensing approach accelerates localization and achieves higher accuracy in challenging environments. We also show that it shortens the time needed to reconstruct large 3D models. Our framework is released as an open source project; it works across different mobile operating systems, enabling the development of navigation applications on mobile platforms.
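One way the BLE proximity signal can help, as the abstract suggests, is by pruning the set of database images considered for visual matching before the expensive feature-matching step of structure-from-motion localization. The sketch below illustrates this idea only; the class `DbImage`, the field `beacon_rssi`, and the functions `rssi_similarity` and `prune_candidates` are hypothetical names introduced here, not part of the released framework, and the similarity measure (mean absolute RSSI difference over shared beacons) is an assumed, simplified fingerprint comparison.

```python
from dataclasses import dataclass, field


@dataclass
class DbImage:
    """Hypothetical database image with a BLE scan recorded at capture time."""
    name: str
    beacon_rssi: dict = field(default_factory=dict)  # beacon id -> RSSI in dBm


def rssi_similarity(a: dict, b: dict) -> float:
    """Negative mean absolute RSSI difference over beacons seen in both scans.

    Higher is better; scans with no shared beacons get -inf (assumed far apart).
    """
    shared = set(a) & set(b)
    if not shared:
        return float("-inf")
    return -sum(abs(a[k] - b[k]) for k in shared) / len(shared)


def prune_candidates(query_rssi: dict, db_images: list, top_k: int = 3) -> list:
    """Keep only the top_k database images whose BLE fingerprints best match
    the query scan, so visual feature matching runs on a small, spatially
    plausible subset instead of the whole model."""
    ranked = sorted(
        db_images,
        key=lambda img: rssi_similarity(query_rssi, img.beacon_rssi),
        reverse=True,
    )
    return ranked[:top_k]
```

Under this scheme, a query in a corridor with repetitive visual texture is matched only against images whose beacon fingerprints place them nearby, which is one plausible mechanism for both the speedup and the accuracy gain in feature-poor or repetitive scenes.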