Deep neural networks are becoming increasingly popular in Internet of Things (IoT) applications. Their ability to fuse multiple sensor inputs and extract temporal relationships can enhance intelligence in a wide range of applications. However, one key problem is the lack of adaptation to heterogeneous on-device sensors. The low-end sensors on IoT devices differ in accuracy, granularity, and amount of information, and their sensing qualities are heterogeneous and vary over time. Existing deep learning frameworks for IoT applications usually treat every sensor input equally over time or increase model capacity in an ad-hoc manner, lacking the ability to identify and exploit these sensor heterogeneities. In this work, we propose SADeepSense, a deep learning framework that automatically balances the contributions of multiple sensor inputs over time by exploiting their sensing qualities. SADeepSense makes two key contributions. First, SADeepSense employs the self-attention mechanism to learn the correlations among different sensors over time with no additional supervision. The correlations are then used to infer the sensing qualities and to redistribute model concentration across sensors over time. Second, instead of directly learning the sensing qualities and contributions, SADeepSense generates residual concentrations that deviate from the equal contributions, which helps stabilize the training process. We demonstrate the effectiveness of SADeepSense on two representative IoT sensing tasks: heterogeneous human activity recognition with motion sensors and gesture recognition with wireless signals. SADeepSense consistently outperforms state-of-the-art methods by a clear margin. In addition, we show that SADeepSense imposes only a small additional resource-consumption burden on embedded devices compared to the corresponding state-of-the-art framework.
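The two ideas above can be illustrated together: self-attention scores among sensor feature vectors serve as a proxy for sensing quality, and the fusion weights are expressed as the uniform share 1/K plus a residual that vanishes when attention is uniform. The following is a minimal NumPy sketch under our own assumptions (function names, the mean-received-attention quality proxy, and the gate scalar `alpha` standing in for a learnable scale initialized near zero are all illustrative), not the authors' implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def residual_sensor_weights(features, alpha=0.5):
    """Illustrative self-attention fusion weights for K sensors.

    features: (K, d) array, one feature vector per sensor.
    alpha: stand-in for a learnable gate; with alpha = 0 every sensor
           keeps the equal share 1/K, which is the stable starting point.
    """
    K, d = features.shape
    # Scaled dot-product self-attention scores among sensors.
    scores = features @ features.T / np.sqrt(d)      # (K, K)
    attn = softmax(scores, axis=-1)                  # rows sum to 1
    # Average attention each sensor receives: a crude quality proxy.
    received = attn.mean(axis=0)                     # (K,), sums to 1
    # Residual concentration: deviation from the uniform 1/K split.
    residual = received - 1.0 / K                    # sums to 0
    return 1.0 / K + alpha * residual                # still sums to 1

# Fuse three hypothetical sensor feature vectors with the weights.
feats = np.random.default_rng(0).normal(size=(3, 8))
w = residual_sensor_weights(feats)
fused = w @ feats                                    # (8,) fused feature
```

Because the residual sums to zero, the fusion weights always form a valid convex-like combination around the equal split, regardless of the gate value.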