Augmented reality is increasingly explored as a new medium for two-way remote collaboration, in which visual instructions guide participants more effectively and efficiently. As users seek more natural interaction and automation in augmented reality applications, new visual recognition techniques are needed to enhance the user experience. Although simple object recognition is often used in augmented reality toward this goal, most collaboration tasks are too complex for such recognition algorithms to suffice. In this paper, we propose a fine-grained visual recognition approach for mobile augmented reality that leverages RGB video frames, sparse depth feature points identified in real time, and camera pose data to detect various visual states of an object. We demonstrate the value of our approach through a mobile application for hardware support, which automatically detects the state of an object in order to present the right set of information in the right context.
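The abstract does not specify how the RGB, sparse depth, and pose inputs are combined. As a purely hypothetical sketch (the function names, the pinhole projection model, and the depth-histogram feature are our assumptions, not the paper's method), one common way to use camera pose with sparse depth feature points is to transform the 3D points into the camera frame, project them into the image to localize the object region, and summarize their depths as a simple feature for a downstream state classifier:

```python
import numpy as np

def project_points(points_world, pose_w2c, K):
    """Project 3D world points into pixel coordinates.

    points_world: (N, 3) sparse feature points in world coordinates
    pose_w2c:     (4, 4) world-to-camera transform from the AR tracker
    K:            (3, 3) camera intrinsics (pinhole model, an assumption)
    """
    n = points_world.shape[0]
    homog = np.hstack([points_world, np.ones((n, 1))])
    cam = (pose_w2c @ homog.T).T[:, :3]       # points in camera frame
    in_front = cam[:, 2] > 0                  # keep points ahead of the camera
    pix = (K @ cam[in_front].T).T
    return pix[:, :2] / pix[:, 2:3]           # perspective divide -> (u, v)

def depth_histogram_feature(points_world, pose_w2c, bins=8, max_depth=2.0):
    """Summarize sparse depth points as a normalized depth histogram,
    a hand-crafted stand-in feature for per-object state classification."""
    n = points_world.shape[0]
    homog = np.hstack([points_world, np.ones((n, 1))])
    z = (pose_w2c @ homog.T).T[:, 2]          # depths in the camera frame
    hist, _ = np.histogram(z[z > 0], bins=bins, range=(0.0, max_depth))
    total = hist.sum()
    return hist / total if total else hist.astype(float)

# Synthetic example: one point 1 m in front of an identity-pose camera.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
pose = np.eye(4)
pts = np.array([[0.0, 0.0, 1.0]])
print(project_points(pts, pose, K))           # projects to the principal point
print(depth_histogram_feature(pts, pose))     # all mass in the 1 m depth bin
```

Such a fixed-length feature could then be concatenated with CNN features from the RGB frame and fed to a lightweight classifier over the object's visual states; the actual fusion architecture is described in the body of the paper.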