Depression affects more than 300 million people worldwide and is the leading cause of disability in the United States for individuals aged 15 to 44. According to a WHO report, its burden is comparable to that of common diseases such as cancer, diabetes, and heart disease. However, people with depressive symptoms sometimes do not receive proper treatment because of barriers to access. In this paper, we propose a method that automatically detects depression using only facial landmarks, which are easy to collect and expose less private information. We handle coarse-grained labels, i.e., a single final label for an entire long video clip, which is the common case in applications, by integrating feature manipulation with multiple instance learning. We compare the effectiveness of our method with that of other vision-based methods, and our method even outperforms multimodal methods.