Vertical federated learning (VFL) allows clients that each hold a partial set of features to jointly train a shared model. In this paper, we identify two challenges in VFL: (1) some works directly average the learned feature embeddings and may therefore lose the unique properties of each local feature set; (2) the server must exchange gradients with the clients at every training step, incurring high communication cost. To address these challenges, we propose an efficient VFL with multiple heads (VIM) framework, in which each head corresponds to one local client so that the separate contribution of each client is taken into account. In addition, we propose an Alternating Direction Method of Multipliers (ADMM)-based method to solve our optimization problem, which reduces the communication cost by allowing multiple local updates in each step. We show that VIM achieves significantly higher accuracy and faster convergence than state-of-the-art methods on four datasets, and that the weights of the learned heads reflect the importance of local clients.
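To make the contrast in challenge (1) concrete, the following is a minimal sketch and not the paper's implementation. It assumes linear heads and uses NumPy. The function names (`averaged_logits`, `multi_head_logits`, `head_importance`) are hypothetical, and using the weight norm as an importance proxy is an assumption made only for illustration.

```python
import numpy as np


def averaged_logits(embeddings, shared_head):
    """Baseline: average the client embeddings, then apply one shared head."""
    mean_embedding = np.mean(embeddings, axis=0)  # shape (batch, dim)
    return mean_embedding @ shared_head           # shape (batch, classes)


def multi_head_logits(embeddings, heads):
    """Multi-head: each client's embedding goes through its own head;
    the per-client outputs are summed into the final logits."""
    return sum(h @ w for h, w in zip(embeddings, heads))


def head_importance(heads):
    """Assumed importance proxy: Frobenius norm of each head's weights."""
    return [float(np.linalg.norm(w)) for w in heads]


# Toy data: 3 clients, each producing a (batch, dim) embedding.
rng = np.random.default_rng(0)
num_clients, batch, dim, classes = 3, 4, 5, 2
embeddings = [rng.normal(size=(batch, dim)) for _ in range(num_clients)]
heads = [rng.normal(size=(dim, classes)) for _ in range(num_clients)]
shared_head = rng.normal(size=(dim, classes))

avg_out = averaged_logits(embeddings, shared_head)
vim_out = multi_head_logits(embeddings, heads)
```

In this sketch, averaging is the special case of the multi-head form in which every head equals `shared_head / num_clients`. Separate heads remove that constraint, so each client's features can be weighted differently.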