Churn prediction is very important to the insurance industry. Therefore, there is a big value to investigate how to improve its performance. More importantly, a good model can be used by a common service provider and benefit many companies. State-of-The-Art methods either use 1) shallow models such as logistic regression, with sophisticated feature engineering, or 2) deep models that learn features and classification models simultaneously. In terms of performance, shallow models can memorize better while deep models can generalize better but may under-generalize with insufficient data. Therefore, we propose a combined Deep &, Shallow model (DSM) to take the strengths of both memorization and generalization in one model by jointly training shallow models and deep models. The experiment results show that for insurance churn prediction, joint training can significantly improve the performance and the DSM earns better performance than both shallow-only and deep-only models. In our real-life dataset, the DSM performs better than CNN, LSTM, Stochastic Gradient Descent, Linear Discriminant Analysis, Quadratic Discriminant Analysis, Gaussian Naive Bayes, AdaBoost, Random Forest, and Gradient Tree Boosting. In addition, the DSM can also be applied to other prediction services.