Big-data-powered Deep Learning (DL) and its applications have blossomed in recent years, fueled by three technological trends: large volumes of openly accessible data, a growing number of DL frameworks, and a selection of affordable hardware devices. To date, however, no single DL framework dominates, making the selection of a DL framework overwhelming. This paper takes a holistic approach to the empirical comparison and analysis of four representative DL frameworks, with three unique contributions. First, we show that for a given DL framework, different configurations of its hyper-parameters can have a significant impact on performance. Second, this study is the first to identify opportunities for improving the runtime performance and accuracy of DL frameworks by configuring computing libraries and by tuning individual and multiple hyper-parameters. Third, we conduct a comparative measurement study of resource consumption patterns and their performance implications, including CPU and memory usage and their correlations with hyper-parameters. We argue that this study provides an in-depth empirical comparison and analysis of DL frameworks, offering practical guidance both for service providers deploying and delivering Deep Learning as a Service (DLaaS) and for application developers and DLaaS consumers selecting the right DL frameworks for the right DL workloads.