Accelerating Deep Neural Networks (DNNs) on FPGAs requires RTL programming, hardware verification, and precise resource allocation, which is both time-consuming and challenging. To address this issue, we present AccDNN, an end-to-end automation tool that automatically generates high-performance DNN designs for FPGAs. Highlights of the tool include high-quality RTL network-layer IPs, a fine-grained layer-based pipeline architecture, and a column-based cache scheme that together deliver high throughput, low latency, and reduced on-chip memory utilization. AccDNN also includes an automatic design space exploration tool, A-REALM, which generates optimized parallelism schemes by considering external memory access bandwidth, data reuse behavior, resource availability, and network complexity. We demonstrate AccDNN on four DNNs (AlexNet, ZF, VGG16, and YOLO) on two Xilinx FPGAs (ZC706 and KU115), targeting edge and cloud computing, respectively. AccDNN generates designs that deliver 263 GOPS and 36.4 GOPS/W on the ZC706 without any batching, and 2109 GOPS and 94.5 GOPS/W on the KU115.