The growing prominence and computational challenges imposed by Deep Neural Networks (DNNs) has fueled the design of specialized accelerator architectures and associated dataflows to improve their implementation efficiency. Each of these solutions serve as a datapoint on the throughput vs. energy trade-offs for a given DNN and a set of architectural constraints. In this paper, we set out to explore whether it is possible to systematically explore the design space so as to estimate a given DNN's (both inference and training) performance on an shared memory architecture specification using a variety of data-flows. To this end, we have developed a framework, DEEPMATRIX, which given a description of a DNN and a hardware architecture, automatically identifies how the computations of the DNN's layers need to partitioned and mapped on to the architecture such that the overall performance is maximized, while meeting the constraints imposed by the hardware (processing power, memory capacity, bandwidth etc.) We demonstrate DEEPMATRIX's effectiveness for the VGG DNN benchmark, showing the trade-offs and sensitivity of utilization based on different architecture constraints.