Adders are the most fundamental arithmetic units, and often on the timing critical paths of microprocessors. Among various adder configurations, parallel prefix adders provide the best performance vs. power/area trade-off, especially for higher bit-widths. With aggressive technology scaling, the performance of a parallel prefix adder, in addition to the dependence on the logic-level, is determined by wire-length and congestion which can be mitigated by adjusting fan-out. This paper proposes a polynomial-time algorithm to synthesize n bit parallel prefix adders targeting the minimization of the size of the prefix graph with log2 n logic level and any arbitrary fan-out restriction. A structure aware prefix node cloning is then applied to the resultant prefix adder solutions to further optimize the size of the prefix graphs. The design space exploration by our approach provides a set of pareto-optimal solutions for delay vs. power trade-off, and these pareto-optimal solutions can be used in high-performance designs instead of picking from a fixed library (Kogge-Stone, Sklansky, etc.). Experimental results demonstrate that our approach: 1) excels highly competitive industry standard Synopsys design compiler adder, regular adders such as Sklansky adder and Kogge-Stone adder, and a highly run-time/memory intensive recent algorithm in 32 nm technology node and 2) improves performance/area over even 64 bit custom designed adders targeting 22 nm technology library and implemented in an industrial high-performance design.