Adders are the most fundamental arithmetic units and often lie on the timing-critical paths of microprocessors. Among the various adder architectures, parallel prefix structures provide the highest-performance adders at larger bit-widths. With aggressive technology scaling, the performance of a parallel prefix adder depends not only on its number of logic levels but also on wire length and congestion, which can be mitigated by adjusting fan-out. This paper proposes a polynomial-time algorithm that synthesizes n-bit parallel prefix adders by minimizing the size of the prefix graph, subject to log2 n logic levels and an arbitrary fan-out restriction. The design space exploration performed by our algorithm yields a set of Pareto-optimal solutions for the delay vs. power trade-off, and these solutions can be used in high-performance designs instead of picking from a fixed library of structures (Kogge-Stone, Sklansky, etc.). Experimental results demonstrate that our approach (i) outperforms the highly competitive industry-standard Synopsys Design Compiler adder (128-bit) in performance (by 2%), area (by 25%), and power (by 13.3%) at the 32nm technology node, and (ii) improves performance/area over even 64-bit custom-designed adders targeting a 22nm technology library and implemented in an industrial high-performance design.
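To make the size vs. fan-out trade-off concrete, the sketch below builds the two classic log2 n-level prefix graphs named above, Kogge-Stone and Sklansky, and checks that both compute correct sums using the standard (generate, propagate) prefix operator. This is a hypothetical illustration, not the synthesis algorithm proposed in this paper. The graph encoding, the helper names (`kogge_stone`, `sklansky`, `size`, `max_fanout`, `add`), and the simplified fan-out metric (lateral consumers of one column within a level, ignoring the vertical wire) are all assumptions made for illustration.

```python
"""Toy comparison of two classic log2(n)-level prefix graphs.

Hypothetical illustration only: NOT the synthesis algorithm from the paper.
It shows the size vs. fan-out trade-off at two well-known extremes.
"""
from typing import Dict, List, Tuple

# A prefix graph is a list of levels; each level is a list of (i, j) pairs
# meaning "column i combines its current value with column j's value".
Level = List[Tuple[int, int]]


def kogge_stone(n: int) -> List[Level]:
    """Lateral fan-out of 1 per level, but many prefix nodes."""
    levels, span = [], 1
    while span < n:
        levels.append([(i, i - span) for i in range(span, n)])
        span *= 2
    return levels


def sklansky(n: int) -> List[Level]:
    """Few prefix nodes, but lateral fan-out grows to n/2."""
    levels, half = [], 1
    while half < n:
        level = []
        for start in range(0, n, 2 * half):
            pivot = start + half - 1  # last column of the lower half-block
            for i in range(start + half, min(start + 2 * half, n)):
                level.append((i, pivot))
        levels.append(level)
        half *= 2
    return levels


def size(graph: List[Level]) -> int:
    """Number of prefix nodes in the graph."""
    return sum(len(level) for level in graph)


def max_fanout(graph: List[Level]) -> int:
    """Largest number of nodes in a single level that read the same column j."""
    best = 0
    for level in graph:
        counts: Dict[int, int] = {}
        for _, j in level:
            counts[j] = counts.get(j, 0) + 1
        best = max(best, max(counts.values(), default=0))
    return best


def add(a: int, b: int, n: int, graph: List[Level]) -> int:
    """n-bit addition (carry-in 0) evaluated through the prefix graph."""
    g = [((a >> k) & 1) & ((b >> k) & 1) for k in range(n)]  # bit generate
    p = [((a >> k) & 1) ^ ((b >> k) & 1) for k in range(n)]  # bit propagate
    G, P = g[:], p[:]
    for level in graph:
        # Compute every node from the previous level's values, then write back,
        # so all nodes in a level behave as if evaluated in parallel.
        new = {i: (G[i] | (P[i] & G[j]), P[i] & P[j]) for i, j in level}
        for i, (gi, pi) in new.items():
            G[i], P[i] = gi, pi
    # After the last level, G[k] is the group generate of bits [0, k],
    # i.e. the carry into bit k + 1.
    s = 0
    for k in range(n):
        carry = G[k - 1] if k > 0 else 0
        s |= (p[k] ^ carry) << k
    return s  # sum modulo 2**n (carry-out dropped)


if __name__ == "__main__":
    for name, build in (("Kogge-Stone", kogge_stone), ("Sklansky", sklansky)):
        graph = build(32)
        print(name, "levels:", len(graph), "size:", size(graph),
              "max fan-out:", max_fanout(graph))
```

For n = 32 both graphs have 5 levels, but Kogge-Stone uses 129 prefix nodes with a lateral fan-out of 1, while Sklansky uses 80 nodes with a fan-out of 16. The synthesis approach described in the abstract targets minimum-size graphs at a chosen fan-out bound, which places its solutions between such fixed endpoints.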