In neural architecture search (NAS), training every sampled architecture is very time-consuming and should be avoided. Weight-sharing is a promising solution to speed up the evaluation process. However, training the supernetwork incurs many discrepancies between the actual ranking and the predicted one. Additionally, efficient deep-learning engineering processes require incorporating realistic hardware-performance metrics into the NAS evaluation process, also known as hardware-aware NAS (HW-NAS). In HW-NAS, estimating task-specific performance and hardware efficiency are both required. This paper proposes a supernetwork training methodology that preserves the Pareto ranking between its different subnetworks resulting in more efficient and accurate neural networks for a variety of hardware platforms. The results show a 97% near Pareto front approximation in less than 2 GPU days of search, which provides 2x speed up compared to state-of-the-art methods. We validate our methodology on NAS-Bench-201, DARTS, and ImageNet. Our optimal model achieves 77.2% accuracy (+1.7% compared to baseline) with an inference time of 3.68ms on Edge GPU for ImageNet, which yields a 2.3x speedup. Training implementation can be found at https://github.com/IHIaadj/PRP-NAS.