Data imbalance handling approaches for accurate statistical modeling and yield analysis of memory designs
Abstract
Data imbalance can adversely impact the fidelity of a classifier. We build on advances in data imbalance handling techniques for machine learning to propose an enhanced fast statistical analysis methodology. In particular, we employ data imbalance handling techniques within a logistic regression based importance sampling methodology for accurate statistical modeling of rare fail events in memory designs. We demonstrate that, for the purpose of achieving conservative yield estimates, the synthetic minority oversampling technique (SMOTE) outperforms the other data handling methods considered and delivers the best model recall and precision. We report a reduction of more than 70% in the number of false negatives compared to imbalanced data set-based approaches. We also report an average relative error of only 5% in the yield estimate for the balanced data set-based modeling approaches, measured against the pure circuit simulation-based approach, compared with an average relative error of 18% for the imbalanced data set-based approaches. These results were verified on state-of-the-art industrial FinFET SRAM designs.
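The following is a minimal illustrative sketch, not the authors' implementation, of the data balancing step described above: oversampling the minority (fail) class with SMOTE before fitting a logistic regression classifier. It assumes imbalanced-learn and scikit-learn, and uses a hypothetical synthetic feature matrix X (standing in for device/process parameters) and labels y (1 = rare fail event); the importance sampling stage of the methodology is not shown.

```python
# Hedged sketch: SMOTE-balanced logistic regression for a rare fail class.
# X and y are synthetic placeholders, not data from the paper.
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 6))             # placeholder simulated samples
y = (X[:, 0] + X[:, 1] > 3.0).astype(int)  # rare "fail" label (~2% minority)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Oversample the minority (fail) class so the classifier is not biased
# toward predicting "pass", which would inflate false negatives.
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_tr, y_tr)

clf = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)
y_pred = clf.predict(X_te)
print("recall:", recall_score(y_te, y_pred),
      "precision:", precision_score(y_te, y_pred))
```

In this sketch, recall on the fail class is the quantity tied to false negatives; improving it is what drives the conservative yield estimates reported in the abstract.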