Publication
SYSTOR 2017
Conference paper

"Memory loss" in commodity hardware? Predicting DIMM failures with machine learning

Abstract

Failures of memory modules have been a concern for a long time, as they are costly both in terms of hardware replacement and service disruption. These failures can be preceded by correctable (soft) and then uncorrectable (hard) errors, which accumulate over time. Valuable large-scale studies of DIMM errors in the wild [2, 1] analyze hard and soft errors in depth, along with their correlations with specific sensors. However, little has been reported on how these findings could be used to automatically predict future DIMM failures. We show that by understanding which factors drive such failures, we can build intelligent predictive models with off-the-shelf machine learning techniques to predict DIMM failures ahead of time with high accuracy. Such models not only provide early signs of failures, but also allow administrators to proactively replace DIMMs at risk weeks in advance, thus avoiding "memory loss" of their commodity hardware.
