Publication
SOLI 2013
Conference paper

Survival analysis for HDLSS data with time dependent variables: Lessons from predictive maintenance at a mining service provider

Abstract

In gene expression analysis, the goal is often to predict survival given a high-dimensional space of covariates. The corresponding literature describes models that deal with low sample size, which is a typical feature of such studies. Low sample size is also typical in asset management services, where downtime of assets is very costly and replacements are therefore scheduled long before the actual risk of failure increases. Although good surrogates of the true failure probability are sometimes available, in practice there are often a number of weak predictors that need to be filtered from a large set of candidates. Although the challenge is similar to gene expression analysis, a crucial difference is that covariates in condition monitoring are dynamic whereas genes are not. As a result, any data between failures can be omitted in gene expression analysis, whereas the same omission leads to a potentially high bias in variable selection for condition monitoring. The authors are not aware of any survival models that deal with high-dimensional, low-sample-size (HDLSS) data in the case of time-dependent covariates. In this paper we evaluate the performance of different modeling techniques for HDLSS survival data, including the definition of a discrete-time model in which survival is modeled as a locally independent, binary outcome variable. We thereby study the trade-off between omitting measurements between times of failure and disregarding temporal dependencies. The analysis is based on a real-life case study in which 39 components of 50 mining haul trucks were monitored in operation over almost 6 years. © 2013 IEEE.
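To illustrate the discrete-time idea mentioned in the abstract, the following is a minimal sketch of how component lifetimes with time-dependent readings might be expanded into one row per observation interval, each carrying a binary failure outcome. It is not the authors' implementation: the record layout (`id`, `failed`, `covariates`), the function names, and the toy data are all hypothetical, and treating the rows as independent is exactly the simplification the abstract describes as disregarding temporal dependencies.

```python
# Sketch only: hypothetical record layout and names, not taken from the paper.
# Each lifetime lists the covariate readings observed in consecutive intervals.

def expand_to_person_period(lifetimes):
    """Turn each lifetime into one row per interval with a binary outcome.

    The outcome is 1 only in the last interval of a lifetime that ended in
    failure; censored lifetimes (e.g. preventive replacement) contribute
    only zeros.
    """
    rows = []
    for life in lifetimes:
        readings = life["covariates"]
        last = len(readings) - 1
        for t, x in enumerate(readings):
            y = 1 if (t == last and life["failed"]) else 0
            rows.append((life["id"], t, list(x), y))
    return rows


def empirical_hazard(rows):
    """Share of at-risk rows that fail in each interval (no covariates used)."""
    at_risk, events = {}, {}
    for _, t, _, y in rows:
        at_risk[t] = at_risk.get(t, 0) + 1
        events[t] = events.get(t, 0) + y
    return {t: events[t] / at_risk[t] for t in sorted(at_risk)}


# Toy data: component A fails in its third interval, B is replaced early.
lifetimes = [
    {"id": "A", "failed": True, "covariates": [[0.1], [0.4], [0.9]]},
    {"id": "B", "failed": False, "covariates": [[0.2], [0.3]]},
]
rows = expand_to_person_period(lifetimes)
hazard = empirical_hazard(rows)
```

In such a layout, the expanded rows could be passed to any binary classifier with variable selection, which keeps the measurements between failures in the model at the cost of ignoring their temporal ordering.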
