Machine learning model for predicting asset failure

White paper

This paper examines the benefits of using advanced machine learning models in predictive
maintenance software for asset-intensive industries. It discusses how the latest predictive
maintenance software solutions go beyond condition-based maintenance (CBM) models,
in which replacement decisions rely on average, engineering-determined condition
thresholds applied to every asset in the same class.

Contents

Introduction
Condition-based maintenance
State transitions
Machines vs. humans: Does experience matter?
Prediction in two temporal dimensions
Seeing over the hill
Risk tolerance and optimizing the prediction
Conclusion
About Nokia Software IoT

Introduction
Asset-intensive industries are intrigued by what predictive maintenance software offers to help them
determine when to repair or replace their equipment. Until recently, the options were to run something until
it broke or replace it on a schedule, as part of planned maintenance. The former means maximizing the useful
life of the asset, but accepting outages, operating downtime, and operating inefficiencies from unexpected
and emergency-level repairs. The latter means increased capital expenditures and operating costs.

Condition-based maintenance
Today, many solutions labeled “predictive maintenance” purport to predict when an asset will fail and
when to recommend repair or replacement. Most of these solutions use condition-based maintenance
(CBM) models that base their predictions for a given asset on how its condition compares to the statistical
models for that class of asset. Some, including applications from Nokia, use machine learning models to
analyze the behavior and condition of each individual unit and predict whether and how it will degrade in
the future, based on the model’s understanding of how that type of asset has degraded in the past. This
distinction may seem subtle, but it has significant consequences, which are explored in this white paper.
Specifically, the paper explains how the unsupervised machine learning engine provided by Nokia predicts
the probability of asset failure and overcomes the limitations of most CBM models.

State transitions
CBM is based on monitoring a set of conditions for which acceptable parameters or patterns have been
set, and alerting operators when conditions exceed those parameters. This assumes that the conditions
that predict failure are well-understood and that the parameters or patterns that indicate imminent
failure are well-established. To the extent some CBM systems use machine learning, they may use a form
of supervised learning. In this case, the possible failure outcomes are considered to be known and the
model is trained to monitor for the specific conditions or parameters determined to predict failure. The
conditions are based on experience and the parameters are based on statistical averages.
Unfortunately, this is insufficient for true predictive modeling. If all the conditions that lead to failure
were well known and followed neat paths to failure, engineers would have designed these flaws out of the
asset long ago. Further, predictions based on supervised learning apply only to a specific asset model. A
different asset from a different vendor or even a different model from the same vendor would have unique
combinations of conditions and parameters to monitor.
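
For contrast with what follows, the CBM-style check described above can be sketched in a few lines. This is only an illustration: the condition names, the limits, and the `cbm_alerts` helper are hypothetical and do not come from any particular CBM product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Acceptable range for one monitored condition (hypothetical values)."""
    low: float
    high: float


# Hypothetical class-wide limits, shared by every asset in the class.
CLASS_LIMITS = {
    "temperature_c": Threshold(low=-10.0, high=85.0),
    "vibration_mm_s": Threshold(low=0.0, high=7.1),
}


def cbm_alerts(reading: dict[str, float], limits: dict[str, Threshold]) -> list[str]:
    """Return the names of monitored conditions that fall outside their limits."""
    alerts = []
    for name, limit in limits.items():
        value = reading.get(name)
        if value is None:
            continue  # this condition was not reported in the reading
        if value < limit.low or value > limit.high:
            alerts.append(name)
    return alerts


# "humidity_pct" is reported but has no threshold, so it can never raise an alert.
reading = {"temperature_c": 91.5, "vibration_mm_s": 3.2, "humidity_pct": 40.0}
print(cbm_alerts(reading, CLASS_LIMITS))
```

The sketch makes the limitation visible: only conditions that someone has already named and bounded can trigger an alert, and the same limits apply to every unit regardless of its individual history.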
Unsupervised machine learning takes, in effect, the opposite approach. Programmers can guide the model
based on domain expertise to get it started looking for machine states that lead to failure, but the model
must learn on its own, from the data, what patterns emerge that characterize the machine’s state and
ultimately lead to a failure state. Figure 1 is a visual representation of the state transition model created
by our machine learning engine for a specific type of asset. It has discovered fourteen different states,
labeled A through N, that are relevant to the probability of that asset type failing. It does this by analyzing
data from the asset type, including assets that have failed.
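
As a rough sketch of how transition probabilities between discovered states might be estimated, the snippet below simply counts observed transitions in per-asset state histories. The histories, the letter labels, and the `estimate_transitions` helper are illustrative assumptions; the paper does not describe how the Nokia engine actually builds its model.

```python
from collections import Counter, defaultdict


def estimate_transitions(sequences):
    """Estimate P(next state | current state) by counting observed transitions."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for state, c in counts.items()
    }


# Hypothetical state histories for three assets; "N" marks the failure state.
histories = ["AABBN", "AACCAB", "ABN"]
probs = estimate_transitions(histories)
for state in sorted(probs):
    print(state, probs[state])
```

Including histories of assets that have failed matters here: without them, no transition into the failure state would ever be counted.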

Figure 1. Visual representation of a state transition model
[Diagram: states A through N connected by transition lines; state N is the failure state.]

Unsupervised learning
Unsupervised learning is a type of machine learning algorithm used to draw inferences from data sets
consisting of input data without labeled responses. The most common unsupervised learning method is
cluster analysis, which is used for exploratory data analysis to find hidden patterns or grouping in data.1
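
To make the idea of cluster analysis concrete, here is a minimal k-means sketch that groups unlabeled sensor readings into candidate states. The readings, the starting centers, and the `kmeans` helper are hypothetical, and k-means is used only as a familiar example of clustering, not as the method behind the engine described in this paper.

```python
import math


def kmeans(points, centers, iterations=20):
    """Plain k-means: assign each point to its nearest center, then recompute centers."""
    centers = [tuple(c) for c in centers]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the nearest center for every point.
        labels = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centers.append(centers[k])  # leave an empty cluster where it was
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return labels, centers


# Hypothetical (temperature, vibration) readings with no failure labels attached.
readings = [(40.0, 1.0), (41.0, 1.2), (39.5, 0.9), (70.0, 4.0), (72.0, 4.5), (71.0, 4.2)]
labels, centers = kmeans(readings, centers=[readings[0], readings[3]])
print(labels)
```

No reading is tagged as "healthy" or "degraded"; the grouping emerges from the data alone, which is the property the definition above describes.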

The more types of data the model can analyze, the more likely it is to learn the states that
matter along the degradation path, and the better the model will be. Even with only a few relevant
data streams, however, it can create a model. Historical data is useful, because it lets the model
observe and learn from failures without waiting for assets to fail in the field, but it is not necessary.
The model continues to learn as it continues to receive current data.

Machines vs. humans: Does experience matter?


These discovered machine states may or may not be understandable by even the most experienced
operator or engineer. In any case, some of them will very rarely be observed, and we gain the most by
predicting rare events. It is here where there is one clear point of difference with CBM: if we are only
monitoring conditions that engineers and operators know about and understand, we are likely not
monitoring all the conditions that matter.
Machine learning discerns from the data the significant transitions on the way to failure and calculates the
probabilities of the asset transitioning from one state to another along a path that leads to failure. These
states may or may not be observable by humans. Typically, the states are hidden, and some of them are
counterintuitive to an engineer and revealed only by the data.

1 MathWorks

Looking at Figure 1 again, the lines represent a path of transition from one state to another state. The
thickness of the line represents the probability of that transition occurring. You can see that the path
to failure usually does not follow a simple linear progression. An asset might fail from any of a variety of
prior states. Nor does an asset progress inexorably toward failure; it may transition from one state to
another and then return to a prior state. The asset in Figure 1 is as likely to fail from state B as it is from
state M. Nokia’s approach to machine learning calculates a specific asset’s probability of failure because
it understands these probabilities, having modeled this asset type using all the data at its disposal.
Ultimately, the failure probability is based on the model’s observation of the specific asset and its actual
state transitions.
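
A simplified way to picture this calculation is to treat the state model as a Markov chain in which failure is an absorbing state, and to push the asset's current state forward step by step. The three-state chain, its probabilities, and the `failure_probability` helper below are assumptions made for illustration; the paper does not specify the actual computation.

```python
def failure_probability(transitions, start, horizon, failure="N"):
    """Probability of reaching the absorbing failure state within `horizon` steps."""
    dist = {start: 1.0}
    for _ in range(horizon):
        nxt = {}
        for state, p in dist.items():
            if state == failure:
                nxt[failure] = nxt.get(failure, 0.0) + p  # once failed, stay failed
                continue
            for target, q in transitions[state].items():
                nxt[target] = nxt.get(target, 0.0) + p * q
        dist = nxt
    return dist.get(failure, 0.0)


# Hypothetical transition probabilities. B can fall back to A, so the path
# to failure is not a one-way progression.
TRANSITIONS = {
    "A": {"A": 0.9, "B": 0.1},
    "B": {"A": 0.2, "B": 0.7, "N": 0.1},
}

for state in ("A", "B"):
    print(state, round(failure_probability(TRANSITIONS, state, horizon=60), 3))
```

Because the calculation starts from the state this particular asset is in, two assets of the same type can receive very different failure probabilities for the same horizon.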

Prediction in two temporal dimensions


Another critical advantage over CBM that emerges from this model is the ability to not just predict the
current probability of failure, but to understand what that prediction would be at a certain time in the
future. CBM can say “based on historical averages, assets in this condition fail, on average, within sixty
days.” Our machine learning model, however, can say: “Based on the state transitions we’ve seen this
particular asset go through, we can predict the probability of it taking a path to failure within the next sixty
days. At the same time, though, we can also tell you that, since we know the probable path it will take over
time, if we make the same prediction three months from now, the probability of it failing within sixty days
from then will be different.”
And that, as they say, makes all the difference.
Let’s look at a three-dimensional model of the predictions shown in Figure 2. The height of the curve is the
probability of failure. There are two time axes: the number of days ahead that the prediction covers, and the
number of days from now at which the prediction is made. Therefore, the 0,0 point is “now.” This asset right now does
not indicate a probability of failing in the next 60 days. If we were to make that prediction six weeks from
now, we would predict that the probability of failure going forward 60 days is much higher. If we continue
pushing forward in time, we can see that the probability of failure comes back down. Looking further into
the future, the probability once again spikes. Clearly this asset has a problem and, at some point, will likely
fail. The trick is knowing the best time to repair or replace it.
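
The two time axes can be sketched with a toy model: first advance the asset's expected state to the day on which the prediction is made, assuming it has survived until then, and then ask how likely failure is within the horizon that follows. Everything here is an assumption for illustration, including the daily time step, the three-state chain, and the `future_prediction` helper. A chain this simple will not reproduce the rise, fall, and second spike described above.

```python
def step(dist, transitions, failure):
    """Advance a state distribution by one day; the failure state is absorbing."""
    nxt = {}
    for state, p in dist.items():
        targets = {failure: 1.0} if state == failure else transitions[state]
        for target, q in targets.items():
            nxt[target] = nxt.get(target, 0.0) + p * q
    return nxt


def future_prediction(transitions, start, prediction_day, horizon, failure="N"):
    """P(failure within `horizon` days of `prediction_day`), given survival until then."""
    dist = {start: 1.0}
    for _ in range(prediction_day):
        dist = step(dist, transitions, failure)
    # Condition on the asset still being in service on the prediction day.
    survivors = {s: p for s, p in dist.items() if s != failure}
    alive = sum(survivors.values())
    if alive == 0.0:
        raise ValueError("the asset has certainly failed before the prediction day")
    dist = {s: p / alive for s, p in survivors.items()}
    for _ in range(horizon):
        dist = step(dist, transitions, failure)
    return dist.get(failure, 0.0)


# Hypothetical transition probabilities for a three-state asset.
TRANSITIONS = {
    "A": {"A": 0.9, "B": 0.1},
    "B": {"A": 0.2, "B": 0.7, "N": 0.1},
}

surface = {
    (day, horizon): future_prediction(TRANSITIONS, "A", day, horizon)
    for day in (0, 42, 90)
    for horizon in (30, 60)
}
for (day, horizon), p in sorted(surface.items()):
    print(f"prediction day {day:3d}, horizon {horizon:2d} days: {p:.3f}")
```

Each entry of `surface` corresponds to one point on a plot like Figure 2: the same 60-day question gets a different answer depending on the day it is asked.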
Let’s look at this in a simpler way.

Figure 2. Probability of failure changes both forward in time and by day of prediction

[Three-dimensional surface plot: probability of failure (0% to 70%) against prediction day (0 to 231) and a second time axis (0 to 50).]

Seeing over the hill
CBM and less sophisticated models are parameterized and lose statistical confidence as they try
to predict forward in time, because they look at current conditions rather than at a degradation path.
They can only predict, based on averages, the likelihood of failure under those conditions. The further
they extrapolate the asset’s future condition, the less confidence the failure prediction carries.
As a result, they are likely to assume an asset is failing and recommend replacement, unnecessarily
lowering the asset lifespan and increasing the cost of repair or replacement when parts have to be
rush ordered or crew schedules disrupted.

Figure 3. Better failure time predictions drive extended asset life and optimize maintenance

[Chart: probability of failure over time across three phases (detect anomalies, predict failure, optimize operations). Events A and B and the failure point are marked, along with an optimization zone and the resulting extended asset life.]

CBM with prediction: assumes failure at A, resulting in lower asset life and increased repair/replacement costs.

Machine learning:
• Predict forward in time without loss of confidence
• “See over the hill” to extend asset life
• Produce better optimization for lower costs and increased productivity

The machine learning analytics modeled in Figure 2 predicts the probability of asset failure at multiple
times in the future without loss of confidence. As shown in Figure 3, it “sees over the hill” and predicts
the probability of surviving event A but not event B. Reinforcement learning algorithms then have a time
window to work with, so the asset owners can optimize purchasing, spare parts location, crew schedules,
down times, and other maintenance activities. The result is extended asset life, increased return on
invested capital, and lower maintenance and operating costs.

Reinforcement learning
Reinforcement learning involves learning what to do — how to map situations to actions — to maximize a
numerical reward signal. The three most important distinguishing features of reinforcement learning are:
1) being closed-loop in an essential way; 2) not having direct instructions of what actions to take; and
3) playing out the consequences of actions, including reward signals, over extended periods of time.2
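
As a toy illustration of those three features, the sketch below uses tabular Q-learning to learn when to replace an asset. The two health states, the rewards, the failure odds, and all parameter values are hypothetical. The example covers only a run-versus-replace decision, not the purchasing, spare parts, and crew scheduling problems mentioned above.

```python
import random

STATES = ("good", "worn")
ACTIONS = ("run", "replace")


def simulate(state, action, rng):
    """Hypothetical environment: returns (reward, next_state) for one time step."""
    if action == "replace":
        return -2.0, "good"  # planned replacement has a modest cost
    if state == "good":
        return 1.0, ("worn" if rng.random() < 0.3 else "good")
    # Running a worn asset: it may fail, forcing an expensive emergency swap.
    if rng.random() < 0.5:
        return -10.0, "good"
    return 1.0, "worn"


def q_learning(steps=50_000, alpha=0.02, gamma=0.9, epsilon=0.1, seed=0):
    """Learn action values from experience only; no one tells the agent what to do."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    state = "good"
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)  # explore
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
        reward, next_state = simulate(state, action, rng)
        # The update folds the discounted future value into today's estimate.
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}


policy = q_learning()
print(policy)
```

The loop is closed because each action changes the state the agent sees next, and the discount factor `gamma` is what lets a cost paid now be weighed against consequences that play out later.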

2 Reinforcement Learning: An Introduction (Second Edition, in progress, 2016), Rich Sutton and Andrew Barto, MIT Press.

Risk tolerance and optimizing the prediction
The graph in Figure 4 shows actual failure prediction curves for disk drives in a data center. The graph
compares actual condition-based predictive models from the manufacturers with an average of the
predictions from our machine learning model (the blue line). This graph shows averages for a set of assets,
to create an apples-to-apples comparison with these other models. In practice, our machine learning
model predicts the probability of failure for each asset individually.
In an operating environment, risk tolerance is an important constraint. A predictive maintenance plan
could minimize risk trivially: it could simply predict that every asset will fail one week after it is put into
service, and assets replaced on that schedule would never result in a part failure. However, that’s no way to run
a business, since asset utilization would be very low. A model is needed with a better fit, one that allows the operation
of the asset as long as possible before the risk of failure is actually too high. Risk might be thought of as
a function of the cost to the business of the asset failing balanced against other costs, such as the cost
of capital used to deploy that asset.
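
One hedged way to express this balance is to choose the planned replacement day that minimizes expected cost per operating day, given a predicted cumulative failure probability. The cost figures, the probability curve, and the `best_replacement_day` helper are hypothetical. The formula is a standard age-replacement calculation used here for illustration, not the optimization described in this paper.

```python
def best_replacement_day(failure_cdf, failure_cost, replacement_cost):
    """Pick the planned replacement day that minimizes expected cost per operating day.

    failure_cdf[t] is the predicted probability that the asset has failed by the
    end of day t + 1, so the list must be non-decreasing.
    """
    best_day, best_rate = None, float("inf")
    expected_days = 0.0
    for t, failed_by in enumerate(failure_cdf):
        survived_to_start = 1.0 - (failure_cdf[t - 1] if t > 0 else 0.0)
        expected_days += survived_to_start  # expected days in service so far
        expected_cost = failure_cost * failed_by + replacement_cost * (1.0 - failed_by)
        rate = expected_cost / expected_days
        if rate < best_rate:
            best_day, best_rate = t + 1, rate
    return best_day, best_rate


# Hypothetical prediction: failure is unlikely for four days, then risk climbs quickly.
cdf = [0.0, 0.0, 0.01, 0.02, 0.30, 0.80]

print(best_replacement_day(cdf, failure_cost=10.0, replacement_cost=1.0))
print(best_replacement_day(cdf, failure_cost=1.0, replacement_cost=1.0))
```

When an unplanned failure costs ten times as much as a planned replacement, the sketch stops just before the risk climbs. When the two costs are equal, there is nothing to gain from replacing early and it runs the asset as long as the prediction extends.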

Figure 4. Failure rate vs. operating days by model


[Line chart: failure rate (0 to 1.00) against mean operating days (0 to 15,000), one curve per model.]

The graph in Figure 4 may be recognizable as an illustration of Gini coefficients. Gini coefficients are a
measure of statistical dispersion, often used to describe income inequality in a population. They can,
however, also be used to represent a continuous probability distribution, and they effectively illustrate
how well the various predictive models of asset failure balance the cost of failure against the cost of
prematurely replacing the asset.

The graph in Figure 4 illustrates this balancing. Extending the operating hours is, in effect, lowering the
cost of capital and other costs associated with purchasing and deploying that asset. The ideal curve would
be one that predicts all or nearly all the asset failures in a narrow band as far to the right as possible,
representing the longest operating lifetimes while still predicting most asset failures before they happen.
The least desirable curve is one that distributes failure predictions equally along the operating days axis.
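
The paper does not say how the curves in Figure 4 were scored. As a loose illustration of the dispersion measure itself, the sketch below computes a Gini coefficient over hypothetical counts of predicted failures per operating-time bin. The counts and the `gini` helper are assumptions made for this example.

```python
def gini(values):
    """Gini coefficient of non-negative values: 0 means a perfectly even spread."""
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("need at least one value and a positive total")
    ordered = sorted(values)
    # Closed form for sorted data, equal to the mean absolute difference
    # between all pairs divided by twice the mean.
    weighted = sum((i + 1) * v for i, v in enumerate(ordered))
    return (2 * weighted) / (n * total) - (n + 1) / n


# Hypothetical predicted failures in four consecutive operating-time bins.
evenly_spread = [5, 5, 5, 5]    # predictions scattered across the whole axis
concentrated = [0, 0, 1, 19]    # predictions packed into the last bins

print(gini(evenly_spread), gini(concentrated))
```

A coefficient near zero corresponds to the least desirable case described above, while a higher value indicates predictions concentrated in a narrow band. The coefficient alone does not show whether that band sits at the right-hand end of the axis.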

In Figure 4, the red line comes close to representing the latter, and the blue line, representing the
Nokia machine learning model, comes closest to representing the former. The other lines are the CBM
predictions delivered by the disk manufacturers. All of the models have access to the same data. Our
model best makes the trade-off between risk tolerance and operating hours.

Conclusion
Models that predict the probability of asset failure are the foundation of a predictive maintenance
system that reduces business costs associated both with unexpected asset failures and with premature
replacement of operating assets. While condition-based maintenance is an improvement over “wait until
it breaks” at one extreme or overly conservative planned maintenance schedules at the other, for many
types of assets, CBM lacks the analytical strength to create models that optimize business outcomes.
Machine learning can create such models, and as such should be a requirement for choosing any predictive
maintenance solution.

About Nokia Software IoT


Nokia enables organizations in asset-intensive industries to generate more value from their people,
processes, and assets. Our award-winning analytics and industrial IoT applications optimize operations
in motion, in context and in real time. Teams at some of the largest organizations in the world, including
transportation and energy firms and some of the world’s largest utilities, use Nokia analytics software to
power mission-critical systems.
To find out more about Nokia IoT analytics-driven applications, visit https://nokia.ly/IoTapplications

About Nokia
We create the technology to connect the world. Powered by the research and innovation of Nokia Bell Labs, we serve communications service providers, governments,
large enterprises and consumers, with the industry’s most complete, end-to-end portfolio of products, services and licensing.

From the enabling infrastructure for 5G and the Internet of Things, to emerging applications in digital health, we are shaping the future of technology to transform
the human experience.
networks.nokia.com

Nokia is a registered trademark of Nokia Corporation. Other product and company names mentioned herein may be trademarks or trade names of their respective owners.

© 2018 Nokia

Nokia Oyj
Karaportti 3
FI-02610 Espoo, Finland
Tel. +358 (0) 10 44 88 000

Document code: SR1809028549EN (September) CID 205621