
REPORT
Adaptive Statistical Learning Applied to
Telemetry Data for Monitoring
Prepared by: Stephen R Donaldson
Reference: CML00097-01

Code Magus Limited (England reg. no. 4024745)


Number 6, 69 Woodstock Road
Oxford, OX2 6EY, United Kingdom
www.codemagus.com
Copyright © 2014 by Code Magus Limited
All rights reserved

March 4, 2015


Contents

1 Introduction
2 Telemetry for Monitoring
3 Hypothesis Testing and Fault Determination
4 Model parameters
5 Processing Metric Data
6 Sensitivity to Errors
7 Illustrative Examples
References

Adaptive Statistical Learning Applied to Telemetry Data for Monitoring, CML00097-01


List of Figures

1 Components in instrumented application landscape
2 Time line showing telemetry arrival order for a single metric
3 Metric records picked from a time line after key attribute assignment to form training distribution details

Abstract


This paper describes the use of streaming telemetry for self-tuning application monitoring.
With relatively little configuration, the process described here can monitor a large collection of metrics streamed from instrumented applications and probes using a common
format for transmitting metric samples to a central server. The technique uses elements
of measurement theory and statistical/machine learning, together with certain sensitivity
parameters and certain partitioning schemes for splitting the metrics into appropriate training sets in which fixed Gaussian or Poisson distribution parameters are
expected to hold. Typically, the partitioning is based on the dates and times the observations are made, with knowledge of the local calendar and client-to-business behaviour. When
anomalies are detected, either against certain configured significance levels, or persistently against certain significance levels, alarms can be triggered and the supporting case
data presented.


1 Introduction


Large distributed systems measure and collect many metrics on their own performance,
many of which could be used as an indication of the health of the system against its expected
performance. The sources of such metrics span the full application space,
from network and I/O infrastructure, hardware platforms, operating systems and application
hosting to instrumented applications, reflecting the vertical dependencies of application availability. In addition, multi-tiered component-based architectures make
application availability depend simultaneously on many platforms. A number
of problems are posed by this spread of availability dependence and in the monitoring of
the health of such systems:


- The number of metrics required to be monitored. Each element of each application stack, by default, typically measures many metrics, a significant portion of
which could be useful in determining whether or not there is something amiss with
the availability of the system. It is not feasible to dedicate screen real estate to the
monitoring of every possible metric. It is also not reasonable to rely on screens being
watched continuously in order to determine whether something is amiss or whether the
system is showing signs of stress (which may or may not lead to an outage).
- The frequency of certain events is dependent on user population behaviour. Even without the impact or influence of campaigns, or other events1, expected rates will
change depending on time of day, day of month, day of week, and whether the day in
question is a week-day, business day, weekend day, bank holiday, bank holiday
weekend, public holiday or religious holiday. This means that it is not possible to
determine a fixed expected rate for a particular metric that is continuously applicable.
- Distributions of performance metrics are dependent on usage patterns. Certain
events imply processing, which in turn implies resource
consumption. Consumption of a particular resource (memory, CPU capacity, network bandwidth, etc.) may be driven to the point of impacting performance.
Further, at different times of the day, the processing of certain requests may run
against different background activity, some of which could impact performance.
Around the globe, different time zones fall within different peak local activities.
- It is also not only the sample means of metrics that change over time. Sample
variance is often dependent on the number of samples, and this is not necessarily just because of the number of samples dropping to statistically insignificant
levels. For example, processing loads caches and buffer pools; as request rates
drop within a fixed period of time, the opportunity for false sharing or data reuse
diminishes, in favour of supporting other, possibly unrelated, activity.
1 An important sporting event can introduce a noticeable dip in expected user behaviour, allowing half-time
and full-time to be recognised.



- Distribution parameters change over time. Over time, organic growth, infrastructure upgrades, acquisitions and mergers, etc. change the accepted baseline parameters against which distributions of sampled values and rates should realistically
be compared. If these are hard-coded, or specifically determined baseline values, then under normal conditions this would imply almost continual review and
intervention.


In this paper, we describe a mechanism to monitor event rates and metric values
in a manner that addresses these problems. Rather than requiring a configuration for
each event counter or metric value that could possibly arise, the approach can be applied
en masse to event counters and metric values. If there are exceptions, it is suggested
that these be masked out, rather than the alternative approach of specifically including
items for monitoring on an exception-by-exception basis.

2 Telemetry for Monitoring


In general, the classes of telemetry that one would be interested in throughout an application
stack, and across the hosting platforms that support an application, fall into one of three
categories:

- Categorical or qualitative telemetry: This category indicates the state of an
element within the system. For example, the state of a connection could be
one of disconnected, connecting, or connected.
- Event counters: This category indicates the number of times an associated event
has occurred and typically reflects events which are atomic in nature and have no
associated value (for example, the number of CRC-errors detected is not associated with a value corresponding to the duration, size, or resource consumption associated with a CRC-error).
- Metric values: This category associates with an event either a duration, size, or
level of resource consumption. It is possible that an event can be associated with
more than one value, in which case the various values will be associated with
different metrics.

Typically, telemetry metric values or counts arise from application instrumentation,
in which case every event is sampled (such sampling is handled synchronously with the application's processing of the event); or the values, counts or
activity correspond to telemetry arising outside of the application stack, either within
the kernel of the operating system, some component of the application hosting, or a network device. In such cases, the metric data is maintained independently of the sampling
mechanism and in most cases the values made available are only samples.
Telemetry data streamed for processing is maintained using an instrumentation library [1],
with the application calls to the various functions of the instrumentation library made in-line (i.e. synchronously) with the application code processing the corresponding event.
For example, the code that determines the duration of a session is found within the
function that deals with terminating the corresponding session.
Although metrics are maintained synchronously as described, the telemetry data is
not processed synchronously. Instead, periodically, accumulated metrics are parcelled
as telemetry and asynchronously sent for processing. This asynchronous dispatch of
the telemetry ensures that the application is not held up sending out metrics; and the
accumulation of events ensures that the application does something useful in between
maintaining metric structures, and that the network is not flooded with metric updates.


The determination of when accumulated metrics are communicated is under application control and could be based on either a number of updates having occurred since
the previous update, or a minimum amount of time having elapsed since the previous update
was sent out.


Operating systems and network devices typically maintain metric data as a normal
course of operation and using such data causes no additional processing overhead, apart
from the occasional system or network call to sample the data. A probe is a simple
instrumented application which harvests this kernel or network device maintained data.
These probe applications are instrumented appropriately with the same metric instrumentation library [1], and hence collect metrics and dispatch telemetry in a manner
consistent with the application instrumentation telemetry.

The SNMP [131, 133, 112, 132, 95, 105, 106, 134, 129, 12, 78, 152, 150, 89, 74, 107,
151, 153, 65, 84, 83, 91, 109, 114, 10, 9, 14, 36, 90, 100, 101, 97, 143, 59, 98, 76, 37,
31, 13, 110, 61, 60, 64, 123, 145, 159, 77, 136, 92, 158, 7, 165, 6, 164, 147, 73, 24,
23, 38, 99, 63, 62, 156, 111, 108, 30, 29, 32, 103, 102, 104, 80, 79, 81, 130, 137, 8,
27, 28, 126, 33, 148, 155, 146, 22, 127, 154, 160, 70, 26, 25, 113, 144, 117, 124, 42,
88, 87, 157, 96, 149, 34, 54, 41, 67, 125, 56, 48, 46, 82, 58, 50, 47, 57, 68, 121, 122,
119, 120, 43, 44, 141, 86, 11, 93, 51, 53, 52, 142, 16, 128, 15, 115, 116, 75, 49, 39, 40,
55, 118, 163, 162, 45, 66, 139, 140, 161, 85, 135, 21, 18, 20, 17, 35, 138, 19] probe
(cmlxsnmp) [133] provides remote access to any management information base (MIB)
hosted on an SNMP server. Kernel and operating system metric probes are available
for Solaris [5] using the kstat(1M,3KSTAT) interface [94]; for AIX [2] using the
perfstat API for extracting perfstat kernel extension data; for Windows [3] using
Windows counters; z/OS RMF [71]; and WebSphere Application Server [4].
The CML metric library is thus useful in tools and instrumented applications for maintaining and extracting metrics, and for making their values available as telemetry suitable
for remote monitoring. The library and tools allow dashboards, rules engines, performance analysis, etc. to operate both vertically within the application stack of a
single application and horizontally across the systems and infrastructure components of the application landscape.



Sampling, using the CML instrumentation metric library, maintains samples summarised
in a manner that allows event or sample counts, sample means, and sample variances to be
extracted at the receiving or processing end of the telemetry data, in real time, historically, or within existing or past intervals.
In this paper we discuss the exploitation of these metrics in order to provide an adaptive
statistical learning framework in which a large number of metrics could be monitored
with alerts being sent out once outliers are detected.


The outliers are detected not only against some mean and variance over past observations, but also taking into account user behaviour, for example, over days of the week, time
of the day, business days, weekends and holidays.
The Metric/Instrumentation Structure

Application instrumentation using the CML metric library allows the application to update the metric structure by calling a method
which adds an additional data point to the metric being sampled or collected. Usually
this is done for every event detected.


Instead of maintaining an entry for the additional data point, the data from the additional
item is merged with prior samples or values.
Periodically, using a configured time interval as well as the IP address and port of a
CML data collection agent (called a dish), the accumulated item is sent to the dish as
telemetry.
Typically, accumulated telemetry will not be sent more frequently than once every five
seconds, and would typically be sent at least once every sixty seconds.
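The merging of data points and periodic parcelling described above can be sketched as follows; the class and method names here are illustrative, not the CML instrumentation library's actual API:

```python
import time

class Metric:
    """Illustrative metric structure: each new data point is merged into a
    running (n, sum of x, sum of x^2) triple rather than being stored
    individually, so memory use is constant per metric."""

    def __init__(self, name, description):
        self.name = name
        self.description = description
        self.n = 0
        self.sum_x = 0.0
        self.sum_x2 = 0.0

    def update(self, x):
        # Called in-line for every event; merges the point into the triple.
        self.n += 1
        self.sum_x += x
        self.sum_x2 += x * x

    def snapshot(self):
        # The accumulated state that would be parcelled up and sent to the
        # dish as telemetry, stamped with the dispatch time.
        return (time.time(), self.n, self.sum_x, self.sum_x2)
```

A dispatcher thread or timer would call `snapshot` at the configured interval and send the result to the dish.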
[Figure: instrumented applications (App 1 on Host A, App 2 on Host B) and probes harvesting kernel data send instrumented application metrics and host resource and performance data to a dish, the monitoring and data-gathering infrastructure, which in turn feeds tools for graphically rendering telemetry and rules engines for analysing telemetry.]

Figure 1: Components in instrumented application landscape



Figure 1 shows the components of the application landscape across which telemetry
is collected from applications and operating environments. Typically, the collection
of metrics is very inexpensive, and the range of useful metrics, depending on circumstances, is quite broad. This puts pressure on the available real estate to render the
metrics for viewing. Consequently, it is difficult to keep the focus on all available metrics, and only a few high-level system health and performance metrics can
be viewed at any one point in time. Hence the requirement for an analytics engine.


3 Hypothesis Testing and Fault Determination


Certain metrics of a system are ideally invariant of the level of activity against the system, and hence a rather static approach can be taken toward training values within a
happy range of operation. Where metric values are dependent on load, and where the
determination of the happy range needs to cater for the variance in load, either across channels or over time, this leads to margins being set too wide, admitting too many Type II
errors, or too narrow, admitting too many Type I errors. Instead, an adaptive approach
should be adopted, taking into account variations over time and variations across channels.
Our approach is to use a method which allocates the metric to a pool, dynamically,
allowing the method to change the pool as the requirements dictate. This allows us to
formulate, for a particular metric (e.g. transaction rate), for a particular time of day, for
a particular day of the week, for a particular type of day (working day, public holiday,
etc.) and, as need be, for a particular channel, the statistics describing the historical behaviour of the metric.
The same assignment scheme can be applied to recently collected data to
provide a current sample of the metric in the form of the sample mean and sample
variance. The null hypothesis is that the historical or training distribution is the
same as the test or current sample distribution:
H_0: \mu_h = \mu_c.   (1)

The monitoring system tests the hypothesis according to a configured level of significance; if the null hypothesis is rejected, an alarm or warning can be issued.
Since the process continually updates the data, repeating the calculations of the historical and current mean and variance, the warning or alarm can be conditioned on the
repeated rejection of the null hypothesis.
When an instrumented application updates a metric, such as response time in seconds,
the new measurement is presented to the corresponding metric structure as a new data
point, x_n, say. Prior to this point, other values would have been presented to the metric
structure, say x_1, ..., x_{n-1}. Then, prior to the update, the metric structure would have values



representing the previous updates in the following form:

(n-1, \sum_{i=1}^{n-1} x_i, \sum_{i=1}^{n-1} x_i^2).   (2)

Introducing the n-th metric update, x_n, results in the following:

(n, \sum_{i=1}^{n} x_i, \sum_{i=1}^{n} x_i^2).   (3)

Some time after this update, possibly after further updates, the metric's current state
would be sent to the dish:

(n+j, \sum_{i=1}^{n+j} x_i, \sum_{i=1}^{n+j} x_i^2)  for some j \ge 0.   (4)

The content of the telemetry update is as follows:

(t_k, k, \sum_{i=1}^{k} x_i, \sum_{i=1}^{k} x_i^2).   (5)

Over time, these updates are placed in appropriate buckets depending on the attributes
of t_k (for example, the time of day, day of month, day of week, holiday, business day,
etc.).
Within a bucket, historical values are used to determine a baseline for the distribution
of the metric. Assume the following time line as it applies to a single metric within a
single bucket.


[Figure: a time line for a single metric; the training data set spans t_{i_o} to t_{i_j}, a lag spans t_{i_j} to t_{i_k}, and the test data set spans t_{i_k} to t_{i_l}.]

Figure 2: Time line showing telemetry arrival order for a single metric
The time period between t_{i_o} and t_{i_j} is used to build the point statistics describing the
baseline. As time goes on, newer values are used for t_{i_o} and t_{i_j}. This allows the baseline



to be adaptive. However, the gap between t_{i_j} and t_{i_k} is a parameterised constant, so that
the distributions at either end are kept separate. This, together with the fact that the gap
between t_{i_o} and t_{i_j} is significantly larger than the gap between t_{i_j} and t_{i_k}, ensures that
the baseline or training distribution, while it can change over time, is reasonably stable.
The sample statistics for the baseline distribution are calculated as follows:

m_h = \frac{\sum_{i=i_o}^{i_j} x_i}{n_{i_j} - n_{i_o}}   (6)

s_h^2 = \frac{\sum_{i=i_o}^{i_j} x_i^2}{n_{i_j} - n_{i_o}} - m_h^2.   (7)

Similarly, for the current (test) sample,

m_c = \frac{\sum_{i=i_k}^{i_l} x_i}{n_{i_l} - n_{i_k}}   (9)

s_c^2 = \frac{\sum_{i=i_k}^{i_l} x_i^2}{n_{i_l} - n_{i_k}} - m_c^2.   (10)

The derivation of the sample variance (s^2) follows from

s^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - m)^2   (11)

    = \frac{1}{N} \sum_{i=1}^{N} (x_i^2 - 2 x_i m + m^2)   (12)

    = \frac{1}{N} \sum_{i=1}^{N} x_i^2 - \frac{2m}{N} \sum_{i=1}^{N} x_i + \frac{1}{N} \sum_{i=1}^{N} m^2   (13)

    = \frac{1}{N} \sum_{i=1}^{N} x_i^2 - m^2.   (14)
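The identity in (14) is what allows the sample variance to be recovered from the accumulated (count, sum, sum-of-squares) triple without retaining individual data points. A quick numerical check of the identity, with illustrative values:

```python
def variance_from_sums(n, sum_x, sum_x2):
    # Equation (14): s^2 = (1/N) * sum(x_i^2) - m^2, computed from the
    # accumulated triple alone.
    m = sum_x / n
    return sum_x2 / n - m * m

def variance_direct(xs):
    # Equation (11): s^2 = (1/N) * sum((x_i - m)^2), the defining form.
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)
```

Both forms agree, which is why the telemetry need only carry the triple.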

The test reduces to checking whether observations in the historical samples and the current samples are drawn from the same distribution, i.e. testing whether the null hypothesis

H_0: \mu_h = \mu_c and \sigma_h^2 = \sigma_c^2   (15)

should be rejected using the sample point statistics, raising an alarm in favour of the
alternate hypothesis under the assumption that an anomaly has been detected:

H_A: \mu_h \ne \mu_c or \sigma_h^2 \ne \sigma_c^2.   (16)

The determination that the distributions are different is performed by comparing the
resulting p-values against a supplied significance level.
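Assuming approximately normal sample means, the comparison of the historical and current means can be sketched as a two-sample Z-test computed directly from the accumulated tuples; the function below is an illustrative sketch, not the system's actual test implementation:

```python
import math

def z_test_from_sums(n_h, sum_h, sumsq_h, n_c, sum_c, sumsq_c):
    """Two-sample Z-test on the means, using only the accumulated
    (count, sum, sum of squares) tuples kept by the metric structures.
    Returns the Z statistic and the two-sided p-value."""
    m_h = sum_h / n_h
    m_c = sum_c / n_c
    var_h = sumsq_h / n_h - m_h * m_h     # equation (14) applied to each set
    var_c = sumsq_c / n_c - m_c * m_c
    z = (m_c - m_h) / math.sqrt(var_h / n_h + var_c / n_c)
    # Two-sided p-value via the standard normal CDF, Phi(x) = (1+erf(x/sqrt 2))/2.
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return z, p
```

The null hypothesis would be rejected, and an alarm considered, when the returned p-value falls below the configured significance level.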

An alarm can also be raised on condition that H_0 is repeatedly or consistently rejected
over consecutive updates of the distribution metrics.

Together with the values

(t_k, n_k, \sum_i x_i, \sum_i x_i^2)   (17)

of telemetry updates, the name of the metric and a short description of the metric are also
provided. The name of the metric is used to determine what is being updated and is
used to qualify the bucket. The short description is used to provide some context for the
alarm being raised.
In addition to the name, the time-stamp t_k is also used to qualify the bucket into which
the metric is to be assigned. The time-stamp t_k carries the epoch (seconds since
00:00, Jan 1 1970) at which the metric update was sent. The epoch seconds can be used to
provide a qualification (appended to the name of the metric) in order to determine the
current bucket to be updated and tested. For example,

((epoch div 86400) + 4) mod 7   (18)

gives the day of the week indexed from zero (Sunday), and

((epoch + timezone_offset) mod 86400) div 3600   (19)

gives the hour of the day in local time.
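Reading equation (19) as yielding the local hour of the day, the two qualifications can be computed as follows (function names are illustrative):

```python
def day_of_week(epoch):
    # Equation (18): ((epoch div 86400) + 4) mod 7, Sunday indexed as zero.
    # 1 Jan 1970 was a Thursday, hence the +4 shift.
    return ((epoch // 86400) + 4) % 7

def local_hour(epoch, tz_offset):
    # Equation (19), read as the local hour of the day; tz_offset is assumed
    # to be the offset from UTC in seconds.
    return ((epoch + tz_offset) % 86400) // 3600
```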

More generally though, strftime(3) can be applied to the time-stamp to produce a
qualifying string; and the date from the resulting string can be applied to a look-up table
of local holidays, business days, etc.

A stream of time-stamps thus has the ability to change the bucket into which the metric
is assigned from time to time, perhaps hourly. The intervals of historical data would
have been split into various buckets. The calculation of the historical sample mean and
sample variance in this case is performed in the same manner as indicated here, except
that it is potentially performed piecemeal.
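A bucket-key builder along these lines might look as follows; the key layout follows the style used in the text (host.metric.HOUR11.SATURDAY), while the holiday table and function names are illustrative assumptions:

```python
import time

# Hypothetical holiday table; a real deployment would load a local calendar
# of holidays and business days.
HOLIDAYS = {"2014-12-25", "2015-01-01"}

def bucket_key(host, metric, epoch):
    """Build a qualified bucket key such as host.metric.HOUR11.SATURDAY,
    using strftime-style formatting and a holiday look-up."""
    tm = time.gmtime(epoch)
    day = time.strftime("%A", tm).upper()
    date = time.strftime("%Y-%m-%d", tm)
    qualifier = "HOLIDAY" if date in HOLIDAYS else day
    return "%s.%s.HOUR%02d.%s" % (host, metric, tm.tm_hour, qualifier)
```

Metric records carrying the same key are then aggregated into the same training or test distribution.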



4 Model parameters
There is some degree of control over the sensitivity and currency of the training data
as well as the sample size of the test data. The volatility of the training data can be
controlled by choosing the number of points to include in the training data set. The
number of points is determined by the time span t_{i_j} - t_{i_o}. Furthermore, the training
set can be kept independent of the effects of recent events by ensuring a suitable lag
between the training data set and the test data set. This lag is determined by the gap
t_{i_k} - t_{i_j}.


A lag that is too small may mask changes in the data over time, allowing the historical
and current distributions of the metrics to track each other. By keeping the lag sufficiently large, unwanted gradual changes over time can be detected, as well as unwanted
sudden changes.
If there are service level agreements in place, then it is suggested that both distributions
be checked against these.


Certain assumptions need to be checked and/or made regarding the nature of the distributions and the metric sampling; however, the training set for the historical distribution is expected to carry a large number of samples and the test set is expected to
cover at least 30 measurements, that is,

n_{i_l} - n_{i_k} \ge 30   (20)

and

n_{i_j} - n_{i_o} \ge 30,   (21)

so that the sample means may be treated as approximately normally distributed, with density

f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}.   (22)

5 Processing Metric Data

The general processing of arriving metric data is first to determine whether or not the
metric record has arrived out of order. Metric records that arrive out of order can be
ignored. Generally, a metric record's sequence number is used to determine whether or
not a metric record should already have been received (i.e. the sequence number is less
than the highest sequence number already seen).
When the sequence number is greater than the last sequence number seen by more than
one, a gap is indicated. Apart from noting, and possibly raising an alert because of,
the missing sequence numbers, no change in the processing of the arriving metric record
occurs: the metric record is processed in the same manner as all other arriving metric
records.
When a metric record arrives and is not set to be ignored, the interval values are calculated by subtracting the respective accumulative values of the last processed record
from the current record being processed. Similarly, the interval duration, in seconds, is
calculated by subtracting the time-stamp of the previously processed record from that
of the current record.


Specifically, the interval_value_1 value is calculated by subtracting the previously processed metric record's value_1 value from that of the current metric record.
This number, interval_value_1, corresponds to the number of
samples of the metric value within the interval. In a similar way, the interval_value_2
and interval_value_3 values are calculated by subtracting the respective value_2
and value_3 values of the metric records. The calculated interval_value_2
value corresponds to the sum of all samples or values of the corresponding metric in the
interval, and the interval_value_3 value corresponds to the sum of the squares of
all the samples or values taken in the interval.
The resulting 4-tuple

(interval, interval_value_1, interval_value_2, interval_value_3)

for each of the instances can be used to calculate the sample rate for the interval, the
sample mean for the interval, and the sample variance for the interval. For instrumented
applications where a value is captured for each occurrence of an event, these values are
the event rate over the interval, the mean value over the interval and the variance over
the interval.
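The interval calculation can be sketched as follows, assuming each record carries its time-stamp and cumulative value_1, value_2 and value_3 fields; the function name and tuple layout are illustrative:

```python
def interval_stats(prev, curr):
    """Derive interval statistics from two consecutive cumulative metric
    records, each a (timestamp, value_1, value_2, value_3) tuple, i.e.
    (time, count, sum of samples, sum of squared samples)."""
    interval = curr[0] - prev[0]     # interval duration in seconds
    n = curr[1] - prev[1]            # interval_value_1: samples in interval
    s = curr[2] - prev[2]            # interval_value_2: sum of samples
    sq = curr[3] - prev[3]           # interval_value_3: sum of squares
    rate = n / interval              # sample rate for the interval
    mean = s / n                     # sample mean for the interval
    variance = sq / n - mean * mean  # sample variance, via equation (14)
    return rate, mean, variance
```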
Each metric record that arrives contains the following tuple:

(time-stamp, group-name, metric-name, n, \sum_i x_i, \sum_i x_i^2, description).

In the manner described above, provided duplicate and out-of-order metric records are
ignored, the interval statistics will be meaningful.
Additionally, in order to derive the same point statistics for larger intervals or larger sample sizes (not necessarily made up of contiguous metric records) for a particular metric
(as indicated by a combination of the group-name and metric-name), one only has
to accumulate the values interval, interval_value_1, interval_value_2,
and interval_value_3.


This is the basis of calculating the historical and current values of the required metrics
(historical values are used for estimating training distributions and current values are
used for estimating test distributions).

Some performance metrics are expected to have different values depending on the time
of day, day of week or day of month, or, within these time-based breakdowns, on the
type of day, for example, business day, work day or public/bank holiday.
Determining whether any, or which, of these time- and calendar-based attributes apply to
a particular metric record is a function of the metric record's time stamp, the local time
zone and a local calendar of holidays and working days.


These attributes derived from a metric record's time-stamp are used to qualify the metric
in addition to the group-name and the metric-name. Thus a metric record could
be qualified by the following assigned key:
<host-name>.<metric-name>.HOUR11.SATURDAY


The qualification of the metric name forms a key used to aggregate the values to either
form a baseline (historic) training distribution or a current test distribution for the given
metric. The qualified processed metric records are stored in a database table.
At some point in time, once the lag has passed, a metric record that does not as yet
form part of an aggregated training data set will have its 4-tuple values added to the item
qualified by the assigned key. Once the number of accumulated samples exceeds the
number required in the aggregated training distribution by at least the number of samples
in the oldest metric record included in the training set, that record's 4-tuple values are
subtracted from the training distribution values. The oldest values are subtracted from
the aggregated training values until the time span and sample size requirements are met
by the least number of processed metric records.
This process maintains a single aggregate training set record for each qualified key
value. This training set has both a minimum number of samples and covers a minimum amount of time (provided enough metric records have come to pass). Further, due
to the lag, the aggregate describes historical data older than a determined age (so as not
to be overly influenced by more recent events).
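The add-then-evict maintenance of the aggregate training record might be sketched as follows; the class, thresholds and record layout are illustrative assumptions, not the actual implementation:

```python
from collections import deque

class TrainingAggregate:
    """Sliding training-set aggregate for one qualified key: the newest
    processed metric record is added, then the oldest records are evicted
    for as long as the remainder still satisfies the minimum sample-size
    and time-span requirements."""

    def __init__(self, min_samples=30, min_span=3600):
        self.records = deque()       # (timestamp, interval, n, sum, sumsq)
        self.min_samples = min_samples
        self.min_span = min_span

    def add(self, record):
        self.records.append(record)
        # Evict oldest records while the remainder still meets requirements.
        while len(self.records) > 1:
            remaining = list(self.records)[1:]
            n = sum(r[2] for r in remaining)
            span = remaining[-1][0] - remaining[0][0]
            if n >= self.min_samples and span >= self.min_span:
                self.records.popleft()
            else:
                break

    def stats(self):
        # Point statistics of the aggregate, as in equations (6) and (7).
        n = sum(r[2] for r in self.records)
        s = sum(r[3] for r in self.records)
        sq = sum(r[4] for r in self.records)
        mean = s / n
        return n, mean, sq / n - mean * mean
```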
A single row of a few column values is all that is required to describe a single training set
distribution. This makes keeping many training set distributions for the same assigned-key-qualified metric relatively inexpensive. As an option, training sets can be spun off
and frozen in time, as well as or instead of running a dynamic training set at least as old
as the configured lag. This allows one, for example, to compare and report a test distribution
against multiple training distributions, supporting statements such as: "The response
times are worse than the corresponding response times of a year ago, but better than the
corresponding response times of six months ago."
Figure 3 shows a time-line of arriving metrics together with a key assigned to the metric,
allowing the metrics ostensibly from the same distribution to be aggregated into training

sets to compare against later test distributions in order to detect anomalies.


[Figure: metric records on a time line, each tagged with an assigned key (K1, K2, K3) and time-stamp (T0 to T5). Records are picked from the DETAIL METRIC TABLE (Time, Key, Interval, Value_1, Value_2, Value_3, Description) and aggregated by key into the AGGREGATE METRIC TABLE (Start Time, End Time, Key, 4-Tuple).]

Figure 3: Metric records picked from a time line after key attribute assignment to
form training distribution details
The current values of a metric will be qualified by the last time-stamp seen, but in order
to build up a sufficient sample for hypothesis testing, the necessary number of the most
recent processed metric records matching the group-name and metric-name will
be used.

In order to determine whether or not there is a possible anomaly, the Z-score of the
difference between the historical/training and current sample means will have its p-value
checked against a supplied significance value (α). When the p-value is used in this
manner, the null hypothesis may be rejected in favour of the alternative hypothesis,
asserting a possible anomaly in the value of the corresponding metric.
When a possible anomaly is detected, an alert is to be sent out. The alert should include
any evidence upon which the assertion is made. This evidence will include the training
and test sample means, the sample variances, as well as the Z-score values.
In addition, all processed metric records included in the test set will be included in the
detail supporting the alert.
Provided the assumptions of the hypothesis testing are reasonably upheld, the significance level, or α-value, is the cut-off value for the p-value: it indicates the cut-off
probability that data from the assumed training set distribution could be at least as
extreme as the observed values making up the sample test metric values, assuming that
the null hypothesis is true. A small p-value then indicates that the sample
test data is less likely to be from the same distribution, and is thus cause for a possible
rejection of the null hypothesis, and for raising a corresponding alert.

6 Sensitivity to Errors


There is some conflict between giving the earliest possible warning of a problem and admitting
too many Type I errors. This can be countered by allowing the first errors encountered to
be raised as warnings, and then escalating the warnings to alarms if the null hypothesis
for a metric is repeatedly rejected. The first such occurrence could simply be a warning
that some metric values are showing as outliers. With a significance of, say, α = 0.05
(and under the assumptions of the testing method), the probability of the data being as
extreme as observed by chance alone would not be in excess of the α value. For a
second consecutive rejection, the probability that this happened by chance alone would
not exceed α² = 0.0025, and so on with further consecutive rejections.


For this scenario of testing multiple consecutive test sets against a training set, the second and subsequent hypothesis tests need to be performed with sampled values independent of the first test set and of each other. The system will thus allow a full test
set to be regenerated for a particular metric before performing another hypothesis test
against the same training set. This is achieved by flagging the rows in Figure 3 once a
hypothesis test rejects the null hypothesis, so that a completely new aggregation of data
is collected for the subsequent tests.
Care should be taken here: while building subsequent test sets from disjoint samples is
necessary, it is not a sufficient condition to guarantee independence. There are cases
where individually sampled metric values are dependent on each other, and if this
dependence spans the multiple aggregations that form different test sets, then the test
sets will not be entirely independent of each other. An example is the sampling of a
metric that represents a smoothed value in the system, such as a 5-minute smoothed
CPU load average.
In order to guard against contamination of the training set by a tainted test set (a test
set in which the null hypothesis was rejected), the failed data would need to be flagged
so as not to participate in the establishment of any future training set. This could also
be achieved by flagging the rows in Figure 3.
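The row flagging described above might be sketched as follows; the record fields ("role", "tainted") are assumptions for illustration, not the actual layout of the rows in Figure 3:

```python
def flag_rejected(rows, rejected_test_rows):
    """After a rejected hypothesis test, taint the offending test rows so
    they join neither a regenerated test set nor any future training set."""
    rejected = set(map(id, rejected_test_rows))
    for row in rows:
        if id(row) in rejected:
            row["tainted"] = True

def next_sets(rows):
    """Build fresh training and test pools, skipping tainted rows."""
    clean = [r for r in rows if not r.get("tainted")]
    train = [r for r in clean if r["role"] == "train"]
    test = [r for r in clean if r["role"] == "test"]
    return train, test

rows = [{"role": "train", "value": 1.0},
        {"role": "test", "value": 9.9},
        {"role": "test", "value": 1.1}]
flag_rejected(rows, [rows[1]])   # the 9.9 observation failed a test
train, test = next_sets(rows)
print(len(train), len(test))     # prints: 1 1
```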

7 Illustrative Examples
While the complete system does not yet exist, this section presents illustrative examples
of the approach, drawing on the statistical learning methods described in [72, 69].

[Figure: time line (Jun to Sep) for a single metric, with observations partitioned into Training (Span), Pending (Lag), Accept, Reject, and Unseen/Future.]

[Figure: response time (milliseconds) plots for a monitored metric.]

[Figure: Normal Q-Q plots of sample quantiles against theoretical quantiles for training and test data.]

[Figure: training and test samples against theoretical and simulated distributions; an accepted test with Z-score = 0.39 (p-value = 0.697) and a rejected test with Z-score = 2.98 (p-value = 0.003).]

[Figure: response times from Jul 2012 to Jan 2014, with the legend distinguishing maintained performance ("Maint perf") from degrading performance ("Degrading perf").]

References
[1] metric: Metric Library Reference.
CML Document CML00006-01, Code Magus Limited, September 2010.
http://www.codemagus.com/documents/metric CML0000601.pdf .
[2] cmlxaixp: Serfboard AIX Performance Metric Probe.
CML Document CML00045-01, Code Magus Limited, June 2009.
http://www.codemagus.com/documents/cmlxaixp CML0004501.pdf .


[3] cmlxwinp: Serfboard Windows Performance Metric Probe.


CML Document CML00048-01, Code Magus Limited, June 2009.
http://www.codemagus.com/documents/cmlxwinp CML0004801.pdf .

[4] cmlxwasp: Serfboard Websphere Application Server Performance Metric Probe.


CML Document CML00049-01, Code Magus Limited, June 2009.
http://www.codemagus.com/documents/cmlxwasp CML0004901.pdf .


[5] cmlxsolp: Serfboard Solaris Performance Metric Probe.


CML Document CML00065-01, Code Magus Limited, June 2009.
http://www.codemagus.com/documents/cmlxsolp CML0006501.pdf .

[6] B. Aboba and G. Zorn. RADIUS Accounting Client MIB. RFC 2620 (Informational), June 1999. Obsoleted by RFC 4670.
[7] B. Aboba and G. Zorn. RADIUS Authentication Client MIB. RFC 2618 (Proposed Standard), June 1999. Obsoleted by RFC 4668.
[8] R. Austein. Applicability Statement for DNS MIB Extensions. RFC 3197 (Informational), November 2001.
[9] R. Austein and J. Saperia. DNS Resolver MIB Extensions. RFC 1612 (Historic),
May 1994.

[10] R. Austein and J. Saperia. DNS Server MIB Extensions. RFC 1611 (Historic),
May 1994.
[11] M. Baer, R. Charlet, W. Hardaker, R. Story, and C. Wang. IPsec Security Policy
Database Configuration MIB. RFC 4807 (Proposed Standard), March 2007.
[12] F. Baker. IP Forwarding Table MIB. RFC 1354 (Proposed Standard), July 1992.
Obsoleted by RFC 2096.
[13] F. Baker. IP Forwarding Table MIB. RFC 2096 (Proposed Standard), January
1997. Obsoleted by RFC 4292.
[14] J. Barnes, L. Brown, R. Royston, and S. Waldbusser. Modem Management Information Base (MIB) using SMIv2. RFC 1696 (Historic), August 1994.


[15] G. Beacham, S. Kumar, and S. Channabasappa. Signaling MIB for PacketCable


and IPCablecom Multimedia Terminal Adapters (MTAs). RFC 5098 (Proposed
Standard), February 2008.
[16] E. Beili. Ethernet in the First Mile Copper (EFMCu) Interfaces MIB. RFC 5066
(Proposed Standard), November 2007. Updated by RFC 7124.
[17] E. Beili. ATM-Based xDSL Bonded Interfaces MIB. RFC 6768 (Proposed Standard), February 2013.


[18] E. Beili. xDSL Multi-Pair Bonding Using Time-Division Inverse Multiplexing


(G.Bond/TDIM) MIB. RFC 6766 (Proposed Standard), February 2013.
[19] E. Beili. Ethernet in the First Mile Copper (EFMCu) Interfaces MIB. RFC 7124
(Proposed Standard), February 2014.
[20] E. Beili and M. Morgenstern. Ethernet-Based xDSL Multi-Pair Bonding
(G.Bond/Ethernet) MIB. RFC 6767 (Proposed Standard), February 2013.
[21] E. Beili and M. Morgenstern. xDSL Multi-Pair Bonding (G.Bond) MIB. RFC
6765 (Proposed Standard), February 2013.


[22] A. Berger and D. Romascanu. Power Ethernet MIB. RFC 3621 (Proposed Standard), December 2003.
[23] R. Bergman. Job Submission Protocol Mapping Recommendations for the Job
Monitoring MIB. RFC 2708 (Informational), November 1999.
[24] R. Bergman, T. Hastings, S. Isaacson, and H. Lewis. Job Monitoring MIB - V1.0.
RFC 2707 (Informational), November 1999.
[25] R. Bergman, H. Lewis, and I. McDonald. Printer Finishing MIB. RFC 3806
(Informational), June 2004.
[26] R. Bergman, H. Lewis, and I. McDonald. Printer MIB v2. RFC 3805 (Proposed
Standard), June 2004.
[27] A. Bierman. Remote Monitoring MIB Extensions for Differentiated Services.
RFC 3287 (Proposed Standard), July 2002.
[28] A. Bierman, C. Bucci, R. Dietz, and A. Warth. Remote Network Monitoring
MIB Protocol Identifier Reference Extensions. RFC 3395 (Proposed Standard),
September 2002.
[29] A. Bierman, C. Bucci, and R. Iddon. Remote Network Monitoring MIB Protocol
Identifier Macros. RFC 2896 (Informational), August 2000.
[30] A. Bierman, C. Bucci, and R. Iddon. Remote Network Monitoring MIB Protocol
Identifier Reference. RFC 2895 (Draft Standard), August 2000. Updated by RFC
3395.


[31] A. Bierman and R. Iddon. Remote Network Monitoring MIB Protocol Identifiers.
RFC 2074 (Proposed Standard), January 1997. Obsoleted by RFC 2895.
[32] A. Bierman and K. Jones. Physical Topology MIB. RFC 2922 (Informational),
September 2000.
[33] A. Bierman and K. McCloghrie. Remote Monitoring MIB Extensions for High
Capacity Alarms. RFC 3434 (Proposed Standard), December 2002.
[34] A. Bierman and K. McCloghrie. Entity MIB (Version 3). RFC 4133 (Proposed
Standard), August 2005. Obsoleted by RFC 6933.


[35] A. Bierman, D. Romascanu, J. Quittek, and M. Chandramouli. Entity MIB (Version 4). RFC 6933 (Proposed Standard), May 2013.
[36] D. Brower, B. Purvy, A. Daniel, M. Sinykin, and J. Smith. Relational Database
Management System (RDBMS) Management Information Base (MIB) using
SMIv2. RFC 1697 (Proposed Standard), August 1994.
[37] N. Brownlee. Traffic Flow Measurement: Meter MIB. RFC 2064 (Experimental),
January 1997. Obsoleted by RFC 2720.


[38] N. Brownlee. Traffic Flow Measurement: Meter MIB. RFC 2720 (Proposed
Standard), October 1999.
[39] S. Channabasappa, W. De Ketelaere, and E. Nechamkin. Management Event
Management Information Base (MIB) for PacketCable- and IPCablecom-Compliant Devices. RFC 5428 (Proposed Standard), April 2009.
[40] J. Chesterfield and B. Haberman. Multicast Group Membership Discovery MIB.
RFC 5519 (Proposed Standard), April 2009.
[41] S. Chisholm and D. Perkins. Entity State MIB. RFC 4268 (Proposed Standard),
November 2005.
[42] S. Chisholm and D. Romascanu. Alarm Management Information Base (MIB).
RFC 3877 (Proposed Standard), September 2004.
[43] S. De Cnodder, N. Jonnala, and M. Chiba. RADIUS Dynamic Authorization
Client MIB. RFC 4672 (Informational), September 2006.
[44] S. De Cnodder, N. Jonnala, and M. Chiba. RADIUS Dynamic Authorization
Server MIB. RFC 4673 (Informational), September 2006.
[45] S. Combes, P. Amundsen, M. Lambert, and H-P. Lexow. The SatLabs Group
DVB-RCS MIB. RFC 5728 (Informational), March 2010.
[46] C. DeSanti, V. Gaonkar, K. McCloghrie, and S. Gai. Fibre Channel Fabric Address Manager MIB. RFC 4439 (Proposed Standard), March 2006.


[47] C. DeSanti, V. Gaonkar, K. McCloghrie, and S. Gai. MIB for Fibre Channel's
Fabric Shortest Path First (FSPF) Protocol. RFC 4626 (Proposed Standard),
September 2006.
[48] C. DeSanti, V. Gaonkar, H.K. Vivek, K. McCloghrie, and S. Gai. Fibre-Channel
Name Server MIB. RFC 4438 (Proposed Standard), April 2006.
[49] C. DeSanti, F. Maino, and K. McCloghrie. MIB for Fibre-Channel Security Protocols (FC-SP). RFC 5324 (Proposed Standard), September 2008.


[50] C. DeSanti, K. McCloghrie, S. Kode, and S. Gai. Fibre Channel Routing Information MIB. RFC 4625 (Proposed Standard), September 2006.
[51] C. DeSanti, H.K. Vivek, K. McCloghrie, and S. Gai. Fibre Channel Fabric Configuration Server MIB. RFC 4935 (Proposed Standard), August 2007.
[52] C. DeSanti, H.K. Vivek, K. McCloghrie, and S. Gai. Fibre Channel Registered
State Change Notification (RSCN) MIB. RFC 4983 (Proposed Standard), August
2007.


[53] C. DeSanti, H.K. Vivek, K. McCloghrie, and S. Gai. Fibre Channel Zone Server
MIB. RFC 4936 (Proposed Standard), August 2007.
[54] R. Dietz and R. Cole. Transport Performance Metrics MIB. RFC 4150 (Proposed
Standard), August 2005.
[55] T. Dreibholz and J. Mulik. Reliable Server Pooling MIB Module Definition. RFC
5525 (Experimental), April 2009.
[56] M. Dubuc, T. Nadeau, J. Lang, and E. McGinnis. Link Management Protocol
(LMP) Management Information Base (MIB). RFC 4327 (Proposed Standard),
January 2006. Obsoleted by RFC 4631.
[57] M. Dubuc, T. Nadeau, J. Lang, E. McGinnis, and A. Farrel. Link Management
Protocol (LMP) Management Information Base (MIB). RFC 4631 (Proposed
Standard), September 2006.
[58] B. Fenner and D. Thaler. Multicast Source Discovery Protocol (MSDP) MIB.
RFC 4624 (Experimental), October 2006.
[59] J. Flick. IEEE 802.12 Interface MIB. RFC 2020 (Proposed Standard), October
1996.
[60] N. Freed and S. Kille. Mail Monitoring MIB. RFC 2249 (Proposed Standard),
January 1998. Obsoleted by RFC 2789.
[61] N. Freed and S. Kille. Network Services Monitoring MIB. RFC 2248 (Proposed
Standard), January 1998. Obsoleted by RFC 2788.
[62] N. Freed and S. Kille. Mail Monitoring MIB. RFC 2789 (Proposed Standard),
March 2000.

[63] N. Freed and S. Kille. Network Services Monitoring MIB. RFC 2788 (Proposed
Standard), March 2000.
[64] M. Greene, J. Luciani, K. White, and T. Kuo. Definitions of Managed Objects
for Classical IP and ARP Over ATM Using SMIv2 (IPOA-MIB). RFC 2320
(Proposed Standard), April 1998.
[65] P. Grillo and S. Waldbusser. Host Resources MIB. RFC 1514 (Proposed Standard), September 1993. Obsoleted by RFC 2790.


[66] R. Haas. Forwarding and Control Element Separation (ForCES) MIB. RFC 5813
(Proposed Standard), March 2010.
[67] B. Haberman. IP Forwarding Table MIB. RFC 4292 (Proposed Standard), April
2006.
[68] D. Harrington. Transferring MIB Work from IETF Bridge MIB WG to IEEE
802.1 WG. RFC 4663 (Informational), September 2006.
[69] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning. Springer, 2nd edition, 2009.


[70] H. Hazewinkel and D. Partain. The Differentiated Services Configuration MIB.


RFC 3747 (Proposed Standard), April 2004.
[71] IBM Corporation. Resource Measurement Facility Programmer's Guide, version 1 release 13 edition, 2011.
[72] G. James, D. Witten, T. Hastie, and R. Tibshirani. An Introduction to Statistical
Learning: with Applications in R. Springer Texts in Statistics. Springer, 2013.
[73] M. St. Johns. DOCSIS Cable Device MIB Cable Device Management Information Base for DOCSIS compliant Cable Modems and Cable Modem Termination
Systems. RFC 2669 (Proposed Standard), August 1999. Obsoleted by RFC 4639.
[74] M. St. Johns and M. Rose. Identification MIB. RFC 1414 (Historic), February
1993.
[75] B. Joshi and R. Bijlani. Protocol Independent Multicast (PIM) Bootstrap Router
MIB. RFC 5240 (Proposed Standard), June 2008.
[76] C. Kalbfleisch. Applicability of Standards Track MIBs to Management of World
Wide Web Servers. RFC 2039 (Informational), November 1996.
[77] C. Kalbfleisch, C. Krupczak, R. Presuhn, and J. Saperia. Application Management MIB. RFC 2564 (Proposed Standard), May 1999.
[78] F. Kastenholz. Implementation Notes and Experience for the Internet Ethernet
MIB. RFC 1369 (Informational), October 1992.


[79] R. Kavasseri. Distributed Management Expression MIB. RFC 2982 (Proposed


Standard), October 2000.
[80] R. Kavasseri. Event MIB. RFC 2981 (Proposed Standard), October 2000.
[81] R. Kavasseri. Notification Log MIB. RFC 3014 (Proposed Standard), November
2000.
[82] G. Keeni. The Managed Object Aggregation MIB. RFC 4498 (Experimental),
May 2006.


[83] S. Kille and N. Freed. Mail Monitoring MIB. RFC 1566 (Proposed Standard),
January 1994. Obsoleted by RFCs 2249, 2789.
[84] S. Kille and N. Freed. Network Services Monitoring MIB. RFC 1565 (Proposed
Standard), January 1994. Obsoleted by RFC 2248.
[85] D. King and M. Venkatesan. Multiprotocol Label Switching Transport Profile
(MPLS-TP) MIB-Based Management Overview. RFC 6639 (Informational),
June 2012.


[86] S. Kipp, G. Ramkumar, and K. McCloghrie. The Virtual Fabrics MIB. RFC 4747
(Proposed Standard), November 2006.
[87] K. Kompella. A Traffic Engineering (TE) MIB. RFC 3970 (Proposed Standard),
January 2005.
[88] H. Lam, A. Huynh, and D. Perkins. Alarm Reporting Control Management Information Base (MIB). RFC 3878 (Proposed Standard), September 2004.
[89] G. Malkin and F. Baker. RIP Version 2 MIB Extensions. RFC 1389 (Proposed
Standard), January 1993. Obsoleted by RFC 1724.
[90] G. Malkin and F. Baker. RIP Version 2 MIB Extension. RFC 1724 (Draft Standard), November 1994.
[91] G. Mansfield and S. Kille. X.500 Directory Monitoring MIB. RFC 1567 (Proposed Standard), January 1994. Obsoleted by RFC 2605.
[92] G. Mansfield and S. Kille. Directory Server Monitoring MIB. RFC 2605 (Proposed Standard), June 1999.
[93] M. Mathis, J. Heffner, and R. Raghunarayan. TCP Extended Statistics MIB. RFC
4898 (Proposed Standard), May 2007.
[94] Jim Mauro and Richard McDougall. Solaris Internals (2nd Edition). Prentice
Hall PTR, Upper Saddle River, NJ, USA, 2006.
[95] K. McCloghrie. Extensions to the generic-interface MIB. RFC 1229 (Proposed
Standard), May 1991. Obsoleted by RFC 1573, updated by RFC 1239.


[96] K. McCloghrie. Fibre Channel Management MIB. RFC 4044 (Proposed Standard), May 2005.
[97] K. McCloghrie, F. Baker, and E. Decker. IEEE 802.5 Station Source Routing
MIB using SMIv2. RFC 1749 (Historic), December 1994.
[98] K. McCloghrie and A. Bierman. Entity MIB using SMIv2. RFC 2037 (Proposed
Standard), October 1996. Obsoleted by RFC 2737.
[99] K. McCloghrie and A. Bierman. Entity MIB (Version 2). RFC 2737 (Proposed
Standard), December 1999. Obsoleted by RFC 4133.


[100] K. McCloghrie and E. Decker. IEEE 802.5 MIB using SMIv2. RFC 1743 (Draft
Standard), December 1994. Obsoleted by RFC 1748.
[101] K. McCloghrie and E. Decker. IEEE 802.5 MIB using SMIv2. RFC 1748 (Draft
Standard), December 1994. Updated by RFC 1749.
[102] K. McCloghrie, D. Farinacci, and D. Thaler. Internet Group Management Protocol MIB. RFC 2933 (Proposed Standard), October 2000. Obsoleted by RFC
5519.


[103] K. McCloghrie, D. Farinacci, and D. Thaler. IPv4 Multicast Routing MIB. RFC
2932 (Proposed Standard), October 2000. Obsoleted by RFC 5132.
[104] K. McCloghrie, D. Farinacci, D. Thaler, and B. Fenner. Protocol Independent
Multicast MIB for IPv4. RFC 2934 (Experimental), October 2000.
[105] K. McCloghrie and R. Fox. IEEE 802.4 Token Bus MIB. RFC 1230 (Historic),
May 1991. Updated by RFC 1239.
[106] K. McCloghrie, R. Fox, and E. Decker. IEEE 802.5 Token Ring MIB. RFC 1231
(Proposed Standard), May 1991. Obsoleted by RFCs 1743, 1748, updated by
RFC 1239.
[107] K. McCloghrie and J. Galvin. Party MIB for version 2 of the Simple Network
Management Protocol (SNMPv2). RFC 1447 (Historic), April 1993.
[108] K. McCloghrie and G. Hanson. The Inverted Stack Table Extension to the Interfaces Group MIB. RFC 2864 (Proposed Standard), June 2000.
[109] K. McCloghrie and F. Kastenholz. Evolution of the Interfaces Group of MIB-II.
RFC 1573 (Proposed Standard), January 1994. Obsoleted by RFC 2233.
[110] K. McCloghrie and F. Kastenholz. The Interfaces Group MIB using SMIv2. RFC
2233 (Proposed Standard), November 1997. Obsoleted by RFC 2863.
[111] K. McCloghrie and F. Kastenholz. The Interfaces Group MIB. RFC 2863 (Draft
Standard), June 2000.


[112] K. McCloghrie and M. Rose. Management Information Base for Network Management of TCP/IP-based internets: MIB-II. RFC 1213 (INTERNET STANDARD), March 1991. Updated by RFCs 2011, 2012, 2013.
[113] I. McDonald. IANA Charset MIB. RFC 3808 (Informational), June 2004.
[114] W. McKenzie and J. Cheng. SNA APPN Node MIB. RFC 1593 (Informational),
March 1994.
[115] D. McWalter. A MIB Textual Convention for Language Tags. RFC 5131 (Proposed Standard), December 2007.


[116] D. McWalter, D. Thaler, and A. Kessler. IP Multicast MIB. RFC 5132 (Proposed
Standard), December 2007.
[117] T. Nadeau, C. Srinivasan, and A. Viswanathan. Multiprotocol Label Switching
(MPLS) Forwarding Equivalence Class To Next Hop Label Forwarding Entry
(FEC-To-NHLFE) Management Information Base (MIB). RFC 3814 (Proposed
Standard), June 2004.


[118] T. Nadeau and D. Zelig. Pseudowire (PW) Management Information Base (MIB).
RFC 5601 (Proposed Standard), July 2009.
[119] D. Nelson. RADIUS Accounting Client MIB for IPv6. RFC 4670 (Informational), August 2006.
[120] D. Nelson. RADIUS Accounting Server MIB for IPv6. RFC 4671 (Informational), August 2006.
[121] D. Nelson. RADIUS Authentication Client MIB for IPv6. RFC 4668 (Proposed
Standard), August 2006.
[122] D. Nelson. RADIUS Authentication Server MIB for IPv6. RFC 4669 (Proposed
Standard), August 2006.
[123] M. O'Dell, H. Alvestrand, B. Wijnen, and S. Bradner. Advancement of MIB
specifications on the IETF Standards Track. RFC 2438 (Best Current Practice),
October 1998.
[124] J. Pastor and M. Belinchon. Stream Control Transmission Protocol (SCTP) Management Information Base (MIB). RFC 3873 (Proposed Standard), September
2004.
[125] M. Patrick and W. Murwin. Data Over Cable System Interface Specification
Quality of Service Management Information Base (DOCSIS-QoS MIB). RFC
4323 (Proposed Standard), January 2006.
[126] R. Presuhn. Management Information Base (MIB) for the Simple Network Management Protocol (SNMP). RFC 3418 (INTERNET STANDARD), December
2002.


[127] B. Ray and R. Abbi. High Capacity Textual Conventions for MIB Modules Using Performance History Based on 15 Minute Intervals. RFC 3705 (Proposed
Standard), February 2004.
[128] G. Renker and G. Fairhurst. MIB for the UDP-Lite protocol. RFC 5097 (Proposed Standard), January 2008.
[129] J.K. Reynolds. Reassignment of experimental MIBs to standard MIBs. RFC
1239 (Historic), June 1991.


[130] D. Romascanu. Remote Monitoring MIB Extensions for Interface Parameters


Monitoring. RFC 3144 (Proposed Standard), August 2001.
[131] M.T. Rose. Management Information Base for network management of TCP/IP-based internets: MIB-II. RFC 1158 (Proposed Standard), May 1990. Obsoleted
by RFC 1213.
[132] M.T. Rose. SNMP MUX protocol and MIB. RFC 1227 (Historic), May 1991.
[133] M.T. Rose and K. McCloghrie. Concise MIB definitions. RFC 1212 (INTERNET
STANDARD), March 1991.


[134] G. Satz. CLNS MIB for use with Connectionless Network Protocol (ISO 8473)
and End System to Intermediate System (ISO 9542). RFC 1238 (Experimental),
June 1991.
[135] J. Schoenwaelder. Translation of Structure of Management Information Version
2 (SMIv2) MIB Modules to YANG Modules. RFC 6643 (Proposed Standard),
July 2012.
[136] J. Schoenwaelder and J. Quittek. Script MIB Extensibility Protocol Version 1.0.
RFC 2593 (Experimental), May 1999. Obsoleted by RFC 3179.
[137] J. Schoenwaelder and J. Quittek. Script MIB Extensibility Protocol Version 1.1.
RFC 3179 (Experimental), October 2001.
[138] G. Schudel, A. Jain, and V. Moreno. Locator/ID Separation Protocol (LISP) MIB.
RFC 7052 (Experimental), October 2013.
[139] Y. Shi, D. Perkins, C. Elliott, and Y. Zhang. Control and Provisioning of Wireless
Access Points (CAPWAP) Protocol Base MIB. RFC 5833 (Informational), May
2010.
[140] Y. Shi, D. Perkins, C. Elliott, and Y. Zhang. Control and Provisioning of Wireless
Access Points (CAPWAP) Protocol Binding MIB for IEEE 802.11. RFC 5834
(Informational), May 2010.
[141] A. Siddiqui, D. Romascanu, and E. Golovinsky. Real-time Application Quality-of-Service Monitoring (RAQMON) MIB. RFC 4711 (Proposed Standard), October 2006.


[142] R. Sivaramu, J. Lingard, D. McWalter, B. Joshi, and A. Kessler. Protocol Independent Multicast MIB. RFC 5060 (Proposed Standard), January 2008.
[143] R. Smith, F. Wright, T. Hastings, S. Zilles, and J. Gyllenskog. Printer MIB. RFC
1759 (Proposed Standard), March 1995. Obsoleted by RFC 3805.
[144] C. Srinivasan, A. Viswanathan, and T. Nadeau. Multiprotocol Label Switching
(MPLS) Label Switching Router (LSR) Management Information Base (MIB).
RFC 3813 (Proposed Standard), June 2004.


[145] K. Tesink. Textual Conventions for MIB Modules Using Performance History
Based on 15 Minute Intervals. RFC 2493 (Proposed Standard), January 1999.
Obsoleted by RFC 3593.
[146] K. Tesink. Textual Conventions for MIB Modules Using Performance History
Based on 15 Minute Intervals. RFC 3593 (Draft Standard), September 2003.
[147] D. Thaler. IP Tunnel MIB. RFC 2667 (Proposed Standard), August 1999. Obsoleted by RFC 4087.


[148] D. Thaler. Multicast Address Allocation MIB. RFC 3559 (Proposed Standard),
June 2003.
[149] D. Thaler. IP Tunnel MIB. RFC 4087 (Proposed Standard), June 2005.
[150] D. Throop. SNMP MIB Extension for the X.25 Packet Layer. RFC 1382 (Proposed Standard), November 1992.
[151] D. Throop. SNMP MIB extension for Multiprotocol Interconnect over X.25. RFC
1461 (Historic), May 1993.
[152] D. Throop and F. Baker. SNMP MIB Extension for X.25 LAPB. RFC 1381
(Proposed Standard), November 1992.
[153] S. Waldbusser. Token Ring Extensions to the Remote Network Monitoring MIB.
RFC 1513 (Historic), September 1993.
[154] S. Waldbusser. Application Performance Measurement MIB. RFC 3729 (Proposed Standard), March 2004.
[155] S. Waldbusser, R. Cole, C. Kalbfleisch, and D. Romascanu. Introduction to the
Remote Monitoring (RMON) Family of MIB Modules. RFC 3577 (Informational), August 2003.
[156] S. Waldbusser and P. Grillo. Host Resources MIB. RFC 2790 (Draft Standard),
March 2000.
[157] S. Waldbusser, J. Saperia, and T. Hongal. Policy Based Management MIB. RFC
4011 (Proposed Standard), March 2005.


[158] R. Waterman, B. Lahaye, D. Romascanu, and S. Waldbusser. Remote Network


Monitoring MIB Extensions for Switched Networks Version 1.0. RFC 2613
(Draft Standard), June 1999.
[159] K. White and R. Moore. Definitions of Protocol and Managed Objects for
TN3270E Response Time Collection Using SMIv2 (TN3270E-RT-MIB). RFC
2562 (Proposed Standard), April 1999.
[160] B. Wijnen and A. Bierman. IANA Guidelines for the Registry of Remote Monitoring (RMON) MIB modules. RFC 3737 (Proposed Standard), April 2004.


[161] D. Zelig, R. Cohen, and T. Nadeau. Synchronous Optical Network/Synchronous


Digital Hierarchy (SONET/SDH) Circuit Emulation over Packet (CEP) MIB Using SMIv2. RFC 6240 (Proposed Standard), May 2011.
[162] D. Zelig and T. Nadeau. Ethernet Pseudowire (PW) Management Information
Base (MIB). RFC 5603 (Proposed Standard), July 2009.
[163] D. Zelig and T. Nadeau. Pseudowire (PW) over MPLS PSN Management Information Base (MIB). RFC 5602 (Proposed Standard), July 2009.


[164] G. Zorn and B. Aboba. RADIUS Accounting Server MIB. RFC 2621 (Informational), June 1999. Obsoleted by RFC 4671.
[165] G. Zorn and B. Aboba. RADIUS Authentication Server MIB. RFC 2619 (Proposed Standard), June 1999. Obsoleted by RFC 4669.
