
Contents

List of Tables
List of Figures
Part One
1.1. Daily Rainfall
1.2. Monthly Rainfall
1.3. Annual Rainfall
Part Two
Part Three
References

List of Tables
Table 1: Estimated Percentile values for Exponential Distribution (μ = 4.1)
Table 2: Estimations of Distribution Fits for Daily Rainfall Data
Table 3: Estimated Percentile values for Gamma Distribution (α = 2.8; β = 25.5)
Table 4: Estimated Percentile values for Normal Distribution (μ = 851; σ = 117)
Table 5: Model Summary of SAAR vs Elevation Regression
Table 6: Coefficients of SAAR vs Elevation Regression
Table 7: Best combinations of SAAR regression variables
Table 8: Model Summary of revised SAAR regression
Table 9: Coefficients of revised SAAR regression model
Table 10: Model application results from ungauged site
Table 11: Trend model details and uncertainty parameters

List of Figures
Figure 1: Top - Histogram of Daily Rainfall
Figure 2: Top - Histogram of Monthly Rainfall
Figure 3: Top - Histogram of Annual Rainfall
Figure 4: Scatterplot of Regression relation (SAAR versus Elevation)
Figure 5: Residual Plots of SAAR Regression with Elevation
Figure 6: Residual plots of refined SAAR regression model
Figure 7: Correlogram of Daily Rainfall in the Eden Catchment (30-day lag)
Figure 8: Sample time series plot of Daily Rainfall Data
Figure 9: Autocorrelation (12 month lag) plot for Monthly Rainfall in the Eden Catchment
Figure 10: Autocorrelation (10 year lagged) of Annual rainfall in the Eden Catchment
Figure 11: Autocorrelation (12 month lagged) for mean monthly flows
Figure 12: Correlogram of deseasonalised monthly flow data (12 month lagged)
Figure 13: Mean Annual Temperatures for Central England for 1659 - 2011
Figure 14: Mean Annual Temperature of Central England (1701 - 1800)
Figure 15: Mean Annual Temperature of Central England (1801 - 1900)
Figure 16: Mean Annual Temperature of Central England (1901 - 2000)

Part One
This part aims to understand the various distributions that estimate the likelihood of given probabilistic events. The stochastic nature of precipitation presents clear examples of events that need to be estimated from an established probability distribution.
The objective of this part is to analyse rainfall frequencies from the Eden Catchment over a 40 year period to determine the general shapes of the individual probability density curves. For each curve, the properties and the resulting estimated catchment parameters are presented as well.

1.1. Daily Rainfall

It is evident from the plot of relative frequencies of daily rainfall (Figure 1) that the exponential distribution estimates the daily rainfall depth reasonably well. The cumulative distribution function also fits the data.
The daily distribution has an approximate mean (μ) of 4.1 mm. For the given record length, there were 8239 wet days. The depth of rainfall on these days varied between 0.1 mm and 79 mm. This wide range complicates the identification of distribution fits for daily rainfall data, especially for estimations of rainfall depths close to zero. First comparison assessments in Minitab with the given candidate distributions (Normal, Exponential, 3-parameter lognormal and Gamma) produced Table 2.
No distribution fit the full data set accurately (as shown by p values < 0.005). However, a reduction in the range of values, caused by raising the calculation threshold to 0.4 mm, reduced the Anderson-Darling (AD) statistic most markedly for the exponential distribution (from 238.2 to 58.7). Additionally, visual comparison confirmed this selection.
Table 1: Estimated Percentile values for Exponential Distribution (μ = 4.1)
P(X ≤ x)                   x (mm)
0.10 (10th Percentile)     0.4
0.50 (50th Percentile)     2.8
0.90 (90th Percentile)     9.4
0.99 (99th Percentile)     18.7
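
The percentiles in Table 1 can be checked by inverting the exponential cumulative distribution function, F(x) = 1 - exp(-x/μ), which gives x = -μ ln(1 - p). The following is a minimal sketch, assuming the rounded mean of 4.1 mm quoted in the text; the function name is illustrative. Small differences from Table 1 (most visibly at the 99th percentile) are to be expected, presumably because the table was computed from the unrounded fitted mean.

```python
import math

def exponential_percentile(p, mean):
    """Return x such that P(X <= x) = p for an exponential distribution."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return -mean * math.log(1.0 - p)

# Rounded mean quoted in the text (assumption: the unrounded fitted value is not given).
MEAN_DAILY_RAINFALL_MM = 4.1

for p in (0.10, 0.50, 0.90, 0.99):
    x = exponential_percentile(p, MEAN_DAILY_RAINFALL_MM)
    print(f"P(X <= x) = {p:.2f}  ->  x = {x:.1f} mm")
```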

Figure 1: Top - Histogram of Daily Rainfall; Bottom - Cumulative Distribution Plot

Table 2: Estimations of Distribution Fits for Daily Rainfall Data
                          No Threshold (All non-zero values)    Threshold (Values > 0.4mm)
Distribution              AD        P         LRT P             AD        P         LRT P
Normal                    675.409   < 0.005                     511.455   < 0.005
Exponential               238.213   < 0.003                     58.677    < 0.003
3-Parameter Lognormal     60.101    *         0.000             35.517    *         0.000
Gamma                     54.904    < 0.005                     67.735    < 0.005

1.2. Monthly Rainfall

The monthly rainfall record for the Eden catchment had 469 observations ranging between 0.9 and 228.5 mm. The frequency distribution and cumulative distribution plot are shown in Figure 2. Selection of the best-fitting distribution followed the procedure for daily rainfall above. Of the candidate distributions tested, the Gamma distribution estimated the monthly data best (with AD = 0.472 and p > 0.250). The shape (α) and scale (β) of the Gamma distribution are approximately 2.8 and 25.5 respectively. These values combine (αβ) to give a mean monthly rainfall for the record length of 71.2 mm and an approximate standard deviation, (αβ²)^0.5, of 43 mm.
Table 3: Estimated Percentile values for Gamma Distribution (α = 2.8; β = 25.5)
P(X ≤ x)                   x (mm)
0.10 (10th Percentile)     24.9
0.50 (50th Percentile)     62.9
0.90 (90th Percentile)     128.3
0.99 (99th Percentile)     205.2
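
The mean and standard deviation quoted for the monthly Gamma fit follow directly from its shape and scale parameters. The sketch below is an illustration only, assuming the rounded parameters 2.8 and 25.5 from the text; the function name is hypothetical. With these rounded inputs the mean comes out slightly above the quoted 71.2 mm, which suggests the report used unrounded parameter estimates.

```python
import math

def gamma_mean_sd(shape, scale):
    """Mean and standard deviation of a Gamma(shape, scale) distribution."""
    mean = shape * scale
    sd = math.sqrt(shape) * scale  # identical to sqrt(shape * scale**2)
    return mean, sd

# Rounded shape and scale from the text (assumption: unrounded estimates not given).
mean_mm, sd_mm = gamma_mean_sd(shape=2.8, scale=25.5)
print(f"mean = {mean_mm:.1f} mm, standard deviation = {sd_mm:.1f} mm")
```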

Figure 2: Top - Histogram of Monthly Rainfall; Bottom - Cumulative Distribution Plot of Monthly Rainfall

1.3. Annual Rainfall

As expected, the annual rainfall approximately followed a normal distribution. The descriptive parameters of the fitted curve are a mean (μ) of approximately 851 mm and a standard deviation (σ) of approximately 117 mm for 39 years of record. The best-fit selection process followed the procedure described above.

Table 4: Estimated Percentile values for Normal Distribution (μ = 851; σ = 117)
P(X ≤ x)                   x (mm)
0.10 (10th Percentile)     701.2
0.50 (50th Percentile)     851.3
0.90 (90th Percentile)     1001.4
0.99 (99th Percentile)     1123.7
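
The annual percentiles can be reproduced approximately from the inverse normal distribution. This is a minimal sketch using Python's standard library, assuming a mean of 851.3 mm (the 50th percentile in Table 4) and the rounded standard deviation of 117 mm; results agree with the table to within a fraction of a millimetre.

```python
from statistics import NormalDist

# Mean taken from the 50th percentile in Table 4; sigma is the rounded value
# from the text (assumption: the unrounded estimate is not given).
annual_rainfall = NormalDist(mu=851.3, sigma=117.0)

for p in (0.10, 0.50, 0.90, 0.99):
    x = annual_rainfall.inv_cdf(p)
    print(f"P(X <= x) = {p:.2f}  ->  x = {x:.1f} mm")
```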

Figure 3: Top - Histogram of Annual Rainfall; Bottom - Cumulative Distribution Plot of Annual Rainfall

Part Two
Part One highlighted the variation of rainfall at various temporal scales. This part aims to define relationships between Standard Annual Average Rainfall (SAAR) and geospatial variables in the Eden Catchment. To achieve this, regression analysis is used to test the dependence, and the resulting model is used to predict SAAR (within stated margins of uncertainty) at a specific location.
In this case, the predictor variables given for the analysis are Elevation (Elev), Easting (E) and Northing (N). A first look at the catchment's Digital Elevation and Interpolated Annual Rainfall maps suggests some correlation, especially in the lower-lying areas of the catchment (see Appendix). Similar spatial variation is also evident in the steady decrease in rainfall with northward progress, but the east-west variation is harder to judge visually.

Initial regression of SAAR with Elevation produced the following model:

Equation 1: Regression Model of SAAR with Elevation

SAAR (mm) = 523.8 + 2.565 Elevation (m)

The model results presented in Table 5 and Table 6 show reasonable prediction of SAAR, with small standard errors in the coefficients relative to their estimates (R² = 0.716; p < 0.005).

Figure 4: Scatterplot of Regression relation (SAAR versus Elevation)

Table 5: Model Summary of SAAR vs Elevation Regression
S (mm)     R²        R² (adjusted)    PRESS      R² (predictive)
194.2      71.60%    70.46%           1143016    65.55%

Table 6: Coefficients of SAAR vs Elevation Regression
Term             Coef     SE Coef    95% CI            T-Value    P-Value
Constant         523.8    90.7       (337.1, 710.5)    5.78       0.000
Elevation (m)    2.565    0.323      (1.900, 3.231)    7.94       0.000

Although the histogram of the residuals (Figure 5) does not appear to follow a normal distribution on visual inspection, the Anderson-Darling test gives no evidence against normality at the 5% significance level (mean of residuals = -0.01; AD = 0.483; p = 0.212). The sample size may play a major role in this seeming contradiction, as small samples often pass statistical normality tests (Machiwel & Jha, 2012). Nevertheless, the normal probability plot shows points clustered about the normal line. The assumption that the residuals follow a normal distribution is thus taken as satisfied.
The residuals also show a random pattern about the centre line (and no clustering) when plotted in order of observation. This satisfies the assumption that the residuals are not correlated with one another.
Examination of the residuals plotted against fitted values shows an increase of variance from left to right. This gives evidence of non-constant variance and violates the assumption of homoscedasticity. This violation affects the validity of the model. Thus, the model may require refinement, either by transformation of the response variable or by inclusion of other predictor variables.

Figure 5: Residual Plots of SAAR Regression with Elevation

Subsequent manipulation of the variables in Minitab to select the optimal (high R2, significant p
values, low errors and few variables) combination of terms produced the following summary table:
Table 7: Best combinations of SAAR regression variables
No of Variables    R²      R² (adjusted)    R² (predictive)    Mallows Cp    S (mm)    Variable Combination (p-value)
1                  71.6    70.5             65.6               10.6          194.16    Elevation (0.000)
1                  54.7    52.9             46.3               30.5          245.08    Easting or Northing (0.000)
2                  77.6    75.8             67.8               5.4           175.79    Elevation (0.000), Northing (0.018)
2                  72.6    70.3             65.5               11.4          194.71    Elevation (0.000), Easting (0.363)
All                80.5    78.0             66.4               4.0           167.56    Elevation (0.000), Easting (0.077), Northing (0.005)

The results (Table 7) clearly show that elevation (orographic uplift or cloud seeding) is the major physical determinant of rainfall for this catchment, as variation in elevation explains the largest share of the variation in SAAR. Identifying the predominant physical process statistically assists in the interpretation of the other physical effects that determine wet and dry areas within the catchment.
The results also show that the model with all three variables fits the responses reasonably well. The added terms improve the model's fit (adjusted R² = 0.78). Comparing the coefficients, Elevation remains highly significant to the overall annual rainfall model (p = 0.000). All other terms except the Easting (p = 0.077) are also significant.
This may suggest that the Easting variable is not very useful to the model, and that the model may reproduce similar responses in SAAR without it. Indeed, the model which predicts SAAR from Elevation and Northing alone has a higher predictive R² value. Nevertheless, the full model offers a better overall fit (lower Mallows' Cp) and a smaller standard deviation of the residuals, shown in the S (mm) values. It may thus be concluded that regression of SAAR on Elevation, Easting and Northing is practical enough to be used for subsequent predictions.
The revised regression produced the following model, summarized in Table 8:
Equation 2: Revised Regression Model of SAAR

SAAR (mm) = 12633 + 2.142 Elevation (m) - 0.009 Easting - 0.017 Northing
Table 8: Model Summary of revised SAAR regression
S (mm)     R²        R² (adjusted)    PRESS      R² (predictive)
167.559    80.54%    78.00%           1115354    66.39%
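
The adjusted R² values in Table 5 and Table 8 can be related to the plain R² values through the standard adjustment for the number of predictors. The sketch below is illustrative only: the sample size is not stated in the text, and n = 27 is an assumption chosen because it reproduces the reported adjusted values for both models.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

N_OBS = 27  # assumption: the number of gauges is not stated in the report

adj_elevation_only = adjusted_r_squared(0.7160, N_OBS, n_predictors=1)
adj_full_model = adjusted_r_squared(0.8054, N_OBS, n_predictors=3)
print(f"Elevation only: {adj_elevation_only:.4f}")
print(f"Elevation, Easting, Northing: {adj_full_model:.4f}")
```

The adjustment penalises the three-variable model more heavily than the single-variable model, which is why its adjusted R² gains less than its plain R².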

Table 9: Coefficients of revised SAAR regression model
Term             Coef        SE Coef    95% CI                  T-Value    P-Value    VIF
Constant         12633       3758       (4859, 20407)           3.36       0.003
Elevation (m)    2.142       0.388      (1.339, 2.945)          5.52       0.000      1.94
Easting          -0.00900    0.00487    (-0.01907, 0.00107)     -1.85      0.077      1.52
Northing         -0.01684    0.00549    (-0.02820, -0.00548)    -3.07      0.005      1.88

Details of the model in Table 9 show that average annual rainfall within the catchment increases (positive coefficient) towards higher elevations but decreases (negative coefficients) with progress in northward and eastward directions. Prior understanding of the predominant effect of orographic uplift (or cloud seeding) within the catchment and visual inspection of catchment area maps assist in detecting physical patterns. The catchment maps show highland areas on the southern and eastern boundaries with lower-lying areas towards the north. Comparing the average rainfall map with the DEM map (see Appendix), the low-lying northern reaches of the catchment receive less rainfall. However, even the highlands on the eastern boundaries get significantly less rainfall. This corresponds with the model predictions and can be interpreted as a rain shadow effect, caused by the highlands in the south-west sheltering these areas from the rain-laden, predominantly south-westerly winds (Pollock, et al., 2013). The intense rainfall in the eastern highlands shown on the given 2005 rainfall map may be due to enhanced cloud seeding during a convective storm.

Figure 6: Residual plots of refined SAAR regression model

Figure 6 shows residual plots which test the validity of the refined linear model. The residuals are consistent with a normal distribution (mean = -0.0000; AD = 0.283; p = 0.607) and, as in the first model, show a random pattern when plotted against record order. This random pattern gives evidence that the errors are not correlated with one another. In addition, the VIF values in Table 9 are all below 2, which indicates little multicollinearity among the predictor variables.
However, the residuals in this revised model still show evidence of non-constant variance. This may be due to missing variables in the model. From previous analysis, direction of slope (aspect) combined with elevation may give better predictions of the SAAR responses in the catchment.
When applying this model to an ungauged site, its shortfalls must therefore be taken into consideration, as predictions are accurate only if the model represents the true relationship. Given such a site, with predictor variables Elevation = 400 m, Easting = 380000 and Northing = 500000, SAAR can be estimated as follows:
Table 10: Model application results from ungauged site
Estimated SAAR    SE Fit    95% CI              95% PI
1648.3 mm         65.1      (1513.7, 1782.8)    (1276.4, 2020.1)
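
The point estimate in Table 10 can be approximated by evaluating Equation 2 at the site's predictor values. The following is a sketch under the assumption that the coefficients are those printed in Table 9; the function name is illustrative. Because the printed coefficients are rounded, the result (about 1649.8 mm) differs slightly from the 1648.3 mm reported by Minitab.

```python
def predict_saar(elevation_m, easting, northing):
    """Evaluate the revised regression (Equation 2) with the Table 9 coefficients."""
    return 12633.0 + 2.142 * elevation_m - 0.00900 * easting - 0.01684 * northing

# Predictor values of the ungauged site given in the text.
saar_mm = predict_saar(elevation_m=400.0, easting=380000.0, northing=500000.0)
print(f"Estimated SAAR = {saar_mm:.1f} mm")
```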

From the above table, the predicted SAAR at the site is 1648.3 mm. The 95% confidence interval indicates that the true mean (expected value) of SAAR for these predictor values is expected to lie between 1513.7 mm and 1782.8 mm. The prediction interval, on the other hand, gives the range expected to contain a single observed value of SAAR at such a site 95% of the time. This interval is wider because it predicts an individual value, which varies about the mean, rather than the mean itself. Therefore, even if the model rightly represents the expected value of responses given a set of variables, its representation of any particular response given the same set of variables is at best a crude estimate.
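
The relationship between the two intervals in Table 10 can be illustrated numerically. In ordinary least squares, the variance of a single new observation is the variance of the fitted mean plus the residual variance S². The sketch below assumes that standard relationship applies here and infers the critical t value from the reported confidence interval rather than from the (unstated) degrees of freedom; it recovers the reported prediction interval to within about half a millimetre.

```python
import math

fit_mm = 1648.3                   # estimated SAAR (Table 10)
se_fit_mm = 65.1                  # standard error of the fitted mean (Table 10)
s_mm = 167.559                    # standard error of the regression, S (Table 8)
ci_low, ci_high = 1513.7, 1782.8  # reported 95% confidence interval

# Critical t value implied by the reported confidence interval
# (assumption: the interval is symmetric about the fit).
t_crit = (ci_high - ci_low) / 2.0 / se_fit_mm

# A new observation adds the residual variance S^2 to the variance of the fit.
se_pred_mm = math.sqrt(s_mm**2 + se_fit_mm**2)
pi_low = fit_mm - t_crit * se_pred_mm
pi_high = fit_mm + t_crit * se_pred_mm
print(f"95% PI = ({pi_low:.1f}, {pi_high:.1f}) mm")
```

The residual term dominates the standard error of prediction, which is why the prediction interval is far wider than the confidence interval even though both are centred on the same estimate.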

Part Three
Part three focusses on the temporal relationship of events with themselves and with one another. The temporal focus aids understanding of specific processes by investigating behaviour through time. This understanding is crucial in decision making, optimized engineering design and accurate prediction because of the dependence of future events on past and present events.

Figure 7: Correlogram of Daily Rainfall in the Eden Catchment (30-day lag)

Statistical tools assist the detection of time-based patterns in data and provide methods of analysis. One such analytical tool is autocorrelation, which investigates the inherent memory or influence of a process on itself (Machiwel & Jha, 2012). Figure 7 shows a correlogram of daily rainfall data from the Eden catchment, computed for lags of up to 30 days, to understand monthly variations. The correlograms in this part are plotted with 5% significance limits for the autocorrelations. The daily correlogram shows strong autocorrelation at short lags, which remains significant for a few days. This correlation is observed physically as the tendency for events to persist in occurrence. Figure 8 highlights clear evidence of this persistence in the red ovals that mark dry days following dry days or wet days following wet days.

Figure 8: Sample time series plot of Daily Rainfall Data showing persistence (autocorrelation) in rainfall data.
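
The correlogram values discussed above are sample autocorrelation coefficients at successive lags. The sketch below shows one common way of computing them, together with the large-sample limit of 1.96/√n for an uncorrelated series. It is an illustration only: the rainfall values are hypothetical, the function names are not from the report, and the significance limits drawn by Minitab may be computed differently.

```python
import math

def autocorrelation(series, max_lag):
    """Sample autocorrelation r_k for lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    deviations = [x - mean for x in series]
    denominator = sum(d * d for d in deviations)
    if denominator == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return [
        sum(deviations[t] * deviations[t + k] for t in range(n - k)) / denominator
        for k in range(1, max_lag + 1)
    ]

def white_noise_limit(n):
    """Approximate 5% significance limit, 1.96 / sqrt(n), for an uncorrelated series."""
    return 1.96 / math.sqrt(n)

# Hypothetical daily rainfall depths (mm) with wet and dry spells, for illustration only.
rain_mm = [0, 0, 0, 5, 12, 9, 3, 0, 0, 0, 0, 7, 15, 11, 4, 0, 0, 0, 6, 10]
r = autocorrelation(rain_mm, max_lag=3)
limit = white_noise_limit(len(rain_mm))
for lag, r_k in enumerate(r, start=1):
    flag = "significant" if abs(r_k) > limit else "not significant"
    print(f"lag {lag}: r = {r_k:+.2f} ({flag})")
```

In this toy series the wet and dry spells produce a positive lag-1 coefficient above the limit, which is the same persistence the red ovals in Figure 8 point out.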

Monthly rainfall also shows strong autocorrelation with the immediately succeeding month (Figure 9). However, this effect wanes significantly after one month. This persistence or influence by antecedent conditions is not evident in the annual autocorrelation of rainfall at lags of up to 10 years (Figure 10). This is usually due to changes in the environment and the dissipation of physical inertia over time. Rainfall is generally a phenomenon that responds quickly to alterations in atmospheric conditions. It is therefore more likely to exert influence over subsequent events in its time series only for a relatively short period, as antecedent conditions vary rapidly.

Figure 9: Autocorrelation (12 month lag) plot for Monthly Rainfall in the Eden Catchment

Figure 10: Autocorrelation (10 year lagged) of Annual rainfall in the Eden Catchment.

This dependence on antecedent conditions is also demonstrated in the correlogram for the stream flow time series (Figure 11). High flows tend to follow high flows and low flows have a higher chance of succeeding low flows.
Because time series are usually a combination of several complex and intricately correlated components, it is sometimes possible for one component to mask the detection of another. This masking prevents proper understanding of the masked component, which may be crucial to overall insight into the behaviour of the time series. A clear example of this masking effect is the effect of the seasonality component on the trend component.

Figure 11: Autocorrelation (12 month lagged) for mean monthly flows at Eden Sheepmouth (1970 - 2000)

Stream flows are known to follow seasonal patterns of high and low flows. However, other factors which are not seasonal, such as land use variations, may affect stream flow. It is therefore necessary to strip the stream flow series of its seasonality component to determine the significance of stream flow variation caused by other factors. This process of stripping is called deseasonalisation. To achieve this, the difference between each observed monthly flow and the mean for its calendar month is standardized using the standard deviation for that calendar month. This separates month-to-month variations from the seasonal cycle. The formula used for deseasonalising the data is:

$$z_m = \frac{x_m - \bar{x}_m}{s_m}$$

where: $x_m$ = observed flow for month; $\bar{x}_m$ = mean for calendar month; $s_m$ = standard deviation for calendar month; $m$ = calendar month in question
The resulting correlogram in Figure 12 shows persistence extended only to adjacent months.

Figure 12: Correlogram of deseasonalised monthly flow data (12 month lagged)
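
The standardisation described above can be sketched in a few lines. This is a minimal illustration, assuming the series starts in the first calendar month, has at least two values per calendar month, and that the sample standard deviation (n - 1 divisor) is used; the function name and the data are hypothetical.

```python
from statistics import mean, stdev

def deseasonalise(flows, period=12):
    """Standardise each value by the mean and standard deviation of its calendar month.

    `flows` is a list of monthly values starting in the first calendar month,
    so the value at index i belongs to calendar month i % period.
    """
    by_month = [flows[m::period] for m in range(period)]
    month_mean = [mean(values) for values in by_month]
    month_sd = [stdev(values) for values in by_month]
    return [
        (x - month_mean[i % period]) / month_sd[i % period]
        for i, x in enumerate(flows)
    ]

# Hypothetical series with a two-step "seasonal" cycle over three cycles,
# used instead of twelve months to keep the illustration short.
flows = [10.0, 40.0, 12.0, 44.0, 14.0, 48.0]
z = deseasonalise(flows, period=2)
print([round(v, 2) for v in z])
```

After standardisation the low-flow and high-flow steps of the cycle are on the same scale, so any remaining pattern in the series is no longer dominated by the seasonal swing.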

Figure 13: Mean Annual Temperatures for Central England for 1659 - 2011

When seasonality is understood and addressed, it is then possible to view trends. Trend analysis is central to forecasting and projections, and ultimately essential to decision-making processes which rely on them. Because forecasting and projection models are good only if they represent the true behaviour of the system, the (partial duration) time series used to detect a general trend must be representative of the entire system. This property is called ergodicity. The difficulty of obtaining a representative time series is primarily due to record length limits. All behaviour which precedes the first available records can only be crudely guessed, while behaviour (trends) which succeeds the record length can be predicted within reasonable uncertainty limits.

Figure 14: Mean Annual Temperature of Central England (1701 - 1800)

The importance of record lengths to developing decision support systems is illustrated clearly in Figure 13, which shows mean annual temperature in Central England from 1659 - 2011. This temperature series has been split into three century-long partial duration series (Figure 14, Figure 15 and Figure 16).
Each partial series exhibits a unique trend applicable only within its record length, which does not conform to the overall trend of the entire series. This highlights the danger of extrapolating outside the range of predictor values. It is therefore imperative to understand the uncertainty of the data record period available for use and to calibrate decision support models to reflect such unknowns accordingly.

Figure 15: Mean Annual Temperature of Central England (1801 - 1900)

Figure 16: Mean Annual Temperature of Central England (1901 - 2000)

The general upward trend of mean annual temperatures displays the non-homogeneity of the mean. This may be due to changes in the method of data collection and/or in the environment (Machiwel & Jha, 2012). Variations in the environment due to climate change are possible causes for this non-homogeneity.
Table 11: Trend model details and uncertainty parameters
Record Length      Trend Equation            Mean Absolute Percentage Error    Mean Absolute Deviation    Mean Squared Deviation
1701 - 1800        Y(t) = 9.31 - 0.003t      4.9%                              0.44                       0.34
1801 - 1900        Y(t) = 9.10 + 0.00036t    5.5%                              0.49                       0.38
1901 - 2000        Y(t) = 9.16 + 0.007t      4.1%                              0.39                       0.24
Complete Series    Y(t) = 8.76 + 0.003t      5.3%                              0.48                       0.37

It must be emphasized nonetheless that the errors shown in Table 11 give error margins only for the record length supplied to Minitab. This means that when extrapolating from one time window to the next, additional error terms must be included, thus increasing uncertainty.
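
The three accuracy measures in Table 11 can be computed from paired observed and fitted values. The sketch below assumes the usual definitions (MAPE as the mean absolute error relative to the observation, in percent; MAD as the mean absolute error; MSD as the mean squared error) and a time index running from t = 1. The temperatures are hypothetical and are not taken from the Central England record; only the trend coefficients are borrowed from the 1901 - 2000 row for illustration.

```python
def accuracy_measures(observed, fitted):
    """Return (MAPE in percent, MAD, MSD) for paired observed and fitted values."""
    n = len(observed)
    errors = [y - f for y, f in zip(observed, fitted)]
    mape = 100.0 * sum(abs(e / y) for e, y in zip(errors, observed)) / n
    mad = sum(abs(e) for e in errors) / n
    msd = sum(e * e for e in errors) / n
    return mape, mad, msd

def linear_trend(intercept, slope, n_steps):
    """Fitted values Y(t) = intercept + slope * t for t = 1..n_steps."""
    return [intercept + slope * t for t in range(1, n_steps + 1)]

# Hypothetical annual mean temperatures (deg C), for illustration only.
observed = [9.0, 9.5, 8.8, 9.6, 9.2]
fitted = linear_trend(intercept=9.16, slope=0.007, n_steps=len(observed))
mape, mad, msd = accuracy_measures(observed, fitted)
print(f"MAPE = {mape:.1f}%, MAD = {mad:.2f}, MSD = {msd:.2f}")
```

These measures describe only how well a trend line fits the window it was fitted to; applying the same line to a different window would require recomputing them against that window's observations.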

References
Machiwel, D. & Jha, M., 2012. Hydrologic Time Series Analysis. New Delhi: Capital Publishing
Company.
Pollock, M. et al., 2013. World Meteorological Organisation. [Online]
Available at: http://www.wmo.int/pages/prog/www/IMOP/publications/IOM-116_TECO2014/Session%203/O3_9_Pollock_Accurate_Rainfall_measurement.pdf
[Accessed 9 December 2014].
