
A tour operator charges Rs 136 per passenger for up to 100 passengers, with a discount of Rs 4 per passenger for every 10 passengers in excess of 100. Determine the number of passengers that will maximize the amount of money the tour operator receives.

Solution: Let x be the number of passengers and R(x) the revenue function. R(x) must be maximized to find the number of passengers that maximizes the amount of money the tour operator receives. When the number of passengers exceeds 100, the fare per passenger is 136 − (4/10)(x − 100), so

R(x) = 136x for x ≤ 100
R(x) = x[136 − (4/10)(x − 100)] = 136x − (2/5)x^2 + 40x = 176x − (2/5)x^2 for x > 100

R(x) is maximum when R′(x) = 0 and R″(x) < 0. Setting R′(x) = 176 − (4/5)x = 0 gives x = 220, and R″(x) = −4/5 < 0, so R(x) is maximum at x = 220. Hence the tour operator receives the maximum money when 220 passengers are travelling.
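As a sanity check, here is a minimal Python sketch (the function name and the scan range 1–340 are my own choices) that evaluates the piecewise revenue function and confirms the maximum at x = 220:

    # Numerical check of the revenue maximization above.
    # R(x) = 136x for x <= 100, R(x) = x*(136 - 0.4*(x - 100)) for x > 100.

    def revenue(x: int) -> float:
        """Total revenue for x passengers under the operator's pricing."""
        if x <= 100:
            return 136.0 * x
        return x * (136.0 - 0.4 * (x - 100))

    # Scan a sensible range of passenger counts and pick the maximizer.
    best = max(range(1, 341), key=revenue)
    print(best, revenue(best))  # 220 19360.0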

Calculate Bowley's coefficient of skewness (based on quartiles) from the following data:

Weight (lbs)   No. of students
70-80          12
80-90          18
90-100         35
100-110        42
110-120        50
120-130        45
130-140        20
140-150        8

Solution:

Weight (lbs)   No. of students   Cumulative frequency
70-80          12                12
80-90          18                30
90-100         35                65
100-110        42                107
110-120        50                157
120-130        45                202
130-140        20                222
140-150        8                 230

Computation of Q_1: Since N/4 = 230/4 = 57.5, the first quartile class is 90-100. Thus l_(Q_1) = 90, C = 30, f_(Q_1) = 35 and h = 10. Therefore

Q_1 = l_(Q_1) + (N/4 − C)/f_(Q_1) × h = 90 + (57.5 − 30)/35 × 10 = 97.85

Computation of the median M_d (Q_2): Since N/2 = 230/2 = 115, the median class is 110-120. Thus l_m = 110, C = 107, f_m = 50 and h = 10. Therefore

Q_2 = l_m + (N/2 − C)/f_m × h = 110 + (115 − 107)/50 × 10 = 111.6

Computation of Q_3: Since 3N/4 = (3 × 230)/4 = 172.5, the third quartile class is 120-130. Thus l_(Q_3) = 120, C = 157, f_(Q_3) = 45 and h = 10. Therefore

Q_3 = l_(Q_3) + (3N/4 − C)/f_(Q_3) × h = 120 + (172.5 − 157)/45 × 10 = 123.44

Hence, Bowley's coefficient is

S_Q = (Q_3 + Q_1 − 2Q_2)/(Q_3 − Q_1) = (123.44 + 97.85 − 2 × 111.6)/(123.44 − 97.85) = −1.91/25.59 = −0.075
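The quartile interpolation above is easy to mechanize. Below is a small Python sketch (names are my own) that recomputes Q_1, Q_2, Q_3 and Bowley's coefficient from the grouped table:

    # Recomputing the quartiles and Bowley's coefficient from the grouped data.
    # Class intervals are taken as continuous with width h = 10.

    classes = [(70, 12), (80, 18), (90, 35), (100, 42),
               (110, 50), (120, 45), (130, 20), (140, 8)]  # (lower bound, frequency)
    h = 10
    N = sum(f for _, f in classes)

    def quartile(p: float) -> float:
        """Interpolated quantile at cumulative position p*N."""
        target = p * N
        cum = 0
        for lower, f in classes:
            if cum + f >= target:
                return lower + (target - cum) / f * h
            cum += f
        raise ValueError("position out of range")

    q1, q2, q3 = quartile(0.25), quartile(0.50), quartile(0.75)
    s_q = (q3 + q1 - 2 * q2) / (q3 - q1)
    print(round(q1, 2), round(q2, 2), round(q3, 2), round(s_q, 3))
    # 97.86 111.6 123.44 -0.074  (-0.075 with the hand-rounded quartiles)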

A normal curve has mean = 20 and standard deviation = 10. Find the area between x_1 = 15 and x_2 = 40.

Solution: Given mean X̄ = 20 and standard deviation σ = 10, with x_1 = 15 and x_2 = 40:

Z_1 = SNV (standard normal variate) corresponding to 15 = (x_1 − X̄)/σ = (15 − 20)/10 = −5/10 = −0.5
Z_2 = SNV corresponding to 40 = (x_2 − X̄)/σ = (40 − 20)/10 = 20/10 = +2.0

Required area = area between Z = −0.5 and Z = 0, plus area between Z = 0 and Z = +2 = 0.1915 + 0.4772 = 0.6687
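The same area can be read off a normal table or computed directly; a short scipy sketch (assuming scipy is available) confirms the result:

    from scipy.stats import norm

    # Area between x1 = 15 and x2 = 40 for a normal curve with mean 20, sd 10.
    mean, sd = 20, 10
    area = norm.cdf(40, loc=mean, scale=sd) - norm.cdf(15, loc=mean, scale=sd)
    print(round(area, 4))  # 0.6687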

A simple random sample of the heights of 6400 Englishmen has a mean of 67.85 and a standard deviation of 2.56, while a simple random sample of the heights of 1600 Austrians has a mean of 68.55 and a standard deviation of 2.52. Do the data indicate that the Austrians are on average taller than the Englishmen? Give reasons for your answer.

Solution: We test H_0: μ_1 = μ_2 against H_1: μ_1 > μ_2, where μ_1 and μ_2 denote the mean heights of Austrians and Englishmen respectively.

Given: Austrians n_1 = 1600, x̄_1 = 68.55, S_1 = 2.52; Englishmen n_2 = 6400, x̄_2 = 67.85, S_2 = 2.56.

The standard error of the difference of means is

σ_(x̄_1 − x̄_2) = √(S_1^2/n_1 + S_2^2/n_2) = √((2.52)^2/1600 + (2.56)^2/6400)

We compute

Z = (x̄_1 − x̄_2)/σ_(x̄_1 − x̄_2) = (68.55 − 67.85)/√((2.52)^2/1600 + (2.56)^2/6400) = 9.9

Taking α = 0.05 as the significance level, the one-tailed critical value is Z_α = 1.645. Since the calculated Z = 9.9 > Z_α, H_0 is rejected: the Austrians are on average taller than the Englishmen.
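A short Python sketch of the same two-sample Z computation (variable names are mine; scipy is used only for the tail probability):

    from math import sqrt
    from scipy.stats import norm

    # Two-sample Z-test for the height data above.
    n1, x1, s1 = 1600, 68.55, 2.52   # Austrians
    n2, x2, s2 = 6400, 67.85, 2.56   # Englishmen

    se = sqrt(s1**2 / n1 + s2**2 / n2)
    z = (x1 - x2) / se
    p = norm.sf(z)                   # one-tailed p-value for H1: mu1 > mu2
    print(round(z, 1), p)            # 9.9, p is essentially 0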

Write short notes on: One-tail & two-tail tests

Solution:

One-tailed test: If we are using a significance level of 0.05, a one-tailed test allots our entire alpha to testing the statistical significance in the one direction of interest. This means that 0.05 is in one tail of the distribution of our test statistic. When using a one-tailed test, we are testing for the possibility of the relationship in one direction and completely disregarding the possibility of a relationship in the other direction. For example, suppose we compare the mean of a sample to a given value x using a t-test, with the null hypothesis that the mean is equal to x. A one-tailed test will test either whether the mean is significantly greater than x or whether it is significantly less than x, but not both. Then, depending on the chosen tail, the mean is significantly greater than (or less than) x if the test statistic is in the top 5% (or bottom 5%) of its probability distribution, resulting in a p-value less than 0.05. The one-tailed test provides more power to detect an effect in one direction by not testing the effect in the other direction.

Two-tailed test: If we are using a significance level of 0.05, a two-tailed test allots half of our alpha to testing the statistical significance in one direction and half to testing it in the other direction. This means that 0.025 is in each tail of the distribution of our test statistic. When using a two-tailed test, regardless of the direction of the hypothesized relationship, we are testing for the possibility of the relationship in both directions. For example, we may wish to compare the mean of a sample to a given value x using a t-test, with the null hypothesis that the mean is equal to x. A two-tailed test will test both whether the mean is significantly greater than x and whether it is significantly less than x. The mean is considered significantly different from x if the test statistic is in the top 2.5% or bottom 2.5% of its probability distribution, resulting in a p-value less than 0.05. A comparison of the two p-values is sketched in the code below.
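To make the distinction concrete, here is a small scipy sketch (the sample data is made up, and the alternative= keyword assumes scipy ≥ 1.6) comparing the two p-values for the same one-sample t-test:

    import numpy as np
    from scipy.stats import ttest_1samp

    # One- vs two-tailed p-values for a t-test against a hypothesized mean of 5.
    sample = np.array([5.1, 5.6, 4.9, 5.8, 5.3, 5.7])

    t_stat, p_two = ttest_1samp(sample, popmean=5.0)                        # two-tailed
    p_one = ttest_1samp(sample, popmean=5.0, alternative='greater').pvalue  # one-tailed
    print(t_stat, p_two, p_one)  # here the one-tailed p is half the two-tailed p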

Moving average models

Solution: In time series analysis, the moving-average (MA) model is a common approach for modeling univariate time series. The notation MA(q) refers to the moving-average model of order q:

X_t = μ + ε_t + θ_1 ε_(t−1) + ... + θ_q ε_(t−q)

where μ is the mean of the series, θ_1, ..., θ_q are the parameters of the model, and ε_t, ε_(t−1), ... are white noise error terms. The value of q is called the order of the MA model. That is, a moving-average model is conceptually a linear regression of the current value of the series against previous (unobserved) white noise error terms or random shocks. The random shocks at each point are assumed to come from the same distribution, typically a normal distribution, with location at zero and constant scale. The distinction in this model is that these random shocks are propagated to future values of the time series. Fitting the MA estimates is more complicated than with autoregressive models (AR models) because the error terms are not observable, which means that iterative non-linear fitting procedures must be used in place of linear least squares. MA models also have a less obvious interpretation than AR models. Sometimes the autocorrelation function (ACF) and partial autocorrelation function (PACF) will suggest that an MA model would be a better model choice, and sometimes both AR and MA terms should be used in the same model. Note, however, that the error terms after the model is fit should be independent and follow the standard assumptions for a univariate process: random drawings from a fixed distribution with fixed location and fixed variation. The moving-average model is essentially a finite impulse response filter applied to white noise, with some additional interpretation placed on it.

Standard error of the slope

Solution: In statistics, the parameters of a linear mathematical model can be determined from experimental data using a method called linear regression. This method estimates the parameters of an equation of the form y = mx + b (the standard equation for a line) from experimental data. However, as with most statistical models, the model will not exactly match the data; therefore some parameters, such as the slope, carry an error (or uncertainty). The standard error is one way of measuring this uncertainty, and it can be computed in a few short steps (a worked sketch in code follows the steps).

1. Find the sum of squared residuals (SSR) for the model. This is the sum of the squared differences between each individual data point and the point that the model predicts. For example, if the data points were 2.7, 5.9 and 9.4 and the points predicted by the model were 3, 6 and 9, then squaring the difference at each point gives 0.09 (found by subtracting 3 from 2.7 and squaring the result), 0.01 and 0.16, respectively. Adding these together gives 0.26.
2. Divide the SSR by the number of observations minus two, then take the square root. In this example there are three observations, and subtracting two gives one; dividing the SSR of 0.26 by one gives 0.26, and its square root is 0.51. Call this result A.
3. Determine the sum of squared deviations of the independent variable about its mean (sometimes loosely called the explained sum of squares, ESS), and take its square root. For example, if the data points were measured at intervals of 1, 2 and 3 seconds, the mean is 2; subtracting the mean from each value and squaring gives 1, 0 and 1, which sum to 2, and the square root of 2 is 1.41. Call this result B.
4. Divide result A by result B. Concluding the example, dividing 0.51 by 1.41 gives 0.36. This is the standard error of the slope.
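Here is the same computation as a short Python sketch following the steps above (variable names are my own):

    from math import sqrt

    # Standard error of the slope, following the worked example.
    y_obs  = [2.7, 5.9, 9.4]   # measured values
    y_pred = [3.0, 6.0, 9.0]   # values predicted by the fitted line
    x      = [1.0, 2.0, 3.0]   # independent variable

    ssr = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))    # 0.26
    result_a = sqrt(ssr / (len(y_obs) - 2))                   # ~0.51
    x_mean = sum(x) / len(x)
    s_xx = sum((xi - x_mean) ** 2 for xi in x)                # 2.0
    result_b = sqrt(s_xx)                                     # ~1.41
    print(result_a / result_b)                                # ~0.36, the SE of the slope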

5. Write short notes on:

a) One-tail & two-tail tests

What is the difference between one-tailed and two-tailed tests? When you conduct a test of statistical significance, whether from a correlation, an ANOVA, a regression or some other kind of test, you are given a p-value somewhere in the output. If your test statistic is symmetrically distributed, you can select one of three alternative hypotheses: two of these correspond to one-tailed tests and one corresponds to a two-tailed test. However, the p-value presented is (almost always) for a two-tailed test. How do you choose which test? Is the p-value appropriate for your test? And if it is not, how can you calculate the correct p-value for your test given the p-value in your output?

Suppose we have a null hypothesis H0 and an alternative hypothesis H1. We consider the distribution given by the null hypothesis and perform a test to determine whether or not the null hypothesis should be rejected in favour of the alternative hypothesis. There are two different types of test. A one-tailed test looks for an increase or a decrease in the parameter, whereas a two-tailed test looks for any change in the parameter, whether an increase or a decrease. We can perform the test at any level (usually 1%, 5% or 10%). For example, performing the test at the 5% level means that there is a 5% chance of wrongly rejecting H0. If we perform the test at the 5% level and decide to reject the null hypothesis, we say "there is significant evidence at the 5% level to suggest the hypothesis is false".

One-tailed test: We choose a critical region. In a one-tailed test, the critical region has just one part. If our sample value lies in this region, we reject the null hypothesis in favour of the alternative. Suppose we are looking for a definite decrease; then the critical region lies in the left tail of the distribution. Note, however, that in such a one-tailed test the value of the parameter can be as high as you like without triggering rejection. Example:

Suppose we are given that X has a Poisson distribution and we want to carry out a hypothesis test on the mean, λ, based upon a sample observation of 3. Suppose the hypotheses are:

H0: λ = 9
H1: λ < 9

We want to test whether it is "reasonable" for the observed value of 3 to have come from a Poisson distribution with parameter 9. So what is the probability that a value as low as 3 has come from a Po(9)? From a Poisson table, P(X ≤ 3) = 0.0212. The probability is less than 0.05, so there is less than a 5% chance that the value has come from a Po(9) distribution. We therefore reject the null hypothesis in favour of the alternative at the 5% level. However, the probability is greater than 0.01, so we would not reject the null hypothesis in favour of the alternative at the 1% level.

Two-tailed test: In a two-tailed test, we are looking for either an increase or a decrease. So, for example, H0 might again be that the mean is equal to 9; this time, however, H1 would be that the mean is not equal to 9. In this case the critical region has two parts, one in each tail. Example: let us test the parameter p of a binomial distribution at the 10% level. Suppose a coin is tossed 10 times and we get 7 heads. We want to test whether or not the coin is fair. If the coin is fair, p = 0.5, so put this as the null hypothesis:

H0: p = 0.5
H1: p ≠ 0.5

Because the test is two-tailed, the critical region has two parts: half to the right and half to the left. So the critical region contains both the top 5% and the bottom 5% of the distribution (since we are testing at the 10% level). If H0 is true, X ~ Bin(10, 0.5). If the null hypothesis is true, what is the probability that X is 7 or above?

P(X ≥ 7) = 1 − P(X < 7) = 1 − P(X ≤ 6) = 1 − 0.8281 = 0.1719

Is this in the critical region? No, because the probability that X is at least 7 is not less than 0.05 (5%), which is what we would need. So there is not significant evidence at the 10% level to reject the null hypothesis.
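Both tail probabilities quoted above can be checked directly with scipy (assuming it is available):

    from scipy.stats import poisson, binom

    # One-tailed Poisson test: P(X <= 3) under H0: mean = 9.
    print(round(poisson.cdf(3, mu=9), 4))      # 0.0212

    # Two-tailed binomial test: P(X >= 7) for X ~ Bin(10, 0.5).
    print(round(binom.sf(6, n=10, p=0.5), 4))  # 0.1719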

b) Moving average models

In time series analysis, the moving-average (MA) model is a common approach for modeling univariate time series, as discussed in the note above: an MA(q) model is conceptually a linear regression of the current value of the series against the q most recent (unobserved) white noise shocks. It is useful to place the MA model alongside its autoregressive relatives.

Autoregressive model: The notation AR(p) refers to the autoregressive model of order p. The AR(p) model is written

X_t = c + φ_1 X_(t−1) + ... + φ_p X_(t−p) + ε_t

where φ_1, ..., φ_p are parameters, c is a constant, and the random variable ε_t is white noise. An autoregressive model is essentially an all-pole infinite impulse response filter with some additional interpretation placed on it. Some constraints are necessary on the values of the parameters of this model in order that the model remains stationary. For example, processes in the AR(1) model with |φ_1| ≥ 1 are not stationary.

Moving-average model: The notation MA(q) refers to the moving-average model of order q:

X_t = μ + ε_t + θ_1 ε_(t−1) + ... + θ_q ε_(t−q)

where θ_1, ..., θ_q are the parameters of the model, μ is the expectation of X_t (often assumed to equal 0), and ε_t, ε_(t−1), ... are, again, white noise error terms. The moving-average model is essentially a finite impulse response filter with some additional interpretation placed on it.

Autoregressive moving-average model: The notation ARMA(p, q) refers to the model with p autoregressive terms and q moving-average terms. This model contains the AR(p) and MA(q) models:

X_t = c + ε_t + φ_1 X_(t−1) + ... + φ_p X_(t−p) + θ_1 ε_(t−1) + ... + θ_q ε_(t−q)

The general ARMA model was described in the 1951 thesis of Peter Whittle, who used mathematical analysis (Laurent series and Fourier analysis) and statistical inference.[1][2] ARMA models were popularized by a 1971 book by George E. P. Box and Gwilym Jenkins, who expounded an iterative (Box-Jenkins) method for choosing and estimating them. This method was useful for low-order polynomials (of degree three or less).[3]

Moving Averages and Time-series Forecasting: One of the well-known approaches to forecasting is the use of moving averages. But what is a moving average, and what effect does it have on a time series? A moving average (MA) is a mathematical sum carried over the time series. In general the MA is a weighted MA, in the sense that each term of the sum bears a weight, so that the sum becomes a weighted sum. In mathematical terms, an m-period MA of the time series y with weighting coefficient w_s for lag s is the expression

z_t = Σ_(s=0)^(m−1) w_s y_(t−s)

This is the most general equation for an MA, whatever the subtype; it is valid for exponential moving averages, triangular MAs, parabolic MAs, and so on, the only thing that varies being how the weights are calculated. If the weighting coefficients are uniform, that is, w_s = 1/m for every 0 ≤ s < m, we obtain the classical and overused simple moving average. A sketch of both variants in code follows.
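Here is a minimal numpy sketch of both variants (the toy series and weight choices are my own; np.convolve implements the weighted sum z_t = Σ w_s y_(t−s)):

    import numpy as np

    # Simple and weighted moving averages of a toy series.
    y = np.random.default_rng(0).normal(size=200).cumsum()   # toy random-walk series

    m = 12
    w_simple = np.ones(m) / m                                # uniform weights: the SMA
    z = np.convolve(y, w_simple, mode='valid')               # m-period simple moving average

    # Any other weighting scheme just changes w, e.g. linearly decaying weights:
    w_lin = np.arange(m, 0, -1, dtype=float)
    w_lin /= w_lin.sum()
    z_weighted = np.convolve(y, w_lin, mode='valid')
    print(z[:3], z_weighted[:3])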
This very simple yet effective algorithm has been used for ages in forecasting, and even today, with so much computing power available, these (or variations of these) are the algorithms used everywhere to produce forecasts and predictions in every field. But what are the statistical effects of applying the moving-average algorithm to a time series?

Statistical Effects of the Moving Average and the Convolution Theorem

To explain the statistical effects of the MA we need to introduce a little mathematics that may not be familiar to everyone. Consider the Discrete Fourier Transform (DFT) of the original time series y:

Y(ω) = Σ_t y_t exp(−iωt)

and suppose the original time series is replaced with an m-period MA over past values as defined before, z_t = Σ_(s=0)^(m−1) w_s y_(t−s). The DFT of the resulting time series is then, by the Convolution Theorem,

Z(ω) = Σ_t z_t exp(−iωt) = W(ω) Y(ω), where W(ω) = Σ_(s=0)^(m−1) w_s exp(−iωs)

If we use the uniform weighting introduced in the previous paragraph (i.e. the simple moving average), this becomes, after some algebra,

W(ω) = (1/m) Σ_(s=0)^(m−1) exp(−iωs) = (1/m) exp(−iω(m−1)/2) sin(mω/2) / sin(ω/2)

The value of this expression at ω = 2π/m is zero. Thus, taking an m-period simple moving average of the time series completely destroys the evidence for an m-period periodicity. So if we take a 12-month simple moving average of our time series, this will completely destroy the evidence of a yearly periodicity in the smoothed series. This is not evident until we write out some basic mathematics as we did here; and if you analyze the consequences of the Convolution Theorem further, you will notice other frequencies where the periodicity is distorted, not only in amplitude as in this case but also in phase. This effect is often ignored, and many practitioners who have not studied the underlying mathematics still use this algorithm (or relatives such as exponential moving averages) for time-series prediction. Often the only thing accomplished by a simple moving average is to make the graph of the time series look better to the human eye while making the series less usable by a computer program.
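A quick numerical check of this claim in numpy (m = 12, evaluating the filter gain W at ω = 2π/m):

    import numpy as np

    # Gain |W(omega)| of an m-period simple moving average at the
    # frequency omega = 2*pi/m, i.e. at an m-period cycle.
    m = 12
    s = np.arange(m)
    omega = 2 * np.pi / m
    W = np.sum(np.exp(-1j * omega * s)) / m
    print(abs(W))   # ~0: a 12-period SMA annihilates a 12-period cycle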

The Moving Average (Time Series) function returns the moving average of a field over a given period of time based on linear regression.

Parameters:
- Data: the data to use in the regression. This is typically a field in a data series or a calculated value.
- Period: the number of bars of data to include in the regression, including the current value. For example, a period of 3 includes the current value and the two previous values.

Function value: the time-series moving average is calculated by fitting a linear regression line over the values for the given period, and then determining the current value of that line. A linear regression line is a straight line that is as close as possible to all of the given values. The time-series moving average at the beginning of a data series is not defined until there are enough values to fill the given period. Note that a time-series moving average differs greatly from other types of moving averages in that the current value follows the recent trend of the data rather than being an actual average of the data. Because of this, the value of this function can be greater or less than all of the values being used if the trend of the data is generally increasing or decreasing.

Usage: moving averages are useful for smoothing noisy raw data, such as daily prices. Price data can vary greatly from day to day, obscuring whether the price is going up or down over time. By looking at the moving average of the price, a more general picture of the underlying trend can be seen. Since moving averages can be used to see trends, they can also be used to see whether data is bucking the trend. Entry/exit systems often compare data to a moving average to determine whether it is supporting a trend or starting a new one. This function is the same as the Linear Regression Indicator, and the same as the Time Series Forecast with an offset of zero.
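A sketch of this indicator in Python (the function name and the toy price series are mine; numpy's polyfit stands in for whatever regression routine a charting package would use):

    import numpy as np

    # Time-series moving average: fit a linear regression over each trailing
    # window and take the fitted value at the window's last point.
    def ts_moving_average(data: np.ndarray, period: int) -> np.ndarray:
        out = np.full(len(data), np.nan)   # undefined until a full window exists
        t = np.arange(period)
        for i in range(period - 1, len(data)):
            window = data[i - period + 1 : i + 1]
            slope, intercept = np.polyfit(t, window, deg=1)
            out[i] = slope * (period - 1) + intercept   # regression value at the current bar
        return out

    prices = np.array([10.0, 10.5, 10.2, 11.0, 11.4, 11.1, 11.8])
    print(ts_moving_average(prices, period=3))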

c) Standard error of the slope

In any area of measurement science there is always some error in any signal. The error can arise from many sources and can normally be accounted for using statistical techniques. However, because measurement is inherently random, it contributes some degree of uncertainty to the result, which corresponds to a certain confidence limit within which we can be fairly certain about the accuracy of our measurement. This leads to the way results are normally displayed: a measurement is reported with its estimated error, such as C = 51.2 ± 0.5 μg/ml. The ±0.5 is the error, normally 2.58 standard deviations (the 2.58 will be explained below). When preparing a calibration curve, there is always some degree of uncertainty in the calibration equation, in both the slope and the y-intercept. To calculate the standard errors of the slope and the intercept, we require the residuals. The residual is the difference between each measured y-value and the value calculated from the calibration curve for that observation. The calculated y-value, determined from the calibration equation, is denoted ŷ_i, so the residual is y_i − ŷ_i. Once the residuals are known, we can calculate the standard deviation of y, which is a measure of the random error in the y-values:

s_(y/x) = √( Σ (y_i − ŷ_i)^2 / (n − 2) )

The n − 2 in the denominator is the number of degrees of freedom. Normally there is a degree of freedom for each data point; however, we are making two assumptions in this equation: (a) that the sample population is representative of the entire population, and (b) that the ŷ_i are representative of the true y-values. For each assumption we make, we must remove a degree of freedom, and our estimated standard deviation becomes larger. This s_(y/x) can be used to calculate the standard deviations of the slope and the y-intercept using the formulas

s_b = s_(y/x) / √( Σ (x_i − x̄)^2 )    and    s_a = s_(y/x) √( Σ x_i^2 / (n Σ (x_i − x̄)^2) )

where s_b is the standard deviation of the slope and s_a is the standard deviation of the y-intercept. Confidence intervals for the slope and intercept are calculated from the t-statistic for n − 2 degrees of freedom. Tables of t-statistics are available in any statistics textbook and are also included in the lab manual. Note that some tables give values of t for different values of n, while others give them for values of ν = n − 1; check carefully so that you use the appropriate value. The confidence limit for the slope is b ± t_(n−2) s_b, and for the y-intercept it is a ± t_(n−2) s_a. For a large number of samples with a 99% confidence interval, we can use t_(n−2) = 2.58. For the fluorescence data, the standard deviation of the slope is s_b = 0.0350, so the slope with its confidence interval is b = 1.88 ± (2.58 × 0.0350) = 1.88 ± 0.09. The y-intercept with its confidence interval is a = 1.8 ± 0.9.
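As a quick illustration, scipy's linregress reports both standard errors directly. A minimal sketch with made-up calibration data (the intercept_stderr attribute assumes scipy ≥ 1.7):

    import numpy as np
    from scipy import stats

    # Standard errors and 99% confidence intervals for a calibration line,
    # using made-up data (x = concentration, y = signal).
    x = np.array([0.0, 2.0, 4.0, 6.0, 8.0, 10.0])
    y = np.array([1.9, 5.6, 9.2, 13.1, 16.9, 20.5])

    res = stats.linregress(x, y)
    n = len(x)
    t_crit = stats.t.ppf(0.995, df=n - 2)          # two-sided 99% interval

    print(f"b = {res.slope:.3f} +/- {t_crit * res.stderr:.3f}")
    print(f"a = {res.intercept:.3f} +/- {t_crit * res.intercept_stderr:.3f}")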
