Statistic PROJECT

MBA 605
ESSENTIAL ANALYTICAL METHODS FOR BUSINESS
9/29/2010
ROLL NO 431(SHJ)
JAVED MOHAMMEDHANIF BUMLAJIWALA
Question 1
Why study statistics? Describe how statistics might help and/or
benefit you in your current or future job.
• Statistics is the science (and, arguably, also the art!) of learning
from data. As a discipline it is concerned with the collection,
analysis, and interpretation of data, as well as the effective
communication and presentation of results relying on data.
Statistics lies at the heart of the type of quantitative reasoning
necessary for making important advances in the sciences, such
as medicine and genetics, and for making important decisions in
business and public policy.
Statistics is a very pragmatic field, investigating real problems in the
practical world. It has applications in:
• Bioinformatics;
• Biology (biostatistics or biometrics);
• Climatology;
• Computing or computer science (statistical computing is a highly
sought-after skill);
• Economics (econometrics);
• Finance (financial statistics);
• Psychology (psychometrics);
• Physics (statistical physics is a modern discipline in physics);
• The health industry (medical statistics).
• Statistics are used by all industries and businesses as a
standardized unit of measurement for presenting data in a useful
and meaningful format. Statistics can be used to measure
historical performance and to forecast future targets. For
business managers and leaders, statistics provide insight into
how business units are performing relative to an organization's
goals and objectives. Statistics also forecast future trends and
are used in all areas of human thought and endeavor for planning
purposes.
Statistics are used in all areas of trade and commerce. Governments
frequently undertake studies for the purposes of formulating policy, and
businesses use data to identify what is currently working and what is
not. Business initiatives are altered to improve organizational
performance based on feedback from statistical studies. Advertising
companies use statistics to assess target markets and formulate
campaigns. Marketers use statistics to identify opportunities for
business development.
Statistics helps me create performance charts for my team and other
departments. In the banking industry we usually track the daily
performance of our team and of other departments, and we prepare many
MIS reports for presentation to the higher management of our
department.
Statistics helps us show our performance data in many graphical forms,
which give a clear and fast picture of our progress toward our vision
and mission.
We use many statistical formulas, which greatly reduce our workload and
help us improve our performance. They let us summarize the data
concisely so that it can be analyzed and presented. Some of the
formulas that help us calculate our daily wages and performance
percentages are listed below.
• COUNT - Counts the number of cells that contain numbers and
also numbers within the list of arguments. Use COUNT to get the
number of entries in a number field that's in a range or array of
numbers.
• COUNTBLANK - Counts empty cells in a specified range of cells.
• EXPONDIST - Returns the exponential distribution. Use
EXPONDIST to model the time between events, such as how long
an automated bank teller takes to deliver cash. For example, you
can use EXPONDIST to determine the probability that the
process takes at most 1 minute.
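As a rough illustration of what these three spreadsheet functions compute, the sketch below re-creates them in Python. The function names, the sample cells and the teller rate of 2 withdrawals per minute are assumptions made for this example only; they are not taken from our reports.

```python
import math

def count_numbers(cells):
    """Like COUNT: how many cells hold a number (text and blanks are ignored)."""
    return sum(
        isinstance(c, (int, float)) and not isinstance(c, bool) for c in cells
    )

def count_blank(cells):
    """Like COUNTBLANK: how many cells are empty (None or an empty string)."""
    return sum(c is None or c == "" for c in cells)

def expon_cdf(x, lam):
    """Like EXPONDIST(x, lambda, TRUE): P(T <= x) for an exponential distribution."""
    return 1.0 - math.exp(-lam * x)

# Hypothetical column of daily recovery amounts with gaps and a text note.
cells = [1200, None, 850.5, "", "n/a", 430]
print(count_numbers(cells))  # 3 numeric cells
print(count_blank(cells))    # 2 blank cells

# Assumed rate: the teller completes 2 withdrawals per minute on average.
# Probability that one withdrawal takes at most 1 minute:
print(round(expon_cdf(1.0, 2.0), 4))  # 1 - e^(-2) = 0.8647
```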
Beyond these formulas, we rely heavily on statistics in many other
ways, such as graphical presentation and other data management
tasks.
Since the advent of the internet, statistics have become important for
online business operators. Whether it be measuring search engine
traffic, assessing product conversion or determining which paid ads are
working, statistics provide the necessary data in a meaningful way to
support strategic decision making. Without the use of statistics,
businesses could not calculate returns on investment and business
decision making would become a hit and miss affair.
The computer industry makes use of statistics to detect emerging
trends and develop products that are in line with consumer
preferences. Without data to support product development,
organizations would have no way of determining changing consumer
preferences and tastes. With competing demands for capital and labor,
organizations rely on data to determine how to best make use of
existing resources and plan for future requirements. Managers rely on
monthly and quarterly statistics to adjust business variables to
improve overall performance.
Surveys are often used by companies to get closer to the target
market. Data is compiled into useful reports for the purposes of
determining consumer preferences, assessing purchasing habits and
analyzing the motivation for buying behavior. Companies rely on this
feedback to narrow down criteria to create unique selling propositions
and build marketing campaigns. Without this data, companies would
face difficulties in providing goods and services that solve community
and organizational problems.
Question 2
Discuss the limitations of using correlation as an analytical
technique. How does it compare to regression analysis?
The correct use of the coefficient of correlation depends heavily on the
assumptions made with respect to the nature of the data to be correlated
and on understanding the principles of forming this index of
association. Correlation is a central measure within the general linear
model of statistics. It can be employed for measurement of
relationships in countless applied settings. However, in situations
where its assumptions are violated, correlation becomes inadequate to
explain a given relationship. These assumptions mandate that the
distributions of both variables related by the coefficient of correlation
should be normal and that the scatter plots should be linear and
homoscedastic. In diagrams of data typical of various magnitudes of the
coefficient of correlation (positive correlation, zero correlation and
negative correlation), one may notice that the assumption of linearity
pertains to the main axis of the ellipse enclosing the data points. Its
main axis should be approximately linear. The assumption of
homoscedasticity pertains to the secondary axis of this ellipse. The
width of the ellipse should be approximately constant along the length
of the main axis. To the extent that any of these assumptions are
violated, the coefficient of correlation does not correctly reflect the
relationship.
• Correlation and regression analysis are related in the sense that
both deal with relationships among variables. The correlation
coefficient is a measure of linear association between two
variables. Values of the correlation coefficient are always
between -1 and +1. A correlation coefficient of +1 indicates that
two variables are perfectly related in a positive linear sense; a
correlation coefficient of -1 indicates that two variables are
perfectly related in a negative linear sense, and a correlation
coefficient of 0 indicates that there is no linear relationship
between the two variables. For simple linear regression, the
sample correlation coefficient is the square root of the coefficient
of determination, with the sign of the correlation coefficient
being the same as the sign of b1, the coefficient of x1 in the
estimated regression equation.
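The link between the two techniques can be checked numerically. The sketch below is only an illustration: the data values and the function names are made up for this example. It computes the sample correlation coefficient r and the least squares line, then confirms that r equals the square root of the coefficient of determination with the sign of b1.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def fit_line(xs, ys):
    """Least squares intercept b0, slope b1 and coefficient of determination."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1.0 - sse / syy
    return b0, b1, r_squared

# Hypothetical data with a strong negative linear relationship.
x = [1, 2, 3, 4, 5]
y = [9.8, 8.1, 6.2, 3.9, 2.0]

r = pearson_r(x, y)
b0, b1, r_squared = fit_line(x, y)

# r carries the sign of the slope b1 and its magnitude is sqrt(R^2).
print(r, math.copysign(math.sqrt(r_squared), b1))
```

A value of r close to -1 here says only that the points lie near a falling straight line; it does not say that x causes y.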
Neither regression nor correlation analyses can be interpreted as
establishing cause-and-effect relationships. They can indicate only how
or to what extent variables are associated with each other. The
correlation coefficient measures only the degree of linear association
between two variables. Any conclusions about a cause-and-effect
relationship must be based on the judgment of the analyst.
Question 3
1. List different types of forecasting models, and in general
terms, explain the possible components of time-series data.
• Different types of forecasting models
First there are the plain old classics: autoregressive, moving
average, (double, triple) exponential smoothing, Box-Jenkins,
Holt-Winters, ARMA, ARIMA ... Those models typically handle neither
multi-series nor tags or events; yet simplicity is king in many
situations. Don't discard the moving average just because it looks too
simple to be good.
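Two of the classics named above are simple enough to sketch in a few lines. The code below is a minimal illustration under stated assumptions: the sales figures, the window of 3 and the smoothing factor of 0.5 are invented for this example.

```python
def moving_average_forecast(series, window):
    """Forecast the next value as the mean of the last `window` observations."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return sum(series[-window:]) / window

def exponential_smoothing_forecast(series, alpha):
    """Simple exponential smoothing; returns the one-step-ahead forecast."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # start the smoothed level at the first observation
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

# Hypothetical monthly sales figures.
sales = [120, 130, 125, 140, 135, 150]

ma = moving_average_forecast(sales, 3)            # mean of 140, 135, 150
ses = exponential_smoothing_forecast(sales, 0.5)  # recent months weigh more
print(ma, ses)
```

The moving average gives equal weight to the last few observations, while exponential smoothing lets the weights fade gradually, so older months still count a little.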
Then, for more advanced models, I'd rather speak of approaches than
models. Indeed, the more complex the model, the more latitude is left
to the mathematician to tweak in subtle ways the behavior of the
forecasting model.
The Bayesian approach: establishing graphs of relationships is
especially useful in the context of many areas where we exploit
correlations between time-series. It’s also useful in order to deal with
tags and events.
The large-margin approach: Support Vector Machines (SVM) have
become incredibly popular these days. However, as far as time series
are concerned, it is rather Support Vector Regression (SVR) that is the
most useful for us. As a minor drawback, SVM and SVR are typically
quite expensive in terms of raw processing power.
The mixture / boosting approach: mixing loads of simple predictors
in order to improve the overall forecast works well. The combination of
a large number of simple predictors can be used to reflect really
complex behaviors.
The meta-heuristic approach: genetic algorithms, neural networks,
genetic programming and other evolutionary / adaptive approaches.
Those approaches are powerful but also notorious for their
intrinsic sensitivity to many tuning parameters.
As a final note, our technology is still undergoing fast-paced
evolution. New models get put in production every month or so. This
list isn't definitive and cloud computing is actually creating a lot of
opportunities for us to push models that were just too expensive in the
past.
• Components of Time series
1. Secular trend
2. Seasonal variation
3. Cyclical variation
4. Irregular variation
Secular trend: Time series data may show an upward trend or a
downward trend over a period of years, and this may be due to factors
like an increase in population, technological progress, a large-scale
shift in consumer demand, etc. For example, population
increases over a period of time, prices increase over a period of years,
and production of goods on the capital market of the country increases
over a period of years. These are examples of an upward trend. The
sales of a commodity may decrease over a period of time because of
better products coming to the market. This is an example of a declining
or downward trend. The long-term increase or decrease in the movements
of a time series is called the secular trend.
Seasonal variation: Seasonal variations are short-term fluctuations in a
time series which occur periodically within a year and repeat
year after year. The major factors that are responsible for the
repetitive pattern of seasonal variations are weather conditions and
the customs of people. More woolen clothes are sold in winter than in
summer. Regardless of the trend, we can observe that in
each year more ice cream is sold in summer and very little in winter.
The sales in department stores are higher during festive
seasons than on normal days.
Cyclical variations: Cyclical variations are recurrent upward or
downward movements in a time series, but the period of the cycle is
greater than a year. Also, these variations are not as regular as
seasonal variations. There are different types of cycles, varying in
length and size. The ups and downs in business activity are the effects
of cyclical variation. A business cycle showing these oscillatory
movements has to pass through four phases: prosperity, recession,
depression and recovery. In a business, these four phases are completed
by passing from one to another in this order.
Irregular variation: Irregular variations are fluctuations in a time
series that are short in duration, erratic in nature and follow no
regular pattern of occurrence. These variations are also referred to as
residual variations since by definition they represent what is left in
a time series after the trend, cyclical and seasonal variations have
been removed. Irregular fluctuations result from unforeseen events like
floods, earthquakes, wars, famines, etc.
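To make the components concrete, the sketch below splits a made-up quarterly series into trend, seasonal and irregular parts. It assumes an additive model (series = trend + seasonal + irregular) and estimates the trend with a centered moving average; the cyclical component is not estimated separately here and would be absorbed into the trend. The function name and all figures are hypothetical.

```python
def decompose_additive(series, period):
    """Split a series into trend, seasonal and irregular parts (additive model).

    Assumes an even `period` (e.g. 4 quarters or 12 months) and at least
    two full periods of data.
    """
    n = len(series)
    half = period // 2
    trend = [None] * n
    # Centered moving average: the two end points get half weight each.
    for t in range(half, n - half):
        window = series[t - half : t + half + 1]
        total = sum(window) - 0.5 * (window[0] + window[-1])
        trend[t] = total / period

    # Average the detrended values for each position within the period.
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(series[t] - trend[t])
    raw = [sum(b) / len(b) for b in buckets]
    mean_raw = sum(raw) / period
    seasonal = [r - mean_raw for r in raw]  # seasonal effects sum to zero

    # Whatever trend and season do not explain is the irregular part.
    irregular = [
        None if trend[t] is None else series[t] - trend[t] - seasonal[t % period]
        for t in range(n)
    ]
    return trend, seasonal, irregular

# Hypothetical quarterly sales: upward trend plus a fixed seasonal pattern.
true_seasonal = [-10.0, 5.0, 20.0, -15.0]
series = [100 + 2 * t + true_seasonal[t % 4] for t in range(16)]
series[9] += 8  # a one-off shock standing in for an irregular event

trend, seasonal, irregular = decompose_additive(series, 4)
print([round(s, 2) for s in seasonal])
```

The trend is undefined for the first and last two quarters because the centered average needs observations on both sides.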
Question 4
2. Prepare a good questionnaire which will be suitable for
collecting data from MBA aspirants.
* Why business school?
* Why did you decide to apply to this business school?
* What can you contribute to our program?
* How do you plan to use your degree?
* What are your expectations of this program?
* What makes you stand out among other candidates?
* Where do you see yourself five years down the line?
* Do you expect to reach a top position within a short period of time
after completing the MBA?
* What is the difference between an MBA and a non-MBA qualification
academically?
* Do you believe the MBA can take you to your target stage?
The above is a list of common questions that can be used to collect
data from MBA aspirants.
Question 5
Graphically present some data relevant to your current job.
The data below relates to the collections and recoveries department
for Majid Al Futtaim Najm Card Collections and shows daily recoveries,
day by day.
The full data set covers all delinquent customers with dues of 60 days
and above.
The data represents each team member's performance on a daily basis.
The first (left-hand) column lists the team members in the
collections process.
Allocation – This column shows how many accounts are allocated to
each team member, together with their value.
Target Forward Flow – The next column shows how much flow each
member is allowed to forward to the next level.
Current Forward Flow – The third column shows how much flow each
member is currently sitting on.
Normalise – The last column shows how many customers have paid
their total dues.
Officer  Allocation         Target Forward Flow     Current Forward Flow     Normalise
         AED        Count   AED      Count  %       AED      Count  %        AED        %
ALI      2,353,645  323     188,292  26     8%      171,210  28     7.27%    936,236    39.8%
HEBA     2,376,276  324     190,102  26     8%      168,419  38     7.09%    964,271    40.6%
RAWAN    2,363,982  324     189,119  26     8%      160,402  34     6.79%    960,174    40.6%
PADDY    2,439,618  324     195,169  26     8%      168,098  34     6.89%    871,802    35.7%
SYED     2,401,395  324     192,112  26     8%      246,538  42     10.27%   975,266    40.6%
PAID     5,573,541  857     0        0      0%      0        0      0.00%    3,700,255  66.4%
[Chart of the table above. Vertical axis: 0% to 100%. Series: ALI, HEBA,
RAWAN, PADDY, SYED, PAID. Categories: Allocation (AED, Count), Target
Forward Flow (AED, Count, %), Current Forward Flow (AED, Count, %),
Normalise (AED, %).]
We generate this report on a daily basis to calculate our
performance and to keep the data up to date for audit purposes.
To create this report we apply many statistical formulas to the
underlying raw data sheets, which produce the format shown above.
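The percentage columns in the table can be reproduced from the AED figures. The sketch below assumes that each percentage is the AED amount divided by the officer's allocated AED amount; that reading is not stated in the report itself, but it matches the printed figures. The variable and function names are chosen for this example.

```python
# AED figures copied from the table above:
# officer -> (allocation, target forward flow, current forward flow, normalise)
rows = {
    "ALI":   (2_353_645, 188_292, 171_210, 936_236),
    "HEBA":  (2_376_276, 190_102, 168_419, 964_271),
    "RAWAN": (2_363_982, 189_119, 160_402, 960_174),
    "PADDY": (2_439_618, 195_169, 168_098, 871_802),
    "SYED":  (2_401_395, 192_112, 246_538, 975_266),
    "PAID":  (5_573_541, 0, 0, 3_700_255),
}

def pct(part, whole):
    """Share of `whole` represented by `part`, as a percentage."""
    return 100.0 * part / whole

# Assumption: every percentage is measured against the allocation in AED.
for officer, (alloc, target, current, paid) in rows.items():
    print(
        f"{officer:<6}"
        f" target {pct(target, alloc):5.2f}%"
        f" current {pct(current, alloc):5.2f}%"
        f" normalise {pct(paid, alloc):5.1f}%"
    )
```

Read this way, SYED is the only officer whose current forward flow (10.27%) is above the 8% target.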
Question 6
Compare mean and mode as measures of central tendency.
Mode no doubt possesses the merit of being the most popular item of
a series, and it also has the advantage of easy calculation and common
understandability, yet its drawbacks are too many to be set off against
these merits. Mean is simple to calculate, and its value is definite and
can be easily determined. It is amenable to algebraic treatment and is
usually not affected much by fluctuations of sampling. Mode is hardly
suitable for most elementary studies, as it is correctly
determined only by curve-fitting, which is an extremely difficult
process. It is unrepresentative in many cases, and it is not based on
all the observations of a series. Thus, of these two averages, mean has
definite advantages over mode, though there may be some cases
where mode may be preferred over mean. Mode has its own
importance, which may be the reason for giving its value along with the
mean, but it should be clearly understood that mode cannot replace
mean, and for that matter neither can median. However, this
should not be taken to mean that mode is a superficial average with
no independent virtues. There are certain fields in which mode
may give a better result than the mean, but such cases are few and the
universality of mean cannot be challenged on account of these cases.
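A small numerical comparison may help. The sketch below uses Python's standard statistics module on made-up figures; the data and variable names are assumptions for this example only.

```python
from statistics import mean, mode, multimode

# Hypothetical number of customer calls made per day by one officer.
daily_calls = [20, 22, 22, 22, 25, 27, 95]

avg = mean(daily_calls)         # uses every observation, pulled up by the 95
most_common = mode(daily_calls) # looks only at the most frequent value

print(avg, most_common)

# The mode need not be unique: here two values are tied as most frequent.
print(multimode([1, 1, 2, 2, 3]))
```

The mean responds to every value, including the one unusual day, while the mode ignores everything except the most frequent figure and may not even be unique.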
Question 7
Describe the uses of different measures of dispersion.
Standard Deviation
The standard deviation is kind of the "mean of the mean": it measures
how far the values in a data set typically lie from their mean, and it
can often help you find the story behind the data. To understand this
concept, it helps to learn about what statisticians call the normal
distribution of data.
A normal distribution of data means that most of the examples in a set
of data are close to the "average," while relatively few examples tend
to one extreme or the other.
Let's say you are writing a story about nutrition. You need to look at
people's typical daily calorie consumption. Like most data, the
numbers for people's typical consumption probably will turn out to be
normally distributed. That is, for most people, their consumption will
be close to the mean, while fewer people eat a lot more or a lot less
than the mean.
When you think about it, that's just common sense. Not that many
people are getting by on a single serving of kelp and rice. Or on eight
meals of steak and milkshakes. Most people lie somewhere in
between.
Interquartile Range
The interquartile range is used as a robust measure of scale. That is,
it is an alternative to the standard deviation. The interquartile range
is less affected by extremes than the standard deviation. It is the
measure of scale used by the box plot.
Mean difference
The mean is a commonly used measure of central tendency. When we
talk about an average we are usually referring to the mean. The mean is
simply the sum of the values divided by the total number of items in
the set. The result is referred to as the arithmetic mean. Sometimes it
is more useful to give more weight to certain data points, in which
case the result is called the weighted arithmetic mean.
It is typically used to calculate the average of the available data.
As a measure of dispersion, the mean difference is the average of the
absolute differences between all pairs of values in the set.
Median absolute deviation
The median absolute deviation is a measure of statistical dispersion. It
is a more robust estimator of scale than the sample variance or
standard deviation. It thus behaves better with distributions without a
mean or variance, such as the Cauchy distribution.
For instance, the MAD is a robust statistic, being more resilient to
outliers in a data set than the standard deviation. In the standard
deviation, the distances from the mean are squared, so on average,
large deviations are weighted more heavily, and thus outliers can
heavily influence it. In the MAD, the magnitude of the distances of a
small number of outliers is irrelevant.
Average absolute deviation
In routine experiments involving small samples from a normal
distribution, it is often found to be more convenient to use the sample
mean absolute deviation.
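The different behavior of these measures is easiest to see on a small example. The sketch below uses Python's standard statistics module; the two data sets and the helper names are invented for this illustration, and the quartiles use the module's default method.

```python
from statistics import mean, median, pstdev, quantiles

def iqr(data):
    """Interquartile range: third quartile minus first quartile."""
    q1, _, q3 = quantiles(data, n=4)
    return q3 - q1

def median_abs_dev(data):
    """Median of the absolute deviations from the median (MAD)."""
    m = median(data)
    return median(abs(x - m) for x in data)

def mean_abs_dev(data):
    """Average absolute deviation from the mean."""
    m = mean(data)
    return sum(abs(x - m) for x in data) / len(data)

# Hypothetical samples: the second one has a single extreme value.
base = [10, 12, 13, 14, 15, 16, 18]
with_outlier = [10, 12, 13, 14, 15, 16, 180]

for name, data in (("base", base), ("with outlier", with_outlier)):
    print(
        name,
        "sd:", round(pstdev(data), 2),
        "iqr:", iqr(data),
        "mad:", median_abs_dev(data),
        "mean abs dev:", round(mean_abs_dev(data), 2),
    )
```

With these figures the interquartile range and the MAD stay the same when the outlier is introduced, while the standard deviation and the average absolute deviation both grow sharply.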
Question 8
State uses/Limitations and various stages of construction
of Index numbers.
Index numbers are constructed and used especially in the measurement of
physical production.
Although the movement to measure the physical volume of production
is comparatively new, certain important advances have been made in
this art. More and more data have been discovered and utilized, so that
the more recently constructed indexes rest upon a much broader factual
basis than the earlier ones. Certain pitfalls in the mathematical
mechanics of index number construction have been recognized and to some
extent avoided.
The main limitation arises when data are lacking. The primary
requisite for the construction of reliable index numbers is the
availability of complete and accurate data. Complete coverage of all
data, although preferable, may not be absolutely necessary, but at
least a large enough sample must be obtained to be representative of
the type of physical production being measured.
At first the indexes were constructed only on an annual basis, but
shortly the time interval was reduced to a monthly and then a weekly
basis. To make the monthly and weekly indexes comparable among
themselves, the constituent data are adjusted for seasonal variation
and for differences in the number of working days.
Yet in spite of the progress that has been made, there remain many
problems and limitations.
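The working-day adjustment mentioned above can be sketched as follows. This is only one simple way to do it, assumed for illustration: output is divided by the number of working days in each month and then expressed relative to a base month set to 100. The function name, the months and all figures are hypothetical, and no seasonal adjustment is applied.

```python
def production_index(output, working_days, base_period):
    """Index of output per working day, with the base period set to 100."""
    per_day = {p: output[p] / working_days[p] for p in output}
    base = per_day[base_period]
    return {p: 100.0 * value / base for p, value in per_day.items()}

# Hypothetical monthly output (units produced) and working days.
output = {"Jan": 2200, "Feb": 2100, "Mar": 2530}
working_days = {"Jan": 22, "Feb": 20, "Mar": 23}

index = production_index(output, working_days, "Jan")
print(index)
```

In this example February produced fewer units than January in total, yet its index is higher, because it had fewer working days.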
Reference Page
Question 1
→ http://www.bu.edu/stat/undergraduate-program-information/why-study-statistics
→ http://www.stat.mq.edu.au/information_for/career/why_study_statistics/
→ http://ezinearticles.com/?Using-Statistics-To-Improve-And-Measure-Business-Performance&id=744164
Question 2
→ http://abyss.uoregon.edu/~js/glossary/correlation.html
Question 3
→ http://wwwjas-style.blogspot.com/2008/12/components-of-time-series.html
→ http://blog.lokad.com/journal/2009/7/7/favorite-forecasting-models.html
Question 4
→ http://www.expertcollective.com/interview-mba/mba-interview-questions.html
Question 5
Office Chart data provided by our management Majid Al Futtaim Jcb Najm card.
Question 6
http://www.mean/mode/comparison/tutore.
Question 7
http://www.robertniles.com/stats/stdev.shtml
http://en.wikipedia.org/wiki/Median_absolute_deviation
Question 8
http://www.jstor.org/pss/2278841