
Statistics is the study of the collection, organization, analysis, interpretation and presentation of data.

It deals with all aspects of data, including the planning of data collection in terms of the design of surveys and experiments.

Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data, or the quantitative description itself.

Statistical inference is the process of drawing conclusions from data that are subject to random variation, for example observational errors or sampling variation.

A population, in statistics, is the complete set of individuals, items, or events about which conclusions are to be drawn. (In biology, the term refers to all the organisms of the same group or species that live in the same geographical area and are capable of interbreeding.)

A data sample is a set of data collected and/or selected from a population by a defined procedure.

Sampling is concerned with the selection of a subset of individuals from within a statistical population to estimate characteristics of the whole population.

A variable is a value that may change within the scope of a given problem or set of operations.

Data is information in raw or unorganized form (such as letters, numbers, or symbols) that refers to, or represents, conditions, ideas, or objects. Data is limitless and present everywhere in the universe.

A parameter is a value that is fixed in the statement of the problem being studied (although its value may not be explicitly known).

A statistic (singular) is a single measure of some attribute of a sample.

Methods of Collecting Data

1. Survey - Statistical surveys are used to collect quantitative information from a specific population. A survey may focus on opinions or factual information, depending upon the purpose of the study. Surveys may involve answering a questionnaire or being interviewed by a researcher. The census is a type of survey.

Advantages of surveys:
- can be administered in a variety of forms (telephone, mail, on-line, mall interview, etc.)
- are efficient for collecting data from a large population
- can be designed to focus only on the needed response questions
- are applicable to a wide range of topics

Disadvantages of surveys:
- are dependent upon the respondent's honesty and motivation when answering
- can be flawed by non-response
- can contain questions or answer choices that may be interpreted differently by different respondents (such as the choice "agree slightly")

Randomization and a well-designed survey: A sample is considered random if it has the same probability of being selected as every other possible sample of the same size. When a sample is not random, a bias is introduced which may influence the study in favor of one outcome over other outcomes. Surveys can be conducted using different methods.

Questionnaire: The questionnaire is the most commonly used survey method. A questionnaire is a list of questions, either open-ended or closed-ended, to which the respondent gives answers. Questionnaires can be administered by telephone, by mail, in person in a public area or an institution, by electronic mail, by fax, and by other methods.

Interview: An interview is a face-to-face conversation with the respondent. The main problem in an interview arises when the respondent deliberately hides information; otherwise it is an in-depth source of information. The interviewer can not only record what the interviewee says but also observe body language, expressions and other reactions to the questions. This helps the interviewer draw conclusions.

Observations: Observation can be carried out either with or without the person being observed knowing about it. Observations can be made in natural settings as well as in an artificially created environment.

2. Experimental study - In an experimental study, the researcher first measures, or surveys, the sample population. The researcher then manipulates the sample population in some manner. After the manipulation, the researcher re-measures, or re-surveys, using the same procedures to determine whether the manipulation may have changed the measurements. In a "controlled" experiment, the researcher separates the sample population into groups, with one group established as the control group. All groups are manipulated in some manner except the control group, which remains the same.

An example of an experimental study: A group of students is interested in knowing whether the number of times they can sink a basketball is related to the color of the basketball. The students shoot a series of baskets and record their success using a regulation-colored basketball. They then switch to a blue basketball and shoot the same series of baskets. A statistical analysis is performed.

3. Observational study - In an observational study, the sample population being studied is measured, or surveyed, as it is. The researcher does not influence the population in any way or attempt to intervene in the study. There is no experimental manipulation. Instead, data is simply gathered and correlations are investigated.

An example of an observational study: A group of students is interested in knowing whether there is a correlation between attending an SAT Prep class and scores achieved on the SAT Examination. The students use a survey to collect data both from students who took an SAT Prep class and from those who did not. A statistical analysis is performed.

Primary and Secondary Data

Primary Data: Data that has been collected from first-hand experience is known as primary data. Primary data has not yet been published and has not been altered or reinterpreted by anyone else, so it is generally considered more reliable, authentic and objective than secondary data. Sources of primary data are limited, and at times it is difficult to obtain data from a primary source because of either a small population or a lack of cooperation. Despite these difficulties, primary data remains the most authentic and reliable data source. The following are some of the sources of primary data:
1. Experiment
2. Survey

Secondary Data: Data collected from a source that has already been published in any form is called secondary data. The review of literature in any research is based on secondary data, mostly from books, journals and periodicals.

Sources of Secondary Data: Secondary data is often readily available. With the expansion of electronic media and the Internet, secondary data has become much easier to obtain.

Published Printed Sources: There is a variety of published printed sources. Their credibility depends on many factors, for example the writer, the publishing company, and the date of publication. Newer sources are generally preferred over older ones, as new technology and research bring new facts to light.

Books: Books are available today on almost any topic that you want to research. The use of books starts even before you have selected the topic. After the topic is selected, books provide insight into how much work has already been done on it, and you can use them to prepare your literature review. Books are a secondary source, but the most authentic of the secondary sources.

Journals/periodicals: Journals and periodicals are becoming more important as far as data collection is concerned. The first reason is that journals provide up-to-date information, which books at times cannot. The second is that journals can give information on the very specific topic you are researching rather than covering more general topics.

Magazines/Newspapers: Magazines are also useful but not very reliable. Newspapers, on the other hand, are more reliable, and in some cases the information can only be obtained from newspapers, as in some political studies.

Published Electronic Sources: As the Internet becomes more advanced, faster and more accessible to the masses, much information that is not available in printed form is available online. In the past the credibility of the Internet was widely questioned, but this is much less the case today. The reason is that journals and books were seldom published on the Internet in the past, whereas today most journals and many books are available online. Some are free, and for others you have to pay.

e-journals: e-journals are more commonly available than printed journals. The latest journals are difficult to retrieve without a subscription, but if your university has an e-library you can view and print any journal it holds, and place an order for those that are not available.

General websites: Websites in general do not contain very reliable information, so their content should be checked for reliability before it is quoted.

Weblogs: Weblogs are also becoming common. They are essentially diaries written by different people, and they are about as reliable as personal written diaries.

Unpublished Personal Records: Some unpublished data may also be useful in some cases.

Diaries: Diaries are personal records and are rarely available, but if you are conducting descriptive research they can be very useful. The diary of Anne Frank is the most famous example: it is a first-hand record of life in hiding under Nazi occupation.

Letters: Letters, like diaries, are also a rich source, but should be checked for reliability before being used.

Government Records: Government records are very important for marketing, management, humanities and social science research. Examples include:
- Census data/population statistics
- Health records
- Educational institutes' records

Public Sector Records:
- NGOs' survey data
- Other private companies' records

A probability sampling method is any method of sampling that utilizes some form of random selection. In order to have a random selection method, you must set up some process or procedure that assures that the different units in your population have equal probabilities of being chosen. Humans have long practiced various forms of random selection, such as picking a name out of a hat, or choosing the short straw. These days, we tend to use computers as the mechanism for generating random numbers as the basis for random selection.

Non-probability sampling is a sampling technique where the samples are gathered in a process that does not give all the individuals in the population equal chances of being selected.

Types of Probability Sampling

Simple Random Sampling

Simple random sampling is the easiest form of probability sampling. All the researcher needs to do is make sure that all the members of the population are included in a list and then randomly select the desired number of subjects. There are many ways to do this. It can be as mechanical as picking strips of paper with names written on them from a hat while blindfolded, or as easy as using computer software to do the random selection for you.

Stratified Random Sampling

Stratified random sampling is also known as proportional random sampling. This is a probability sampling technique in which the subjects are first grouped into different classifications (strata) such as age, socioeconomic status or gender. The researcher then randomly selects the final list of subjects from the different strata. It is important to note that the strata must not overlap. Researchers usually use stratified random sampling if they want to study a particular subgroup within the population. It is also often preferred over simple random sampling because it can yield more precise statistical estimates.

Systematic Random Sampling

Systematic random sampling can be likened to an arithmetic progression, in which the difference between any two consecutive numbers is the same. Say, for example, you are in a clinic and you have 100 patients. The first thing you do is pick an integer that is less than the total size of the population; this will be your first subject, e.g. 3. Then select another integer, which will be the interval between subjects, e.g. 5. Your subjects will be patients 3, 8, 13, 18, 23, and so on. There is no clear advantage when using this technique.
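The three techniques described so far can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function names, the fixed random seed, the sample size of 10, and the "adult"/"senior" strata are all made up for the example. Only the clinic with 100 patients, the starting subject 3 and the interval 5 come from the text.

```python
import random

def simple_random_sample(population, n, rng):
    """Draw n distinct members; every subset of size n is equally likely."""
    return rng.sample(population, n)

def stratified_sample(population, stratum_of, fraction, rng):
    """Group members into non-overlapping strata, then randomly sample
    the same fraction of each stratum (proportional allocation)."""
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    chosen = []
    for key in sorted(strata):
        members = strata[key]
        k = round(len(members) * fraction)
        chosen.extend(rng.sample(members, k))
    return chosen

def systematic_sample(population_size, start, step):
    """Return subject numbers start, start + step, ... up to population_size."""
    return list(range(start, population_size + 1, step))

rng = random.Random(0)            # fixed seed so the run is repeatable
patients = list(range(1, 101))    # patients numbered 1..100

srs = simple_random_sample(patients, 10, rng)

# Hypothetical strata: patients 1-60 are "adult", 61-100 are "senior".
def age_group(patient):
    return "adult" if patient <= 60 else "senior"

strat = stratified_sample(patients, age_group, 0.1, rng)

syst = systematic_sample(100, start=3, step=5)
print(syst[:5])   # [3, 8, 13, 18, 23]
```

With these made-up strata, the stratified draw takes 6 of the 60 "adult" patients and 4 of the 40 "senior" patients, so each group appears in the sample in the same proportion as in the clinic.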

Cluster Random Sampling

Cluster random sampling is done when simple random sampling is almost impossible because of the size of the population. Just imagine doing simple random sampling when the population in question is the entire population of Asia. In cluster sampling, the researcher first identifies boundaries; in our example, these can be the countries within Asia. The researcher then randomly selects a number of the identified areas. It is important that all areas (countries) within the population be given equal chances of being selected. The researcher can either include all the individuals within the selected areas or randomly select subjects from them.

Mixed/Multi-Stage Random Sampling

This probability sampling technique involves a combination of two or more of the sampling techniques enumerated above. In most complex research, whether in the field or in the lab, a single type of probability sampling is not sufficient. Most research is done in different stages, with each stage applying a different random sampling technique.

Types of Non-Probability Sampling

Convenience Sampling

Convenience sampling is probably the most common of all sampling techniques. With convenience sampling, the samples are selected because they are accessible to the researcher. Subjects are chosen simply because they are easy to recruit. This technique is considered the easiest, cheapest and least time-consuming.

Consecutive Sampling

Consecutive sampling is very similar to convenience sampling except that it seeks to include ALL accessible subjects as part of the sample. This non-probability sampling technique can be considered the best of all non-probability samples, because including all available subjects makes the sample a better representation of the entire population.

Quota Sampling

Quota sampling is a non-probability sampling technique in which the researcher ensures equal or proportionate representation of subjects, depending on which trait is used as the basis of the quota. For example, if the basis of the quota is college year level and the researcher needs equal representation with a sample size of 100, he must select 25 1st year students, 25 2nd year students, 25 3rd year students and 25 4th year students. The bases of the quota are usually age, gender, education, race, religion and socioeconomic status.

Judgmental Sampling

Judgmental sampling is more commonly known as purposive sampling. In this type of sampling, subjects are chosen to be part of the sample with a specific purpose in mind. With judgmental sampling, the researcher believes that some subjects are more fit for the research than other individuals. This is the reason why they are purposively chosen as subjects.

Snowball Sampling

Snowball sampling is usually done when there is a very small population size. In this type of sampling, the researcher asks the initial subject to identify another potential subject who also meets the criteria of the research. The downside of using a snowball sample is that it is hardly representative of the population.

The terms dependent and independent variable are used mostly in mathematics and statistics. They distinguish between the two roles a variable can play. The independent variable is the one that is changed in the experiment, and the dependent variable is the one that is observed to see how it is affected by the change in the independent variable. On a graph, the independent variable is usually plotted on the x-axis and the dependent variable on the y-axis.

Using Independent and Dependent Variables

While the definition is more-or-less universal, the application varies slightly between statistical experiments and mathematics.

For example: If a scientist conducts an experiment to test the theory that a vitamin could extend a person's life expectancy, then the independent variable is the amount of vitamin that is given to the subjects within the experiment. This is controlled by the experimenting scientist. The dependent variable, or the variable being affected by the independent variable in this case, is life span. It varies from person to person within each group, and is what is being tested; that is, whether or not the people given the vitamin live, on average, longer than the people not given the vitamin. The scientist might then conduct further experiments to increase the number of independent variables -- gender, ethnicity, overall health, etc. -- in order to narrow down the specific effects of the vitamin.

Here are some other examples of dependent and independent variables in science:
- A scientist studies the impact of a drug on cancer. The independent variable is the administration of the drug. The dependent variable is the impact the drug has on cancer.
- A scientist studies the impact of withholding affection on rats. The independent variable is the affection. The dependent variable is the reaction of the rats.
- A scientist studies how many days people can eat soup until they get sick. The independent variable is the number of days of consuming soup. The dependent variable is the onset of illness.

Independent and Dependent Variables in Math

In mathematics, the x and y values in an equation or a graph are referred to as "variables." If an equation shows a relationship between x and y in which y is specified in terms of x, y is known as the dependent variable and is sometimes written as a function of x, f(x). The value of y depends on the value of x, the independent variable, which can be changed.

Discrete data can only take particular values. There may potentially be an infinite number of those values, but each is distinct and there's no grey area in between. Discrete data can be numeric -- like numbers of apples -- but it can also be categorical -- like red or blue, or male or female, or good or bad.

Continuous data are not restricted to defined separate values, but can occupy any value over a continuous range. Between any two continuous data values there may be an infinite number of others. Continuous data are always essentially numeric.

It sometimes makes sense to treat numeric data that is properly of one type as being of the other. For example, something like height is continuous, but often we don't really care too much about tiny differences and instead group heights into a number of discrete bins. Conversely, if we're counting large amounts of some discrete entity -- grains of rice, or termites, or pennies in the economy -- we may choose not to think of 2,000,006 and 2,000,008 as crucially different values but instead as nearby points on an approximate continuum.

It can also sometimes be useful to treat numeric data as categorical, e.g. classifying body weight as underweight, normal, or obese. This is usually just another kind of binning.
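Binning of this kind is easy to sketch in code. The example below is a minimal illustration that groups continuous heights into three discrete bins. The cut points (160 cm and 175 cm), the labels and the sample heights are hypothetical values chosen only for the example, and `bin_label` is an illustrative helper name.

```python
import bisect

def bin_label(value, edges, labels):
    """Return the label of the bin that value falls into.

    edges must be sorted; a value equal to an edge goes to the upper bin.
    """
    return labels[bisect.bisect_right(edges, value)]

# Hypothetical cut points in centimetres, for illustration only.
edges = [160.0, 175.0]
labels = ["short", "medium", "tall"]

heights_cm = [152.4, 160.0, 168.3, 174.9, 181.2]   # made-up measurements
binned = [bin_label(h, edges, labels) for h in heights_cm]
print(binned)   # ['short', 'medium', 'medium', 'medium', 'tall']
```

After binning, 168.3 and 174.9 are treated as the same value even though the underlying measurements differ, which is exactly the loss of detail that binning accepts.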

It seldom makes sense to consider categorical data as continuous.

Differences Between Discrete and Continuous Data

Before getting into this, you should know how to calculate simple statistics. Continuous data differs from discrete data in several ways. Continuous data:

- is measured rather than counted, and can take any of an infinite number of values within a range;
- has no natural categories. Quantities such as weight, width, or length cannot be sorted into a complete list of exact values, because the number of possible values is infinite.

Discrete data, by contrast:

- can only take specific, separate values (it is what it is, and its possible values can be listed);
- does possess natural categories. In statistics, for example, when recording the ages of 100 people, the ages can be classified into discrete categories. The example below is considered a "natural category."

For example, category 1 could represent ages 1 to 10, category 2 ages 11 to 20, and so on, up to any category; category 20 would be ages 191 to 200. Each category represents a group of people. Although anyone falling into category 20 is highly unlikely, the categories keep rising until the statistician decides that higher ones are not required, or until all 100 people have been placed in a category, which puts a discrete limit on the number of categories.
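The category scheme above can be written as a small function. This is a sketch that assumes ages are recorded as whole numbers of years starting at 1 and that every category spans ten years. The function name and the list of ages are made up for illustration.

```python
from collections import Counter

def age_category(age):
    """Map a whole-number age (1 or more) to its ten-year category:
    ages 1-10 -> category 1, ages 11-20 -> category 2, and so on."""
    if age < 1:
        raise ValueError("age must be at least 1 for this scheme")
    return (age - 1) // 10 + 1

ages = [4, 10, 11, 37, 40, 41, 89]   # made-up ages for illustration
counts = Counter(age_category(a) for a in ages)
print(sorted(counts.items()))   # [(1, 2), (2, 1), (4, 2), (5, 1), (9, 1)]
```

Only the categories that actually contain someone appear in the result, which mirrors the point that the number of categories stops growing once every person has been placed.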

Another example of discrete data would be to categorize how far two people can run without stopping. Eventually both will stop, either to rest or to quit. Unless, of course, someone could run a "continuous," infinite distance without stopping; that would be a miracle, and I don't believe miracles can be measured statistically or scientifically.

A type of data is discrete if there are only a finite number of possible values, or if there is a space on the number line between every two possible values.

Ex. A 5-question quiz is given in a math class. The number of correct answers on a student's quiz is an example of discrete data. The number of correct answers would have to be one of the following: 0, 1, 2, 3, 4, or 5.

There are not an infinite number of values, therefore this data is discrete. Also, if we were to draw a number line and place each possible value on it, we would see a space between each pair of values.

Ex. In order to obtain a taxi license in Las Vegas, a person must pass a written exam regarding different locations in the city. The number of attempts it takes a person to pass this test is also an example of discrete data. A person could take it once, or twice, or 3 times, or 4 times, or ... So the possible values are 1, 2, 3, ... There are infinitely many possible values, but if we were to put them on a number line, we would see a space between each pair of values.

Discrete data usually occurs where there are only a certain number of values, or when we are counting something (using whole numbers). Continuous data makes up the rest of numerical data. This is a type of data that is usually associated with some sort of physical measurement.

Ex. The height of trees at a nursery is an example of continuous data. Is it possible for a tree to be 76.2" tall? Sure. How about 76.29"? Yes. How about 76.2914563782"? You betcha! The possibilities depend upon the accuracy of our measuring device.

One general way to tell if data is continuous is to ask yourself whether it is possible for the data to take on values that are fractions or decimals. If your answer is yes, this is usually continuous data.

Ex. The length of time it takes for a light bulb to burn out is an example of continuous data. Could it take 800 hours? How about 800.7? 800.7354? The answer to all 3 is yes.

Four scales of measurement

The first is Nominal. It is data that is categorized and can't be arranged in an order from low to high; e.g. answers to a yes-or-no question, colors of cars in a parking lot, race and gender.

The second is Ordinal. It is data that is categorized and can be arranged in an order from low to high, but the differences cannot be determined or are meaningless; e.g. grades A, B, C, D, E; grade levels 9th, 10th, 11th, 12th; survey-type responses such as do not like, somewhat like, like, love; or movie ratings.

The third is Interval. It is the ordinal scale with the additional property that the difference between data values is meaningful, but it does not have a natural zero starting point; e.g. calendar years (2009, 2000, 1610, etc.) or temperature (50, 68, 90 °F, etc.).

The fourth is Ratio. It is the interval scale with the additional property that it does have a natural zero starting point. Money and weight are examples: 0 money means you have none, and $4.00 is twice $2.00. Likewise with weight: no weight means there is none, and 4 pounds is twice 2 pounds. The same reasoning applies to height.
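One way to see the difference between the interval and ratio scales is to change units and check which comparisons survive. The sketch below reuses the temperatures 50, 68 and 90 °F and the weights 2 and 4 pounds from the text. The choice of Celsius and kilograms as the second units, and the helper name `f_to_c`, are additions made for illustration.

```python
import math

def f_to_c(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32.0) * 5.0 / 9.0

LB_TO_KG = 0.45359237   # kilograms per pound (exact by definition)

t1, t2, t3 = 50.0, 68.0, 90.0   # temperatures in degrees Fahrenheit

# Interval scale: the ratio of two temperatures depends on the unit,
# because neither scale has a natural zero.
ratio_f = t2 / t1                   # 1.36
ratio_c = f_to_c(t2) / f_to_c(t1)   # 20.0 / 10.0 = 2.0
print(ratio_f, ratio_c)

# Differences are meaningful: comparing one gap with another gives
# the same answer in either unit.
gaps_f = (t3 - t2) / (t2 - t1)
gaps_c = (f_to_c(t3) - f_to_c(t2)) / (f_to_c(t2) - f_to_c(t1))
print(math.isclose(gaps_f, gaps_c))   # True

# Ratio scale: weight has a natural zero, so "twice as heavy" holds
# in pounds and in kilograms alike.
w1, w2 = 2.0, 4.0                   # pounds
print(math.isclose(w2 / w1, (w2 * LB_TO_KG) / (w1 * LB_TO_KG)))   # True
```

The temperature ratio changes from 1.36 to 2.0 when the unit changes, so saying that 68 °F is "1.36 times as hot" as 50 °F carries no real meaning, whereas 4 pounds is twice 2 pounds in any unit.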
