
Determining linear correlation between exam scores and absences of BS Physics and BS Applied Physics students

Ritz Aguilar, Mac Aydinan, Pio Calderon, and Karl Cedric Gonzales
National Institute of Physics, University of the Philippines, Diliman, Quezon City
Corresponding author: @nip.upd.edu.ph

Abstract
The linear correlation coefficient r of each combination of long exam scores, quizzes, finals, and absences of BS Applied Physics and BS Physics students was obtained. The p-value of each correlation coefficient was then computed, and the r values were tested at the 1% and 5% levels to determine whether significant correlation exists among these variables. Significant correlation was found among long exam scores, quiz scores, and final exam scores. On the other hand, the number of absences was not significantly correlated with any of the exam scores. No conclusion can be drawn regarding which group is smarter, the BS Applied Physics or the BS Physics students.

Keywords: correlation

1. Introduction

The sample linear correlation coefficient, r_xy or simply r, measures the strength and direction of the linear association between two quantitative variables, X and Y. In honor of its developer, Karl Pearson, it is also referred to as the Pearson product-moment correlation coefficient. The linear correlation coefficient has a wide variety of applications in different types of studies, e.g. studies of the economic aspects of different nations such as the correlation of GNP per capita and trade, studies of the correlation of the heights of males and females with the families they belong to, and studies of the academic performance of students through the correlation of their exam grades, quizzes, and absences.

In this study, a data set containing the quiz scores, the long exam scores, the final exam score, and the absences of 45 B.S. Applied Physics and B.S. Physics students was analyzed by determining the correlation of the specified quantitative variables with one another. The goals of the quantitative analysis are the following: (i) to determine the correlation of high (low) LQ scores with high (low) long exam scores: LQ1 vs E1, LQ2 vs E2, and LQ3 vs E3; (ii) to determine the correlation of high (low) quiz scores (LQ1, LQ2, and LQ3) and long exam scores (E1, E2, and E3) with high (low) final exam scores; (iii) to determine the correlation of low exam scores (quizzes and long exams) with absences; and finally, (iv) to conclude which group of students is smarter, the B.S. Applied Physics students or the B.S. Physics students, based on the linear correlation coefficient results obtained from the data.

2. Theory

2.1 Coefficient of linear correlation r

Given $N$ pairs of measurements of $x$ and $y$, $(x_1, y_1)$ to $(x_N, y_N)$, the extent to which a linear relation between $x$ and $y$ of the form $y = Ax + b$ can be justified is measured by the coefficient of linear correlation $r$ [1]. Mathematically, it is related to the covariance $\sigma_{xy}$ of $x$ and $y$ and to the standard deviations $\sigma_x$ and $\sigma_y$ of $x$ and $y$ by [1]

\[ r = \frac{\sigma_{xy}}{\sigma_x \sigma_y}. \tag{1} \]
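For reference, the quantities in equation 1 can be written out explicitly. The block below is a sketch that assumes the usual definitions of the covariance and the standard deviations, with the same $1/N$ normalization used for all three (the choice of $1/N$ or $1/(N-1)$ does not affect $r$, since the factor cancels in the ratio):

\[
\sigma_{xy} = \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y}), \qquad
\sigma_x^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^2, \qquad
\sigma_y^2 = \frac{1}{N}\sum_{i=1}^{N} (y_i - \bar{y})^2,
\]

where $\bar{x}$ and $\bar{y}$ denote the means of $x$ and $y$.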

In terms of the individual measurements $(x_i, y_i)$ and the means $\bar{x}$ and $\bar{y}$ of $x$ and $y$, respectively, equation 1 may be written as [1]

\[ r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}}. \tag{2} \]

The coefficient of linear correlation $r$ is a number from -1 to 1. It is a measure of how well the data points $(x_i, y_i)$ fit a straight line. If the obtained $r$ is close to 1 or -1, then it can be concluded that $x$ and $y$ are linearly correlated, with the sign of $r$ giving the direction of the relation. On the other hand, when $r$ is close to 0, it can be concluded that $x$ and $y$ are uncorrelated [1].
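As an illustration of equation 2, the following is a minimal Python sketch of the calculation. The function name `pearson_r` and the sample numbers are assumptions made for this example only; they are not the class data analyzed in this paper.

```python
import math


def pearson_r(xs, ys):
    """Coefficient of linear correlation r, following equation 2."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: sum of products of the deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the summed squared deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Made-up illustrative numbers, not the class data.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # points on a line with positive slope
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # points on a line with negative slope
```

Points lying exactly on a straight line give r at the ends of its range, with the sign following the slope, which matches the interpretation of r given above.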

2.2 Significance of correlation: the p-value

To make conclusions about the correlation between two variables more rigorous, the p-value is considered. Specifically, for the coefficient of linear correlation, the p-value is the probability $\mathrm{Prob}_N(|r| \geq r_0)$ that the correlation coefficient $r$ of a data set of $N$ pairs of uncorrelated measurements $x$ and $y$ is at least as large in magnitude as a predetermined value $r_0$ [1]. These probabilities for different $N$ and $r_0$ are usually tabulated in statistics textbooks [1].

More generally, the p-value of a test statistic is the probability of obtaining a value of the test statistic at least as extreme as a predetermined value, given that the null hypothesis is true [2]. In this case, the null hypothesis is that the measurements $x$ and $y$ are uncorrelated. The two alternative hypotheses are that the two measurements (1) are negatively correlated or (2) are positively correlated. Since either positive or negative correlation is possible, we use a two-tailed test, which tests for significance of correlation in both directions, instead of a one-tailed test, which tests for significance in one direction only [2].

Given a data set of two measurements $x$ and $y$, we first calculate the correlation coefficient $r$ given in equation 2. We set $r_0$ to the magnitude of this value and look up $\mathrm{Prob}_N(|r| \geq r_0)$. If the obtained probability is small, then it is very unlikely that uncorrelated variables would have produced the calculated $r_0$, making it very likely that the variables $x$ and $y$ are correlated. As a rule of thumb, if the probability $\mathrm{Prob}_N(|r| \geq r_0)$ is less than 5% (the significance level), the correlation is said to be significant, while if it is less than 1%, the correlation is said to be highly significant [1].
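The lookup described above can also be approximated numerically. The sketch below is not the procedure used in this paper, which relied on obtained p-values; it assumes the standard result that, for uncorrelated and normally distributed variables, the two-tailed probability is 2Γ((N-1)/2) / (√π Γ((N-2)/2)) times the integral of (1 - r²)^((N-4)/2) from r0 to 1, and evaluates that integral with a simple midpoint rule. The names `prob_r` and `classify` are hypothetical helpers introduced for illustration.

```python
import math


def prob_r(n, r0, steps=20000):
    """Approximate two-tailed Prob_N(|r| >= r0) for n uncorrelated pairs.

    Assumes normally distributed variables. Restricted to n >= 4 so that the
    integrand stays bounded for this simple midpoint rule.
    """
    if n < 4:
        raise ValueError("this sketch requires n >= 4")
    r0 = abs(r0)
    if r0 >= 1.0:
        return 0.0
    # Log of the normalization 2*Gamma((n-1)/2) / (sqrt(pi)*Gamma((n-2)/2)).
    log_norm = (math.log(2.0) + math.lgamma((n - 1) / 2)
                - 0.5 * math.log(math.pi) - math.lgamma((n - 2) / 2))
    # Midpoint-rule integral of (1 - r^2)^((n-4)/2) from r0 to 1.
    h = (1.0 - r0) / steps
    exponent = (n - 4) / 2
    integral = h * sum((1.0 - (r0 + (k + 0.5) * h) ** 2) ** exponent
                       for k in range(steps))
    return math.exp(log_norm) * integral


def classify(p):
    """Apply the 5% / 1% rule of thumb described in the text."""
    if p < 0.01:
        return "highly significant"
    if p < 0.05:
        return "significant"
    return "not significant"


# Illustrative input only (not a result from this study).
p = prob_r(10, 0.5)
print(round(p, 3), classify(p))
```

For N = 10 and r0 = 0.5 the sketch gives a probability of about 14%, so a correlation coefficient of 0.5 from only ten pairs would not count as significant under the rule of thumb.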

3. Methodology

The Physics 103 exam scores (three long quizzes, three exams, one final exam) and absence tally of 45 undergraduate students (28 BS Applied Physics, 17 BS Physics) of the National Institute of Physics were tabulated. The coefficient of linear correlation r of the long quiz averages (i.e. the average of the three long quizzes) and the exam averages (i.e. the average of the three exams) was obtained. Afterwards, the p-value of a two-tailed test for correlation was obtained, and the correlation was tested at the 5% and 1% significance levels to see whether it was insignificant, significant, or highly significant.

The procedure of obtaining the correlation coefficient r, obtaining the p-value, and testing the significance of correlation was repeated for the nth long quiz scores (LQn) and the nth exam scores (En), n = 1, 2, 3; the long quiz averages and the final exam scores; the nth long quiz scores (LQn), n = 1, 2, 3, and the final exam scores; the exam averages and the final exam scores; the nth exam scores (En), n = 1, 2, 3, and the final exam scores; the long quiz averages and absences; the exam averages and absences; and the final exam scores and absences.

In addition, the average long quiz score, the average exam score, and the average final exam score of BS Physics and BS Applied Physics students were separately calculated and tabulated. This served to compare the average scores of students of the two degree programs to see which of the two may be considered the smarter group.
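The pairings tested in this section can be enumerated programmatically, which helps in checking that none is skipped. The following Python sketch only builds the list of variable pairs; the labels `LQ_avg`, `E_avg`, `Final`, and `Absences` and the function name `correlation_pairs` are hypothetical names chosen for illustration, and no score data is involved.

```python
def correlation_pairs():
    """List the variable pairings described in the methodology."""
    quizzes = [f"LQ{n}" for n in (1, 2, 3)]
    exams = [f"E{n}" for n in (1, 2, 3)]

    pairs = [("LQ_avg", "E_avg")]              # long quiz average vs exam average
    pairs += list(zip(quizzes, exams))         # LQn vs En, n = 1, 2, 3
    pairs += [("LQ_avg", "Final")]             # long quiz average vs final exam
    pairs += [(q, "Final") for q in quizzes]   # LQn vs final exam
    pairs += [("E_avg", "Final")]              # exam average vs final exam
    pairs += [(e, "Final") for e in exams]     # En vs final exam
    # Long quiz average, exam average, and final exam, each vs absences.
    pairs += [(v, "Absences") for v in ("LQ_avg", "E_avg", "Final")]
    return pairs


for x_name, y_name in correlation_pairs():
    print(f"{x_name} vs {y_name}")
```

Each pair would then go through the same three steps: compute r, obtain the two-tailed p-value, and compare it with the 5% and 1% levels.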

4. Results and Discussion

From Table 1, it is evident that long quiz scores are strongly correlated with both long exam scores and final exam scores: their p-values are well below both the 5% and 1% significance levels. The average long exam scores are also strongly correlated with the final exam scores, but this is not true for the individual exam scores: the first exam scores are only moderately correlated with the final exam scores, as seen from their p-value falling between the 1% and 5% levels, while the second exam scores show no significant correlation with the final exam scores. Meanwhile, the number of absences shows no significant correlation with the long quiz, long exam, or final exam scores: the p-values are all higher than 35%, far above the 5% significance level.

These numbers mean that students' performances in all types of exams are very much correlated with each other. A student getting a relatively high long quiz score, for instance, has a very high probability of getting relatively high long exam and final exam scores as well, while those who fail the long quiz are likely to fail the long exam and the final exam. This may be attributable to the fact that these scores are related to the students' study habits: if a student has good study habits, they are likely to get high scores in all three types of exams, while if they have poor study habits, they are likely to get low scores in all of them.

On the other hand, the absences are not correlated with any of the exam scores. This may be attributed to the fact that most students study on their own, so even if they miss classes, they still have a high chance of getting high exam scores if they study well. This suggests, too, that self-study is sufficient for getting high exam scores.

From Table 2, it is evident that while BS Physics students have higher long quiz and long exam scores and BS Applied Physics students have higher final exam scores, their average scores are relatively close to each other. Because their scores are comparable, no conclusion can be drawn as to which type of student performs better in exams. It is worth pointing out that statistical errors may be present in this analysis. For one, many of the students in the class that took the subject under investigation have since shifted out of their respective degree programs, so the comparison of degree programs might differ if the statistical study were performed at present.

5. Conclusion

The linear correlation coefficient and the p-value were used to analyze the exam scores, quizzes, and absences of B.S. Applied Physics and B.S. Physics students in their Physics 103 course. From the correlation coefficients and p-values obtained for the pairs of quantitative variables, it was found that the long exam scores, quiz scores, and final exam scores are all correlated with one another. The number of absences, however, is not significantly correlated with any of the other quantitative variables.

Appendix
Tables
Table 1: Correlation coefficient, two-tailed p-value, and correlation of long quiz scores, exam scores and final exam scores with each other under 5% and 1% significance levels.

Table 2: Average exam scores of BS Physics and BS Applied Physics students.

Acknowledgments
We would like to thank Sir Wilson Garcia for providing the set of data which was analyzed in this paper.

References
[1] John R. Taylor. An Introduction to Error Analysis. University Science Books, 1997.
[2] Timothy C. Urdan. Statistics in Plain English. Routledge, 2010.
