10-2. The Linear Correlation Coefficient

Figure 10.3 "Linear Relationships of Varying Strengths" illustrates linear relationships of varying strengths between two variables $x$ and $y$. It is visually apparent that in the situation in panel (a), $x$ could serve as a useful predictor of $y$; it would be less useful in the situation illustrated in panel (b); and in the situation of panel (c) the linear relationship is so weak as to be practically nonexistent. The linear correlation coefficient is a number computed directly from the data that measures the strength of the linear relationship between the two variables $x$ and $y$.

Figure 10.3 Linear Relationships of Varying Strengths

The linear correlation coefficient for a collection of $n$ pairs $(x, y)$ of numbers in a sample is the number $r$ given by the formula

$$r = \frac{SS_{xy}}{\sqrt{SS_{xx} \cdot SS_{yy}}}$$

where

$$SS_{xx} = \Sigma x^2 - \frac{1}{n}(\Sigma x)^2, \quad SS_{xy} = \Sigma xy - \frac{1}{n}(\Sigma x)(\Sigma y), \quad SS_{yy} = \Sigma y^2 - \frac{1}{n}(\Sigma y)^2$$
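As a minimal sketch, the formula can be transcribed almost symbol for symbol into R. The function name `linear_cor` and the five data pairs below are illustrative assumptions, not part of the text; base R's `cor()` appears only as a cross-check.

```r
# Hypothetical helper: computes r from the sums-of-squares formulas above.
linear_cor <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) >= 2)
  n <- length(x)
  ss_xx <- sum(x^2) - sum(x)^2 / n           # SS_xx
  ss_xy <- sum(x * y) - sum(x) * sum(y) / n  # SS_xy
  ss_yy <- sum(y^2) - sum(y)^2 / n           # SS_yy
  ss_xy / sqrt(ss_xx * ss_yy)
}

# Made-up illustrative data (not from the text).
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

r_formula <- linear_cor(x, y)
r_builtin <- cor(x, y)  # Pearson correlation is the default method

# The two values should agree up to floating-point rounding.
all.equal(r_formula, r_builtin)
```

For these made-up pairs the sums of squares work out to $SS_{xx} = 10$, $SS_{xy} = 6$, and $SS_{yy} = 6$, so $r = 6/\sqrt{60}$.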

The linear correlation coefficient has the following properties, illustrated in Figure 10.4 "Linear Correlation Coefficient R":

  1. The value of $r$ lies between $-1$ and $1$, inclusive.

  2. The sign of $r$ indicates the direction of the linear relationship between $x$ and $y$:

    1. If $r < 0$ then $y$ tends to decrease as $x$ is increased.

    2. If $r > 0$ then $y$ tends to increase as $x$ is increased.

  3. The size of $|r|$ indicates the strength of the linear relationship between $x$ and $y$:

    1. If $|r|$ is near 1 (that is, if $r$ is near either $1$ or $-1$) then the linear relationship between $x$ and $y$ is strong.

    2. If $|r|$ is near 0 (that is, if $r$ is near 0 and of either sign) then the linear relationship between $x$ and $y$ is weak.
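The properties listed above can be illustrated with a small sketch in R. The data are made up for illustration: two exactly linear relationships and one increasing trend with a fixed, arbitrary disturbance (`wobble` is a hypothetical name).

```r
x <- 1:10

# Exactly linear and increasing: r is 1 (up to floating-point rounding).
r_up <- cor(x, 2 * x + 1)

# Exactly linear and decreasing: r is -1 (up to floating-point rounding).
r_down <- cor(x, -3 * x + 5)

# Increasing trend plus a fixed, made-up disturbance: r is positive but below 1.
wobble <- c(2, -1, 3, -2, 1, -3, 2, -1, 3, -2)
r_noisy <- cor(x, 2 * x + 1 + wobble)

c(r_up = r_up, r_down = r_down, r_noisy = r_noisy)
```

The sign of each result follows the direction of the trend, and only the exactly linear cases reach the extreme values $1$ and $-1$.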

Figure 10.4 Linear Correlation Coefficient R

Pay particular attention to panel (f) in Figure 10.4 "Linear Correlation Coefficient R". It shows a perfectly deterministic relationship between $x$ and $y$, but $r = 0$ because the relationship is not linear. (In this particular case the points lie on the top half of a circle.)
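A similar situation can be reproduced in a short R sketch. The unit circle and the evenly spaced, symmetric grid of $x$ values are assumptions chosen for illustration; they are not the actual points plotted in the figure.

```r
# Points on the top half of the unit circle, with x values symmetric about 0.
x <- seq(-1, 1, by = 0.1)
y <- sqrt(pmax(0, 1 - x^2))  # pmax guards against tiny negative rounding errors

r_circle <- cor(x, y)

# y is completely determined by x, yet r is 0 up to floating-point rounding.
abs(r_circle) < 1e-10
```

Because the $x$ values are symmetric about 0 and $y$ takes the same value at $x$ and $-x$, the positive and negative contributions to $SS_{xy}$ cancel.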

EXAMPLE 1. Compute the linear correlation coefficient for the height and weight pairs plotted in Figure 10.2 "Plot of Height and Weight Pairs".

[ Solution ]

Even for small data sets like this one, the computations are too long to do completely by hand. In actual practice the data are entered into a calculator or computer and a statistics program is used. In order to clarify the meaning of the formulas we will display the data and related quantities in tabular form. For each $(x, y)$ pair we compute three numbers: $x^2$, $xy$, and $y^2$, as shown in the table provided. In the last line of the table we have the sum of the numbers in each column. Using them we compute:

$$SS_{xx} = \Sigma x^2 - \frac{1}{n}(\Sigma x)^2 = 61537 - \frac{(859)^2}{12} = 46.91\overline{6}$$

$$SS_{xy} = \Sigma xy - \frac{1}{n}(\Sigma x)(\Sigma y) = 143626 - \frac{(859)(2003)}{12} = 244.58\overline{3}$$

$$SS_{yy} = \Sigma y^2 - \frac{1}{n}(\Sigma y)^2 = 336025 - \frac{(2003)^2}{12} = 1690.91\overline{6}$$

so that

$$r = \frac{SS_{xy}}{\sqrt{SS_{xx} \cdot SS_{yy}}} = \frac{244.58\overline{3}}{\sqrt{(46.91\overline{6})(1690.91\overline{6})}} \approx 0.868$$

The number $r \approx 0.868$ quantifies what is visually apparent from Figure 10.2 "Plot of Height and Weight Pairs": weight tends to increase linearly with height ($r$ is positive) and, although the relationship is not perfect, it is reasonably strong ($r$ is near 1).
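The table described in the solution and the sums of squares built from it can be reproduced with a short R sketch. The variable names (`tab`, `sums`, `ss_xx`, and so on) are illustrative choices; the data are the twelve height and weight pairs used in this example.

```r
# Height (x) and weight (y) pairs for the example.
x <- c(68, 69, 70, 70, 71, 72, 72, 72, 73, 73, 74, 75)
y <- c(151, 146, 157, 164, 171, 160, 163, 180, 170, 175, 178, 188)
n <- length(x)

# One row per pair, with the three derived columns used in the solution.
tab <- data.frame(x = x, y = y, x2 = x^2, xy = x * y, y2 = y^2)

# Column sums correspond to the last line of the table.
sums <- colSums(tab)

# Sums of squares and r, following the formulas in the solution.
ss_xx <- sums[["x2"]] - sums[["x"]]^2 / n
ss_xy <- sums[["xy"]] - sums[["x"]] * sums[["y"]] / n
ss_yy <- sums[["y2"]] - sums[["y"]]^2 / n
r <- ss_xy / sqrt(ss_xx * ss_yy)

print(tab)
print(sums)
round(c(SSxx = ss_xx, SSxy = ss_xy, SSyy = ss_yy, r = r), 3)
```

The rounded sums of squares can differ from the figures in the solution in the last digit (for example 46.917 versus 46.916), since the exact values are repeating decimals.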

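# Heights (x) and weights (y) of the 12 pairs plotted in Figure 10.2
x <- c(68, 69, 70, 70, 71, 72, 72, 72, 73, 73, 74, 75)
y <- c(151, 146, 157, 164, 171, 160, 163, 180, 170, 175, 178, 188)

# Linear (Pearson) correlation coefficient r
cor(x, y, method = "pearson")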
