10-1. Linear Relationships Between Variables

Our interest in this chapter is in situations in which we can associate to each element of a population or sample two measurements $x$ and $y$, particularly in the case that it is of interest to use the value of $x$ to predict the value of $y$. For example, the population could be the air in automobile garages, $x$ could be the electrical current produced by an electrochemical reaction taking place in a carbon monoxide meter, and $y$ the concentration of carbon monoxide in the air. In this chapter we will learn statistical methods for analyzing the relationship between variables $x$ and $y$ in this context.

All the formulas that appear in this chapter are collected in the last section for ease of reference.

10.1 Linear Relationships Between Variables

LEARNING OBJECTIVE

  1. To learn what it means for two variables to exhibit a relationship that is close to linear but which contains an element of randomness.

The following table gives examples of the kinds of pairs of variables which could be of interest from a statistical point of view.

| $x$ (predictor or independent variable) | $y$ (response or dependent variable) |
|---|---|
| Temperature in degrees Celsius | Temperature in degrees Fahrenheit |
| Area of a house (sq. ft.) | Value of the house |
| Age of a particular make and model car | Resale value of the car |
| Amount spent by a business on advertising in a year | Revenue received that year |
| Height of a 25-year-old man | Weight of the man |

The first line in the table is different from all the rest because in that case and no other the relationship between the variables is deterministic: once the value of $x$ is known the value of $y$ is completely determined. In fact there is a formula for $y$ in terms of $x$: $y=\frac{9}{5}x+32$. Choosing several values for $x$ and computing the corresponding value of $y$ for each one using the formula gives the table

| $x$ | $y$ |
|---|---|
| −40 | −40 |
| −15 | 5 |
| 0 | 32 |
| 20 | 68 |
| 50 | 122 |
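The table can be reproduced with a few lines of R. This is only a minimal sketch; the names `celsius` and `fahrenheit` are illustrative and do not come from the text.

```r
# Celsius values chosen in the table above
celsius <- c(-40, -15, 0, 20, 50)

# Apply the deterministic formula y = (9/5) x + 32 to each value
fahrenheit <- 9 / 5 * celsius + 32

# Show the (x, y) pairs side by side
print(data.frame(x = celsius, y = fahrenheit))
```

Because the relationship is deterministic, running the code again with the same Celsius values always gives the same Fahrenheit values.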

We can plot these data by choosing a pair of perpendicular lines in the plane, called the coordinate axes, as shown in Figure 10.1 "Plot of Celsius and Fahrenheit Temperature Pairs".

Then to each pair of numbers in the table we associate a unique point in the plane, the point that lies $x$ units to the right of the vertical axis (to the left if $x<0$) and $y$ units above the horizontal axis (below if $y<0$).

The relationship between $x$ and $y$ is called a linear relationship because the points so plotted all lie on a single straight line. The number $\frac{9}{5}$ in the equation $y=\frac{9}{5}x+32$ is the slope of the line, and measures its steepness. It describes how $y$ changes in response to a change in $x$: if $x$ increases by 1 unit then $y$ increases (since $\frac{9}{5}$ is positive) by $\frac{9}{5}$ unit.

If the slope had been negative then $y$ would have decreased in response to an increase in $x$. The number 32 in the formula $y=\frac{9}{5}x+32$ is the $y$-intercept of the line; it identifies where the line crosses the $y$-axis. You may recall from an earlier course that every non-vertical line in the plane is described by an equation of the form $y=mx+b$, where $m$ is the slope of the line and $b$ is its $y$-intercept.
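One way to check the slope and intercept against the temperature pairs is to compute the slope between consecutive points and the intercept implied by each point. The following R sketch does so; the variable names `slopes` and `intercepts` are illustrative assumptions, not part of the text.

```r
# Temperature pairs from the Celsius/Fahrenheit table
x <- c(-40, -15, 0, 20, 50)
y <- c(-40, 5, 32, 68, 122)

# Slope between consecutive points: change in y divided by change in x
slopes <- diff(y) / diff(x)
print(slopes)

# Intercept implied by each point: b = y - m * x, with m = 9/5
intercepts <- y - 9 / 5 * x
print(intercepts)
```

Because the points lie on a single line, every entry of `slopes` equals $\frac{9}{5}=1.8$ and every entry of `intercepts` equals 32, up to floating-point rounding.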

Figure 10.1 Plot of Celsius and Fahrenheit Temperature Pairs

The relationship between $x$ and $y$ in the temperature example is deterministic because once the value of $x$ is known, the value of $y$ is completely determined.

In contrast, all the other relationships listed in the table above have an element of randomness in them. Consider the relationship described in the last line of the table, the height $x$ of a man aged 25 and his weight $y$. If we were to randomly select several 25-year-old men and measure the height and weight of each one, we might obtain a collection of $(x,y)$ pairs something like this:

(68, 151) (69, 146) (70, 157) (70, 164) (71, 171) (72, 160) (72, 163) (72, 180) (73, 170) (73, 175) (74, 178) (75, 188)

A plot of these data is shown in Figure 10.2 "Plot of Height and Weight Pairs". Such a plot is called a scatter diagram or scatter plot. Looking at the plot it is evident that there exists a linear relationship between height $x$ and weight $y$, but not a perfect one.

The points appear to be following a line, but not exactly. There is an element of randomness present.
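The randomness can also be seen without a plot. In a deterministic relationship each value of $x$ would correspond to exactly one value of $y$, but in this sample some heights occur with several different weights. The R sketch below, with illustrative variable names that are not part of the text, groups the weights by height.

```r
# Height and weight pairs from the sample of 25-year-old men
height <- c(68, 69, 70, 70, 71, 72, 72, 72, 73, 73, 74, 75)
weight <- c(151, 146, 157, 164, 171, 160, 163, 180, 170, 175, 178, 188)

# Collect the weights observed at each distinct height
weights_by_height <- split(weight, height)
print(weights_by_height)

# Heights that occur with more than one distinct weight
has_several <- sapply(weights_by_height, function(w) length(unique(w)) > 1)
repeated <- names(weights_by_height)[has_several]
print(repeated)
```

Heights 70, 72, and 73 each appear with more than one weight, which no single formula of the form $y=mx+b$ could produce.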

Figure 10.2 Plot of Height and Weight Pairs

In this chapter we will analyze situations in which variables $x$ and $y$ exhibit such a linear relationship with randomness. The level of randomness will vary from situation to situation. In the introductory example connecting an electric current and the level of carbon monoxide in air, the relationship is almost perfect. In other situations, such as the heights and weights of individuals, the connection between the two variables involves a high degree of randomness. In the next section we will see how to quantify the strength of the linear relationship between two variables.

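# Height (x) and weight (y) pairs from the sample of 25-year-old men
x <- c(68, 69, 70, 70, 71, 72, 72, 72, 73, 73, 74, 75)
y <- c(151, 146, 157, 164, 171, 160, 163, 180, 170, 175, 178, 188)

# Scatter plot of the pairs (compare Figure 10.2)
plot(x, y, xlab = "Height", ylab = "Weight")
# Overlay a straight line fitted to the points by least squares
abline(lm(y ~ x))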
