10-1. Linear Relationships Between Variables
Our interest in this chapter is in situations in which we can associate two measurements $x$ and $y$ with each element of a population or sample, particularly when it is of interest to use the value of $x$ to predict the value of $y$. For example, the population could be the air in automobile garages, $x$ could be the electrical current produced by an electrochemical reaction taking place in a carbon monoxide meter, and $y$ the concentration of carbon monoxide in the air. In this chapter we will learn statistical methods for analyzing the relationship between variables $x$ and $y$ in this context.
A list of all the formulas that appear anywhere in this chapter is collected in the last section for ease of reference.
Learning objective: to learn what it means for two variables to exhibit a relationship that is close to linear but that contains an element of randomness.
The table at the end of this section gives examples of the kinds of pairs of variables that could be of interest from a statistical point of view.
We can plot the Celsius and Fahrenheit temperature pairs by choosing a pair of perpendicular lines in the plane, called the coordinate axes, as shown in Figure 10.1 "Plot of Celsius and Fahrenheit Temperature Pairs".
Figure 10.1 Plot of Celsius and Fahrenheit Temperature Pairs
Height $x$ and weight $y$ of the sampled 25-year-old men:

x (height) | 68 | 69 | 70 | 70 | 71 | 72 | 72 | 72 | 73 | 73 | 74 | 75
y (weight) | 151 | 146 | 157 | 164 | 171 | 160 | 163 | 180 | 170 | 175 | 178 | 188
The points appear to follow a line, but not exactly; there is an element of randomness present.
Figure 10.2 Plot of Height and Weight Pairs
The first line in the table is different from all the rest because in that case, and no other, the relationship between the variables is deterministic: once the value of $x$ is known, the value of $y$ is completely determined. In fact there is a formula for $y$ in terms of $x$: $y=\frac{9}{5}x+32$. Choosing several values for $x$ and computing the corresponding value of $y$ for each one using the formula gives the table
x (°C) | −40 | −15 | 0 | 20 | 50
y (°F) | −40 | 5 | 32 | 68 | 122
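The table above can be regenerated directly from the formula $y=\frac{9}{5}x+32$. The sketch below assumes exact rational arithmetic through Python's standard-library `fractions` module, so the slope $\frac{9}{5}$ introduces no floating-point rounding; the name `celsius_to_fahrenheit` is our own choice for illustration.

```python
from fractions import Fraction

SLOPE = Fraction(9, 5)  # slope of the line y = (9/5)x + 32
INTERCEPT = 32          # y-intercept of the line

def celsius_to_fahrenheit(x):
    """Return y = (9/5)x + 32 for a Celsius temperature x, computed exactly."""
    return SLOPE * x + INTERCEPT

celsius = [-40, -15, 0, 20, 50]
fahrenheit = [celsius_to_fahrenheit(x) for x in celsius]

# str() of a Fraction with denominator 1 is just the integer, e.g. "5".
for x, y in zip(celsius, fahrenheit):
    print(f"{x:>4} °C -> {str(y):>4} °F")
```

Because the relationship is deterministic, running the function again on the same $x$ always returns the same $y$.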
Then to each pair of numbers in the table we associate a unique point in the plane: the point that lies $x$ units to the right of the vertical axis (to the left if $x<0$) and $y$ units above the horizontal axis (below if $y<0$).
The relationship between $x$ and $y$ is called a linear relationship because the points so plotted all lie on a single straight line. The number $\frac{9}{5}$ in the equation $y=\frac{9}{5}x+32$ is the slope of the line and measures its steepness. It describes how $y$ changes in response to a change in $x$: if $x$ increases by 1 unit, then $y$ increases (since $\frac{9}{5}$ is positive) by $\frac{9}{5}$ units.
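The slope interpretation can be checked numerically. This minimal sketch (again using `fractions.Fraction` for exact arithmetic, with a hypothetical helper `celsius_to_fahrenheit`) raises $x$ by 1 unit from several starting points and records the resulting change in $y$; for a straight line the change is the same everywhere and equals the slope.

```python
from fractions import Fraction

def celsius_to_fahrenheit(x):
    """y = (9/5)x + 32, the temperature line from the text."""
    return Fraction(9, 5) * x + 32

# Change in y when x increases by exactly 1 unit, from several starting values.
changes = [celsius_to_fahrenheit(x + 1) - celsius_to_fahrenheit(x)
           for x in [-40, -15, 0, 20, 50]]

# Every change is the slope 9/5, regardless of where on the line we start.
print(set(changes))  # {Fraction(9, 5)}

# A 10-unit increase in x therefore changes y by 10 * (9/5) = 18.
print(celsius_to_fahrenheit(30) - celsius_to_fahrenheit(20))
```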
If the slope had been negative, then $y$ would have decreased in response to an increase in $x$. The number 32 in the formula $y=\frac{9}{5}x+32$ is the $y$-intercept of the line; it identifies where the line crosses the $y$-axis. You may recall from an earlier course that every non-vertical line in the plane is described by an equation of the form $y=mx+b$, where $m$ is the slope of the line and $b$ is its $y$-intercept.
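Since a non-vertical line is determined by $m$ and $b$, any two of its points are enough to recover the equation $y=mx+b$. The sketch below assumes integer (or `Fraction`) coordinates; the helper names `line_through` and `direction` are hypothetical, introduced only for illustration.

```python
from fractions import Fraction

def line_through(p, q):
    """Return (m, b) for the non-vertical line y = mx + b through points p and q.

    Coordinates are assumed to be integers or Fractions so the slope is exact.
    """
    (x1, y1), (x2, y2) = p, q
    if x1 == x2:
        raise ValueError("vertical line: slope is undefined")
    m = Fraction(y2 - y1, x2 - x1)  # rise over run
    b = y1 - m * x1                 # solve y1 = m*x1 + b for b
    return m, b

def direction(m):
    """Describe how y responds when x increases, based on the sign of the slope."""
    if m > 0:
        return "increases"
    if m < 0:
        return "decreases"
    return "stays the same"

# Two (Celsius, Fahrenheit) pairs from the table recover the temperature line.
m, b = line_through((0, 32), (20, 68))
print(m, b)          # 9/5 32
print(direction(m))  # increases
```

Any other pair of points from the table, such as $(-40,-40)$ and $(50,122)$, yields the same slope and intercept, as it must for points on a single line.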
The relationship between $x$ and $y$ in the temperature example is deterministic because once the value of $x$ is known, the value of $y$ is completely determined.
In contrast, all the other relationships listed in the table have an element of randomness in them. Consider the relationship described in the last line of the table, between the height $x$ of a man aged 25 and his weight $y$. If we were to randomly select several 25-year-old men and measure the height and weight of each one, we might obtain a collection of $(x,y)$ pairs such as the twelve listed above alongside Figure 10.2.
A plot of these data is shown in Figure 10.2 "Plot of Height and Weight Pairs". Such a plot is called a scatter diagram or scatter plot. Looking at the plot, we see that the points cluster around a straight line: there is a linear relationship between height $x$ and weight $y$, but not a perfect one.
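One concrete way to see that the height and weight relationship is not deterministic: in the sample, some heights occur with more than one weight, so no formula $y=f(x)$ can reproduce the data exactly. The sketch below groups each data set by $x$ and reports such conflicting values. It reads the heights as inches and the weights as pounds, which the magnitudes suggest but the text does not state; the helper names are hypothetical. An empty result is only a necessary condition for determinism, not a proof of it.

```python
from collections import defaultdict
from fractions import Fraction

# (x, y) = (height, weight) for the sample of 25-year-old men in the text
# (presumably inches and pounds).
height_weight = [(68, 151), (69, 146), (70, 157), (70, 164), (71, 171), (72, 160),
                 (72, 163), (72, 180), (73, 170), (73, 175), (74, 178), (75, 188)]

# (x, y) = (degrees Celsius, degrees Fahrenheit) from the temperature table.
temperature = [(-40, -40), (-15, 5), (0, 32), (20, 68), (50, 122)]

def responses_by_predictor(pairs):
    """Map each x value, in increasing order, to the sorted y values observed with it."""
    groups = defaultdict(list)
    for x, y in pairs:
        groups[x].append(y)
    return {x: sorted(ys) for x, ys in sorted(groups.items())}

def conflicting_x(pairs):
    """Return the x values that occur with more than one distinct y value."""
    return [x for x, ys in responses_by_predictor(pairs).items() if len(set(ys)) > 1]

print(conflicting_x(temperature))    # []
print(conflicting_x(height_weight))  # [70, 72, 73]

# Every temperature pair lies exactly on y = (9/5)x + 32.
print(all(y == Fraction(9, 5) * x + 32 for x, y in temperature))  # True
```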
In this chapter we will analyze situations in which variables $x$ and $y$ exhibit such a linear relationship with randomness. The level of randomness varies from situation to situation. In the introductory example connecting an electric current and the level of carbon monoxide in air, the relationship is almost perfect. In other situations, such as the heights and weights of individuals, the connection between the two variables involves a high degree of randomness. In the next section we will see how to quantify the strength of the linear relationship between two variables.
Predictor or independent variable $x$ | Response or dependent variable $y$
--- | ---
Temperature in degrees Celsius | Temperature in degrees Fahrenheit
Area of a house (sq. ft.) | Value of the house
Age of a particular make and model car | Resale value of the car
Amount spent by a business on advertising in a year | Revenue received that year
Height of a 25-year-old man | Weight of the man