10-5. Statistical Inferences

The parameter β1β_1 , the slope of the population regression line, is of primary importance in regression analysis because it gives the true rate of change in the mean E(y)E(y) in response to a unit increase in the predictor variable xx . For every unit increase in x the mean of the response variable yy changes by β1β_1 units, increasing if β1>0β_1>0 and decreasing if β1<0β_1 <0. We wish to construct confidence intervals for β1β_1 and test hypotheses about it.

1. Confidence Intervals for β1β_1

The slope β1^\hat{β_1} of the least squares regression line is a point estimate of β1β_1. A confidence interval for β1β_1 is given by the following formula.

100(1α)%100(1−α)\% Confidence Interval for the Slope β1β_1 of the Population Regression Line

β1^±tα2sεSSxx\hat{β_1}±t_{α∕2} \frac {s_ε} {\sqrt{SS_{xx}}}

where sε=SSEn2s_ε=\sqrt{ \frac {SSE}{n−2}} and the number of degrees of freedom is df=(n2)df=(n−2).

The assumptions listed in Section 10.3 "Modelling Linear Relationships with Randomness Present" must hold.

The statistic sεs_ε is called the sample standard deviation of errors. It estimates the standard deviation σσ of the errors in the population of yy-values for each fixed value of xx(see Figure 10.5 "The Simple Linear Model Concept" in Section 10.3 "Modelling Linear Relationships with Randomness Present").

EXAMPLE 6. Construct the 95% confidence interval for the slope β1β_1 of the population regression line based on the five-point sample data

x 2 2 6 8 10 y 0 1 2 3 3

[ Solution ]

The point estimate β1^\hat{β_1} of β1β_1 was computed in Note 10.18 "Example 2" in Section 10.4 "The Least Squares Regression Line" as β1^=0.34375\hat{β_1}=0.34375.

In the same example SSxxSS_{xx} was found to be SSxx=51.2SS_{xx} = 51.2.

The sum of the squared errors SSESSE was computed in Note 10.23 "Example 4" in Section 10.4 "The Least Squares Regression Line" as SSE=0.75SSE = 0.75 .

Thus sε=SSEn2=0.753=0.50s_ε=\sqrt{ \frac {SSE}{n−2}} = \sqrt{ \frac{0.75}{3} } = 0.50

Confidence level 95% means α=(10.95)=0.05α=(1−0.95)=0.05 so α2=0.025α∕2=0.025 . From the row labeled df=3df=3 in Figure 12.3 "Critical Values of " we obtain t0.025=3.182t_{0.025}=3.182 .

Therefore

β1^±tα2sεSSxx=0.34375±0.505.12=0.34375±0.2223\hat{β_1}±t_{α∕2} \frac {s_ε} {\sqrt{SS_{xx}}} = 0.34375 ± \frac{0.50}{\sqrt{5.12}} = 0.34375±0.2223

which gives the interval (0.1215,0.5661) (0.1215,0.5661). We are 95% confident that the slope β1β_1 of the population regression line is between 0.1215 and 0.5661.

EXAMPLE 7. Using the sample data in Table 10.3 "Data on Age and Value of Used Automobiles of a Specific Make and Model" construct a 90% confidence interval for the slope β1β_1 of the population regression line relating age and value of the automobiles of Note 10.19 "Example 3" in Section 10.4 "The Least Squares Regression Line". Interpret the result in the context of the problem.

[ Solution ]

The point estimate β1^\hat{β_1} of β1β_1 was computed in Note 10.19 "Example 3", as was SSxxSS_{xx}. Their values are β1^=2.05\hat{β_1}=-2.05 and SSxx=14SS_{xx}=14.

The sum of the squared errors SSESSE was computed in Note 10.24 "Example 5" in Section 10.4 "The Least Squares Regression Line" as SSE=28.946SSE=28.946.

Thus

sε=SSEn2=28.9468=1.902169814s_ε=\sqrt{ \frac {SSE}{n−2}} = \sqrt{ \frac{28.946}{8} } = 1.902169814

Confidence level 90% means α=(10.90)=0.10α=(1−0.90)=0.10 so α2=0.05α∕2=0.05 . From the row labeled df=8df=8 in Figure 12.3 "Critical Values of " we obtain t0.05=1.860t_{0.05}=1.860 .

Therefore

β1^±tα2sεSSxx=2.05±1.8601.90216981414=2.05±0.95\hat{β_1}± t_{α∕2}\frac {s_ε} {\sqrt{SS_{xx}}} = -2.05 ± 1.860\frac{1.902169814}{\sqrt{14}} = −2.05±0.95

which gives the interval (3.00,1.10)(−3.00,−1.10). We are 90% confident that the slope β1β_1 of the population regression line is between −3.00 and −1.10. In the context of the problem this means that for vehicles of this make and model between two and six years old we are 90% confident that for each additional year of age the average value of such a vehicle decreases by between $1,100 and $3,000.

2. Testing Hypotheses About β1β_1

Hypotheses regarding β1β_1 can be tested using the same five-step procedures, either the critical value approach or the p-value approach, that were introduced in Section 8.1 "The Elements of Hypothesis Testing" and Section 8.3 "The Observed Significance of a Test" of Chapter 8 "Testing Hypotheses". The null hypothesis always has the form H0:β1=B0H_0:β_1=B_0 where B0B_0 is a number determined from the statement of the problem. The three forms of the alternative hypothesis, with the terminology for each case, are:

Form of HaH_a

Terminology

Ha:β1<B0H_a:β_1<B_0

Left-tailed

Ha:β1>B0H_a:β_1>B_0

Right-tailed

Ha:β1B0H_a:β_1\ne B_0

Two-tailed

The value zero for B0B_0 is of particular importance since in that case the null hypothesis is H0:β1=0H_0:β_1=0 , which corresponds to the situation in which xx is not useful for predicting yy . For if β1=0β_1=0 then the population regression line is horizontal, so the mean E(y)E(y) is the same for every value of xx and we are just as well off in ignoring xx completely and approximating yy by its average value.

Given two variables xx and yy, the burden of proof is that xx is useful for predicting yy, not that it is not. Thus the phrase “test whether xx is useful for prediction of yy,” or words to that effect, means to perform the test

H0:β1=0H_0:β_1=0 vs. Ha:β1B0H_a:β_1\ne B_0

Standardized Test Statistic for Hypothesis Tests Concerning the Slope β1β_1 of the Population Regression Line

to=(β1^B0)/sεSSxxt_o=(\hat{β_1}−B_0) / \frac {s_ε} {\sqrt{SS_{xx}}}

The test statistic has Student’s t-distribution with df=(n2)df=(n−2) degrees of freedom.

The assumptions listed in Section 10.3 "Modelling Linear Relationships with Randomness Present" must hold.

EXAMPLE 8. Test, at the 2% level of significance, whether the variable xx is useful for predicting yy based on the information in the five-point data set

x 2 2 6 8 10 y 0 1 2 3 3

[ Solution ]

We will perform the test using the critical value approach.

  • Step 1. Since xxis useful for prediction of yy precisely when the slope β1β_1 of the population regression line is nonzero, the relevant test is   H0:β1=0H_0:β_1=0 vs. Ha:β10@α=0.02H_a:β_1\ne 0 @ α=0.02

  • Step 2. The test statistic is

    to=(β1^B0)/sεSSxxt_o=(\hat{β_1}−B_0) / \frac {s_ε} {\sqrt{SS_{xx}}}

    and has Student’s t-distribution with (n2)=(52)=3(n−2)=(5−2)=3 degrees of freedom.

  • Step 3. From Note 10.18 "Example 2", β1^=0.34375\hat{β_1}=0.34375 and SSxx=51.2SS_{xx}=51.2. From Note 10.30 "Example 6", sε=0.50s_ε=0.50 . The value of the test statistic is therefore to=(β1^B0)/sεSSxx=(0.343750)/0.5051.2=4.919t_o=(\hat{β_1}−B_0) / \frac {s_ε} {\sqrt{SS_{xx}}} =(0.34375 - 0) / \frac{ 0.50}{\sqrt{51.2}} =4.919

  • Step 4. Since the symbol in Ha is “≠” this is a two-tailed test, so there are two critical values ±tα2=±t0.01±t_{α∕2}=±t_{0.01}. Reading from the line in Figure 12.3 "Critical Values of " labeled df=3df=3 , t0.01=4.541t_{0.01}=4.541 . The rejection region is (,4.541][4.541,)(−∞,−4.541]∪[4.541,∞) .

  • Step 5. As shown in Figure 10.9 "Rejection Region and Test Statistic for " the test statistic falls in the rejection region. The decision is to reject H0H_0.

  • In the context of the problem our conclusion is:

    The data provide sufficient evidence, at the 2% level of significance, to conclude that the slope of the population regression line is nonzero, so that xx is useful as a predictor of yy.

Figure 10.9 Rejection Region and Test Statistic for Note 10.33 "Example 8"

EXAMPLE 9. A car salesman claims that automobiles between two and six years old of the make and model discussed in Note 10.19 "Example 3" in Section 10.4 "The Least Squares Regression Line" lose more than $1,100 in value each year. Test this claim at the 5% level of significance.

[ Solution ]

We will perform the test using the critical value approach.

  • Step 1. In terms of the variables xx and yy, the salesman’s claim is that if xx is increased by 1 unit (one additional year in age), then yy decreases by more than 1.1 units (more than $1,100). Thus his assertion is that the slope of the population regression line is negative, and that it is more negative than −1.1. In symbols, β1<1.1β_1<−1.1 . Since it contains an inequality, this has to be the alternative hypotheses. The null hypothesis has to be an equality and have the same number on the right hand side, so the relevant test is H0:β1=1.1H_0:β_1=-1.1 vs. Ha:β1<1.1,@α=0.05H_a:β_1 < -1.1 , @ α=0.05

  • Step 2. The test statistic is to=(β1^B0)/sεSSxxt_o=(\hat{β_1}−B_0) / \frac {s_ε} {\sqrt{SS_{xx}}}

    and has Student’s t-distribution with 8 degrees of freedom.

  • Step 3. From Note 10.19 "Example 3", β1^=2.05\hat{β_1}=−2.05 and SSxx=14SS_{xx}=14 . From Note 10.31 "Example 7", sε=1.902169814s_ε=1.902169814 . The value of the test statistic is therefore to=(β1^B0)/sεSSxx={2.05(1.1)}/1.90216981414=1.869t_o=(\hat{β_1}−B_0) / \frac {s_ε} {\sqrt{SS_{xx}}} = \{-2.05 - (-1.1)\} / \frac{1.902169814}{\sqrt{14}} = -1.869

  • Step 4. Since the symbol in HaH_a is “ << ” this is a left-tailed test, so there is a single critical value tα=t0.05−t_α=−t_{0.05} . Reading from the line in Figure 12.3 "Critical Values of " labeled df=8df=8 , t0.05=1.860t_{0.05}=1.860 . The rejection region is (,1.860](−∞,−1.860] .

  • Step 5. As shown in Figure 10.10 "Rejection Region and Test Statistic for " the test statistic falls in the rejection region. The decision is to reject H0H_0 .

  • In the context of the problem our conclusion is:

    The data provide sufficient evidence, at the 5% level of significance, to conclude that vehicles of this make and model and in this age range lose more than $1,100 per year in value, on average.

Figure 10.10 Rejection Region and Test Statistic for Note 10.34 "Example 9"

Last updated