# 10-5. Statistical Inferences

The parameter $$β\_1$$, the slope of the population regression line, is of primary importance in regression analysis because it gives the true rate of change in the mean $$E(y)$$ in response to a unit increase in the predictor variable $$x$$. For every unit increase in $$x$$, the mean of the response variable $$y$$ changes by $$β\_1$$ units, increasing if $$β\_1>0$$ and decreasing if $$β\_1<0$$. We wish to construct confidence intervals for $$β\_1$$ and test hypotheses about it.

## 1. Confidence Intervals for $$β\_1$$

The slope $$\hat{β\_1}$$ of the least squares regression line is a point estimate of $$β\_1$$. A confidence interval for $$β\_1$$ is given by the following formula.

> $$100(1−α)\%$$ **Confidence Interval for the Slope** $$β\_1$$ **of the Population Regression Line**
>
> $$\hat{β\_1}±t\_{α∕2}\frac{s\_ε}{\sqrt{SS\_{xx}}}$$
>
> where $$s\_ε=\sqrt{ \frac {SSE}{n−2}}$$ and the number of degrees of freedom is $$df=(n−2)$$.
>
> The assumptions listed in Section 10.3 "Modelling Linear Relationships with Randomness Present" must hold.


> The statistic $$s\_ε$$ is called the **sample standard deviation of errors**. It estimates the standard deviation $$σ$$ of the errors in the population of $$y$$-values for each fixed value of $$x$$ (see Figure 10.5 "The Simple Linear Model Concept" in Section 10.3 "Modelling Linear Relationships with Randomness Present").
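
As a minimal sketch of the interval formula above, the Python function below takes the summary quantities $$\hat{β\_1}$$, $$SSE$$, $$SS\_{xx}$$ and $$n$$. The name `slope_confidence_interval` is hypothetical, and the critical value $$t\_{α∕2}$$ (with $$df=n−2$$) is assumed to be read from a *t* table by the caller rather than computed, so only the standard library is needed.

```python
import math

def slope_confidence_interval(beta1_hat, sse, ss_xx, n, t_crit):
    """Return (lower, upper) = beta1_hat -/+ t_crit * s_e / sqrt(SS_xx).

    t_crit is assumed to be t_{alpha/2} with df = n - 2, taken from a t table.
    """
    if n < 3:
        raise ValueError("need n >= 3 so that df = n - 2 is positive")
    s_e = math.sqrt(sse / (n - 2))            # sample standard deviation of errors
    margin = t_crit * s_e / math.sqrt(ss_xx)  # half-width of the interval
    return beta1_hat - margin, beta1_hat + margin
```

For instance, `slope_confidence_interval(0.34375, 0.75, 51.2, 5, 3.182)` uses the values of Example 6 below and agrees with its interval up to rounding in the fourth decimal place.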

**EXAMPLE 6.** Construct the 95% confidence interval for the slope $$β\_1$$ of the population regression line based on the five-point sample data

| x | 2 | 2 | 6 | 8 | 10 |
| - | - | - | - | - | -- |
| y | 0 | 1 | 2 | 3 | 3  |

**\[ Solution ]**

The point estimate $$\hat{β\_1}$$ of $$β\_1$$ was computed in Note 10.18 "Example 2" in Section 10.4 "The Least Squares Regression Line" as $$\hat{β\_1}=0.34375$$.

In the same example $$SS\_{xx}$$ was found to be $$SS\_{xx} = 51.2$$.

The sum of the squared errors $$SSE$$ was computed in Note 10.23 "Example 4" in Section 10.4 "The Least Squares Regression Line" as $$SSE = 0.75$$.

Thus\
$$s\_ε=\sqrt{ \frac {SSE}{n−2}} = \sqrt{ \frac{0.75}{3} } = 0.50$$

Confidence level 95% means $$α=(1−0.95)=0.05$$, so $$α∕2=0.025$$. From the row labeled $$df=3$$ in Figure 12.3 "Critical Values of " we obtain $$t\_{0.025}=3.182$$.

Therefore

$$\hat{β\_1}±t\_{α∕2}\frac{s\_ε}{\sqrt{SS\_{xx}}} = 0.34375 ± 3.182\frac{0.50}{\sqrt{51.2}} = 0.34375±0.2223$$

which gives the interval $$(0.1215,0.5661)$$. We are 95% confident that the slope $$β\_1$$ of the population regression line is between 0.1215 and 0.5661.
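
The numbers quoted from Examples 2 and 4 can be rechecked from the raw data. The sketch below computes $$SS\_{xx}$$, the least squares slope, $$SSE$$ (directly as the sum of squared residuals) and $$s\_ε$$, then forms the 95% interval. The critical value 3.182 is taken from the *t* table exactly as in the solution; the variable names are illustrative only.

```python
import math

# Five-point sample data from Example 6
xs = [2, 2, 6, 8, 10]
ys = [0, 1, 2, 3, 3]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Sums of squares about the means
ss_xx = sum((x - x_bar) ** 2 for x in xs)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Least squares line and sum of squared errors (squared residuals)
beta1_hat = ss_xy / ss_xx
beta0_hat = y_bar - beta1_hat * x_bar
sse = sum((y - (beta0_hat + beta1_hat * x)) ** 2 for x, y in zip(xs, ys))
s_e = math.sqrt(sse / (n - 2))   # sample standard deviation of errors

# 95% interval; t_{0.025} with df = 3 comes from the t table, as in the text
t_crit = 3.182
margin = t_crit * s_e / math.sqrt(ss_xx)
print(f"SS_xx = {ss_xx:.1f}, beta1_hat = {beta1_hat:.5f}, SSE = {sse:.2f}, s_e = {s_e:.2f}")
print(f"95% CI for beta_1: ({beta1_hat - margin:.4f}, {beta1_hat + margin:.4f})")
```

Carrying full precision, the lower endpoint prints as 0.1214; the value 0.1215 above comes from rounding the margin to 0.2223 before subtracting it.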

**EXAMPLE 7.** Using the sample data in Table 10.3 "Data on Age and Value of Used Automobiles of a Specific Make and Model" construct a 90% confidence interval for the slope $$β\_1$$ of the population regression line relating age and value of the automobiles of Note 10.19 "Example 3" in Section 10.4 "The Least Squares Regression Line". Interpret the result in the context of the problem.

**\[ Solution ]**

The point estimate $$\hat{β\_1}$$ of $$β\_1$$ was computed in Note 10.19 "Example 3", as was $$SS\_{xx}$$. Their values are $$\hat{β\_1}=-2.05$$ and $$SS\_{xx}=14$$.

The sum of the squared errors $$SSE$$ was computed in Note 10.24 "Example 5" in Section 10.4 "The Least Squares Regression Line" as $$SSE=28.946$$.

Thus

$$s\_ε=\sqrt{ \frac {SSE}{n−2}} = \sqrt{ \frac{28.946}{8} } = 1.902169814$$

Confidence level 90% means $$α=(1−0.90)=0.10$$, so $$α∕2=0.05$$. From the row labeled $$df=8$$ in Figure 12.3 "Critical Values of " we obtain $$t\_{0.05}=1.860$$.

Therefore

$$\hat{β\_1}± t\_{α∕2}\frac  {s\_ε} {\sqrt{SS\_{xx}}} = -2.05 ± 1.860\frac{1.902169814}{\sqrt{14}} = −2.05±0.95$$

which gives the interval $$(−3.00,−1.10)$$. We are 90% confident that the slope $$β\_1$$ of the population regression line is between −3.00 and −1.10. In the context of the problem this means that for vehicles of this make and model between two and six years old we are 90% confident that for each additional year of age the average value of such a vehicle decreases by between $1,100 and $3,000.
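
A short sketch of the same computation from the quoted summary values. The conversion to dollars assumes, as the interpretation above implies, that $$y$$ is recorded in thousands of dollars; that unit is an inference from the text, not a stated fact about Table 10.3.

```python
import math

# Summary values from Example 7 (df = n - 2 = 8, so n = 10)
beta1_hat = -2.05
ss_xx = 14
sse = 28.946
n = 10
t_crit = 1.860          # t_{0.05} with df = 8, from the t table

s_e = math.sqrt(sse / (n - 2))
margin = t_crit * s_e / math.sqrt(ss_xx)
lower, upper = beta1_hat - margin, beta1_hat + margin

# Assumption: y is measured in thousands of dollars, so 1 unit = $1,000
print(f"90% CI for beta_1: ({lower:.2f}, {upper:.2f})")
print(f"yearly loss in value: ${-upper * 1000:,.0f} to ${-lower * 1000:,.0f}")
```

With unrounded endpoints the dollar figures come out near $1,104 and $2,996, which the text reports as $1,100 and $3,000.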

## 2. Testing Hypotheses About $$β\_1$$

Hypotheses regarding $$β\_1$$ can be tested using the same five-step procedures, either the critical value approach or the *p*-value approach, that were introduced in Section 8.1 "The Elements of Hypothesis Testing" and Section 8.3 "The Observed Significance of a Test" of Chapter 8 "Testing Hypotheses". The null hypothesis always has the form $$H\_0:β\_1=B\_0$$ where $$B\_0$$ is a number determined from the statement of the problem. The three forms of the alternative hypothesis, with the terminology for each case, are:

| Form of $$H\_a$$      | Terminology  |
| --------------------- | ------------ |
| $$H\_a:β\_1\<B\_0$$   | Left-tailed  |
| $$H\_a:β\_1>B\_0$$    | Right-tailed |
| $$H\_a:β\_1\ne B\_0$$ | Two-tailed   |

The value zero for $$B\_0$$ is of particular importance, since in that case the null hypothesis is $$H\_0:β\_1=0$$, which corresponds to the situation in which $$x$$ is not useful for predicting $$y$$. If $$β\_1=0$$, then the population regression line is horizontal, so the mean $$E(y)$$ is the same for every value of $$x$$, and we are just as well off ignoring $$x$$ completely and approximating $$y$$ by its average value.

Given two variables $$x$$ and $$y$$, the burden of proof is on showing that $$x$$ is useful for predicting $$y$$, not on showing that it is not. Thus the phrase “test whether $$x$$ is useful for prediction of $$y$$,” or words to that effect, means to perform the test

$$H\_0:β\_1=0$$ vs. $$H\_a:β\_1\ne 0$$

> **Standardized Test Statistic for Hypothesis Tests Concerning the Slope** $$β\_1$$ **of the Population Regression Line**
>
> $$t\_o=\frac{\hat{β\_1}−B\_0}{s\_ε/\sqrt{SS\_{xx}}}$$
>
> The test statistic has Student’s *t*-distribution with $$df=(n−2)$$ degrees of freedom.
>
> The assumptions listed in Section 10.3 "Modelling Linear Relationships with Randomness Present" must hold.
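
The critical value approach with this statistic can be sketched as one function covering the three forms of $$H\_a$$ in the table above. The name `slope_t_test` and the `tail` strings are hypothetical; the caller is assumed to supply $$t\_α$$ (one-tailed) or $$t\_{α∕2}$$ (two-tailed) from a *t* table with $$df=n−2$$. The rejection regions include the critical values, matching the closed intervals used in the examples below.

```python
import math

def slope_t_test(beta1_hat, b0, s_e, ss_xx, t_crit, tail):
    """Critical-value test of H0: beta_1 = b0.

    tail is "left", "right" or "two" (the forms of H_a).
    t_crit is t_alpha for one-tailed tests or t_{alpha/2} for two-tailed
    tests, with df = n - 2, read from a t table by the caller.
    Returns (t statistic, True if H0 is rejected).
    """
    t = (beta1_hat - b0) / (s_e / math.sqrt(ss_xx))
    if tail == "left":
        reject = t <= -t_crit        # rejection region (-inf, -t_alpha]
    elif tail == "right":
        reject = t >= t_crit         # rejection region [t_alpha, inf)
    elif tail == "two":
        reject = abs(t) >= t_crit    # (-inf, -t_{alpha/2}] U [t_{alpha/2}, inf)
    else:
        raise ValueError("tail must be 'left', 'right' or 'two'")
    return t, reject
```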

**EXAMPLE 8.** Test, at the 2% level of significance, whether the variable $$x$$ is useful for predicting $$y$$ based on the information in the five-point data set

| x | 2 | 2 | 6 | 8 | 10 |
| - | - | - | - | - | -- |
| y | 0 | 1 | 2 | 3 | 3  |

**\[ Solution ]**

We will perform the test using the critical value approach.

* **Step 1.** Since $$x$$ is useful for prediction of $$y$$ precisely when the slope $$β\_1$$ of the population regression line is nonzero, the **relevant test** is\
  $$H\_0:β\_1=0$$ vs. $$H\_a:β\_1\ne 0$$ @ $$α=0.02$$
* **Step 2.** The **test statistic** is

  $$t\_o=\frac{\hat{β\_1}−B\_0}{s\_ε/\sqrt{SS\_{xx}}}$$

  and has Student’s *t*-distribution with $$(n−2)=(5−2)=3$$ degrees of freedom.
* **Step 3.** From Note 10.18 "Example 2", $$\hat{β\_1}=0.34375$$ and $$SS\_{xx}=51.2$$. From Note 10.30 "Example 6", $$s\_ε=0.50$$. The **value of the test statistic** is therefore\
  $$t\_o=\frac{\hat{β\_1}−B\_0}{s\_ε/\sqrt{SS\_{xx}}}=\frac{0.34375−0}{0.50/\sqrt{51.2}}=4.919$$
* **Step 4.** Since the symbol in $$H\_a$$ is “≠”, this is a two-tailed test, so there are **two critical values** $$±t\_{α∕2}=±t\_{0.01}$$. Reading from the line in Figure 12.3 "Critical Values of " labeled $$df=3$$, $$t\_{0.01}=4.541$$. **The rejection region** is $$(−∞,−4.541]∪[4.541,∞)$$.
* **Step 5.** As shown in Figure 10.9 "Rejection Region and Test Statistic for " the **test statistic falls in the rejection region**. The decision is **to reject** $$H\_0$$.
* In the context of the problem our **conclusion** is:

  *The data provide sufficient evidence, at the 2% level of significance, to conclude that the slope of the population regression line is nonzero, so that* $$x$$ *is useful as a predictor of* $$y$$*.*

Figure 10.9 Rejection Region and Test Statistic for Note 10.33 "Example 8"

![](https://saylordotorg.github.io/text_introductory-statistics/section_14/9f1c174c40e1d4364dca3822c4cc03d5.jpg)
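
The same test can be carried out with the *p*-value approach mentioned at the start of this subsection. The sketch below is a swapped-in technique, not the text's method: it uses the exact closed form of the *t* distribution for $$df=3$$, $$P(|T|<|t|)=\frac{2}{π}(θ+\sin θ\cos θ)$$ with $$θ=\arctan(|t|/\sqrt{3})$$, which holds only for 3 degrees of freedom. For general $$df$$ one would instead use a table or a library routine such as `scipy.stats.t.sf`.

```python
import math

# Example 8 values quoted in Steps 2 and 3
beta1_hat, b0 = 0.34375, 0.0
ss_xx, s_e, df = 51.2, 0.50, 3
alpha = 0.02

t_stat = (beta1_hat - b0) / (s_e / math.sqrt(ss_xx))

# Exact for df = 3 only: P(|T| < |t|) = (2/pi) * (theta + sin(theta) * cos(theta))
theta = math.atan(abs(t_stat) / math.sqrt(df))
central = 2 / math.pi * (theta + math.sin(theta) * math.cos(theta))

p_value = 1 - central    # two-tailed p-value, since H_a uses "≠"
print(f"t = {t_stat:.3f}, p-value = {p_value:.4f}, reject H0: {p_value <= alpha}")
```

The *p*-value lands between 0.01 and 0.02, so it is below $$α=0.02$$ and the decision agrees with the critical value approach.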

**EXAMPLE 9.** A car salesman claims that automobiles between two and six years old of the make and model discussed in Note 10.19 "Example 3" in Section 10.4 "The Least Squares Regression Line" lose more than $1,100 in value each year. Test this claim at the 5% level of significance.

**\[ Solution ]**

We will perform the test using the critical value approach.

* **Step 1.** In terms of the variables $$x$$ and $$y$$, the salesman’s claim is that if $$x$$ is increased by 1 unit (one additional year in age), then $$y$$ decreases by more than 1.1 units (more than $1,100). Thus his assertion is that the slope of the population regression line is negative, and more negative than −1.1. In symbols, $$β\_1<−1.1$$.\
  Since it contains an inequality, this has to be the alternative hypothesis. The null hypothesis has to be an equality with the same number on the right-hand side, so the relevant test is\
  $$H\_0:β\_1=-1.1$$ vs. $$H\_a:β\_1 < -1.1$$ @ $$α=0.05$$
* **Step 2.** The **test statistic** is\
  $$t\_o=\frac{\hat{β\_1}−B\_0}{s\_ε/\sqrt{SS\_{xx}}}$$

  and has Student’s *t*-distribution with 8 degrees of freedom.
* **Step 3.** From Note 10.19 "Example 3", $$\hat{β\_1}=−2.05$$ and $$SS\_{xx}=14$$. From Note 10.31 "Example 7", $$s\_ε=1.902169814$$. The **value of the test statistic** is therefore\
  $$t\_o=\frac{\hat{β\_1}−B\_0}{s\_ε/\sqrt{SS\_{xx}}}=\frac{−2.05−(−1.1)}{1.902169814/\sqrt{14}}=−1.869$$
* **Step 4.** Since the symbol in $$H\_a$$ is “$$<$$”, this is a **left-tailed test**, so there is a **single critical value** $$−t\_α=−t\_{0.05}$$. Reading from the line in Figure 12.3 "Critical Values of " labeled $$df=8$$, $$t\_{0.05}=1.860$$. The **rejection region** is $$(−∞,−1.860]$$.
* **Step 5.** As shown in Figure 10.10 "Rejection Region and Test Statistic for " the **test statistic falls in the rejection region**. The decision is **to reject** $$H\_0$$.
* In the context of the problem our **conclusion** is:

  The data provide sufficient evidence, at the 5% level of significance, to conclude that vehicles of this make and model and in this age range lose more than $1,100 per year in value, on average.

Figure 10.10 Rejection Region and Test Statistic for Note 10.34 "Example 9"

![](https://saylordotorg.github.io/text_introductory-statistics/section_14/da0a5676198f1c1c85bdb57bbd79e2b1.jpg)
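
As with Example 8, a *p*-value can be computed here without a statistics library, again as a swapped-in technique rather than the text's method. For $$df=8$$ the *t* distribution has the exact closed form $$P(|T|<|t|)=\sin θ\left(1+\tfrac{1}{2}c^2+\tfrac{3}{8}c^4+\tfrac{5}{16}c^6\right)$$, where $$θ=\arctan(|t|/\sqrt{8})$$ and $$c=\cos θ$$; this expression is specific to 8 degrees of freedom. A library routine such as `scipy.stats.t.cdf` would serve for general $$df$$.

```python
import math

# Example 9 values quoted in Steps 1 to 3
beta1_hat, b0 = -2.05, -1.1
ss_xx, s_e, df = 14, 1.902169814, 8
alpha = 0.05

t_stat = (beta1_hat - b0) / (s_e / math.sqrt(ss_xx))

# Exact for df = 8 only: P(|T| < |t|) = sin(theta) * (1 + c^2/2 + 3c^4/8 + 5c^6/16)
theta = math.atan(abs(t_stat) / math.sqrt(df))
c2 = math.cos(theta) ** 2
central = math.sin(theta) * (1 + c2 / 2 + 3 * c2**2 / 8 + 5 * c2**3 / 16)

# Left-tailed p-value P(T <= t_stat), using the symmetry of the t distribution
p_value = (1 - central) / 2 if t_stat <= 0 else (1 + central) / 2
print(f"t = {t_stat:.3f}, p-value = {p_value:.4f}, reject H0: {p_value <= alpha}")
```

The *p*-value comes out just under 0.05, consistent with the test statistic lying barely inside the rejection region $$(−∞,−1.860]$$.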
