
# 7-1. Large Sample Estimation of a Population Mean

{% file src="/files/-Lz9xtROcI03n2QGRahA" %}
Chapter 7 (Korean)
{% endfile %}

{% file src="/files/-Lz9xxkWfHUbPkAWT-PT" %}
Chapter 7 (Chinese)
{% endfile %}

{% file src="/files/-LyRxdApIkzqc1CHtZhe" %}
Examples of Chapter 7 (R Source)
{% endfile %}

If we wish to estimate the mean $$μ$$ of a population for which a census is impractical, say the average height of all 18-year-old men in the country, a reasonable strategy is to take a sample, compute its mean $$\bar{x}$$ , and estimate the unknown number $$μ$$ by the known number $$\bar{x}$$ . For example, if the average height of 100 randomly selected men aged 18 is 70.6 inches, then we would say that the average height of all 18-year-old men is (at least approximately) 70.6 inches.

Estimating a population parameter by a single number like this is called **point estimation**; in the case at hand the statistic $$\bar{x}$$ is a **point estimate** of the parameter $$μ$$ . The terminology arises because a single number corresponds to a single point on the number line.

**A problem with a point estimate** is that it gives no indication of how reliable the estimate is. In contrast, in this chapter we learn about **interval estimation**. In brief, in the case of estimating a population mean $$μ$$ we use a formula to compute from the data a number $$E$$, called the **margin of error of the estimate**, and form the interval $$[\bar{x}-E, \bar{x}+E]$$. We do this in such a way that a certain proportion, say 95%, of all the intervals constructed from sample data by means of this formula contain the unknown parameter $$μ$$. Such an interval is called a **95% confidence interval for** $$μ$$.

Continuing with the example of the average height of 18-year-old men, suppose that the sample of 100 men mentioned above for which $$\bar{x}=70.6$$ inches also had sample standard deviation $$s = 1.7$$ inches. It then turns out that $$E = 0.33$$ and we would state that we are 95% confident that the average height of all 18-year-old men is in the interval formed by $$70.6±0.33$$ inches, that is, the average is between 70.27 and 70.93 inches. If the sample statistics had come from a smaller sample, say a sample of 50 men, the lower reliability would show up in the 95% confidence interval being longer, hence less precise in its estimate. In this example the 95% confidence interval for the same sample statistics but with $$n = 50$$ is $$70.6±0.47$$ inches, or from 70.13 to 71.07 inches.
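The margins of error quoted above can be reproduced in base R. This is a minimal sketch that assumes the large-sample formula $$E = z_{\alpha/2}\,s/\sqrt{n}$$ developed in this section, with $$z_{0.025}\approx 1.960$$ and the sample standard deviation $$s$$ standing in for $$σ$$. The variable names are illustrative.

```r
# Sample statistics from the height example (inches)
xbar <- 70.6
s    <- 1.7
z    <- qnorm(0.975)   # z-value cutting off a right tail of area 0.025, about 1.960

# Margin of error E = z * s / sqrt(n) for the two sample sizes
E100 <- z * s / sqrt(100)
E50  <- z * s / sqrt(50)

round(c(E100, E50), 2)             # 0.33 0.47
round(xbar + c(-1, 1) * E100, 2)   # 70.27 70.93
round(xbar + c(-1, 1) * E50, 2)    # 70.13 71.07
```

The smaller sample gives the larger margin because $$E$$ shrinks in proportion to $$1/\sqrt{n}$$.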

## 1. Large Sample Estimation of a Population Mean

#### LEARNING OBJECTIVES

1. To become familiar with the concept of an interval estimate of the population mean.
2. To understand how to apply formulas for a confidence interval for a population mean.

The Central Limit Theorem says that, for large samples (samples of size $$n \ge 30$$), when viewed as a random variable the sample mean $$\bar{X}$$ is approximately normally distributed with mean $$\mu_{\bar{X}}=\mu$$ and standard deviation $$\sigma_{\bar{X}}=\sigma/\sqrt{n}$$. The Empirical Rule says that we must go about two standard deviations from the mean to capture 95% of the values of $$\bar{X}$$ generated by sample after sample. A more precise distance based on the normality of $$\bar{X}$$ is 1.960 standard deviations, which is $$E=1.960\sigma/\sqrt{n}$$.
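The gap between the Empirical Rule's "about two standard deviations" and the more precise 1.960 can be checked with base R's `pnorm` (cumulative normal probability) and `qnorm` (normal quantile). A short sketch; the helper name `within` is hypothetical:

```r
# Probability that a normal variable falls within k standard deviations of its mean
within <- function(k) pnorm(k) - pnorm(-k)

within(2)       # about 0.9545: the Empirical Rule's "about 95%"
within(1.960)   # about 0.9500

# The exact multiplier that leaves 2.5% in each tail
z <- qnorm(0.975)
z               # about 1.959964
```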

The key idea in the construction of the **95% confidence interval** is this, as illustrated in Figure "When Winged Dots Capture the Population Mean": because in sample after sample 95% of the values of $$\bar{X}$$ lie in the interval $$[\mu-E,\mu+E]$$, if we adjoin to each side of the point estimate $$\bar{X}$$ a “wing” of length $$E$$, 95% of the intervals formed by the winged dots contain $$μ$$. The 95% confidence interval is thus $$\bar{x}\pm 1.960\sigma/\sqrt{n}$$. For a different level of confidence, say 90% or 99%, the number 1.960 will change, but the idea is the same.

![When Winged Dots Capture the Population Mean](https://saylordotorg.github.io/text_introductory-statistics/section_11/28bf0a4db4415dec382b8a1a07e79927.jpg)

Figure "Computer Simulation of 40 95% Confidence Intervals for a Mean" shows the intervals generated by a computer simulation of drawing 40 samples from a normally distributed population and constructing the 95% confidence interval for each one. We expect about $$(0.05)(40)=2$$ of the intervals so constructed to fail to contain the population mean *μ*, and in this simulation two of the intervals, shown in red, fail to contain it.

![Computer Simulation of 40 95% Confidence Intervals for a Mean](https://saylordotorg.github.io/text_introductory-statistics/section_11/7827472ac37fbd53928abecec353fcc4.jpg)

{% tabs %}
{% tab title="R Source" %}

```
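set.seed(9)

n <- 10          # size of each sample
k <- 40          # number of simulated samples (trials)
alpha <- 0.05    # 1 - confidence level

# Each row is one sample of size n from the standard normal population (mu = 0, sigma = 1)
smps <- matrix(rnorm(n * k), ncol=n)

xbar <- apply(smps, 1, mean)   # sample means
se <- 1 / sqrt(n)              # sigma / sqrt(n) with sigma = 1 known

z <- qnorm(1-alpha/2)          # z_{alpha/2}, about 1.960
ll <- xbar - z * se            # lower limits
ul <- xbar + z * se            # upper limits

# Empty plot, then one vertical "winged" interval per trial
plot(NA, type="n", xlab = "trial", ylab = "z",
     main = "95% Confidence Interval for Population Mean",
     xlim = c(1, k), ylim = c(-1.5, 1.5), cex.lab=1.8)
abline(h=0, col="red", lty=2)  # true population mean mu = 0
# An interval misses mu = 0 exactly when both limits have the same sign
l.c <- ifelse(ll * ul > 0, "red", "black")
arrows(1:k, ll, 1:k, ul, code=3,
       angle=90, length=0.02, col=l.c, lwd=1.5)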
```

{% endtab %}

{% tab title="95% Confidence Interval for Population Mean" %}
![](/files/-LxdK1_iZNopfYX_bzii)
{% endtab %}
{% endtabs %}
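With only 40 samples, the number of misses varies from run to run. The sketch below reuses the same setup (a standard normal population with $$σ = 1$$ known and $$n = 10$$, for which the *z*-interval is exact because the population itself is normal) but with many more repetitions, so the proportion of intervals containing $$μ$$ should come out close to 0.95. The seed and the number of repetitions are arbitrary choices.

```r
set.seed(1)

n     <- 10       # sample size
reps  <- 10000    # number of simulated samples
alpha <- 0.05
mu    <- 0        # true population mean
sigma <- 1        # known population standard deviation

z  <- qnorm(1 - alpha/2)
se <- sigma / sqrt(n)

# Sample means of `reps` independent samples of size n
xbar <- rowMeans(matrix(rnorm(n * reps, mean = mu, sd = sigma), ncol = n))

# The interval xbar +/- z*se covers mu exactly when |xbar - mu| <= z*se
covered <- abs(xbar - mu) <= z * se
mean(covered)     # proportion of intervals containing mu, near 0.95
```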

It is standard practice to identify the level of confidence in terms of the area $$\alpha$$ in the two tails of the distribution of $$\bar{X}$$ when the middle part specified by the level of confidence is taken out. This is shown in Figure 7.3, drawn for the general situation, and in Figure 7.4, drawn for 95% confidence. Remember from Section 5.4.1 "Tails of the Standard Normal Distribution" in Chapter 5 "Continuous Random Variables" that the *z*-value that cuts off a right tail of area $$c$$ is denoted $$z_c$$. Thus the number 1.960 in the example is $$z_{0.025}$$, which is $$z_{\alpha/2}$$ for $$\alpha=1-0.95=0.05$$.

Figure 7.3

![](https://saylordotorg.github.io/text_introductory-statistics/section_11/e3f246f42e1caa418685cbcb55e93279.jpg)

For $$100(1-\alpha)\%$$ confidence the area in each tail is $$\alpha/2$$.

Figure 7.4

![](https://saylordotorg.github.io/text_introductory-statistics/section_11/9befc784764fb56426a1ba328be4bc98.jpg)

For 95% confidence the area in each tail is $$\alpha/2=0.025$$.

The level of confidence can be any number between 0 and 100%, but the most common values are probably 90% ($$\alpha=0.10$$), 95% ($$\alpha=0.05$$), and 99% ($$\alpha=0.01$$).
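In R, the critical value $$z_{\alpha/2}$$ is `qnorm(1 - alpha/2)`, the point with area $$\alpha/2$$ to its right. A quick sketch for the three common levels:

```r
conf  <- c(0.90, 0.95, 0.99)   # confidence levels
alpha <- 1 - conf              # total area in the two tails
z     <- qnorm(1 - alpha / 2)  # z_{alpha/2}: right-tail area alpha/2
round(z, 3)                    # 1.645 1.960 2.576
```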

Thus in general for a $$100(1-\alpha)\%$$ confidence interval, $$E=z_{\alpha/2}(\sigma/\sqrt{n})$$, so the formula for the confidence interval is $$\bar{x}\pm z_{\alpha/2}(\sigma/\sqrt{n})$$. While sometimes the population standard deviation $$\sigma$$ is known, typically it is not. If not, for $$n \ge 30$$ it is generally safe to approximate $$\sigma$$ by the sample standard deviation $$s$$.

> **Large Sample** $$100(1-\alpha)\%$$ **Confidence Interval for a Population Mean**
>
> If $$\sigma$$ is known: $$\bar{x}\pm z_{\alpha/2}(\sigma/\sqrt{n})$$
>
> If $$\sigma$$ is unknown: $$\bar{x}\pm z_{\alpha/2}(s/\sqrt{n})$$
>
> A sample is considered large when $$n \ge 30$$.

As mentioned earlier, the number $$E=z_{\alpha/2}(\sigma/\sqrt{n})$$ or $$E=z_{\alpha/2}(s/\sqrt{n})$$ is called the ***margin of error*** of the estimate.
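The boxed formulas translate directly into a small function. The sketch below is a hypothetical helper, `large_sample_ci()`, written in base R for illustration only; it is not part of the Rstat package used elsewhere in this chapter. It takes $$σ$$ if known, otherwise $$s$$, as the `sd` argument, and warns when the sample is not large.

```r
# Hypothetical helper (not from any package): large-sample CI for a population mean
large_sample_ci <- function(xbar, sd, n, conf = 0.95) {
  if (n < 30) warning("n < 30: the large-sample interval may not be appropriate")
  alpha <- 1 - conf
  z <- qnorm(1 - alpha / 2)      # z_{alpha/2}
  E <- z * sd / sqrt(n)          # margin of error
  c(lower = xbar - E, upper = xbar + E, E = E)
}

# Example 3 of this section: n = 49, sample mean 35, s = 14, 98% confidence
large_sample_ci(xbar = 35, sd = 14, n = 49, conf = 0.98)
```

Returning the margin of error alongside the limits makes it easy to report the interval in the $$\bar{x}\pm E$$ form used in the text.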

**EXAMPLE 1.** Find the number $$z_{\alpha/2}$$ needed in the construction of a confidence interval:

1. when the level of confidence is 90%;
2. when the level of confidence is 99%.

**\[ Solution ]**

1. For confidence level 90%, $$\alpha=1-0.90=0.10$$, so $$z_{\alpha/2}=z_{0.05}$$. Since $$z_{0.05}$$ cuts off a right tail of area 0.05, it has cumulative area 0.95 to its left. The closest entries in the table of Figure 12.2 are 0.9495 and 0.9505, corresponding to *z*-values 1.64 and 1.65. Since 0.95 is exactly halfway between 0.9495 and 0.9505, we use the average **1.645** of the *z*-values for $$z_{0.05}$$.
2. For confidence level 99%, $$\alpha=1-0.99=0.01$$, so $$z_{\alpha/2}=z_{0.005}$$, which has cumulative area 0.995 to its left. The closest entries in the table are 0.9949 and 0.9951, corresponding to *z*-values 2.57 and 2.58. Since 0.995 is halfway between 0.9949 and 0.9951, we use the average **2.575** of the *z*-values for $$z_{0.005}$$.

{% tabs %}
{% tab title="R Source" %}

```
library(Rstat)

# 1.
pv <- c(0.05, 0.95) 
snorm.quant(pv, pv)      # percentile of Normal Distribution

# 2.
pv <- c(0.005, 0.995 ) 
snorm.quant(pv, pv)
```

{% endtab %}

{% tab title="1." %}

```
> # 1.
> pv <- c(0.05, 0.95) 
> snorm.quant(pv, pv)      # percentile of Normal Distribution
##    0.05    0.95 
## -1.6449  1.6449 
```

{% endtab %}

{% tab title="1-Plot" %}
![](/files/-Lxd01lV0lIBXYQNlwsb)
{% endtab %}

{% tab title="2." %}

```
> # 2.
> pv <- c(0.005, 0.995 ) 
> snorm.quant(pv, pv)      
##   0.005   0.995 
## -2.5758  2.5758
```

{% endtab %}

{% tab title="2-Plot" %}
![](/files/-Lxd0DSvi_GWjTcIKIi7)
{% endtab %}
{% endtabs %}
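The table lookups in this solution can also be reproduced with base R alone: `pnorm` gives the cumulative area to the left of a *z*-value (the quantity tabulated in Figure 12.2), and `qnorm` gives the exact quantile for comparison with the averaged table values. A sketch:

```r
# Cumulative areas for the bracketing table entries
round(pnorm(c(1.64, 1.65)), 4)   # 0.9495 0.9505
round(pnorm(c(2.57, 2.58)), 4)   # 0.9949 0.9951

# Averaging the bracketing z-values, as in the solution
mean(c(1.64, 1.65))              # 1.645
mean(c(2.57, 2.58))              # 2.575

# Exact quantiles for comparison
qnorm(c(0.95, 0.995))            # about 1.6449 and 2.5758
```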

**EXAMPLE 2.** Use Figure 12.3 "Critical Values of $$t$$" to find the number $$z_{\alpha/2}$$ needed in the construction of a confidence interval:

1. when the level of confidence is 90%;
2. when the level of confidence is 99%.

**\[ Solution ]**

1. In the next section we will learn about a continuous random variable that has a probability distribution called the Student *t*-distribution. Figure 12.3 "Critical Values of $$t$$" gives the value $$t_c$$ that cuts off a right tail of area $$c$$ for different values of $$c$$. The last line of that table, the one whose heading is the symbol $$\infty$$ for infinity and $$[z]$$, gives the corresponding *z*-value $$z_c$$ that cuts off a right tail of the same area $$c$$. In particular, $$z_{0.05}$$ is the number in that row and in the column with the heading $$t_{0.05}$$. We read off directly that $$z_{0.05}=1.645$$.
2. In Figure 12.3 "Critical Values of $$t$$", $$z_{0.005}$$ is the number in the last row and in the column headed $$t_{0.005}$$, namely **2.576**.
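The fact that the $$\infty$$ row of a *t* table coincides with the standard normal distribution can be seen in base R by calling `qt` with `df = Inf` (which R accepts) and comparing with `qnorm`:

```r
p <- c(0.95, 0.995)         # left-tail areas for z_{0.05} and z_{0.005}
round(qt(p, df = Inf), 3)   # 1.645 2.576  (the "infinity" row of the t table)
round(qnorm(p), 3)          # 1.645 2.576  (standard normal quantiles)
```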

**EXAMPLE 3.** A sample of size 49 has sample mean 35 and sample standard deviation 14. Construct a 98% confidence interval for the population mean using this information. Interpret its meaning.

**\[ Solution ]**

* For confidence level 98%, $$\alpha=1-0.98=0.02$$, so $$z_{\alpha/2}=z_{0.01}$$. From Figure 12.3 "Critical Values of $$t$$" we read directly that $$z_{0.01}=2.326$$. Thus\
  $$\bar{x}\pm z_{\alpha/2}\frac{s}{\sqrt{n}}=35\pm 2.326\left(\frac{14}{\sqrt{49}}\right)=35\pm 4.652\approx 35\pm 4.7$$.
* We are 98% confident that the population mean $$\mu$$ lies in the interval $$[30.3, 39.7]$$, in the sense that in repeated sampling 98% of all intervals constructed from sample data in this manner will contain $$\mu$$.

{% tabs %}
{% tab title="R Source" %}

```
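n <- 49          # sample size
xbar <- 35       # sample mean
s <- 14          # sample standard deviation (estimates sigma)
alpha <- 0.02    # 98% confidence

se <- s / sqrt(n)          # estimated standard error of the mean
z <- qnorm(1-alpha/2)      # z_{alpha/2} = z_{0.01}

ll <- xbar - z * se
ul <- xbar + z * se

ll   # lower limit
ul   # upper limit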
```

{% endtab %}

{% tab title="Confidence Interval" %}

```
> ll   # lower limit
## [1] 30.3473
> ul   # upper limit
## [1] 39.6527
```

{% endtab %}
{% endtabs %}

* Using the **Rstat** package: `pmean.ci()`

{% tabs %}
{% tab title="R Source" %}

```
library(Rstat)

pmean.ci(xb=35, sig=14, n=49, alp=0.02, dig=3)
```

{% endtab %}

{% tab title="Confidence Interval" %}

```
> pmean.ci(xb=35, sig=14, n=49, alp=0.02, dig=3)
## [35 ± 2.326×14/√49] = [35 ± 4.653] = [30.347, 39.653]
```

{% endtab %}
{% endtabs %}

**EXAMPLE 4.** A random sample of 120 students from a large university yields mean GPA 2.71 with sample standard deviation 0.51. Construct a 90% confidence interval for the mean GPA of all students at the university.

**\[ Solution ]**
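For reference, the computation carried out by the R code below is the large-sample formula with $$\alpha=1-0.90=0.10$$, so $$z_{\alpha/2}=z_{0.05}=1.645$$:

$$
\bar{x}\pm z_{\alpha/2}\frac{s}{\sqrt{n}} = 2.71\pm 1.645\left(\frac{0.51}{\sqrt{120}}\right) = 2.71\pm 0.077
$$

so we are 90% confident that the mean GPA of all students at the university lies in the interval $$[2.633, 2.787]$$.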

{% tabs %}
{% tab title="R Source" %}

```
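n <- 120         # sample size
xbar <- 2.71     # sample mean GPA
s <- 0.51        # sample standard deviation (estimates sigma)
alpha <- 0.1     # 90% confidence

se <- s / sqrt(n)          # estimated standard error of the mean
z <- qnorm(1-alpha/2)      # z_{alpha/2} = z_{0.05}

ll <- xbar - z * se
ul <- xbar + z * se
ll   # lower limit
ul   # upper limit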
```

{% endtab %}

{% tab title="Confidence Interval" %}

```
> ll   # lower limit
## [1] 2.633422
> ul   # upper limit
## [1] 2.786578
```

{% endtab %}
{% endtabs %}

* Using the **Rstat** package: `pmean.ci()`

{% tabs %}
{% tab title="R Source" %}

```
library(Rstat)

pmean.ci(xb=2.71, sig=0.51, n=120, alp=0.1, dig=3)
```

{% endtab %}

{% tab title="Confidence Interval" %}

```
> pmean.ci(xb=2.71, sig=0.51, n=120, alp=0.1, dig=3)
## [2.71 ± 1.645×0.51/√120] = [2.71 ± 0.077] = [2.633, 2.787]
```

{% endtab %}
{% endtabs %}

**点估计** Point estimation\
**区间估计** Interval estimation\
**置信区间** Confidence interval (CI)\
**置信水平** Confidence level ($$1-\alpha$$)\
**误差余地** Margin of error\
**标准正态分布** Standard normal distribution\
**临界值** Critical value\
**学生 t-分布** Student t-distribution ("Student" is the pen name of William Sealy Gosset)


