7-4. Sample Size Considerations

Sampling is typically done with a set of clear objectives in mind. For example, an economist might wish to estimate the mean yearly income of workers in a particular industry at 90% confidence and to within $500. Since sampling costs time, effort, and money, it would be useful to be able to estimate the smallest size sample that is likely to meet these criteria.

Estimating $μ$

The confidence interval formulas for estimating a population mean μ have the form $\bar{x}±E$ . When the population standard deviation σ is known,

$E=z_{α∕2} \frac{σ}{\sqrt{n}}$

The number $z_{α∕2}$ is determined by the desired level of confidence. To say that we wish to estimate the mean to within a certain number of units means that we want the margin of error $E$ to be no larger than that number. Thus we obtain the minimum sample size needed by solving the displayed equation for $n$ .
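Written out, the algebra is short. Requiring the margin of error to be at most $E$ gives an inequality in $n$, which is why the result is rounded up rather than to the nearest integer:

```latex
z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \le E
\;\Longleftrightarrow\;
\sqrt{n} \ge \frac{z_{\alpha/2}\,\sigma}{E}
\;\Longleftrightarrow\;
n \ge \frac{(z_{\alpha/2})^2\,\sigma^2}{E^2}
```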

Minimum Sample Size for Estimating a Population Mean

The estimated minimum sample size $n$ needed to estimate a population mean $μ$ to within $E$ units at $100(1−α)\%$ confidence is

$n= \frac{(z_{α∕2})^2σ^2}{E^2}$ (rounded up).

To apply the formula we must have prior knowledge of the population in order to have an estimate of its standard deviation $σ$ . In all the examples and exercises the population standard deviation will be given.

EXAMPLE 8. Find the minimum sample size necessary to construct a 99% confidence interval for $μ$ with a margin of error $E = 0.2$ . Assume that the population standard deviation is $σ = 1.3$ .

[ Solution ]

Confidence level 99% means that $α=1−0.99=0.01,$ so $α∕2=0.005.$ From the last line of Figure 12.3 "Critical Values of " we obtain $z_{0.005}=2.576.$ Thus

$n= \frac{(z_{α∕2})^2σ^2}{E^2} = \frac{(2.576)^2(1.3)^2}{(0.2)^2} = 280.361536$

which we round up to 281, since it is impossible to take a fractional observation.

# Minimum n to estimate a mean to within `del` at confidence 1 - alp, given sd `sig`
spn <- function(del, sig, alp) 
       ceiling((qnorm(1-alp/2)*sig/del)^2)

spn(del=0.2, sig=1.3, alp=0.01)

> spn(del=0.2, sig=1.3, alp=0.01)
## [1] 281
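As a quick sanity check on the rounding, the sketch below evaluates the margin of error at $n = 280$ and $n = 281$. The helper name `moe` is an assumption introduced here for illustration and is not part of the original text.

```r
# Margin of error for a mean when sigma is known.
# `moe` is a hypothetical helper name used only for this check.
moe <- function(n, sig, alp) qnorm(1 - alp/2) * sig / sqrt(n)

moe(n = 280, sig = 1.3, alp = 0.01)  # slightly above the target 0.2
moe(n = 281, sig = 1.3, alp = 0.01)  # slightly below the target 0.2
```

With 280 observations the margin of error still exceeds 0.2, so 281 is the smallest sample size that meets the requirement.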

EXAMPLE 9. An economist wishes to estimate, with a 95% confidence interval, the mean yearly income of welders with at least five years' experience to within $1,000. He estimates that the range of incomes is no more than $24,000, so using the Empirical Rule he estimates the population standard deviation to be about one-sixth as much, or about $4,000. Find the estimated minimum sample size required.

[ Solution ]

Confidence level 95% means that $α=1−0.95=0.05,$ so $α∕2=0.025.$ From the last line of Figure 12.3 "Critical Values of " we obtain $z_{0.025}=1.960.$

To say that the estimate is to be “to within $1,000” means that $E = 1,000$ . Thus $n= \frac{(z_{α∕2})^2σ^2} {E^2} = (1.960)^2(4,000)^2/(1,000)^2=61.4656$ which we round up to 62.

spn <- function(del, sig, alp) 
       ceiling((qnorm(1-alp/2)*sig/del)^2)

spn(del=1000, sig=4000, alp=0.05)

> spn(del=1000, sig=4000, alp=0.05)
## [1] 62
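The standard deviation in this example is itself a rough guess, so it can help to see how strongly the answer depends on it. The sketch below restates `spn` so that it runs on its own; the helper `sd_from_range` and the alternative standard deviations of 3000 and 5000 are assumptions chosen for illustration.

```r
# Restated from the example so this block is self-contained.
spn <- function(del, sig, alp) ceiling((qnorm(1 - alp/2) * sig / del)^2)

# Empirical Rule shortcut: nearly all values lie within 3 sd of the mean,
# so the range is roughly 6 sd. `sd_from_range` is a hypothetical helper.
sd_from_range <- function(rng) rng / 6

sig_hat <- sd_from_range(24000)              # 4000, as in the example
spn(del = 1000, sig = sig_hat, alp = 0.05)   # 62

# Sensitivity: n grows with the square of the assumed sd.
sigs <- c(3000, 4000, 5000)
data.frame(sig = sigs, n = spn(del = 1000, sig = sigs, alp = 0.05))
```

Because the required sample size is proportional to the square of the standard deviation, a modest error in the assumed value shifts $n$ considerably.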

Estimating $p$

The confidence interval formula for estimating a population proportion $p$ is $\hat{p}±E$ ,

where $E= z_{α∕2} \sqrt{ \frac{\hat{p}(1−\hat{p})}{n} }$ .

The number $z_{α∕2}$ is determined by the desired level of confidence. To say that we wish to estimate the population proportion to within a certain number of percentage points means that we want the margin of error $E$ to be no larger than that number (expressed as a proportion). Thus we obtain the minimum sample size needed by solving the displayed equation for $n$ .
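The same steps as for the mean apply here. Requiring the margin of error to be at most $E$ and isolating $n$ gives:

```latex
z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le E
\;\Longleftrightarrow\;
\frac{\hat{p}(1-\hat{p})}{n} \le \frac{E^2}{(z_{\alpha/2})^2}
\;\Longleftrightarrow\;
n \ge \frac{(z_{\alpha/2})^2\,\hat{p}(1-\hat{p})}{E^2}
```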

Minimum Sample Size for Estimating a Population Proportion

The estimated minimum sample size $n$ needed to estimate a population proportion $p$ to within $E$ at $100(1−α)\%$ confidence is

$n= \frac{(z_{α∕2})^2\hat{p}(1−\hat{p})}{E^2}$ (rounded up).

There is a dilemma here: the formula for estimating how large a sample to take contains the number $\hat{p}$, which we know only after we have taken the sample. There are two ways out of this dilemma. Typically the researcher will have some idea as to the value of the population proportion $p$, hence of what the sample proportion $\hat{p}$ is likely to be. For example, if last month 37% of all voters thought that state taxes are too high, then it is likely that the proportion with that opinion this month will not be dramatically different, and we would use the value 0.37 for $\hat{p}$ in the formula.

The second approach to resolving the dilemma is simply to replace $\hat{p}$ in the formula by 0.5. This is because if $\hat{p}$ is large then $(1-\hat{p})$ is small, and vice versa, which limits their product to a maximum value of 0.25, which occurs when $\hat{p}=0.5$ . This is called the most conservative estimate, since it gives the largest possible estimate of $n$ .
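The maximum can be seen algebraically from $\hat{p}(1-\hat{p}) = \frac{1}{4} - (\hat{p}-\frac{1}{2})^2$, which is largest when the squared term is zero. A small numerical sketch of the same fact, with illustrative variable names:

```r
# Evaluate ph * (1 - ph) on a grid of candidate sample proportions.
ph <- seq(0, 1, by = 0.1)
pq <- ph * (1 - ph)

data.frame(ph, pq)     # the product rises to 0.25 and then falls symmetrically
ph[which.max(pq)]      # 0.5
max(pq)                # 0.25
```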

EXAMPLE 10. Find the necessary minimum sample size to construct a 98% confidence interval for $p$ with a margin of error $E = 0.05$ ,

1. assuming that no prior knowledge about $p$ is available; and
2. assuming that prior studies suggest that $p$ is about 0.1.

[ Solution ]

Confidence level 98% means that $α=1−0.98=0.02$ so $α∕2=0.01.$ From the last line of Figure 12.3 "Critical Values of " we obtain $z_{0.01}=2.326.$

1. Since there is no prior knowledge of $p$ we make the most conservative estimate that $\hat{p}=0.5.$ Then $n= \frac{(z_{α∕2})^2\hat{p}(1−\hat{p})}{E^2} = \frac{(2.326)^2(0.5)(1−0.5)}{(0.05)^2} = 541.0276$ which we round up to 542.
2. Since $p ≈ 0.1$ we estimate $\hat{p}$ by 0.1, and obtain $n= \frac{(z_{α∕2})^2\hat{p}(1−\hat{p})}{E^2} = \frac{(2.326)^2(0.1)(1−0.1)}{(0.05)^2} = 194.769936$ which we round up to 195.

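# Minimum n to estimate a proportion to within `err` at confidence 1 - alp,
# using `ph` as the planning value for the sample proportion
nsample <- function(err, alp, ph) { 
             n <- qnorm(1-alp/2)^2 * ph * (1-ph) / err ^2
             cat("n >=", n, " ==> n >=", ceiling(n), "\n")}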

# 1.
nsample(err=0.05, alp=0.02, ph=0.5)

# 2.
nsample(err=0.05, alp=0.02, ph=0.1)

> # 1.
> nsample(err=0.05, alp=0.02, ph=0.5)
## n >= 541.1894  ==> n >= 542 
> 
> # 2.
> nsample(err=0.05, alp=0.02, ph=0.1)
## n >= 194.8282  ==> n >= 195
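The hand calculation and the R output differ slightly in the unrounded value (541.0276 versus 541.1894 in part 1) because the table gives $z_{0.01}$ rounded to three decimals, whereas `qnorm` returns more digits. A small sketch of the comparison for part 1; the variable names are illustrative:

```r
z_table <- 2.326                  # value read from the table
z_exact <- qnorm(1 - 0.02/2)      # about 2.326348

n_table <- z_table^2 * 0.5 * (1 - 0.5) / 0.05^2
n_exact <- z_exact^2 * 0.5 * (1 - 0.5) / 0.05^2

c(table = n_table, exact = n_exact)   # unrounded values differ in the first decimal
ceiling(c(n_table, n_exact))          # both round up to 542
```

Here the rounded sample size is the same either way, but when the unrounded value lies very close to an integer the two approaches can differ by one.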

EXAMPLE 11. A dermatologist wishes to estimate the proportion of young adults who apply sunscreen regularly before going out in the sun in the summer. Find the minimum sample size required to estimate the proportion to within three percentage points, at 90% confidence.

[ Solution ]

Confidence level 90% means that $α=1−0.90=0.10$ so $α∕2=0.05.$ From the last line of Figure 12.3 "Critical Values of " we obtain $z_{0.05}=1.645.$

Since there is no prior knowledge of $p$ we make the most conservative estimate that $\hat p=0.5.$ To estimate “to within three percentage points” means that $E = 0.03.$ Then

$n= \frac{(z_{α∕2})^2\hat{p}(1−\hat{p})}{E^2} = \frac{(1.645)^2(0.5)(1−0.5)}{(0.03)^2} = 751.6736111$ which we round up to 752.

nsample <- function(err, alp, ph) { 
             n <- qnorm(1-alp/2)^2 * ph * (1-ph) / err ^2
             cat("n >=", n, " ==> n >=", ceiling(n), "\n")}

nsample(err=0.03, alp=0.10, ph=0.5)

> nsample(err=0.03, alp=0.10, ph=0.5)
## n >= 751.5398  ==> n >= 752
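Because $E$ enters the formula squared, tightening the margin of error is costly. The following sketch uses a hypothetical helper `nprop`, which returns the sample size instead of printing it, to tabulate the conservative sample size at 90% confidence for a few margins of error:

```r
# `nprop` is an illustrative helper; ph defaults to the conservative value 0.5.
nprop <- function(err, alp, ph = 0.5) {
  ceiling(qnorm(1 - alp/2)^2 * ph * (1 - ph) / err^2)
}

errs <- c(0.05, 0.03, 0.01)
data.frame(err = errs, n = nprop(errs, alp = 0.10))
```

Going from five percentage points to one multiplies the required sample size by roughly 25.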
