7-4. Sample Size Considerations

Sampling is typically done with a set of clear objectives in mind. For example, an economist might wish to estimate the mean yearly income of workers in a particular industry at 90% confidence and to within $500. Since sampling costs time, effort, and money, it would be useful to be able to estimate the smallest size sample that is likely to meet these criteria.

Estimating $\mu$

The confidence interval formulas for estimating a population mean $\mu$ have the form $\bar{x} \pm E$. When the population standard deviation $\sigma$ is known,

$$E = z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$

The number $z_{\alpha/2}$ is determined by the desired level of confidence. To say that we wish to estimate the mean to within a certain number of units means that we want the margin of error $E$ to be no larger than that number. Thus we obtain the minimum sample size needed by solving the displayed equation for $n$.
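The algebra behind that step can be sketched as follows, using only the symbols already defined above:

```latex
\begin{align*}
E &= z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \\
\sqrt{n} &= \frac{z_{\alpha/2}\,\sigma}{E} \\
n &= \frac{(z_{\alpha/2})^2\,\sigma^2}{E^2}
\end{align*}
```

Because the margin of error shrinks as $n$ grows, any sample size at least this large keeps the margin of error at or below $E$, which is why a fractional result is rounded up rather than to the nearest integer.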

Minimum Sample Size for Estimating a Population Mean

The estimated minimum sample size $n$ needed to estimate a population mean $\mu$ to within $E$ units at $100(1-\alpha)\%$ confidence is

$$n = \frac{(z_{\alpha/2})^2 \sigma^2}{E^2} \quad \text{(rounded up)}$$

To apply the formula we must have prior knowledge of the population in order to have an estimate of its standard deviation $\sigma$. In all the examples and exercises the population standard deviation will be given.
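Because $\sigma$ enters the formula squared, the quality of that prior estimate matters a great deal. The following is a minimal sketch, not part of the original text: the helper name `min_n_mean` and the numbers used (95% confidence, a margin of error of 1 unit, and three guesses for $\sigma$) are assumptions chosen purely for illustration.

```r
# Hypothetical helper (illustrative name): minimum n for estimating a mean
# to within E units at 100(1 - alpha)% confidence, given a guess for sigma.
min_n_mean <- function(E, sigma, alpha) {
  ceiling((qnorm(1 - alpha / 2) * sigma / E)^2)
}

# Assumed values for illustration only.
sigma_guesses <- c(2, 4, 8)
n_needed <- min_n_mean(E = 1, sigma = sigma_guesses, alpha = 0.05)

# Doubling the guess for sigma roughly quadruples the required sample size.
print(data.frame(sigma = sigma_guesses, n = n_needed))
```

Under these assumed inputs the required sizes are 16, 62, and 246, so an overly optimistic guess for $\sigma$ can leave the sample far too small.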

EXAMPLE 8. Find the minimum sample size necessary to construct a 99% confidence interval for $\mu$ with a margin of error $E = 0.2$. Assume that the population standard deviation is $\sigma = 1.3$.

[ Solution ]

Confidence level 99% means that $\alpha = 1 - 0.99 = 0.01$, so $\alpha/2 = 0.005$. From the last line of Figure 12.3 "Critical Values of " we obtain $z_{0.005} = 2.576$. Thus

$$n = \frac{(z_{\alpha/2})^2 \sigma^2}{E^2} = \frac{(2.576)^2 (1.3)^2}{(0.2)^2} = 280.361536$$

which we round up to 281, since it is impossible to take a fractional observation.

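# Minimum sample size for estimating a mean.
# del = margin of error E, sig = population standard deviation, alp = alpha
spn <- function(del, sig, alp) 
       ceiling((qnorm(1-alp/2)*sig/del)^2)

spn(del=0.2, sig=1.3, alp=0.01)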

EXAMPLE 9. An economist wishes to estimate, with a 95% confidence interval, the mean yearly income of welders with at least five years' experience to within $1,000. He estimates that the range of incomes is no more than $24,000, so using the Empirical Rule he estimates the population standard deviation to be about one-sixth as much, or about $4,000. Find the estimated minimum sample size required.

[ Solution ]

Confidence level 95% means that $\alpha = 1 - 0.95 = 0.05$, so $\alpha/2 = 0.025$. From the last line of Figure 12.3 "Critical Values of " we obtain $z_{0.025} = 1.960$.

To say that the estimate is to be “to within $1,000” means that $E = 1{,}000$. Thus

$$n = \frac{(z_{\alpha/2})^2 \sigma^2}{E^2} = \frac{(1.960)^2 (4{,}000)^2}{(1{,}000)^2} = 61.4656$$

which we round up to 62.

# Same helper as in Example 8: del = E, sig = sigma, alp = alpha
spn <- function(del, sig, alp) 
       ceiling((qnorm(1-alp/2)*sig/del)^2)

spn(del=1000, sig=4000, alp=0.05)

Estimating $p$

The confidence interval formula for estimating a population proportion $p$ is $\hat{p} \pm E$, where

$$E = z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

The number $z_{\alpha/2}$ is determined by the desired level of confidence. To say that we wish to estimate the population proportion to within a certain number of percentage points means that we want the margin of error $E$ to be no larger than that number (expressed as a proportion). Thus we obtain the minimum sample size needed by solving the displayed equation for $n$.
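Written out as a short sketch, solving the margin-of-error equation for $n$ takes two steps: square both sides, then isolate $n$.

```latex
\begin{align*}
E &= z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \\
E^2 &= (z_{\alpha/2})^2\,\frac{\hat{p}(1-\hat{p})}{n} \\
n &= \frac{(z_{\alpha/2})^2\,\hat{p}(1-\hat{p})}{E^2}
\end{align*}
```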

Minimum Sample Size for Estimating a Population Proportion

The estimated minimum sample size $n$ needed to estimate a population proportion $p$ to within $E$ at $100(1-\alpha)\%$ confidence is

$$n = \frac{(z_{\alpha/2})^2 \hat{p}(1-\hat{p})}{E^2} \quad \text{(rounded up)}$$

There is a dilemma here: the formula for estimating how large a sample to take contains the number $\hat{p}$, which we know only after we have taken the sample. There are two ways out of this dilemma. Typically the researcher will have some idea as to the value of the population proportion $p$, hence of what the sample proportion $\hat{p}$ is likely to be. For example, if last month 37% of all voters thought that state taxes are too high, then it is likely that the proportion with that opinion this month will not be dramatically different, and we would use the value 0.37 for $\hat{p}$ in the formula.
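As a rough sketch of this first approach, the prior value 0.37 can be plugged into the formula directly. The text does not state a margin of error or confidence level for the voter example, so the values below (three percentage points, 95% confidence) and the helper name `min_n_prop` are assumptions made only for illustration.

```r
# Hypothetical helper (illustrative name): minimum n for estimating a
# proportion to within E at 100(1 - alpha)% confidence, given a prior guess.
min_n_prop <- function(E, p_guess, alpha) {
  ceiling(qnorm(1 - alpha / 2)^2 * p_guess * (1 - p_guess) / E^2)
}

# Assumed design targets: E = 0.03 and alpha = 0.05 (95% confidence).
# The prior guess 0.37 is last month's proportion from the text.
n_prior <- min_n_prop(E = 0.03, p_guess = 0.37, alpha = 0.05)
print(n_prior)
```

With these assumed targets the sketch gives a minimum sample size of 995.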

The second approach to resolving the dilemma is simply to replace $\hat{p}$ in the formula by 0.5. This works because if $\hat{p}$ is large then $1-\hat{p}$ is small, and vice versa, so their product can never exceed 0.25, the value it attains when $\hat{p} = 0.5$. This is called the most conservative estimate, since it gives the largest possible estimate of $n$.
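One way to verify the bound of 0.25, offered here as a supplementary sketch, is to complete the square:

```latex
\hat{p}(1-\hat{p}) \;=\; \frac{1}{4} - \left(\hat{p} - \frac{1}{2}\right)^2 \;\le\; \frac{1}{4},
\qquad \text{with equality exactly when } \hat{p} = \frac{1}{2}.
```

Substituting this maximum into the sample size formula gives $n = (z_{\alpha/2})^2 / (4E^2)$, rounded up, as the most conservative sample size.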

EXAMPLE 10. Find the necessary minimum sample size to construct a 98% confidence interval for $p$ with a margin of error $E = 0.05$,

  1. assuming that no prior knowledge about $p$ is available; and

  2. assuming that prior studies suggest that $p$ is about 0.1.

[ Solution ]

Confidence level 98% means that $\alpha = 1 - 0.98 = 0.02$, so $\alpha/2 = 0.01$. From the last line of Figure 12.3 "Critical Values of " we obtain $z_{0.01} = 2.326$.

  1. Since there is no prior knowledge of $p$ we make the most conservative estimate that $\hat{p} = 0.5$. Then $$n = \frac{(z_{\alpha/2})^2 \hat{p}(1-\hat{p})}{E^2} = \frac{(2.326)^2 (0.5)(1-0.5)}{(0.05)^2} = 541.0276$$ which we round up to 542.

  2. Since $p \approx 0.1$ we estimate $\hat{p}$ by 0.1, and obtain $$n = \frac{(z_{\alpha/2})^2 \hat{p}(1-\hat{p})}{E^2} = \frac{(2.326)^2 (0.1)(1-0.1)}{(0.05)^2} = 194.769936$$ which we round up to 195.

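# Minimum sample size for estimating a proportion.
# err = margin of error E, alp = alpha, ph = assumed value of p-hat
# qnorm() uses the unrounded critical value, so the unrounded n printed here
# differs slightly from the hand calculation; the rounded-up n is the same.
nsample <- function(err, alp, ph) { 
             n <- qnorm(1-alp/2)^2 * ph * (1-ph) / err ^2
             cat("n >=", n, " ==> n >=", ceiling(n), "\n")}

# 1. No prior knowledge: most conservative value ph = 0.5
nsample(err=0.05, alp=0.02, ph=0.5)

# 2. Prior studies suggest p is about 0.1
nsample(err=0.05, alp=0.02, ph=0.1)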

EXAMPLE 11. A dermatologist wishes to estimate the proportion of young adults who apply sunscreen regularly before going out in the sun in the summer. Find the minimum sample size required to estimate the proportion to within three percentage points, at 90% confidence.

[ Solution ]

Confidence level 90% means that $\alpha = 1 - 0.90 = 0.10$, so $\alpha/2 = 0.05$. From the last line of Figure 12.3 "Critical Values of " we obtain $z_{0.05} = 1.645$.

Since there is no prior knowledge of $p$ we make the most conservative estimate that $\hat{p} = 0.5$. To estimate “to within three percentage points” means that $E = 0.03$. Then

$$n = \frac{(z_{\alpha/2})^2 \hat{p}(1-\hat{p})}{E^2} = \frac{(1.645)^2 (0.5)(1-0.5)}{(0.03)^2} = 751.6736111$$

which we round up to 752.

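# Same helper as in Example 10: err = E, alp = alpha, ph = assumed p-hat
nsample <- function(err, alp, ph) { 
             n <- qnorm(1-alp/2)^2 * ph * (1-ph) / err ^2
             cat("n >=", n, " ==> n >=", ceiling(n), "\n")}

# Most conservative value ph = 0.5, three percentage points, 90% confidence
nsample(err=0.03, alp=0.10, ph=0.5)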
