9-5. Sample Size Considerations

As was pointed out at the beginning of Section 7.4 "Sample Size Considerations" in Chapter 7 "Estimation", sampling is typically done with definite objectives in mind. For example, a physician might wish to estimate the difference in the average amount of sleep gotten by patients suffering a certain condition with the average amount of sleep got by healthy adults, at 90% confidence and to within half an hour.

Since sampling costs time, effort, and money, it would be useful to be able to estimate the smallest size samples that are likely to meet these criteria.

1. Estimating μ1μ2μ_1−μ_2 with Independent Samples

Assuming that large samples will be required, the confidence interval formula for estimating the difference μ1μ2 μ_1−μ_2 between two population means using independent samples is

(x1ˉx2ˉ)±E,(\bar{x_1}− \bar{x_2})±E,

where E=zα2s12n1+s22n2E=z_{α∕2} \sqrt{ \frac{s_1^2} {n_1} + \frac{s_2^2} {n_2}}

To say that we wish to estimate the mean to within a certain number of units means that we want the margin of error EE to be no larger than that number. The number zα2z_{α∕2} is determined by the desired level of confidence.

The numbers s1s_1 and s2s_2 are estimates of the standard deviations σ1σ_1 and σ2σ_2 of the two populations. In analogy with what we did in Section 7.4 "Sample Size Considerations" in Chapter 7 "Estimation" we will assume that we either know or can reasonably approximate σ1σ_1 and σ2σ_2.

We cannot solve for both n1n_1 and n2n_2 , so we have to make an assumption about their relative sizes. We will specify that they be equal. With these assumptions we obtain the minimum sample sizes needed by solving the equation displayed just above for n1=n2n_1=n_2 .

Minimum Equal Sample Sizes for Estimating the Difference in the Means of Two Populations Using Independent Samples

The estimated minimum equal sample sizes n1=n2n_1=n_2 needed to estimate the difference (μ1μ2)(μ_1−μ_2) in two population means to within EE units at 100(1α)%100(1−α)\% confidence is

n1=n2=(zα2)2(σ12+σ22)E2n_1=n_2= \frac{(z_{α∕2})^2(σ^2_1+σ^2_2)}{E^2} (rounded up)

In all the examples and exercises the population standard deviations σ1σ1 and σ2σ2 will be given.

EXAMPLE 13. A law firm wishes to estimate the difference in the mean delivery time of documents sent between two of its offices by two different courier companies, to within half an hour and with 99.5% confidence. From their records it will randomly sample the same number n of documents as delivered by each courier company. Determine how large n must be if the estimated standard deviations of the delivery times are 0.75 hour for one company and 1.15 hours for the other.

[ Solution ]

Confidence level 99.5% means that α=(10.995)=0.005α=(1−0.995)=0.005 so α2=0.0025.α∕2=0.0025. From the last line of Figure 12.3 "Critical Values of " we obtain z0.0025=2.807.z_{0.0025}=2.807.

To say that the estimate is to be “to within half an hour” means that E=0.5E = 0.5 . Thus

n1=n2=(zα2)2(σ12+σ22)E2=(2.807)2(0.752+1.152)0.52=59.40953746n_1=n_2= \frac{(z_{α∕2})^2(σ^2_1+σ^2_2)}{E^2} =\frac {(2.807)^2(0.75^2+1.15^2)} {0.5^2}=59.40953746

which we round up to 60, since it is impossible to take a fractional observation. The law firm must sample 60 document deliveries by each company.

2. Estimating μ1μ2μ_1−μ_2 with Paired Samples

As we mentioned at the end of Section 9.3 "Comparison of Two Population Means: Paired Samples", if the sample is large (meaning that n30n ≥ 30 ) then in the formula for the confidence interval we may replace tα2t_{α∕2} by zα2z_{α∕2} , so that the confidence interval formula becomes dˉ±E\bar{d}±E for

E=zα2sdnE=z_{α∕2} \frac {s_d} {n}

The number sds_d is an estimate of the standard deviations σdσ_d of the population of differences. We must assume that we either know or can reasonably approximate σdσ_d. Thus, assuming that large samples will be required to meet the criteria given, we can solve the displayed equation for nn to obtain an estimate of the number of pairs needed in the sample.

Minimum Sample Size for Estimating the Difference in the Means of Two Populations Using Paired Difference Samples

The estimated minimum number of pairs nn needed to estimate the difference μd=μ1μ2μ_d=μ_1−μ_2 in two population means to within EE units at 100(1α)%100(1−α)\% confidence using paired difference samples is

n=(zα2)2σd2E2n=\frac{(z_{α∕2})^2 σ_d^2}{E^2} (rounded up)

In all the examples and exercises the population standard deviation of the differences σdσd will be given.

EXAMPLE 14. A automotive tire manufacturer wishes to compare the mean lifetime of two tread designs under actual driving conditions. They will mount one of each type of tire on n vehicles (both on the front or both on the back) and measure the difference in remaining tread after 20,000 miles of driving. If the standard deviation of the differences is assumed to be 0.025 inch, find the minimum samples size needed to estimate the difference in mean depth (at 20,000 miles use) to within 0.01 inch at 99.9% confidence.

[ Solution ]

Confidence level 99.9% means that α=(10.999)=0.001α=(1−0.999)=0.001 so α2=0.0005α∕2=0.0005 . From the last line of Figure 12.3 "Critical Values of " we obtain z0.0005=3.291z_{0.0005}=3.291 .

To say that the estimate is to be “to within 0.01 inch” means that E=0.01E = 0.01 . Thus

n=(zα2)2σd2E2=(3.291)2(0.025)2(0.01)2=67.69175625n=\frac{(z_{α∕2})^2 σ_d^2}{E^2} = \frac {(3.291)^2(0.025)^2}{(0.01)^2}=67.69175625

which we round up to 68. The manufacturer must test 68 pairs of tires.

3. Estimating p1p2p_1−p_2

The confidence interval formula for estimating the difference p1−p2p1−p2 between two population proportions is

(p1^p2^)±E(\hat{p_1}−\hat{p_2})±E ,

where E=zα2p1^(1p1^)n1+p2^(1p2^)n2E=z_{α∕2} \sqrt {\frac{\hat{p_1}(1−\hat{p_1})}{n_1}+ \frac{\hat{p_2}(1−\hat{p_2})}{n_2} }

To say that we wish to estimate the mean to within a certain number of units means that we want the margin of error EE to be no larger than that number. The number zα2z_{α∕2} is determined by the desired level of confidence.

We cannot solve for both n1n_1 and n2n_2 , so we have to make an assumption about their relative sizes. We will specify that they be equal. With these assumptions we obtain the minimum sample sizes needed by solving the displayed equation for n1=n2n_1=n_2 .

Minimum Equal Sample Sizes for Estimating the Difference in Two Population Proportions

The estimated minimum equal sample sizes n1=n2n1=n2 needed to estimate the difference (p1p2)(p_1−p_2) in two population proportions to within EE percentage points at 100(1α)%100(1−α)\% confidence is

n1=n2=(zα2)2{p1^(1p1^)+p2^(1p2^)}E2n_1=n_2= \frac{ (z_{α∕2})^2 \{\hat{p_1}(1−\hat{p_1})+\hat{p_2}(1−\hat{p_2})\}}{E^2} (rounded up)

Here we face the same dilemma that we encountered in the case of a single population proportion: the formula for estimating how large a sample to take contains the numbers p1^\hat{p_1} and p2^\hat{p_2} , which we know only after we have taken the sample. There are two ways out of this dilemma. Typically the researcher will have some idea as to the values of the population proportions p1p_1 and p2p_2 , hence of what the sample proportions p1^\hat{p_1} and p2^\hat{p_2} are likely to be. If so, those estimates can be used in the formula.

The second approach to resolving the dilemma is simply to replace each of p1^\hat{p_1} and p2^\hat{p_2} in the formula by 0.5. As in the one-population case, this is the most conservative estimate, since it gives the largest possible estimate of nn . If we have an estimate of only one of p1p_1 and p2p_2 we can use that estimate for it, and use the conservative estimate 0.5 for the other.

EXAMPLE 15. Find the minimum equal sample sizes necessary to construct a 98% confidence interval for the difference (p1p2)(p_1−p_2) with a margin of error E=0.05E = 0.05 ,

  1. assuming that no prior knowledge about p1p_1 or p2p_2 is available; and

  2. assuming that prior studies suggest that p10.2p_1≈0.2 and p20.3.p_2≈0.3.

[ Solution ]

  1. Since there is no prior knowledge of p1p_1 or p2p_2 we make the most conservative estimate that p1^=0.5\hat{p_1}=0.5 and p2^=0.5.\hat{p_2}=0.5. Then n1=n2=(zα2)2{p1^(1p1^)+p2^(1p2^)}E2=(2.326)2{(0.5)(0.5)+(0.5)(0.5)}0.052=1082.0552n_1=n_2= \frac{ (z_{α∕2})^2 \{\hat{p_1}(1−\hat{p_1})+\hat{p_2}(1−\hat{p_2})\}}{E^2} = \frac{(2.326)^2 \{(0.5)(0.5)+(0.5)(0.5)\}}{0.052}=1082.0552 which we round up to 1,083. We must take a sample of size 1,083 from each population.

  2. Since p10.2p_1≈0.2 we estimate p1^\hat{p_1} by 0.2, and since p20.3p_2≈0.3 we estimate p2^\hat{p_2} by 0.3. Thus we obtain n1=n2=(zα2)2{p1^(1p1^)+p2^(1p2^)}E2=(2.326)2{(0.2)(0.8)+(0.3)(0.7)}0.052=800.720848n_1=n_2= \frac{ (z_{α∕2})^2 \{\hat{p_1}(1−\hat{p_1})+\hat{p_2}(1−\hat{p_2})\}}{E^2} = \frac{(2.326)^2 \{(0.2)(0.8)+(0.3)(0.7)\} } {0.052}=800.720848 which we round up to 801. We must take a sample of size 801 from each population.

Last updated