6-3. The Sample Proportion
Sample rate / sample proportion
Often sampling is done in order to estimate the proportion of a population that has a specific characteristic, such as the proportion of all items coming off an assembly line that are defective or the proportion of all people entering a retail store who make a purchase before leaving.
The population proportion is denoted p and the sample proportion is denoted \hat{p}. Thus if in reality 43% of people entering a store make a purchase before leaving, p = 0.43; if in a sample of 200 people entering the store, 78 make a purchase, \hat{p} = 78/200 = 0.39.
The sample proportion is a random variable: it varies from sample to sample in a way that cannot be predicted with certainty. Viewed as a random variable it will be written \hat{P}. It has a mean μ_\hat{P} and a standard deviation σ_\hat{P}. Here are formulas for their values.
Suppose random samples of size n are drawn from a population in which the proportion with a characteristic of interest is p. The mean μ_\hat{P} and standard deviation σ_\hat{P} of the sample proportion satisfy
μ_\hat{P} = p and σ_\hat{P} = \sqrt{ \frac{pq}{n} }
where q = 1 − p.
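As a quick illustration of these two formulas, here is a minimal Python sketch. The function name `sample_proportion_mean_sd` is hypothetical (it is not from the text), and the inputs reuse the store example, assuming p = 0.43 and samples of size n = 200.

```python
import math


def sample_proportion_mean_sd(p, n):
    """Return (mean, standard deviation) of the sample proportion.

    Implements mu = p and sigma = sqrt(p * q / n) with q = 1 - p.
    """
    if not 0 <= p <= 1:
        raise ValueError("p must lie in [0, 1]")
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    q = 1 - p
    return p, math.sqrt(p * q / n)


# Store example: 43% of all customers buy, samples of 200 customers.
mean, sd = sample_proportion_mean_sd(0.43, 200)
print(f"mean = {mean:.2f}, sd = {sd:.4f}")
```

Under these assumptions the standard deviation comes out near 0.035, so sample proportions a few percentage points away from 0.43 are unremarkable for samples of this size.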
The Central Limit Theorem has an analogue for the sample proportion \hat{P}. To see how, imagine that every element of the population that has the characteristic of interest is labeled with a 1, and that every element that does not is labeled with a 0. This gives a numerical population consisting entirely of zeros and ones. Clearly the proportion of the population with the special characteristic is the proportion of the numerical population that are ones; in symbols, writing N for the size of the population,
p = \frac{\text{number of 1s}}{N}
But of course the sum of all the zeros and ones is simply the number of ones, so the mean μ of the numerical population is
μ = \frac{Σx}{N} = \frac{\text{number of 1s}}{N}
Thus the population proportion p is the same as the mean μ of the corresponding population of zeros and ones. In the same way the sample proportion \hat{p} is the same as the sample mean \bar{x}. Thus the Central Limit Theorem applies to \hat{P}. However, the condition that the sample be large is a little more complicated than just being of size at least 30.
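The zeros-and-ones argument can be tried out with a small simulation. The sketch below is only an illustration under stated assumptions: a made-up population of 10,000 elements with p = 0.43, samples of size 200 drawn with replacement, and a fixed random seed so the run is repeatable. All variable names are hypothetical.

```python
import random
import statistics

random.seed(0)  # fixed seed so the illustration is repeatable

# Hypothetical population: 4,300 ones (has the characteristic), 5,700 zeros.
population = [1] * 4300 + [0] * 5700

# The mean of the zeros and ones is the population proportion p.
pop_mean = statistics.fmean(population)

# Each sample mean of zeros and ones is a sample proportion.
n = 200
sample_props = [
    statistics.fmean(random.choices(population, k=n))  # with replacement
    for _ in range(2000)
]

print("population mean (= p):", pop_mean)
print("mean of sample proportions:", round(statistics.fmean(sample_props), 4))
print("sd of sample proportions:", round(statistics.stdev(sample_props), 4))
```

The simulated mean and standard deviation should land close to p and \sqrt{pq/n}, though the exact digits depend on the seed and the number of repetitions.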
For large samples, the sample proportion is approximately normally distributed,
with mean μ_\hat{P}=p
and standard deviation σ_\hat{P}= \sqrt{ \frac{pq}{n}}
A sample is large if the interval [p − 3σ_\hat{P}, p + 3σ_\hat{P}] lies wholly within the interval [0, 1].
Figure "Distribution of Sample Proportions" shows that when p = 0.1 a sample of size 15 is too small but a sample of size 100 is acceptable.
Figure "Distribution of Sample Proportions for p = 0.5 and n = 15" shows that when p = 0.5 a sample of size 15 is acceptable.
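The largeness condition and the three cases mentioned with the figures can be checked numerically. This is a minimal sketch; the helper name `is_large_sample` is hypothetical and simply encodes the three-standard-deviation rule stated above.

```python
import math


def is_large_sample(p, n):
    """Check whether [p - 3*sigma, p + 3*sigma] lies wholly within [0, 1]."""
    sigma = math.sqrt(p * (1 - p) / n)
    lower, upper = p - 3 * sigma, p + 3 * sigma
    return 0 <= lower and upper <= 1


# The three (p, n) combinations discussed with the figures.
for p, n in [(0.1, 15), (0.1, 100), (0.5, 15)]:
    print(f"p = {p}, n = {n}: large enough? {is_large_sample(p, n)}")
```

For p = 0.1 the interval extends below 0 when n = 15 but fits inside [0, 1] when n = 100, which agrees with the figures.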
EXAMPLE 7. Suppose that in a population of voters in a certain region 38% are in favor of a particular bond issue. Nine hundred randomly selected voters are asked if they favor the bond issue.
Find the probability that the sample proportion computed from a sample of size 900 will be within 5 percentage points of the true population proportion.
[ Solution ]
EXAMPLE 8. An online retailer claims that 90% of all orders are shipped within 12 hours of being received. A consumer group placed 121 orders of different sizes and at different times of day; 102 orders were shipped within 12 hours.
Compute the sample proportion of items shipped within 12 hours.
Confirm that the sample is large enough to assume that the sample proportion is normally distributed. Use p = 0.90, corresponding to the assumption that the retailer’s claim is valid.
Assuming the retailer’s claim is true, find the probability that a sample of size 121 would produce a sample proportion as low as was observed in this sample.
Based on the answer to part (3), draw a conclusion about the retailer’s claim.
[ Solution ]
In actual practice p is not known, hence neither is σ_\hat{P}. In that case, in order to check that the sample is sufficiently large, we substitute the known quantity \hat{p} for p. This means checking that the interval
[ \hat{p} − 3 \sqrt{ \frac{\hat{p}(1−\hat{p})}{n} }, \hat{p} + 3 \sqrt{ \frac{\hat{p}(1−\hat{p})}{n} } ]
lies wholly within the interval [0, 1]. This is illustrated in the examples.
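When only the sample is available, the same check can be run with \hat{p} standing in for p. The sketch below is an illustration only; it reuses the store sample described earlier (78 purchases among 200 people), and the function name `three_sigma_interval` is hypothetical.

```python
import math


def three_sigma_interval(p_hat, n):
    """Return the interval p_hat +/- 3*sqrt(p_hat*(1 - p_hat)/n).

    The sample proportion p_hat replaces the unknown p in the formula
    for the standard deviation, so sigma here is only an estimate.
    """
    sigma_est = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - 3 * sigma_est, p_hat + 3 * sigma_est


# Store sample: 78 of 200 people made a purchase, so p_hat = 0.39.
lower, upper = three_sigma_interval(78 / 200, 200)
large_enough = 0 <= lower and upper <= 1
print(f"[{lower:.4f}, {upper:.4f}] within [0, 1]? {large_enough}")
```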
Verify that the sample proportion computed from samples of size 900 meets the condition that its sampling distribution be approximately normal.
μ_\hat{P} = p = 0.38 and σ_\hat{P} = \sqrt{ \frac{pq}{n} } = \sqrt{ \frac{(0.38)(0.62)}{900} } = 0.01618.
Then 3σ_\hat{P} = 3(0.01618) = 0.04854 ≈ 0.05, so
[p − 3σ_\hat{P}, p + 3σ_\hat{P}] = [0.38 − 0.05, 0.38 + 0.05] = [0.33, 0.43]
which lies wholly within the interval [0, 1], so it is safe to assume that \hat{P} is approximately normally distributed.
P(0.33 < \hat{P} < 0.43) = P( \frac{0.33 − μ_\hat{P}}{σ_\hat{P}} < Z < \frac{0.43 − μ_\hat{P}}{σ_\hat{P}} )
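The standardized probability for the bond-issue example can also be evaluated numerically. The sketch below is an illustration, not part of the original solution. It assumes Python 3.8 or later (for `statistics.NormalDist`) and uses the unrounded standard deviation, so the result may differ in the last digit from a table lookup with a rounded z-value.

```python
import math
from statistics import NormalDist

# Bond-issue example: p = 0.38, samples of size n = 900.
p, n = 0.38, 900
sigma = math.sqrt(p * (1 - p) / n)

# Normal approximation to the distribution of the sample proportion.
sampling_dist = NormalDist(mu=p, sigma=sigma)

# Probability that the sample proportion is within 5 percentage points of p.
prob_within_5_points = sampling_dist.cdf(p + 0.05) - sampling_dist.cdf(p - 0.05)
print(f"P(0.33 < P-hat < 0.43) is approximately {prob_within_5_points:.4f}")
```

Under these assumptions the probability is roughly 0.998, so almost every sample of 900 voters gives a proportion within 5 percentage points of the true 38%.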
[p − 3σ_\hat{P}, p + 3σ_\hat{P}] = [0.90 − 0.08, 0.90 + 0.08] = [0.82, 0.98]
P(\hat{P} ≤ 0.84) = P( Z ≤ \frac{0.84 − μ_\hat{P}}{σ_\hat{P}} )
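For the retailer example, the same kind of calculation gives the probability of a sample proportion at or below the observed one. This is a sketch under the assumption that the claim p = 0.90 holds. The variable names are hypothetical, and `statistics.NormalDist` requires Python 3.8 or later. It evaluates both the rounded value 0.84 used in the text and the unrounded 102/121, since the rounding visibly changes the answer.

```python
import math
from statistics import NormalDist

# Retailer example: claimed p = 0.90, sample of n = 121 orders.
p, n = 0.90, 121
sigma = math.sqrt(p * (1 - p) / n)

# Normal approximation to the distribution of the sample proportion.
sampling_dist = NormalDist(mu=p, sigma=sigma)

# Probability of a sample proportion at or below the observed one,
# first with the rounded value 0.84, then with the unrounded 102/121.
prob_rounded = sampling_dist.cdf(0.84)
prob_exact = sampling_dist.cdf(102 / 121)

print(f"sigma = {sigma:.4f}")
print(f"P(P-hat <= 0.84) is approximately {prob_rounded:.4f}")
print(f"P(P-hat <= 102/121) is approximately {prob_exact:.4f}")
```

With the rounded value the probability is about 0.014; with the unrounded sample proportion it is somewhat larger but still below 0.02.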