4-6. Hypergeometric Distribution

1. Hypergeometric Distribution

The hypergeometric distribution, in statistics, is a distribution function in which selections are made from two groups without replacing members of the groups. The hypergeometric distribution differs from the binomial distribution in the lack of replacements. Thus, it is often employed in random sampling for statistical quality control. A simple everyday example would be the random selection of members for a team from a population of girls and boys.

In symbols, let the size of the population selected from be $N$, with $r$ elements of the population belonging to one group (for convenience, called successes) and $(N - r)$ belonging to the other group (called failures). Further, let the number of samples drawn from the population be $n$, such that $0 \le n \le N$. Then the probability ($P$) that the number ($X$) of elements drawn from the successful group is equal to some number ($x$) is given by

$$P(X=x) = \frac{\binom{r}{x}\binom{N-r}{n-x}}{\binom{N}{n}}, \quad X \sim HG(n, N, r)$$

using the notation of binomial coefficients, or, using factorial notation,

$$P(X=x) = \frac{n!\,r!\,(N-n)!\,(N-r)!}{N!\,x!\,(r-x)!\,(n-x)!\,(N-r-n+x)!}$$
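The two forms of the formula can be evaluated side by side in R as a quick check. This is only a minimal sketch: the helper names `hg_pmf_choose` and `hg_pmf_factorial` and the input values are illustrative assumptions, not part of the original text.

```r
# PMF via binomial coefficients: choose(r, x) * choose(N - r, n - x) / choose(N, n)
hg_pmf_choose <- function(x, N, r, n) {
  choose(r, x) * choose(N - r, n - x) / choose(N, n)
}

# PMF via the factorial form (fine for small N only; factorial() overflows quickly)
hg_pmf_factorial <- function(x, N, r, n) {
  (factorial(n) * factorial(r) * factorial(N - n) * factorial(N - r)) /
    (factorial(N) * factorial(x) * factorial(r - x) *
       factorial(n - x) * factorial(N - r - n + x))
}

# Illustrative values: N = 10, r = 4 successes, n = 3 draws, x = 2
p1 <- hg_pmf_choose(2, N = 10, r = 4, n = 3)
p2 <- hg_pmf_factorial(2, N = 10, r = 4, n = 3)
print(c(p1, p2))  # both forms give 0.3
```

For realistic population sizes the `choose()` form (or the built-in `dhyper()`) is the safer choice, because the factorials grow too large to represent.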

2. Mean and Variance

The mean of the hypergeometric distribution is $\mu = n \cdot P(\text{success}) = n \frac{r}{N}$,

and the variance (square of the standard deviation) is $\sigma^2 = \frac{nr(N - r)(N - n)}{N^2(N - 1)}$.

Equivalently, $Var(X) = np(1-p) \frac{N-n}{N-1}$, where $p = \frac{r}{N}$.
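The closed-form mean and variance can be compared with moments computed directly from the probability mass function. The sketch below assumes base R's `dhyper()` (from the stats package) and uses illustrative values; the helper names `hg_mean` and `hg_var` are hypothetical.

```r
# Closed-form mean and variance, with p = r / N
hg_mean <- function(N, r, n) n * r / N
hg_var <- function(N, r, n) {
  p <- r / N
  n * p * (1 - p) * (N - n) / (N - 1)
}

# Illustrative values: population N = 100, r = 10 successes, sample size n = 4
N <- 100; r <- 10; n <- 4
x <- 0:n

# dhyper(x, m, n, k): m = successes, n = failures, k = sample size
pmf <- dhyper(x, m = r, n = N - r, k = n)

# Moments computed from the PMF itself
mean_from_pmf <- sum(x * pmf)
var_from_pmf <- sum((x - mean_from_pmf)^2 * pmf)

print(c(hg_mean(N, r, n), mean_from_pmf))
print(c(hg_var(N, r, n), var_from_pmf))
```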

EXAMPLE 15. A batch of 100 piston rings is known to contain 10 defective rings. If two piston rings are drawn from the batch, write down the probabilities that:

  1. the first ring is defective;

  2. the second ring is defective given that the first one is defective.

[ Solution ]

  1. 10/100 = 1/10

  2. 9/99 = 1/11 (once one defective ring has been removed, 9 of the remaining 99 rings are defective)

EXAMPLE 16. A batch of 10 rocker cover gaskets contains 4 defective gaskets. If we draw samples of size 3 without replacement from the batch of 10, find the probability that a sample contains 2 defective gaskets. Also find the expectation and variance of the number of defective gaskets in a sample.

[ Solution ]

  • $P(X=x) = \frac{\binom{r}{x}\binom{N-r}{n-x}}{\binom{N}{n}}$ with $N = 10$, $n = 3$, $r = 4$ and $x = 2$, so $P(X=2) = \frac{{}_4C_2 \times {}_6C_1}{{}_{10}C_3} = \frac{6 \times 6}{120} = 0.3$

  • $E(X) = n\frac{r}{N} = 3 \times 0.4 = 1.2$ and $Var(X) = np(1-p)\frac{N-n}{N-1} = 3 \times 0.4 \times 0.6 \times \frac{7}{9} = 0.56$
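The numbers in Example 16 can be reproduced in R. This is a sketch that assumes base R's `dhyper()`; the variable names are illustrative.

```r
# Example 16: N = 10 gaskets, r = 4 defective, sample size n = 3
# dhyper(x, m, n, k): m = defectives, n = non-defectives, k = sample size
p_two <- dhyper(2, m = 4, n = 6, k = 3)                      # P(X = 2)
ex16_mean <- 3 * 4 / 10                                      # n * r / N
ex16_var <- 3 * (4 / 10) * (6 / 10) * (10 - 3) / (10 - 1)    # n p (1 - p) (N - n) / (N - 1)
print(c(p_two, ex16_mean, ex16_var))
```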

EXAMPLE 17. In the manufacture of car tyres, a particular production process is known to yield 10 tyres with defective walls in every batch of 100 tyres produced. From a production batch of 100 tyres, a sample of 4 is selected for testing to destruction. Find:

  1. the probability that the sample contains 1 defective tyre

  2. the expectation of the number of defectives in samples of size 4

  3. the variance of the number of defectives in samples of size 4.

[ Solution ]

  • $P(X=x) = \frac{\binom{r}{x}\binom{N-r}{n-x}}{\binom{N}{n}}$ with $N = 100$, $n = 4$, $r = 10$ and $x = 1$

  1. $P(X=1) = \frac{{}_{10}C_1 \times {}_{(100-10)}C_{(4-1)}}{{}_{100}C_4} = \frac{10 \times 117480}{3921225} \approx 0.3$

  2. $E(X) = np = 4 \times 0.1 = 0.4$

  3. $V(X) = np(1-p)\frac{N-n}{N-1} = 0.4 \times 0.9 \times \frac{96}{99} \approx 0.35$
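A short R check of Example 17, assuming base R's `dhyper()`. The variable names are illustrative. Note that the finite population factor is $(N-n)/(N-1) = 96/99$ because the sample size is $n = 4$.

```r
# Example 17: N = 100 tyres, r = 10 defective, sample size n = 4
p_one <- dhyper(1, m = 10, n = 90, k = 4)              # P(X = 1)
ex17_mean <- 4 * 10 / 100                              # n * r / N
ex17_var <- 4 * 0.1 * 0.9 * (100 - 4) / (100 - 1)      # n p (1 - p) (N - n) / (N - 1)
print(round(c(p_one, ex17_mean, ex17_var), 4))
```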

3. Using R

The R functions and parameters for the density function, cumulative distribution function, quantile function, and random number generation of the hypergeometric distribution are as follows.

| Purpose | Prefix | Hypergeometric R function / parameters |
| --- | --- | --- |
| Density function | d | dhyper(x, m, n, k) |
| Cumulative distribution function | p | phyper(q, m, n, k, lower.tail = TRUE/FALSE) |
| Quantile function | q | qhyper(p, m, n, k, lower.tail = TRUE/FALSE) |
| Random number generation | r | rhyper(nn, m, n, k) |

Note: the population consists of m items of one kind and n items of the other, and a sample of k items is drawn. lower.tail = TRUE refers to the left tail relative to the value x of the random variable.

3-1. Random Number Generation & Plotting

  • Random Number Generation : rhyper(nn, m, n, k)

  • Plotting : dhyper(x, m, n, k)

For a hypergeometric distribution with m=5 and n=20, let us simulate drawing 4 items without replacement 1000 times and then build a frequency table.
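A minimal sketch of that simulation is given below. The seed value and variable names are assumptions made only for illustration, and the counts in the frequency table will differ with a different seed.

```r
set.seed(1)  # arbitrary seed, used only to make the run reproducible

# 1000 simulated experiments, each drawing k = 4 items without replacement
draws <- rhyper(nn = 1000, m = 5, n = 20, k = 4)

# Frequency table; factor() keeps outcomes that happen never to occur
freq <- table(factor(draws, levels = 0:4))
print(freq)

# Theoretical probabilities for comparison
theory <- dhyper(0:4, m = 5, n = 20, k = 4)
print(round(theory, 4))

# Bar plot of the theoretical probability mass function
barplot(theory, names.arg = 0:4, xlab = "x", ylab = "P(X = x)")
```

Dividing `freq` by 1000 gives relative frequencies that should sit close to the `dhyper()` values.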

3-2. Probability Computation

1) Computing the probability $P(X = 4)$ : dhyper(x, m, n, k)

EXAMPLE 18. A barista claims to be able to tell, just by smelling an Americano, whether or not it was made with "Colombian beans". So the barista was brought in for an experiment. Five Americanos made with "Colombian beans" (m=5) and 20 Americanos made with beans from other regions (n=20) were prepared, and the barista was asked to pick out the 5 cups (k) made with "Colombian beans". What is the probability that 4 of the cups picked (x) were made with "Colombian beans"?

[ Solution ]

  • m : the 5 Americanos made with "Colombian beans" (the desired outcomes)

  • n : the 20 Americanos made with beans from other regions (the undesired outcomes)

  • k : the 5 cups of coffee picked out (number of draws)

  • x : the number of desired outcomes (4 cups)

=> $P(X=4)$ = dhyper(x=4, m=5, n=20, k=5)
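Evaluated in R, the call can be cross-checked against the binomial-coefficient formula. This is a sketch with illustrative variable names.

```r
# Example 18: m = 5 Colombian cups, n = 20 other cups, k = 5 cups picked, x = 4 correct
p4 <- dhyper(x = 4, m = 5, n = 20, k = 5)

# Same probability from the formula: C(5,4) * C(20,1) / C(25,5) = 100 / 53130
p4_formula <- choose(5, 4) * choose(20, 1) / choose(25, 5)

print(c(p4, p4_formula))  # roughly 0.0019
```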

EXAMPLE 19. Of 100 TVs produced by a TV manufacturer and held in a warehouse, 95 are of good quality and 5 are defective. If 10 TVs are drawn from the warehouse without replacement, what is the probability that 3 of them are defective?

[ Solution ]

  • m : number of defective TVs, 5

  • n : number of good TVs, 95

  • k : 10 TVs drawn without replacement

  • x : 3 defective TVs

=> dhyper(3, m=5, n=95, k=10)
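The same call, run in R and checked against the formula. This is a sketch with illustrative variable names.

```r
# Example 19: m = 5 defective TVs, n = 95 good TVs, k = 10 drawn, x = 3 defective
p3 <- dhyper(3, m = 5, n = 95, k = 10)

# Cross-check with the binomial-coefficient formula
p3_formula <- choose(5, 3) * choose(95, 7) / choose(100, 10)

print(c(p3, p3_formula))
```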

2) $P(X \le 4)$

  • phyper(x, m, n, k, lower.tail=TRUE) : use lower.tail=TRUE

EXAMPLE 20. In EXAMPLE 18, find the probability that 4 or fewer cups are picked correctly.

[ Solution ]

=> phyper(4, m=5, n=20, k=5, lower.tail=TRUE)

  • or $P(X \le 4) = P(X=0) + P(X=1) + P(X=2) + P(X=3) + P(X=4)$
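Both routes to the cumulative probability can be compared in R. The sketch below uses illustrative variable names; the sum has to start at $x = 0$ to match `phyper()`.

```r
# P(X <= 4) from the cumulative distribution function
p_le4 <- phyper(4, m = 5, n = 20, k = 5, lower.tail = TRUE)

# Same value as a sum of point probabilities for x = 0, 1, 2, 3, 4
p_sum <- sum(dhyper(0:4, m = 5, n = 20, k = 5))

# And as the complement of the only remaining outcome, x = 5
p_comp <- 1 - dhyper(5, m = 5, n = 20, k = 5)

print(c(p_le4, p_sum, p_comp))
```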

3-3. Finding the Quantile for a Given Probability

qhyper(p, m, n, k, lower.tail = TRUE/FALSE)

EXAMPLE 21. In EXAMPLE 18, find the value of x for which the probability is 0.03576134.

[ Solution ]
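The worked solution is not shown here. One possible approach, offered only as a sketch, treats 0.03576134 as a point probability $P(X = x)$ and scans the support with `dhyper()`. The tolerance and variable names are assumptions.

```r
# Point probabilities for every possible x in Example 18 (m = 5, n = 20, k = 5)
x <- 0:5
pmf <- dhyper(x, m = 5, n = 20, k = 5)

# Find the x whose probability matches the target to within a small tolerance
target <- 0.03576134
hit <- x[abs(pmf - target) < 1e-7]
print(hit)  # 3
```

Here $P(X=3) = \frac{{}_5C_3 \times {}_{20}C_2}{{}_{25}C_5} = \frac{1900}{53130}$, which matches the target value.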

EXAMPLE 22. Find the value of x at which the cumulative probability is 0.998099.

[ Solution ]
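The worked solution is not shown here. A sketch using `qhyper()` follows, assuming the same setting as EXAMPLE 18 (m=5, n=20, k=5), which the example does not state explicitly.

```r
# Smallest x whose cumulative probability P(X <= x) reaches 0.998099
q <- qhyper(0.998099, m = 5, n = 20, k = 5, lower.tail = TRUE)
print(q)  # 3

# Check: the cumulative probability at that x
print(phyper(q, m = 5, n = 20, k = 5))
```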

EXAMPLE 23. Three finite populations each consist of 50 items and contain 10, 20, and 40 successes respectively. When a sample of 10 is drawn from each, find and compare the probability distributions of the number of successes.

[ Solution ]
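The worked solution is not shown here. One way to tabulate the three distributions side by side, as a sketch with illustrative column names:

```r
# Possible numbers of successes in a sample of 10
x <- 0:10

# Each population has 50 items; only the number of successes differs
dist <- data.frame(
  x = x,
  r10 = dhyper(x, m = 10, n = 40, k = 10),
  r20 = dhyper(x, m = 20, n = 30, k = 10),
  r40 = dhyper(x, m = 40, n = 10, k = 10)
)
print(round(dist, 4))

# Mean number of successes per population (n * r / N gives 2, 4 and 8)
means <- sapply(dist[-1], function(p) sum(x * p))
print(means)
```

As the number of successes in the population grows, the distribution shifts to the right, in line with the mean $n \frac{r}{N}$.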

EXAMPLE 24. Let $X$ be the number of defective items in a sample of 30 drawn from a lot of 1,000 products with a defect rate of 5%. Find the following.

  1. The probability distribution function of X

  2. $E(X)$ and $Var(X)$

  3. $P(X=3)$

  4. $P(X \le 3)$

[ Solution ]
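The worked solution is not shown here. The sketch below assumes that a 5% defect rate in a lot of 1,000 means exactly 50 defective items; the variable names are illustrative.

```r
N <- 1000      # lot size
r <- 50        # defective items (assumed: 5% of 1,000)
n_draw <- 30   # sample size

# 1. Probability distribution of X over its support 0..30
x <- 0:n_draw
pmf <- dhyper(x, m = r, n = N - r, k = n_draw)

# 2. Mean and variance from the closed-form expressions
ex24_mean <- n_draw * r / N
ex24_var <- n_draw * (r / N) * (1 - r / N) * (N - n_draw) / (N - 1)

# 3. P(X = 3)
p_eq3 <- dhyper(3, m = r, n = N - r, k = n_draw)

# 4. P(X <= 3)
p_le3 <- phyper(3, m = r, n = N - r, k = n_draw)

print(c(mean = ex24_mean, var = ex24_var, p_eq3 = p_eq3, p_le3 = p_le3))
```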

4. Binomial Distribution and Hypergeometric Distribution

4.1 Hypergeometric Distribution from Binomial Distribution

The conditional distribution of a binomial random variable turns out to be exactly the hypergeometric distribution.

When two random variables X and Y are independent and each follows a binomial distribution with the same success probability, the conditional distribution of X given X+Y is the hypergeometric distribution.
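This statement can be checked numerically. The sketch below assumes $X \sim B(n_1, p)$ and $Y \sim B(n_2, p)$ with a common $p$, and compares $P(X = x \mid X + Y = s)$ with `dhyper(x, m = n1, n = n2, k = s)`. All values and names are illustrative.

```r
# Illustrative parameters: X ~ B(5, 0.3), Y ~ B(20, 0.3), conditioning on X + Y = 5
n1 <- 5; n2 <- 20; p <- 0.3; s <- 5
x <- 0:min(n1, s)

# P(X = x | X + Y = s) = P(X = x) P(Y = s - x) / P(X + Y = s), with X + Y ~ B(n1 + n2, p)
cond <- dbinom(x, n1, p) * dbinom(s - x, n2, p) / dbinom(s, n1 + n2, p)

# Hypergeometric probabilities with m = n1, n = n2, k = s
hyper <- dhyper(x, m = n1, n = n2, k = s)

print(round(cbind(x, cond, hyper), 6))
```

The factor $p^s (1-p)^{n_1+n_2-s}$ cancels between numerator and denominator, which is why the result does not depend on $p$.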

4.2 Binomial Distribution from Hypergeometric Distribution

Conversely, taking the limit ($N \rightarrow \infty$) of the hypergeometric distribution, with $p = \frac{r}{N}$ held fixed, gives the binomial distribution.

Let the success probability of the hypergeometric distribution be $p = \frac{r}{N}$. The probability mass function of the hypergeometric distribution is then

$$P(X=x) = \frac{\binom{r}{x}\binom{N-r}{n-x}}{\binom{N}{n}} = \frac{{}_{r}C_x \times {}_{(N-r)}C_{(n-x)}}{{}_{N}C_n} = \frac{{}_{n}C_x \times {}_{(N-n)}C_{(r-x)}}{{}_{N}C_r}$$

$$= {}_{n}C_x \frac{\frac{(N-n)!}{(r-x)!\,((N-r)-n+x)!}}{\frac{N!}{r!\,(N-r)!}} = {}_nC_x \frac{r!\,(N-r)!\,(N-n)!}{N!\,(r-x)!\,((N-r)-n+x)!}$$

$$= {}_nC_x \frac{r(r-1) \cdots (r-x+1)\,(N-r)(N-r-1)\cdots (N-r-n+x+1)}{N(N-1)(N-2)\cdots (N-n+1)}$$

Dividing the numerator and denominator by $N^n$,

$$= {}_nC_x \times \frac{\frac{r}{N}\left(\frac{r}{N}-\frac{1}{N}\right) \cdots \left(\frac{r}{N} - \frac{x-1}{N}\right)\left(1- \frac{r}{N}\right)\left(1-\frac{r}{N} -\frac{1}{N}\right)\cdots \left(1- \frac{r}{N}-\frac{n-x-1}{N}\right)}{\left(1-\frac{1}{N}\right)\left(1-\frac{2}{N}\right)\cdots \left(1-\frac{n-1}{N}\right)}$$

Since $p = \frac{r}{N}$ and $1-p = q$,

$$= {}_nC_x \times \frac{p\left(p-\frac{1}{N}\right) \cdots \left(p - \frac{x-1}{N}\right) q\left(q -\frac{1}{N}\right)\cdots \left(q-\frac{n-x-1}{N}\right)}{\left(1-\frac{1}{N}\right)\left(1-\frac{2}{N}\right)\cdots \left(1-\frac{n-1}{N}\right)}$$

As $N \rightarrow \infty$, every term of the form $\frac{j}{N}$ vanishes, so $P(X=x) \rightarrow {}_nC_x p^x q^{n-x}$ $\Rightarrow$ Binomial Distribution

This shows that the hypergeometric distribution becomes the binomial distribution in the limit.
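The convergence can also be observed numerically. The sketch below fixes $n = 4$ and $p = 0.1$ (illustrative values) and lets the population size grow, printing the largest gap between the hypergeometric and binomial probabilities.

```r
n_draw <- 4   # sample size (illustrative)
p <- 0.1      # success proportion r / N, held fixed
x <- 0:n_draw

# Binomial probabilities, the claimed limit
binom <- dbinom(x, size = n_draw, prob = p)

# Largest absolute gap between the two PMFs for increasing population sizes
Ns <- c(100, 1000, 10000, 100000)
diffs <- sapply(Ns, function(N) {
  r <- round(p * N)  # number of successes in a population of size N
  max(abs(dhyper(x, m = r, n = N - r, k = n_draw) - binom))
})

print(data.frame(N = Ns, max_abs_diff = signif(diffs, 3)))
```

The gap shrinks as $N$ grows, which is why the binomial distribution is often used as an approximation when the sample is small relative to the population.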

4.3 Binomial Distribution and Hypergeometric Distribution

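The binomial and hypergeometric distributions are therefore related in both directions: conditioning independent binomial variables on their sum gives the hypergeometric distribution, and letting $N \rightarrow \infty$ in the hypergeometric distribution gives the binomial distribution.

One more point to keep in mind: the binomial distribution assumes sampling with replacement, whereas the hypergeometric distribution assumes sampling without replacement.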
