4-9. Negative Hypergeometric Distribution (*)

1. Negative Hypergeometric Distribution

6개의 흰 공, 4개의 검은 공이 든 상자에서 공을 하나씩 꺼낼 때, 흰 공을 꺼내는 것을 '성공'이라고 한다면, 흰 공을 처음 꺼내기 전 검은 공을 꺼낸 횟수를 확률변수 XX라 할 때, 이 확률변수를 'Negative Hypergeometric random variable' 이라고 한다.

52장의 플레잉 카드를 고루 섞고, 한 장씩 나눠준다고 생각해보. 여기서 '에이스' 카드가 나오면 성공입이다. 만약 7번째에서 나눠준 카드가 '에이스'카드 였다면, 이 이전에 나눠줬던 6개의 카드들은 'Negative Hypergeometric Distribution'을 따르는 것이다.

One can view the Binomial distribution and Hypergeometric distribution as both considering a random variable that counts the number of successes in nn trials.

In the Binomial case, there are Bernoulli trials with constant probability pp of success from trial to trial, with independent trials.

In the Hypergeometric case, the sampling is without replacement so that probabilities change from selection to selection and trials are dependent. In the Bernoulli trials case, the Negative Binomial distribution is the distribution counting the number of trials required until a specified number (say kk ) of successes have been observed. In the sampling without replacement case, a similar situation is to consider the number of selections required until a kthk ^{th} success is obtained.

EXAMPLE 29. Suppose an urn contains 4 red and 10 blue balls and that balls are drawn one after another from this urn until a red ball is obtained. What is the probability that exactly six balls are drawn?

[ Solution ]

P(BBBBBR)=P(5 B Balls in 5 Draws)×P(6th Ball is R given 5 B Removed)P(BBBBBR) = P(5 \space B \space Balls \space in \space 5 \space Draws) \times P(6^{th} \space Ball \space is \space R \space given \space 5 \space B \space Removed) =(105)(40)(145)×4145=2522002×490.0559= \frac{ \binom{10}{5} \binom{4}{0} }{ \binom{14}{5} } \times \frac {4}{14-5} = \frac{252}{2002} \times \frac {4}{9} \doteq 0.0559 .

This example is like "waiting for the first success in sampling without replacement" with success being obtaining a red ball.

Recall that in sampling with replacement the distribution analogous to this was the Geometric distribution, a special case of the Negative Binomial distribution.

EXAMPLE 30. A statistics department has purchased 24 calculators of which 4 are defective. Calculators are selected one-after-another without replacement and tested. What is the probability that the second calculator found to be defective is the eighth calculator selected?

[ Solution ]

P(1 defective in 7 then 1 more defective)P(1 \space defective \space in \space 7 \space then \space 1 \space more \space defective) =P(6 Good and 1 Defective in 7 draws)×P(8th Ball is Defective given 17 left)= P(6 \space Good \space and \space 1 \space Defective \space in \space 7 \space draws) \times P(8^{th} \space Ball \space is \space Defective \space given \space 17 \space left) =(206)(41)(247)×317=155,040346,104×3170.079051= \frac {\binom {20 }{6 } \binom {4 }{1}} { \binom {24} {7}} \times \frac{3}{17} = \frac{155,040 }{346,104 } \times \frac{3}{17} \doteq 0.079051 .

In the above two examples, the final selection must be a success. The selections before this one simply involve a Hypergeometric situation involving one fewer selection than the total number and one fewer success than the number for which the procedure is waiting.

Let XX be a random variable counting the number of selections required until the kthk^{th} success is obtained when sampling without replacement from a set of NN objects of which MM have a certain attribute (i.e. success). Then XX is said to have a Negative Hypergeometric distribution with parameters N,MN, M and kk -- that is, YNHG(N,M,k)Y \sim NHG(N,M ,k ) -- and, for appropriate values XX, its probability function is

Negative Hypergeometric Distribution

P(X=k)=(Mk1)(NMxk)(Nx1)×Mk+1Nx+1P(X=k)= \frac{ \binom{M}{k-1} \binom{N−M}{x-k} }{\binom{N}{x-1}} \times \frac{M-k+1}{N-x+1}

where, the parameters N,MN,Mand kk are non-negative integers which satisfy the condition kMNk≤M≤N .

Comments:

  1. In the above expression, the quantity (Mk1)(NMxk)(Nx1) \frac{ \binom{M}{k-1} \binom{N−M}{x-k} }{\binom{N}{x-1}} is just the Hypergeometric probability P[X=k1] P[ X = k − 1] if XHG(N,M,y1)X \sim HG(N,M , y − 1) (i.e. exactly k1k - 1 successes in the first y1y - 1 draws), and the second is the probability of another success on the kthk^{ th} draw based on what remains of the set of objects.

  2. Again, it might be harder to try to remember the formula for the Negative Hypergeometric probability function than to simply solve the problem based on general knowledge of probability.

  3. The value space of this random variable is VX={k,k+1,k+2,...,N}V_X = \{k ,k + 1,k + 2,...,N \}.

  4. The mean of this distribution is μE(X)=k×N+1M+1\mu ≡ E(X) = k \times \frac {N+1}{M+1} and its variance is σ2Var(X)=(NM)(N+1)(M+1k)(M+1)2(M+2)\sigma ^2 ≡ Var(X) = \frac{(N-M)(N+1)(M+1-k)} {(M+1)^2(M+2)} .

EXAMPLE 31. A land developer has plans for having 86 acreages in its development south of the city. During the development of the acreages, water testing has suggested that 12 of the sites have water problems such that the wells on these sites do not have water that meets local drinking standards. If a potential purchaser decides to visit several of the acreage sites chosen at random from the 86, what is the probability that the third site that the purchaser visits that has such water problems is the eighth site visited? What is the expected number of sites that this purchaser would visit so as to have found three with such water problems?

[ Solution ]

Let XX be a random variable counting the number of sites visited up to and including the third one having these water problems. Then XNHG(86,12,3)X \sim NHG(86,12,3) . The probability of having to visit 8 sites is P(X=8)=(122)(745)(867)×10790.19787×0.126580.025046P(X=8) = \frac {\binom {12 }{2 } \binom {74 }{5}} { \binom {86} {7}} \times \frac{10}{79} \doteq 0.19787 \times 0.12658 \doteq 0.025046 . And μE(X)=k×N+1M+1=3×86+112+1=2611320.07692\mu ≡ E(X) = k \times \frac {N+1}{M+1} = 3 \times \frac {86+1}{12 +1} = \frac{261}{13} \doteq 20.07692 .

2. Binomial, Hypergeometric, Negative Binomial and Negative Hypergeometric Distribution

지금까지 알아 봤던 1. 이항분포, 2. 초기하분포, 3. 음이항분포, 4. Negative Hypergeometric Distribution 를 한 번 생각해 보자.

2.1 Binomial Distribution and Hypergeometric Distribution

  • 이항분포시행횟수(number of trials)가 정해져 있었다. 예를 들어, 시험문제 25문제를 찍어서 다 맞힐 확률처럼 '25'라는 시행횟수가 고정되어 있다.

  • 초기하분포도 마찬가지 이다. 1,000개의 제품 중 20개를 뽑는 것처럼 '20'이라는 시행횟수를 정해놓고 문제를 푼다.

  • 하지만, 이항분포와 초기하분포의 차이점은 '복원추출(with replacement)'이나 '비복원추출(without replacement)'이냐의 차이이다. 이항분포복원추출, 초기하분포비복원추출을 전제로 한다.

2.2 Negative Binomial Distribution and Negative Hypergeometric Distribution

  • 음이항분포는 시행횟수를 정해 놓은 것이 아니라 '성공'의 횟수(number of successes)를 정해 놓고 문제를 푼다. 즉 월드시리즈에서 4번 먼저 승리하는 팀이 우승하는 것이 예가 된다.

  • Negative Hypergeometic Distribution 도 성공횟수를 정해놓습니다. 첫 번째 성공이 나오기 전까지 몇 번의 실패를 거듭해야 하는지를 알아보는 것이다.

이 내용을 표로 정리하면 다음과 같다.

With Replacement

Without Replacement

Fixed Number of Trials

이항 분포

(Binomial Distribution)

초기하 분포

(Hypergeometric Distribution)

Fixed Number of Successes

음이항 분포

(Negative Binomial Distribution)

Negative Hypergeometric

Distribution

Last updated