1. The Probability Distribution of a Continuous Random Variable
The probability distribution of a continuous random variable X is an assignment of probabilities to intervals of decimal numbers using a function f(x), called a density function, in the following way: the probability that X assumes a value in the interval [a,b] is equal to the area of the region that is bounded above by the graph of the equation y=f(x), bounded below by the x-axis, and bounded on the left and right by the vertical lines through a and b, as illustrated in Figure "Probability Given as Area of a Region under a Curve".
Every probability density function (p.d.f.) f(x) must satisfy the following two conditions:
For all numbers x, f(x)≥0, so that the graph of y=f(x) never drops below the x-axis.
The area of the region under the graph of y=f(x) and above the x-axis is 1.
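For any continuous random variable X, a single value has probability zero (the region above one point has zero area), so including or excluding an endpoint does not change the probability: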
P(a≤X≤b)=P(a<X≤b)=P(a≤X<b)=P(a<X<b)
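The two conditions and the area interpretation can be checked numerically. The sketch below assumes a hypothetical density f(x)=2x on [0,1], which does not come from the text above, and uses base R's integrate() to approximate the areas.

```r
# Hypothetical density chosen only for illustration: f(x) = 2x on [0, 1], 0 elsewhere
f <- function(x) ifelse(x >= 0 & x <= 1, 2 * x, 0)

# Condition 1: f(x) >= 0, checked on a grid of points
grid <- seq(-1, 2, by = 0.01)
nonneg <- all(f(grid) >= 0)

# Condition 2: the total area under the curve is 1
total_area <- integrate(f, lower = 0, upper = 1)$value

# P(0.2 <= X <= 0.5) is the area under f between 0.2 and 0.5
p_interval <- integrate(f, lower = 0.2, upper = 0.5)$value

print(nonneg)
print(total_area)
print(p_interval)
```

Because the probability is an area, the same integral gives P(0.2<X<0.5), P(0.2≤X<0.5) and P(0.2<X≤0.5): adding or removing a single endpoint does not change the area.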
2. Uniform Distribution
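The uniform distribution is the distribution of a random variable X whose values are observed with equal probability everywhere on a finite real interval [a,b].
(p.d.f.) $f(x) = \frac{1}{b-a}$ for $a \le x \le b$, and $f(x) = 0$ otherwise.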
Expected Values and Variance of Uniform Distribution
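$E(X) = \int_a^b x \cdot \frac{1}{b-a}\,dx = \frac{b^2-a^2}{2(b-a)} = \frac{a+b}{2}$
$E(X^2) = \int_a^b x^2 \cdot \frac{1}{b-a}\,dx = \frac{b^3-a^3}{3(b-a)} = \frac{a^2+ab+b^2}{3}$
$\mathrm{Var}(X) = E(X^2) - E(X)^2 = \frac{a^2+ab+b^2}{3} - \left(\frac{a+b}{2}\right)^2 = \frac{(b-a)^2}{12}$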
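The closed forms E(X)=(a+b)/2 and Var(X)=(b−a)²/12 can be compared against numerical integration. This is a minimal sketch; the endpoints a=2 and b=5 are hypothetical values picked only for the check.

```r
# Hypothetical endpoints chosen only for illustration
a <- 2
b <- 5

# Density of the uniform distribution on [a, b]
f <- function(x) dunif(x, min = a, max = b)

# Numerical E(X) and E(X^2) by integrating x * f(x) and x^2 * f(x) over [a, b]
ex_num  <- integrate(function(x) x * f(x), lower = a, upper = b)$value
ex2_num <- integrate(function(x) x^2 * f(x), lower = a, upper = b)$value
var_num <- ex2_num - ex_num^2

# Closed-form values from the formulas
ex_formula  <- (a + b) / 2
var_formula <- (b - a)^2 / 12

print(c(numeric = ex_num, formula = ex_formula))
print(c(numeric = var_num, formula = var_formula))
```

For these endpoints both approaches give E(X)=3.5 and Var(X)=0.75, up to the numerical error of integrate().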
EXAMPLE 1. A random variable X has the uniform distribution on the interval [0,1]:
the density function is
f(x)=1 if x is between 0 and 1
and f(x)=0 for all other values of x,
as shown in Figure "Uniform Distribution on [0,1]".
Find P(X>0.75), the probability that X assumes a value greater than 0.75.
Find P(X≤0.2), the probability that X assumes a value less than or equal to 0.2.
Find P(0.4<X<0.7), the probability that X assumes a value between 0.4 and 0.7.
[ Solution ]
P(X>0.75) is the area of the rectangle of height 1 and base length 1−0.75=0.25, hence is base×height=(0.25)⋅(1)=0.25. See Figure 5.3 "Probabilities from the Uniform Distribution on [0,1]"(a).
P(X≤0.2) is the area of the rectangle of height 1 and base length 0.2−0=0.2, hence is base×height=(0.2)⋅(1)=0.2. See Figure 5.3 "Probabilities from the Uniform Distribution on [0,1]"(b).
P(0.4<X<0.7) is the area of the rectangle of height 1 and base length 0.7−0.4=0.3, hence is base×height=(0.3)⋅(1)=0.3. See Figure 5.3 "Probabilities from the Uniform Distribution on [0,1]"(c).
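The three rectangle areas in Example 1 can be reproduced with punif(), the cumulative distribution function of the uniform distribution in base R. This is a sketch of one way to check the answers; the variable names are illustrative.

```r
# (a) P(X > 0.75): upper-tail probability
p_a <- punif(0.75, min = 0, max = 1, lower.tail = FALSE)

# (b) P(X <= 0.2): lower-tail probability
p_b <- punif(0.2, min = 0, max = 1)

# (c) P(0.4 < X < 0.7): difference of two lower-tail probabilities
p_c <- punif(0.7, min = 0, max = 1) - punif(0.4, min = 0, max = 1)

print(c(a = p_a, b = p_b, c = p_c))
```

The results agree with the areas computed by hand: 0.25, 0.2 and 0.3.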
EXAMPLE 2. A man arrives at a bus stop at a random time (that is, with no regard for the scheduled service) to catch the next bus. Buses run every 30 minutes without fail, hence the next bus will come any time during the next 30 minutes with evenly distributed probability (a uniform distribution). Find the probability that a bus will come within the next 10 minutes.
[ Solution ]
The graph of the density function is a horizontal line above the interval from 0 to 30 and is the x-axis everywhere else. Since the total area under the curve must be 1, the height of the horizontal line is 1/30. See Figure 5.4 "Probability of Waiting At Most 10 Minutes for a Bus". The probability sought is P(0≤X≤10).
By definition, this probability is the area of the rectangular region bounded above by the horizontal line f(x)=1/30, bounded below by the x-axis, bounded on the left by the vertical line at 0 (the y-axis), and bounded on the right by the vertical line at 10. This is the shaded region in Figure 5.4 "Probability of Waiting At Most 10 Minutes for a Bus". Its area is the base of the rectangle times its height, 10⋅(1/30)=1/3.
Thus P(0≤X≤10)=1/3.
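library(Rstat)
# Endpoints of the uniform interval (named lo/hi to avoid masking base::min and base::max)
lo <- 0
hi <- 30
# 0. Probability density function of the uniform distribution on [lo, hi]
fx <- function(x) dunif(x, lo, hi)
# E(X), Var(X) and plot; the range is widened by 20% of the interval width on each side
pad <- (hi - lo) * 0.2
cont.exp(fx, lo - pad, hi + pad, prt=TRUE, plot=TRUE)
# 1. P(X<=10): probability that the bus comes within the next 10 minutes (1/3)
punif(10, min=lo, max=hi, lower.tail=TRUE)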
library(ggplot2)
# Uniform density plot (min=0, max=10): fun = dunif
ggplot(data.frame(x=c(-2,20)), aes(x=x)) +
  stat_function(fun=dunif, args=list(min = 0, max = 10),
                colour="black", size=1) +
  ggtitle("Uniform Distribution of (min=0, max=10)")
3-2. Cumulative Uniform Distribution Plot
punif()
Uniform Distribution of 0≤X≤10
# Cumulative uniform distribution plot: fun = punif
ggplot(data.frame(x=c(-2,20)), aes(x=x)) +
  stat_function(fun=punif, args=list(min = 0, max = 10),
                colour="black", size=1) +
  ggtitle("Cumulative Uniform Distribution of (min=0, max=10)")
3-3. Probability Calculation
Uniform Distribution of 0≤X≤10, X from 0 to 3
min = 0, max = 10 => P(0<X<3)=?
# Compute the probability: punif()
# punif(q, min, max, lower.tail = TRUE/FALSE)
# P(0<X<3) = P(X<=3) = 0.3, since X cannot fall below 0
punif(3, min=0, max=10, lower.tail=TRUE)
# Uniform Distribution of (min=0, max=10), x from 0 to 3 (shaded)
ggplot(data.frame(x=c(-2,20)), aes(x=x)) +
  stat_function(fun=dunif, args=list(min = 0, max = 10), colour="black", size=1) +
  annotate("rect", xmin=0, xmax=3, ymin=0, ymax=0.1, alpha=0.2, fill="yellow") +
  ggtitle("Uniform Distribution of (min=0, max=10), x from 0 to 3")
# Random Number Generation
ru_100 <- runif(n=100, min=0, max=10)
ru_100
# Histogram of ru_100 on the density scale, with a dotted line at the uniform density 0.1
hist(ru_100, freq=FALSE, breaks=10, col="yellow", ylim=c(0, 0.15))
abline(h=0.1, lty=3, lwd=3, col="red")
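As a rough check on the random number generator, the sample mean and variance of a large uniform sample should land near the theoretical values (a+b)/2=5 and (b−a)²/12≈8.33 for the interval from 0 to 10. The sample size and the seed below are assumptions made only so the sketch is reproducible.

```r
set.seed(1)   # arbitrary seed, assumed here only for reproducibility
n <- 100000   # larger than the 100 draws above, so the estimates are stable

ru <- runif(n, min = 0, max = 10)

# Sample estimates
sample_mean <- mean(ru)
sample_var  <- var(ru)

# Theoretical values for the uniform distribution on [0, 10]
theory_mean <- (0 + 10) / 2
theory_var  <- (10 - 0)^2 / 12

print(c(sample = sample_mean, theory = theory_mean))
print(c(sample = sample_var, theory = theory_var))
```

With only 100 draws, as in ru_100, the estimates vary noticeably from run to run, which is also why the histogram bars do not sit exactly on the 0.1 line.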
The formula for the density function f(x) of the bell curve contains two parameters μ and σ that can be assigned any specific numerical values, so long as σ is positive. We will not need to know the formula for f(x), but for those who are interested it is
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$
where π≈3.14159 and e≈2.71828 is the base of the natural logarithms.
The value of σ determines whether the bell curve is tall and thin or short and squat, subject always to the condition that the total area under the curve be equal to 1.
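The effect of σ can be seen by coding the density directly. The sketch below writes the formula as 1/(σ√(2π))·exp(−½((x−μ)/σ)²), compares it with R's built-in dnorm(), and evaluates the peak height and total area for a few hypothetical values of σ. The helper name normal_density is illustrative, not from the text.

```r
# Normal density written out from the formula (helper name is illustrative)
normal_density <- function(x, mu, sigma) {
  stopifnot(sigma > 0)
  1 / (sigma * sqrt(2 * pi)) * exp(-0.5 * ((x - mu) / sigma)^2)
}

# The hand-written formula should agree with the built-in dnorm()
x <- seq(-4, 4, by = 0.5)
agrees <- isTRUE(all.equal(normal_density(x, mu = 0, sigma = 1),
                           dnorm(x, mean = 0, sd = 1)))

# Hypothetical sigma values: a small sigma gives a tall, thin curve
sigmas <- c(0.5, 1, 2)
peaks <- sapply(sigmas, function(s) normal_density(0, mu = 0, sigma = s))

# The total area stays 1 whatever the value of sigma
areas <- sapply(sigmas, function(s) {
  integrate(normal_density, lower = -Inf, upper = Inf, mu = 0, sigma = s)$value
})

print(agrees)
print(data.frame(sigma = sigmas, peak = peaks, area = areas))
```

The peak height is 1/(σ√(2π)), so doubling σ halves the height of the curve while the area under it remains 1.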
The probability distribution corresponding to the density function for the bell curve with parameters μ and σ is called the normal distribution with mean μ and standard deviation σ.
A continuous random variable whose probabilities are described by the normal distribution with mean μ and standard deviation σ is called a normally distributed random variable, or a normal random variable for short, with mean μ and standard deviation σ: X∼N(μ,σ).
The density curve for the normal distribution is symmetric about the mean.
EXAMPLE 3. Heights of 25-year-old men in a certain region have mean 69.75 inches and standard deviation 2.59 inches. These heights are approximately normally distributed. Thus the height X of a randomly selected 25-year-old man is a normal random variable with mean μ = 69.75 and standard deviation σ = 2.59. Sketch a qualitatively accurate graph of the density function for X. Find the probability that a randomly selected 25-year-old man is more than 69.75 inches tall.
[ Solution ]
Since the total area under the curve is 1, by symmetry the area to the right of 69.75 is half the total, or 0.5. But this area is precisely the probability P(X>69.75), the probability that a randomly selected 25-year-old man is more than 69.75 inches tall.
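The symmetry argument can be confirmed with pnorm(), the cumulative distribution function of the normal distribution in base R. This is a minimal sketch using the mean and standard deviation stated in Example 3.

```r
mu    <- 69.75  # mean height in inches (from Example 3)
sigma <- 2.59   # standard deviation in inches (from Example 3)

# P(X > 69.75): upper-tail area to the right of the mean
p_taller <- pnorm(69.75, mean = mu, sd = sigma, lower.tail = FALSE)
print(p_taller)
```

For the sketch of the density that the example asks for, a call such as curve(dnorm(x, mean = 69.75, sd = 2.59), from = 62, to = 78) draws the bell curve; the plotting range of 62 to 78 is an assumption that covers roughly three standard deviations on each side of the mean.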