5-1. Continuous Random Variables


1. The Probability Distribution of a Continuous Random Variable

The probability distribution of a continuous random variable $X$ is an assignment of probabilities to intervals of decimal numbers using a function $f(x)$, called a density function, in the following way: the probability that $X$ assumes a value in the interval $[a,b]$ is equal to the area of the region that is bounded above by the graph of the equation $y=f(x)$, bounded below by the x-axis, and bounded on the left and right by the vertical lines through $a$ and $b$, as illustrated in Figure "Probability Given as Area of a Region under a Curve".

Every probability density function (p.d.f.) $f(x)$ must satisfy the following two conditions:

  1. For all numbers $x$, $f(x) \ge 0$, so that the graph of $y=f(x)$ never drops below the x-axis.

  2. The area of the region under the graph of $y=f(x)$ and above the x-axis is 1.

For any continuous random variable $X$:

$$P(a \le X \le b) = P(a < X \le b) = P(a \le X < b) = P(a < X < b)$$
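The two conditions and the endpoint equality can be checked numerically. The following is a minimal sketch, assuming the uniform density on [0, 1] as the example and using base R's `integrate()`; the helper name `area_under` is hypothetical and introduced only for illustration.

```r
# Hypothetical helper: area under a density f between lower and upper
area_under <- function(f, lower, upper) integrate(f, lower, upper)$value

# Example density (an assumption for illustration): uniform on [0, 1]
f <- function(x) dunif(x, min = 0, max = 1)

# Condition 1: f(x) >= 0 at every point of a test grid
grid <- seq(-1, 2, by = 0.01)
nonnegative <- all(f(grid) >= 0)
nonnegative

# Condition 2: total area is 1 (this density is 0 outside [0, 1])
total_area <- area_under(f, 0, 1)
total_area

# Probability of an interval as an area, compared with the CDF difference
p_area <- area_under(f, 0.4, 0.7)
p_cdf  <- punif(0.7) - punif(0.4)
c(p_area = p_area, p_cdf = p_cdf)

# The area over a shrinking interval around a single point tends to 0,
# which is why including or excluding endpoints does not change the probability
eps <- c(1e-1, 1e-3, 1e-6)
shrinking <- sapply(eps, function(e) area_under(f, 0.3 - e, 0.3 + e))
shrinking
```

Because the probability attached to any single point is zero, the four expressions in the equality above all describe the same area.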

2. Uniform Distribution

The uniform distribution is the distribution of a random variable $X$ whose values are observed with equal probability over a finite real interval $[a, b]$.

(p.d.f.) $f(x) = \frac{1}{b-a}$ for $a \le x \le b$, and $f(x) = 0$ elsewhere.

Expected Values and Variance of Uniform Distribution

$$E(X) = \int_{a}^{b} x \frac{1}{b-a}\,dx = \frac{b^2-a^2}{2(b-a)} = \frac{a+b}{2}$$

$$E(X^2) = \int_{a}^{b} x^2 \frac{1}{b-a}\,dx = \frac{b^3-a^3}{3(b-a)} = \frac{a^2+ab+b^2}{3}$$

$$Var(X) = E(X^2) - E(X)^2 = \frac{a^2+ab+b^2}{3} - \left( \frac{a+b}{2} \right)^2 = \frac{(b-a)^2}{12}$$
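The closed-form results can be compared with numerical integration. This is a minimal sketch using base R's `integrate()`; the endpoints `a = 2` and `b = 5` are arbitrary values chosen only for illustration.

```r
# Arbitrary illustrative endpoints (not from the text)
a <- 2
b <- 5

# Uniform density on [a, b]
f <- function(x) dunif(x, min = a, max = b)

# E(X) and E(X^2) by numerical integration over [a, b]
EX  <- integrate(function(x) x   * f(x), a, b)$value
EX2 <- integrate(function(x) x^2 * f(x), a, b)$value

# Var(X) = E(X^2) - E(X)^2
VarX <- EX2 - EX^2

# Numerical values next to the closed-form formulas
c(EX = EX, formula = (a + b) / 2)
c(VarX = VarX, formula = (b - a)^2 / 12)
```

For these endpoints the formulas give $E(X) = 3.5$ and $Var(X) = 0.75$, and the numerical integrals should agree up to integration error.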

EXAMPLE 1. A random variable $X$ has the uniform distribution on the interval $[0,1]$: the density function is $f(x)=1$ if $x$ is between 0 and 1 and $f(x)=0$ for all other values of $x$, as shown in Figure "Uniform Distribution on $[0,1]$".

  1. Find $P(X > 0.75)$, the probability that $X$ assumes a value greater than 0.75.

  2. Find $P(X \le 0.2)$, the probability that $X$ assumes a value less than or equal to 0.2.

  3. Find $P(0.4 < X < 0.7)$, the probability that $X$ assumes a value between 0.4 and 0.7.

[ Solution ]

  1. $P(X > 0.75)$ is the area of the rectangle of height 1 and base length $1-0.75=0.25$, hence is $\text{base} \times \text{height} = (0.25) \cdot (1) = 0.25$. See Figure 5.3 "Probabilities from the Uniform Distribution on $[0,1]$"(a).

  2. $P(X \le 0.2)$ is the area of the rectangle of height 1 and base length $0.2-0=0.2$, hence is $\text{base} \times \text{height} = (0.2) \cdot (1) = 0.2$. See Figure 5.3 "Probabilities from the Uniform Distribution on $[0,1]$"(b).

  3. $P(0.4 < X < 0.7)$ is the area of the rectangle of height 1 and base length $0.7-0.4=0.3$, hence is $\text{base} \times \text{height} = (0.3) \cdot (1) = 0.3$. See Figure 5.3 "Probabilities from the Uniform Distribution on $[0,1]$"(c).

library(Rstat)

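# 0. Probability density function of the uniform distribution on [0, 1]
fx <- function(x) dunif(x, 0, 1)

# Open a 7 x 6 inch graphics device (dev.new() is portable; win.graph() is Windows-only)
dev.new(width=7, height=6); par(mfrow=c(1,1))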

# E(X), Var(X) and Plot
cont.exp(fx, -0.2, 1.2, prt=TRUE, plot=TRUE)

# 1. P(X>0.75)
punif(0.75, min=0, max=1, lower.tail=FALSE)

# 2. P(X<0.2)
punif(0.2, min=0, max=1, lower.tail=TRUE)

# 3. P(0.4<X<0.7) = P(X<0.7) - P(X<0.4)
punif(0.7, min=0, max=1, lower.tail=TRUE) - punif(0.4, min=0, max=1, lower.tail=TRUE) 

EXAMPLE 2. A man arrives at a bus stop at a random time (that is, with no regard for the scheduled service) to catch the next bus. Buses run every 30 minutes without fail, hence the next bus will come any time during the next 30 minutes with evenly distributed probability (a uniform distribution). Find the probability that a bus will come within the next 10 minutes.

[ Solution ]

The graph of the density function is a horizontal line above the interval from 0 to 30 and is the x-axis everywhere else. Since the total area under the curve must be 1, the height of the horizontal line is $1/30$. See Figure 5.4 "Probability of Waiting At Most 10 Minutes for a Bus". The probability sought is $P(0 \le X \le 10)$.

By definition, this probability is the area of the rectangular region bounded above by the horizontal line $f(x) = 1/30$, bounded below by the x-axis, bounded on the left by the vertical line at 0 (the y-axis), and bounded on the right by the vertical line at 10. This is the shaded region in Figure 5.4 "Probability of Waiting At Most 10 Minutes for a Bus". Its area is the base of the rectangle times its height, $10 \cdot (1/30) = 1/3$.

Thus $P(0 \le X \le 10) = 1/3$.

library(Rstat)

min <- 0
max <- 30

# 0. Probability Distribution Function
fx <- function(x) dunif(x, min, max)

# E(X), Var(X) and Plot
t <- (max-min) * 0.2 
cont.exp(fx, min-t, max+t, prt=TRUE, plot=TRUE)

# 1. P(X<10)
punif(10, min=min, max=max, lower.tail=TRUE)
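The exact answer of 1/3 can also be approximated by simulation. The sketch below is an illustration only: the seed and the number of simulated arrivals are arbitrary assumptions, and the estimate will vary slightly with either.

```r
# Monte Carlo estimate of P(X <= 10) for a waiting time X ~ Uniform(0, 30)
set.seed(1)          # arbitrary seed, only for reproducibility
n_sim <- 100000      # number of simulated arrivals (illustrative choice)

# Simulated waiting times in minutes
wait <- runif(n_sim, min = 0, max = 30)

# Fraction of arrivals that wait at most 10 minutes
p_hat <- mean(wait <= 10)
p_hat

# Exact value from the CDF, for comparison
p_exact <- punif(10, min = 0, max = 30)
p_exact
```

With this many draws the simulated fraction should land close to the exact value, which illustrates the interpretation of the shaded area as a long-run relative frequency.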

3. Uniform Distribution in R

| Function | Prefix | R function and parameters |
|---|---|---|
| density function | d | dunif(x, min, max) |
| cumulative distribution function | p | punif(q, min, max, lower.tail = TRUE/FALSE) |
| quantile function | q | qunif(p, min, max, lower.tail = TRUE/FALSE) |
| random number generation | r | runif(n, min, max) |

3-1. Uniform Distribution Plot

dunif()

  • Uniform Distribution of $0 \le X \le 10$

library(ggplot2)

# uniform distribution plot (min=0, max=10)
# fun = dunif
ggplot(data.frame(x=c(-2,20)), aes(x=x)) +
stat_function(fun=dunif, args=list(min = 0, max = 10), 
        colour="black", size=1) +
ggtitle("Uniform Distribution of (min=0, max=10)")

3-2. Cumulative Uniform Distribution Plot

punif()

  • Uniform Distribution of $0 \le X \le 10$

# Cumulative uniform distribution plot: fun = punif

ggplot(data.frame(x=c(-2,20)), aes(x=x)) + 
stat_function(fun=punif, args=list(min = 0, max = 10), 
               colour="black", size=1) + 
ggtitle("Cumulative Uniform Distribution of (min=0, max=10)")

3-3. Probability Calculation

  • Uniform Distribution of $0 \le X \le 10$, $X$ from 0 to 3

min = 0, max = 10 => $P(0 < X < 3) = ?$

# Compute the probability value: punif()
# punif(q, min, max, lower.tail = TRUE/FALSE)
punif(3, min=0, max=10, lower.tail=TRUE)

# Uniform Distribution of (min=0, max=10), x from 0 to 3
ggplot(data.frame(x=c(-2,20)), aes(x=x)) +
stat_function(fun=dunif, args=list(min = 0, max = 10), colour="black", size=1) +
annotate("rect", xmin=0, xmax=3, ymin=0, ymax=0.1, alpha=0.2, fill="yellow") +
ggtitle("Uniform Distribution of (min=0, max=10), x from 0 to 3")

3-4. Quantiles

qunif(p, min, max, lower.tail=TRUE/FALSE)

  • Uniform Distribution of $0 \le X \le 10$

qunif(0.3, min=0, max=10, lower.tail = TRUE)
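The quantile function is the inverse of the cumulative distribution function. A minimal sketch of that relationship for the same distribution follows; the variable names are chosen here for illustration.

```r
# Quantile function as the inverse of the CDF, for X ~ Uniform(0, 10)
p <- 0.3

# Value q such that P(X <= q) = p
q <- qunif(p, min = 0, max = 10, lower.tail = TRUE)
q

# Feeding q back into the CDF recovers p
p_back <- punif(q, min = 0, max = 10)
p_back

# With lower.tail = FALSE, the result is the value exceeded with probability p,
# that is, q_upper such that P(X > q_upper) = p
q_upper <- qunif(p, min = 0, max = 10, lower.tail = FALSE)
q_upper
```

For a uniform distribution on [0, 10] the quantile is simply 10 times the probability, so the lower-tail and upper-tail quantiles for 0.3 are 3 and 7.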

3-5. Random Number Generation

runif(n=100, min=0, max = 10)

# Random Number Generation
ru_100 <- runif(n=100, min=0, max = 10) ; ru_100

# density plot of runif(n=100, min=0, max = 10) & adding line of 0.1 uniform probability
hist(ru_100, freq=FALSE, breaks=10, col="yellow", ylim=c(0, 0.15))
abline(h=0.1, lty=3, lwd=3, col="red")
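A simulated sample can also be compared with the theoretical mean $(a+b)/2$ and variance $(b-a)^2/12$. The sketch below assumes a larger sample than the histogram above and an arbitrary seed, both chosen only so that the comparison is stable.

```r
# Compare sample moments of Uniform(0, 10) draws with the theoretical values
set.seed(42)                         # arbitrary seed, only for reproducibility
x <- runif(n = 10000, min = 0, max = 10)

sample_mean <- mean(x)
sample_var  <- var(x)

theory_mean <- (0 + 10) / 2          # (a + b) / 2
theory_var  <- (10 - 0)^2 / 12       # (b - a)^2 / 12

data.frame(quantity = c("mean", "variance"),
           sample   = c(sample_mean, sample_var),
           theory   = c(theory_mean, theory_var))
```

The sample values fluctuate from run to run, but with 10,000 draws they should sit close to 5 and about 8.33.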
 

  • Drawing PDF and CDF of Continuous Uniform Distribution

library(ggplot2)
library(dplyr)
options(scipen = 999, digits = 2) # sig digits

min <- 0
max <- 1
events <- seq(min, max, by=0.005)

density <- dunif(x = events, min=min, max=max)
prob <- punif(q = events, min=min, max=max, lower.tail = TRUE)
df <- data.frame(events, density, prob)
ggplot(df, aes(x = events, y = density)) + 
  geom_col(width=0.02) +
#  geom_text(
#    aes(label = round(density,2), y = density + 0.01),
#    position = position_dodge(0.9),
#    size = 3,
#    vjust = 0
#  ) +
  labs(title = "PDF and CDF of Uniform Distribution",
#      subtitle = "P(3).",
       x = "Events (x)",
       y = "Density") +
  geom_line(data = df, aes(x = events, y = prob), col="blue")
  • Using Rstat Package

library(Rstat)

min <- 0
max <- 1
events <- seq(min, max, by=0.005)

dcol <- c("red", "blue", "green2")

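# CDF and PDF of the uniform distribution, side by side
dev.new(width=7, height=5)   # portable replacement for the Windows-only win.graph()
par(mfrow=c(1,2))

# Left panel: cumulative distribution function
plot(events, punif(q = events, min=min, max=max, lower.tail = TRUE), 
     type="l", lwd=2, col=dcol[1], 
     main="CDF of Uniform Distribution",
     ylab="CDF", ylim=c(0,1))

grid(col=3)

# Right panel: probability density function
plot(events, dunif(events, min=min, max=max), 
     type="l", lwd=2, col=dcol[2], 
     main="PDF of Uniform Distribution",
     ylab="PDF", ylim=c(0,1))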
grid(col=3)

4. Normal Distributions

The formula for $f(x)$ contains two parameters $\mu$ and $\sigma$ that can be assigned any specific numerical values, so long as $\sigma$ is positive. We will not need to know the formula for $f(x)$, but for those who are interested it is

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{1}{2} \left( \frac{x-\mu}{\sigma} \right)^2}$$

where $\pi \approx 3.14159$ and $e \approx 2.71828$ is the base of the natural logarithms.
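The formula can be transcribed directly and compared with R's built-in `dnorm()`. This is a minimal sketch; the function name `normal_pdf` is hypothetical and the test points and parameter values are arbitrary.

```r
# Hypothetical helper: the normal density written out from the formula
normal_pdf <- function(x, mu, sigma) {
  1 / (sqrt(2 * pi) * sigma) * exp(-0.5 * ((x - mu) / sigma)^2)
}

# Arbitrary test points and parameters (illustrative only)
x     <- seq(-3, 3, by = 0.5)
mu    <- 1
sigma <- 2

# Largest absolute difference between the hand-written formula and dnorm()
max_diff <- max(abs(normal_pdf(x, mu, sigma) - dnorm(x, mean = mu, sd = sigma)))
max_diff
```

The difference should be at the level of floating-point rounding, which suggests that `dnorm()` evaluates the same density.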

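The value of $\mu$ determines the location of the bell curve along the x-axis: the curve is centered at $\mu$.

library(ggplot2)

# normal distribution plots with different means (sd = 0.25)
# fun = dnorm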
mu <- c(-2, -1, 1)

ggplot(data.frame(x=c(-4,4)), aes(x=x)) +
stat_function(fun=dnorm, args=list(mean = mu[1], sd = 0.25), 
        colour="black", size=1) +
stat_function(fun=dnorm, args=list(mean = mu[2], sd = 0.25), 
        colour="blue", size=1) +
stat_function(fun=dnorm, args=list(mean = mu[3], sd = 0.25), 
        colour="red", size=1) +
ggtitle("Normal Distribution of (mu=c(-2, -1, 1), sigma=0.25)")

The value of $\sigma$ determines whether the bell curve is tall and thin or short and squat, subject always to the condition that the total area under the curve be equal to 1.
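The trade-off between height and spread can be seen in numbers. The sketch below assumes the same illustrative values as the plot that follows (mean 6 and standard deviations 0.5, 1 and 2) and integrates each density over a range of 10 standard deviations on either side of the mean, where essentially all of the area lies.

```r
# Peak height and total area of normal densities with different sigma
mu  <- 6
sds <- c(0.5, 1, 2)

# Height of the curve at its center; from the formula this is 1 / (sigma * sqrt(2 * pi))
peak <- dnorm(mu, mean = mu, sd = sds)

# Area under each curve over mu +/- 10 sigma (numerical integration)
area <- sapply(sds, function(s) {
  integrate(dnorm, mu - 10 * s, mu + 10 * s, mean = mu, sd = s)$value
})

data.frame(sd = sds, peak = peak, area = area)
```

Doubling $\sigma$ halves the peak height while the area stays at 1, so a larger $\sigma$ gives a shorter, wider curve.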

library(ggplot2)

# normal distribution plots with different standard deviations (mean = 6)
# fun = dnorm
mu <- 6
sd <- c(0.5, 1, 2)

ggplot(data.frame(x=c(-2,14)), aes(x=x)) +
stat_function(fun=dnorm, args=list(mean = mu, sd = sd[1]), 
        colour="black", size=1) +
stat_function(fun=dnorm, args=list(mean = mu, sd = sd[2]), 
        colour="blue", size=1) +
stat_function(fun=dnorm, args=list(mean = mu, sd = sd[3]), 
        colour="red", size=1) +
ggtitle("Normal Distribution of (mean=6, sigma=c(0.5, 1, 2))")

The probability distribution corresponding to the density function for the bell curve with parameters $\mu$ and $\sigma$ is called the normal distribution with mean $\mu$ and standard deviation $\sigma$.

A continuous random variable whose probabilities are described by the normal distribution with mean $\mu$ and standard deviation $\sigma$ is called a normally distributed random variable, or a normal random variable for short, with mean $\mu$ and standard deviation $\sigma$: $X \sim N(\mu, \sigma)$.

The density curve for the normal distribution is symmetric about the mean.
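Symmetry about the mean can be checked numerically. The following is a minimal sketch with arbitrary illustrative parameters (`mu = 10`, `sigma = 2`) and arbitrary distances `d` from the mean.

```r
# Symmetry of the normal density and of its tail areas about the mean
mu    <- 10                 # illustrative mean
sigma <- 2                  # illustrative standard deviation
d     <- c(0.5, 1, 2, 5)    # distances from the mean

# Density at equal distances to the left and to the right of the mean
left  <- dnorm(mu - d, mean = mu, sd = sigma)
right <- dnorm(mu + d, mean = mu, sd = sigma)
same_height <- isTRUE(all.equal(left, right))
same_height

# Area below mu - d equals area above mu + d
below <- pnorm(mu - d, mean = mu, sd = sigma)
above <- pnorm(mu + d, mean = mu, sd = sigma, lower.tail = FALSE)
same_tails <- isTRUE(all.equal(below, above))
same_tails

# Half of the total area lies on each side of the mean
half <- pnorm(mu, mean = mu, sd = sigma)
half
```

The last value, 0.5, is the fact that makes probabilities of the form $P(X > \mu)$ immediate for any normal random variable.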

EXAMPLE 3. Heights of 25-year-old men in a certain region have mean 69.75 inches and standard deviation 2.59 inches. These heights are approximately normally distributed. Thus the height X of a randomly selected 25-year-old man is a normal random variable with mean μ = 69.75 and standard deviation σ = 2.59. Sketch a qualitatively accurate graph of the density function for X. Find the probability that a randomly selected 25-year-old man is more than 69.75 inches tall.

[ Solution ]

Since the total area under the curve is 1, by symmetry the area to the right of 69.75 is half the total, or 0.5. But this area is precisely the probability $P(X > 69.75)$, the probability that a randomly selected 25-year-old man is more than 69.75 inches tall.

library(Rstat)

# 1. Compute P(X > 69.75), the upper-tail area asked for in the example
pnorm(69.75, mean=69.75, sd=2.59, lower.tail=FALSE)

# 2. Plot
norm.trans(69.75, 2.59, a=0, b=69.75)

We will learn how to compute other probabilities in the next two sections.

4-1. Density Computation & Plotting using R

  • Density computation : dnorm(x, mean = , sd = )

  • Plotting : plot(x, dnorm())

mean $\mu$ = 69.75, standard deviation $\sigma$ = 2.59.

# Normal distribution plot, X~N(69.75, 2.59)
mu <- 69.75
sigma <- 2.59
x <- seq((mu - 6 * sigma), (mu + 6 * sigma), length=200) # x-axis values
dnorm(x, mean=mu, sd=sigma)   # density values at each x (not random numbers)

main_title <- paste("Normal Distribution, X ~ N(", mu,",",sigma,")", sep="")
ylab_title <- paste("dnorm(x, mean = ", mu, ", sd = ", sigma,")", sep="")

plot(x, dnorm(x, mean=mu, sd=sigma), 
        type='l', 
        main=main_title,
        ylab=ylab_title)
abline(v=mu, col="yellow")
abline(h=0, col="gray")

Keywords: probability density function, uniform probability distribution, normal probability distribution