2-3. Measures of Variability (Statistical Dispersion)
Statistics for the degree of dispersion
Consider the following two data sets. A dot plot displays each data set as points.
Data Set 1: 40, 38, 42, 40, 39, 39, 43, 40, 39, 40
Data Set 2: 46, 37, 40, 33, 42, 36, 40, 47, 34, 45
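library(ggplot2)
# The two data sets as numeric vectors
x1 <- c(40, 38, 42, 40, 39, 39, 43, 40, 39, 40)
x2 <- c(46, 37, 40, 33, 42, 36, 40, 47, 34, 45)
# geom_dotplot() needs a data frame; each column holds one data set
x <- data.frame(x1, x2)
# One dot plot per data set (the binwidth argument can be set to adjust dot size)
ggplot(x, aes(x = x1)) + geom_dotplot()
ggplot(x, aes(x = x2)) + geom_dotplot()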
Reference : https://ggplot2.tidyverse.org/reference/geom_dotplot.html
1. The Range
The range of a data set is the number R defined by the formula
$$R = x_{\max} - x_{\min}$$
where $x_{\max}$ is the largest measurement in the data set and $x_{\min}$ is the smallest.
EXAMPLE 23. Find the range of each of the two data sets above.
[Solution]
x1 <- c(40, 38, 42, 40, 39, 39, 43, 40, 39, 40)
x2 <- c(46, 37, 40, 33, 42, 36, 40, 47, 34, 45)
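# Range of x1
# 1) Directly from the definition: largest value minus smallest value
range_x1 <- max(x1) - min(x1); range_x1
# 2) With range(), which returns c(min, max); diff() then gives max - min
range(x1)
diff(range(x1))
# Range of x2
# 1) Directly from the definition
range_x2 <- max(x2) - min(x2); range_x2
# 2) With range() and diff()
range(x2)
diff(range(x2))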
Note : The R function range() returns the minimum value and the maximum value of the data set as a vector of length two, not the range R itself. Applying diff() to that result gives R.
2. The Variance and The Standard Deviation
EXAMPLE 24. Find the sample variance and the sample standard deviation of Data Set 2 from the example above.
[Solution]
x <- c(46, 37, 40, 33, 42, 36, 40, 47, 34, 45)
# 1. Variance
n <- length(x); n
y <- (x - mean(x)); y
var_x <- sum(y^2)/(n-1); var_x
# 2. R Function for Variance : var()
var(x)
# 3. R Function for Standard Deviation : sd()
sd(x)
Note : In R, var() returns the sample variance, i.e. the denominator used in the var() function is (n-1).
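The sample variance of a set of $n$ sample data is the number $s^2$ defined by the formula
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
which by algebra is equivalent to the formula
$$s^2 = \frac{\sum x^2 - \frac{1}{n}\left(\sum x\right)^2}{n - 1}$$
The sample standard deviation of a set of $n$ sample data is the square root of the sample variance, hence is the number $s$ given by the formulas
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{\sum x^2 - \frac{1}{n}\left(\sum x\right)^2}{n - 1}}$$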
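The definition of the sample variance (sum of squared deviations from the mean, divided by n-1) and its algebraically equivalent shortcut form can be checked against each other numerically. The following is a minimal sketch using Data Set 2. The variable names s2_def and s2_short are illustrative only and are not part of the original solutions.

```r
# Data Set 2
x <- c(46, 37, 40, 33, 42, 36, 40, 47, 34, 45)
n <- length(x)

# Definition: sum of squared deviations from the mean, divided by n - 1
s2_def <- sum((x - mean(x))^2) / (n - 1)

# Shortcut form: (sum of x^2 minus (sum of x)^2 / n), divided by n - 1
s2_short <- (sum(x^2) - sum(x)^2 / n) / (n - 1)

# Both forms agree with each other and with var(), up to floating-point error
all.equal(s2_def, s2_short)
all.equal(s2_def, var(x))

# The sample standard deviation is the square root of the sample variance
all.equal(sqrt(s2_def), sd(x))
```

The shortcut form avoids computing each deviation separately, which is convenient for hand calculation. In R, var() and sd() are the usual choice.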
EXAMPLE 25. The grade point averages of 10 randomly selected students are given below. Find the sample variance and the sample standard deviation.
1.90 3.00 2.53 3.71 2.12 1.76 2.71 1.39 4.00 3.33
[Solution]
x <- c(1.90, 3.00, 2.53, 3.71, 2.12, 1.76, 2.71, 1.39, 4.00, 3.33)
var(x)
sd(x)
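The population variance and population standard deviation of a set of $N$ population data are the numbers $\sigma^2$ and $\sigma$ defined by the formulas
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$
and
$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$
where $\mu$ is the population mean.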
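Base R has no dedicated function for the population variance, because var() and sd() always divide by n-1. The following is a minimal sketch that assumes, purely for illustration, that Data Set 2 is an entire population. The names pop_var and pop_sd are hypothetical.

```r
# Data Set 2, treated here as a complete population (an assumption for illustration)
x <- c(46, 37, 40, 33, 42, 36, 40, 47, 34, 45)
N <- length(x)
mu <- mean(x)

# Population variance: sum of squared deviations from mu, divided by N
pop_var <- sum((x - mu)^2) / N

# Population standard deviation: square root of the population variance
pop_sd <- sqrt(pop_var)
pop_var; pop_sd

# Equivalent route: rescale the sample variance from var() by (N - 1) / N
var(x) * (N - 1) / N
```

For these ten values the population variance is 224/10 = 22.4, slightly smaller than the sample variance 224/9 (about 24.89), because the divisor is N rather than n-1.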
[ Difference between Two Data Sets ]

See : Degree of Dispersion
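The difference between the two data sets can also be seen numerically. The sketch below summarizes Data Set 1 and Data Set 2 with the measures introduced so far. The helper name spread_summary is an illustrative assumption, not a function from any package.

```r
# The two data sets from the beginning of this section
x1 <- c(40, 38, 42, 40, 39, 39, 43, 40, 39, 40)
x2 <- c(46, 37, 40, 33, 42, 36, 40, 47, 34, 45)

# Hypothetical helper: mean, range, sample variance and sample sd of a vector
spread_summary <- function(x) {
  c(mean = mean(x), range = diff(range(x)), variance = var(x), sd = sd(x))
}

# One column per data set, one row per statistic
result <- sapply(list(data_set_1 = x1, data_set_2 = x2), spread_summary)
round(result, 2)
```

Both data sets have mean 40, but Data Set 2 has a range of 14 against 5 and a standard deviation of about 4.99 against about 1.49, so its values are considerably more spread out.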
3. Coefficient of Variation
In probability theory and statistics, the coefficient of variation (CV), also known as relative standard deviation (RSD), is a standardized measure of dispersion of a probability distribution or frequency distribution. It is often expressed as a percentage, and is defined as the ratio of the standard deviation $\sigma$ to the mean $\mu$ (or its absolute value, $|\mu|$):
$$CV = \frac{\sigma}{\mu}$$
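The coefficient of variation is used to describe the size of the variability relative to the mean.
It is expressed as the ratio of the standard deviation to the mean.
The larger the coefficient of variation, that is, the larger the standard deviation relative to the sample mean, the more spread out the data are.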
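Because the coefficient of variation is a ratio of two quantities in the same unit, it does not depend on the unit of measurement. The following is a minimal sketch using Data Set 1 and Data Set 2 from the beginning of this section. The helper cv_pct is a hypothetical name, and multiplying the data by 100 is only a stand-in for a change of units.

```r
# Data Set 1 and Data Set 2
x1 <- c(40, 38, 42, 40, 39, 39, 43, 40, 39, 40)
x2 <- c(46, 37, 40, 33, 42, 36, 40, 47, 34, 45)

# Hypothetical helper: sample CV expressed as a percentage
cv_pct <- function(x) 100 * sd(x) / mean(x)

cv_pct(x1)  # about 3.73 (percent)
cv_pct(x2)  # about 12.47 (percent)

# Rescaling the data changes sd and mean by the same factor,
# so the CV is unchanged
all.equal(cv_pct(x2), cv_pct(100 * x2))
```

Both data sets have the same mean, so the larger CV of Data Set 2 reflects only its larger standard deviation.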
EXAMPLE 26. Find the CV of the data in Example 25.
[ Solution ]
# install.packages("goeveg")
library(goeveg)
x <- c(1.90, 3.00, 2.53, 3.71, 2.12, 1.76, 2.71, 1.39, 4.00, 3.33)
# 1. Calculation of CV
cv_x <- sd(x) / mean(x); cv_x
# 2. R Function : cv() in 'goeveg' package
cv(x)
Statistics for the degree of dispersion
1) Range
2) Standard deviation
3) Variance
4) Coefficient of variation