2-4. Exercises

1. Basic Exercises

Ex 1. 다음과 같은 데이터 세트가 있다.

69 93 70 53 92 75 85 70 68 76 88 70 77 82 85 82 80 100 96 85

a. 82번째 percentile을 구하라.

b. 68번째 percentile을 구하라.

[Solution]

x <- c(69, 93, 70, 53, 92, 75, 85, 70, 68, 76, 88, 70, 77, 82, 85, 82, 80, 100, 96, 85)

quantile(x, c(.68, .82))

> quantile(x, c(.68, .82))
  68%   82% 
85.00 90.32

Ex 2. 다음과 같은 데이터 세트가 있다.

8.5 9.6 6.5 8.0 8.2 8.5 8.2 7.7 7.0 8.8 7.6 2.9 7.0 8.5 1.5 9.2 4.9 8.7 9.3 6.9

a. 6.5 값의 percentile 순위를을 구하라.

b. 7.7 값의 percentile 순위를을 구하라.

[Solution]

x <- c(8.5, 9.6, 6.5, 8.0, 8.2, 8.5, 8.2, 7.7, 7.0, 8.8, 7.6, 2.9, 7.0, 
       8.5, 1.5, 9.2, 4.9, 8.7, 9.3, 6.9)
# a.
length(x[x <= 6.5]) / length(x) * 100

# b.
length(x[x <= 7.7]) / length(x) * 100

> # b.
> length(x[x <= 6.5]) / length(x) * 100
[1] 20
> length(x[x <= 7.7]) / length(x) * 100
[1] 45

Ex 3. 다음과 같은 stem and leaf diagram으로 표현된 데이터 세트가 있다.


  The decimal point is 1 digit(s) to the right of the |

   3 | 99
   4 | 25688
   5 | 02334467789
   6 | 012223445777788
   7 | 000112445666777889
   8 | 011223457889
   9 | 111123
  10 | 00

a. 75의 percentile 순위를 구하라.

b. 57의 percentile 순위를 구하라.

[ Solution ]

x <- c(100, 100, 91, 91, 91, 91, 92, 93,
        80, 81, 81, 82, 82, 83, 84, 85, 87, 88, 88, 89,
        70, 70, 70, 71, 71, 72, 74, 74, 75, 76, 76, 76, 77, 77, 77, 78, 78, 79,
        60, 61, 62, 62, 62, 63, 64, 64, 65, 67, 67, 67, 67, 68, 68,
        50, 52, 53, 53, 54, 54, 56, 57, 57, 58, 59,
        42, 45, 46, 48, 48, 39, 39)
stem(x)  

# a.
length(x[x <= 75]) / length(x) * 100      

# b.
length(x[x <= 57]) / length(x) * 100

> # a.
> length(x[x <= 75]) / length(x) * 100      
[1] 59.15493
> 
> # b.
> length(x[x <= 57]) / length(x) * 100
[1] 22.53521

Ex 4. 데이터 세트의 90번째 percentile은 90%와 같은가? Why or Why not?

Ex 5. 데이터 세트의 29번째 percentile은 5이다.

a. 관측치의 몇 퍼센트가 5보다 더 작은가?

b. 관측치의 몇 퍼센트가 5보다 더 큰가?

Ex 6. 데이터 세트의 54번째 percentile이 98.6이다.

a. 관측치의 몇 퍼센트가 98.6 보다 더 작은가?

b. 관측치의 몇 퍼센트가 98.6 보다 더 큰가?

Ex 7. 한 데이터 세트의 29번쨰 percentile이 5이고, 79번째 percentile이 10이다. 5와 10 사이에는 관측치의 몇 퍼센트가 포함되는가?

Ex 8. 한 데이터 세트의 40번쨰 percentile이 125이고, 82번째 percentile이 158이다. 125와 158 사이에는 관측치의 몇 퍼센트가 포함되는가?

Ex 9. Ex. 2)의 데이터 세트에 대하여 five-number summary와 IQR을 구하라. 그리고 box plot을 작성하라.

Ex 10. Ex. 3)의 의stem and leaf diagram으로 표현된 데이터 세트에 대하여 five-number summary와 IQR을 구하라. 그리고 box plot을 작성하라.

Ex 11. 다음의 data frequency table로 표현된 데이터 세트의 five-number summary와 IQR을 구하라. 그리고 box plot을 작성하라.

x   1  2  5  8  9
f   5  2  3  6  4

Ex 12. 다음의 data frequency table로 표현된 데이터 세트의 five-number summary와 IQR을 구하라. 그리고 box plot을 작성하라.

x -5 -3 -2 -1  0  1  3  4  5
f  2  1  3  2  4  1  1  2  1

Ex 13. 다음의 표본 데이터 세트에 있는 각 측정치의 z-score를 구하라.

-5 6 2 -1 0

Ex 14. 다음의 표본 데이터 세트에 있는 각 측정치의 z-score를 구하라.

1.6 5.2 2.8 3.7 4.0

Ex 15. 다음의 data frequency table을 갖는 표본이 평균 3, 표준편차 2.71을 갖는다. 표본에 있는 각 수치의 z-score를 구하라.

x   1  2  7
f   1  2  1

Ex 16. 다음의 data frequency table을 갖는 표본이 평균 3, 표준편차 2.71을 갖는다. 표본에 있는 각 수치의 z-score를 구하라.

x   -1  0  1  4
f    1  1  3  1

Ex 17. 다음과 같은 모집단이 있다.

0   0   2   2

a. 모평균을 구하라( $\mu$ ).

b. 모분산을 구하라( $\sigma^2$ ).

c. 모 표준편차를을 구하라( $\sigma$ ).

d. 모집단 데이터 세트의 각 수치에 대한 z-score를 구하라.

Ex 18. 다음과 같은 모집단이 있다.

0.5   2.1   4.4   1.0

a. 모평균을 구하라( $\mu$ ).

b. 모분산을 구하라( $\sigma^2$ ).

c. 모 표준편차를 구하라( $\sigma$ ).

d. 모집단 데이터 세트의 각 수치에 대한 z-score를 구하라.

Ex 19. 표본 평균이 10, 표준편차가 3인 x의 z-score 값이 2이다. x를 구하라.

Ex 20. 표본 평균이 10, 표준편차가 3인 x의 z-score 값이 -1이다. x를 구하라.

Ex 21. 모 평균이 2.3, 표준편차가 1.3인 x의 z-score 값이 2이다. x를 구하라.

Ex 22. 모 평균이 2.3, 표준편차가 1.3인 x의 z-score 값이 -1.2이다. x를 구하라.

2. Applications Exercises

Ex 24. The weekly sales for the last 20 weeks in a kitchen appliance store for an electric automatic rice cooker are

20  15  14  14  18  15  19  12  13   9
15  17  16  16  18  19  15  15  16  15

Find the percentile rank of 15.
If the sample accurately reflects the population, then what percentage of weeks would an inventory of 15 rice cookers be adequate?

Ex 25. The table shows the number of vehicles owned in a survey of 52 households.

x   0   1   2   3   4   5   6   7
f   2  12  15  11   6   3   1   2

Find the percentile rank of 2.
If the sample accurately reflects the population, then what percentage of households have at most two vehicles?

Ex 26. For two months Cordelia records her daily commute time to work each day to the nearest minute and obtains the following data:

x  26  27  28  29  30  31  32
f   3   4  16  12   6   2   1

Cordelia is supposed to be at work at 8:00 a.m. but refuses to leave her house before 7:30 a.m.

Find the percentile rank of 30, the time she has to get to work.
Assuming that the sample accurately reflects the population of all of Cordelia’s commute times, use your answer to part (a) to predict the proportion of the work days she is late for work.

Ex 27. The mean score on a standardized grammar exam is 49.6; the standard deviation is 1.35. Dromio is told that the z-score of his exam score is −1.19.

Is Dromio’s score above average or below average?
What was Dromio’s actual score on the exam?

Ex 28. A random sample of 49 invoices for repairs at an automotive body shop is taken. The data are arrayed in the stem and leaf diagram shown. (Stems are thousands of dollars, leaves are hundreds, so that for example the largest observation is 3,800.)

3 |  5 6 8
3 |  0 0 1 1 2 4
2 |  5 6 6 7 7 8 8 9 9
2 |  0 0 0 0 1 2 2 4
1 |  5 5 5 6 6 7 7  7 8 8 9
1 |  0 0 1 3 4 4 4
0 |  5 6 8 8
0 |  4

For these data, $Σx=101,100$ , $Σx^2=244,830,000$ .

Find the z-score of the repair that cost $1,100.
Find the z-score of the repairs that cost $2,700.

Ex 29. The stem and leaf diagram shows the time in seconds that callers to a telephone-order center were on hold before their call was taken.

0 |  0 0 0 0 0 0 1 1 1 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 3 3 4 4 4 4 4
0 |  5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6 6 6 7 7 7 7 7 7 8 8 8 9 9
1 |  0 0 1 1 1 1 2 2 2 2 4 4
1 |  5 6 6 8 9
2 |  2 4
2 |  5
3 |  0

Find the quartiles.
Give the five-number summary of the data.
Find the range and the IQR.

3. Additional Exercises

Ex 29. stem and leaf diagram

a. quartiles

b. five-number summary

c. range and IQR

Ex 30. stem and leaf diagram

a. percentile rank of 800

b. percentile rank of 3,200

Ex 31. Frequency table -> five-number summary

Ex 32. Frequency table -> five-number summary

Ex 33. stem and leaf diagram

a. three quartiles

b. IQR

c. five-number summary

Ex 34. Determine whether the following statement is true. “In any data set, if an observation x1 is greater than another observation x2, then the z-score of x1 is greater than the z-score of x2.”

Ex 35. z-score : 비교 of the z-score of x1 and the z-score of x2.

Ex 36. z-score

Ex 37. z-score : 비교 of the z-score of x1 and the z-score of x2.

4. Large Data Set Exercises

Ex 38. Large Data Set 1 lists the SAT scores and GPAs of 1,000 students.

Compute the three quartiles and the interquartile range of the 1,000 SAT scores.
Compute the three quartiles and the interquartile range of the 1,000 GPAs.

Ex 39. Large Data Set 10 records the scores of 72 students on a statistics exam.

Compute the five-number summary of the data.
Describe in words the performance of the class on the exam in the light of the result in part (a).

Ex 40. Large Data Sets 3 and 3A list the heights of 174 customers entering a shoe store.

Compute the five-number summary of the heights, without regard to gender.
Compute the five-number summary of the heights of the men in the sample.
Compute the five-number summary of the heights of the women in the sample.

[ Solution ]

# 1.
library(readxl)
data <- read_excel("data3.xls")
str(data)

height <- data[[3]]
quantile(height)

# 2.
data <- read_excel("data3A.xls")
str(data)

data_M <- subset(data, M == 1) 
head(data_M)
height_M <- data_M[[3]] ; height_M
quantile(height_M)

# 3.
data_F <- subset(data, F == 1) 
head(data_F)
height_F <- data_F[[3]] ; height_F
quantile(height_F)

> # 1.
> str(data)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   174 obs. of  6 variables:
 $ ID        : num  1 2 3 4 5 6 7 8 9 10 ...
 $ Shoe Size : num  12.5 11 10.5 10 9 8 10.5 9 5 10 ...
 $ Height(cm): num  186 181 174 170 171 161 181 176 157 174 ...
 $ Gender    : chr  "M" "M" "F" "F" ...
 $ M         : num  1 1 0 0 0 0 1 1 0 0 ...
 $ F         : num  0 0 1 1 1 1 0 0 1 1 ...
> 
> height <- data[[3]]
> quantile(height)
    0%    25%    50%    75%   100% 
154.00 166.25 174.00 178.00 194.00

> # 2.
> data <- read_excel("data3A.xls")
> str(data)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   174 obs. of  6 variables:
 $ ID        : num  1 2 3 4 5 6 7 8 9 10 ...
 $ Shoe Size : num  12.5 11 10.5 10 9 8 10.5 9 5 10 ...
 $ Height(cm): num  186 181 174 170 171 161 181 176 157 174 ...
 $ Gender    : chr  "M" "M" "F" "F" ...
 $ M         : num  1 1 0 0 0 0 1 1 0 0 ...
 $ F         : num  0 0 1 1 1 1 0 0 1 1 ...
> 
> data_M <- subset(data, M == 1) 
> head(data_M)
# A tibble: 6 x 6
     ID `Shoe Size` `Height(cm)` Gender     M     F
  <dbl>       <dbl>        <dbl> <chr>  <dbl> <dbl>
1     1        12.5          186 M          1     0
2     2        11            181 M          1     0
3     7        10.5          181 M          1     0
4     8         9            176 M          1     0
5    13        14            191 M          1     0
6    14        10            174 M          1     0
> height_M <- data_M[[3]] ; height_M
 [1] 186 181 181 176 191 174 166 176 177 176 182 168 187 177 178 185 177 191 173 180 178 185 177 188 178 177 178
[28] 193 173 162 186 174 190 161 176 176 167 176 180 181 163 180 176 190 188 176 179 176 181 179 164 180 179 186
[55] 189 164 169 174 185 164 175 176 194 177 186 170 184 177 187 179 184 188 178 170 173 184 180 178 175 180 179
[82] 176 174 178 176 176 168 169 167 190 183 174 170 179 169 176
> quantile(height_M)
    0%    25%    50%    75%   100% 
161.00 174.00 177.50 182.25 194.00

> # 3.
> data_F <- subset(data, F == 1) 
> head(data_F)
# A tibble: 6 x 6
     ID `Shoe Size` `Height(cm)` Gender     M     F
  <dbl>       <dbl>        <dbl> <chr>  <dbl> <dbl>
1     3        10.5          174 F          0     1
2     4        10            170 F          0     1
3     5         9            171 F          0     1
4     6         8            161 F          0     1
5     9         5            157 F          0     1
6    10        10            174 F          0     1
> height_F <- data_F[[3]] ; height_F
 [1] 174 170 171 161 157 174 170 157 168 165 166 171 164 166 165 172 156 167 168 167 163 158 165 162 167 167 171
[28] 168 174 164 165 166 159 168 168 157 164 165 174 169 172 167 158 167 179 177 176 165 176 174 156 158 166 154
[55] 172 163 159 174 161 164 170 171 167 169 167 172 166 169 174 174 163 161 162 159 167 171 157 160
> quantile(height_F)
  0%  25%  50%  75% 100% 
 154  163  167  171  179

Ex 41. Large Data Sets 7, 7A, and 7B list the survival times in days of 140 laboratory mice with thymic leukemia from onset to death.

Compute the three quartiles and the interquartile range of the survival times for all mice, without regard to gender.
Compute the three quartiles and the interquartile range of the survival times for the 65 male mice (recorded as gender = "M" in Large Data Set 7).
Compute the three quartiles and the interquartile range of the survival times for the 75 female mice (recorded as gender = "F" in Large Data Set 7B).

Previous2-4. Relative Position of Data Next2-5. The Empirical Rule and Chebyshev’s Theorem

Last updated 5 years ago

Was this helpful?