Tutorial 3: Descriptive Statistics and Data Visualization.
About the Data
- Data on season, day of the week, temperature, humidity, wind speed, weather, number of users, etc.
Exercise 1: Summary Statistics.
Solve:
- 1.1 Average number of total users
[1] 189.4631
- 1.2 Average temperature (in F)
[1] 58.77751
- 1.3 Median Humidity
[1] 63
- 1.4 Variance of the Number of Registered Users
[1] 22909.03
- 1.5 Standard Deviation of the Number of Casual Users
[1] 49.30503
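The statistics in Exercise 1 map onto base R functions. The sketch below is only an illustration: `bike_toy` is a hypothetical stand-in for the bike data, its values are invented, and it assumes total users equal casual plus registered users. The column names are borrowed from the correlation matrix in Exercise 4.

```r
# Toy stand-in for the bike data; the values are invented for illustration only.
bike_toy <- data.frame(
  Temperature.F    = c(50, 60, 70, 80),
  Humidity         = c(40, 55, 63, 90),
  Casual.Users     = c(2, 10, 30, 50),
  Registered.Users = c(20, 80, 120, 160)
)
# Assumption: total users = casual users + registered users
bike_toy$Total.Users <- bike_toy$Casual.Users + bike_toy$Registered.Users

mean_total <- mean(bike_toy$Total.Users)       # arithmetic mean
median_hum <- median(bike_toy$Humidity)        # middle value of the sorted data
var_reg    <- var(bike_toy$Registered.Users)   # sample variance (n - 1 denominator)
sd_casual  <- sd(bike_toy$Casual.Users)        # sample standard deviation

print(c(mean_total, median_hum, var_reg, sd_casual))
```

Note that `var()` and `sd()` use the sample (n - 1) denominator, so `sd(x)` is the square root of `var(x)`.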
Exercise 2: Visualizing the Data Distribution
Instructions:
- to include a title: main
- to change the color of lines: col
- to change the labels of the y and x axes: ylab and xlab
- to change the limits of the y and x axes: ylim and xlim
- 2.1 Density plot of Humidity
- 2.2 Density plot of Temperature, including the mean and the median
- 2.3 Histogram of Wind Speed
- 2.4 Boxplot of Casual Users, including a line at the mean
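The plotting arguments listed in the instructions can be tried on simulated data before applying them to the bike data. In this sketch, `x` is a hypothetical simulated variable, and the seed, colors, and axis limits are arbitrary choices rather than values taken from the tutorial.

```r
set.seed(1)                           # arbitrary seed so the sketch is reproducible
x <- rnorm(200, mean = 60, sd = 15)   # simulated stand-in for a numeric column

# Density plot with a title, axis labels, a line color, and x-axis limits
plot(density(x), main = "Density of x", xlab = "x", ylab = "Density",
     col = "blue", xlim = c(0, 120))
abline(v = mean(x), col = "red")                     # vertical line at the mean
abline(v = median(x), col = "darkgreen", lty = 2)    # dashed line at the median

# Histogram with a title and an x-axis label
hist(x, main = "Histogram of x", xlab = "x", col = "grey")

# Boxplot (vertical by default), so the mean is drawn as a horizontal line
boxplot(x, main = "Boxplot of x", ylab = "x")
abline(h = mean(x), col = "red")
```

`abline(v = ...)` adds a vertical line and `abline(h = ...)` a horizontal one, which is why the mean is drawn with `v` on the density plot and with `h` on the boxplot.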
Exercise 3: Normal Distribution
- 3.1 Generate a normal distribution with 1000 observations, mean = 20 and sd = 3, and then plot its density (run set.seed(320) first)
- 3.2 Present the histogram of this normal distribution
- 3.3 Present the boxplot of this normal distribution
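One possible way to approach Exercise 3 is sketched below. The name `normal_sample` and the plot titles are illustrative choices, not names prescribed by the tutorial.

```r
set.seed(320)                                     # seed given in the exercise
normal_sample <- rnorm(1000, mean = 20, sd = 3)   # 1000 draws from N(20, 3^2)

# 3.1 Density plot
plot(density(normal_sample), main = "Density of the simulated sample",
     xlab = "Value")

# 3.2 Histogram
hist(normal_sample, main = "Histogram of the simulated sample",
     xlab = "Value")

# 3.3 Boxplot
boxplot(normal_sample, main = "Boxplot of the simulated sample",
        ylab = "Value")
```

Calling `set.seed()` before `rnorm()` makes the random draws reproducible, so repeated runs give the same plots.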
Exercise 4: Covariance and Correlation
- 4.1 Covariance between temperature and Total Number of Users
[1] 1220.347
- 4.2 Correlation between “Feels Like” Temperature and Total Number of Users
[1] 0.4009377
- 4.3 Correlation Matrix of bike data
Season Hour Holiday Day.of.the.Week
Season 1.000000000 0.004931139 0.055947939 -0.003163450
Hour 0.004931139 1.000000000 0.000479136 -0.003497739
Holiday 0.055947939 0.000479136 1.000000000 -0.102087791
Day.of.the.Week -0.003163450 -0.003497739 -0.102087791 1.000000000
Working.Day -0.036158734 0.002284998 -0.252471370 0.035955071
Weather.Type 0.040452288 -0.020202528 -0.017036113 0.003310740
Temperature.F -0.470806327 0.137625946 -0.027356343 -0.001805613
Temperature.Feels.F -0.469271254 0.133758276 -0.030974740 -0.008817003
Humidity 0.014750149 -0.276497828 -0.010588465 -0.037158268
Wind.Speed -0.038741686 0.137253208 0.003984692 0.011504125
Casual.Users -0.227260165 0.301201730 0.031563628 0.032721415
Registered.Users -0.099585576 0.374140710 -0.047345424 0.021577888
Total.Users -0.144872483 0.394071498 -0.030927303 0.026899860
Working.Day Weather.Type Temperature.F Temperature.Feels.F
Season -0.036158734 0.04045229 -0.470806327 -0.469271254
Hour 0.002284998 -0.02020253 0.137625946 0.133758276
Holiday -0.252471370 -0.01703611 -0.027356343 -0.030974740
Day.of.the.Week 0.035955071 0.00331074 -0.001805613 -0.008817003
Working.Day 1.000000000 0.04467222 0.055396228 0.054665178
Weather.Type 0.044672224 1.00000000 -0.102600649 -0.105570718
Temperature.F 0.055396228 -0.10260065 1.000000000 0.987677449
Temperature.Feels.F 0.054665178 -0.10557072 0.987677449 1.000000000
Humidity 0.015687512 0.41813033 -0.069889709 -0.051935510
Wind.Speed -0.011831470 0.02622604 -0.023115427 -0.062325722
Casual.Users -0.300942486 -0.15262788 0.459626269 0.454088895
Registered.Users 0.134325791 -0.12096552 0.335373166 0.332565807
Total.Users 0.030284368 -0.14242614 0.404785441 0.400937689
Humidity Wind.Speed Casual.Users Registered.Users
Season 0.01475015 -0.038741686 -0.22726017 -0.09958558
Hour -0.27649783 0.137253208 0.30120173 0.37414071
Holiday -0.01058846 0.003984692 0.03156363 -0.04734542
Day.of.the.Week -0.03715827 0.011504125 0.03272142 0.02157789
Working.Day 0.01568751 -0.011831470 -0.30094249 0.13432579
Weather.Type 0.41813033 0.026226043 -0.15262788 -0.12096552
Temperature.F -0.06988971 -0.023115427 0.45962627 0.33537317
Temperature.Feels.F -0.05193551 -0.062325722 0.45408890 0.33256581
Humidity 1.00000000 -0.290108894 -0.34702809 -0.27393312
Wind.Speed -0.29010889 1.000000000 0.09029235 0.08232535
Casual.Users -0.34702809 0.090292353 1.00000000 0.50661770
Registered.Users -0.27393312 0.082325350 0.50661770 1.00000000
Total.Users -0.32291074 0.093239057 0.69456408 0.97215073
Total.Users
Season -0.14487248
Hour 0.39407150
Holiday -0.03092730
Day.of.the.Week 0.02689986
Working.Day 0.03028437
Weather.Type -0.14242614
Temperature.F 0.40478544
Temperature.Feels.F 0.40093769
Humidity -0.32291074
Wind.Speed 0.09323906
Casual.Users 0.69456408
Registered.Users 0.97215073
Total.Users 1.00000000
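Covariance, correlation, and a correlation matrix can be computed with `cov()` and `cor()`. The sketch below uses a hypothetical toy data frame, `bike_toy`, with invented values. It also checks that the correlation equals the covariance divided by the product of the two standard deviations.

```r
# Toy stand-in for the bike data; the values are invented for illustration only.
bike_toy <- data.frame(
  Temperature.F = c(50, 60, 70, 80, 90),
  Humidity      = c(80, 70, 65, 50, 45),
  Total.Users   = c(40, 90, 120, 200, 180)
)

# Covariance depends on the units of both variables
cov_tu <- cov(bike_toy$Temperature.F, bike_toy$Total.Users)

# Correlation is unit-free and lies between -1 and 1
cor_tu <- cor(bike_toy$Temperature.F, bike_toy$Total.Users)

# Correlation = covariance / (sd of x * sd of y)
cor_manual <- cov_tu / (sd(bike_toy$Temperature.F) * sd(bike_toy$Total.Users))

# Correlation matrix of all columns (every column must be numeric)
cor_matrix <- cor(bike_toy)
print(cor_matrix)
```

The matrix is symmetric with ones on the diagonal, which is why each pairwise value in the output above appears twice.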
Exercise 5: Visualizing Two Variables
- 5.1 Plot Temperature on the x-axis and Total Number of Users on the y-axis
- 5.2 Plot “Feels Like” Temperature on the x-axis and Number of Casual Users on the y-axis. Include a line that indicates the correlation between these two variables (i.e., use the correlation as the slope of this line)
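A scatter plot with an added line can be sketched with `plot()` and `abline()`. The data below are simulated, and the names `feels_like` and `casual` are hypothetical. The exercise does not state an intercept for the line, so the intercept of 0 used here is an assumption.

```r
set.seed(42)   # arbitrary seed so the sketch is reproducible
# Simulated stand-ins for "Feels Like" temperature and casual users
feels_like <- runif(100, min = 30, max = 100)
casual     <- pmax(0, round(0.8 * feels_like - 20 + rnorm(100, sd = 15)))

r <- cor(feels_like, casual)   # correlation between the two variables

plot(feels_like, casual,
     main = "Casual users vs. feels-like temperature",
     xlab = "Feels-like temperature (F)", ylab = "Casual users")

# Line with slope equal to the correlation; intercept 0 is an assumption
abline(a = 0, b = r, col = "red")
```

In `abline(a, b)`, `a` is the intercept and `b` is the slope.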
Exercise 6: Applied Questions.
- 6.1 Get the correlation between Temperature and “Feels Like” Temperature and interpret its value
[1] 0.9876774
- 6.2 Density plot of Total Users, including the mean and the median. Is this variable skewed? If so, is it left or right skewed? Is it possible to know this just by looking at the mean and median? How?
- 6.3 Generate a normal distribution with 10 observations, mean = 0 and sd = 1 (use set.seed(99)). Does this distribution look normal? What would you have to do to make it look more like a normal distribution?
- 6.4 Provide the boxplot of Registered Users. What can you infer from this boxplot?
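Two ideas behind questions 6.2 and 6.3 can be explored on simulated data. The sketch below makes several assumptions: `skewed` is a hypothetical right-skewed sample drawn from an exponential distribution, and the larger sample size of 10000 is an arbitrary choice for comparison.

```r
set.seed(99)   # seed given in question 6.3

# A very small and a much larger sample from the same N(0, 1) distribution
small_sample <- rnorm(10, mean = 0, sd = 1)
large_sample <- rnorm(10000, mean = 0, sd = 1)   # hypothetical comparison size

par(mfrow = c(1, 2))                        # two plots side by side
plot(density(small_sample), main = "n = 10")
plot(density(large_sample), main = "n = 10000")
par(mfrow = c(1, 1))                        # restore the default layout

# For a right-skewed sample the long right tail pulls the mean above the median
skewed <- rexp(1000, rate = 1)
print(c(mean = mean(skewed), median = median(skewed)))
```

Comparing the two density plots shows how much the shape of a sample can depart from the bell curve when only a few observations are available.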