# R for Statistics and Data Visualization

# 1. Lecture Time and Location

- Time: 08:00 - 11:00 every Friday, starting Sep 21.
- Location: 206, Teaching Building 2.

# 2. Prerequisites

Students of this course should be familiar with basic statistical concepts such as *Mean*, *Standard Deviation*, *Normal Distribution*, *t-statistic*, *ANOVA*, *F-ratio*, *p-value*, and *Hypothesis Testing*. To achieve this, students should already have completed an introductory course in statistics, such as Statistics for the Behavioral Sciences or another course at the same level. Students can also learn these concepts through self-study.

# 3. Course Information

R is a language and environment for **statistical computing** and **graphics**. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories by John Chambers and colleagues. R provides a wide variety of statistical (linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, …) and graphical techniques, and is highly extensible. One of R’s strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed.

Linear models, their variants, and extensions are among the most useful and widely used statistical tools for social research. This course aims to provide an accessible, in-depth, modern treatment of *regression analysis*, *linear models*, *generalized linear models*, and closely related methods.
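As a small taste of the methods named above, the following sketch fits a linear model and a generalized linear model in base R. The data sets (`cars`, `mtcars`) ship with R and are chosen here only for illustration; they are not necessarily the data used in the lectures.

```r
# Linear model: stopping distance as a function of speed (built-in `cars` data).
fit_lm <- lm(dist ~ speed, data = cars)
summary(fit_lm)

# Generalized linear model: logistic regression of transmission type (am)
# on car weight (wt), using the built-in `mtcars` data.
fit_glm <- glm(am ~ wt, data = mtcars, family = binomial)
coef(fit_glm)

# Scatter plot of the data with the fitted least-squares line.
plot(dist ~ speed, data = cars)
abline(fit_lm)
```

`summary()` reports the coefficient estimates with their standard errors, t-statistics, and p-values, which is where the prerequisite concepts listed in Section 2 come into play.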

# 4. Syllabus and Lecture Notes

- Part 0: Introduction
- Part I: Data craft
- Part II: Linear models and least squares
  - Linear least-squares regression
  - Statistical Inference for Regression
  - Dummy-variable regression
  - Analysis of variance
- Part III: Linear-model diagnostics
  - Unusual and influential data
  - Non-Normality, Nonconstant Error Variance, and Nonlinearity
  - Collinearity and Its Purported Remedies
- Part IV: Generalized linear models
  - Logit and Probit Models for Categorical Response Variables
  - Generalized Linear Models
- Part V: Extending Linear and Generalized Linear Models
  - Time-Series Regression and Generalized Least Squares
  - Nonlinear Regression
  - Nonparametric Regression
  - Bootstrapping Regression Models
- Part VI: Mixed-Effects Models
  - Linear Mixed-Effects Models
  - Generalized Linear and Nonlinear Mixed-Effects Models

# 5. References

- R manuals
- R language in general
  - Becker, R. A., Chambers, J. M., & Wilks, A. R. (1988). *The New S Language: A Programming Environment for Data Analysis and Statistics*. Pacific Grove, CA: Wadsworth. (The **Blue** book)
  - Chambers, J. M. (1998). *Programming with Data: A Guide to the S Language*. New York, NY: Springer. (The **Green** book)
  - Chambers, J. M. (2016). *Extending R*. Boca Raton, FL: CRC Press.
  - Wickham, H. (2015). *Advanced R*. Boca Raton, FL: CRC Press.
  - Wickham, H. (In Progress). *Advanced R* (2 ed.). Boca Raton, FL: CRC Press.
  - Wickham, H. (2016). *R Packages: Organize, Test, Document, and Share Your Code*. Sebastopol, CA: O’Reilly Media.
  - Wickham, H., & Grolemund, G. (2016). *R for Data Science: Import, Tidy, Transform, Visualize, and Model Data*. Sebastopol, CA: O’Reilly Media.
- R graphics
  - Murrell, P. (2011). *R Graphics* (2 ed.). Boca Raton, FL: CRC Press.
  - Kassambara, A. (2015). *Complete Guide to 3D Plots in R*. STHDA.
  - Sarkar, D. (2008). *Lattice: Multivariate Data Visualization with R*. New York, NY: Springer.
  - Wickham, H. (2016). *ggplot2: Elegant Graphics for Data Analysis* (2 ed.). New York, NY: Springer.
- Statistical models
  - Chambers, J. M., & Hastie, T. J. (Eds.). (1993). *Statistical Models in S*. London, UK: Chapman & Hall. (The **White** book)
  - **Fox, J. (2016).** *Applied Regression Analysis and Generalized Linear Models* (3 ed.). Thousand Oaks, CA: SAGE.
  - Hastie, T. J., & Tibshirani, R. J. (1990). *Generalized Additive Models*. London, UK: Chapman and Hall.
  - Venables, W. N., & Ripley, B. D. (2002). *Modern Applied Statistics with S*. Springer.
  - Wood, S. N. (2006). *Generalized Additive Models: An Introduction with R*. CRC Press.
  - Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting Linear Mixed-Effects Models Using lme4. *Journal of Statistical Software, 67*(1). doi:10.18637/jss.v067.i01
- Probability and statistics
  - Fieller, N. (2016). *Basics of Matrix Algebra for Statistics with R*. Boca Raton, FL: CRC Press.
  - Poole, D. (2015). *Linear Algebra: A Modern Introduction* (4 ed.). Stamford, CT: Cengage Learning.
  - Ugarte, M. D., Militino, A. F., & Arnholt, A. T. (2016). *Probability and Statistics with R* (2 ed.). Boca Raton, FL: CRC Press.
  - Stewart, J. (2016). *Calculus: Early Transcendentals* (8 ed.). Boston, MA: Cengage Learning.
- Dynamic documenting
  - Xie, Y. (2014). *Dynamic Documents with R and knitr* (2 ed.). Boca Raton, FL: CRC Press.
  - Xie, Y., Allaire, J. J., & Grolemund, G. (2018). *R Markdown: The Definitive Guide*. Boca Raton, FL: CRC Press.

# 6. Tools

# 7. Final Examination

- To successfully complete the examination, your computer should already have *R* and *RStudio* installed. One extra package, *car*, should also be installed before you attend the examination.
- The examination paper and the data set used in the final examination are available from 2018-01-11, 09:00 to 2018-01-15, 09:00.
- Download the two files from the following links: the examination paper file (*TBA*) and the data set file (*TBA*).
- Rename the `Rmd` file to the format `SurnameGivenname_Student number.Rmd` in Pinyin, such as `ZhangSan_20170708.Rmd`.
- Open the renamed `.Rmd` file with `RStudio`.
- Change the `Name-Number` region in the front matter of the `Rmd` file to your own name in Chinese characters and your student number, such as `张三 - 20170708`.
- Write your *R* code inside the code chunks, i.e. in the XXX area of *···{r} XXX ···* (each `·` stands for a backtick). Write answers that do not include R code outside these chunks. For more information on the syntax of *Markdown* and *R Markdown*, please download the Markdown cheatsheet and the R Markdown cheatsheet.
- Question **one** is mandatory. The remaining five questions are optional; choose exactly **three** of questions 2-6 to answer.
- After finishing all your answers, click the `knit to HTML` button in the `knit` drop-down menu of `RStudio`.
- Send both the completed `.Rmd` file and the knitted `.html` file to `zhanlikan@blcu.edu.cn` before **2018-01-15, 09:00**.
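
The preparation steps above can be checked from the R console before the examination. The following is only a sketch: `exam_file_name()` is a hypothetical helper written for this example, and listing *rmarkdown* as a required package is an assumption (RStudio normally offers to install it the first time you knit).

```r
# Packages to check; "car" is required by the examination,
# "rmarkdown" is an assumption (it is what RStudio uses for knitting).
required <- c("car", "rmarkdown")
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
not_installed <- required[!installed]
if (length(not_installed) > 0) {
  message("Install before the exam: ", paste(not_installed, collapse = ", "))
  # install.packages(not_installed)  # uncomment to install from CRAN
}

# Hypothetical helper: build the required file name from Pinyin name parts
# and the student number.
exam_file_name <- function(surname, given_name, student_number) {
  paste0(surname, given_name, "_", student_number, ".Rmd")
}
exam_file_name("Zhang", "San", "20170708")
# "ZhangSan_20170708.Rmd"

# Console alternative to the `knit to HTML` button (run after renaming the file):
# rmarkdown::render("ZhangSan_20170708.Rmd", output_format = "html_document")
```

Knitting once before the deadline is a quick way to confirm that every code chunk runs without errors on your machine.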