# R for Statistics and Data Visualization

# 1. Lecture Time and Location

- Time: 08:00 - 11:00, every Friday, starting Sep 21.
- Location: 206, Teaching Building 2.

# 2. Prerequisites

Students of this course should be familiar with basic statistical concepts such as *Mean*, *Standard Deviation*, *Normal Distribution*, *t-statistic*, *ANOVA*, *F-ratio*, *p-value*, *Hypothesis Testing*, etc. To achieve this, students should have already completed an introductory statistics course, such as Statistics for the Behavioral Sciences or another course at the same level. Students can also learn these concepts through self-study.

# 3. Course Information

R is a language and environment for **statistical computing** and **graphics**. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories by John Chambers and colleagues. R provides a wide variety of statistical (linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, …) and graphical techniques, and is highly extensible. One of R’s strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed.

Linear models, their variants, and extensions are among the most useful and widely used statistical tools for social research. This course aims to provide an accessible, in-depth, modern treatment of *regression analysis*, *linear models*, *generalized linear models*, and closely related methods.
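
As a rough illustration of the kinds of models named above, the following minimal sketch fits a linear model and a generalized linear model and draws a basic plot. It is not taken from the course materials: the choice of the built-in `mtcars` data set and of the variables `mpg`, `wt`, and `am` is an assumption made purely for illustration.

```r
# mtcars ships with base R, so no extra packages are needed.
data(mtcars)

# Linear model: fuel economy as a function of weight (least squares).
fit_lm <- lm(mpg ~ wt, data = mtcars)

# Generalized linear model: transmission type (0/1) via a logit link.
fit_glm <- glm(am ~ wt, data = mtcars, family = binomial(link = "logit"))

# Coefficient tables with standard errors, test statistics, and p-values.
summary(fit_lm)
summary(fit_glm)

# Scatter plot with the fitted least-squares line overlaid.
plot(mpg ~ wt, data = mtcars,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
abline(fit_lm, col = "red")
```

Both fits use the same formula interface, which is why moving from `lm` to `glm` mostly amounts to adding a `family` argument.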

# 4. Syllabus and Lecture Notes

- Part 0: Introduction
- Part I: Data craft
- Part II: Linear models and least squares
- Part III: Linear-model diagnostics
  - Unusual and influential data
  - Non-Normality, Nonconstant Error Variance, and Nonlinearity
  - Collinearity and Its Purported Remedies
- Part IV: Generalized linear models
  - Logit and Probit Models for Categorical Response Variables
  - Generalized Linear Models
- Part V: Extending Linear and Generalized Linear Models
  - Time-Series Regression and Generalized Least Squares
  - Nonlinear Regression
  - Nonparametric Regression
  - Bootstrapping Regression Models
- Part VI: Mixed-Effects Models

# 5. References

- R manuals
- R language in general
  - Becker, R. A., Chambers, J. M., & Wilks, A. R. (1988). *The New S Language: A Programming Environment for Data Analysis and Statistics*. Pacific Grove, CA: Wadsworth. (The **Blue** book)
  - Chambers, J. M. (1998). *Programming with data: A guide to the S language*. New York, NY: Springer. (The **Green** book)
  - Chambers, J. M. (2016). *Extending R*. Boca Raton, FL: CRC Press.
  - Wickham, H. (2015). *Advanced R*. Boca Raton, FL: CRC Press. Full text available online.
  - Wickham, H. (In Progress). *Advanced R* (2 ed.). Boca Raton, FL: CRC Press. Full text available online.
  - Wickham, H. (2016). *R Packages: Organize, Test, Document, and Share Your Code*. Sebastopol, CA: O’Reilly Media.
  - Wickham, H., & Grolemund, G. (2016). *R for Data Science: Import, Tidy, Transform, Visualize, and Model Data*. Sebastopol, CA: O’Reilly Media. Full text available online.
- R graphics
  - Murrell, P. (2011). *R Graphics* (2 ed.). Boca Raton, FL: CRC Press.
  - Kassambara, A. (2015). *Complete Guide to 3D Plots in R*. STHDA.
  - Sarkar, D. (2008). *Lattice: Multivariate Data Visualization with R*. New York, NY: Springer.
  - Wickham, H. (2016). *ggplot2: Elegant Graphics for Data Analysis* (2 ed.). New York, NY: Springer.
- Statistical models
  - Chambers, J. M., & Hastie, T. J. (Eds.). (1993). *Statistical Models in S*. London, UK: Chapman & Hall. (The **White** book)
  - **Fox, J. (2016).** *Applied Regression Analysis and Generalized Linear Models* (3 ed.). Thousand Oaks, CA: SAGE.
  - Hastie, T. J., & Tibshirani, R. J. (1990). *Generalized additive models*. London, UK: Chapman and Hall.
  - Venables, W. N., & Ripley, B. D. (2002). *Modern Applied Statistics with S*. Springer.
  - Wood, S. N. (2006). *Generalized Additive Models: An Introduction with R*. CRC Press.
  - Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting Linear Mixed-Effects Models Using lme4. *Journal of Statistical Software, 67*(1). doi:10.18637/jss.v067.i01
- Probability and statistics
  - Fieller, N. (2016). *Basics of matrix algebra for statistics with R*. Boca Raton, FL: CRC Press.
  - Poole, D. (2015). *Linear Algebra: A Modern Introduction* (4 ed.). Stamford, CT: Cengage Learning.
  - Ugarte, M. D., Militino, A. F., & Arnholt, A. T. (2016). *Probability and statistics with R* (2 ed.). Boca Raton, FL: CRC Press.
  - Stewart, J. (2016). *Calculus: Early transcendentals* (8 ed.). Boston, MA: Cengage Learning.
- Dynamic documenting
  - Xie, Y. (2014). *Dynamic Documents with R and knitr* (2 ed.). Boca Raton, FL: CRC Press.
  - Xie, Y., Allaire, J. J., & Grolemund, G. (2018). *R Markdown: The Definitive Guide*. Boca Raton, FL: CRC Press. Full text available online.

# 6. Final Examination

- To successfully complete the examination, your computer should already have the *R* software installed, along with at least two extra R packages: *rmarkdown* and *car*.
- Download the file from the following link: *Exam_Sample.rmd*. The link is available from 2018-01-11, 0900 to 2018-01-15, 0900.
- Rename the `Rmd` file to the format `SurnameGivenname_Student number.Rmd` in Pinyin, such as `ZhangSan_20170708.Rmd`.
- Open the renamed `.Rmd` file with `R` and change the `Name-Number` region in the front matter of the `Rmd` file to your own name in Chinese characters and your student number, such as `张三 - 20170708`.

```
---
title: "Exam Sample"
author: "张三-20181030" # <- Your name and student number
lastmod: "2018-12-14"
date: '2018-10-30'
output: html_document
---
You can write any text after this line.
```

- An Rmd document can include both normal text and valid R code. The *R* code should be enclosed in a code chunk, ```` ```{r} XXX ``` ````, i.e., the XXX region; normal text that does not include R code should stay outside those regions. For example,

````
a. You can add any text here.
```{r}
# You can add comments here
str(Titanic) # <- This should be valid R code.
```
b. You can also add any text here.
````

- After finishing all your answers, use R to render the Rmd file into an HTML file with the following call:

```
# Install rmarkdown first if it is not already installed
install.packages("rmarkdown", dependencies = TRUE)
# Render the renamed exam file to HTML
rmarkdown::render(
  "directory/to/your/file/ZhangSan_20170708.Rmd",
  output_format = "html_document"
)
```

- Send both the completed `.Rmd` file and the knitted `.html` file to the following email address: `zhanlikan@blcu.edu.cn` before **2018-01-15, 0900**.
- Question **one** is mandatory. The remaining five questions are optional: choose any three, and only **three**, of questions 2-6 to answer.
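
The preparation steps above can be checked with a short script before you start answering. This is a minimal sketch under stated assumptions, not part of the official exam instructions: the helper names `missing_packages` and `exam_file_name` are hypothetical, the name and student number are the placeholder values used in the examples above, and the renamed file is assumed to sit in the current working directory.

```r
# Hypothetical helper: report which required packages are not yet installed.
missing_packages <- function(pkgs = c("rmarkdown", "car")) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: build the submission file name in the required format,
# e.g. "ZhangSan_20170708.Rmd".
exam_file_name <- function(surname, given_name, student_number) {
  paste0(surname, given_name, "_", student_number, ".Rmd")
}

to_install <- missing_packages()
if (length(to_install) > 0) {
  message("Install these packages first: ", paste(to_install, collapse = ", "))
}

# Placeholder name and number; replace them with your own.
exam_file <- exam_file_name("Zhang", "San", "20170708")

# Render only when both packages are available and the renamed file exists.
if (length(to_install) == 0 && file.exists(exam_file)) {
  rmarkdown::render(exam_file, output_format = "html_document")
}
```

When rendering succeeds, `rmarkdown::render` writes the `.html` file next to the `.Rmd` file by default, which gives you the two files to send.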