12 Applications of R Markdown

Now that we’ve learned the basics of R and R Markdown, we’re going to tie some things together by doing a regression on the Gapminder data and displaying our results in a nicely formatted document.

Working in a new R Markdown document (File -> New File -> R Markdown Document), we can load the libraries that we are going to use, and load the data in a code chunk:

```{r load}
library(readr)
library(dplyr)
library(ggplot2)
gapminder 

For now, we are just going to use the data for Canada, so let’s insert a code chunk to filter our data to just Canada:

```{r filter-can}
canada 

We want to do a regression investigating the relationship between life expectancy and per-capita GDP in Canada.

We can start by inserting another code chunk to do a quick visualization of the data. We can insert the regression line on the points, but this doesn’t give us the regression coefficients, significance, etc. that we need to really evaluate the relationship.

```{r plot-1}
ggplot(canada, aes(x = gdpPercap, y = lifeExp)) + 
  geom_point() + 
  geom_smooth(method = "lm")
```

We use the lm() function to perform a General Linear Model (GLM), of which a regression is one form. It is specified by a new syntax, a “formula” syntax:

lm(response ~ predictor(s), data = data, ...)

So in our case, we can put in a new code chunk to fit the model:

```{r model}
canada_lm 

The summary() function gives us the high level information from our regression model: The coefficient estimates, the R-squared, and the p-value.

To this point we have been using ggplot2 for plotting, but there is also a built-in plot() function. If we use that function on a model, it gives us some basic diagnostic plots:

```{r diagnostics}
plot(canada_lm)
```

But these lm objects are ungainly - they are fine to look at to inspect the important parts of our model, but difficult to extract those pieces that we want to use later - such as regression coefficients and P values.

There is another package we can use, called broom, that is used to tidy model results into a format we are familiar with: a data.frame. broom works with many different types of models, today we are going to use it on our lm.

You will first need to install the package:

Then add a chunk that loads the package, and calls the most commonly-used functions in broom:

```{r broom}
library(broom)
model_summary 
  • glance gives us the model-level summary. The r-squared and adjusted r-squared, the F statistic, the P value, etc.
  • tidy gives us the information for each term in the regression - in our case just the intercept and the estimated effect of per-capita GDP on life expectancy.
  • augment gives us the estimated values of the response variable as predicted by our model.

Now that these are in a data frame, we can use them more easily. Let’s plot our regression line onto our plot:

```{r plot-lm}
ggplot() + 
  geom_point(data = canada, aes(x = gdpPercap, y = lifeExp)) + 
  geom_line(data = model_predict, aes(x = gdpPercap, y = .fitted))
```

Again, using the data frame containing our predicted values, we can calculate the 95% confidence limits, and plot them as well:

```{r plot-ci, echo=FALSE}
model_predict % 
  mutate(lower_ci = .fitted - 1.96 * .se.fit, 
         upper_ci = .fitted + 1.96 * .se.fit)

ggplot() + 
  geom_ribbon(data = model_predict, aes(x = gdpPercap, ymin = lower_ci, ymax = upper_ci), alpha = 0.2) + 
  geom_point(data = canada, aes(x = gdpPercap, y = lifeExp)) + 
  geom_line(data = model_predict, aes(x = gdpPercap, y = .fitted))
```

If we compare this plot to the auto-generated line we added at the beginning of the lesson (using geom_smooth()), we see they are the same.

We can also pull out some of those model terms and report on them “reproducibly”:

In our plot:

```{r model-stats, echo=FALSE}
r2  0.05")

ggplot() + 
  geom_ribbon(data = model_predict, aes(x = gdpPercap, ymin = lower_ci, ymax = upper_ci), alpha = 0.2) + 
  geom_point(data = canada, aes(x = gdpPercap, y = lifeExp)) + 
  geom_line(data = model_predict, aes(x = gdpPercap, y = .fitted)) + 
  geom_text(aes(x = 72, y = 30000), label = paste0("R^2 = ", r2, "; ", sig_0.05))
```

And in text:

```{r lifeExp-stats, echo=FALSE}
life_exp 

Then in our document we can type:

The predicted effect of per-capita GDP on life expectancy is `r life_exp_estimate` 
(SE: `r life_exp_se`), meaning that for every dollar increase in per-capita GDP
we live longer by approximately `r life_exp_esimate` years.

This helps reduce typos, copy-paste errors, and ensures that if our data change, so do our results.

Challenge 1

Create a copy of your report for Canada, and do it for the country of your choice.

Solution to Challenge 1

All you really need to do is change
"Canada"
in your
filter

statement to another country of your choosing.

Challenge 2

Remove or add chunks, and change the chunk options to only show the figures and results so the rendered report looks like a report you would submit.

Solution to Challenge 2

Possible chunk options to change:
results='hide'
echo=FALSE
message=FALSE
include=FALSE