Generate a ggplot2 graph with a geometry appropriate to the classes of the chosen primary (required) and secondary (optional) variables, which must come from the same data frame. plot_var passes these variables to other elucidate plot_* functions, exposing a restricted set of customization options and some modified defaults. See the "Arguments" section for details and this blog post for an introduction to ggplot2. To obtain plots of all variables/columns of a data frame, use plot_var_all or plot_var_pairs instead.

plot_var(
  data,
  var1 = NULL,
  var2 = NULL,
  group_var = NULL,
  var1_lab = ggplot2::waiver(),
  var2_lab = ggplot2::waiver(),
  title = ggplot2::waiver(),
  caption = ggplot2::waiver(),
  fill = "blue2",
  colour = "black",
  palette = c("plasma", "C", "magma", "A", "inferno", "B", "viridis", "D", "cividis",
    "E"),
  palette_direction = c("d2l", "l2d"),
  palette_begin = 0,
  palette_end = 0.8,
  alpha = 0.75,
  greyscale = FALSE,
  line_size = 1,
  theme = c("bw", "classic", "grey", "light", "dark", "minimal"),
  text_size = 14,
  font = c("sans", "serif", "mono"),
  legend_position = c("right", "left", "top", "bottom"),
  omit_legend = FALSE,
  dnorm = TRUE,
  violin = TRUE,
  var1_log10 = FALSE,
  var2_log10 = FALSE,
  point_size = 2,
  point_shape = c("circle", "square", "diamond", "triangle up", "triangle down"),
  regression_line = TRUE,
  regression_method = c("gam", "loess", "lm"),
  regression_se = TRUE,
  bar_position = c("dodge", "fill", "stack"),
  bar_width = 0.9,
  basic = FALSE,
  interactive = FALSE,
  verbose = FALSE
)

Arguments

data

Either a data frame containing variables to be plotted or a vector to be plotted (unlike most other plot_* functions, which always require a data frame). Note that if a vector is supplied here, arguments var1, var2, and/or group_var cannot also be used.

var1

The name of the primary variable you want plot(s) for (quoted or unquoted), e.g. var1 = "variable" or var1 = variable. If a data frame is supplied to the data argument, then var1 must also be specified. var1 will be assigned to the x-axis if var2 is not specified. If var2 is also specified, var1 will be assigned to the y-axis if it is a numeric, integer, or date variable or to the x-axis if it is a factor, character, or logical variable.

var2

The name of a secondary variable to plot against the primary variable (quoted or unquoted), e.g. var2 = "variable" or var2 = variable. var2 is usually assigned to the x-axis. However, if var1 is a categorical (factor, character, or logical) variable and var2 is a numeric, integer, or date variable, var2 will be assigned to the y-axis and var1 will be assigned to the x-axis. If var1 and var2 are both categorical variables, var1 will be assigned to the x-axis and var2 will be assigned to facet_var.
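The axis-assignment rules described for var1 and var2 can be summarised in a small sketch. The helpers `is_categorical()` and `assign_roles()` are hypothetical names introduced here for illustration; they are not part of elucidate, and the sketch assumes that any variable that is not a factor, character, or logical vector is numeric, integer, or date.

```r
# Hypothetical helpers (not part of elucidate) that mirror the documented
# rules for assigning var1 and var2 to plot roles.
is_categorical <- function(x) {
  is.factor(x) || is.character(x) || is.logical(x)
}

assign_roles <- function(var1, var2 = NULL) {
  if (is.null(var2)) {
    # a single variable always goes on the x-axis
    return(c(var1 = "x"))
  }
  if (is_categorical(var1) && is_categorical(var2)) {
    # two categorical variables: var2 is used for faceting
    return(c(var1 = "x", var2 = "facet_var"))
  }
  if (is_categorical(var1)) {
    # categorical var1 with a numeric/integer/date var2
    return(c(var1 = "x", var2 = "y"))
  }
  # numeric/integer/date var1 (assumed) with any var2
  c(var1 = "y", var2 = "x")
}

assign_roles(mtcars$mpg)                             # var1 on x
assign_roles(mtcars$mpg, factor(mtcars$cyl))         # var1 on y, var2 on x
assign_roles(factor(mtcars$cyl), mtcars$mpg)         # var1 on x, var2 on y
assign_roles(factor(mtcars$cyl), factor(mtcars$am))  # var1 on x, var2 facets
```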

group_var

Use if you want to map a grouping variable to the fill colour and/or the outline colour, e.g. group_var = "grouping_variable" or group_var = grouping_variable. Whether the grouping variable is mapped to fill, colour, or both depends upon which plot_* function is used (see the "Value" section). For density plots, both fill and colour are used for consistency across the main density plots and the added normal density curve lines (if dnorm = TRUE). For bar graphs and box-and-whisker plots, the variable will be assigned to fill. For scatter plots, the variable will be assigned to colour. See aes for details.

var1_lab

Accepts a character string to use to change the axis label for the variable assigned to var1.

var2_lab

Accepts a character string to use to change the axis label for the variable assigned to var2. Ignored if var1 and var2 are both categorical variables (since var2 will be used for faceting in such cases).

title

A character string to add as a title at the top of the graph.

caption

Add a figure caption to the bottom of the plot using a character string.

fill

Fill colour to use for density plots, bar graphs, and box plots. Ignored if a variable that has been assigned to group_var is mapped on to fill_var (see group_var argument information above). Default is "blue2". Use colour_options to see colour option examples.

colour

Outline colour to use for density plots, bar graphs, box plots, and scatter plots. Ignored if a variable that has been assigned to group_var is mapped on to colour_var (see group_var argument information above). Default is "black". Use colour_options to see colour option examples.

palette

If a variable is assigned to group_var, this determines which viridis colour palette to use. Options include "plasma" or "C" (default), "magma" or "A", "inferno" or "B", "viridis" or "D", and "cividis" or "E". See this link for examples.

palette_direction

Choose "d2l" for dark to light (default) or "l2d" for light to dark.

palette_begin

Value between 0 and 1 that determines where along the full range of the chosen colour palette's spectrum to begin sampling colours. See scale_fill_viridis_d for details.

palette_end

Value between 0 and 1 that determines where along the full range of the chosen colour palette's spectrum to end sampling colours. See scale_fill_viridis_d for details.
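As a rough illustration of what the palette range arguments do, the sketch below samples three colours from the "plasma" palette with `viridisLite::viridis()`. It assumes the viridisLite package is installed and does not show how plot_var applies these values internally. Treating `direction = -1` as the counterpart of a reversed palette_direction is also an assumption.

```r
library(viridisLite)

# Three colours from the full plasma palette versus the range implied by
# plot_var's defaults (palette_begin = 0, palette_end = 0.8).
full    <- viridis(3, option = "plasma", begin = 0, end = 1)
trimmed <- viridis(3, option = "plasma", begin = 0, end = 0.8)

# Both start at the same (darkest) colour, but the trimmed palette stops
# before the lightest yellow end of the spectrum.
full
trimmed

# direction = -1 samples the same range in reverse order; mapping this to
# plot_var's palette_direction options is an assumption.
reversed <- viridis(3, option = "plasma", begin = 0, end = 0.8, direction = -1)
reversed
```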

alpha

This adjusts the transparency/opacity of the main geometric objects in the generated plot, with acceptable values ranging from 0 = 100% transparent to 1 = 100% opaque.

greyscale

Set to TRUE if you want the plot converted to grey scale.

line_size

Controls the thickness of plotted lines.

theme

Adjusts the theme using 1 of 6 predefined "complete" theme templates provided by ggplot2. Currently supported options are: "classic", "bw" (the elucidate default), "grey" (the ggplot2 default), "light", "dark", & "minimal". See theme_bw for more information.

text_size

This controls the size of all plot text. Default = 14.

font

This controls the font of all plot text. Default = "sans" (Arial). Other options include "serif" (Times New Roman) and "mono" (Courier New).

legend_position

This allows you to modify the legend position if a variable is assigned to group_var. Options include "right" (the default), "left", "top", & "bottom".

omit_legend

Set to TRUE if you want to remove/omit the legend(s). Ignored if group_var is unspecified.

dnorm

When TRUE (default), this adds a dashed line representing a normal/Gaussian density curve to density plots, which are rendered for plots of single numeric variables. Disabled if var1 is a date vector, var1_log10 = TRUE, or basic = TRUE.

violin

When TRUE (default), this adds violin plot outlines to box plots, which are rendered when a mixture of numeric and categorical variables is assigned to var1 and var2. Disabled if basic = TRUE.

var1_log10

If TRUE, applies a base-10 logarithmic transformation to a numeric variable that has been assigned to var1. Ignored if var1 is a categorical variable.

var2_log10

If TRUE, applies a base-10 logarithmic transformation to a numeric variable that has been assigned to var2. Ignored if var2 is a categorical variable.

point_size

Controls the size of points used in scatter plots, which are rendered in cases where var1 and var2 are both numeric, integer, or date variables.

point_shape

Point shape to use in scatter plots, which are rendered in cases where var1 and var2 are both numeric, integer, or date variables.

regression_line

If TRUE (the default), adds a regression line to scatter plots, which are rendered in cases where var1 and var2 are both numeric, integer, or date variables. Disabled if basic = TRUE.

regression_method

If regression_line = TRUE, this determines the type of regression line to use. Currently available options are "gam", "loess", and "lm". "gam" is the default, which fits a generalized additive model using a smoothing term for x. This method has a longer run time, but typically provides a better fit to the data than other options and uses an optimization algorithm to determine the optimal wiggliness of the line. If the relationship between y and x is linear, the output will be equivalent to fitting a linear model. "loess" may be preferable to "gam" for small sample sizes. See stat_smooth and gam for details.
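To make the difference between the methods concrete, the sketch below builds comparable layers directly with ggplot2's `geom_smooth()`. It approximates the idea behind the "gam" and "lm" options and is not elucidate's own implementation. It assumes ggplot2 is installed, and the mgcv package is needed once the "gam" plot is printed.

```r
library(ggplot2)

# Scatter plot with a generalized additive model fit, using a smoothing
# term for x, plus a 95% confidence envelope (se = TRUE).
p_gam <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point(size = 2) +
  geom_smooth(method = "gam", formula = y ~ s(x), se = TRUE)

# The same scatter plot with a straight-line (linear model) fit instead.
p_lm <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point(size = 2) +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE)

# Call print(p_gam) or print(p_lm) to draw the plots.
```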

regression_se

If TRUE (the default), adds a 95% confidence envelope for the regression line. Ignored if regression_line = FALSE.

bar_position

In bar plots, which are rendered for one or more categorical variables, this determines how bars are arranged relative to one another when a grouping variable is assigned to group_var. The default, "dodge", uses position_dodge to arrange bars side-by-side; "stack" places the bars on top of each other; "fill" also stacks bars but additionally converts the y-axis from counts to proportions.
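The three arrangements can be compared directly in ggplot2. The sketch below assumes ggplot2 is installed and uses `geom_bar()` rather than elucidate's own functions, so it illustrates the position options, not the package internals. The object names are illustrative only.

```r
library(ggplot2)

# Work on a copy of mtcars with the two categorical variables as factors.
cars <- transform(mtcars, cyl = factor(cyl), am = factor(am))
base_plot <- ggplot(cars, aes(x = cyl, fill = am))

p_dodge <- base_plot + geom_bar(position = "dodge", width = 0.9)  # side-by-side
p_stack <- base_plot + geom_bar(position = "stack", width = 0.9)  # stacked counts
p_fill  <- base_plot + geom_bar(position = "fill", width = 0.9)   # stacked proportions

# What "fill" shows on the y-axis: within each cyl level, the counts are
# rescaled to proportions that sum to 1.
props <- prop.table(table(cars$cyl, cars$am), margin = 1)
rowSums(props)
```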

bar_width

In bar plots, which are rendered for one or more categorical variables, this adjusts the width of the bars (default = 0.9).

basic

This is a shortcut argument that allows you to simultaneously disable the dnorm, violin, and regression_line arguments to produce a basic version of a density, box, or scatter plot (depending on var1/var2 variable class(es)) without any of those additional layers. Dropping these extra layers may noticeably reduce rendering time and memory utilization, especially for larger sample sizes and/or when interactive = TRUE.

interactive

Determines whether a static ggplot object or an interactive html plotly object is returned. See ggplotly for details. Note that in cases where a box plot is generated (for a mix of numeric and categorical variables) and a variable is also assigned to group_var, activating interactive/plotly mode will cause a spurious warning message about 'layout' objects not having a 'boxmode' attribute to be printed to the console. This is a documented bug with plotly that can be safely ignored, although unfortunately the message cannot currently be suppressed.

verbose

If TRUE, this causes a message to be printed to the console informing you of the classes detected for variables assigned to any of var1, var2, and/or group_var as well as which arguments those variables are passed to in the underlying geom-specific elucidate plot_* function that is used to render the plot.

Value

A ggplot or plotly graph depending on whether static or interactive output was requested. The type of graph (i.e. ggplot2::geom* layers) that is rendered will depend upon the classes of the chosen variables, as follows:

  • One numeric (classes numeric/integer/date) variable will be graphed with plot_density.

  • One or two categorical (classes factor/character/logical) variable(s) will be graphed with plot_bar.

  • Two numeric variables will be graphed with plot_scatter.

  • A mixture of numeric and categorical variables will be graphed with plot_box.
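The class-based selection listed above can be sketched as follows. `choose_plot_fun()` and its helpers are hypothetical names used only for illustration, not functions exported by elucidate, and the sketch treats only class "Date" as a date variable, which is an assumption.

```r
# Hypothetical helpers (not part of elucidate) mirroring the documented
# mapping from variable classes to the underlying plot_* function.
is_num <- function(x) is.numeric(x) || inherits(x, "Date")
is_cat <- function(x) is.factor(x) || is.character(x) || is.logical(x)

choose_plot_fun <- function(var1, var2 = NULL) {
  vars <- if (is.null(var2)) list(var1) else list(var1, var2)
  n_num <- sum(vapply(vars, is_num, logical(1)))
  n_cat <- sum(vapply(vars, is_cat, logical(1)))
  if (n_num + n_cat != length(vars)) {
    stop("unsupported variable class")
  }
  if (n_cat == length(vars)) return("plot_bar")      # one or two categorical
  if (length(vars) == 1) return("plot_density")      # one numeric
  if (n_num == 2) return("plot_scatter")             # two numeric
  "plot_box"                                         # numeric + categorical
}

choose_plot_fun(mtcars$mpg)                       # "plot_density"
choose_plot_fun(factor(mtcars$cyl))               # "plot_bar"
choose_plot_fun(mtcars$mpg, mtcars$hp)            # "plot_scatter"
choose_plot_fun(mtcars$mpg, factor(mtcars$cyl))   # "plot_box"
```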

References

Wickham, H. (2016). ggplot2: elegant graphics for data analysis. New York, N.Y.: Springer-Verlag.

Author

Craig P. Hutton, Craig.Hutton@gov.bc.ca

Examples


data(mtcars) #load the mtcars data

# convert variables "cyl" and "am" to factors
mtcars$cyl <- as.factor(mtcars$cyl)
mtcars$am <- as.factor(mtcars$am)

# density plot of a single numeric variable on the x-axis
# with normal density curve added as dashed line
#
# normal density curves can be disabled via `dnorm` = FALSE or `basic` = TRUE

plot_var(data = mtcars, var1 = mpg)


# density plot with a primary numeric variable on the x-axis
# split by a categorical grouping variable assigned to the `fill_var`
# argument of plot_density() & normal density curves disabled

plot_var(mtcars, mpg, group_var = cyl, dnorm = FALSE)


plot_var(mtcars, cyl) #bar plot of a single categorical variable on x-axis


# bar plot with a primary categorical variable on the x-axis and a secondary
# categorical variable used for faceting.

plot_var(mtcars, var1 = cyl, var2 = am)


# box plot with added violin plots for a mix of numeric and categorical variables
# and verbose mode enabled to print variable assignment information to the console
#
# the violin plots can be disabled via `violin` = FALSE or `basic` = TRUE

plot_var(mtcars, mpg, cyl, verbose = TRUE)
#> `var1` class = "numeric" and `var2` class = "factor".
#> Building graph with `plot_box()`.
#> Assigning `var1` to `y` and `var2` to `x`.
#> Use `plot_box()` instead to access additional customization options.
#> See help("plot_box") for details.


# scatter plot with added generalized additive model regression line and
# 95% confidence envelope for two numeric variables
#
# N.B. the regression line and CI can be disabled via `regression_line` = FALSE or
# `basic` = TRUE

plot_var(mtcars, mpg, hp)