Easily generate ggplot2 graphs for all variables in a data frame (or a subset of them named via the cols argument), using a class-appropriate geometry for each. The graphs are produced by the other elucidate plot_* functions, with a restricted set of customization options and some modified defaults. The var2 argument also allows you to plot all variables against a specific named secondary variable. The collection of generated graphs will be combined into a single lattice-style figure with either the patchwork package or the trelliscopejs package. See the "Arguments" section for details and this blog post for an introduction to ggplot2. To obtain a plot of a single variable or vector, use plot_var instead. To obtain pairwise plots of all bivariate combinations of variables, use plot_var_pairs instead.

plot_var_all(
  data,
  var2 = NULL,
  group_var = NULL,
  cols = NULL,
  var2_lab = ggplot2::waiver(),
  title = ggplot2::waiver(),
  caption = ggplot2::waiver(),
  fill = "blue2",
  colour = "black",
  palette = c("plasma", "C", "magma", "A", "inferno", "B", "viridis", "D", "cividis",
    "E"),
  palette_direction = c("d2l", "l2d"),
  palette_begin = 0,
  palette_end = 0.8,
  alpha = 0.75,
  greyscale = FALSE,
  line_size = 1,
  theme = c("bw", "classic", "grey", "light", "dark", "minimal"),
  text_size = 14,
  font = c("sans", "serif", "mono"),
  legend_position = c("right", "left", "top", "bottom"),
  omit_legend = FALSE,
  dnorm = TRUE,
  violin = TRUE,
  var1_log10 = FALSE,
  var2_log10 = FALSE,
  point_size = 2,
  point_shape = c("circle", "square", "diamond", "triangle up", "triangle down"),
  regression_line = TRUE,
  regression_method = c("gam", "loess", "lm"),
  regression_se = TRUE,
  bar_position = c("dodge", "fill", "stack"),
  bar_width = 0.9,
  basic = FALSE,
  interactive = FALSE,
  trelliscope = FALSE,
  nrow = NULL,
  ncol = NULL,
  guides = "collect"
)

Arguments

data

A data frame containing variables to be plotted.

var2

The (quoted or unquoted) name of a secondary variable to plot against all other variables in the input data (or a subset of them if the cols argument is used), where the latter set of "primary" variables will be automatically assigned to the var1 argument of plot_var. var2 is usually assigned to the x-axis. However, if the primary variable (i.e. var1) is a categorical (factor, character, or logical) variable and var2 is a numeric, integer, or date variable, var2 will be assigned to the y-axis and var1 will be assigned to the x-axis. If var1 and var2 are both categorical variables, var1 will be assigned to the x-axis and var2 will be assigned to facet_var.
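
The axis rules above can be sketched in base R. This is only an illustration: assign_axes() and is_categorical() are hypothetical helpers written for this example, not elucidate functions, and the package's internal logic may differ.

```r
# Hypothetical helpers (not part of elucidate): they restate the axis rules
# described above for one primary variable (var1) and the secondary variable.
is_categorical <- function(x) is.factor(x) || is.character(x) || is.logical(x)

assign_axes <- function(var1, var2) {
  if (is_categorical(var1) && is_categorical(var2)) {
    c(x = "var1", facet = "var2")   # both categorical: var2 is used for faceting
  } else if (is_categorical(var1)) {
    c(x = "var1", y = "var2")       # categorical var1, numeric/date var2
  } else {
    c(x = "var2", y = "var1")       # usual case: var2 goes on the x-axis
  }
}

cars <- mtcars
cars$cyl <- as.factor(cars$cyl)
cars$am  <- as.logical(cars$am)

axes_numeric     <- assign_axes(cars$hp,  cars$mpg)  # numeric vs. numeric
axes_mixed       <- assign_axes(cars$cyl, cars$mpg)  # categorical vs. numeric
axes_categorical <- assign_axes(cars$cyl, cars$am)   # categorical vs. categorical
```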

group_var

Use if you want to assign a grouping variable to fill (colour) and/or (outline) colour, e.g. group_var = "grouping_variable" or group_var = grouping_variable. Whether the grouping variable is mapped to fill, colour, or both will depend upon which plot_* function is used (see the "Value" section). For density plots, both fill and colour are used for consistency across the main density plots and added normal density curve lines (if dnorm = TRUE). For bar graphs and box-and-whisker plots, the variable will be assigned to fill. For scatter plots, the variable will be assigned to colour. See aes for details.

cols

A character (or integer) vector of column names (or indices) which allows you to plot only a subset of the columns in the input data frame, where each of these primary variable columns will be automatically assigned to the var1 argument of plot_var. Note that a variable which has been assigned to var2 or group_var does not also need to be listed here.
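
As a brief illustration, the same subset of mtcars columns can be written as names or as positions. The plot_var_all() calls are guarded so that the sketch also runs where elucidate is not installed. The object names (by_name, by_index, and so on) are made up for this example.

```r
# Two equivalent ways to describe the same subset of columns.
by_name  <- c("hp", "wt", "cyl")
by_index <- match(by_name, names(mtcars))  # positions of those columns

# Check that both specifications point at the same columns.
same_columns <- identical(names(mtcars)[by_index], by_name)

# Only attempt the plots if elucidate is available.
if (requireNamespace("elucidate", quietly = TRUE)) {
  p_by_name  <- elucidate::plot_var_all(mtcars, cols = by_name)
  p_by_index <- elucidate::plot_var_all(mtcars, cols = by_index)
}
```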

var2_lab

Accepts a character string to use to change the axis label for the variable assigned to var2. Ignored if var2 and the primary variable are both categorical variables (since var2 will be used for faceting in such cases).

title

A character string to add as a title at the top of the combined multiple-panel patchwork graph or trelliscopejs display.

caption

Add a figure caption to the bottom of the plot using a character string.

fill

Fill colour to use for density plots, bar graphs, and box plots. Ignored if a variable that has been assigned to group_var is mapped on to fill_var (see group_var argument information above). Default is "blue2". Use colour_options to see colour option examples.

colour

Outline colour to use for density plots, bar graphs, box plots, and scatter plots. Ignored if a variable that has been assigned to group_var is mapped on to colour_var (see group_var argument information above). Default is "black". Use colour_options to see colour option examples.

palette

If a variable is assigned to group_var, this determines which viridis colour palette to use. Options include "plasma" or "C" (default), "magma" or "A", "inferno" or "B", "viridis" or "D", and "cividis" or "E". See this link for examples.

palette_direction

Choose "d2l" for dark to light (default) or "l2d" for light to dark.

palette_begin

Value between 0 and 1 that determines where along the full range of the chosen colour palette's spectrum to begin sampling colours. See scale_fill_viridis_d for details.

palette_end

Value between 0 and 1 that determines where along the full range of the chosen colour palette's spectrum to end sampling colours. See scale_fill_viridis_d for details.
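
To show what palette_begin and palette_end do, the sketch below samples three group colours from the 0 to 0.8 portion of a palette using base R only. It assumes evenly spaced sampling between the two values and uses grDevices::hcl.colors(), whose "Plasma" palette only approximates the viridis "plasma" option, so the exact colours will differ from those used by the package.

```r
# Assumed illustration values: three groups, the default begin/end from above.
n_groups      <- 3
palette_begin <- 0
palette_end   <- 0.8

# Evenly spaced positions along the [begin, end] portion of the spectrum.
positions <- seq(palette_begin, palette_end, length.out = n_groups)

# A fine-grained ramp standing in for the full palette (position 0 to 1).
full_ramp <- grDevices::hcl.colors(101, palette = "Plasma")

# Pick the colour at each position; the top 20% of the ramp is never used.
group_colours <- full_ramp[round(positions * 100) + 1]
```

With palette_end = 0.8 the far end of the spectrum is skipped, which is presumably why it is the default here: the lightest colours can be hard to see on a white background.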

alpha

This adjusts the transparency/opacity of the main geometric objects in the generated plot, with acceptable values ranging from 0 = 100% transparent to 1 = 100% opaque.

greyscale

Set to TRUE if you want the plot converted to grey scale.

line_size

Controls the thickness of plotted lines.

theme

Adjusts the theme using 1 of 6 predefined "complete" theme templates provided by ggplot2. Currently supported options are: "classic", "bw" (the elucidate default), "grey" (the ggplot2 default), "light", "dark", & "minimal". See theme_bw for more information.

text_size

This controls the size of all plot text. Default = 14.

font

This controls the font of all plot text. Default = "sans" (Arial). Other options include "serif" (Times New Roman) and "mono" (Courier New).

legend_position

This allows you to modify the legend position if a variable is assigned to group_var. Options include "right" (the default), "left", "top", & "bottom".

omit_legend

Set to TRUE if you want to remove/omit the legend(s). Ignored if group_var is unspecified.

dnorm

When TRUE (default), this adds a dashed line representing a normal/Gaussian density curve to density plots, which are rendered for plots of single numeric variables. Disabled if var1 is a date vector, var1_log10 = TRUE, or basic = TRUE.
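
The idea behind the added curve can be sketched with base graphics. This assumes the reference curve is a normal density with the sample mean and standard deviation of the plotted variable, which is a plausible reading of the description above but is not confirmed by it.

```r
x <- mtcars$hp

# Kernel density estimate of the observed values (the "main" density plot).
kernel_density <- density(x)

# Normal reference curve with the same mean and SD (assumed parameters).
grid         <- seq(min(x), max(x), length.out = 200)
normal_curve <- dnorm(grid, mean = mean(x), sd = sd(x))

# Overlay the two: a solid kernel density and a dashed normal curve.
plot(kernel_density, main = "hp: kernel density vs. normal reference")
lines(grid, normal_curve, lty = "dashed")
```

Large gaps between the two curves suggest the variable departs from normality.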

violin

When TRUE (default), this adds violin plot outlines to box plots, which are rendered in cases where a mixture of numeric and categorical variables are assigned to var1 and var2. Disabled if basic = TRUE.

var1_log10

If TRUE, applies a base-10 logarithmic transformation to a numeric variable that has been assigned to var1. Ignored if var1 is a categorical variable.

var2_log10

If TRUE, applies a base-10 logarithmic transformation to a numeric variable that has been assigned to var2. Ignored if var2 is a categorical variable.

point_size

Controls the size of points used in scatter plots, which are rendered in cases where var1 and var2 are both numeric, integer, or date variables.

point_shape

Point shape to use in scatter plots, which are rendered in cases where var1 and var2 are both numeric, integer, or date variables.

regression_line

If TRUE (the default), adds a regression line to scatter plots, which are rendered in cases where var1 and var2 are both numeric, integer, or date variables. Disabled if basic = TRUE.

regression_method

If regression_line = TRUE, this determines the type of regression line to use. Currently available options are "gam", "loess", and "lm". "gam" is the default, which fits a generalized additive model using a smoothing term for x. This method has a longer run time, but typically provides a better fit to the data than other options and uses an optimization algorithm to determine the optimal wiggliness of the line. If the relationship between y and x is linear, the output will be equivalent to fitting a linear model. "loess" may be preferable to "gam" for small sample sizes. See stat_smooth and gam for details.
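
For a rough sense of how the three methods differ, the corresponding models can be fitted directly. This is a sketch with assumed formulas (mpg ~ wt, and a default s(wt) smooth for the GAM), not necessarily the settings that plot_var_all passes on to stat_smooth. The GAM fit is guarded in case mgcv is not installed.

```r
# Linear model: a single straight line.
fit_lm <- lm(mpg ~ wt, data = mtcars)

# Loess: locally weighted regression.
fit_loess <- loess(mpg ~ wt, data = mtcars)

# GAM with a smoothing term for x; the wiggliness is estimated from the data.
if (requireNamespace("mgcv", quietly = TRUE)) {
  fit_gam <- mgcv::gam(mpg ~ s(wt), data = mtcars)
}

# Compare the fitted mpg at a weight of 3 (i.e. 3000 lbs).
new_wt     <- data.frame(wt = 3)
pred_lm    <- predict(fit_lm, new_wt)
pred_loess <- predict(fit_loess, new_wt)
```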

regression_se

If TRUE (the default), adds a 95% confidence envelope for the regression line. Ignored if regression_line = FALSE.

bar_position

In bar plots, which are rendered for one or more categorical variables, this determines how bars are arranged relative to one another when a grouping variable is assigned to group_var. The default, "dodge", uses position_dodge to arrange bars side-by-side; "stack" places the bars on top of each other; "fill" also stacks bars but additionally converts the y-axis from counts to proportions.
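
The difference between "stack" and "fill" comes down to counts versus within-bar proportions, which can be seen in a small base R table. In this sketch cyl plays the role of the plotted variable and am the role of group_var; the object names are made up for the example.

```r
# Counts per group within each bar: rows are groups (am), columns are bars (cyl).
counts <- table(am = mtcars$am, cyl = mtcars$cyl)

# "stack" and "dodge" display these counts as they are.
# "fill" rescales each bar so that its segments sum to 1.
proportions <- prop.table(counts, margin = 2)

bar_totals <- colSums(proportions)
```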

bar_width

In bar plots, which are rendered for one or more categorical variables, this adjusts the width of the bars (default = 0.9).

basic

This is a shortcut argument that allows you to simultaneously disable the dnorm, violin, and regression_line arguments to produce a basic version of a density, box, or scatter plot (depending on var1/var2 variable class(es)) without any of those additional layers. Dropping these extra layers may noticeably reduce rendering time and memory utilization, especially for larger sample sizes and/or when interactive = TRUE.

interactive

Determines whether a static ggplot object or an interactive HTML plotly object is returned. Interactive/plotly mode for multiple plots should only be used in conjunction with trelliscope = TRUE. See ggplotly for details. Note that in cases where a box plot is generated (for a mix of numeric and categorical variables) and a variable is also assigned to group_var, activating interactive/plotly mode will cause a spurious warning message about 'layout' objects not having a 'boxmode' attribute to be printed to the console. This is a documented bug with plotly that can be safely ignored, although unfortunately the message cannot currently be suppressed.

trelliscope

If changed to TRUE, plots will be combined into an interactive trelliscope display rather than a static patchwork graph grid. See trelliscope for more information.

nrow

This controls the number of rows to use when arranging plots in the combined patchwork or trelliscopejs display.

ncol

This controls the number of columns to use when arranging plots in the combined patchwork or trelliscopejs display.

guides

Controls the pooling of group_var legends/guides across plot panels if a categorical variable has been assigned to group_var and trelliscope = FALSE. See wrap_plots for details.

Value

A static "patchwork" or dynamic "trelliscope" multi-panel graphical display of ggplot2 or plotly graphs depending upon the values of the trelliscope and interactive arguments. The type of graph (i.e. ggplot2::geom* layers) that is rendered in each panel will depend upon the classes of the chosen variables, as follows:

  • One numeric (classes numeric/integer/date) variable will be graphed with plot_density.

  • One or two categorical (classes factor/character/logical) variable(s) will be graphed with plot_bar.

  • Two numeric variables will be graphed with plot_scatter.

  • A mixture of numeric and categorical variables will be graphed with plot_box.
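
The class-based dispatch listed above can be written out as a small base R sketch. choose_plot() and is_categorical() are hypothetical helpers for illustration only; they return the name of the plot_* function that the rules above would select and do not call elucidate.

```r
# Hypothetical helpers (not part of elucidate) restating the rules above.
is_categorical <- function(x) is.factor(x) || is.character(x) || is.logical(x)

choose_plot <- function(var1, var2 = NULL) {
  n_var <- 1 + !is.null(var2)
  n_cat <- is_categorical(var1) + (!is.null(var2) && is_categorical(var2))
  if (n_cat == n_var) {
    "plot_bar"       # one or two categorical variables
  } else if (n_cat == 0 && n_var == 1) {
    "plot_density"   # one numeric variable
  } else if (n_cat == 0) {
    "plot_scatter"   # two numeric variables
  } else {
    "plot_box"       # a mixture of numeric and categorical
  }
}

cars <- mtcars
cars$cyl <- as.factor(cars$cyl)
primary <- cars[c("hp", "wt", "cyl")]

# Each primary variable on its own, then each one against mpg as var2.
univariate <- vapply(primary, choose_plot, character(1))
bivariate  <- vapply(primary, choose_plot, character(1), var2 = cars$mpg)
```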

References

Wickham, H. (2016). ggplot2: elegant graphics for data analysis. New York, N.Y.: Springer-Verlag.

Author

Craig P. Hutton, Craig.Hutton@gov.bc.ca

Examples


data(mtcars) #load the mtcars data

#convert variable "cyl" to a factor
mtcars$cyl <- as.factor(mtcars$cyl)

#plot variables "hp", "wt", and "cyl" from the mtcars data frame
plot_var_all(mtcars, cols = c("hp", "wt", "cyl"))


#plot each of the same variables against column "mpg"
plot_var_all(mtcars, var2 = mpg, cols = c("hp", "wt", "cyl"))


#plot "hp" and "wt" against mpg, group by "cyl"
plot_var_all(mtcars, var2 = mpg, group_var = cyl, cols = c("hp", "wt"),
             basic = TRUE, #disable regression lines/CIs
             ncol = 1, nrow = 2) #change the layout