plot_*
function.R/plot.R
plot_var_pairs.Rd
Easily generate a matrix of ggplot2 graphs for all pairwise
combinations of all (or a subset of) variables in a data frame using a
class-appropriate geometry via other elucidate plot_* functions with a
restricted set of customization options and some modified defaults. The
collection of generated graphs will be combined into a single lattice-style
matrix figure with either the patchwork package or trelliscopejs package.
See "Arguments" section for details and this blog post for an introduction
to ggplot2. To obtain a plot of a single variable or vector, use plot_var
instead. To obtain univariate plots of all variables, or plots of all
variables against a specific secondary variable, use plot_var_all instead.
plot_var_pairs(
data,
group_var = NULL,
cols = NULL,
title = ggplot2::waiver(),
caption = ggplot2::waiver(),
fill = "blue2",
colour = "black",
palette = c("plasma", "C", "magma", "A", "inferno", "B", "viridis", "D", "cividis",
"E"),
palette_direction = c("d2l", "l2d"),
palette_begin = 0,
palette_end = 0.8,
alpha = 0.75,
greyscale = FALSE,
line_size = 1,
theme = c("bw", "classic", "grey", "light", "dark", "minimal"),
text_size = 14,
font = c("sans", "serif", "mono"),
legend_position = c("right", "left", "top", "bottom"),
omit_legend = FALSE,
dnorm = TRUE,
violin = TRUE,
var1_log10 = FALSE,
var2_log10 = FALSE,
point_size = 2,
point_shape = c("circle", "square", "diamond", "triangle up", "triangle down"),
regression_line = TRUE,
regression_method = c("gam", "loess", "lm"),
regression_se = TRUE,
bar_position = c("dodge", "fill", "stack"),
bar_width = 0.9,
basic = FALSE,
interactive = FALSE,
trelliscope = FALSE,
nrow = NULL,
ncol = NULL,
guides = "collect"
)
A data frame containing variables to be plotted against each other in pairwise/bivariate combinations.
Use if you want to assign a grouping variable to fill
(colour) and/or (outline) colour, e.g. group_var = "grouping_variable" or
group_var = grouping_variable. Whether the grouping variable is mapped to
fill, colour, or both will depend upon which plot_* function is used (see
"Value" section). For density plots, both fill and colour are used for
consistency across the main density plots and added normal density curve
lines (if dnorm = TRUE). For bar graphs and box-and-whisker plots, the
variable will be assigned to fill. For scatter plots, the variable will be
assigned to colour. See aes for details.
A character (or integer) vector of column names (or indices)
which allows you to generate pair plots for only a subset of the columns in
the input data frame, where each variable combination will be assigned once
to each of the var1 and var2 arguments of plot_var. Note that a variable
which has been assigned to group_var does not also need to be listed here.
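As a rough illustration of what "assigned once to each of var1 and var2" implies for the layout, the pairings for a column subset can be enumerated in base R. This is only a sketch of the idea, not the package's internal code; the pairings object and its diagonal column are hypothetical names introduced here.

```r
# Hypothetical illustration: enumerate the var1/var2 pairings for a column subset.
cols <- c("mpg", "hp", "cyl")

# expand.grid() crosses the names, so each pairing appears once per ordering
# (e.g. mpg/hp and hp/mpg), plus one same-variable row per column.
pairings <- expand.grid(var1 = cols, var2 = cols, stringsAsFactors = FALSE)

# Same-variable rows would correspond to the diagonal of the plot matrix.
pairings$diagonal <- pairings$var1 == pairings$var2

print(pairings)
```

With three columns this gives nine pairings, three of which fall on the diagonal.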
A character string to add as a title at the top of the combined multiple-panel patchwork graph or trelliscopejs display.
Add a figure caption to the bottom of the plot using a character string.
Fill colour to use for density plots, bar graphs, and box plots.
Ignored if a variable that has been assigned to group_var is mapped on to
fill_var (see group_var argument information above). Default is
"blue2". Use colour_options to see colour option examples.
Outline colour to use for density plots, bar graphs, box plots,
and scatter plots. Ignored if a variable that has been assigned to
group_var is mapped on to colour_var (see group_var argument
information above). Default is "black". Use colour_options to
see colour option examples.
If a variable is assigned to group_var, this determines which viridis colour palette to use. Options include "plasma" or "C" (default), "magma" or "A", "inferno" or "B", "viridis" or "D", and "cividis" or "E". See this link for examples.
Choose "d2l" for dark to light (default) or "l2d" for light to dark.
Value between 0 and 1 that determines where along the
full range of the chosen colour palette's spectrum to begin sampling
colours. See scale_fill_viridis_d for details.
Value between 0 and 1 that determines where along the full
range of the chosen colour palette's spectrum to end sampling colours. See
scale_fill_viridis_d for details.
This adjusts the transparency/opacity of the main geometric objects in the generated plot, with acceptable values ranging from 0 = 100% transparent to 1 = 100% opaque.
Set to TRUE if you want the plot converted to grey scale.
Controls the thickness of plotted lines.
Adjusts the theme using 1 of 6 predefined "complete" theme
templates provided by ggplot2. Currently supported options are: "classic",
"bw" (the elucidate default), "grey" (the ggplot2 default), "light",
"dark", & "minimal". See theme_bw for more information.
This controls the size of all plot text. Default = 14.
This controls the font of all plot text. Default = "sans" (Arial). Other options include "serif" (Times New Roman) and "mono" (Courier New).
This allows you to modify the legend position if a
variable is assigned to group_var. Options include "right" (the default),
"left", "top", & "bottom".
Set to TRUE if you want to remove/omit the legend(s).
Ignored if group_var is unspecified.
When TRUE (default), this adds a dashed line representing a
normal/Gaussian density curve to density plots, which are rendered for
plots of single numeric variables. Disabled if var1 is a date vector,
var1_log10 = TRUE, or basic = TRUE.
When TRUE (default), this adds violin plot outlines to box
plots, which are rendered in cases where a mixture of numeric and
categorical variables are assigned to var1 and var2. Disabled if
basic = TRUE.
If TRUE, applies a base-10 logarithmic transformation to a
numeric variable that has been assigned to var1. Ignored if var1 is a
categorical variable.
If TRUE, applies a base-10 logarithmic transformation to a
numeric variable that has been assigned to var2. Ignored if var2 is a
categorical variable.
Controls the size of points used in scatter plots, which
are rendered in cases where var1 and var2 are both numeric, integer, or
date variables.
Point shape to use in scatter plots, which
are rendered in cases where var1 and var2 are both numeric, integer, or
date variables.
If TRUE (the default), adds a regression line to scatter
plots, which are rendered in cases where var1 and var2 are both
numeric, integer, or date variables. Disabled if basic = TRUE.
If regression_line = TRUE, this determines the
type of regression line to use. Currently available options are "gam",
"loess", and "lm". "gam" is the default, which fits a generalized additive
model using a smoothing term for x. This method has a longer run time, but
typically provides a better fit to the data than other options and uses an
optimization algorithm to determine the optimal wiggliness of the line. If
the relationship between y and x is linear, the output will be equivalent
to fitting a linear model. "loess" may be preferable to "gam" for small
sample sizes. See stat_smooth and gam for details.
If TRUE (the default), adds a 95% confidence envelope for the
regression line. Ignored if regression_line = FALSE.
In bar plots, which are rendered for one or more
categorical variables, this determines how bars are arranged relative to
one another when a grouping variable is assigned to group_var. The
default, "dodge", uses position_dodge to arrange bars side-by-side;
"stack" places the bars on top of each other; "fill" also stacks bars but
additionally converts the y-axis from counts to proportions.
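To make the difference between counts and proportions concrete, the following base R sketch tabulates the values that the bars would represent. It does not call elucidate or ggplot2; the counts and proportions objects are hypothetical names, and mtcars is used only as convenient example data.

```r
data(mtcars)

# Counts of each cylinder class (the bars), split by transmission type (the groups).
counts <- table(am = mtcars$am, cyl = mtcars$cyl)

# "dodge" and "stack" both display these raw counts; they differ only in
# whether the group bars sit side by side or on top of one another.
print(counts)

# "fill" rescales each bar so its segments sum to 1, so the y-axis shows
# within-bar proportions rather than counts.
proportions <- prop.table(counts, margin = 2)
print(round(proportions, 2))
```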
In bar plots, which are rendered for one or more categorical variables, this adjusts the width of the bars (default = 0.9).
This is a shortcut argument that allows you to simultaneously
disable the dnorm, violin, and regression_line arguments to produce a
basic version of a density, box, or scatter plot (depending on
var1/var2 variable class(es)) without any of those additional layers.
Dropping these extra layers may noticeably reduce rendering time and memory
utilization, especially for larger sample sizes and/or when
interactive = TRUE.
Determines whether a static ggplot object or an
interactive html plotly object is returned. Interactive/plotly mode for
multiple plots should only be used in conjunction with trelliscope = TRUE.
See ggplotly for details. Note that in cases
where a box plot is generated (for a mix of numeric and categorical
variables) and a variable is also assigned to group_var, activating
interactive/plotly mode will cause a spurious warning message about
'layout' objects not having a 'boxmode' attribute to be printed to the
console. This is a documented bug with
plotly that can be safely ignored, although unfortunately the message
cannot currently be suppressed.
If changed to TRUE, plots will be combined into an
interactive trelliscope display rather than a static patchwork graph grid.
See trelliscope for more information.
This controls the number of rows to use when arranging plots in
the combined patchwork or trelliscopejs display. Modifying the arrangement
of the plot matrix this way is not recommended when trelliscope = FALSE.
This controls the number of columns to use when arranging plots
in the combined patchwork or trelliscopejs display. Modifying the arrangement
of the plot matrix this way is not recommended when trelliscope = FALSE.
Controls the pooling of group_var legends/guides across plot
panels if a categorical variable has been assigned to group_var and
trelliscope = FALSE. See wrap_plots for details.
A static "patchwork" or dynamic "trelliscope" multi-panel graphical
display matrix of ggplot2 or plotly graphs depending upon the values of the
trelliscope and interactive arguments. The type of graph (i.e.
ggplot2::geom* layers) that is rendered in each panel will depend upon
the classes of the chosen variables, as follows:
One numeric (classes numeric/integer/date) variable (e.g. on the diagonal
of the plot matrix) will be graphed with plot_density.
One (e.g. on the diagonal) or two (appearing off-diagonal) categorical
(classes factor/character/logical) variable(s) will be graphed with
plot_bar.
Two numeric variables will be graphed with plot_scatter.
A mixture of numeric and categorical variables will be graphed with
plot_box, where the numeric variable will always be assigned
to the y-axis.
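The class-based rules above can be sketched as a small lookup in base R. The helpers is_numeric_like and panel_type are hypothetical and are not part of elucidate; they only mirror the documented mapping from variable classes to plot_* functions and may not match the package's actual class checks.

```r
# Hypothetical helper: treat numeric, integer, and Date columns as "numeric".
is_numeric_like <- function(x) {
  is.numeric(x) || inherits(x, "Date")
}

# Hypothetical helper: name the plot_* function that the documented rules
# would select for a given pair of columns.
panel_type <- function(data, var1, var2 = var1) {
  num1 <- is_numeric_like(data[[var1]])
  num2 <- is_numeric_like(data[[var2]])
  if (var1 == var2) {
    # a single variable, e.g. on the diagonal of the plot matrix
    if (num1) "plot_density" else "plot_bar"
  } else if (num1 && num2) {
    "plot_scatter"
  } else if (!num1 && !num2) {
    "plot_bar"
  } else {
    # one numeric and one categorical variable
    "plot_box"
  }
}

data(mtcars)
mtcars$cyl <- as.factor(mtcars$cyl)
mtcars$am <- as.logical(mtcars$am)

panel_type(mtcars, "mpg")         # "plot_density"
panel_type(mtcars, "cyl")         # "plot_bar"
panel_type(mtcars, "mpg", "hp")   # "plot_scatter"
panel_type(mtcars, "cyl", "am")   # "plot_bar"
panel_type(mtcars, "mpg", "cyl")  # "plot_box"
```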
Wickham, H. (2016). ggplot2: elegant graphics for data analysis. New York, N.Y.: Springer-Verlag.
data(mtcars) #load the mtcars data
# convert variable "cyl" to a factor
mtcars$cyl <- as.factor(mtcars$cyl)
# plot all pairwise combinations of variables "mpg", "hp", and "cyl"
plot_var_pairs(mtcars, cols = c("mpg", "hp", "cyl"))
# render basic versions of the same plots without normal density curves,
# violin plots, or regression lines added.
plot_var_pairs(mtcars, cols = c("mpg", "hp", "cyl"), basic = TRUE)
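A further usage sketch combines group_var with a few of the documented appearance arguments. It assumes the elucidate package is installed (the call is skipped otherwise), and the recoding of am into a labelled factor is an illustrative choice rather than a requirement of the function.

```r
data(mtcars)
mtcars$cyl <- as.factor(mtcars$cyl)

# Recode transmission type as a labelled factor for use as the grouping variable.
mtcars$am <- factor(mtcars$am, levels = c(0, 1),
                    labels = c("automatic", "manual"))

# Only attempt the plot if elucidate is available (assumption: it may not be).
if (requireNamespace("elucidate", quietly = TRUE)) {
  elucidate::plot_var_pairs(
    mtcars,
    group_var = "am",                # mapped to fill and/or colour by plot type
    cols = c("mpg", "hp", "cyl"),    # group_var does not need to be listed here
    palette = "viridis",
    legend_position = "bottom"
  )
}
```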