Plot a sample mean or median +/- error bars.

Easily generate plots of a sample mean or median +/- error bars using ggplot2 through a simplified customization interface, with static (ggplot) and interactive (plotly) output options. The static output is useful for producing static reports (e.g. for manuscripts) and is readily customized further using ggplot2 syntax. The interactive output is helpful for exploring the data and producing dynamic HTML reports. See this blog post for an introduction to ggplot2.

plot_stat_error(
  data,
  y,
  x = NULL,
  geom = c("point", "bar"),
  stat = c("mean", "median"),
  error = c("ci", "sd", "se", "var", "quartile"),
  ci_level = 0.95,
  ci_type = c("perc", "bca", "norm", "basic"),
  replicates = 2000,
  parallel = FALSE,
  cores = NULL,
  xlab = NULL,
  ylab = NULL,
  title = NULL,
  title_hjust = 0.5,
  caption = NULL,
  caption_hjust = 0,
  ...,
  ylim = c(NA, NA),
  ybreaks = ggplot2::waiver(),
  transform_y = FALSE,
  y_transformation = "log10",
  y_var_labs = ggplot2::waiver(),
  x_var_order = NULL,
  x_var_labs = NULL,
  fill_var = NULL,
  fill_var_order = NULL,
  fill_var_values = NULL,
  fill_var_labs = NULL,
  fill_var_title = NULL,
  colour_var = NULL,
  colour_var_order = NULL,
  colour_var_values = NULL,
  colour_var_labs = NULL,
  colour_var_title = NULL,
  palette = c("plasma", "C", "magma", "A", "inferno", "B", "viridis", "D", "cividis",
    "E"),
  palette_direction = c("d2l", "l2d"),
  palette_begin = 0,
  palette_end = 0.8,
  greyscale = FALSE,
  b_width = 0.75,
  p_size = 3,
  p_shape = 21,
  dodge_width = 0.9,
  eb_size = 0.3,
  eb_width = 0.2,
  eb_alpha = 1,
  eb_line_type = 1,
  eb_colour = NULL,
  add_lines = F,
  line_alpha = 0.75,
  line_group = NULL,
  line_colour = NULL,
  line_type = 1,
  line_size = 0.5,
  theme = c("bw", "classic", "grey", "light", "dark", "minimal"),
  text_size = 14,
  font = c("sans", "serif", "mono"),
  coord_flip = FALSE,
  omit_legend = FALSE,
  legend_position = c("right", "left", "bottom", "top"),
  facet_var = NULL,
  facet_var_order = NULL,
  facet_var_labs = NULL,
  facet_var_strip_position = c("top", "bottom"),
  facet_var_text_bold = TRUE,
  print_stats = F,
  aesthetic_options = FALSE,
  output = "p",
  interactive = FALSE,
  na.rm = TRUE
)

Arguments

data: A data frame or tibble containing the dependent measure "y", the independent measure "x" (optional), and any grouping variables or covariates.
y: A numeric variable you want to plot on the y-axis (quoted or unquoted), e.g. y = "variable" or y = variable.
x: A categorical variable you want to plot on the x-axis (quoted or unquoted), e.g. x = "variable" or x = variable.
geom: Determines whether the chosen summary statistic is displayed using geom_point (geom = "point"; the default, as the first option listed in the usage above) or geom_bar (geom = "bar").
stat: The summary statistic to use for plotting bars/points. Options are "mean" (the default) or "median".
error: The statistic to use for the error bars. When stat = "mean", available options are "se" (standard error), "sd" (standard deviation), "var" (variance), and "ci" (confidence interval; the default). When stat = "median", options are "quartile" (lower bound = 25th percentile and upper bound = 75th percentile) or "ci". For CIs, when stat = "mean" the interval is calculated directly from a normal distribution based on the standard error using qnorm. In contrast, when stat = "median", bootstrapped CIs of the type specified by ci_type are returned via median_ci.
ci_level: The confidence level to use for constructing confidence intervals. Default is set to ci_level = 0.95 for 95 percent CIs.
ci_type: The type of confidence intervals to calculate from the bootstrap samples when stat = "median" and error = "ci". Most of the options available in the underlying boot.ci function are implemented (except for studentized intervals): "norm" for an approximation based on the normal distribution, "perc" for percentile, "basic" for basic, and "bca" for bias-corrected and accelerated. Percentile intervals are the default since these are typically sufficient when working with large data sets (e.g. >= 100,000 rows of data) and are faster to calculate than BCa intervals. However, BCa intervals (the default for the more primitive median_ci function) tend to provide the most accurate/least-biased results (Efron, 1987), particularly for small-medium sized samples, at the obvious cost of requiring more time to calculate. See boot.ci for details.
replicates: The number of bootstrap replicates to use for calculating bootstrapped CIs when stat = "median" and error = "ci". Default is 2,000, as recommended by Efron & Tibshirani (1993). For publications, or if you need more precise estimates, more replications (e.g. >= 5,000) are recommended. N.B. more replications will of course take longer to run. If you get the error: "estimated adjustment 'a' is NA" then try again with more replications.
parallel: Set to TRUE if you want to use multiple cores or FALSE if you don't (the default). Note that there is some processing overhead involved when operating in parallel, so speed gains may not be noticeable for smaller samples (parallel processing may even take longer than sequential processing). Due to the nature of the underlying parallelisation architecture, performance gains will likely be greater on non-Windows machines that can use the "multicore" implementation instead of "snow". This option only works on machines with more than 1 logical processing core.
cores: If parallel is set to TRUE, this determines the number of cores to use. To see how many cores are available on your machine, use parallel::detectCores().
xlab: Specify/overwrite the x-axis label using a character string, e.g. "x-axis label". If no variable is assigned to x, then this label will be omitted.
ylab: Specify/overwrite the y-axis label using a character string, e.g. "y-axis label". If not specified, this label will reflect a combination of the chosen statistic, y variable, and error bar options.
title: Add a main title to the plot using a character string, e.g. "scatterplot of y as a function of x".
title_hjust: Left-to-right/horizontal justification (alignment) of the main plot title. Accepts values from 0 (far left) to 1 (far right). Default is 0.5 (centre).
caption: Add a figure caption to the bottom of the plot using a character string.
caption_hjust: Left-to-right/horizontal justification (alignment) of the caption. Accepts values from 0 (far left) to 1 (far right). Default is 0 (left).
...: Graphical parameters (not associated with variables) to be passed to geom_point or geom_bar, according to which base geometry is specified via the geom argument, e.g. colour, fill, or transparency (e.g. alpha = 0.6), to be applied to all cases. To see some of the available options in a web browser, set the aesthetic_options argument to TRUE. Exceptions are the width of the bars, which is controlled using the b_width argument, and the size and shape of points for geom = "point", which are controlled using p_size and p_shape (see below).
ylim: Specify the y-axis limits, e.g. ylim = c(lower_limit, upper_limit). Use NA for the existing minimum or maximum value of y, e.g. the default is ylim = c(NA, NA).
ybreaks: This allows you to change the break points to use for tick marks on the y-axis. seq is particularly useful here. See scale_y_continuous for details. If ybreaks is specified, then ylim should be also.
transform_y: Would you like to transform the y axis? (TRUE or FALSE)
y_transformation: If transform_y = TRUE, this determines the transformation to be applied. Common choices include "log10" (the default), "log2", "sqrt", or "exp". See scale_continuous for details.
y_var_labs: Allows you to modify the labels displayed with the y-axis tick marks. See scale_continuous for details.
x_var_order: If a variable has been assigned to x, this allows you to modify the order of the variable groups, e.g. x = grouping_variable, x_var_order = c("group_2", "group_1"). See fct_relevel for details.
x_var_labs: If a variable has been assigned to x, this allows you to modify the labels of the variable groups, e.g. x = grouping_variable, x_var_labs = c("group_1_new_label" = "group_1_old_label", "group_2_new_label" = "group_2_old_label"). See fct_recode for details.
fill_var: Use if you want to assign a variable to the point fill colour, e.g. fill_var = "grouping_variable" or fill_var = grouping_variable. Produces separate sets of points for each level of the fill variable. See aes for details. Note: for geom = "point", fill_var and fill only affect shapes 21-24 (21 is the default). To split the data by a variable based on colour, it is therefore easier to use colour_var for this particular plot geometry.
fill_var_order: If a variable has been assigned to fill using fill_var, this allows you to modify the order of the variable groups, e.g. fill_var = grouping_variable, fill_var_order = c("group_2", "group_1"). See fct_relevel for details.
fill_var_values: If a variable has been assigned to fill using fill_var, this allows you to modify the colours assigned to the fill of each of the variable groups, e.g. fill_var = grouping_variable, fill_var_values = c("blue", "red"). See scale_fill_manual for details. For the colour options available in base R, see colour_options.
fill_var_labs: If a variable has been assigned to fill using fill_var, this allows you to modify the labels of the variable groups, e.g. fill_var = grouping_variable, fill_var_labs = c("group_1_new_label" = "group_1_old_label", "group_2_new_label" = "group_2_old_label"). See fct_recode for details.
fill_var_title: If a variable has been assigned to fill using fill_var, this allows you to modify the variable label in the plot legend.
colour_var: Use if you want to assign a variable to the point outline colour, e.g. colour_var = "grouping_variable" or colour_var = grouping_variable. Produces separate sets of points for each level of the colour variable. See aes for details.
colour_var_order: If a variable has been assigned to colour using colour_var, this allows you to modify the order of the variable groups, e.g. colour_var = grouping_variable, colour_var_order = c("group_2", "group_1"). See fct_relevel for details.
colour_var_values: If a variable has been assigned to colour using colour_var, this allows you to modify the colours assigned to the outline of each of the variable groups, e.g. colour_var = grouping_variable, colour_var_values = c("blue", "red"). See scale_colour_manual for details. For the colour options available in base R, see colour_options.
colour_var_labs: If a variable has been assigned to colour using colour_var, this allows you to modify the labels of the variable groups, e.g. colour_var = grouping_variable, colour_var_labs = c("group_1_new_label" = "group_1_old_label", "group_2_new_label" = "group_2_old_label"). See fct_recode for details.
colour_var_title: If a variable has been assigned to colour using colour_var, this allows you to modify the variable label in the plot legend.
palette: If a variable is assigned to fill_var or colour_var, this determines which viridis colour palette to use. Options include "plasma" or "C" (default), "magma" or "A", "inferno" or "B", "viridis" or "D", and "cividis" or "E". See this link for examples. You can override these colour palettes with fill_var_values or colour_var_values.
palette_direction: Choose "d2l" for dark to light (default) or "l2d" for light to dark.
palette_begin: Value between 0 and 1 that determines where along the full range of the chosen colour palette's spectrum to begin sampling colours. See scale_fill_viridis_d for details.
palette_end: Value between 0 and 1 that determines where along the full range of the chosen colour palette's spectrum to end sampling colours. See scale_fill_viridis_d for details.
greyscale: Set to TRUE if you want the plot converted to greyscale.
b_width: The width of the bars when geom = "bar". Affects the spacing between adjacent bars, where a value of 1 specifies that there should be no space between adjacent bars. Default is 0.75.
p_size: The size of the points when geom = "point". Default = 3.
p_shape: The shape of the points when geom = "point". The default is 21, which is a circle like shape 19 (the ggplot2 default), but unlike the latter has a fill parameter in addition to its outline colour.
dodge_width: Determines the amount by which to dodge (horizontally offset) points and error bars when variables are mapped/assigned to x, fill, or colour. To disable dodging, set this to 0.
eb_size: Controls the thickness of error bar lines. Default = 0.3.
eb_width: Controls the width of error bar endpoint lines. Default = 0.2.
eb_alpha: This adjusts the transparency/opacity of the graphical components of the error bars, ranging from 0 = 100 percent transparent to 1 = 100 percent opaque.
eb_line_type: Controls the error bar line type. Default = 1 or "solid".
eb_colour: Controls the colour of error bar lines. If unspecified, will either be "black" for all lines, or differ by groups if colour_var is specified.
add_lines: Would you like to connect the estimated statistics (the points or ends of the bars) with lines (TRUE/FALSE)? Default = FALSE. Requires that a variable is assigned to x. Particularly useful for showing changes over time between groups (assigned to either fill_var or colour_var). If only one of fill_var or colour_var is specified, then lines will be split according to the same variable mapping. If neither or both of these parameters are specified and you want the lines split by a grouping variable, assign the chosen variable to line_group.
line_alpha: This adjusts the transparency/opacity of the lines connecting each (groupwise) estimate, ranging from 0 = 100 percent transparent to 1 = 100 percent opaque.
line_group: Determines which variable to split the connecting lines on. Only required if (2 different) variables are already assigned to both fill_var and colour_var.
line_colour: Controls the colour of connection lines. If unspecified, will either be "black" for all lines, or differ by groups if colour_var is specified.
line_type: Controls the connection line type. Default = 1 or "solid".
line_size: Controls the thickness of the connection lines. Default = 0.5.
theme: Adjusts the theme using 1 of 6 predefined "complete" theme templates provided by ggplot2. Currently supported options are: "classic", "bw" (the elucidate default), "grey" (the ggplot2 default), "light", "dark", & "minimal". See theme_bw for more information.
text_size: This controls the size of all plot text. Default = 14.
font: This controls the font of all plot text. Default = "sans" (Arial). Other options include "serif" (Times New Roman) and "mono" (Courier New).
coord_flip: Set to TRUE (default = FALSE) if you want to swap the x and y axes. See coord_flip for more information.
omit_legend: Set to TRUE if you want to remove/omit the legends.
legend_position: This allows you to modify the legend position. Options include "right" (the default), "left", "top", & "bottom".
facet_var: Use if you want separate plots for each level of a grouping variable (i.e. a faceted plot), e.g. facet_var = "grouping_variable" or facet_var = grouping_variable. See facet_wrap for details.
facet_var_order: If a variable has been assigned for faceting using facet_var, this allows you to modify the order of the variable groups, e.g. facet_var = grouping_variable, facet_var_order = c("group_2", "group_1"). See fct_relevel for details.
facet_var_labs: If a variable has been assigned for faceting using facet_var, this allows you to modify the labels of the variable groups which will appear in the facet strips, e.g. facet_var = grouping_variable, facet_var_labs = c("group_1_new_label" = "group_1_old_label", "group_2_new_label" = "group_2_old_label"). See fct_recode for details.
facet_var_strip_position: If a variable has been assigned for faceting using facet_var, this allows you to modify the position of the facet strip labels. Sensible options include "top" (the default) or "bottom".
facet_var_text_bold: If a variable has been assigned for faceting using facet_var, this allows you to use boldface (TRUE/default or FALSE) for the facet strip label text.
print_stats: Set to TRUE (the default is FALSE) if you would like the values used for plotting and info on the y variable sample size(s) & missing values to be printed as the function is executed. To save/extract these values, use output = "ps" instead.
aesthetic_options: If set to TRUE, opens a web browser to the tidyverse online aesthetic options vignette.
output: Set to "p" if you only want the ggplot or plotly object (depending on whether interactive is FALSE or TRUE) to be returned. Set to "ps" if you would instead like a list to be returned containing both the "plot" as a ggplot2/plotly object and the "stats" used to produce it as a tibble. This allows you to extract/save the values for subsequent reporting or use.
interactive: Determines whether a static ggplot object or an interactive html plotly object is returned. See ggplotly for details.
na.rm: This determines whether missing values (NAs) should be removed before attempting to calculate the summary statistics used for plotting.
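
As a rough illustration of the error bar options described for stat = "mean", the bounds can be reproduced in base R. This is a minimal sketch and not the package's internal code: the helper name mean_error_bounds is hypothetical, and it assumes the normal-approximation CI takes the form mean +/- qnorm(1 - (1 - ci_level) / 2) * standard error, as the description of the error argument suggests. The "var" option is left out because the source does not spell out how its bounds are formed.

```r
# Hypothetical helper (not part of elucidate): lower/upper error bar bounds
# around the mean of a single numeric vector.
mean_error_bounds <- function(y, error = c("ci", "sd", "se"), ci_level = 0.95,
                              na.rm = TRUE) {
  error <- match.arg(error)
  if (na.rm) y <- y[!is.na(y)]
  m  <- mean(y)
  s  <- sd(y)
  se <- s / sqrt(length(y))
  half_width <- switch(error,
    sd = s,
    se = se,
    # assumed normal-approximation CI: critical z value times the standard error
    ci = qnorm(1 - (1 - ci_level) / 2) * se
  )
  c(lower = m - half_width, mean = m, upper = m + half_width)
}

# Bounds for mpg within each level of cyl in the built-in mtcars data;
# one row per group, with columns lower, mean, and upper
bounds <- t(sapply(split(mtcars$mpg, mtcars$cyl), mean_error_bounds))
bounds
```

Passing a different ci_level (e.g. 0.8) or error = "se" to the helper narrows the bounds in the same way that changing the corresponding plot_stat_error arguments would be expected to shrink the plotted error bars.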

Value

A ggplot object or plotly object, depending on whether static or interactive output was requested. This is returned as a standalone object if output = "p", or as the "plot" component of a list also containing the underlying "stats" as a 2nd component if output = "ps".

References

Wickham, H. (2016). ggplot2: elegant graphics for data analysis. New York, N.Y.: Springer-Verlag.

Altman, D. G., & Bland, J. M. (2005). Standard deviations and standard errors. BMJ, 331(7521), 903.

Efron, B. (1987). Better bootstrap confidence intervals. Journal of the American Statistical Association, 82(397), 171-185.

Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. New York: Chapman & Hall.

Author

Craig P. Hutton, craig.hutton@gov.bc.ca

Examples

data(mtcars) #load the mtcars data

plot_stat_error(mtcars, y = mpg, x = cyl, colour = "blue")


plot_stat_error(mtcars, y = mpg, x = cyl, colour = "blue", geom = "point")


# \donttest{

plot_stat_error(pdata, y = y1, x = d, colour_var = g, print_stats = TRUE,
                geom = "point", p_size = 3,
                add_lines = TRUE,
                dodge_width = 0,
                alpha = 0.6)
#> # A tibble: 60 × 9
#>    d          g     cases     n    na  p_na lower  mean upper
#>    <chr>      <chr> <int> <int> <int> <dbl> <dbl> <dbl> <dbl>
#>  1 2008-01-01 e       196   196     0     0  98.8 100.   101.
#>  2 2008-01-01 c       185   185     0     0  99.2 101.   102.
#>  3 2008-01-01 d       198   198     0     0  99.2 101.   102.
#>  4 2008-01-01 a       216   216     0     0  98.5  99.8  101.
#>  5 2008-01-01 b       205   205     0     0  99.8 101.   103.
#>  6 2009-01-01 e       196   196     0     0 118.  120.   121.
#>  7 2009-01-01 c       185   185     0     0 114.  116.   117.
#>  8 2009-01-01 d       198   198     0     0 118.  120.   121.
#>  9 2009-01-01 a       216   216     0     0 108.  110.   111.
#> 10 2009-01-01 b       205   205     0     0 114.  115.   117.
#> 11 2010-01-01 e       196   196     0     0 148.  149.   150.
#> 12 2010-01-01 c       185   185     0     0 118.  119.   121.
#> 13 2010-01-01 d       198   198     0     0 129.  130.   131.
#> 14 2010-01-01 a       216   216     0     0 111.  112.   113.
#> 15 2010-01-01 b       205   205     0     0 113.  114.   115.
#> 16 2011-01-01 e       196   196     0     0 130.  131.   133.
#> 17 2011-01-01 c       185   185     0     0 131.  132.   133.
#> 18 2011-01-01 d       198   198     0     0 149.  151.   152.
#> 19 2011-01-01 a       216   216     0     0 119.  120.   121.
#> 20 2011-01-01 b       205   205     0     0 120.  121.   122.
#> 21 2012-01-01 e       196   196     0     0 135.  136.   137.
#> 22 2012-01-01 c       185   185     0     0 138.  139.   141.
#> 23 2012-01-01 d       198   198     0     0 149.  151.   152.
#> 24 2012-01-01 a       216   216     0     0 109.  110.   111.
#> 25 2012-01-01 b       205   205     0     0 120.  121.   122.
#> 26 2013-01-01 e       196   196     0     0 144.  146.   147.
#> 27 2013-01-01 c       185   185     0     0 149.  150.   152.
#> 28 2013-01-01 d       198   198     0     0 159.  160.   162.
#> 29 2013-01-01 a       216   216     0     0 119.  120.   122.
#> 30 2013-01-01 b       205   205     0     0 129.  131.   132.
#> 31 2014-01-01 e       196   196     0     0 154.  156.   157.
#> 32 2014-01-01 c       185   185     0     0 179.  181.   183.
#> 33 2014-01-01 d       198   198     0     0 189.  190.   192.
#> 34 2014-01-01 a       216   216     0     0 144.  146.   147.
#> 35 2014-01-01 b       205   205     0     0 158.  160.   161.
#> 36 2015-01-01 e       196   196     0     0 117.  118.   119.
#> 37 2015-01-01 c       185   185     0     0 208.  210.   212.
#> 38 2015-01-01 d       198   198     0     0 198.  199.   201.
#> 39 2015-01-01 a       216   216     0     0 160.  161.   162.
#> 40 2015-01-01 b       205   205     0     0 178.  179.   181.
#> 41 2016-01-01 e       196   196     0     0 135.  137.   138.
#> 42 2016-01-01 c       185   185     0     0 218.  220.   221.
#> 43 2016-01-01 d       198   198     0     0 210.  211.   213.
#> 44 2016-01-01 a       216   216     0     0 148.  149.   151.
#> 45 2016-01-01 b       205   205     0     0 184.  185.   187.
#> 46 2017-01-01 e       196   196     0     0 145.  146.   147.
#> 47 2017-01-01 c       185   185     0     0 249.  251.   252.
#> 48 2017-01-01 d       198   198     0     0 224.  225.   227.
#> 49 2017-01-01 a       216   216     0     0 169.  170.   171.
#> 50 2017-01-01 b       205   205     0     0 194.  195.   197.
#> 51 2018-01-01 e       196   196     0     0 129.  131.   132.
#> 52 2018-01-01 c       185   185     0     0 245.  246.   248.
#> 53 2018-01-01 d       198   198     0     0 218.  220.   222.
#> 54 2018-01-01 a       216   216     0     0 147.  149.   151.
#> 55 2018-01-01 b       205   205     0     0 199.  201.   202.
#> 56 2019-01-01 e       196   196     0     0 148.  149.   151.
#> 57 2019-01-01 c       185   185     0     0 259.  261.   263.
#> 58 2019-01-01 d       198   198     0     0 228.  230.   232.
#> 59 2019-01-01 a       216   216     0     0 159.  160.   162.
#> 60 2019-01-01 b       205   205     0     0 194.  196.   197.


plot_stat_error(pdata, y = y1, x = g, coord_flip = TRUE,
                fill_var = g, geom = "point", eb_size = 0.6,
                alpha = 0.6)


plot_stat_error(pdata, y = y1, x = g, fill = "blue", alpha = 0.6,
                stat = "median", error = "quartile")


plot_stat_error(pdata, y = y1, x = g, fill = "blue", alpha = 0.6,
                  stat = "median", error = "ci")


plot_stat_error(pdata, y = y1, x = g, fill = "blue", alpha = 0.6,
                stat = "mean", error = "ci", ci_level = 0.8,
                interactive = TRUE)


#when output = "ps" the plot is stored as the 1st element of a
#list

out <- plot_stat_error(pdata, y = y1, x = g, fill = "blue", alpha = 0.6,
                       stat = "mean", error = "ci", output = "ps")

out$plot #print the plot to the appropriate active graphics device


out$stats #print the descriptive summary table with the values used for plotting to the console
#> # A tibble: 5 × 8
#>   g     cases     n    na  p_na lower  mean upper
#>   <chr> <int> <int> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 e      2352  2352     0     0  134.  135.  136.
#> 2 c      2220  2220     0     0  175.  177.  180.
#> 3 d      2376  2376     0     0  172.  174.  176.
#> 4 a      2592  2592     0     0  133.  134.  135.
#> 5 b      2460  2460     0     0  150.  152.  153.
# }
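
The stat = "median" examples above rely on bootstrapped confidence intervals or quartiles. The following is a minimal sketch of that kind of calculation done directly with the boot package (a recommended package distributed with R), which provides the boot.ci function mentioned under ci_type. It is an illustration under assumptions, not the code used by plot_stat_error or median_ci: the helper name median_stat and the seed value are arbitrary choices made for this example.

```r
library(boot)  # recommended package distributed with R

set.seed(2021)  # arbitrary seed so the resampling is reproducible

# Statistic in the form boot() expects: the data plus a vector of resampled indices
median_stat <- function(x, idx) median(x[idx])

y <- mtcars$mpg

# Draw 2000 bootstrap replicates of the median (cf. replicates = 2000)
boot_out <- boot(data = y, statistic = median_stat, R = 2000)

# 95 percent percentile interval (cf. ci_level = 0.95, ci_type = "perc")
ci <- boot.ci(boot_out, conf = 0.95, type = "perc")

# ci$percent holds the confidence level, two order-statistic indices,
# and then the lower and upper interval bounds
c(lower = ci$percent[4], median = boot_out$t0, upper = ci$percent[5])

# For comparison, the 25th and 75th percentiles used when error = "quartile"
quantile(y, probs = c(0.25, 0.75))
```

Changing type to "bca", "norm", or "basic" in the boot.ci call returns the other interval types listed under ci_type, each stored in a differently named component of the result. boot() also accepts parallel and ncpus arguments, which is presumably what the parallel and cores arguments documented above control, although the source does not confirm that mapping.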
