R/boot_ci.R
mean_ci.Rd
mean_ci returns confidence intervals for the mean of a numeric vector.
Bootstrapping can give more robust interval estimates when the sample size
is small (e.g. n = 10); otherwise the interval can be calculated from a
theoretical normal distribution. The usual calculation based on quantiles of
the theoretical distribution is what this function performs with the default
ci_type = "norm". This function provides a simplified user interface to the
boot and boot.ci functions, similar to the one.boot function, but retains
more of the boot package's functionality, most notably the options for
parallelization. For convenience, when operating in parallel the user's
operating system is detected automatically so that the parallel package uses
the appropriate parallelization engine (e.g. snow for Windows, multicore
otherwise). Since the mean and median are common descriptive statistics for
which confidence intervals are estimated, they have their own dedicated
functions. To obtain bootstrapped confidence intervals for other summary
statistics, use stat_ci instead.
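The wrapping described above can be sketched directly with the boot package. This is a minimal illustration, not the source code of mean_ci: the names boot_mean, engine, use_parallel, and n_cores are hypothetical, and the way the engine is chosen from the operating system is an assumption about how such detection could work.

```r
library(boot)

# Statistic in the form boot() expects: the data plus a vector of
# resampled indices.
boot_mean <- function(data, indices) mean(data[indices])

# Choose the parallel engine from the operating system
# (assumed approach: snow on Windows, multicore elsewhere).
engine <- if (.Platform$OS.type == "windows") "snow" else "multicore"

# Hypothetical settings for illustration; set use_parallel to TRUE
# to run the resampling on n_cores cores.
use_parallel <- FALSE
n_cores <- 2

set.seed(123)
y <- rnorm(50, mean = 100, sd = 10)

boot_out <- boot(
  data = y,
  statistic = boot_mean,
  R = 2000,
  parallel = if (use_parallel) engine else "no",
  ncpus = n_cores
)

ci <- boot.ci(boot_out, conf = 0.95, type = "perc")

# For type = "perc", the last two entries of ci$percent are the bounds.
result <- c(mean = boot_out$t0, lower = ci$percent[4], upper = ci$percent[5])
result
```

The statistic must accept an index vector because boot() resamples by passing indices rather than resampled data.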
mean_ci(
y,
replicates = 2000,
ci_level = 0.95,
ci_type = c("norm", "perc", "bca", "basic"),
parallel = FALSE,
cores = NULL,
na.rm = TRUE
)
y
A vector/variable (required).
replicates
The number of bootstrap replicates to use. Default is 2,000, as recommended by Efron & Tibshirani (1993). For publications, or if you need more precise estimates, more replications (e.g. >= 5,000) are recommended. N.B. more replications take longer to run. If you get the error "estimated adjustment 'a' is NA" when ci_type is set to "bca", try again with more replications.
ci_level
The confidence level to use for constructing confidence intervals.
Default is ci_level = 0.95 for 95 percent CIs.
ci_type
The type of confidence intervals to calculate from the
bootstrap samples. Most of the options available in the underlying boot.ci
function are implemented (all except studentized intervals): "norm" for
calculation based on a theoretical normal distribution, "perc" for
percentile, "basic" for basic, and "bca" for bias-corrected and
accelerated. See boot.ci for details regarding options other than "norm".
Because normal confidence intervals for the mean can be calculated directly
from quantiles of the theoretical Gaussian distribution, that method is used
in place of bootstrapping for this one case (CIs for the mean) when ci_type
is set to "norm" (the default), since it is much faster.
parallel
Set to TRUE to use multiple cores or FALSE (the default) to use one. Operating in parallel involves some processing overhead, so speed gains may not be noticeable for smaller samples, and parallel runs may even take longer than sequential processing. Due to the nature of the underlying parallelization architecture, performance gains will likely be greater on non-Windows machines that can use the "multicore" implementation instead of "snow". This option only works on machines with more than 1 logical processing core.
cores
If parallel is set to TRUE, this determines the number of cores to use. To see how many cores are available on your machine, use parallel::detectCores().
na.rm
Should missing values be removed before attempting to calculate the mean and confidence intervals? Default is TRUE.
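The "norm" calculation described for ci_type can be sketched in a few lines of base R. This is an illustrative helper under stated assumptions, not the package's implementation: the name normal_mean_ci is hypothetical, and the use of qnorm (rather than quantiles of a t distribution) follows the description's reference to the theoretical Gaussian distribution.

```r
# Normal-theory confidence interval for the mean (hypothetical helper,
# not the package's source code).
normal_mean_ci <- function(y, ci_level = 0.95, na.rm = TRUE) {
  if (na.rm) y <- y[!is.na(y)]           # drop missing values first
  n <- length(y)
  m <- mean(y)
  se <- sd(y) / sqrt(n)                  # standard error of the mean
  z <- qnorm(1 - (1 - ci_level) / 2)     # two-sided Gaussian quantile
  c(mean = m, lower = m - z * se, upper = m + z * se)
}

normal_mean_ci(c(12, 15, NA, 11, 14, 13))
```

No resampling is involved, which is why this route is so much faster than the bootstrap options.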
Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. New York: Chapman & Hall.
y1 <- rnorm(10000, 100, 10)
#using a single core (sequential processing)
mean_ci(y1)
#> mean lower upper
#> 100.02908 99.83368 100.22448
mean_ci(y1, ci_type = "perc")
#> mean lower upper
#> 100.02908 99.83775 100.22495
#using multiple cores (parallel processing)
mean_ci(y1, parallel = TRUE, cores = 2, ci_type = "perc")
#> mean lower upper
#> 100.02908 99.83728 100.22711