stat_ci returns bootstrapped confidence intervals for a user-specified summary statistic of a numeric vector. It provides a simplified user interface to the boot and boot.ci functions of the boot package, similar to the one.boot function of the simpleboot package, but it retains more of the boot package's functionality, most notably the options for parallelization. For convenience, when operating in parallel, the user's operating system is detected automatically so that the appropriate parallelization engine of the parallel package is used ("snow" on Windows, "multicore" otherwise). Confidence intervals for the mean or median can be obtained more easily with the convenience functions mean_ci and median_ci.
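To show the kind of boot and boot.ci calls being wrapped, here is a minimal sketch written directly against the boot package for the standard deviation, requesting all four interval types that ci_type exposes. It is an illustration under stated assumptions, not the source of stat_ci: the helper name sd_stat, the seed, and the sample size are chosen here purely for illustration. Per the boot.ci documentation, the "normal" matrix has 3 columns (level, lower, upper) while the other interval matrices have 5, with the endpoints in the last two.

```r
library(boot)

set.seed(123)  # arbitrary seed so the sketch is reproducible
y <- rnorm(1000, mean = 100, sd = 10)

# boot() calls the statistic as statistic(data, indices), so the summary
# function is wrapped to evaluate it on each resample (hypothetical helper name)
sd_stat <- function(data, indices) sd(data[indices])

b <- boot(data = y, statistic = sd_stat, R = 2000)

# request the four interval types corresponding to ci_type's options
ci <- boot.ci(b, conf = 0.95, type = c("norm", "basic", "perc", "bca"))

# endpoints are the last two columns of each interval matrix
limits <- rbind(
  norm  = ci$normal[1, 2:3],
  basic = ci$basic[1, 4:5],
  perc  = ci$percent[1, 4:5],
  bca   = ci$bca[1, 4:5]
)
colnames(limits) <- c("lower", "upper")
limits
```

With a sample this large the four intervals should be close to one another, which is consistent with the note under ci_type that BCa offers less advantage as the sample size grows.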
stat_ci(
y,
stat,
...,
replicates = 2000,
ci_level = 0.95,
ci_type = c("bca", "perc", "basic", "norm"),
parallel = FALSE,
cores = NULL,
na.rm = TRUE
)
y: A numeric vector/variable (required).
stat: The unquoted name (e.g. mean, not "mean") of the summary statistic function for which to calculate confidence intervals. Only functions that return a single value and operate on numeric variables are currently supported.
...: Any number of additional named arguments passed to the stat function for further customization.
replicates: The number of bootstrap replicates to use. The default is 2,000, as recommended by Efron & Tibshirani (1993). For publications, or if you need more precise estimates, more replicates (e.g. >= 5,000) are recommended. N.B. more replicates will take longer to run. If you get the error "estimated adjustment 'a' is NA" when ci_type is set to "bca", try again with more replicates.
ci_level: The confidence level to use for constructing confidence intervals. The default, ci_level = 0.95, gives 95 percent CIs.
ci_type: The type of confidence interval to calculate from the bootstrap samples. All options available in the underlying boot.ci function except studentized intervals are implemented: "norm" for an approximation based on the normal distribution, "perc" for percentile, "basic" for basic, and "bca" for bias-corrected and accelerated. BCa intervals are the default because they tend to provide the most accurate/least-biased results (Efron, 1987); however, they take more time to calculate and may not be much better than the other methods for large samples (e.g. >= 100,000 rows of data). See boot.ci for details.
parallel: Set to TRUE to use multiple cores or FALSE (the default) to use a single core. Parallel processing carries some overhead, so speed gains may not be noticeable for smaller samples, and it may even take longer than sequential processing. Because of the underlying parallelization architecture, performance gains will likely be greater on non-Windows machines, which can use the "multicore" implementation instead of "snow". This option only works on machines with more than one logical processing core.
cores: If parallel is set to TRUE, this determines the number of cores to use. To see how many cores are available on your machine, use parallel::detectCores().
na.rm: Should missing values be removed before attempting to calculate the chosen statistic and confidence intervals? Default is TRUE.
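The arguments above can be tied together with a small, self-contained sketch in base R. perc_boot_ci is a hypothetical helper written for illustration only: it implements just the percentile ("perc") method, runs sequentially, and is not how stat_ci itself is implemented. It shows what replicates, ci_level, na.rm, and ... control: missing values are dropped first, y is resampled with replacement replicates times, stat (plus any extra named arguments) is applied to each resample, and the alpha/2 and 1 - alpha/2 quantiles of the bootstrap distribution are returned, where alpha = 1 - ci_level.

```r
# perc_boot_ci is a hypothetical, simplified helper (not part of any package).
# It resamples y with replacement `replicates` times, applies `stat` to each
# resample, and returns percentile limits for the requested ci_level.
perc_boot_ci <- function(y, stat, ..., replicates = 2000, ci_level = 0.95,
                         na.rm = TRUE) {
  if (na.rm) y <- y[!is.na(y)]  # drop missing values before resampling
  n <- length(y)
  boot_dist <- vapply(seq_len(replicates), function(i) {
    stat(y[sample.int(n, n, replace = TRUE)], ...)  # extra args go to stat
  }, numeric(1))  # require exactly one numeric value per resample
  alpha <- 1 - ci_level
  limits <- quantile(boot_dist, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
  c(stat = stat(y, ...), lower = limits[1], upper = limits[2])
}

set.seed(42)
x <- c(rnorm(500, mean = 100, sd = 10), NA, NA)

# 90th percentile with a 90% CI; probs and names are forwarded through ...
perc_boot_ci(x, quantile, probs = 0.9, names = FALSE, ci_level = 0.90)
```

Because vapply() requires each resample to yield exactly one number, a stat that returns several values fails with an error, which mirrors the single-value restriction described for the stat argument.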
Efron, B. (1987). Better bootstrap confidence intervals. Journal of the American Statistical Association, 82(397), 171-185.
Efron, B., & Tibshirani, R. J. (1993). An Introduction to the Bootstrap. New York: Chapman & Hall.
y1 <- rnorm(10000, mean = 100, sd = 10)
# using a single core (sequential processing);
# output varies between runs since no random seed is set
stat_ci(y1, stat = sd, ci_type = "perc")
#>        sd     lower     upper
#>  9.968904  9.826940 10.111037
if (FALSE) {
  # not run: using multiple cores (parallel processing)
  stat_ci(y1, stat = sd, parallel = TRUE, cores = 2, ci_type = "perc")
}
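The parallel example above is wrapped in if (FALSE), so it is not run automatically. As a hedged sketch of the automatic operating-system detection described at the top, the code below picks boot()'s parallel engine from .Platform$OS.type and caps the number of cores. It calls boot() directly; the names engine, n_cores, and sd_stat are illustrative assumptions, and this is not the source of stat_ci. boot()'s parallel argument accepts "no", "multicore", or "snow", and ncpus sets the number of processes; with "snow" and no cluster supplied, boot() creates a temporary local cluster.

```r
library(boot)
library(parallel)

# Choose boot()'s parallel engine from the OS:
# "snow" on Windows (no forking), "multicore" elsewhere.
engine <- if (.Platform$OS.type == "windows") "snow" else "multicore"

# Use at most 2 cores; detectCores() can return NA on some platforms.
n_cores <- detectCores()
n_cores <- if (is.na(n_cores)) 1L else min(2L, n_cores)

# hypothetical helper: boot() calls statistic(data, indices)
sd_stat <- function(data, indices) sd(data[indices])

set.seed(123)  # arbitrary seed for the simulated data
y1 <- rnorm(10000, mean = 100, sd = 10)

# fall back to sequential processing when only one core is available
b <- boot(y1, statistic = sd_stat, R = 2000,
          parallel = if (n_cores > 1) engine else "no",
          ncpus = n_cores)
ci <- boot.ci(b, conf = 0.95, type = "perc")
ci
```

Capping the core count and falling back to "no" keeps the sketch usable on single-core machines, in line with the note under parallel that the option needs more than one logical core.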