Obtain a summary of missing values for a vector/variable. This
function is a more efficient alternative to describe when assessing
missing values is the focus of describing a variable/vector.
It combines tidyverse packages and data.table to provide a pipe-friendly
interface while leveraging the performance of data.table. The ... argument
also makes it easy to obtain summaries split by grouping variables. To
obtain summaries of missing values for all variables in a data frame, use
describe_na_all instead.
describe_na(data, y = NULL, ..., digits = 4, output = c("dt", "tibble"))
data: Either a vector, or a data frame or tibble containing the vector ("y") to be summarized and any grouping variables.
y: If the data object is a data frame, this is the variable for which you wish to obtain a summary of missing values. You can use either the quoted or unquoted name of the variable, e.g. "y_var" or y_var.
...: If the data object is a data frame, this special argument accepts
any number of unquoted grouping variable names (also present in the data
source) to use for subsetting, separated by commas, e.g. group_var1, group_var2.
Also accepts a character vector of column names or index
numbers, e.g. c("group_var1", "group_var2") or c(1, 2), but not a mixture
of formats in the same call. If no grouping variables are specified, the
summary is computed over all rows without grouping.
digits: The number of decimal places to which the "p_na" column in the output is rounded.
output: Output type: "dt" for data.table or "tibble" for tibble.
A tibble or data.table with the following columns in addition to any specified grouping variables:
cases: the total number of cases
n: the number of complete cases
na: the number of missing values
p_na: the proportion of total cases with missing values
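To make the meaning of the four summary columns concrete, here is a minimal base R sketch that computes the same quantities (cases, n, na, p_na, as named in the example output) for a single vector. The helper `na_summary()` is hypothetical and written only for illustration; it is not part of the package and does not reproduce the data.table-based implementation of describe_na.

```r
# Hypothetical helper for illustration only (not part of the package):
# computes the four documented summary columns for one vector.
na_summary <- function(x, digits = 4) {
  cases <- length(x)          # total number of cases
  na <- sum(is.na(x))         # number of missing values
  data.frame(
    cases = cases,
    n = cases - na,           # number of complete (non-missing) cases
    na = na,
    p_na = round(na / cases, digits)  # proportion missing, rounded
  )
}

x <- c(1, NA, 3, NA, 5, 6)
res <- na_summary(x)
print(res)
```

Here two of the six values are missing, so p_na is 2/6 rounded to four decimal places.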
describe_na(data = mtcars, y = mpg) # data frame column input method
#>    cases  n na p_na
#> 1:    32 32  0    0
describe_na(mtcars$mpg) # vector input method
#>    cases  n na p_na
#> 1:    32 32  0    0
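The examples above do not use grouping variables, and mtcars has no missing values. The sketch below shows how a grouped call might look, based on the argument descriptions. It assumes that describe_na is provided by the elucidate package and that the package is installed; the copy `df` and the injected missing values are made up for illustration.

```r
# Assumption: describe_na() comes from the elucidate package, installed locally.
library(elucidate)

# Copy mtcars and introduce a few missing values for illustration
df <- mtcars
df$mpg[c(1, 5, 9)] <- NA

# Unquoted grouping variables passed through the ... argument
by_cyl_am <- describe_na(df, mpg, cyl, am)

# The same grouping given as a character vector, returned as a tibble
by_cyl_am_tbl <- describe_na(df, y = "mpg", c("cyl", "am"), output = "tibble")
```

Each result should contain one row per combination of cyl and am, with the grouping columns followed by the summary columns.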