Obtain a summary of missing values for a vector or data frame column. When assessing missing values is the main goal, this function is a more efficient alternative to describe. It combines tidyverse packages with data.table to provide a pipe-friendly interface that retains the performance of data.table. The ... argument makes it easy to obtain summaries split by grouping variables. To summarize missing values for all variables in a data frame, use describe_na_all instead.

describe_na(data, y = NULL, ..., digits = 4, output = c("dt", "tibble"))

Arguments

data

A vector, or a data frame or tibble containing the vector ("y") to be summarized and any grouping variables.

y

If the data object is a data.frame, this is the variable for which you wish to obtain a summary of missing values. You can use either the quoted or unquoted name of the variable, e.g. "y_var" or y_var.

...

If the data object is a data.frame, this special argument accepts any number of unquoted grouping variable names (also present in the data source) to use for subsetting, separated by commas, e.g. group_var1, group_var2. Also accepts a character vector of column names or index numbers, e.g. c("group_var1", "group_var2") or c(1, 2), but not a mixture of formats in the same call. If no column names are specified, all columns will be used.

digits

The number of digits used to round the "p_na" column in the output.

output

Output type: "dt" for a data.table or "tibble" for a tibble.
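The argument forms described above can be combined as in the following sketch. It assumes that describe_na() is provided by the elucidate package and that the package is installed. The scores data frame and its column names are hypothetical and exist only to give the function some missing values to count. Output is not shown because it depends on the installed version.

```r
library(elucidate)  # assumption: describe_na() comes from this package

# Hypothetical toy data with missing values in `score`
scores <- data.frame(
  site  = c("a", "a", "b", "b", "b"),
  wave  = c(1, 2, 1, 2, 2),
  score = c(10.5, NA, 8.2, NA, NA)
)

# One unquoted grouping variable passed through `...`
by_site <- describe_na(scores, score, site)

# Several grouping variables given as a character vector
by_site_wave <- describe_na(scores, score, c("site", "wave"))

# Round p_na to 2 digits and return a tibble instead of a data.table
site_tbl <- describe_na(scores, score, site, digits = 2, output = "tibble")

# Because `data` is the first argument, the call also works in a pipeline
# (native pipe, requires R >= 4.1)
piped <- scores |> describe_na(score, site)
```

Each call should return one row per combination of grouping values, with the grouping columns followed by the summary columns.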

Value

A tibble or data.table with the following columns in addition to any specified grouping variables:

cases

the total number of cases

n

the number of complete (non-missing) cases

na

the number of missing values

p_na

the proportion of total cases with missing values
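To make the relationship between these columns concrete, here is a minimal base R sketch that reproduces them for a single vector. The helper na_summary() is hypothetical and is not part of the package. It illustrates the column definitions only and is not the package's actual data.table-based implementation.

```r
# Hypothetical helper (not part of the package): mirrors the documented
# output columns for a single vector using base R only.
na_summary <- function(x, digits = 4) {
  cases <- length(x)       # total number of cases
  na    <- sum(is.na(x))   # number of missing values
  data.frame(
    cases = cases,
    n     = cases - na,                 # non-missing cases
    na    = na,
    p_na  = round(na / cases, digits)   # proportion of cases that are missing
  )
}

x <- c(2.5, NA, 4.1, NA, 7.0, 3.3)
na_summary(x)
# cases = 6, n = 4, na = 2, p_na = 0.3333
```

Note that n and na always sum to cases, and p_na is na divided by cases, rounded to digits.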

Author

Craig P. Hutton, craig.hutton@gov.bc.ca

Examples


describe_na(data = mtcars, y = mpg) #data frame column input method
#>    cases  n na p_na
#> 1:    32 32  0    0

describe_na(mtcars$mpg) #vector input method
#>    cases  n na p_na
#> 1:    32 32  0    0