This function extends describe_na by applying it to all columns in a data frame using functional programming tools from the purrr package (e.g. map). To obtain a summary of missing values for a single variable in a data frame, use describe_na instead. Because it skips the additional summary statistics that describe_all calculates, this function is a more efficient way of checking for missing values.

describe_na_all(data, ..., digits = 4, output = c("dt", "tibble"))

Arguments

data

A data frame or tibble.

...

This special argument accepts any number of unquoted grouping variable names (also present in the data source) to use for subsetting, separated by commas, e.g. group_var1, group_var2. Also accepts a character vector of column names or index numbers, e.g. c("group_var1", "group_var2") or c(1, 2), but not a mixture of formats in the same call. If no column names are specified, all columns will be used.

digits

The number of digits used to round the "p_na" column in the output.

output

Output type: "dt" for data.table or "tibble" for tibble.
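The calls below are a brief sketch of how the arguments described above might be combined. It assumes the elucidate package is installed and that cyl is an acceptable grouping variable for mtcars; the object names by_cyl and as_tbl are only illustrative, and no output is shown because it depends on the installed version.

```r
# Guarded so the snippet is skipped if elucidate is not available.
if (requireNamespace("elucidate", quietly = TRUE)) {
  library(elucidate)

  # Missing-value summary for each column, split by an unquoted grouping variable.
  by_cyl <- describe_na_all(mtcars, cyl)

  # Same summary without grouping, returned as a tibble with p_na rounded to 2 digits.
  as_tbl <- describe_na_all(mtcars, digits = 2, output = "tibble")
}
```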

Value

A tibble or data.table with the following columns in addition to any specified grouping variables:

cases

the total number of cases

n

the number of complete cases

na

the number of missing values

p_na

the proportion of total cases with missing values
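To make the relationship between these columns concrete, here is a minimal base R sketch that computes the same quantities for every column of a data frame. The helper na_summary is hypothetical and is not part of elucidate; it assumes that n counts the non-missing values in a column, so that n + na equals cases, and that p_na is na divided by cases. It does not reproduce the package's grouping or output options.

```r
# Hypothetical helper (not part of elucidate): per-column missing-value summary.
na_summary <- function(data, digits = 4) {
  cases <- nrow(data)                                   # total rows
  na <- vapply(data, function(x) sum(is.na(x)), integer(1))  # NAs per column
  data.frame(
    variable = names(data),
    cases = cases,
    n = cases - na,                                     # assumed: non-missing count
    na = na,
    p_na = round(na / cases, digits),                   # proportion missing
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# airquality ships with base R and contains missing values in some columns.
res <- na_summary(airquality)
print(res)
```

Using a data set with genuine missing values, such as airquality, shows non-zero entries in the na and p_na columns, which the mtcars example does not.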

Author

Craig P. Hutton, craig.hutton@gov.bc.ca

Examples


describe_na_all(mtcars)
#>     variable cases  n na p_na
#>  1:      mpg    32 32  0    0
#>  2:      cyl    32 32  0    0
#>  3:     disp    32 32  0    0
#>  4:       hp    32 32  0    0
#>  5:     drat    32 32  0    0
#>  6:       wt    32 32  0    0
#>  7:     qsec    32 32  0    0
#>  8:       vs    32 32  0    0
#>  9:       am    32 32  0    0
#> 10:     gear    32 32  0    0
#> 11:     carb    32 32  0    0