Clean up a data frame by parsing/updating column classes, converting column names to a common case for easier use, and remove empty rows & columns. A convenience wrapper for some helpful routine cleaning functions from the janitor, readr, & tibble packages.

wash_df(
  data,
  clean_names = TRUE,
  case = "snake",
  remove_empty = TRUE,
  remove_which = c("rows", "cols"),
  parse = TRUE,
  guess_integer = FALSE,
  na = c("", "NA"),
  rownames_to_column = FALSE,
  col_name = "rowname",
  column_to_rownames = FALSE,
  names_col = "rowname"
)

Arguments

data

A messy data frame that contains inappropriate column classifications, inconsistently structured column names, empty rows/columns

clean_names

If TRUE (default), applies clean_names to reformat column names according to the specified case.

case

The case/format you want column names to be converted to if clean_names = TRUE. Default is snake_case.

remove_empty

If TRUE (the default), applies remove_empty to remove empty rows &/or columns as per remove_which

remove_which

Either "rows" to remove empty rows, "cols" to remove empty columns, or c("rows", "cols") to remove both (the default).

parse

If TRUE (the default), applies parse_guess to each column in data to guess the appropriate column classes and update them accordingly.

guess_integer

If TRUE, will classify variables containing whole numbers as integer, otherwise they are classified as the more general double/numeric class.

na

A character vector of values that should be read as missing/NA when parse = TRUE. Default is c("", "NA").

rownames_to_column

If TRUE, applies rownames_to_column to add the row names of data as a column. This is often helpful when cleaning up a data frame or tibble that used to be a matrix with row names.

col_name

If rownames_to_column = TRUE, this specifies the name of the new column to store the row names in.

column_to_rownames

If TRUE, applies column_to_rownames to use the values of a column as the row names for the data object.

names_col

If column_to_rownames = TRUE, this specifies the column containing the names you want to assign to the rows of the data object.

Value

An updated version of the input data, modified according to the chosen options.

See also

Author

Craig P. Hutton, Craig.Hutton@gov.bc.ca

Examples

data(mtcars)

mtcars$`Extra Column` <- rep(NA, length.out = nrow(mtcars)) #add an empty column

mtcars[33:50,] <- NA #add some missing rows

mtcars #now mtcars is messy & more like a real raw data set
#>                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
#> Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
#> Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
#> Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
#> Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
#> Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
#> Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
#> Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
#> Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
#> Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
#> Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
#> Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
#> Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
#> Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
#> Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
#> Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
#> Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
#> Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
#> Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
#> Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
#> Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
#> AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
#> Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
#> Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
#> Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
#> Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
#> Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
#> Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
#> Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
#> Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
#> Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
#> 33                    NA  NA    NA  NA   NA    NA    NA NA NA   NA   NA
#> 34                    NA  NA    NA  NA   NA    NA    NA NA NA   NA   NA
#> 35                    NA  NA    NA  NA   NA    NA    NA NA NA   NA   NA
#> 36                    NA  NA    NA  NA   NA    NA    NA NA NA   NA   NA
#> 37                    NA  NA    NA  NA   NA    NA    NA NA NA   NA   NA
#> 38                    NA  NA    NA  NA   NA    NA    NA NA NA   NA   NA
#> 39                    NA  NA    NA  NA   NA    NA    NA NA NA   NA   NA
#> 40                    NA  NA    NA  NA   NA    NA    NA NA NA   NA   NA
#> 41                    NA  NA    NA  NA   NA    NA    NA NA NA   NA   NA
#> 42                    NA  NA    NA  NA   NA    NA    NA NA NA   NA   NA
#> 43                    NA  NA    NA  NA   NA    NA    NA NA NA   NA   NA
#> 44                    NA  NA    NA  NA   NA    NA    NA NA NA   NA   NA
#> 45                    NA  NA    NA  NA   NA    NA    NA NA NA   NA   NA
#> 46                    NA  NA    NA  NA   NA    NA    NA NA NA   NA   NA
#> 47                    NA  NA    NA  NA   NA    NA    NA NA NA   NA   NA
#> 48                    NA  NA    NA  NA   NA    NA    NA NA NA   NA   NA
#> 49                    NA  NA    NA  NA   NA    NA    NA NA NA   NA   NA
#> 50                    NA  NA    NA  NA   NA    NA    NA NA NA   NA   NA
#>                     Extra Column
#> Mazda RX4                     NA
#> Mazda RX4 Wag                 NA
#> Datsun 710                    NA
#> Hornet 4 Drive                NA
#> Hornet Sportabout             NA
#> Valiant                       NA
#> Duster 360                    NA
#> Merc 240D                     NA
#> Merc 230                      NA
#> Merc 280                      NA
#> Merc 280C                     NA
#> Merc 450SE                    NA
#> Merc 450SL                    NA
#> Merc 450SLC                   NA
#> Cadillac Fleetwood            NA
#> Lincoln Continental           NA
#> Chrysler Imperial             NA
#> Fiat 128                      NA
#> Honda Civic                   NA
#> Toyota Corolla                NA
#> Toyota Corona                 NA
#> Dodge Challenger              NA
#> AMC Javelin                   NA
#> Camaro Z28                    NA
#> Pontiac Firebird              NA
#> Fiat X1-9                     NA
#> Porsche 914-2                 NA
#> Lotus Europa                  NA
#> Ford Pantera L                NA
#> Ferrari Dino                  NA
#> Maserati Bora                 NA
#> Volvo 142E                    NA
#> 33                            NA
#> 34                            NA
#> 35                            NA
#> 36                            NA
#> 37                            NA
#> 38                            NA
#> 39                            NA
#> 40                            NA
#> 41                            NA
#> 42                            NA
#> 43                            NA
#> 44                            NA
#> 45                            NA
#> 46                            NA
#> 47                            NA
#> 48                            NA
#> 49                            NA
#> 50                            NA

#clean it up and convert the row names to a column
mtcars <- wash_df(mtcars, rownames_to_column = TRUE, col_name = "car")

mtcars #the empty rows and column are gone, huzzah! So is that awkard column name!
#> # A tibble: 32 × 12
#>    car           mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <chr>       <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 Mazda RX4    21       6  160    110  3.9   2.62  16.5     0     1     4     4
#>  2 Mazda RX4 …  21       6  160    110  3.9   2.88  17.0     0     1     4     4
#>  3 Datsun 710   22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#>  4 Hornet 4 D…  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#>  5 Hornet Spo…  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#>  6 Valiant      18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  7 Duster 360   14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#>  8 Merc 240D    24.4     4  147.    62  3.69  3.19  20       1     0     4     2
#>  9 Merc 230     22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
#> 10 Merc 280     19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
#> # … with 22 more rows

#or turn a column with rownames into row names
mtcars <- wash_df(mtcars, column_to_rownames = TRUE, names_col = "car")
mtcars
#>                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
#> Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
#> Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
#> Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
#> Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
#> Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
#> Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
#> Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
#> Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
#> Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
#> Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
#> Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
#> Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
#> Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
#> Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
#> Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
#> Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
#> Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
#> Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
#> Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
#> Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
#> AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
#> Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
#> Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
#> Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
#> Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
#> Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
#> Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
#> Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
#> Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
#> Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2