Clean up a data frame by parsing/updating column classes, converting column names to a common case for easier use, and remove empty rows & columns. A convenience wrapper for some helpful routine cleaning functions from the janitor, readr, & tibble packages.
A messy data frame that contains inappropriate column classifications, inconsistently structured column names, empty rows/columns
If TRUE (default), applies
clean_names
to reformat column names according to
the specified case.
The case/format you want column names to be converted to if clean_names = TRUE. Default is snake_case.
If TRUE (the default), applies
remove_empty
to remove empty rows &/or columns as
per remove_which
Either "rows" to remove empty rows, "cols" to remove empty columns, or c("rows", "cols") to remove both (the default).
If TRUE (the default), applies parse_guess
to each column in data to guess the appropriate column classes and update
them accordingly.
If TRUE, will classify variables containing whole numbers as integer, otherwise they are classified as the more general double/numeric class.
A character vector of values that should be read as missing/NA when parse = TRUE. Default is c("", "NA").
If TRUE, applies
rownames_to_column
to add the row names of data as a
column. This is often helpful when cleaning up a data frame or tibble that
used to be a matrix with row names.
If rownames_to_column = TRUE, this specifies the name of the new column to store the row names in.
If TRUE, applies
column_to_rownames
to use the values of a column as
the row names for the data object.
If column_to_rownames = TRUE, this specifies the column containing the names you want to assign to the rows of the data object.
An updated version of the input data, modified according to the chosen options.
data(mtcars)
mtcars$`Extra Column` <- rep(NA, length.out = nrow(mtcars)) #add an empty column
mtcars[33:50,] <- NA #add some missing rows
mtcars #now mtcars is messy & more like a real raw data set
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
#> Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
#> Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
#> Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
#> Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
#> Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
#> Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
#> Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
#> Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
#> Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
#> Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
#> Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
#> Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
#> Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
#> Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
#> Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
#> Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
#> Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
#> Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
#> AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
#> Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
#> Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
#> Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
#> Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
#> Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
#> Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
#> Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
#> Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
#> Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
#> 33 NA NA NA NA NA NA NA NA NA NA NA
#> 34 NA NA NA NA NA NA NA NA NA NA NA
#> 35 NA NA NA NA NA NA NA NA NA NA NA
#> 36 NA NA NA NA NA NA NA NA NA NA NA
#> 37 NA NA NA NA NA NA NA NA NA NA NA
#> 38 NA NA NA NA NA NA NA NA NA NA NA
#> 39 NA NA NA NA NA NA NA NA NA NA NA
#> 40 NA NA NA NA NA NA NA NA NA NA NA
#> 41 NA NA NA NA NA NA NA NA NA NA NA
#> 42 NA NA NA NA NA NA NA NA NA NA NA
#> 43 NA NA NA NA NA NA NA NA NA NA NA
#> 44 NA NA NA NA NA NA NA NA NA NA NA
#> 45 NA NA NA NA NA NA NA NA NA NA NA
#> 46 NA NA NA NA NA NA NA NA NA NA NA
#> 47 NA NA NA NA NA NA NA NA NA NA NA
#> 48 NA NA NA NA NA NA NA NA NA NA NA
#> 49 NA NA NA NA NA NA NA NA NA NA NA
#> 50 NA NA NA NA NA NA NA NA NA NA NA
#> Extra Column
#> Mazda RX4 NA
#> Mazda RX4 Wag NA
#> Datsun 710 NA
#> Hornet 4 Drive NA
#> Hornet Sportabout NA
#> Valiant NA
#> Duster 360 NA
#> Merc 240D NA
#> Merc 230 NA
#> Merc 280 NA
#> Merc 280C NA
#> Merc 450SE NA
#> Merc 450SL NA
#> Merc 450SLC NA
#> Cadillac Fleetwood NA
#> Lincoln Continental NA
#> Chrysler Imperial NA
#> Fiat 128 NA
#> Honda Civic NA
#> Toyota Corolla NA
#> Toyota Corona NA
#> Dodge Challenger NA
#> AMC Javelin NA
#> Camaro Z28 NA
#> Pontiac Firebird NA
#> Fiat X1-9 NA
#> Porsche 914-2 NA
#> Lotus Europa NA
#> Ford Pantera L NA
#> Ferrari Dino NA
#> Maserati Bora NA
#> Volvo 142E NA
#> 33 NA
#> 34 NA
#> 35 NA
#> 36 NA
#> 37 NA
#> 38 NA
#> 39 NA
#> 40 NA
#> 41 NA
#> 42 NA
#> 43 NA
#> 44 NA
#> 45 NA
#> 46 NA
#> 47 NA
#> 48 NA
#> 49 NA
#> 50 NA
#clean it up and convert the row names to a column
mtcars <- wash_df(mtcars, rownames_to_column = TRUE, col_name = "car")
mtcars #the empty rows and column are gone, huzzah! So is that awkard column name!
#> # A tibble: 32 × 12
#> car mpg cyl disp hp drat wt qsec vs am gear carb
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Mazda RX4 21 6 160 110 3.9 2.62 16.5 0 1 4 4
#> 2 Mazda RX4 … 21 6 160 110 3.9 2.88 17.0 0 1 4 4
#> 3 Datsun 710 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
#> 4 Hornet 4 D… 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
#> 5 Hornet Spo… 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
#> 6 Valiant 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
#> 7 Duster 360 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
#> 8 Merc 240D 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
#> 9 Merc 230 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
#> 10 Merc 280 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
#> # … with 22 more rows
#or turn a column with rownames into row names
mtcars <- wash_df(mtcars, column_to_rownames = TRUE, names_col = "car")
mtcars
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
#> Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
#> Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
#> Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
#> Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
#> Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
#> Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
#> Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
#> Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
#> Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
#> Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
#> Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
#> Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
#> Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
#> Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
#> Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
#> Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
#> Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
#> Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
#> AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
#> Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
#> Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
#> Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
#> Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
#> Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
#> Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
#> Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
#> Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
#> Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2