Cleans water quality data. After standardization using standardize_wqdata, replicates (two or more readings for the same variable on the same date) are averaged using FUN, which is mean by default. Readings for the same variable on the same date but at different levels of the columns specified in by are not considered replicates. The clean_wqdata function is called automatically by calc_limits before limits are calculated.
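The grouping rule for replicates can be illustrated with a minimal base-R sketch. It is not the wqbc implementation. The helper average_replicates and the toy data frame are hypothetical, and the sketch assumes Date, Variable and Value columns like those in the example output below. Readings that share Date, Variable and any columns named in by are grouped and summarised with FUN.

```r
# Hypothetical helper (not part of wqbc): collapse replicates by grouping on
# Date and Variable plus any extra columns in `by`, summarising Value with FUN.
average_replicates <- function(x, by = NULL, FUN = mean) {
  keys <- c(by, "Date", "Variable")
  stats::aggregate(x["Value"], by = x[keys], FUN = FUN)
}

# Toy data: three pH readings on the same date, two at station A, one at B.
readings <- data.frame(
  Station  = c("A", "A", "B"),
  Date     = as.Date(c("2000-01-01", "2000-01-01", "2000-01-01")),
  Variable = "pH",
  Value    = c(7, 8, 9)
)

# Without `by`, all three readings are treated as replicates (mean 8).
average_replicates(readings)

# With by = "Station", only the two station-A readings are replicates
# (station A: 7.5, station B: 9).
average_replicates(readings, by = "Station")
```

Passing FUN = median or FUN = max to the sketch changes the summary in the same way the FUN argument is described as doing.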

clean_wqdata(
  x,
  by = NULL,
  max_cv = Inf,
  sds = 10,
  ignore_undetected = TRUE,
  large_only = TRUE,
  delete_outliers = FALSE,
  remove_blanks = FALSE,
  messages = getOption("wqbc.messages", default = TRUE),
  FUN = mean
)

Arguments

x

The data.frame to clean.

by

A character vector of the columns in x by which to group the cleaning. If the data contain multiple stations, include the name of the column that holds the station IDs.

max_cv

A number indicating the maximum permitted coefficient of variation for replicates.

sds

The number of standard deviations above which a value is considered an outlier.

ignore_undetected

A flag indicating whether to ignore undetected values when calculating the average deviation and identifying outliers.

large_only

A flag indicating whether only large values that exceed the sds threshold should be identified as outliers, rather than both large and small values.

delete_outliers

A flag indicating whether to delete outliers or merely flag them.

remove_blanks

Should blanks be removed? Blanks are assumed to be denoted by a value of "Blank..." in the SAMPLE_CLASS column. Default is FALSE.

messages

A flag indicating whether to print messages.

FUN

The function to use for summaries, e.g. median, mean, or max. Default is mean.

Details

If there are three or more replicates and their coefficient of variation (CV) exceeds max_cv, the replicate with the highest absolute deviation is dropped, repeatedly, until the CV is less than or equal to max_cv or only two values remain. Because max_cv defaults to Inf, by default all values are averaged.
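The dropping rule can be sketched in base R as follows. This is an illustration, not the wqbc source: the helpers cv and drop_replicates are hypothetical, "absolute deviation" is assumed to mean distance from the current replicate mean, and the CV is assumed to use the sample standard deviation (which matches the CV values quoted below).

```r
# Hypothetical helper: coefficient of variation using the sample SD (n - 1).
cv <- function(v) stats::sd(v) / mean(v)

# Sketch of the rule: while more than two replicates remain and the CV
# exceeds max_cv, drop the reading furthest from the current mean.
drop_replicates <- function(values, max_cv = Inf) {
  # isTRUE() guards against an undefined CV (e.g. all zeros gives 0 / 0 = NaN),
  # in which case the replicates are kept as they are.
  while (length(values) > 2 && isTRUE(cv(values) > max_cv)) {
    values <- values[-which.max(abs(values - mean(values)))]
  }
  values
}

kept <- drop_replicates(c(1, 1, 10), max_cv = 1.29)
kept        # c(1, 1): the value 10 is dropped
mean(kept)  # 1, the average of the remaining replicates

# With the default max_cv = Inf nothing is dropped, so all values are averaged.
drop_replicates(c(1, 1, 10))
```

Stopping at two values means a pair of disagreeing replicates is always averaged rather than reduced to a single reading.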

For example, a max_cv value of 1.29 is exceeded by two zeros and one positive value (CV = 1.73) or by two identical positive values and a third value an order of magnitude greater (CV = 1.30). It is not exceeded by one zero and two identical positive values (CV = 0.87).
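The CV values above can be checked in base R. The helper cv is hypothetical and uses the sample standard deviation (denominator n - 1), which reproduces the quoted figures. A population standard deviation would give a different value, for example about 1.41 for two zeros and one positive value.

```r
# Hypothetical helper: coefficient of variation with the sample SD.
cv <- function(v) stats::sd(v) / mean(v)

round(cv(c(0, 0, 5)), 2)   # 1.73, exceeds max_cv = 1.29
round(cv(c(1, 1, 10)), 2)  # 1.3, exceeds max_cv = 1.29
round(cv(c(0, 5, 5)), 2)   # 0.87, does not exceed max_cv = 1.29
```

The results do not depend on the positive value chosen, because the CV is scale-invariant.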

Examples

clean_wqdata(wqbc::dummy, messages = TRUE)
#> Cleaning water quality data...
#> Identified 0 outliers in water quality data.
#> Cleansed water quality data.
#> # A tibble: 9 × 6
#>   Date       Variable              Value Units    Outlier DetectionLimit
#>   <date>     <chr>                 <dbl> <chr>    <lgl>            <dbl>
#> 1 2000-01-01 Aluminium Dissolved    7.67 mg/L     FALSE               NA
#> 2 2000-01-04 Aluminium Dissolved 1000.   ug/L     FALSE               NA
#> 3 2000-01-05 Aluminium Dissolved   20.5  mg/L     FALSE               NA
#> 4 2000-01-06 Aluminium Dissolved   15    mg/L     FALSE               NA
#> 5 2000-01-02 DISSOLVED ALUMINUM  1000.   MG/L     FALSE               NA
#> 6 1978-12-01 Kryptonite             1    ug/L     FALSE               NA
#> 7 1977-05-25 Zinc Total           500.   ug/L     FALSE               NA
#> 8 1978-12-01 pH                     7    PH UNITS FALSE               NA
#> 9 2000-01-01 pH                     8.75 PH UNITS FALSE               NA