Cleans water quality data. After standardization using standardize_wqdata, replicates (two or more readings for the same variable on the same date) are averaged using the mean function (or the function supplied as FUN).
Readings for the same variable on the same date but at different levels of the columns specified in by are not considered replicates. clean_wqdata is called automatically by calc_limits before limits are calculated.
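The replicate rule can be illustrated with base R's `aggregate()`. This is a minimal sketch, not the wqbc implementation: the readings are made up, and the hypothetical `Station` column stands in for a column passed to `by`.

```r
# Hypothetical readings: two pH readings at station A and one at
# station B, all taken on the same date.
readings <- data.frame(
  Station  = c("A", "A", "B"),
  Date     = as.Date(c("2000-01-01", "2000-01-01", "2000-01-01")),
  Variable = "pH",
  Value    = c(7, 8, 9)
)

# No grouping column (analogous to by = NULL): all three readings share
# Variable and Date, so they are treated as replicates and averaged.
pooled <- aggregate(Value ~ Date + Variable, data = readings, FUN = mean)

# Station as an extra grouping column (analogous to by = "Station"):
# only the two station A readings are replicates of each other.
by_station <- aggregate(Value ~ Station + Date + Variable,
                        data = readings, FUN = mean)
```

Here `pooled` holds a single mean of 8, while `by_station` keeps 7.5 for station A and 9 for station B. Passing `FUN = median` or `FUN = max` instead mirrors the `FUN` argument of clean_wqdata.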
clean_wqdata(
x,
by = NULL,
max_cv = Inf,
sds = 10,
ignore_undetected = TRUE,
large_only = TRUE,
delete_outliers = FALSE,
remove_blanks = FALSE,
messages = getOption("wqbc.messages", default = TRUE),
FUN = mean
)
x: The data.frame to clean.
by: A character vector of the columns in x to perform the cleaning by. If you have multiple stations, specify the column that contains the station IDs.
max_cv: A number indicating the maximum permitted coefficient of variation for replicates.
sds: The number of standard deviations above which a value is considered an outlier.
ignore_undetected: A flag indicating whether to ignore undetected values when calculating the average deviation and identifying outliers.
large_only: A flag indicating whether only large values that exceed sds standard deviations should be identified as outliers.
delete_outliers: A flag indicating whether to delete outliers or merely flag them.
remove_blanks: Should blanks be removed? Blanks are assumed to be denoted by a value of "Blank..." in the SAMPLE_CLASS column. Default FALSE.
messages: A flag indicating whether to print messages.
FUN: The function to use for summaries, e.g. median, mean, or max. Default mean.
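The interplay of sds and large_only can be pictured with a plain z-score rule. This is a hypothetical sketch that assumes outliers are measured from the ordinary mean in units of the sample standard deviation; the page does not say which centre or spread statistic wqbc actually uses, and `flag_outliers()` is not a wqbc function.

```r
# Hypothetical helper (not part of wqbc): flag values lying more than
# `sds` sample standard deviations from the mean.
flag_outliers <- function(x, sds = 10, large_only = TRUE) {
  z <- (x - mean(x)) / sd(x)
  if (large_only) {
    z > sds        # flag only unusually large values
  } else {
    abs(z) > sds   # flag unusually large or unusually small values
  }
}

high <- c(rep(1, 200), 1000)   # one very large value
low  <- c(rep(1000, 200), 0)   # one very small value

flag_outliers(high)                     # flags only the 1000
flag_outliers(low)                      # flags nothing: the 0 is small, not large
flag_outliers(low, large_only = FALSE)  # flags the 0
```

Under this sketch, a value can lie at most (n - 1)/sqrt(n) sample standard deviations from the mean of n values, so the default sds = 10 cannot flag anything in a group of fewer than 102 values.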
If there are three or more replicates with a coefficient of variation (CV) in excess of max_cv, the replicate with the highest absolute deviation is dropped, repeatedly, until the CV is less than or equal to max_cv or only two values remain. Because max_cv defaults to Inf, by default all values are averaged.
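The dropping loop can be sketched as follows. This is an illustration under stated assumptions, not wqbc's internal code: `drop_replicates()` is a hypothetical helper, "absolute deviation" is assumed to mean distance from the replicates' mean, and the CV is computed with the sample standard deviation.

```r
# Hypothetical helper (not part of wqbc): while more than two values
# remain and their CV exceeds max_cv, drop the value farthest from the mean.
drop_replicates <- function(x, max_cv = Inf) {
  cv <- function(v) sd(v) / mean(v)
  # isTRUE() treats an undefined CV (all zeros gives NaN) as not exceeding max_cv
  while (length(x) > 2 && isTRUE(cv(x) > max_cv)) {
    x <- x[-which.max(abs(x - mean(x)))]
  }
  x
}

drop_replicates(c(1, 1, 10), max_cv = 1.29)  # returns c(1, 1)
drop_replicates(c(0, 1, 1), max_cv = 1.29)   # returns c(0, 1, 1)
drop_replicates(c(1, 1, 10))                 # returns c(1, 1, 10)
```

The surviving values would then be summarised with FUN; with the default max_cv = Inf the loop never runs, matching the default behaviour of averaging every replicate.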
A max_cv value of 1.29 is exceeded by two zeros and one positive value (CV = 1.73) or by two identical positive values and a third value an order of magnitude greater (CV = 1.30). It is not exceeded by one zero and two identical positive values (CV = 0.87).
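The quoted CVs match the sample standard deviation (denominator n - 1, as computed by R's `sd()`) divided by the mean. The specific values below are arbitrary choices for illustration; the first and third cases give the same CV for any positive value because the CV is scale-invariant.

```r
# Sample coefficient of variation: sd() uses the n - 1 denominator.
cv <- function(x) sd(x) / mean(x)

round(cv(c(0, 0, 5)), 2)    # 1.73 (exactly sqrt(3))
round(cv(c(1, 1, 10)), 2)   # 1.3
round(cv(c(0, 1, 1)), 2)    # 0.87
```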
clean_wqdata(wqbc::dummy, messages = TRUE)
#> Cleaning water quality data...
#> Identified 0 outliers in water quality data.
#> Cleansed water quality data.
#> # A tibble: 9 × 6
#> Date Variable Value Units Outlier DetectionLimit
#> <date> <chr> <dbl> <chr> <lgl> <dbl>
#> 1 2000-01-01 Aluminium Dissolved 7.67 mg/L FALSE NA
#> 2 2000-01-04 Aluminium Dissolved 1000. ug/L FALSE NA
#> 3 2000-01-05 Aluminium Dissolved 20.5 mg/L FALSE NA
#> 4 2000-01-06 Aluminium Dissolved 15 mg/L FALSE NA
#> 5 2000-01-02 DISSOLVED ALUMINUM 1000. MG/L FALSE NA
#> 6 1978-12-01 Kryptonite 1 ug/L FALSE NA
#> 7 1977-05-25 Zinc Total 500. ug/L FALSE NA
#> 8 1978-12-01 pH 7 PH UNITS FALSE NA
#> 9 2000-01-01 pH 8.75 PH UNITS FALSE NA