`vignettes/articles/fasstr_dataRetrieval.Rmd`

`fasstr_dataRetrieval.Rmd`

`fasstr`

, the Flow Analysis Summary Statistics Tool for R,
is a set of R functions to
tidy, summarize, analyze, trend, and visualize streamflow data. This
package summarizes continuous daily mean streamflow data into various
daily, monthly, annual, and long-term statistics, completes trending and
frequency analyses.

This vignette explores how to obtain and integrate U.S. Geological
Survey (USGS) streamflow data, available through the National Water Information System
(NWIS) using the `dataRetrieval`

R package, with the various `fasstr`

functions.

This vignette will use the following packages:

```
library(fasstr)
library(dataRetrieval) # for getting USGS NWIS data
library(tidyhydat) # for getting ECCC HYDAT data
library(dplyr) # for data wrangling and pipelines
library(ggplot2) # for modifying fasstr plots
```

`dataRetrieval`

is an R package with a collection of
functions to retrieve hydrologic and water quality data hosted on USGS
and U.S. Environmental Protection Agency (EPA) web services. USGS
hydrologic data are pulled from https://waterservices.usgs.gov/ and https://waterdata.usgs.gov/nwis. Water quality data are
obtained from the Water Quality Portal https://www.waterqualitydata.us/. More information on
using `dataRetrieval`

can be found in the vignette.

This package is another useful tool for hydrologists, especially for
those looking to use USGS streamflow hydrometric station data with
`fasstr`

or other streamflow packages and analyses. You can
search for stations on the NWIS website https://waterdata.usgs.gov/nwis/rt. Unlike the
`tidyhydat`

package to obtain Environment and Climate Change
Canada’s (ECCC) HYDAT historical discharge data,
`dataRetrieval`

pulls all data from the NWIS web services and
does not require downloading and storing a database.

As USGS stations collect more than just streamflow, a parameter code
is required for many of the functions to choose the appropriate data
type. The parameter code for daily mean discharge data is “00060”. The
units of this discharge parameter are in US customary cubic feet per
second (cfs). While using US customary units is not an issue for many
`fasstr`

functions, as units do not matter for many analyses
or plots titles and column headers can simply be renamed, there are some
functions that use internal metric unit conversions so some adaptations
will be required (see sections below).

Here is a quick workflow demo showing how to get hydrometric station
information, and daily and annual peak data using
`dataRetrieval`

, adapted from the vignette:

```
# Hoko River Near Sekiu, WA (12043300)
site_number <- "12043300"
# Get site information (i.e. name, coordinates, drainage basin area, etc.)
site_info <- readNWISsite(site_number)
# Get daily discharge data
# Requires parameter code for discharge (00060), in cubic feet per second (cfs)
usgs_data <- readNWISdv(siteNumbers = site_number,
parameterCd = "00060",
startDate = "1980-01-01",
endDate = "2019-12-31")
# Get annual instantaneous peak discharge data (based on water years (Oct-Sep))
peak_data <- readNWISpeak(siteNumbers = site_number)
```

For metric discharge analyses using NWIS data from the USGS,
discharge values can easily be converted to cms from cfs (0.028317 cms
per cfs) for use in `fasstr`

.

The default `dataRetrieval`

NWIS data column formats are
as follows:

```
usgs_data <- readNWISdv(siteNumbers = "12043300", parameterCd = "00060")
head(usgs_data)
```

```
## agency_cd site_no Date X_00060_00003 X_00060_00003_cd
## 1 USGS 12043300 1962-08-01 25 A
## 2 USGS 12043300 1962-08-02 27 A
## 3 USGS 12043300 1962-08-03 40 A
## 4 USGS 12043300 1962-08-04 40 A
## 5 USGS 12043300 1962-08-05 60 A
## 6 USGS 12043300 1962-08-06 88 A
```

To keep the NWIS data column names for `fasstr`

functions,
just remember to convert the discharge units and then send to
`fasstr`

functions using the ‘values’ argument. Date column
names are the same in both NWIS and HYDAT data. Here is an example of
this situation using base R:

```
# Conversion factor from cfs to cms
cfs_to_cms <- 0.028317
# Get cfs data
usgs_data <- readNWISdv(siteNumbers = "12043300",
parameterCd = "00060")
# Convert cfs to cms
usgs_data$X_00060_00003 <- usgs_data$X_00060_00003 * cfs_to_cms
# Set `values` and 'groups' (if multiple stations) to the appropriate column names
plot_flow_data(data = usgs_data,
values = X_00060_00003,
groups = site_no)
```

and using `dplyr`

and pipelines:

```
# Conversion factor from cfs to cms
cfs_to_cms <- 0.028317
# One station
usgs_data <- readNWISdv(siteNumbers = "12043300",
parameterCd = "00060") %>%
mutate(X_00060_00003 = X_00060_00003 * cfs_to_cms)
plot_flow_data(data = usgs_data,
values = X_00060_00003)
# Multiple stations
usgs_data <- readNWISdv(siteNumbers = c("12043300", "12048000"),
parameterCd = "00060") %>%
mutate(X_00060_00003 = X_00060_00003 * cfs_to_cms)
plot_flow_data(data = usgs_data,
values = X_00060_00003,
groups = site_no)
```

NWIS column names can also be changed to match those with
`tidyhydat`

and `fasstr`

‘s defaults data column
names (i.e. ’STATION_NUMBER’, ‘Date’, and ‘Value’) to work easily with
`fasstr`

. Renaming ‘site_no’ to ‘STATION_NUMBER’ is not
necessary if one station is used, but is useful if data contains
multiple stations. Here is an example of this situation using base
R:

```
# Get the data
usgs_data <- readNWISdv(siteNumbers = "12043300", parameterCd = "00060")
# Conversion factor from cfs to cms
cfs_to_cms <- 0.028317
# Rename columns
names(usgs_data)[names(usgs_data) == 'site_no'] <- 'STATION_NUMBER'
names(usgs_data)[names(usgs_data) == 'X_00060_00003'] <- 'Value'
# Convert cfs to cms
usgs_data$Value <- usgs_data$Value * cfs_to_cms
# Use a fasstr function
calc_annual_extremes(usgs_data)
```

and using `dplyr`

and pipelines:

```
# Conversion factor from cfs to cms
cfs_to_cms <- 0.028317
# One station
usgs_data <- readNWISdv(siteNumbers = "12043300",
parameterCd = "00060") %>%
rename(Value = X_00060_00003) %>%
mutate(Value = Value * cfs_to_cms)
# Multiple stations
usgs_data <- readNWISdv(siteNumbers = c("12043300", "12048000"),
parameterCd = "00060") %>%
rename(STATION_NUMBER = site_no,
Value = X_00060_00003) %>%
mutate(Value = Value * cfs_to_cms)
# Use a fasstr function
calc_annual_extremes(usgs_data)
```

Some `fasstr`

functions require an upstream drainage basin
area to calculate discharge as area-based runoff yield statistics (in
depth units, commonly millimetres in Canada). NWIS upstream drainage
basin areas (in square miles) can be found in the site information using
the `dataRetrieval::readNWISsite()`

function and then can be
converted to square kilometres (2.58999 sqkm per sqmile):

```
# Conversion factor from cfs to cms
cfs_to_cms <- 0.028317
# Conversion factor from sq. miles to sq. km
sqmile_to_sqkm <- 2.58999
# Get station information
site_info <- readNWISsite("12043300")
# Get the drainage area and convert to sq. km
basin_area_NWIS <- site_info$drain_area_va * sqmile_to_sqkm
# Get data and apply to fasstr function
readNWISdv(siteNumbers = "12043300",
parameterCd = "00060") %>%
rename(Value = X_00060_00003) %>%
mutate(Value = Value * cfs_to_cms) %>%
plot_daily_cumulative_stats(basin_area = basin_area_NWIS,
use_yield = TRUE,
add_year = 2000)
```

Here is one way of providing multiple basin areas for multiple stations:

```
# Conversion factor from cfs to cms
cfs_to_cms <- 0.028317
# Conversion factor from sq. miles to sq. km
sqmile_to_sqkm <- 2.58999
# Get station information
site_info <- readNWISsite(c("12043300", "12048000")) %>%
select(site_no, drain_area_va) %>%
mutate(Basin_Area_sqkm = drain_area_va * sqmile_to_sqkm)
# Create named vector of areas to send to basin area
areas <- setNames(site_info$Basin_Area_sqkm, site_info$site_no)
# Get data and apply to fasstr function
readNWISdv(siteNumbers = c("12043300", "12048000"),
parameterCd = "00060") %>%
rename(Value = X_00060_00003,
STATION_NUMBER = site_no) %>%
mutate(Value = Value * cfs_to_cms) %>%
plot_daily_cumulative_stats(use_yield = TRUE,
basin_area = areas,
add_year = 2000)
```

Functions can also be created if conversions will frequently occur
within workflows. The first example here converts and formats data
already downloaded using `dataRetrieval`

’s
`readNWISdv()`

function for `fasstr`

:

```
# Create function to convert to metric and HYDAT structure
usgs_to_hydat <- function(data){
names(data)[names(data) == 'site_no'] <- 'STATION_NUMBER'
names(data)[names(data) == 'X_00060_00003'] <- 'Value'
data$Value <- data$Value * 0.028317
return(data)
}
# Get data, convert, and send to fasstr function:
# in base R
usgs_data <- readNWISdv("12043300", parameterCd = "00060")
usgs_data <- usgs_to_hydat(usgs_data)
calc_annual_stats(usgs_data)
# and in a pipeline
readNWISdv("12043300", parameterCd = "00060") %>%
usgs_to_hydat() %>%
calc_annual_stats()
```

This second function integrates `readNWISdv()`

with the
first function that will both download and convert the data for
`fasstr`

functions by supplying NWIS site numbers:

```
# Create function to download NWIS data and convert to metric and HYDAT structure
readNWISdv_hydat <- function(siteNumbers){
data <- readNWISdv(siteNumbers = siteNumbers, parameterCd = "00060")
names(data)[names(data) == 'site_no'] <- 'STATION_NUMBER'
names(data)[names(data) == 'X_00060_00003'] <- 'Value'
data$Value <- data$Value * 0.028317
return(data)
}
# Get data, convert, and send to fasstr function:
# in base R
usgs_data <- readNWISdv_hydat("12043300")
calc_annual_stats(usgs_data)
# and in a pipeline
readNWISdv_hydat("12043300") %>%
calc_annual_stats()
```

This third function is the same as the second, but adds a column of drainage basin areas, in square kilometres, extracted and converted from NWIS:

```
# Create function to download NWIS data and convert to metric and HYDAT structure
readNWISdv_hydat <- function(siteNumbers){
data <- readNWISdv(siteNumbers = siteNumbers, parameterCd = "00060")
names(data)[names(data) == 'site_no'] <- 'STATION_NUMBER'
names(data)[names(data) == 'X_00060_00003'] <- 'Value'
data$Value <- data$Value * 0.028317
basin_area <- readNWISsite(siteNumbers = siteNumbers)
basin_area$Basin_Area_sqkm <- basin_area$drain_area_va * 2.58999
basin_area <- data.frame(STATION_NUMBER = basin_area$site_no,
Basin_Area_sqkm = basin_area$Basin_Area_sqkm)
data <- merge(data, basin_area, by = "STATION_NUMBER")
return(data)
}
# Get data and convert data:
usgs_data <- readNWISdv_hydat("12043300")
```

For analyses using US customary units from USGS NWIS data from
`dataRetrieval`

, most `fasstr`

functions will work
with little to no issues. Most functions take the provided data
(i.e. flows in cfs or acre-feet) and provide means, medians,
percentiles, etc. For data frame outputs, it can be assumed that if
there are no units in column names that units are the same as the data
provided. Axes labeled ‘Discharge (cms)’ on plots can simply be renamed
to match the units by extracting the plot object from its listed object
(using double square brackets `[[1]]`

, or
`magrittr::extract2(1)`

in a pipeline) and using
`ggplot2`

functions (eg.
`+ ylab('Discharge, cubic feet per second')`

or
`+ ylab('Discharge, acre-feet')`

). If units are expressed in
Volume_m3 it can be interpreted as cubic feet volume and renamed as
such. However, if column names or plots titles are labeled as Yield_mm,
there will there be additional conversion steps (see section below)
required to get the appropriate inches depth. If data is in cfs but some
volumetric results are desired to be in acre-feet, then some
modifications will also be required.

NWIS data column names from `dataRetrieval`

can easily be
used with `fasstr`

functions, with setting
`values = X_00060_00003`

or
`values = 'X_00060_00003'`

to identify the flow values column
in `fasstr`

functions. If there are multiple stations in a
data frame, the stations can be grouped separately by identifying them
using `groups = site_no`

or `groups = 'site_no'`

.
Date column names are the same in both NWIS and HYDAT data.

```
# One station
usgs_data <- readNWISdv(siteNumbers = "12043300",
parameterCd = "00060")
plot_flow_data(data = usgs_data,
values = X_00060_00003)
# Multiple stations
usgs_data_multiple <- readNWISdv(siteNumbers = c("12043300", "12048000"),
parameterCd = "00060")
plot_flow_data(data = usgs_data_multiple,
values = X_00060_00003,
groups = site_no)
# Calculate stats using water years
calc_longterm_mean(data = usgs_data,
values = X_00060_00003,
water_year_start = 10,
complete_years = TRUE)
```

NWIS column names can also be changed to match those with
`tidyhydat`

and `fasstr`

‘s defaults data column
names (i.e. ’STATION_NUMBER’, ‘Date’, and ‘Value’) to work easily with
`fasstr`

. Using original names is good for quick function
use, but renaming to the HYDAT convention makes using multiple
`fasstr`

functions easier. Renaming ‘site_no’ to
‘STATION_NUMBER’ is not necessary if one station is used, but is useful
if data contains multiple stations. Here is an example of this situation
using base R:

```
# Get the data
usgs_data <- readNWISdv(siteNumbers = "12043300", parameterCd = "00060")
# Rename columns
names(usgs_data)[names(usgs_data) == 'site_no'] <- 'STATION_NUMBER'
names(usgs_data)[names(usgs_data) == 'X_00060_00003'] <- 'Value'
```

and using `dplyr`

and pipelines:

```
# One station
usgs_data <- readNWISdv(siteNumbers = "12043300",
parameterCd = "00060") %>%
rename(Value = X_00060_00003)
# Multiple stations
usgs_data <- readNWISdv(siteNumbers = c("12043300", "12048000"),
parameterCd = "00060") %>%
rename(STATION_NUMBER = site_no,
Value = X_00060_00003)
```

Some `fasstr`

functions use require drainage basin areas
to calculate discharge as area-based runoff yield statistics (in depth
units). Since `fasstr`

internally converts discharge using
metric conversions, this makes conversions of US customary cubic feet
per second and square miles to inches depth not as simple. Luckily, if
your discharge units are in cubic feet per second and basin ares in
square miles, a simple workaround can be applied to get inches depth:
multiply the basin area by 2323.2. This number is derived from three
US/metric unit conversions that magically provide the proper
conversion:

```
# sqkm/sqmiles / (inches/mm * cms/cfs)
2.58999 / (0.0393701 * 0.028316846592)
```

`## [1] 2323.2`

So if using any functions that result in yield values (typically
‘yield’ or ‘cumulative’ functions that with columns or axes with yield
names when using `use_yield = TRUE`

), then the multiplier
needs to be applied to square mile basin areas. NWIS upstream drainage
basin areas can be found in the site information using the
`dataRetrieval`

`readNWISsite()`

function. The
following is an example of how to use the workaround, with appropriate
renaming of columns or plot axes:

```
# Get a NWIS basin area
area_sqmile <- dataRetrieval::readNWISsite(siteNumbers = "12043300") %>%
pull(drain_area_va)
# Drainage basin area conversion for fasstr
area_adjust <- 2323.2
# Get NWIS data
usgs_data <- readNWISdv(siteNumbers = "12043300",
parameterCd = "00060") %>%
rename(Value = X_00060_00003)
# Add daily yield in inches, with 2323.2 basin area multiplier
add_daily_yield(data = usgs_data,
basin_area = area_sqmile * area_adjust) %>%
rename(Yield_inch = Yield_mm)
# Calculate total annual inches, with multiplier
calc_annual_cumulative_stats(data = usgs_data,
use_yield = TRUE,
basin_area = area_sqmile * area_adjust,
water_year_start = 10) %>%
rename(Total_Yield_inch = Total_Yield_mm)
# Plot daily cumulative discharge, with multiplier
plot_daily_cumulative_stats(data = usgs_data,
use_yield = TRUE,
basin_area = area_sqmile * area_adjust,
water_year_start = 10) %>%
magrittr::extract2(1) + # extracts plot from list
ggplot2::ylab("Runoff, inch")
```

Like data in cfs units, if data from `dataRetrieval`

is
converted to acre-feet per day units, most functions should work
appropriately (if no units in column headers or plot axis titles in cms
(and simply be renamed)).

```
# cfs to acre-feet per day
cfs_to_acft <- 1.983459
# Get NWIS data, rename to Value, and convert cfs to acre-feet
usgs_data <- readNWISdv(siteNumbers = "12043300",
parameterCd = "00060") %>%
rename(Value = X_00060_00003) %>%
mutate(Value = Value * cfs_to_acft)
# Plot the acre-feet units, extract the plot (using [[1]]),
# and rename the y-axis to appropriate units
plot_daily_stats(usgs_data, complete_years = TRUE)[[1]]+
ylab("Discharge, acre-feet per day")
```

As some `fasstr`

functions report numbers in volumetric
discharge and derive values from ‘per seconds’ units (i.e. cms or cfs)
the original values are multiplied by 86400 seconds to obtain the daily
volumes. Since acre-feet are typically in acre-feet per day, and not
seconds, an easy conversion from acre-feet per day to acre-feet per
second (divide by `86400`

) will allow the user to get
cumulative acre-feet.

```
# cfs to acre-feet per second
cfs_to_acftsec <- 1.983459 / 86400
# Get NWIS data, rename to Value, and convert cfs to
# acre-feet per second
usgs_data <- readNWISdv(siteNumbers = "12043300",
parameterCd = "00060") %>%
rename(Value = X_00060_00003) %>%
mutate(Value = Value * cfs_to_acftsec)
# Plot the daily cumulative stats for all years with 2000 data,
# extract the plot and rename the y-axis
plot_daily_cumulative_stats(usgs_data,
add_year = 2000)[[1]]+
ylab("Cumulative Discharge, acre-feet")
# Calculate annual and seasonal cumulative flows and
# rename the column headers
stats <- calc_annual_cumulative_stats(usgs_data,
include_seasons = TRUE)
names(stats) <- gsub('_m3', '_acft', names(stats))
```

Using acre-feet per second data will not work for some functions that
ideally work best with data in cfs or cms units and include volumetric
totals like `calc_all_annual_stats()`

,
`compute_annual_trends()`

, or
`compute_full_analysis()`

. However, results in total
volumetric cubic-feet can simply be converted to acre-feet per day by
multiplying the cubic-feet per day results by
`0.00002295689`

.

For the rest of the vignette examples, we will set up data frames of
daily stream discharge from `dataRetrieval`

called
`usgs_data`

and `usgs_data_multiple`

, and rename
the columns to match the `fasstr`

defaults
(i.e. STATION_NUMBER, Date, and Value). We will also extract a basin
area for the `usgs_data`

data set.

```
# Get daily data for examples
usgs_data <- readNWISdv(siteNumbers = "12043300",
parameterCd = "00060") %>%
rename(Value = X_00060_00003,
STATION_NUMBER = site_no)
# Get drainage basin area for examples
usgs_basin_area <- readNWISsite(siteNumbers = "12043300") %>%
pull(drain_area_va)
# Get daily data for examples with multiple stations
usgs_data_multiple <- readNWISdv(siteNumbers = c("12043300", "12048000"),
parameterCd = "00060") %>%
rename(Value = X_00060_00003,
STATION_NUMBER = site_no)
```

To be consistent with many USGS streamflow analyses, these examples
will use Water Years (from Oct-Sep) instead of Calendar Years by setting
`water_year_start = 10`

in the necessary functions. With
`fasstr`

a water year is identified by the calendar year in
which it ends. For example, a water year from October 2019 to September
2020 will be identified as 2020.

There are several functions that are used to prepare your flow data
set for your own analyses by adding columns of various date variables,
adding rolling means, and adding cumulative discharges, amongst others.
These functions begin with `add_`

or `fill_`

and
add columns or rows, respectively, to your data frame of daily flow
values. See the documentation for more info on the functions. The
following example shows how to apply the tidying functions in a
continuous pipeline and using water years. Note how the basin area
correction is applied in the two `_cumulative_`

functions and
columns are renamed to match the appropriate units where necessary.

```
usgs_data %>%
fill_missing_dates(water_year_start = 10) %>%
add_date_variables(water_year_start = 10) %>%
add_seasons(water_year_start = 10,
seasons_length = 3) %>%
add_rolling_means() %>%
add_daily_volume() %>%
rename(Volume_cuft = Volume_m3) %>%
add_cumulative_volume(water_year_start = 10) %>%
rename(Cumul_Volume_cuft = Cumul_Volume_m3) %>%
add_daily_yield(basin_area = usgs_basin_area * 2323.2) %>%
rename(Yield_inch = Yield_mm) %>%
add_cumulative_yield(water_year_start = 10,
basin_area = usgs_basin_area * 2323.2) %>%
rename(Cumul_Yield_inch = Cumul_Yield_mm)
```

It may be useful to explore data availability and quality before using the data for analysis. Some of the screening functions can help you screen for outliers and missing data. Note the renaming of axis labels in the plot examples to the necessary units with one or multiple station plots.

```
# Get the number of missing dates and some basic summary statistics (calculated regardless
# of missing dates)
screen_flow_data(usgs_data,
water_year_start = 10)
# Plot the missing dates
plot_missing_dates(usgs_data,
water_year_start = 10)
# Plot the data, extract the plot, and rename the axis
plot_flow_data(usgs_data)[[1]]+
ylab("Discharge, cfs")
# Alternatively, plot the data, and rename the axis label by modifying the ggplot code
usgs_daily_plot <- plot_flow_data(usgs_data)
usgs_daily_plot$Daily_Flows$labels$y <- "Discharge, cfs"
usgs_daily_plot
# Plotting daily flows of multiple stations on one plot, extracting and renaming axis
plot_flow_data(usgs_data_multiple,
one_plot = TRUE)[[1]]+
ylab("Discharge, cfs")
# Plotting daily flows of multiple stations on different plots, and
# renaming axes of existing ggplot2 objects
usgs_daily_multiple <- plot_flow_data(usgs_data_multiple)
usgs_daily_multiple$`12043300_Daily_Flows`$labels$y <- "Discharge, cfs"
usgs_daily_multiple$`12048000_Daily_Flows`$labels$y <- "Discharge, cfs"
usgs_daily_multiple
```

The majority of the `fasstr`

functions produce statistics
over a certain time period, either long-term, annually, monthly, or
daily. These statistics are produced using the `calc_`

functions and many can be visualized using corresponding
`plot_`

functions. Most of the `calc_`

functions
do not require any changes to column names, unless they require as
drainage basin area, where the multiplier fix will be applied. Most of
the plots will require renaming axes as noted.

```
# Calculate long-term and long-term monthly summary statistics based on daily data
# using only years with complete data
calc_longterm_daily_stats(usgs_data,
water_year_start = 10,
complete_years = TRUE)
# Calculate the long-term mean annual discharge (MAD) using only years with complete
# data and calculate 5%, 10%, and 20% MADs, for multiple stations
calc_longterm_mean(usgs_data_multiple,
water_year_start = 10,
complete_years = TRUE,
percent_MAD = c(5,10,20))
# Plot flow duration curves for each month and combining summer months, using only
# complete years, and renaming the axis title
plot_flow_duration(usgs_data,
water_year_start = 10,
complete_years = TRUE,
custom_months = 7:9,
custom_months_label = "Summer")[[1]]+
ylab("Discharge, cubic feet per second")
# Plot annual summary statistics, including annual 10th and 90th percentiles, starting
# in 1996 and making the axis logarithmic, and renaming the axis title
plot_annual_stats(usgs_data,
water_year_start = 10,
percentiles = c(10,90),
start_year = 1996,
log_discharge = TRUE)[[1]]+
ylab("Discharge, cubic feet per second")
# Calculate annual 7- and 30-day low flows
calc_annual_lowflows(usgs_data,
roll_days = c(7,30),
water_year_start = 10)
# Plot the winter-spring (Jan-July) center of volume dates for
# multiple stations
plot_annual_flow_timing(usgs_data_multiple,
percent_total = 50,
water_year_start = 10,
months = 1:7)
# Plot annual means, starting in 1996
plot_annual_means(usgs_data,
water_year_start = 10,
start_year = 1996)
# Calculate monthly summary statistics, including 25 and 27 percentiles
calc_monthly_stats(usgs_data,
percentiles = c(25,75),
water_year_start = 10)
# Calculate daily summary statistics, including 25th and 75th percentiles
calc_daily_stats(usgs_data,
percentiles = c(25,75),
water_year_start = 10,
complete_years = TRUE)
# Plot an annual hydrograph (daily summary statistics) using only data
# from complete years and add 2020 daily data for comparison, for multiple stations
plot_daily_stats(usgs_data_multiple,
complete_years = TRUE,
add_year = 2020,
water_year_start = 10)
# Plot an annual cumulative runoff hydrograph (daily summary statistics) and adding
# 2020 daily data, using the basin area and correction factor, and renaming the axis
plot_daily_cumulative_stats(usgs_data,
add_year = 2020,
use_yield = TRUE,
basin_area = usgs_basin_area * 2323.2,
water_year_start = 10)[[1]]+
ylab("Cumulative Runoff, inches")
```

The `compute_annual_trends()`

function trends annual
streamflow metrics using prewhitened non-parametric Mann-Kendall tests
using the `zyp`

package. The function calculates various annual metrics using the
`calc_all_annual_stats()`

`fasstr`

function and
then calculates and plots the trending data. The `zhang`

prewhitening method is recommended for hydrologic applications over
`yuepilon`

. See the `zyp`

package and the trending
vignette for more information.

For the trending function, many of the metrics and results are simply
statistics on the raw data and so will not require any conversions.
However, like other `fasstr`

functions that calculate
volumetric and runoff totals (using a basin area), some fixes will be
required to use a basin area (see above) and to rename data, columns,
and plot axes. Data and axes with names ending with units of volume or
yield will require renaming. The following code is examples of how to
run the trending function and modify it and the outputs for US
units.

```
# Calculate annual metrics and trend the data using the Mann-Kendall methods,
# using 'zhang' prewhitening methods, plotting a trend line on plots if α < 0.05,
# and applying the basin area correction to get inches depth for yield units.
trends <- compute_annual_trends(usgs_data,
zyp_method = "zhang",
zyp_alpha = 0.05,
basin_area = usgs_basin_area * 2323.2,
water_year_start = 10)
```

While exported values will be accurate, some data, columns, and axes
will needed to be renamed from metric to US units. If want to use
results from above, then here are some ways to convert names in the
output from `compute_annual_trends`

:

```
# Rename plot names
names(trends) <- gsub('_m3', '_cuft', names(trends))
names(trends) <- gsub('_mm', '_inch', names(trends))
# Rename the Statistics in Data and Results
trends$Annual_Trends_Data$Statistic <- gsub('_m3', '_cuft', trends$Annual_Trends_Data$Statistic)
trends$Annual_Trends_Data$Statistic <- gsub('_mm', '_inch', trends$Annual_Trends_Data$Statistic)
trends$Annual_Trends_Results$Statistic <- gsub('_m3', '_cuft', trends$Annual_Trends_Results$Statistic)
trends$Annual_Trends_Results$Statistic <- gsub('_mm', '_inch', trends$Annual_Trends_Results$Statistic)
# Rename y-axes and titles in plots
for (i in names(trends[3:length(trends)])){
if (all(trends[[i]]$labels$y == "Discharge (cms)")){
trends[[i]]$labels$y <- gsub('cms', 'cfs', trends[[i]]$labels$y)
}
if (all(trends[[i]]$labels$y == "Yield (mm)")){
trends[[i]]$labels$y <- gsub('mm', 'inch', trends[[i]]$labels$y)
trends[[i]]$labels$title <- gsub('_mm', '_inch', trends[[i]]$labels$title)
}
if (all(trends[[i]]$labels$y == "Volume (cubic metres)")){
trends[[i]]$labels$y <- gsub('metres', 'feet', trends[[i]]$labels$y)
trends[[i]]$labels$title <- gsub('_m3', '_cuft', trends[[i]]$labels$title)
}
}
```

A function can also be written if relabeling will frequently occur
within workflows. With this example, you apply the output of
`compute_annual_trends()`

directly to this wrapper function
and it will rename all appropriate labels and titles within the
results.

```
# Create wrapper function
fasstr_trends_to_US_units <- function(trends){
# Rename plot names
names(trends) <- gsub('_m3', '_cuft', names(trends))
names(trends) <- gsub('_mm', '_inch', names(trends))
# Rename the Statistics in Data and Results
trends$Annual_Trends_Data$Statistic <- gsub('_m3', '_cuft', trends$Annual_Trends_Data$Statistic)
trends$Annual_Trends_Data$Statistic <- gsub('_mm', '_inch', trends$Annual_Trends_Data$Statistic)
trends$Annual_Trends_Results$Statistic <- gsub('_m3', '_cuft', trends$Annual_Trends_Results$Statistic)
trends$Annual_Trends_Results$Statistic <- gsub('_mm', '_inch', trends$Annual_Trends_Results$Statistic)
# Rename y-axes and titles in plots
for (i in names(trends[3:length(trends)])){
if (all(trends[[i]]$labels$y == "Discharge (cms)")){
trends[[i]]$labels$y <- gsub('cms', 'cfs', trends[[i]]$labels$y)
}
if (all(trends[[i]]$labels$y == "Yield (mm)")){
trends[[i]]$labels$y <- gsub('mm', 'inch', trends[[i]]$labels$y)
trends[[i]]$labels$title <- gsub('_mm', '_inch', trends[[i]]$labels$title)
}
if (all(trends[[i]]$labels$y == "Volume (cubic metres)")){
trends[[i]]$labels$y <- gsub('metres', 'feet', trends[[i]]$labels$y)
trends[[i]]$labels$title <- gsub('_m3', '_cuft', trends[[i]]$labels$title)
}
}
return(trends)
}
# Run the wrapper function
trends <- fasstr_trends_to_US_units(trends)
```

There are several functions that perform various volume frequency
analyses. Frequency analyses are used to determine probabilities of
events of certain sizes (typically high or low flows). The analyses
produce plots of event series and computed quantiles fitted from either
Log-Pearson Type III or Weibull probability distributions. See the frequency
analysis vignette for more information. With the two options for
probability distributions (logPIII and weibull), the user should have
understanding or investigate the appropriateness of fit for the data
provided with the probability distributions. The
`fitdistrplus`

R package fitting parameters can be found the
in the `fasstr`

‘Freq_Fitting’ output and can be explored
further (see `fitdistrplus`

documentation for more
information).

These functions do analyses on the data provided without requiring
metric conversions. As such, the only fix required to format results in
US units is changing the y-axis on the plot object from “Discharge
(cms)” to desired units. Here are some examples that could be applied
using USGS data. See documentation for default arguments
(i.e. `?compute_annual_frequencies`

).

Compute annual low flows (1,3,7,and 30-day) frequencies with default arguments:

```
# Compute annual low flows frequencies with default arguments
lowflow_frequency <- compute_annual_frequencies(usgs_data,
water_year_start = 10,
prob_plot_position = "weibull", # plot positions using (i)/(n+1)
fit_distr = "PIII", # log-Pearson Type III
fit_distr_method = "MOM") # method of moments
# Change the y-axis of the plot to US units (changing the object itself)
lowflow_frequency$Freq_Plot$labels$y <- "Discharge, cubic feet per second"
# View the fitting parameters (shape, scale, location, loglikehood, AIC and BIC etc.)
# of the 7-day low flows
summary(lowflow_frequency$Freq_Fitting$`7-Day`)
# Plot the fitting distributions of the 7-day low flows
plot(lowflow_frequency$Freq_Fitting$`7-Day`)
```

Compute annual 1-day maximum frequencies with default arguments:

```
# Compute annual daily peak flows frequencies with default arguments
highflow_frequency <- compute_annual_frequencies(usgs_data,
water_year_start = 10,
use_max = TRUE,
roll_days = 1)
# Change the y-axis of the plot to US units (changing the object itself)
highflow_frequency$Freq_Plot$labels$y <- "Discharge, cubic feet per second"
highflow_frequency$Freq_Plot
```

Compute summer 7-day low flow frequencies with default arguments:

```
# Compute summer low flow frequencies with default arguments
lowflow_7day <- compute_annual_frequencies(usgs_data,
water_year_start = 10,
months = 7:9,
roll_days = 7)
# Extract the plot and change the y-axis title to US units (doesn't change the original)
lowflow_plot <- lowflow_7day$Freq_Plot +
ylab("Discharge, cubic feet per second")
```

Compute a frequency analysis on annual instantaneous peak data from NWIS using the custom frequency analysis function:

```
# dataRetrieval function to get annual instantaneous peak data, grouped by water years
usgs_peaks <- readNWISpeak("12043300") %>%
mutate(Measure = "Inst. Peaks")
# Use the default column names for events and values for the custom analysis
# and default arguments (log-PIII etc.)
flood_frequency <- compute_frequency_analysis(data = usgs_peaks,
events = peak_dt,
values = peak_va,
measures = Measure,
use_max = TRUE)
# Can also do the analysis with renaming parameters to match fasstr function, within a pipeline
flood_frequency <- readNWISpeak("12043300") %>%
select(Year = peak_dt, Value = peak_va) %>%
mutate(Measure = "Inst. Peaks") %>%
compute_frequency_analysis(use_max = TRUE)
flood_frequency$Freq_Plot$labels$y <- "Discharge, cubic feet per second"
```

Compute a single frequency quantile with default arguments. With this function only the desired quantile is returned. Use this function if you know the fitting distributions are suitable for the data set.

```
# Compute the summer 7Q10 with default arguments
compute_frequency_quantile(usgs_data,
roll_days = 7,
return_period = 10,
months = 7:9)
```