read_health_dict reads a pre-formatted popdata data dictionary into R, while health_dict_to_spec converts that health dict to a readr column specification.

read_health_dict(path, sheet, ...)

health_dict_to_spec(health_dict, special = NULL)

Arguments

path

Path to the xls/xlsx file.

sheet

Sheet to read. Either a string (the name of a sheet), or an integer (the position of the sheet). Ignored if the sheet is specified via range. If neither argument specifies the sheet, defaults to the first sheet.

...

Additional arguments passed to readxl::read_excel.

health_dict

A data.frame, as returned by read_health_dict().

special

A named list of readr column specifications for the columns whose format in the dictionary file you want to override.

Value

read_health_dict: A clean data.frame containing the health data dictionary.

health_dict_to_spec: A named list of readr column specifications that can be passed to the col_types argument of any of the readr functions, or to dat_to_parquet() and friends.

Details

read_health_dict: Files are in .xlsx format and therefore require both a path and a sheet argument. The rest of the function is a thin wrapper around readxl::read_excel, with some formatting applied to the result.

health_dict_to_spec converts a health dict created by read_health_dict to a readr spec. This will use the dictionary to create specifications even for date and datetime columns, and allows overriding the default column specs by using the special parameter.
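As a rough sketch of how such a spec might be used downstream, the code below reads a small fixed-width file with readr. It does not call dipr: the two-row dict and the spec list are hand-written stand-ins for the output of read_health_dict() and health_dict_to_spec(), and the file contents and the "%Y%m%d" date format are assumptions made for illustration.

```r
library(readr)

# Hypothetical two-column dictionary, shaped like read_health_dict() output
dict <- data.frame(
  start = c(1, 2),
  stop = c(1, 9),
  name = c("code", "date"),
  stringsAsFactors = FALSE
)

# Hand-written stand-in for the named list health_dict_to_spec() returns
spec <- list(
  code = col_character(),
  date = col_date(format = "%Y%m%d")
)

# Two made-up fixed-width records written to a temporary file
tmp <- tempfile(fileext = ".dat")
writeLines(c("A20200131", "B20211215"), tmp)

# Column positions come from the dictionary, column types from the spec
dat <- read_fwf(
  tmp,
  col_positions = fwf_positions(dict$start, dict$stop, col_names = dict$name),
  col_types = spec
)
```

With a real dictionary, the same pattern would take the positions from the start and stop columns and the types from health_dict_to_spec(dict).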

Examples

dict <- read_health_dict(dipr_example("sample_hlth_dict.csv"))
#> Rows: 12 Columns: 7
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (4): Name Abbrev, Name, Data Type, Data Format
#> dbl (3): Start, Stop, Length
#> 
#>  Use `spec()` to retrieve the full column specification for this data.
#>  Specify the column types or set `show_col_types = FALSE` to quiet this message.
dict
#> # A tibble: 12 × 7
#>    start  stop length name     data_type data_format         col_type
#>    <dbl> <dbl>  <dbl> <chr>    <chr>     <chr>               <chr>   
#>  1     1     1      1 code     char      NA                  c       
#>  2     2     9      8 date     date      ccyymmdd            c       
#>  3    10    14      5 anum     number    NA                  d       
#>  4    15    16      2 spec     number    NA                  d       
#>  5    17    22      6 expl_cd  char      Three 2-digit codes c       
#>  6    31    43     13 amt1     NA        NA                  c       
#>  7    44    56     13 code1    num       NA                  d       
#>  8    57    61      5 code2    char      NA                  c       
#>  9    62    63      2 type     varchar   NA                  c       
#> 10    64    73     10 date2    datetime  YYYY-MM-DD HH:MM:SS c       
#> 11    74    83     10 studyid  NA        NA                  c       
#> 12    84    84      1 linefeed NA        NA                  c       
health_dict_to_spec(dict, special = list(code1 = readr::col_integer()))
#> $code
#> <collector_character>
#> 
#> $date
#> <collector_date>
#> 
#> $anum
#> <collector_double>
#> 
#> $spec
#> <collector_double>
#> 
#> $expl_cd
#> <collector_character>
#> 
#> $amt1
#> <collector_character>
#> 
#> $code1
#> <collector_integer>
#> 
#> $code2
#> <collector_character>
#> 
#> $type
#> <collector_character>
#> 
#> $date2
#> <collector_datetime>
#> 
#> $studyid
#> <collector_character>
#> 
#> $linefeed
#> <collector_skip>
#>