EpiNOAA

The EpiNOAA R package is a programmatic interface to the nClimGrid daily data created by NOAA and made publicly available through the NOAA Big Data Program on public cloud providers. The goal of this project is to support data subsetting and to provide access to an analysis-ready dataset using the tidyverse framework for tabular data.

If you would like to know more about nClimGrid data and the background behind this dataset, please visit: AWS Open Data Registry

Installation

# Install from CRAN (coming soon...)
install.packages("EpiNOAA")  # package names are case-sensitive

# Or the development version from GitHub
# install.packages("devtools")
devtools::install_github("NOAA-Big-Data-Program/EpiNOAA-R")

Configuration

This package relies on a number of underlying packages and data sources to provide access to nClimGrid data.

Arrow: Arrow is used for fast reading of data files in Parquet format. Installing Apache Arrow for R can be problematic if the underlying C++ libraries are not installed or configured appropriately. If you have difficulty, please consult the install docs here: Arrow Package Install.
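As a quick sanity check before using the package, the sketch below reports whether the arrow R package is installed and whether its C++ build includes Parquet support. The helper name `check_arrow` is hypothetical and not part of EpiNOAA; it assumes an arrow release recent enough to provide `arrow_with_parquet()`.

```r
# Hypothetical helper (not part of EpiNOAA): returns TRUE only if the arrow
# package is installed and was built with Parquet support.
check_arrow <- function() {
  if (!requireNamespace("arrow", quietly = TRUE)) {
    message("arrow is not installed; see the Arrow install docs.")
    return(FALSE)
  }
  message("arrow version: ", as.character(utils::packageVersion("arrow")))
  has_parquet <- isTRUE(arrow::arrow_with_parquet())
  message("Parquet support: ", has_parquet)
  has_parquet
}

check_arrow()
```

If this returns FALSE with arrow installed, the C++ library is likely missing or was built without Parquet, and reinstalling per the Arrow install docs is the usual next step.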

Futures: This package relies on the Futureverse as a parallel backend for performant ingest and filtering of the nClimGrid data. Check the computational resources (cores and memory) available on your machine before scaling up the number of workers.
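One way to keep the worker count within what the machine offers is to cap it by the number of detected cores. The sketch below uses only base R; `pick_workers` and its `reserve` argument are hypothetical names for illustration, not part of EpiNOAA, and it does not account for memory, which still needs to be checked separately.

```r
# Hypothetical helper (not part of EpiNOAA): cap a requested worker count
# at the number of detected cores, leaving `reserve` cores free.
pick_workers <- function(requested, reserve = 1L) {
  cores <- parallel::detectCores()
  if (is.na(cores)) cores <- 1L  # detectCores() can return NA on some systems
  max(1L, min(as.integer(requested), as.integer(cores) - as.integer(reserve)))
}

workers <- pick_workers(10)
workers
```

The result can then be passed as the `workers` argument when reading data.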

AWS.S3: This package relies on the aws.s3 package under the hood to pull data from the NOAA S3 bucket. This bucket is publicly accessible and does not require any credentials. If you see unexpected "access denied" or "object/resource does not exist" errors, your default AWS configuration may be interfering with the requests. See The Cloudyr Project for more information.
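If a local AWS configuration seems to be interfering, one thing to try is running the request with the standard AWS credential environment variables temporarily unset. This is a minimal sketch, not a guaranteed fix: `without_aws_credentials` is a hypothetical helper, and it only touches environment variables, so credentials stored in a shared credentials file are not affected.

```r
# Hypothetical helper (not part of EpiNOAA or aws.s3): evaluate `expr` with
# common AWS credential environment variables unset, then restore them.
without_aws_credentials <- function(expr) {
  vars <- c("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY",
            "AWS_SESSION_TOKEN", "AWS_PROFILE")
  old <- Sys.getenv(vars, unset = NA, names = TRUE)  # NA where not set
  on.exit({
    was_set <- !is.na(old)
    if (any(was_set)) do.call(Sys.setenv, as.list(old[was_set]))
  }, add = TRUE)
  Sys.unsetenv(vars)
  force(expr)  # expr is evaluated lazily, i.e. after the variables are unset
}

# Usage (commented out because it downloads data):
# data_2021 <- without_aws_credentials(
#   EpiNOAA::read_nclimgrid_epinoaa(beginning_date = "2021-01-01",
#                                   end_date = "2021-12-31", workers = 2)
# )
```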

Example

This is a basic example which pulls in county data from 2021:

library(EpiNOAA)

# Make sure that the number of workers is scaled to your computer
# and that you have enough memory to support the call.
# The data pulled below are ~100 MB.
data_2021 <- read_nclimgrid_epinoaa(
  beginning_date = '2021-01-01',
  end_date = '2021-12-31',
  workers = 10
)
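For longer time spans, memory use can be kept down by pulling one calendar year at a time and reducing each chunk before combining. The sketch below is an illustration under assumptions: `yearly_ranges` and `read_by_year` are hypothetical helpers, the `reduce` step is a placeholder for your own filtering or aggregation, and combining with `rbind` assumes each call returns a data frame with the same columns.

```r
# Hypothetical helper: split [start, end] into calendar-year date ranges.
yearly_ranges <- function(start, end) {
  start <- as.Date(start)
  end <- as.Date(end)
  stopifnot(start <= end)
  years <- seq(as.integer(format(start, "%Y")), as.integer(format(end, "%Y")))
  firsts <- as.Date(paste0(years, "-01-01"))
  lasts <- as.Date(paste0(years, "-12-31"))
  firsts[1] <- start            # first chunk starts at the requested date
  lasts[length(lasts)] <- end   # last chunk ends at the requested date
  data.frame(beginning_date = format(firsts), end_date = format(lasts),
             stringsAsFactors = FALSE)
}

# Hypothetical wrapper: read each year, apply `reduce`, then row-bind.
read_by_year <- function(start, end, workers = 2, reduce = identity) {
  ranges <- yearly_ranges(start, end)
  chunks <- lapply(seq_len(nrow(ranges)), function(i) {
    reduce(EpiNOAA::read_nclimgrid_epinoaa(
      beginning_date = ranges$beginning_date[i],
      end_date = ranges$end_date[i],
      workers = workers
    ))
  })
  do.call(rbind, chunks)
}

yearly_ranges("2019-07-01", "2021-03-15")
```

Calling `read_by_year("2019-07-01", "2021-03-15")` would then issue three reads, one per row of the table above.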

Data Considerations

The data pulled using this package are produced by NOAA and made available through NOAA's Big Data Program. This specific dataset is available through the AWS Open Data Registry. All of the raw data can be accessed here. Data can also be pulled through an interactive web explorer here. If you are interested in working with the data out-of-memory, raw monthly and decadal data files can be accessed here.

Scaled and Preliminary Data: NOAA first releases these data as preliminary datasets, then adjusts them through a quality control process. After that additional processing and scaling, they are released as scaled datasets. This package does not access any preliminary data, which avoids the risk of inadvertently combining scaled and preliminary data in an analysis. Preliminary data are available through the Open Data Registry here.