Read in NEON tabular data

neon_read(
  table = NA,
  product = NA,
  site = NA,
  start_date = NA,
  end_date = NA,
  ext = NA,
  timestamp = NA,
  release = NA,
  dir = neon_dir(),
  files = NULL,
  sensor_metadata = TRUE,
  keep_filename = FALSE,
  altrep = FALSE,
  ...
)

Arguments

table: The name of a downloaded NEON table in the store; see neon_index().
product: A NEON productCode or a vector of product codes; see examples.
site: 4-letter site code(s) to filter on. Leave as NA to include all sites.
start_date: Include only files dated on or after start_date (YYYY-MM-DD). Leave as NA to include everything up to the most recent available data.
end_date: Include only files dated up to end_date (YYYY-MM-DD). Leave as NA to include all prior data.
ext: Only match files with this file extension (or extensions).
timestamp: Only match files with timestamps prior to this one. See details in neon_index(). Should be a POSIXct datetime object (or a string coercible to one).
release: Select only data files associated with a particular release tag, see https://www.neonscience.org/data-samples/data-management/data-revisions-releases, e.g. "RELEASE-2021". Releases are associated with a specific DOI and the promise that files associated with a particular release will not change.
dir: Location of the local store where files are downloaded. By default this uses the appropriate application directory for your system (see tools::R_user_dir()). This default can also be configured by setting the environment variable NEONSTORE_HOME; see Sys.setenv or Renviron.
files: Optionally, a vector of file paths to read directly (e.g. as returned by neon_index()). When supplying files, set table = NULL.
sensor_metadata: Logical, default TRUE. Whether to add metadata fields parsed from the file names of sensor data as columns in the table. Adds DomainID, SiteID, horizontalPosition, verticalPosition, and publicationDate. Results in slower parsing.
keep_filename: Whether to include a column giving the original file name for each row. This can be a useful source of additional metadata that NEON may omit from the raw files (e.g. siteID), but also results in slower parsing. Default FALSE.
altrep: Logical, default FALSE. Enables or disables vroom's ALTREP mechanism (lazy loading of column data). Setting it to TRUE can speed up reading, but may cause vroom::vroom() to fail with the mapping error "Too many open files".
...: Additional arguments passed to vroom::vroom(); can usually be omitted.
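The dir argument defaults to neon_dir(), which the text says can be redirected with the NEONSTORE_HOME environment variable. A minimal sketch of setting it for one session, assuming neon_dir() reads the variable each time it is called; the directory used here is a hypothetical temporary location, not a recommended one.

```r
# Hypothetical store location for this session; any writable path would do.
store <- file.path(tempdir(), "neonstore")
dir.create(store, recursive = TRUE, showWarnings = FALSE)

# Set NEONSTORE_HOME for the current R session only.
Sys.setenv(NEONSTORE_HOME = store)

# Code that reads the variable (assumed: neon_dir()) now sees this path.
Sys.getenv("NEONSTORE_HOME")

# To make the setting persistent, add a line like this to ~/.Renviron:
# NEONSTORE_HOME=/path/to/neonstore
```

Setting the variable in ~/.Renviron rather than in a script keeps the store location consistent across sessions.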

Details

NEON's tabular data are split into separate .csv files for each site and each month of sampling. In principle, each file has identical columns. vroom::vroom() can read a data table that has been sharded across many files like this much faster than other parsers can read each file iteratively, and thus can greatly outperform the "stacking" methods in neonUtilities.
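For comparison, a base R sketch of the iterative approach: parse each per-site, per-month shard separately and bind the rows, recording each row's source file much as keep_filename = TRUE does. The shard names and values are made up for illustration, and this is not neon_read()'s own code. vroom::vroom() instead accepts the whole vector of paths in one call (e.g. vroom::vroom(files, id = "file")), which is where the text locates the speed-up.

```r
# Hypothetical shards: one CSV per site per month, all with the same columns.
shard_dir <- file.path(tempdir(), "shards")
dir.create(shard_dir, showWarnings = FALSE)
shards <- list(
  "CRAM_2019-01.csv" = data.frame(siteID = "CRAM", value = c(1.5, 2.0)),
  "CRAM_2019-02.csv" = data.frame(siteID = "CRAM", value = 2.5),
  "SUGG_2019-01.csv" = data.frame(siteID = "SUGG", value = c(3.0, 3.5))
)
for (name in names(shards)) {
  write.csv(shards[[name]], file.path(shard_dir, name), row.names = FALSE)
}
files <- file.path(shard_dir, names(shards))

# Iterative stacking: parse each shard on its own, then bind the rows.
# The 'file' column mirrors what keep_filename = TRUE provides.
stacked <- do.call(rbind, lapply(files, function(f) {
  df <- read.csv(f)
  df$file <- basename(f)
  df
}))
rownames(stacked) <- NULL
stacked
```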

When reading very large numbers of files, keep the default altrep = FALSE: vroom's fast ALTREP mechanism can cause neon_read() to fail when stacking thousands of files.

Unfortunately, not all datasets use columns entirely consistently. neon_read() works around this by parsing such tables in groups of files with matching schemas, which is still reasonably fast.
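A base R sketch of the grouping idea, assuming a file's schema can be identified by its header line. This illustrates the technique and is not neon_read()'s actual implementation; in particular, filling columns a group lacks with NA when combining groups is an assumption made here. The files and their contents are hypothetical.

```r
# Hypothetical shards whose columns differ: c.csv has an extra 'remarks' column.
schema_dir <- file.path(tempdir(), "schemas")
dir.create(schema_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, count = c(4L, 7L)),
          file.path(schema_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(id = 3L, count = 2L),
          file.path(schema_dir, "b.csv"), row.names = FALSE)
write.csv(data.frame(id = 4L, count = 9L, remarks = "windy"),
          file.path(schema_dir, "c.csv"), row.names = FALSE)
files <- list.files(schema_dir, pattern = "\\.csv$", full.names = TRUE)

# Key each file by its header line, i.e. its column schema.
schema_key <- vapply(files, function(f) readLines(f, n = 1L), character(1))

# Stack files within each schema group; every file in a group has the same columns.
groups <- split(files, schema_key)
tables <- lapply(groups, function(fs) do.call(rbind, lapply(fs, read.csv)))

# Combine the groups, adding any missing columns as NA (assumed behavior).
all_cols <- unique(unlist(lapply(tables, names)))
filled <- lapply(tables, function(t) {
  for (col in setdiff(all_cols, names(t))) t[[col]] <- NA
  t[all_cols]
})
combined <- do.call(rbind, unname(filled))
rownames(combined) <- NULL
combined
```

Each group is still read as one batch, so only as many separate parses are needed as there are distinct schemas, not one per file.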

NEON sensor data products currently do not include important metadata columns (DomainID, SiteID, horizontalPosition, verticalPosition, and publicationDate) in the data files themselves; this information is encoded only in the raw file names. Although these values are constant within a raw data file, they are lost when the tables are stacked unless explicit columns are added to the data. Adding them requires parsing the files one by one, which is much slower. By default this information is added to the table, so the stacked table's schema differs from that of the raw table. Disable this behavior by setting sensor_metadata = FALSE. Future NEON sensor data products may start including this information in the raw data files, as is already the case for observational data.
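A sketch of recovering these fields from file names, assuming NEON's general sensor file naming layout NEON.DOM.SITE.DPL.PRNUM.REV.HOR.VER.TMI.DESC.YYYY-MM.PKGTYPE.GENTIME.csv, with GENTIME read as the publication date. The helper parse_sensor_filename(), its output column names, and the example file names are hypothetical, not part of neonstore.

```r
# Hypothetical helper: split NEON sensor file names on "." and pick out fields.
# Assumed layout:
# NEON.DOM.SITE.DPL.PRNUM.REV.HOR.VER.TMI.DESC.YYYY-MM.PKGTYPE.GENTIME.csv
parse_sensor_filename <- function(paths) {
  parts <- strsplit(basename(paths), ".", fixed = TRUE)
  field <- function(i) vapply(parts, `[`, character(1), i)
  data.frame(
    domainID = field(2),
    siteID = field(3),
    horizontalPosition = field(7),
    verticalPosition = field(8),
    # GENTIME such as 20190201T000000Z, parsed as a UTC datetime.
    publicationDate = as.POSIXct(field(13), format = "%Y%m%dT%H%M%SZ",
                                 tz = "UTC"),
    stringsAsFactors = FALSE
  )
}

# Hypothetical file names following the assumed layout.
files <- c(
  "NEON.D05.CRAM.DP1.20288.001.103.100.100.waq_instantaneous.2019-01.basic.20190201T000000Z.csv",
  "NEON.D05.CRAM.DP1.20288.001.102.100.100.waq_instantaneous.2019-02.basic.20190315T120000Z.csv"
)
meta <- parse_sensor_filename(files)
meta
```

Because each file contributes one set of values, a reader has to handle files individually to repeat those values across that file's rows, which is why the text describes this step as slower.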

Examples

if (FALSE) { # interactive()

neon_read("brd_countdata-expanded")

## Sensor inputs will add metadata columns by default
neon_read("waq_instantaneous", site = c("CRAM","SUGG"))

}