neon_read.Rd
Read in NEON tabular data
neon_read(
table = NA,
product = NA,
site = NA,
start_date = NA,
end_date = NA,
ext = NA,
timestamp = NA,
release = NA,
dir = neon_dir(),
files = NULL,
sensor_metadata = TRUE,
keep_filename = FALSE,
altrep = FALSE,
...
)
the name of a downloaded NEON table in the store, see neon_index
A NEON productCode or list of product codes, see examples.
4-letter site code(s) to filter on. Leave as NA to search all.
Download only files as recent as (YYYY-MM-DD). Leave as NA
to download up to the most recent available data.
Download only files up to end_date (YYYY-MM-DD). Leave as NA
to download all prior data.
Only match files with the given file extension(s).
Only match timestamps prior to this. See details in neon_index().
Should be a datetime POSIXct object (or coerce-able string).
Select only data files associated with a particular release tag, see https://www.neonscience.org/data-samples/data-management/data-revisions-releases, e.g. "RELEASE-2021". Releases are associated with a specific DOI and the promise that files associated with a particular release will not change.
Location where files should be downloaded. By default will
use the appropriate applications directory for your system
(see tools::R_user_dir()). This default can also be configured by
setting the environmental variable NEONSTORE_HOME, see Sys.setenv or
Renviron.
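As a minimal sketch of the configuration described above, the store location can be redirected for the current session with Sys.setenv. The path below is a throwaway directory chosen only for illustration; a persistent setup would more likely put NEONSTORE_HOME in an .Renviron file.

```r
# Hypothetical store location: a temporary directory used only for illustration.
store <- file.path(tempdir(), "neonstore-demo")
dir.create(store, recursive = TRUE, showWarnings = FALSE)

# Point the session at the custom location via the environment variable.
Sys.setenv(NEONSTORE_HOME = store)
Sys.getenv("NEONSTORE_HOME")

# If neonstore is installed, neon_dir() is expected to reflect this setting.
if (requireNamespace("neonstore", quietly = TRUE)) {
  print(neonstore::neon_dir())
}
```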
Optionally, specify a vector of file paths directly (e.g. as
provided from neon_index) and specify the table argument as NULL.
logical, default TRUE. Should we add metadata fields from file names of sensor data into the table? Adds DomainID, SiteID, horizontalPosition, verticalPosition, and publicationDate. Results in slower parsing.
Should we include a column indicating the original
file name for each row? Can be a useful source of additional metadata that
NEON may omit from the raw files (i.e. siteID), but will also result in
slower parsing. Default FALSE.
Enable or disable altrep. Logical, default FALSE. Setting to TRUE
can speed up reading, but may cause vroom::vroom to throw
mapping error: Too many open files.
Additional arguments to vroom::vroom, can usually be omitted.
NEON's tabular data files are split into separate .csv
files for each site for each month of sampling. In principle,
each file has identical columns. vroom::vroom can read in a
data table that has been sharded into many files like this
much faster than other parsers can read in each table iteratively
(and thus can greatly out-perform the "stacking" methods in neonUtilities).
When reading in very large numbers of files, it may be helpful to keep
altrep = FALSE (the default) to opt out of vroom's fast altrep
mechanism, which can cause neon_read() to fail when stacking
thousands of files.
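The sketch below illustrates the general idea of reading a sharded table in a single call, using made-up data written to a temporary directory. It calls vroom::vroom directly with altrep = FALSE and is not a reproduction of what neon_read() does internally; the file names and columns are invented for the example.

```r
# Write two small CSV "shards" with identical columns (made-up data).
shard_dir <- tempfile("shards")
dir.create(shard_dir)
write.csv(data.frame(siteID = "CRAM", month = "2020-01", count = 1:2),
          file.path(shard_dir, "CRAM.2020-01.csv"), row.names = FALSE)
write.csv(data.frame(siteID = "SUGG", month = "2020-01", count = 3:4),
          file.path(shard_dir, "SUGG.2020-01.csv"), row.names = FALSE)
files <- list.files(shard_dir, full.names = TRUE)

# vroom accepts a vector of paths and stacks the rows in one pass.
# altrep = FALSE reads everything eagerly instead of keeping files mapped.
if (requireNamespace("vroom", quietly = TRUE)) {
  stacked <- vroom::vroom(files, delim = ",", altrep = FALSE)
  print(stacked)
}
```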
Unfortunately, not all datasets are entirely consistent in their use
of columns. neon_read works around this by parsing such tables in
groups of matching schema, which is still reasonably fast.
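One way to picture the grouping strategy is the base R sketch below. It is an illustration under assumptions, not neon_read's actual implementation: it treats each file's header line as its schema, reads each group of matching files together, and then pads missing columns with NA before combining. The data and the helper names read_group and pad are hypothetical.

```r
# Three made-up CSV files; the third has an extra column ("qf").
tmp <- tempfile("schemas")
dir.create(tmp)
write.csv(data.frame(siteID = "CRAM", value = 1),
          file.path(tmp, "a.csv"), row.names = FALSE)
write.csv(data.frame(siteID = "SUGG", value = 2),
          file.path(tmp, "b.csv"), row.names = FALSE)
write.csv(data.frame(siteID = "CRAM", value = 3, qf = 0),
          file.path(tmp, "c.csv"), row.names = FALSE)
files <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# Use each file's header line as a key identifying its schema.
headers <- vapply(files, readLines, character(1), n = 1)
groups <- split(files, headers)

# Read every group of matching files together (hypothetical helper).
read_group <- function(paths) do.call(rbind, lapply(paths, read.csv))
tables <- lapply(groups, read_group)

# Pad columns missing from a group with NA, then combine the groups.
all_cols <- unique(unlist(lapply(tables, names)))
pad <- function(df) {
  for (col in setdiff(all_cols, names(df))) df[[col]] <- NA
  df[all_cols]
}
stacked <- do.call(rbind, lapply(tables, pad))
```

Reading by group keeps the fast path (one call per schema) while still tolerating files whose columns differ.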
NEON sensor data products currently do not include important metadata columns
containing DomainID, SiteID, horizontalPosition, verticalPosition, and
publicationDate in the data files themselves, but only encode this
in the raw file names. Although these values are shared across a raw
data file, this information is lost when stacking the tables unless explicit
columns are added to the data. This requires us to parse the files
one-by-one, which is much slower. By default this information is added to
the table, altering the stacked table schema from that of the raw table.
Disable this behavior by setting sensor_metadata = FALSE. Future
NEON sensor data products may start including this information in
the raw data files, as is already the case for observational data.
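To make the file-name encoding concrete, here is a rough base R sketch that pulls these fields out of one sensor file name and attaches them to a toy table. The file name is an illustrative example, and the field positions are an assumption based on NEON's dot-separated naming convention; this is not the parser neonstore itself uses.

```r
# Illustrative sensor file name; the field positions below are assumed.
fname <- paste0("NEON.D03.SUGG.DP1.20288.001.103.100.100.",
                "waq_instantaneous.2018-01.expanded.20201021T213404Z.csv")
parts <- strsplit(fname, ".", fixed = TRUE)[[1]]

# Metadata that exists only in the file name, not in the file contents.
meta <- list(
  DomainID           = parts[2],
  SiteID             = parts[3],
  horizontalPosition = parts[7],
  verticalPosition   = parts[8],
  publicationDate    = parts[13]
)

# Toy stand-in for the rows read from that file (made-up values).
toy <- data.frame(dissolvedOxygen = c(7.9, 8.1))

# Add one constant column per metadata field so it survives stacking.
for (nm in names(meta)) toy[[nm]] <- meta[[nm]]
toy
```

Because each file contributes its own constant values, this step has to happen per file, which is why adding sensor metadata slows parsing.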
if (FALSE) { # interactive()
neon_read("brd_countdata-expanded")
## Sensor inputs will add metadata columns by default
neon_read("waq_instantaneous", site = c("CRAM","SUGG"))
}