vignettes/technotes/forecast-example.Rmd
library(tidyverse)
library(fable)      ## also loads fabletools
library(tsibble)
library(neon4cast)  ## provides efi_format() and score()
library(contentid)  ## provides store(), retrieve(), resolve(), content_id()
Download and read in the current target file for the Aquatics theme. For convenience, we read this in as a time-series object (a tsibble), noting that time is given in the ‘time’ column and that the time series are replicated over sites, keyed by the ‘siteID’ column.
Create a 35-day forecast for each variable, oxygen and temperature. For illustrative purposes, we’ll use the fable package, because it is concise and well documented. We make a separate forecast for each of the two variables before reformatting and combining them. Note the use of the efi_format() helper function from the neon4cast package, which merely replaces the special <S3: distribution> column used by fable with something we can write to text: either columns with a mean/sd (for normal distributions), or otherwise random draws from the distributions.
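For intuition, that conversion looks roughly like the sketch below. This is not the actual neon4cast implementation; fc stands for any fable returned by fabletools::forecast() whose (normally distributed) response column is named oxygen:

## Sketch only: flatten the <S3: distribution> column into mean/sd columns
fc %>%
  as_tibble() %>%
  mutate(mean = mean(oxygen),                        ## means of the distributions
         sd   = sqrt(distributional::variance(oxygen))) %>%
  select(-oxygen)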
So that we can score our forecast right away, instead of waiting for next month’s data, we first drop the most recent 35 days of observations and forecast over that held-out period.
rw_forecast <- function(input_file, forecast_file){
  ## read data, format as a time series for each siteID,
  ## drop the last 35 days & use explicit NAs for missing dates
  ts <- read_csv(input_file) %>%
    as_tsibble(index = time, key = siteID) %>%
    filter(time < max(time) - 35) %>%
    fill_gaps()
  ## fit a random-walk model and forecast each variable separately
  oxygen_fc <- ts %>%
    fabletools::model(null = fable::RW(oxygen)) %>%
    fabletools::forecast(h = "35 days") %>%
    efi_format()
  temperature_fc <- ts %>%
    fabletools::model(null = fable::RW(temperature)) %>%
    fabletools::forecast(h = "35 days") %>%
    efi_format()
  ## combine the two variables and write to csv
  inner_join(oxygen_fc, temperature_fc) %>%
    readr::write_csv(forecast_file)
  forecast_file
}
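Since readr::read_csv() can read a gzipped file directly from a URL, the function can also be tried standalone, without the provenance machinery introduced below:

## direct call: read_csv() streams and decompresses the remote .csv.gz
rw_forecast("https://data.ecoforecast.org/targets/aquatics/aquatics-targets.csv.gz",
            "rw_forecast.csv")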
Now run the forecast with provenance-based tracing. We register the input and output files with contentid, and evaluate the forecast conditionally: it is only re-run when the content id of the target file is one we have not seen before.
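Recall how content-based identifiers behave: the id is a cryptographic hash of the file’s bytes, so identical content always yields an identical id, which is what lets an id act as a cache key. For example, using the sample data file that ships with contentid:

## identical content always hashes to the same id, e.g. "hash://sha256/..."
contentid::content_id(system.file("extdata", "vostok.icecore.co2",
                                  package = "contentid"))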
## We'll use a local tsv registry only
Sys.setenv(CONTENTID_REGISTRIES=paste(contentid:::default_tsv(), contentid::content_dir(), sep=", "))
## Register the URL and download by ID. We have to download to hash content.
## Memoised forecast: contentid has no input -> output map of its own, so we
## cache the forecast id produced from each target id in a small local file;
## the forecast is only re-run on an input id we have not seen before.
target_id <- store("https://data.ecoforecast.org/targets/aquatics/aquatics-targets.csv.gz")
cache <- paste0(gsub("\\W", "_", target_id), ".forecast_id")
if(!file.exists(cache)){ # no forecast yet for this input id, so recompute
  target_file <- resolve(target_id)
  rw_forecast(target_file, "rw_forecast.csv")
  # add output file to local store and record the input -> output mapping
  forecast_id <- store("rw_forecast.csv")
  writeLines(forecast_id, cache)
} else {
  forecast_id <- readLines(cache)
}
forecast <- resolve(forecast_id) %>% read_csv()
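Because the id is a hash of the content, re-uploading identical data yields the same id and the cached forecast is simply re-used; any change to the target data changes the id and triggers a fresh model run.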
forecast %>% score()
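score() compares the forecast against the observations we held out. In the EFI challenge, forecasts are evaluated with the continuous ranked probability score (CRPS), a proper score for probabilistic forecasts in which smaller values are better.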