< h1 class = "title toc-ignore" > Caching openSenseMap Data for Reproducibility< / h1 >
< h4 class = "author" > < em > Norwin Roosen< / em > < / h4 >
< h4 class = "date" > < em > 2018-06-07< / em > < / h4 >
< p > It may be useful to download data from openSenseMap only once. For reproducible results, the data should be saved to disk, and reloaded at a later point.< / p >
< p > This avoids:< / p >
< ul >
< li > changed results for queries without date parameters,< / li >
< li > unnecessary wait times,< / li >
< li > risk of API changes / API unavailability,< / li >
< li > stress on the openSenseMap-server.< / li >
< / ul >
< p > This vignette shows how to use the caching feature built into < code > opensensmapr< / code > , and how to do it yourself in case you want to save to other data formats.< / p >
< div class = "sourceCode" > < pre class = "sourceCode r" > < code class = "sourceCode r" > < span class = "co" > # this vignette requires:< / span >
< span class = "kw" > library< / span > (opensensmapr)
< span class = "kw" > library< / span > (jsonlite)
< span class = "kw" > library< / span > (readr)< / code > < / pre > < / div >
< div id = "using-the-opensensmapr-caching-feature" class = "section level2" >
< h2 > Using the opensensmapr Caching Feature< / h2 >
< p > All data retrieval functions of < code > opensensmapr< / code > have a built-in caching feature, which serializes an API response to disk. Subsequent identical requests then return the serialized data instead of making another request.< / p >
< p > To use this feature, just add a path to a directory to the < code > cache< / code > parameter:< / p >
< div class = "sourceCode" > < pre class = "sourceCode r" > < code class = "sourceCode r" > b =< span class = "st" > < / span > < span class = "kw" > osem_boxes< / span > (< span class = "dt" > grouptag =< / span > < span class = "st" > 'ifgi'< / span > , < span class = "dt" > cache =< / span > < span class = "kw" > tempdir< / span > ())
< span class = "co" > # the next identical request will hit the cache only!< / span >
b =< span class = "st" > < / span > < span class = "kw" > osem_boxes< / span > (< span class = "dt" > grouptag =< / span > < span class = "st" > 'ifgi'< / span > , < span class = "dt" > cache =< / span > < span class = "kw" > tempdir< / span > ())
< span class = "co" > # requests without the cache parameter will still be performed normally< / span >
b =< span class = "st" > < / span > < span class = "kw" > osem_boxes< / span > (< span class = "dt" > grouptag =< / span > < span class = "st" > 'ifgi'< / span > )< / code > < / pre > < / div >
< p > Looking at the cache directory we can see one file for each request, which is identified through a hash of the request URL:< / p >
< div class = "sourceCode" > < pre class = "sourceCode r" > < code class = "sourceCode r" > < span class = "kw" > list.files< / span > (< span class = "kw" > tempdir< / span > (), < span class = "dt" > pattern =< / span > < span class = "st" > 'osemcache< / span > < span class = "ch" > \\< / span > < span class = "st" > ..*< / span > < span class = "ch" > \\< / span > < span class = "st" > .rds'< / span > )< / code > < / pre > < / div >
< pre > < code > ## [1] "osemcache.17db5c57fc6fca4d836fa2cf30345ce8767cd61a.rds"< / code > < / pre >
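< p > The idea behind this lookup can be sketched in a few lines of base R. This is not the package's actual implementation: the helpers < code > cache_key()< / code > and < code > cached_fetch()< / code > , the stand-in < code > fake_fetch()< / code > , the example URL, the file prefix, and the choice of an MD5 hash are all assumptions made for illustration.< / p >
< div class = "sourceCode" > < pre class = "sourceCode r" > < code class = "sourceCode r" > # hypothetical helper: derive a stable file name from a request URL.
# MD5 of a temporary file is used only to stay within base R.
cache_key = function (url) {
  tmp = tempfile()
  on.exit(unlink(tmp))
  writeLines(url, tmp)
  hash = unname(tools::md5sum(tmp))
  paste0('sketchcache.', hash, '.rds')
}

# hypothetical helper: return cached data if present, otherwise fetch and store it
cached_fetch = function (url, fetch, cache = tempdir()) {
  file = file.path(cache, cache_key(url))
  if (file.exists(file)) return(readRDS(file))
  result = fetch(url)
  saveRDS(result, file)
  result
}

# stand-in for a real HTTP request, counting how often it is called
counter = new.env()
counter$calls = 0
fake_fetch = function (url) {
  counter$calls = counter$calls + 1
  list(url = url, value = 42)
}

# a fresh, empty cache directory for this sketch
cache_dir = tempfile('sketch-cache-')
dir.create(cache_dir)

a = cached_fetch('https://example.org/boxes?grouptag=ifgi', fake_fetch, cache_dir)
b = cached_fetch('https://example.org/boxes?grouptag=ifgi', fake_fetch, cache_dir)
identical(a, b) # both calls return the same data
counter$calls   # number of times the stand-in request actually ran< / code > < / pre > < / div >
< p > Because the file name depends only on the URL, any change to the query parameters produces a different key and therefore a new request.< / p >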
< p > You can maintain multiple caches simultaneously, which lets you keep the data used by a script in the same directory as the script:< / p >
< div class = "sourceCode" > < pre class = "sourceCode r" > < code class = "sourceCode r" > cacheDir =< span class = "st" > < / span > < span class = "kw" > getwd< / span > () < span class = "co" > # current working directory< / span >
b =< span class = "st" > < / span > < span class = "kw" > osem_boxes< / span > (< span class = "dt" > grouptag =< / span > < span class = "st" > 'ifgi'< / span > , < span class = "dt" > cache =< / span > cacheDir)
< span class = "co" > # the next identical request will hit the cache only!< / span >
b =< span class = "st" > < / span > < span class = "kw" > osem_boxes< / span > (< span class = "dt" > grouptag =< / span > < span class = "st" > 'ifgi'< / span > , < span class = "dt" > cache =< / span > cacheDir)< / code > < / pre > < / div >
< p > To get fresh results again, just call < code > osem_clear_cache()< / code > for the respective cache:< / p >
< div class = "sourceCode" > < pre class = "sourceCode r" > < code class = "sourceCode r" > < span class = "kw" > osem_clear_cache< / span > () < span class = "co" > # clears default cache< / span >
< span class = "kw" > osem_clear_cache< / span > (< span class = "kw" > getwd< / span > ()) < span class = "co" > # clears a custom cache< / span > < / code > < / pre > < / div >
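< p > If you prefer to inspect or remove cached files by hand, the file naming shown earlier is enough to find them. The following base R sketch is an illustration under assumptions, not a description of what < code > osem_clear_cache()< / code > does internally; it works on a scratch directory with a made-up placeholder file so that it can be run without touching a real cache.< / p >
< div class = "sourceCode" > < pre class = "sourceCode r" > < code class = "sourceCode r" > # scratch directory with a placeholder file that mimics the cache file naming
scratch = tempfile('osem-scratch-')
dir.create(scratch)
saveRDS(list(placeholder = TRUE), file.path(scratch, 'osemcache.0123abcd.rds'))
file.create(file.path(scratch, 'notes.txt')) # unrelated file that must survive

# find cache files by the naming pattern used earlier in this vignette
cached = list.files(scratch, pattern = 'osemcache\\..*\\.rds', full.names = TRUE)
file.info(cached)[, c('size', 'mtime')] # inspect size and age before deleting

# remove only the cache files, leaving everything else in place
removed = file.remove(cached)
remaining = list.files(scratch)< / code > < / pre > < / div >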
< / div >
< div id = "custom-de--serialization" class = "section level2" >
< h2 > Custom (De-) Serialization< / h2 >
< p > If you want to roll your own serialization method to support custom data formats, here’s how:< / p >
< div class = "sourceCode" > < pre class = "sourceCode r" > < code class = "sourceCode r" > < span class = "co" > # first get our example data:< / span >
measurements =< span class = "st" > < / span > < span class = "kw" > osem_measurements< / span > (< span class = "st" > 'Windrichtung'< / span > )< / code > < / pre > < / div >
< p > If you are paranoid and worry about < code > .rds< / code > files not being decodable anymore in the (distant) future, you could serialize to a plain text format such as JSON. This of course comes at the cost of storage space and performance.< / p >
< div class = "sourceCode" > < pre class = "sourceCode r" > < code class = "sourceCode r" > < span class = "co" > # serializing measurements to JSON, and loading from file again:< / span >
< span class = "kw" > write< / span > (jsonlite::< span class = "kw" > serializeJSON< / span > (measurements), < span class = "st" > 'measurements.json'< / span > )
measurements_from_file =< span class = "st" > < / span > jsonlite::< span class = "kw" > unserializeJSON< / span > (readr::< span class = "kw" > read_file< / span > (< span class = "st" > 'measurements.json'< / span > ))
< span class = "kw" > class< / span > (measurements_from_file)< / code > < / pre > < / div >
< pre > < code > ## [1] "osem_measurements" "tbl_df" "tbl"
## [4] "data.frame"< / code > < / pre >
< p > This method also persists the R object metadata (classes, attributes). If you use a serialization method that can’t persist object metadata, you can re-apply it with the following functions:< / p >
< div class = "sourceCode" > < pre class = "sourceCode r" > < code class = "sourceCode r" > < span class = "co" > # note the toJSON call instead of serializeJSON< / span >
< span class = "kw" > write< / span > (jsonlite::< span class = "kw" > toJSON< / span > (measurements), < span class = "st" > 'measurements_bad.json'< / span > )
measurements_without_attrs =< span class = "st" > < / span > jsonlite::< span class = "kw" > fromJSON< / span > (< span class = "st" > 'measurements_bad.json'< / span > )
< span class = "kw" > class< / span > (measurements_without_attrs)< / code > < / pre > < / div >
< pre > < code > ## [1] "data.frame"< / code > < / pre >
< div class = "sourceCode" > < pre class = "sourceCode r" > < code class = "sourceCode r" > measurements_with_attrs =< span class = "st" > < / span > < span class = "kw" > osem_as_measurements< / span > (measurements_without_attrs)
< span class = "kw" > class< / span > (measurements_with_attrs)< / code > < / pre > < / div >
< pre > < code > ## [1] "osem_measurements" "tbl_df" "tbl"
## [4] "data.frame"< / code > < / pre >
< p > The same goes for boxes via < code > osem_as_sensebox()< / code > .< / p >
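< p > The loss of metadata is not specific to JSON. As a minimal base R sketch, a CSV round trip on a toy data frame shows the same effect. The class name < code > toy_measurements< / code > , the attribute < code > unit< / code > , and the values are made up for this illustration; for real openSenseMap data you would restore the class with < code > osem_as_measurements()< / code > as shown above.< / p >
< div class = "sourceCode" > < pre class = "sourceCode r" > < code class = "sourceCode r" > # toy data with a custom class and attribute (both hypothetical)
toy = data.frame(value = c(181, 203), sensorId = c('a1', 'b2'))
class(toy) = c('toy_measurements', class(toy))
attr(toy, 'unit') = 'degrees'

# CSV stores only the table contents, not classes or attributes
file = tempfile(fileext = '.csv')
write.csv(toy, file, row.names = FALSE)
toy_from_csv = read.csv(file)

class(toy_from_csv)        # the custom class is gone
attr(toy_from_csv, 'unit') # the attribute is gone as well

# re-apply the metadata by hand
class(toy_from_csv) = c('toy_measurements', class(toy_from_csv))
attr(toy_from_csv, 'unit') = attr(toy, 'unit')< / code > < / pre > < / div >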
< / div >
