clean up osem-serialization, #22

2025-07-14 21:00:23 +02:00 · 2018-06-05 20:22:22 +02:00 · 2018-06-05 20:22:22 +02:00 · 97768e7cdb
commit 97768e7cdb
parent 54b0994671
1 changed files with 22 additions and 89 deletions
--- a/vignettes/osem-serialization.Rmd
+++ b/vignettes/osem-serialization.Rmd
@@ -21,19 +21,24 @@ This avoids..
 - stress on the openSenseMap-server.

 This vignette shows how to use this built-in `opensensmapr` feature, and
-how to do it yourself, if you want to store to other data formats.
+how to do it yourself, if you want to save to other data formats.

-## Using openSensMapr Caching Feature
+```{r setup, results='hide'}
+# this vignette requires:
+library(opensensmapr)
+library(jsonlite)
+library(readr)
+```
+
+## Using the opensensmapr Caching Feature
 All data retrieval functions of `opensensmapr` have a built-in caching feature,
 which serializes an API response to disk.
 Subsequent identical requests will then return the serialized data instead of making
 another request.
-To do so, each request is given a unique ID based on its parameters.

 To use this feature, just add a path to a directory to the `cache` parameter:
 ```{r cache}
 b = osem_boxes(cache = tempdir())
-list.files(tempdir(), pattern = 'osemcache\\..*\\.rds')

 # the next identical request will hit the cache only!
 b = osem_boxes(cache = tempdir())
@@ -42,8 +47,12 @@ b = osem_boxes(cache = tempdir())
 b = osem_boxes()
 ```

-You can maintain multiple caches simultaneously which allows to store only
-serialized data related to a script in its directory:
+Looking at the cache directory, we can see one file per request, each identified by a hash of the request URL:
+```{r cachelisting}
+list.files(tempdir(), pattern = 'osemcache\\..*\\.rds')
+```
+
+You can maintain multiple caches simultaneously, which allows you to store only the data related to a script in that script's directory:
 ```{r cache_custom}
 cacheDir = getwd() # current working directory
 b = osem_boxes(cache = cacheDir)
@@ -62,15 +71,9 @@ osem_clear_cache(getwd()) # clears a custom cache
 If you want to roll your own serialization method to support custom data formats,
 here's how:

-```{r setup, results='hide'}
-# this section requires:
-library(opensensmapr)
-library(jsonlite)
-library(readr)
-
+```{r data, results='hide'}
 # first get our example data:
 boxes = osem_boxes(grouptag = 'ifgi')
-measurements = osem_measurements(boxes, phenomenon = 'PM10')
 ```

 If you are paranoid and worry about `.rds` files not being decodable anymore
@@ -78,92 +81,22 @@ in the (distant) future, you could serialize to a plain text format such as JSON
 This of course comes at the cost of storage space and performance.
 ```{r serialize_json}
 # serializing senseBoxes to JSON, and loading from file again:
-write(jsonlite::serializeJSON(measurements), 'boxes.json')
+write(jsonlite::serializeJSON(boxes), 'boxes.json')
 boxes_from_file = jsonlite::unserializeJSON(readr::read_file('boxes.json'))
+class(boxes_from_file)
 ```

-Both methods also persist the R object metadata (classes, attributes).
+This method also persists the R object metadata (classes, attributes).
 If you were to use a serialization method that can't persist object metadata, you
 could re-apply it with the following functions:

 ```{r serialize_attrs}
-# note the toJSON call
-write(jsonlite::toJSON(measurements), 'boxes_bad.json')
+# note the toJSON call instead of serializeJSON
+write(jsonlite::toJSON(boxes), 'boxes_bad.json')
 boxes_without_attrs = jsonlite::fromJSON('boxes_bad.json')
+class(boxes_without_attrs)

 boxes_with_attrs = osem_as_sensebox(boxes_without_attrs)
 class(boxes_with_attrs)
 ```
 The same goes for measurements via `osem_as_measurements()`.
-
-## Workflow for reproducible code
-For truly reproducible code you want it to work and return the same results --
-no matter if you run it the first time or a consecutive time, and without making
-changes to it.
-
-Therefore we need a wrapper around the save-to-file & load-from-file logic.
-The following examples show a way to do just that, and were inspired by
-[this reproducible analysis by Daniel Nuest](https://github.com/nuest/sensebox-binder).
-
-```{r osem_offline}
-# offline logic
-osem_offline = function (func, file, format='rds', ...) {
-  # deserialize if file exists, otherwise download and serialize
-  if (file.exists(file)) {
-    if (format == 'json')
-      jsonlite::unserializeJSON(readr::read_file(file))
-    else
-      readRDS(file)
-  } else {
-    data = func(...)
-    if (format == 'json')
-      write(jsonlite::serializeJSON(data), file = file)
-    else
-      saveRDS(data, file)
-    data
-  }
-}
-
-# wrappers for each download function
-osem_measurements_offline = function (file, ...) {
-  osem_offline(opensensmapr::osem_measurements, file, ...)
-}
-osem_boxes_offline = function (file, ...) {
-  osem_offline(opensensmapr::osem_boxes, file, ...)
-}
-osem_box_offline = function (file, ...) {
-  osem_offline(opensensmapr::osem_box, file, ...)
-}
-osem_counts_offline = function (file, ...) {
-  osem_offline(opensensmapr::osem_counts, file, ...)
-}
-```
-
-That's it! Now let's try it out:
-
-```{r test}
-# first run; will download and save to disk
-b1 = osem_boxes_offline('mobileboxes.rds', exposure='mobile')
-
-# consecutive runs; will read from disk
-b2 = osem_boxes_offline('mobileboxes.rds', exposure='mobile')
-class(b1) == class(b2)
-
-# we can even omit the arguments now (though that's not really the point here)
-b3 = osem_boxes_offline('mobileboxes.rds')
-nrow(b1) == nrow(b3)
-
-# verify that the custom sensebox methods are still working
-summary(b2)
-plot(b3)
-```
-
-To re-download the data, just clear the files that were created in the process:
-```{r cleanup, results='hide'}
-file.remove('mobileboxes.rds', 'boxes_bad.json', 'boxes.json', 'measurements.rds')
-```
-
-A possible extension to this scheme comes to mind: Omit the specification of a
-filename, and assign a unique ID to the request instead.
-For example, one could calculate the SHA-1 hash of the parameters, and use it
-as filename.
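
The extension suggested in the removed closing paragraph (naming the cache file after a hash of the request parameters) could be sketched roughly as follows. This is only an illustration, not how `opensensmapr` implements its cache: `cache_filename()`, `cached_call()` and `fake_download()` are hypothetical helpers, and MD5 via base R's `tools::md5sum()` stands in for the SHA-1 hash mentioned in the text, to avoid an extra dependency.

```r
# Hypothetical helper (not part of opensensmapr): build a file name from a
# hash of the call's arguments. tools::md5sum() only hashes files, so the
# serialized arguments are written to a temporary file first.
cache_filename = function (prefix, args, dir = tempdir()) {
  tmp = tempfile()
  on.exit(unlink(tmp))
  writeBin(serialize(args, NULL), tmp)
  hash = unname(tools::md5sum(tmp))
  file.path(dir, paste0(prefix, '.', hash, '.rds'))
}

# Hypothetical wrapper: load the result from disk if an identical call was
# made before, otherwise call `func(...)` and save its result.
cached_call = function (func, prefix, ..., dir = tempdir()) {
  file = cache_filename(prefix, list(...), dir)
  if (file.exists(file)) return(readRDS(file))
  data = func(...)
  saveRDS(data, file)
  data
}

# offline stand-in for a download function, so the sketch runs without network
fake_download = function (n) data.frame(id = seq_len(n))

d1 = cached_call(fake_download, 'demo', n = 3) # computes and saves
d2 = cached_call(fake_download, 'demo', n = 3) # reads the saved file
identical(d1, d2)
```

With the real package, a call might look like `cached_call(opensensmapr::osem_boxes, 'boxes', exposure = 'mobile')`. Note that the hash in this sketch depends on argument names and order, so `f(a = 1, b = 2)` and `f(b = 2, a = 1)` would end up in different files.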