<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8" />
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="generator" content="pandoc" />
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="author" content="Norwin Roosen" />
<meta name="date" content="2018-05-26" />
<title>Caching openSenseMap Data for Reproducibility</title>
<style type="text/css">code{white-space: pre;}</style>
<style type="text/css">
div.sourceCode { overflow-x: auto; }
table.sourceCode, tr.sourceCode, td.lineNumbers, td.sourceCode {
margin: 0; padding: 0; vertical-align: baseline; border: none; }
table.sourceCode { width: 100%; line-height: 100%; }
td.lineNumbers { text-align: right; padding-right: 4px; padding-left: 4px; color: #aaaaaa; border-right: 1px solid #aaaaaa; }
td.sourceCode { padding-left: 5px; }
code > span.kw { color: #007020; font-weight: bold; } /* Keyword */
code > span.dt { color: #902000; } /* DataType */
code > span.dv { color: #40a070; } /* DecVal */
code > span.bn { color: #40a070; } /* BaseN */
code > span.fl { color: #40a070; } /* Float */
code > span.ch { color: #4070a0; } /* Char */
code > span.st { color: #4070a0; } /* String */
code > span.co { color: #60a0b0; font-style: italic; } /* Comment */
code > span.ot { color: #007020; } /* Other */
code > span.al { color: #ff0000; font-weight: bold; } /* Alert */
code > span.fu { color: #06287e; } /* Function */
code > span.er { color: #ff0000; font-weight: bold; } /* Error */
code > span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */
code > span.cn { color: #880000; } /* Constant */
code > span.sc { color: #4070a0; } /* SpecialChar */
code > span.vs { color: #4070a0; } /* VerbatimString */
code > span.ss { color: #bb6688; } /* SpecialString */
code > span.im { } /* Import */
code > span.va { color: #19177c; } /* Variable */
code > span.cf { color: #007020; font-weight: bold; } /* ControlFlow */
code > span.op { color: #666666; } /* Operator */
code > span.bu { } /* BuiltIn */
code > span.ex { } /* Extension */
code > span.pp { color: #bc7a00; } /* Preprocessor */
code > span.at { color: #7d9029; } /* Attribute */
code > span.do { color: #ba2121; font-style: italic; } /* Documentation */
code > span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */
code > span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
code > span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
</style>
<link href="data:text/css;charset=utf-8,body%20%7B%0Abackground%2Dcolor%3A%20%23fff%3B%0Amargin%3A%201em%20auto%3B%0Amax%2Dwidth%3A%20700px%3B%0Aoverflow%3A%20visible%3B%0Apadding%2Dleft%3A%202em%3B%0Apadding%2Dright%3A%202em%3B%0Afont%2Dfamily%3A%20%22Open%20Sans%22%2C%20%22Helvetica%20Neue%22%2C%20Helvetica%2C%20Arial%2C%20sans%2Dserif%3B%0Afont%2Dsize%3A%2014px%3B%0Aline%2Dheight%3A%201%2E35%3B%0A%7D%0A%23header%20%7B%0Atext%2Dalign%3A%20center%3B%0A%7D%0A%23TOC%20%7B%0Aclear%3A%20both%3B%0Amargin%3A%200%200%2010px%2010px%3B%0Apadding%3A%204px%3B%0Awidth%3A%20400px%3B%0Aborder%3A%201px%20solid%20%23CCCCCC%3B%0Aborder%2Dradius%3A%205px%3B%0Abackground%2Dcolor%3A%20%23f6f6f6%3B%0Afont%2Dsize%3A%2013px%3B%0Aline%2Dheight%3A%201%2E3%3B%0A%7D%0A%23TOC%20%2Etoctitle%20%7B%0Afont%2Dweight%3A%20bold%3B%0Afont%2Dsize%3A%2015px%3B%0Amargin%2Dleft%3A%205px%3B%0A%7D%0A%23TOC%20ul%20%7B%0Apadding%2Dleft%3A%2040px%3B%0Amargin%2Dleft%3A%20%2D1%2E5em%3B%0Amargin%2Dtop%3A%205px%3B%0Amargin%2Dbottom%3A%205px%3B%0A%7D%0A%23TOC%20ul%20ul%20%7B%0Amargin%2Dleft%3A%20%2D2em%3B%0A%7D%0A%23TOC%20li%20%7B%0Aline%2Dheight%3A%2016px%3B%0A%7D%0Atable%20%7B%0Amargin%3A%201em%20auto%3B%0Aborder%2Dwidth%3A%201px%3B%0Aborder%2Dcolor%3A%20%23DDDDDD%3B%0Aborder%2Dstyle%3A%20outset%3B%0Aborder%2Dcollapse%3A%20collapse%3B%0A%7D%0Atable%20th%20%7B%0Aborder%2Dwidth%3A%202px%3B%0Apadding%3A%205px%3B%0Aborder%2Dstyle%3A%20inset%3B%0A%7D%0Atable%20td%20%7B%0Aborder%2Dwidth%3A%201px%3B%0Aborder%2Dstyle%3A%20inset%3B%0Aline%2Dheight%3A%2018px%3B%0Apadding%3A%205px%205px%3B%0A%7D%0Atable%2C%20table%20th%2C%20table%20td%20%7B%0Aborder%2Dleft%2Dstyle%3A%20none%3B%0Aborder%2Dright%2Dstyle%3A%20none%3B%0A%7D%0Atable%20thead%2C%20table%20tr%2Eeven%20%7B%0Abackground%2Dcolor%3A%20%23f7f7f7%3B%0A%7D%0Ap%20%7B%0Amargin%3A%200%2E5em%200%3B%0A%7D%0Ablockquote%20%7B%0Abackground%2Dcolor%3A%20%23f6f6f6%3B%0Apadding%3A%200%2E25em%200%2E75em%3B%0A%7D%0Ahr%20%7B%0Aborder%2Dstyle%3A%20solid%3B%0Aborder%3A%20none%3B%0Aborde
r%2Dtop%3A%201px%20solid%20%23777%3B%0Amargin%3A%2028px%200%3B%0A%7D%0Adl%20%7B%0Amargin%2Dleft%3A%200%3B%0A%7D%0Adl%20dd%20%7B%0Amargin%2Dbottom%3A%2013px%3B%0Amargin%2Dleft%3A%2013px%3B%0A%7D%0Adl%20dt%20%7B%0Afont%2Dweight%3A%20bold%3B%0A%7D%0Aul%20%7B%0Amargin%2Dtop%3A%200%3B%0A%7D%0Aul%20li%20%7B%0Alist%2Dstyle%3A%20circle%20outside%3B%0A%7D%0Aul%20ul%20%7B%0Amargin%2Dbottom%3A%200%3B%0A%7D%0Apre%2C%20code%20%7B%0Abackground%2Dcolor%3A%20%23f7f7f7%3B%0Aborder%2Dradius%3A%203px%3B%0Acolor%3A%20%23333%3B%0Awhite%2Dspace%3A%20pre%2Dwrap%3B%20%0A%7D%0Apre%20%7B%0Aborder%2Dradius%3A%203px%3B%0Amargin%3A%205px%200px%2010px%200px%3B%0Apadding%3A%2010px%3B%0A%7D%0Apre%3Anot%28%5Bclass%5D%29%20%7B%0Abackground%2Dcolor%3A%20%23f7f7f7%3B%0A%7D%0Acode%20%7B%0Afont%2Dfamily%3A%20Consolas%2C%20Monaco%2C%20%27Courier%20New%27%2C%20monospace%3B%0Afont%2Dsize%3A%2085%25%3B%0A%7D%0Ap%20%3E%20code%2C%20li%20%3E%20code%20%7B%0Apadding%3A%202px%200px%3B%0A%7D%0Adiv%2Efigure%20%7B%0Atext%2Dalign%3A%20center%3B%0A%7D%0Aimg%20%7B%0Abackground%2Dcolor%3A%20%23FFFFFF%3B%0Apadding%3A%202px%3B%0Aborder%3A%201px%20solid%20%23DDDDDD%3B%0Aborder%2Dradius%3A%203px%3B%0Aborder%3A%201px%20solid%20%23CCCCCC%3B%0Amargin%3A%200%205px%3B%0A%7D%0Ah1%20%7B%0Amargin%2Dtop%3A%200%3B%0Afont%2Dsize%3A%2035px%3B%0Aline%2Dheight%3A%2040px%3B%0A%7D%0Ah2%20%7B%0Aborder%2Dbottom%3A%204px%20solid%20%23f7f7f7%3B%0Apadding%2Dtop%3A%2010px%3B%0Apadding%2Dbottom%3A%202px%3B%0Afont%2Dsize%3A%20145%25%3B%0A%7D%0Ah3%20%7B%0Aborder%2Dbottom%3A%202px%20solid%20%23f7f7f7%3B%0Apadding%2Dtop%3A%2010px%3B%0Afont%2Dsize%3A%20120%25%3B%0A%7D%0Ah4%20%7B%0Aborder%2Dbottom%3A%201px%20solid%20%23f7f7f7%3B%0Amargin%2Dleft%3A%208px%3B%0Afont%2Dsize%3A%20105%25%3B%0A%7D%0Ah5%2C%20h6%20%7B%0Aborder%2Dbottom%3A%201px%20solid%20%23ccc%3B%0Afont%2Dsize%3A%20105%25%3B%0A%7D%0Aa%20%7B%0Acolor%3A%20%230033dd%3B%0Atext%2Ddecoration%3A%20none%3B%0A%7D%0Aa%3Ahover%20%7B%0Acolor%3A%20%236666ff%3B%20%7D%0Aa%3Avisited%20%7B%0Acolor%3A%20%238000
80%3B%20%7D%0Aa%3Avisited%3Ahover%20%7B%0Acolor%3A%20%23BB00BB%3B%20%7D%0Aa%5Bhref%5E%3D%22http
</head>
<body>
<h1 class="title toc-ignore">Caching openSenseMap Data for Reproducibility</h1>
<h4 class="author"><em>Norwin Roosen</em></h4>
<h4 class="date"><em>2018-05-26</em></h4>
<p>For reproducible results, it can be useful to download data from openSenseMap only once, save it to disk, and reload it later.</p>
<p>This avoids:</p>
<ul>
<li>changed results for queries without date parameters,</li>
<li>unnecessary wait times,</li>
<li>breakage due to API changes or API unavailability,</li>
<li>unnecessary load on the openSenseMap server.</li>
</ul>
<p>This vignette shows how to use the built-in caching feature of <code>opensensmapr</code>, and how to roll your own serialization if you want to store data in other formats.</p>
<div id="using-opensensmapr-caching-feature" class="section level2">
<h2>Using the opensensmapr Caching Feature</h2>
<p>All data retrieval functions of <code>opensensmapr</code> have a built-in caching feature, which serializes an API response to disk. Subsequent identical requests then return the serialized data instead of making another request. To recognize identical requests, each request is given a unique ID derived from its parameters.</p>
<p>To use this feature, pass the path of a directory to the <code>cache</code> parameter:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">b =<span class="st"> </span><span class="kw">osem_boxes</span>(<span class="dt">cache =</span> <span class="kw">tempdir</span>())
<span class="kw">list.files</span>(<span class="kw">tempdir</span>(), <span class="dt">pattern =</span> <span class="st">'osemcache</span><span class="ch">\\</span><span class="st">..*</span><span class="ch">\\</span><span class="st">.rds'</span>)</code></pre></div>
<pre><code>## [1] &quot;osemcache.c54710f66b662e29dd86b089962b0f598e47eddb.rds&quot;</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># the next identical request will hit the cache only!</span>
b =<span class="st"> </span><span class="kw">osem_boxes</span>(<span class="dt">cache =</span> <span class="kw">tempdir</span>())
<span class="co"># requests without the cache parameter will still be performed normally</span>
b =<span class="st"> </span><span class="kw">osem_boxes</span>()</code></pre></div>
<p>You can maintain multiple caches simultaneously. This lets you store the serialized data a script depends on in that script's own directory:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">cacheDir =<span class="st"> </span><span class="kw">getwd</span>() <span class="co"># current working directory</span>
b =<span class="st"> </span><span class="kw">osem_boxes</span>(<span class="dt">cache =</span> cacheDir)
<span class="co"># the next identical request will hit the cache only!</span>
b =<span class="st"> </span><span class="kw">osem_boxes</span>(<span class="dt">cache =</span> cacheDir)</code></pre></div>
<p>To get fresh results again, just call <code>osem_clear_cache()</code> for the respective cache:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">osem_clear_cache</span>() <span class="co"># clears default cache</span></code></pre></div>
<pre><code>## [1] TRUE</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">osem_clear_cache</span>(<span class="kw">getwd</span>()) <span class="co"># clears a custom cache</span></code></pre></div>
<pre><code>## [1] TRUE</code></pre>
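<p>To see which cache files a directory holds before clearing it, you can also list or remove them yourself. This is a sketch only: <code>list_osem_cache()</code> and <code>remove_osem_cache()</code> are hypothetical helpers that are not part of <code>opensensmapr</code>, and they assume that cache files follow the <code>osemcache.*.rds</code> naming visible in the output above. For actual use, prefer <code>osem_clear_cache()</code>.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># hypothetical helpers, not part of opensensmapr;
# they assume cache files are named osemcache.*.rds
list_osem_cache = function (dir = tempdir()) {
  list.files(dir, pattern = '^osemcache\\..*\\.rds$', full.names = TRUE)
}
remove_osem_cache = function (dir = tempdir()) {
  # file.remove() returns one logical per file; all() is TRUE for an empty cache
  all(file.remove(list_osem_cache(dir)))
}

# try them on a scratch directory holding two empty, fake cache files
scratch = file.path(tempdir(), 'cache-demo')
dir.create(scratch, showWarnings = FALSE)
file.create(file.path(scratch, c('osemcache.aaa.rds', 'osemcache.bbb.rds')))
length(list_osem_cache(scratch)) # 2
remove_osem_cache(scratch)       # TRUE
length(list_osem_cache(scratch)) # 0</code></pre></div>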
</div>
<div id="custom-de--serialization" class="section level2">
<h2>Custom (De-) Serialization</h2>
<p>If you want to roll your own serialization method to support custom data formats, here's how:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># this section requires:</span>
<span class="kw">library</span>(opensensmapr)
<span class="kw">library</span>(jsonlite)
<span class="kw">library</span>(readr)
<span class="co"># first get our example data:</span>
boxes =<span class="st"> </span><span class="kw">osem_boxes</span>(<span class="dt">grouptag =</span> <span class="st">'ifgi'</span>)
measurements =<span class="st"> </span><span class="kw">osem_measurements</span>(boxes, <span class="dt">phenomenon =</span> <span class="st">'PM10'</span>)</code></pre></div>
<p>If you are paranoid and worry about <code>.rds</code> files not being decodable anymore in the (distant) future, you could serialize to a plain text format such as JSON. This of course comes at the cost of storage space and performance.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># serializing senseBoxes to JSON, and loading from file again:</span>
<span class="kw">write</span>(jsonlite::<span class="kw">serializeJSON</span>(boxes), <span class="st">'boxes.json'</span>)
boxes_from_file =<span class="st"> </span>jsonlite::<span class="kw">unserializeJSON</span>(readr::<span class="kw">read_file</span>(<span class="st">'boxes.json'</span>))</code></pre></div>
<p>Both methods also persist the R object metadata (classes, attributes). If you were to use a serialization method that can't persist object metadata, you could re-apply it with the following functions:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># note the toJSON call instead of serializeJSON: classes and attributes are lost</span>
<span class="kw">write</span>(jsonlite::<span class="kw">toJSON</span>(boxes), <span class="st">'boxes_bad.json'</span>)
boxes_without_attrs =<span class="st"> </span>jsonlite::<span class="kw">fromJSON</span>(<span class="st">'boxes_bad.json'</span>)
boxes_with_attrs =<span class="st"> </span><span class="kw">osem_as_sensebox</span>(boxes_without_attrs)
<span class="kw">class</span>(boxes_with_attrs)</code></pre></div>
<pre><code>## [1] &quot;sensebox&quot; &quot;data.frame&quot;</code></pre>
<p>The same goes for measurements via <code>osem_as_measurements()</code>.</p>
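<p>Whether a format keeps this metadata can be checked without any network access. A minimal sketch on a toy data frame; the <code>toy_data</code> class and the <code>source</code> attribute are made up for illustration and merely stand in for the classes that <code>opensensmapr</code> attaches to its results:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># a toy stand-in for a senseBox data frame (not real openSenseMap data)
df = data.frame(id = 1:3, name = c('a', 'b', 'c'), stringsAsFactors = FALSE)
class(df) = c('toy_data', 'data.frame')
attr(df, 'source') = 'toy'

# RDS: data, classes and attributes survive the round trip unchanged
rds_file = tempfile(fileext = '.rds')
saveRDS(df, rds_file)
from_rds = readRDS(rds_file)

# serializeJSON: classes and attributes are encoded in the JSON text
from_json = jsonlite::unserializeJSON(jsonlite::serializeJSON(df))

# toJSON: only the values are encoded, the result is a plain data.frame
from_plain = jsonlite::fromJSON(jsonlite::toJSON(df))

identical(from_rds, df)   # TRUE
class(from_json)          # "toy_data" "data.frame"
attr(from_json, 'source') # "toy"
class(from_plain)         # "data.frame"</code></pre></div>
<p>The toy data uses integers and strings on purpose: <code>serializeJSON()</code> rounds numbers according to its <code>digits</code> argument, so doubles may not round-trip exactly unless you raise it.</p>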
</div>
<div id="workflow-for-reproducible-code" class="section level2">
<h2>Workflow for Reproducible Code</h2>
<p>For truly reproducible code, you want it to return the same results whether you run it for the first time or any later time, without having to change the code in between.</p>
<p>Therefore we need a wrapper around the save-to-file &amp; load-from-file logic. The following examples show one way to do just that, and were inspired by <a href="https://github.com/nuest/sensebox-binder">this reproducible analysis by Daniel Nuest</a>.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># offline logic</span>
osem_offline =<span class="st"> </span>function (func, file, <span class="dt">format=</span><span class="st">'rds'</span>, ...) {
  <span class="co"># deserialize if file exists, otherwise download and serialize</span>
  if (<span class="kw">file.exists</span>(file)) {
    if (format ==<span class="st"> 'json'</span>)
      jsonlite::<span class="kw">unserializeJSON</span>(readr::<span class="kw">read_file</span>(file))
    else
      <span class="kw">readRDS</span>(file)
  } else {
    data =<span class="st"> </span><span class="kw">func</span>(...)
    if (format ==<span class="st"> 'json'</span>)
      <span class="kw">write</span>(jsonlite::<span class="kw">serializeJSON</span>(data), <span class="dt">file =</span> file)
    else
      <span class="kw">saveRDS</span>(data, file)
    data
  }
}
<span class="co"># wrappers for each download function</span>
osem_measurements_offline =<span class="st"> </span>function (file, ...) {
  <span class="kw">osem_offline</span>(opensensmapr::osem_measurements, file, ...)
}
osem_boxes_offline =<span class="st"> </span>function (file, ...) {
  <span class="kw">osem_offline</span>(opensensmapr::osem_boxes, file, ...)
}
osem_box_offline =<span class="st"> </span>function (file, ...) {
  <span class="kw">osem_offline</span>(opensensmapr::osem_box, file, ...)
}
osem_counts_offline =<span class="st"> </span>function (file, ...) {
  <span class="kw">osem_offline</span>(opensensmapr::osem_counts, file, ...)
}</code></pre></div>
<p>That's it! Now let's try it out:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># first run; will download and save to disk</span>
b1 =<span class="st"> </span><span class="kw">osem_boxes_offline</span>(<span class="st">'mobileboxes.rds'</span>, <span class="dt">exposure=</span><span class="st">'mobile'</span>)
<span class="co"># consecutive runs; will read from disk</span>
b2 =<span class="st"> </span><span class="kw">osem_boxes_offline</span>(<span class="st">'mobileboxes.rds'</span>, <span class="dt">exposure=</span><span class="st">'mobile'</span>)
<span class="kw">class</span>(b1) ==<span class="st"> </span><span class="kw">class</span>(b2)</code></pre></div>
<pre><code>## [1] TRUE TRUE</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># we can even omit the arguments now (though that's not really the point here)</span>
b3 =<span class="st"> </span><span class="kw">osem_boxes_offline</span>(<span class="st">'mobileboxes.rds'</span>)
<span class="kw">nrow</span>(b1) ==<span class="st"> </span><span class="kw">nrow</span>(b3)</code></pre></div>
<pre><code>## [1] TRUE</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># verify that the custom sensebox methods are still working</span>
<span class="kw">summary</span>(b2)</code></pre></div>
<pre><code>## boxes total: 55
##
## boxes by exposure:
## mobile
##     55
##
## boxes by model:
##                   custom             homeEthernet                 homeWifi
##                        7                        2                        8
##        homeWifiFeinstaub luftdaten_pms5003_bme280  luftdaten_sds011_bme280
##                        6                        2                        9
##   luftdaten_sds011_dht11   luftdaten_sds011_dht22
##                        1                       20
##
## $last_measurement_within
##    1h    1d   30d  365d never
##    16    16    24    43    12
##
## oldest box: 2017-05-24 08:16:36 (Feinstaub Hauptstrasse Steampunk-Design)
## newest box: 2018-05-24 07:08:32 (Josi Test)
##
## sensors per box:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   1.000   4.000   4.000   4.618   5.000  22.000</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">plot</span>(b3)</code></pre></div>
<p>To re-download the data, just clear the files that were created in the process:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">file.remove</span>(<span class="st">'mobileboxes.rds'</span>, <span class="st">'boxes_bad.json'</span>, <span class="st">'boxes.json'</span>, <span class="st">'measurements.rds'</span>)</code></pre></div>
<pre><code>## Warning in file.remove(&quot;mobileboxes.rds&quot;, &quot;boxes_bad.json&quot;, &quot;boxes.json&quot;, :
## cannot remove file 'measurements.rds', reason 'No such file or directory'</code></pre>
<p>A possible extension of this scheme: omit the filename and assign a unique ID to the request instead. For example, one could compute the SHA-1 hash of the request parameters and use it as the filename.</p>
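<p>The extension suggested above could look like the following sketch. <code>osem_offline_hashed()</code> is a hypothetical helper, not part of <code>opensensmapr</code>; it assumes the <code>digest</code> package for the SHA-1 hash and hashes a function name together with the request parameters. <code>fake_download()</code> only stands in for a real download function so that the example runs offline:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># hypothetical helper, not part of opensensmapr: the cache file name is the
# SHA-1 hash of the function name plus the request parameters (needs digest)
osem_offline_hashed = function (func, func_name, cache_dir = '.', ...) {
  id = digest::digest(list(func_name, list(...)), algo = 'sha1')
  file = file.path(cache_dir, paste0('offline.', id, '.rds'))
  if (file.exists(file))
    return(readRDS(file)) # identical request seen before: read from disk
  data = func(...)        # new request: fetch the data and serialize it
  saveRDS(data, file)
  data
}

# stand-in for a download function that counts how often it is called
counter = new.env()
counter$calls = 0
fake_download = function (n) {
  counter$calls = counter$calls + 1
  data.frame(value = seq_len(n))
}

cache_dir = file.path(tempdir(), 'hashed-cache')
unlink(cache_dir, recursive = TRUE)
dir.create(cache_dir)
x1 = osem_offline_hashed(fake_download, 'fake_download', cache_dir, n = 3)
x2 = osem_offline_hashed(fake_download, 'fake_download', cache_dir, n = 3)
x3 = osem_offline_hashed(fake_download, 'fake_download', cache_dir, n = 5)
counter$calls # 2, because the second request was read from disk

# with real data this would be, e.g.:
# osem_offline_hashed(opensensmapr::osem_boxes, 'osem_boxes', '.', exposure = 'mobile')</code></pre></div>
<p>Note that the hash depends on how the arguments are written: <code>n = 3</code> and a positional <code>3</code>, or the same arguments in a different order, produce different IDs and therefore separate cache files.</p>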
</div>
<!-- dynamically load mathjax for compatibility with self-contained -->
<script>
(function () {
var script = document.createElement("script");
script.type = "text/javascript";
script.src = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
document.getElementsByTagName("head")[0].appendChild(script);
})();
</script>
</body>
</html>