update osem-history, add vignette deps to DESCRIPTION

measurements_archive
noerw 7 years ago
parent 1966c305bc
commit e49ae4bb50

@@ -19,6 +19,9 @@ Suggests:
rmarkdown,
lubridate,
units,
jsonlite,
ggplot2,
zoo,
lintr,
testthat,
covr

@@ -18,6 +18,9 @@ vignette: |
  %\VignetteIndexEntry{Visualising the History of openSenseMap.org}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
> This vignette serves as an example of data wrangling & visualization with
`opensensmapr`, `dplyr` and `ggplot2`.
```{r setup, results='hide', message=FALSE, warning=FALSE}
# required packages:
library(opensensmapr) # data download
@@ -28,13 +31,8 @@ library(zoo) # rollmean()
```
openSenseMap.org has grown quite a bit in recent years; it would be interesting
to see how we got to the current `r osem_counts()$boxes` sensor stations,
split up by various attributes of the boxes.
While `opensensmapr` provides extensive methods of filtering boxes by attributes
on the server, we do the filtering within R to save time and gain flexibility.
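To illustrate what filtering within R looks like once all boxes are in memory, here is a minimal base-R sketch. The data frame is a hypothetical stand-in for the result of `osem_boxes()` with made-up values; only the `exposure` and `grouptag` columns, which the vignette also uses, are modelled. With `dplyr` loaded, `filter()` would express the same condition.

```r
# Hypothetical stand-in for the data frame returned by osem_boxes();
# only the columns used here are included, and all values are made up.
boxes <- data.frame(
  name     = c("box A", "box B", "box C", "box D"),
  exposure = c("outdoor", "indoor", "outdoor", "mobile"),
  grouptag = c("Luftdaten", NA, "Luftdaten", "school"),
  stringsAsFactors = FALSE
)

# Local filtering: any combination of attributes can be tested without
# another request to the server. %in% treats a missing grouptag as no match.
outdoor_luftdaten <- subset(boxes, exposure == "outdoor" & grouptag %in% "Luftdaten")

# Counting boxes per attribute value is just as cheap.
exposure_counts <- table(boxes$exposure)
```

Since every filter runs on the same downloaded data, trying out a different split costs no extra API calls.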
@@ -46,8 +44,6 @@ So the first step is to retrieve *all the boxes*:
boxes = osem_boxes()
```
# Plot count of boxes by time {.tabset}
By looking at the `createdAt` attribute of each box we know the exact time a box
was registered.
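One way to turn `createdAt` into a count of boxes over time is a running total of registrations; the weekly bins used further down come from `cut()` on dates. Below is a minimal base-R sketch of both, with hypothetical timestamps; the names `created_at`, `total_boxes` and `registrations_per_week` are illustrative only.

```r
# Hypothetical registration timestamps standing in for boxes$createdAt.
created_at <- as.POSIXct(c("2017-01-02 10:00", "2017-01-03 12:30",
                           "2017-01-11 08:15", "2017-01-20 17:45"),
                         tz = "UTC")

# Running total: after sorting, the i-th registration brings the count to i.
created_sorted <- sort(created_at)
total_boxes <- seq_along(created_sorted)

# Weekly bins: cut() on Dates with breaks = "week" labels each date with
# the Monday that starts its week (start.on.monday = TRUE by default).
week <- cut(as.Date(created_at), breaks = "week")
registrations_per_week <- table(week)
```

Plotting `total_boxes` against `created_sorted` gives a step curve of the platform's size; the weekly table is the raw input for a growth rate.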
@@ -126,7 +122,6 @@ First we group the boxes by `createdAt` into bins of one week:
bins = 'week'
mvavg_bins = 6
# get number of sensebox registrations by date
growth = boxes %>%
mutate(week = cut(as.Date(createdAt), breaks = bins)) %>%
group_by(week) %>%
@@ -141,10 +136,8 @@ considered an approximation, because we have no information about intermediate
inactive phases.
Also, deleted boxes would probably have a big impact here.
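The inactivity rule used in the chunk below treats a box as inactive if `updatedAt` is more than two days old, and takes the week of its last update as the week it became inactive. Here is a minimal base-R sketch of that rule with hypothetical timestamps; a fixed reference time replaces lubridate's `now()` so that the result is reproducible.

```r
# Hypothetical last-update timestamps standing in for boxes$updatedAt.
updated_at <- as.POSIXct(c("2018-03-01 09:00", "2018-03-05 18:00",
                           "2018-03-14 07:30", "2018-03-15 11:00"),
                         tz = "UTC")
# Fixed reference time instead of now(), so the sketch is reproducible.
now_ref <- as.POSIXct("2018-03-15 12:00", tz = "UTC")

# Inactive: no update within the last two days before now_ref.
is_inactive <- updated_at < now_ref - as.difftime(2, units = "days")

# Bin the last update of each inactive box into weeks, as for createdAt.
inactive_week <- cut(as.Date(updated_at[is_inactive]), breaks = "week")
inactive_per_week <- table(inactive_week)
```

Boxes updated within the two-day window are left out, because every box would otherwise eventually count as "inactive" at its most recent update.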
```{r growthrate_inactive, warning=FALSE, message=FALSE, results='hide'}
# get number of boxes becoming inactive by date
inactive = boxes %>%
# updatedAt gets updated with each measurement, so we can use it as an indicator for activity
# remove boxes that were updated in the last two days,
# b/c any box becomes inactive at some point by definition of updatedAt
filter(updatedAt < now() - days(2)) %>%
mutate(week = cut(as.Date(updatedAt), breaks = bins)) %>%
@@ -168,13 +161,13 @@ ggplot(boxes_by_date, aes(x = as.Date(week), colour = event)) +
We see a sudden rise in early 2017, which lines up with the fast-growing grouptag `Luftdaten`.
This was enabled by an integration of openSenseMap.org into the firmware of the
air quality monitoring project [luftdaten.info](https://luftdaten.info).
The dips in mid 2017 and early 2018 could possibly be explained by production/delivery issues
of the senseBox hardware, but I have no data on the exact time frames to verify.
# Plot duration of boxes being active {.tabset}
While we are looking at `createdAt` and `updatedAt`, we can also extract the duration of activity
of each box, and look at metrics by exposure and grouptag once more:
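A minimal base-R sketch of this metric, assuming the duration of activity is measured as `updatedAt` minus `createdAt` in days. The boxes and their values are hypothetical, and `aggregate()` stands in for the `dplyr` grouping the vignette itself uses.

```r
# Hypothetical boxes with exposure, registration time and last update.
boxes <- data.frame(
  exposure  = c("outdoor", "outdoor", "indoor", "mobile"),
  createdAt = as.POSIXct(c("2017-01-01", "2017-06-01", "2017-03-01", "2018-01-01"), tz = "UTC"),
  updatedAt = as.POSIXct(c("2017-01-11", "2017-06-21", "2017-03-06", "2018-01-02"), tz = "UTC")
)

# Duration of activity: time from registration to last update, in days.
boxes$duration <- as.numeric(difftime(boxes$updatedAt, boxes$createdAt, units = "days"))

# Average duration per exposure (one row per exposure value).
by_exposure <- aggregate(duration ~ exposure, data = boxes, FUN = mean)
```

Swapping `FUN = mean` for `max` gives the longest-running box per group.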
## ...by exposure
```{r exposure_duration, message=FALSE}
@@ -188,6 +181,10 @@ ggplot(duration, aes(x = exposure, y = duration)) +
coord_flip() + ylab('Duration active in Days')
```
The time of activity averages at only `r round(mean(duration$duration))` days,
though there are boxes with `r round(max(duration$duration))` days of activity,
spanning a large chunk of openSenseMap's existence.
## ...by grouptag
```{r grouptag_duration, message=FALSE}
duration = boxes %>%
@@ -210,6 +207,10 @@ duration %>%
arrange(desc(duration_avg))
```
## ...by year of registration
This is less useful, as older boxes are active for a longer time by definition.
If you have an idea how to compensate for that, please send a [Pull Request][PR]!
@@ -232,7 +233,7 @@ Other visualisations come to mind, and are left as an exercise to the reader.
If you implemented some, feel free to add them to this vignette via a [Pull Request][PR].
* growth by phenomenon
* growth by location -> (interactive) map
* set inactive rate in relation to total box count
* filter timespans with big dips in growth rate, and extrapolate the number of
senseBoxes that could be on the platform today, assuming there were no production issues ;)
