mirror of
https://github.com/sensebox/opensensmapr
synced 2025-02-22 06:23:57 +01:00
update osem-history, add vignette deps to DESCRIPTION
parent 1966c305bc
commit e49ae4bb50
2 changed files with 22 additions and 18 deletions
@@ -19,6 +19,9 @@ Suggests:
rmarkdown,
lubridate,
units,
jsonlite,
ggplot2,
zoo,
lintr,
testthat,
covr
@@ -18,6 +18,9 @@ vignette: |
%\VignetteIndexEntry{Visualising the History of openSenseMap.org} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8}
---

> This vignette serves as an example on data wrangling & visualization with
`opensensmapr`, `dplyr` and `ggplot2`.

```{r setup, results='hide', message=FALSE, warning=FALSE}
# required packages:
library(opensensmapr) # data download
@@ -28,13 +31,8 @@ library(zoo) # rollmean()
```

openSenseMap.org has grown quite a bit in the last years; it would be interesting
to see how we got to the current amount of sensor stations, especially split up
by various attributes of the boxes.

```{r counts}
# current number of sensor stations registered on the platform
osem_counts()$boxes
```
to see how we got to the current `r osem_counts()$boxes` sensor stations,
split up by various attributes of the boxes.

While `opensensmapr` provides extensive methods of filtering boxes by attributes
on the server, we do the filtering within R to save time and gain flexibility.
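
As a rough illustration of why filtering locally is flexible, the sketch below subsets and tabulates a small made-up data frame. The data frame is a hypothetical stand-in for the result of `osem_boxes()`; only the column names `exposure` and `grouptag` are taken from this vignette, the values are invented, and base R subsetting is used instead of `dplyr` so the example has no dependencies.

```r
# Hypothetical stand-in for the data frame returned by osem_boxes();
# the values are made up for illustration.
boxes <- data.frame(
  name     = c("a", "b", "c", "d"),
  exposure = c("outdoor", "indoor", "outdoor", "mobile"),
  grouptag = c("Luftdaten", NA, "Luftdaten", NA),
  stringsAsFactors = FALSE
)

# filter locally instead of issuing one server request per attribute value
outdoor   <- boxes[boxes$exposure == "outdoor", ]
luftdaten <- boxes[!is.na(boxes$grouptag) & boxes$grouptag == "Luftdaten", ]

# the same download can serve any number of groupings
counts_by_exposure <- table(boxes$exposure)
```

Once the full table is in memory, each additional grouping is a cheap local operation rather than another round trip to the API.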
@@ -46,8 +44,6 @@ So the first step is to retrieve *all the boxes*:
boxes = osem_boxes()
```



# Plot count of boxes by time {.tabset}
By looking at the `createdAt` attribute of each box we know the exact time a box
was registered.
@@ -126,7 +122,6 @@ First we group the boxes by `createdAt` into bins of one week:
bins = 'week'
mvavg_bins = 6

# get number of sensebox registrations by date
growth = boxes %>%
mutate(week = cut(as.Date(createdAt), breaks = bins)) %>%
group_by(week) %>%
@@ -141,10 +136,8 @@ considered an approximation, because we have no information about intermediate
inactive phases.
Also deleted boxes would probably have a big impact here.
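
The approximation can be sketched on toy data: treat `updatedAt` as the last sign of life, drop boxes that were updated within the last two days, and count the rest per calendar week. This is a minimal base R sketch, not the vignette's own pipeline; the data frame, the fixed `reference` time (standing in for `now()`) and the variable names `cutoff` and `stale` are assumptions made for the example.

```r
# Hypothetical boxes; in the vignette these timestamps come from osem_boxes().
boxes <- data.frame(
  name = c("a", "b", "c"),
  updatedAt = as.POSIXct(
    c("2018-01-03 12:00:00", "2018-01-05 08:00:00", "2018-03-01 09:00:00"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# fixed reference time so the example is reproducible (assumption);
# the vignette uses the current time instead
reference <- as.POSIXct("2018-03-02 00:00:00", tz = "UTC")
cutoff <- reference - as.difftime(2, units = "days")

# keep only boxes whose last update is older than the cutoff,
# i.e. remove boxes that were updated in the last two days
stale <- boxes[boxes$updatedAt < cutoff, ]

# bin the last update into calendar weeks and count boxes per week
stale$week <- cut(as.Date(stale$updatedAt), breaks = "week")
inactive <- as.data.frame(table(week = stale$week), responseName = "count")
```

A box that paused for months and then resumed is indistinguishable here from one that ran continuously, which is why the result is only an approximation.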
```{r growthrate_inactive, warning=FALSE, message=FALSE, results='hide'}
# get number of boxes boxes becoming inactive by date
inactive = boxes %>%
# updatedAt gets updated with each measurement, so we can use it as indicator for activity
# remove boxes that were not updated in the last two days,
# remove boxes that were updated in the last two days,
# b/c any box becomes inactive at some point by definition of updatedAt
filter(updatedAt < now() - days(2)) %>%
mutate(week = cut(as.Date(updatedAt), breaks = bins)) %>%
@@ -168,13 +161,13 @@ ggplot(boxes_by_date, aes(x = as.Date(week), colour = event)) +

We see a sudden rise in early 2017, which lines up with the fast growing grouptag `Luftdaten`.
This was enabled by an integration of openSenseMap.org into the firmware of the
air quality monitoring project <https://luftdaten.info>.
The dips in mid 2017 and early 2018 could possibly be explained by production/delivery issues,
but I have no data on the exact time frames to verify.
air quality monitoring project [luftdaten.info](https://luftdaten.info).
The dips in mid 2017 and early 2018 could possibly be explained by production/delivery issues
of the senseBox hardware, but I have no data on the exact time frames to verify.

# Plot duration of boxes being active {.tabset}
While we are looking at `createdAt` and `updatedAt`, we can also extract the duration of activity
of each box, and look at metrics by exposure and grouptag again:
of each box, and look at metrics by exposure and grouptag once more:

## ...by exposure
```{r exposure_duration, message=FALSE}
@@ -188,6 +181,10 @@ ggplot(duration, aes(x = exposure, y = duration)) +
coord_flip() + ylab('Duration active in Days')
```

The time of activity averages at only `r round(mean(duration$duration))` days,
though there are boxes with `r round(max(duration$duration))` days of activity,
spanning a large chunk of openSenseMap's existence.
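
The chunk that builds `duration` is not part of this diff, so the following is only a sketch under the assumption that a box's duration of activity is the span between `createdAt` and `updatedAt`, measured in days. The data frame and its values are made up, and base R is used in place of `dplyr`.

```r
# Hypothetical boxes with registration and last-update timestamps.
boxes <- data.frame(
  exposure  = c("outdoor", "indoor", "outdoor"),
  createdAt = as.POSIXct(c("2016-01-01", "2017-06-01", "2017-01-01"), tz = "UTC"),
  updatedAt = as.POSIXct(c("2018-01-01", "2017-06-11", "2017-01-31"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# assumed definition: days between registration and last measurement
boxes$duration <- as.numeric(
  difftime(boxes$updatedAt, boxes$createdAt, units = "days")
)

# the two summary figures quoted inline in the text
duration_mean <- round(mean(boxes$duration))
duration_max  <- round(max(boxes$duration))

# average duration per exposure type
avg_by_exposure <- aggregate(duration ~ exposure, data = boxes, FUN = mean)
```

Because the mean is pulled up by a few long-running boxes, comparing it with the maximum, as the text does, gives a quick sense of how skewed the distribution is.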

## ...by grouptag
```{r grouptag_duration, message=FALSE}
duration = boxes %>%
@@ -210,6 +207,10 @@ duration %>%
arrange(desc(duration_avg))
```

The time of activity averages at only `r round(mean(duration$duration))` days,
though there are boxes with `r round(max(duration$duration))` days of activity,
spanning a large chunk of openSenseMap's existence.

## ...by year of registration
This is less useful, as older boxes are active for a longer time by definition.
If you have an idea how to compensate for that, please send a [Pull Request][PR]!
@@ -232,7 +233,7 @@ Other visualisations come to mind, and are left as an exercise to the reader.
If you implemented some, feel free to add them to this vignette via a [Pull Request][PR].

* growth by phenomenon
* growth by location -> (interactive) map?
* growth by location -> (interactive) map
* set inactive rate in relation to total box count
* filter timespans with big dips in growth rate, and extrapolate the amount of
senseBoxes that could be on the platform today, assuming there were no production issues ;)