This vignette serves as an example of data wrangling and visualization with opensensmapr, dplyr and ggplot2.
# required packages:
library(opensensmapr) # data download
library(dplyr) # data wrangling
library(ggplot2) # plotting
library(lubridate) # date arithmetic
library(zoo) # rollmean()
openSenseMap.org has grown considerably in recent years; it would be interesting to see how we got to the current 11367 sensor stations, split up by various attributes of the boxes.
While opensensmapr provides extensive methods of filtering boxes by attributes on the server, we do the filtering within R to save time and gain flexibility. So the first step is to retrieve all the boxes:
# if you want to see results for a specific subset of boxes,
# just specify a filter such as grouptag='ifgi' here
boxes = osem_boxes()
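As a minimal sketch of what filtering within R looks like, the following applies a dplyr filter to a small hand-made data frame. The data frame toy_boxes and its values are invented for illustration and only mimic the grouptag and exposure columns used in this vignette; on real data the same filter would be applied to the result of osem_boxes().

```r
library(dplyr)

# toy stand-in for the result of osem_boxes(); values are made up
toy_boxes = data.frame(
  name     = c('box a', 'box b', 'box c', 'box d'),
  grouptag = c('ifgi', NA, 'edu', 'ifgi'),
  exposure = c('outdoor', 'indoor', 'outdoor', 'mobile'),
  stringsAsFactors = FALSE
)

# local filtering: rows with a missing grouptag are dropped by filter()
ifgi_boxes = toy_boxes %>%
  filter(grouptag == 'ifgi')

print(ifgi_boxes)
```

Because the full set stays in memory, other subsets can be derived later without another download.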
By looking at the createdAt attribute of each box we know the exact time a box was registered. With this approach we have no information about boxes that were deleted in the meantime, but that’s okay for now.
exposure_counts = boxes %>%
  group_by(exposure) %>%
  mutate(count = row_number(createdAt))

exposure_colors = c(indoor = 'red', outdoor = 'lightgreen', mobile = 'blue', unknown = 'darkgrey')
ggplot(exposure_counts, aes(x = createdAt, y = count, colour = exposure)) +
  geom_line() +
  scale_colour_manual(values = exposure_colors) +
  xlab('Registration Date') + ylab('senseBox count')
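To see why row_number(createdAt) gives a running box count, here is a small sketch on invented data. Within each exposure group, row_number() ranks the registration times, so the n-th registered box of a group gets count n. The data frame toy_boxes is hypothetical and not taken from openSenseMap.

```r
library(dplyr)

# toy data: hypothetical boxes, not real openSenseMap records
toy_boxes = data.frame(
  exposure  = c('outdoor', 'indoor', 'outdoor', 'outdoor', 'indoor'),
  createdAt = as.POSIXct(c('2017-03-01', '2017-01-15', '2017-01-10',
                           '2017-02-20', '2017-04-05'), tz = 'UTC'),
  stringsAsFactors = FALSE
)

toy_counts = toy_boxes %>%
  group_by(exposure) %>%
  # rank of each registration time within its exposure group
  mutate(count = row_number(createdAt)) %>%
  ungroup() %>%
  arrange(exposure, createdAt)

print(toy_counts)
```

Plotting count against createdAt then traces the cumulative number of registrations per group, which is what the lines in the plot show.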
Outdoor boxes are growing fast! We can also see the introduction of mobile sensor “stations” in 2017. While mobile boxes are still few, we can expect a quick rise in 2018 once the new senseBox MCU with GPS support is released.
Let’s have a quick summary:
exposure_counts %>%
  summarise(
    oldest = min(createdAt),
    newest = max(createdAt),
    count = max(count)
  ) %>%
  arrange(desc(count))
exposure | oldest | newest | count |
---|---|---|---|
outdoor | 2016-08-09 19:34:42 | 2023-02-23 07:56:59 | 8413 |
indoor | 2018-05-10 20:14:44 | 2023-02-21 14:23:52 | 2344 |
mobile | 2020-10-24 14:39:30 | 2023-02-20 16:32:48 | 591 |
unknown | 2022-03-01 07:04:31 | 2022-03-30 11:25:43 | 19 |
We can try to find out where the increases in growth came from, by analysing the box count by grouptag.
Caveats: Only a small subset of boxes has a grouptag, and we should assume that these groups are actually bigger. Also, we can see that grouptag naming is inconsistent (Luftdaten, luftdaten.info, …)
grouptag_counts = boxes %>%
  group_by(grouptag) %>%
  # only include grouptags with 8 or more members
  filter(length(grouptag) >= 8 & !is.na(grouptag)) %>%
  mutate(count = row_number(createdAt))
# helper for sorting the grouptags by boxcount
sortLvls = function(oldFactor, ascending = TRUE) {
  lvls = table(oldFactor) %>% sort(., decreasing = !ascending) %>% names()
  factor(oldFactor, levels = lvls)
}
grouptag_counts$grouptag = sortLvls(grouptag_counts$grouptag, ascending = FALSE)

ggplot(grouptag_counts, aes(x = createdAt, y = count, colour = grouptag)) +
  geom_line(aes(group = grouptag)) +
  xlab('Registration Date') + ylab('senseBox count')
grouptag_counts %>%
  summarise(
    oldest = min(createdAt),
    newest = max(createdAt),
    count = max(count)
  ) %>%
  arrange(desc(count))
grouptag | oldest | newest | count |
---|---|---|---|
edu | 2022-03-30 11:25:43 | 2023-02-20 11:06:45 | 430 |
Save Dnipro | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 354 |
Luftdaten | 2022-03-30 11:25:43 | 2023-01-27 15:22:54 | 244 |
HU Explorers | 2022-03-30 11:25:43 | 2022-12-14 10:11:34 | 124 |
CS:iDrop | 2023-01-10 10:22:33 | 2023-01-31 15:13:46 | 120 |
#stropdeaer | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 101 |
321heiss | 2022-06-27 14:12:25 | 2022-08-08 10:22:21 | 91 |
GIZ Clean Air Day Project | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 76 |
Captographies | 2021-05-21 15:24:45 | 2023-01-31 12:11:49 | 62 |
Futurium | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 40 |
Bad_Hersfeld | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 37 |
TKS Bonn | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 36 |
Mikroprojekt Mitmachklima | 2022-03-30 11:25:43 | 2022-08-23 13:14:11 | 34 |
kerekdomb_ | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 34 |
Bottrop-Feinstaub | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 33 |
Luchtwachters Delft | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 33 |
Futurium 2021 | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 32 |
Feinstaub | 2022-03-30 11:25:43 | 2022-08-01 16:27:10 | 29 |
luftdaten.info | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 28 |
ifgi | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 26 |
SUGUCS | 2022-11-30 15:25:32 | 2023-01-23 13:17:54 | 25 |
cleanairfrome | 2022-03-30 11:25:43 | 2022-05-15 21:13:30 | 24 |
WAUW!denberg | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 23 |
freshairbromley | 2022-03-30 11:25:43 | 2023-01-31 10:18:57 | 23 |
Riga | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 22 |
KJR-M | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 21 |
Mikroklima | 2022-03-30 11:25:43 | 2022-09-05 08:38:57 | 21 |
bad_hersfeld | 2022-03-30 11:25:43 | 2022-06-14 09:34:02 | 21 |
Smart City MS | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 20 |
SekSeeland | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 19 |
Luftdaten.info | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 18 |
1 | 2022-03-30 11:25:43 | 2022-04-25 15:07:39 | 17 |
Apeldoorn | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 17 |
luftdaten | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 17 |
BurgerMeetnet | 2022-03-30 11:25:43 | 2022-05-10 21:22:35 | 16 |
Haus C | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 16 |
AGIN | 2022-11-28 17:33:12 | 2022-11-28 17:42:18 | 15 |
APPI | 2023-01-26 13:38:22 | 2023-01-26 13:40:59 | 15 |
BRGL | 2022-11-06 19:23:43 | 2022-11-06 22:08:36 | 15 |
BRGW | 2022-11-02 10:28:52 | 2022-11-02 13:32:12 | 15 |
Burgermeetnet | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 15 |
HTLJ | 2022-11-21 22:04:17 | 2022-11-21 22:05:47 | 15 |
MSGB | 2022-11-14 09:08:57 | 2022-11-14 10:19:24 | 15 |
MSHO | 2022-12-20 09:28:40 | 2022-12-20 10:01:38 | 15 |
MSIN | 2022-11-21 17:02:39 | 2022-11-21 23:06:22 | 15 |
MSKE | 2023-01-05 15:40:58 | 2023-01-05 15:52:02 | 15 |
MakeLight | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 15 |
PMSI | 2023-01-20 14:22:03 | 2023-01-20 14:31:52 | 15 |
Haus B | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 14 |
UrbanGarden | 2023-02-02 19:27:40 | 2023-02-18 14:50:19 | 14 |
Соседи по воздуху | 2022-03-30 11:25:43 | 2023-01-27 09:50:43 | 14 |
PIE | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 13 |
RB-DSJ | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 13 |
Sofia | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 12 |
co2mofetten | 2022-03-30 11:25:43 | 2023-01-17 07:38:21 | 12 |
Haus D | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 11 |
Netlight | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 11 |
home | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 11 |
#STROPDEAER | 2022-03-30 11:25:43 | 2023-02-16 15:12:50 | 10 |
AirAberdeen | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 10 |
Balthasar-Neumann-Schule 1 | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 10 |
Bestäuberprojekt | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 10 |
Che Aria Tira? | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 10 |
HBG Bonn | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 10 |
IntegrA | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 10 |
Mikroklima H | 2022-05-07 17:29:00 | 2022-05-07 17:47:42 | 10 |
dwih-sp | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 10 |
esri-de | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 10 |
makerspace-partheland | 2022-03-30 11:25:43 | 2023-02-20 18:34:50 | 10 |
montorioveronese.it | 2022-03-30 11:25:43 | 2022-12-29 07:45:57 | 10 |
ATSO | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 9 |
Fläming | 2022-08-15 19:16:48 | 2022-12-13 06:29:22 | 9 |
Mikroklima C-R | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 9 |
Ostroda | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 9 |
RSS | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 9 |
clevermint | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 9 |
test | 2022-03-30 11:25:43 | 2022-12-18 22:20:34 | 9 |
2 | 2022-03-30 11:25:43 | 2023-01-07 15:44:29 | 8 |
DBDS | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 8 |
Data4City | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 8 |
IKG | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 8 |
IVKOWeek | 2022-03-30 11:25:43 | 2022-07-05 09:42:31 | 8 |
Koerber-Stiftung | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 8 |
M7 | 2022-03-30 11:25:43 | 2022-11-28 13:00:44 | 8 |
Natlab Ökologie | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 8 |
PGKN | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 8 |
Raumanmeri | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 8 |
stw | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 8 |
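The table lists several spellings of what look like the same groups (for example Luftdaten, luftdaten and Luftdaten.info, or Bad_Hersfeld and bad_hersfeld). One possible way to reduce this inconsistency before counting is sketched below. The helper normalise_grouptag is hypothetical and not part of opensensmapr, and treating the ".info" suffix as the same group is an assumption that would need checking against the real data.

```r
library(dplyr)

# hypothetical helper, not part of opensensmapr
normalise_grouptag = function(tag) {
  tag = trimws(tolower(tag))
  # assumption: "luftdaten.info" denotes the same group as "luftdaten"
  sub('\\.info$', '', tag)
}

# toy data with spellings taken from the table above
toy_tags = data.frame(
  grouptag = c('Luftdaten', 'luftdaten.info', 'Luftdaten.info', 'luftdaten',
               'Bad_Hersfeld', 'bad_hersfeld', NA),
  stringsAsFactors = FALSE
)

tag_counts = toy_tags %>%
  filter(!is.na(grouptag)) %>%
  mutate(grouptag = normalise_grouptag(grouptag)) %>%
  count(grouptag, name = 'count') %>%
  arrange(desc(count))

print(tag_counts)
```

Merging tags this way changes which groups pass the "8 or more members" threshold, so the normalisation would have to happen before that filter.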
First we group the boxes by createdAt into bins of one week:
bins = 'week'
mvavg_bins = 6

growth = boxes %>%
  mutate(week = cut(as.Date(createdAt), breaks = bins)) %>%
  group_by(week) %>%
  summarize(count = length(week)) %>%
  mutate(event = 'registered')
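The binning step can be tried on a few invented timestamps. With breaks = 'week', cut() on Date values labels each bin with the date of the Monday that starts the week. The data frame toy_boxes is made up for illustration.

```r
library(dplyr)

# toy registration times; 2023-01-02 is a Monday
toy_boxes = data.frame(
  createdAt = as.POSIXct(c('2023-01-02 08:00:00', '2023-01-03 12:30:00',
                           '2023-01-08 23:00:00', '2023-01-10 09:15:00'),
                         tz = 'UTC')
)

toy_growth = toy_boxes %>%
  # each date is assigned to the week (starting Monday) it falls into
  mutate(week = cut(as.Date(createdAt), breaks = 'week')) %>%
  group_by(week) %>%
  summarize(count = n())

print(toy_growth)
```

The first three timestamps fall into the week starting 2023-01-02 and the last one into the week starting 2023-01-09.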
We can do the same for updatedAt, which informs us about the last change to a box, including uploaded measurements. This method of determining inactive boxes is fairly inaccurate and should be considered an approximation, because we have no information about intermediate inactive phases. Also, deleted boxes would probably have a big impact here.
inactive = boxes %>%
  # remove boxes that were updated in the last two days,
  # b/c any box becomes inactive at some point by definition of updatedAt
  filter(updatedAt < now() - days(2)) %>%
  mutate(week = cut(as.Date(updatedAt), breaks = bins)) %>%
  group_by(week) %>%
  summarize(count = length(week)) %>%
  mutate(event = 'inactive')
Now we can combine both datasets for plotting:
boxes_by_date = bind_rows(growth, inactive) %>% group_by(event)

ggplot(boxes_by_date, aes(x = as.Date(week), colour = event)) +
  xlab('Time') + ylab(paste('rate per ', bins)) +
  scale_x_date(date_breaks="years", date_labels="%Y") +
  scale_colour_manual(values = c(registered = 'lightgreen', inactive = 'grey')) +
  geom_point(aes(y = count), size = 0.5) +
  # moving average, make first and last value NA (to ensure identical length of vectors)
  geom_line(aes(y = rollmean(count, mvavg_bins, fill = list(NA, NULL, NA))))
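As far as I can tell, ggplot2 evaluates the expressions in aes() on the whole layer data rather than per group, so the moving average in the plot above may blend values across the boundary between the two events. A possible alternative, sketched here on invented counts, is to compute the moving average per event with dplyr first and plot the resulting column. The column name count_avg and the window size of 3 are choices made for this example only.

```r
library(dplyr)
library(zoo)

# toy weekly counts for both events; values are made up
toy_by_date = data.frame(
  event = rep(c('registered', 'inactive'), each = 5),
  count = c(2, 4, 6, 8, 10, 1, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)

toy_smoothed = toy_by_date %>%
  group_by(event) %>%
  # centred moving average per event; fill pads both ends with NA,
  # so the result has the same length as the input
  mutate(count_avg = rollmean(count, k = 3, fill = list(NA, NULL, NA))) %>%
  ungroup()

print(toy_smoothed)
```

With this in place, geom_line(aes(y = count_avg)) would draw one smoothed line per event without mixing the two series.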
We see a sudden rise in early 2017, which lines up with the fast growing grouptag Luftdaten. This was enabled by an integration of openSenseMap.org into the firmware of the air quality monitoring project luftdaten.info. The dips in mid 2017 and early 2018 could possibly be explained by production/delivery issues of the senseBox hardware, but I have no data on the exact time frames to verify.
While we are looking at createdAt and updatedAt, we can also extract the duration of activity of each box, and look at metrics by exposure and grouptag once more:
duration = boxes %>%
  group_by(exposure) %>%
  filter(!is.na(updatedAt)) %>%
  mutate(duration = difftime(updatedAt, createdAt, units='days'))

ggplot(duration, aes(x = exposure, y = duration)) +
  geom_boxplot() +
  coord_flip() + ylab('Duration active in Days')
The time of activity averages at only 157 days, though there are boxes with 2389 days of activity, spanning a large chunk of openSenseMap’s existence.
duration = boxes %>%
  group_by(grouptag) %>%
  # only include grouptags with 8 or more members
  filter(length(grouptag) >= 8 & !is.na(grouptag) & !is.na(updatedAt)) %>%
  mutate(duration = difftime(updatedAt, createdAt, units='days'))

ggplot(duration, aes(x = grouptag, y = duration)) +
  geom_boxplot() +
  coord_flip() + ylab('Duration active in Days')
duration %>%
  summarize(
    duration_avg = round(mean(duration)),
    duration_min = round(min(duration)),
    duration_max = round(max(duration)),
    oldest_box = round(max(difftime(now(), createdAt, units='days')))
  ) %>%
  arrange(desc(duration_avg))
grouptag | duration_avg | duration_min | duration_max | oldest_box |
---|---|---|---|---|
Ostroda | 330 days | 330 days | 330 days | 330 days |
Mikroklima C-R | 328 days | 321 days | 330 days | 330 days |
Apeldoorn | 326 days | 263 days | 330 days | 330 days |
freshairbromley | 298 days | 23 days | 330 days | 330 days |
Mikroklima | 280 days | 42 days | 330 days | 330 days |
Mikroklima H | 279 days | 229 days | 292 days | 292 days |
Smart City MS | 272 days | 0 days | 330 days | 330 days |
Feinstaub | 220 days | 0 days | 330 days | 330 days |
co2mofetten | 212 days | 0 days | 330 days | 330 days |
makerspace-partheland | 210 days | 0 days | 330 days | 330 days |
Luftdaten | 208 days | 0 days | 330 days | 330 days |
luftdaten.info | 197 days | 0 days | 330 days | 330 days |
Burgermeetnet | 188 days | 0 days | 330 days | 330 days |
esri-de | 188 days | 0 days | 330 days | 330 days |
#stropdeaer | 185 days | 0 days | 330 days | 330 days |
Sofia | 170 days | 0 days | 330 days | 330 days |
WAUW!denberg | 166 days | 0 days | 330 days | 330 days |
KJR-M | 165 days | 0 days | 330 days | 330 days |
IKG | 162 days | 0 days | 330 days | 330 days |
AirAberdeen | 153 days | 0 days | 330 days | 330 days |
M7 | 152 days | 87 days | 243 days | 330 days |
1 | 145 days | 0 days | 330 days | 330 days |
BurgerMeetnet | 139 days | 0 days | 330 days | 330 days |
Luftdaten.info | 138 days | 0 days | 330 days | 330 days |
Bottrop-Feinstaub | 132 days | 0 days | 330 days | 330 days |
stw | 129 days | 0 days | 330 days | 330 days |
cleanairfrome | 128 days | 0 days | 330 days | 330 days |
montorioveronese.it | 128 days | 0 days | 330 days | 330 days |
RB-DSJ | 122 days | 0 days | 330 days | 330 days |
Mikroprojekt Mitmachklima | 117 days | 0 days | 330 days | 330 days |
Luchtwachters Delft | 109 days | 0 days | 330 days | 330 days |
BRGL | 107 days | 85 days | 109 days | 109 days |
Fläming | 107 days | 23 days | 175 days | 192 days |
BRGW | 106 days | 98 days | 113 days | 113 days |
PIE | 101 days | 0 days | 330 days | 330 days |
Riga | 101 days | 0 days | 330 days | 330 days |
kerekdomb_ | 100 days | 0 days | 330 days | 330 days |
luftdaten | 99 days | 0 days | 330 days | 330 days |
home | 94 days | 0 days | 330 days | 330 days |
Bad_Hersfeld | 93 days | 0 days | 330 days | 330 days |
dwih-sp | 91 days | 0 days | 330 days | 330 days |
MSGB | 89 days | 50 days | 101 days | 101 days |
AGIN | 86 days | 86 days | 86 days | 87 days |
HTLJ | 84 days | 58 days | 94 days | 94 days |
bad_hersfeld | 84 days | 0 days | 330 days | 330 days |
Соседи по воздуху | 84 days | 0 days | 330 days | 330 days |
Captographies | 78 days | 0 days | 643 days | 643 days |
Save Dnipro | 74 days | 0 days | 330 days | 330 days |
PGKN | 67 days | 0 days | 330 days | 330 days |
Netlight | 60 days | 0 days | 330 days | 330 days |
MSHO | 57 days | 36 days | 65 days | 65 days |
Futurium | 52 days | 0 days | 330 days | 330 days |
MSIN | 52 days | 0 days | 79 days | 94 days |
test | 52 days | 0 days | 329 days | 330 days |
ifgi | 51 days | 0 days | 330 days | 330 days |
#STROPDEAER | 50 days | 0 days | 330 days | 330 days |
ATSO | 48 days | 0 days | 279 days | 330 days |
2 | 46 days | 0 days | 310 days | 330 days |
MakeLight | 46 days | 0 days | 330 days | 330 days |
Haus B | 44 days | 0 days | 239 days | 330 days |
Futurium 2021 | 43 days | 0 days | 329 days | 330 days |
IVKOWeek | 42 days | 0 days | 330 days | 330 days |
DBDS | 41 days | 0 days | 330 days | 330 days |
GIZ Clean Air Day Project | 36 days | 0 days | 330 days | 330 days |
edu | 36 days | 0 days | 330 days | 330 days |
TKS Bonn | 32 days | 0 days | 330 days | 330 days |
HU Explorers | 28 days | 0 days | 319 days | 330 days |
321heiss | 24 days | 0 days | 43 days | 241 days |
SUGUCS | 9 days | 0 days | 53 days | 85 days |
APPI | 3 days | 0 days | 7 days | 28 days |
MSKE | 3 days | 0 days | 7 days | 49 days |
PMSI | 3 days | 0 days | 4 days | 34 days |
RSS | 3 days | 0 days | 28 days | 330 days |
CS:iDrop | 2 days | 0 days | 36 days | 44 days |
UrbanGarden | 2 days | 0 days | 12 days | 21 days |
Balthasar-Neumann-Schule 1 | 0 days | 0 days | 0 days | 330 days |
Bestäuberprojekt | 0 days | 0 days | 0 days | 330 days |
Che Aria Tira? | 0 days | 0 days | 0 days | 330 days |
Data4City | 0 days | 0 days | 0 days | 330 days |
HBG Bonn | 0 days | 0 days | 0 days | 330 days |
Haus C | 0 days | 0 days | 0 days | 330 days |
Haus D | 0 days | 0 days | 0 days | 330 days |
IntegrA | 0 days | 0 days | 0 days | 330 days |
Koerber-Stiftung | 0 days | 0 days | 0 days | 330 days |
Natlab Ökologie | 0 days | 0 days | 0 days | 330 days |
Raumanmeri | 0 days | 0 days | 0 days | 330 days |
SekSeeland | 0 days | 0 days | 0 days | 330 days |
clevermint | 0 days | 0 days | 0 days | 330 days |
The time of activity averages at only 89 days, though there are boxes with 643 days of activity, spanning a large chunk of openSenseMap’s existence.
This is less useful, as older boxes are active for a longer time by definition. If you have an idea how to compensate for that, please send a Pull Request!
# NOTE: boxes older than 2016 missing due to missing updatedAt in database
duration = boxes %>%
  mutate(year = cut(as.Date(createdAt), breaks = 'year')) %>%
  group_by(year) %>%
  filter(!is.na(updatedAt)) %>%
  mutate(duration = difftime(updatedAt, createdAt, units='days'))

ggplot(duration, aes(x = substr(as.character(year), 0, 4), y = duration)) +
  geom_boxplot() +
  coord_flip() + ylab('Duration active in Days') + xlab('Year of Registration')
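One possible way to compensate for the age bias mentioned above is to relate each box's active duration to the time it could have been active at most, that is, its age since registration. This is only a sketch of one idea, not an established metric. The column names age and active_share, the fixed reference_time, and the toy data are all assumptions made for this example; on real data, now() would take the place of reference_time.

```r
library(dplyr)

# fixed reference time so the example is reproducible
reference_time = as.POSIXct('2023-02-23', tz = 'UTC')

# toy boxes with made-up registration and last-update times
toy_boxes = data.frame(
  createdAt = as.POSIXct(c('2021-02-23', '2022-02-23', '2023-01-24'), tz = 'UTC'),
  updatedAt = as.POSIXct(c('2022-02-23', '2023-02-23', '2023-02-08'), tz = 'UTC')
)

toy_relative = toy_boxes %>%
  filter(!is.na(updatedAt)) %>%
  mutate(
    # days between registration and last update
    duration = as.numeric(difftime(updatedAt, createdAt, units = 'days')),
    # days between registration and the reference time
    age = as.numeric(difftime(reference_time, createdAt, units = 'days')),
    # share of its lifetime during which the box was active (0 to 1)
    active_share = duration / age
  )

print(toy_relative)
```

A two-year-old box that was active for one year and a one-month-old box that was active for half a month both get a share of 0.5, which makes boxes of different ages comparable. Boxes registered very recently have a small age, so their share is unstable and may be worth excluding.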
Other visualisations come to mind, and are left as an exercise to the reader. If you implemented some, feel free to add them to this vignette via a Pull Request.