---
title: "Analyzing environmental sensor data from openSenseMap.org in R"
author: "Norwin Roosen"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    fig_margin: 0
    fig_width: 6
    fig_height: 4
vignette: >
  %\VignetteIndexEntry{Analyzing environmental sensor data from openSenseMap.org in R}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## Analyzing environmental sensor data from openSenseMap.org in R

This package provides data ingestion functions for almost any data stored on the open data platform for environmental sensor data, openSenseMap.org.
Its main goals are to provide means for:

- big data analysis of the measurements stored on the platform
- sensor metadata analysis (sensor counts, spatial distribution, temporal trends)

> *Please note:* The openSenseMap API is sometimes a bit unstable when streaming
long responses, which results in `curl` complaining about `Unexpected EOF`. This
bug is being worked on upstream. Meanwhile you have to retry the request when
this occurs.

### Exploring the dataset

Before we look at actual observations, let's get a grasp of the openSenseMap
datasets' structure.

```{r results = FALSE}
library(magrittr)
library(opensensmapr)

all_sensors = osem_boxes()
```

```{r}
summary(all_sensors)
```

This gives a good overview already: As of writing this, there are more than 600
sensor stations, of which ~50% are currently running. Most of them are placed
outdoors and have around 5 sensors each.
The oldest station is from May 2014, while the latest station was registered a
couple of minutes ago.

Another feature of interest is the spatial distribution of the boxes. `plot()`
can help us out here. This function requires a bunch of optional dependencies though.

```{r message=FALSE, warning=FALSE}
if (!require('maps'))     install.packages('maps')
if (!require('maptools')) install.packages('maptools')
if (!require('rgeos'))    install.packages('rgeos')

plot(all_sensors)
```

It seems we have to reduce our area of interest to Germany.

But what do these sensor stations actually measure? Let's find out.
`osem_phenomena()` gives us a named list of the counts of each observed
phenomenon for the given set of sensor stations:

```{r}
phenoms = osem_phenomena(all_sensors)
str(phenoms)
```

That's quite some noise there, with many phenomena being measured by a single
sensor only, or many duplicated phenomena due to slightly different spellings.
We should clean that up, but for now let's just filter out the noise and find
those phenomena that are measured by more than 20 sensors:

```{r}
phenoms[phenoms > 20]
```

Alright, temperature it is! Fine particulate matter (PM2.5) seems to be more
interesting to analyze though.
We should check how many sensor stations provide useful data. We want only those
boxes that have a PM2.5 sensor, are placed outdoors, and are currently submitting
measurements:

```{r results = FALSE}
pm25_sensors = osem_boxes(
  exposure = 'outdoor',
  date = Sys.time(), # ±4 hours
  phenomenon = 'PM2.5'
)
```

```{r}
summary(pm25_sensors)
plot(pm25_sensors)
```

That's still more than 200 measuring stations; we can work with that.

### Analyzing sensor data

Having analyzed the available data sources, let's finally get some measurements.
We could call `osem_measurements(pm25_sensors)` now; however, we are focusing on
a restricted area of interest, the city of Berlin.
Luckily, we can get the measurements filtered by a bounding box:

```{r}
library(sf)
library(units)
library(lubridate)

# construct a bounding box: 12 kilometers around Berlin
berlin = st_point(c(13.4034, 52.5120)) %>%
  st_sfc(crs = 4326) %>%
  st_transform(3857) %>% # allow setting a buffer in meters
  st_buffer(units::set_units(12, km)) %>%
  st_transform(4326) %>% # openSenseMap expects WGS 84
  st_bbox()
```

```{r results = FALSE}
pm25 = osem_measurements(
  berlin,
  phenomenon = 'PM2.5',
  from = now() - days(7), # defaults to 2 days
  to = now()
)

plot(pm25)
```

Now we can get started with actual spatiotemporal data analysis.
First, plot the measuring locations:

```{r}
pm25_sf = osem_as_sf(pm25)
plot(st_geometry(pm25_sf), axes = TRUE)
```

Further analysis: `TODO`
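
One possible next step is to aggregate the measurements over time, for example
into daily means. The following is only a minimal sketch using base R, not part
of the package: `daily_mean()` is a hypothetical helper, and it assumes the
measurements come as a data frame with a `POSIXct` column named `createdAt` and
a numeric column named `value`. The example values are made up to show the shape
of the input and output.

```{r}
# Hypothetical helper: average measurements per calendar day (UTC).
# Assumes `measurements` has a POSIXct column `createdAt` and a numeric
# column `value`; adjust the names if your data frame differs.
daily_mean = function(measurements) {
  day = as.Date(measurements$createdAt, tz = 'UTC')
  result = aggregate(
    measurements$value,
    by = list(day = day),
    FUN = mean,
    na.rm = TRUE
  )
  names(result) = c('day', 'mean_value')
  result
}

# Made-up example values, only for illustration
example = data.frame(
  createdAt = as.POSIXct(
    c('2017-08-01 06:00:00', '2017-08-01 18:00:00', '2017-08-02 12:00:00'),
    tz = 'UTC'
  ),
  value = c(10, 20, 7)
)

daily_mean(example)
```

If the data frame returned by `osem_measurements()` carries columns with these
names, the same helper could be applied as `daily_mean(pm25)`; otherwise the
column names in the helper need to be adapted first.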