Take Home Exercise 05

Visualising and Analysing Geographic and Movement Data

Rakendu Ramesh https://www.linkedin.com/in/rakendu-ramesh/ (Singapore Management University)https://www.linkedin.com/showcase/smumitb/
2022-05-30

Objective

Dataset

To begin with, let us understand the distribution of Residential and Commercial Buildings in Engagement. We will also include schools in the visualisation to see if they reveal any patterns.

For this visualization, we will use the Buildings data set.

Getting Started

We will first load the required packages using the below code chunk

packages = c('sf','tmap','tidyverse','lubridate','clock','sftime','rmarkdown')
for(p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p, character.only = T)
}

Importing data

buildings <- read_sf("data/Buildings.csv", options = "GEOM_POSSIBLE_NAMES=location")

pubs <- read_sf("data/Pubs.csv", options = "GEOM_POSSIBLE_NAMES=location")

restaurants <- read_sf("data/Restaurants.csv", options = "GEOM_POSSIBLE_NAMES=location")

employers <- read_sf("data/Employers.csv", options = "GEOM_POSSIBLE_NAMES=location")

apartments <- read_sf("data/Apartments.csv", options = "GEOM_POSSIBLE_NAMES=location")

Getting the participant data to understand the traffic bottlenecks

Visualizing the location of Residential and Commercial buildings

To view the regions where there are more residential vs commercial buildings, we will use thematic maps to visualize the buildings, colored by buildingType.

tmap_mode("plot")
tm_shape(buildings)+
tm_polygons(col = "buildingType",
           size = 2,
           border.col = "grey40",
           border.lwd = 1,
           palette = "Set2") +
  tm_layout(main.title = "Distribution of Residential and Commercial Buildings",
            main.title.size = 2)

From the above visualization, we can see that the residential areas are in the outskirts where as Commercial areas are centrally located.

##Location of Pubs and Restaurants

Let us now see where are the pubs and restaurants located in Engagement.

For this, we will plot the pubs and restaurants as dots on the above thematic map so that we can see where they are located with respect to residential and commerical buildings.

tmap_mode("plot")
tm_shape(buildings)+
tm_polygons(col = "buildingType",
           size = 2,
           border.col = "grey40",
           border.lwd = 1,
           palette = "Set2")+
    tm_shape(pubs)+
    tm_dots(col = "green",
            size = 1)+
    tm_shape(restaurants)+
    tm_dots(col = "blue",
            size = 1)+
  tm_add_legend(type = "fill",
                labels = c("Restaurants","Pubs"),
                col = c("blue","green"))+
  tm_layout(main.title = "Location of Pubs and Restaurants in Engagement, Ohio",
            main.title.size = 2)

From the above visualization, it can be seen that many restaurants are located on the edge between the residential and commercial areas. There is a concentration of 3 pubs and a restaurant in the central area between residential and commercial buildings.

Identifying traffic bottle necks during peak hours

Let us consider the morning peak hours on a tuesday

we will consider the time period between 7 and 10 am as the morning peak hours.

We will take only those records for which there are atleast 2 or more log records during this period so that we can visualise their travel path. For this we take the count of records for each participant and then filter by the number of records greater than 1.

logs_path <- logs_selected %>%
  filter(day == 1) %>%
  filter(hour >= 7) %>%
  filter(hour <= 10) %>%
  group_by(participantId,hour) %>%
  summarize(m = mean(Timestamp), 
          do_union=FALSE) %>%
  ungroup() %>%
  group_by(participantId) %>%
  mutate(count = n()) %>%
  filter(count > 1 )%>%
    group_by(participantId) %>%
  summarize(do_union=FALSE) %>%
  st_cast("LINESTRING")
tmap_mode("plot")
tm_shape(buildings)+
tm_polygons(col = "buildingType",
           size = 2,
           border.col = "grey40",
           border.lwd = 1,
           palette = "Set2",
           alpha = 0.3) +
      tm_shape(pubs)+
    tm_dots(col = "green",
            size = 0.5)+
    tm_shape(restaurants)+
    tm_dots(col = "blue",
            size = 0.5)+
  tm_shape(logs_path)+
    tm_lines(col = "red",
             alpha = 0.3) +
  tm_add_legend(type = "fill",
                labels = c("Restaurants","Pubs"),
                col = c("blue","green"))+
  tm_layout(main.title = "Morning Traffic bottlenecks in Engagement, Ohio",
            main.title.size = 2)
tmap_mode("plot")

From the visualization above, we can see the traffic bottle necks during morning peak hours on Tuesdays. These are the areas with dark red color.It can be seen that the bottlenecks are where multiple paths converge between the residential and commercial areas.

Let us consider the evening peak hours on tuesdays

We will consider the time period between 5pm to 7 pm as the evening peak hours

logs__eve_path <- logs_selected %>%
  filter(day == 1) %>%
  filter(hour >= 17) %>%
  filter(hour <= 19) %>%
  group_by(participantId,hour) %>%
  summarize(m = mean(Timestamp), 
          do_union=FALSE) %>%
  ungroup() %>%
  group_by(participantId) %>%
  mutate(count = n()) %>%
  filter(count > 1 )%>%
    group_by(participantId) %>%
  summarize(do_union=FALSE) %>%
  st_cast("LINESTRING")
tmap_mode("plot")
tm_shape(buildings)+
tm_polygons(col = "buildingType",
           size = 2,
           border.col = "grey40",
           border.lwd = 1,
           palette = "Set2",
           alpha = 0.3) +
      tm_shape(pubs)+
    tm_dots(col = "green",
            size = 0.5)+
    tm_shape(restaurants)+
    tm_dots(col = "blue",
            size = 0.5)+
  tm_shape(logs__eve_path)+
    tm_lines(col = "cyan",
             alpha = 0.3) +
  tm_add_legend(type = "fill",
                labels = c("Restaurants","Pubs"),
                col = c("blue","green"))+
  tm_layout(main.title = "Evening Traffic bottlenecks in Engagement, Ohio",
            main.title.size = 2)
tmap_mode("plot")

From the above visualization of evening traffic, we can see that the traffic bottlenecks are in th same areas as the mornings. It can be seen there is high traffic near the region with the 3 pubs and a restaurant.

Let us try to add a layer to show the starting point of journeys of the participants inorder to understand the Origin of the journeys.

For this, we will find the first row of transport for each participant by selecting rows where the currentMode is Transport and the previous row is not Transport.

We will take the time period between 5 am and 10 am to see all the participants morning travel origin.

starting_point <- start_select %>%
  filter(hour >= 5) %>%
  filter(hour <= 10) %>%
  group_by(participantId) %>%
  summarize(start = min(Timestamp), minCurrentLocation = currentLocation[which(Timestamp == min(Timestamp))])

We will now plot the origin points on the thematic map showing residential and commercial areas.

tmap_mode("plot")
tm_shape(buildings)+
tm_polygons(col = "buildingType",
           size = 2,
           border.col = "grey40",
           border.lwd = 1,
           palette = "Set2",
           alpha = 0.3) +
      tm_shape(pubs)+
    tm_dots(col = "green",
            size = 0.5)+
    tm_shape(restaurants)+
    tm_dots(col = "blue",
            size = 0.5)+
      tm_shape(starting_point)+
    tm_bubbles(col = "red",
            size = 0.5,
            shape = 1)+
  tm_add_legend(type = "fill",
                labels = c("Restaurants","Pubs"),
                col = c("blue","green"))+
  tm_layout(main.title = "Morning Traffic Origins in Engagement, Ohio",
            main.title.size = 2)
tmap_mode("plot")

From the above visualization, it can be seen that the origin points are concentrated in the residential areas as expected. Also, notice that there are lesser origin points in the southern regions.

This could be because we have lesser participants from this region.

Let us now observe the origin points of the evening traffic.

starting_point <- start_select %>%
  filter(hour >= 17) %>%
  filter(hour <= 19) %>%
  group_by(participantId) %>%
  summarize(start = min(Timestamp), minCurrentLocation = currentLocation[which(Timestamp == min(Timestamp))])
tmap_mode("plot")
tm_shape(buildings)+
tm_polygons(col = "buildingType",
           size = 2,
           border.col = "grey40",
           border.lwd = 1,
           palette = "Set2",
           alpha = 0.3) +
      tm_shape(pubs)+
    tm_dots(col = "green",
            size = 0.5)+
    tm_shape(restaurants)+
    tm_dots(col = "blue",
            size = 0.5)+
      tm_shape(starting_point)+
    tm_bubbles(col = "red",
            size = 0.5,
            shape = 1)+
  tm_add_legend(type = "fill",
                labels = c("Restaurants","Pubs"),
                col = c("blue","green"))+
  tm_layout(main.title = "Evening Traffic Origins in Engagement, Ohio",
            main.title.size = 2)
tmap_mode("plot")

Evening traffic origin is not as well defined as the morning traffic origin, we can see that many origin points are in the residential areas where as one would expect the evening traffic to origin mainly in the commercial areasa.

On further exploring the dataset, it is seen that the participants reach back home and then again leave home for recreational activities resulting in origin points in the residential areas.

To overcome this, we will change the time period to 2pm to 5 pm. Now it can be observed that most of the origin points are in the commercial areas.

starting_point <- start_select %>%
  filter(hour >= 14) %>%
  filter(hour <= 17) %>%
  group_by(participantId) %>%
  summarize(start = min(Timestamp), minCurrentLocation = currentLocation[which(Timestamp == min(Timestamp))])
tmap_mode("plot")
tm_shape(buildings)+
tm_polygons(col = "buildingType",
           size = 2,
           border.col = "grey40",
           border.lwd = 1,
           palette = "Set2",
           alpha = 0.3) +
      tm_shape(pubs)+
    tm_dots(col = "green",
            size = 0.5)+
    tm_shape(restaurants)+
    tm_dots(col = "blue",
            size = 0.5)+
      tm_shape(starting_point)+
    tm_bubbles(col = "red",
            size = 0.5,
            shape = 1)+
  tm_add_legend(type = "fill",
                labels = c("Restaurants","Pubs"),
                col = c("blue","green"))+
  tm_layout(main.title = "Early Evening Traffic origins in Engagement, Ohio",
            main.title.size = 2)
tmap_mode("plot")

Conclusion