Visualising and Analysing Geographic and Movement Data
To identify the distinct areas of the city like residential, corporate and commercial areas.
To identify the busiest areas of the city ? Are there traffic bottlenecks that should be addressed?
To begin with, let us understand the distribution of Residential and Commercial Buildings in Engagement. We will also include schools in the visualisation to see if they reveal any patterns.
For this visualization, we will use the Buildings data set.
We will first load the required packages using the below code chunk
packages = c('sf','tmap','tidyverse','lubridate','clock','sftime','rmarkdown')
for(p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p, character.only = T)
}
buildings <- read_sf("data/Buildings.csv", options = "GEOM_POSSIBLE_NAMES=location")
pubs <- read_sf("data/Pubs.csv", options = "GEOM_POSSIBLE_NAMES=location")
restaurants <- read_sf("data/Restaurants.csv", options = "GEOM_POSSIBLE_NAMES=location")
employers <- read_sf("data/Employers.csv", options = "GEOM_POSSIBLE_NAMES=location")
apartments <- read_sf("data/Apartments.csv", options = "GEOM_POSSIBLE_NAMES=location")
Getting the participant data to understand the traffic bottlenecks
To view the regions where there are more residential vs commercial buildings, we will use thematic maps to visualize the buildings, colored by buildingType.
tmap_mode("plot")
tm_shape(buildings)+
tm_polygons(col = "buildingType",
size = 2,
border.col = "grey40",
border.lwd = 1,
palette = "Set2") +
tm_layout(main.title = "Distribution of Residential and Commercial Buildings",
main.title.size = 2)
From the above visualization, we can see that the residential areas are in the outskirts where as Commercial areas are centrally located.
##Location of Pubs and Restaurants
Let us now see where are the pubs and restaurants located in Engagement.
For this, we will plot the pubs and restaurants as dots on the above thematic map so that we can see where they are located with respect to residential and commerical buildings.
tmap_mode("plot")
tm_shape(buildings)+
tm_polygons(col = "buildingType",
size = 2,
border.col = "grey40",
border.lwd = 1,
palette = "Set2")+
tm_shape(pubs)+
tm_dots(col = "green",
size = 1)+
tm_shape(restaurants)+
tm_dots(col = "blue",
size = 1)+
tm_add_legend(type = "fill",
labels = c("Restaurants","Pubs"),
col = c("blue","green"))+
tm_layout(main.title = "Location of Pubs and Restaurants in Engagement, Ohio",
main.title.size = 2)
From the above visualization, it can be seen that many restaurants are located on the edge between the residential and commercial areas. There is a concentration of 3 pubs and a restaurant in the central area between residential and commercial buildings.
we will consider the time period between 7 and 10 am as the morning peak hours.
We will take only those records for which there are atleast 2 or more log records during this period so that we can visualise their travel path. For this we take the count of records for each participant and then filter by the number of records greater than 1.
logs_path <- logs_selected %>%
filter(day == 1) %>%
filter(hour >= 7) %>%
filter(hour <= 10) %>%
group_by(participantId,hour) %>%
summarize(m = mean(Timestamp),
do_union=FALSE) %>%
ungroup() %>%
group_by(participantId) %>%
mutate(count = n()) %>%
filter(count > 1 )%>%
group_by(participantId) %>%
summarize(do_union=FALSE) %>%
st_cast("LINESTRING")
tmap_mode("plot")
tm_shape(buildings)+
tm_polygons(col = "buildingType",
size = 2,
border.col = "grey40",
border.lwd = 1,
palette = "Set2",
alpha = 0.3) +
tm_shape(pubs)+
tm_dots(col = "green",
size = 0.5)+
tm_shape(restaurants)+
tm_dots(col = "blue",
size = 0.5)+
tm_shape(logs_path)+
tm_lines(col = "red",
alpha = 0.3) +
tm_add_legend(type = "fill",
labels = c("Restaurants","Pubs"),
col = c("blue","green"))+
tm_layout(main.title = "Morning Traffic bottlenecks in Engagement, Ohio",
main.title.size = 2)
tmap_mode("plot")
From the visualization above, we can see the traffic bottle necks during morning peak hours on Tuesdays. These are the areas with dark red color.It can be seen that the bottlenecks are where multiple paths converge between the residential and commercial areas.
We will consider the time period between 5pm to 7 pm as the evening peak hours
logs__eve_path <- logs_selected %>%
filter(day == 1) %>%
filter(hour >= 17) %>%
filter(hour <= 19) %>%
group_by(participantId,hour) %>%
summarize(m = mean(Timestamp),
do_union=FALSE) %>%
ungroup() %>%
group_by(participantId) %>%
mutate(count = n()) %>%
filter(count > 1 )%>%
group_by(participantId) %>%
summarize(do_union=FALSE) %>%
st_cast("LINESTRING")
tmap_mode("plot")
tm_shape(buildings)+
tm_polygons(col = "buildingType",
size = 2,
border.col = "grey40",
border.lwd = 1,
palette = "Set2",
alpha = 0.3) +
tm_shape(pubs)+
tm_dots(col = "green",
size = 0.5)+
tm_shape(restaurants)+
tm_dots(col = "blue",
size = 0.5)+
tm_shape(logs__eve_path)+
tm_lines(col = "cyan",
alpha = 0.3) +
tm_add_legend(type = "fill",
labels = c("Restaurants","Pubs"),
col = c("blue","green"))+
tm_layout(main.title = "Evening Traffic bottlenecks in Engagement, Ohio",
main.title.size = 2)
tmap_mode("plot")
From the above visualization of evening traffic, we can see that the traffic bottlenecks are in th same areas as the mornings. It can be seen there is high traffic near the region with the 3 pubs and a restaurant.
For this, we will find the first row of transport for each participant by selecting rows where the currentMode is Transport and the previous row is not Transport.
We will take the time period between 5 am and 10 am to see all the participants morning travel origin.
We will now plot the origin points on the thematic map showing residential and commercial areas.
tmap_mode("plot")
tm_shape(buildings)+
tm_polygons(col = "buildingType",
size = 2,
border.col = "grey40",
border.lwd = 1,
palette = "Set2",
alpha = 0.3) +
tm_shape(pubs)+
tm_dots(col = "green",
size = 0.5)+
tm_shape(restaurants)+
tm_dots(col = "blue",
size = 0.5)+
tm_shape(starting_point)+
tm_bubbles(col = "red",
size = 0.5,
shape = 1)+
tm_add_legend(type = "fill",
labels = c("Restaurants","Pubs"),
col = c("blue","green"))+
tm_layout(main.title = "Morning Traffic Origins in Engagement, Ohio",
main.title.size = 2)
tmap_mode("plot")
From the above visualization, it can be seen that the origin points are concentrated in the residential areas as expected. Also, notice that there are lesser origin points in the southern regions.
This could be because we have lesser participants from this region.
Let us now observe the origin points of the evening traffic.
tmap_mode("plot")
tm_shape(buildings)+
tm_polygons(col = "buildingType",
size = 2,
border.col = "grey40",
border.lwd = 1,
palette = "Set2",
alpha = 0.3) +
tm_shape(pubs)+
tm_dots(col = "green",
size = 0.5)+
tm_shape(restaurants)+
tm_dots(col = "blue",
size = 0.5)+
tm_shape(starting_point)+
tm_bubbles(col = "red",
size = 0.5,
shape = 1)+
tm_add_legend(type = "fill",
labels = c("Restaurants","Pubs"),
col = c("blue","green"))+
tm_layout(main.title = "Evening Traffic Origins in Engagement, Ohio",
main.title.size = 2)
tmap_mode("plot")
Evening traffic origin is not as well defined as the morning traffic origin, we can see that many origin points are in the residential areas where as one would expect the evening traffic to origin mainly in the commercial areasa.
On further exploring the dataset, it is seen that the participants reach back home and then again leave home for recreational activities resulting in origin points in the residential areas.
To overcome this, we will change the time period to 2pm to 5 pm. Now it can be observed that most of the origin points are in the commercial areas.
tmap_mode("plot")
tm_shape(buildings)+
tm_polygons(col = "buildingType",
size = 2,
border.col = "grey40",
border.lwd = 1,
palette = "Set2",
alpha = 0.3) +
tm_shape(pubs)+
tm_dots(col = "green",
size = 0.5)+
tm_shape(restaurants)+
tm_dots(col = "blue",
size = 0.5)+
tm_shape(starting_point)+
tm_bubbles(col = "red",
size = 0.5,
shape = 1)+
tm_add_legend(type = "fill",
labels = c("Restaurants","Pubs"),
col = c("blue","green"))+
tm_layout(main.title = "Early Evening Traffic origins in Engagement, Ohio",
main.title.size = 2)
tmap_mode("plot")
Residential areas are mainly concentrated in the outskirts and commercial areas are towards the central region of Engagement.
Pubs and restaurants are distributed more or less uniformly with some concentration seen in the central region.
Traffic bottlenecks are in the regions where multiple roads converge between residential and commercial areas.
Traffic bottlenecks are in the same regions during morning and evening peak hours during a weekday.
Origin of traffic in the morning peak hours are from residential areas where as evening peak traffic has origin in both residential as well as commercial areas owing to people heading to reacreational activities in the evening.