Visualising and Analysing Community Network
This exercise aims to understand the social networks in the City of Engagement, Ohio.
We first load the required packages using the code chunk below.
packages = c('tidyverse', 'tidygraph', 'ggraph', 'visNetwork', 'lubridate', 'clock')
for(p in packages){
  if(!require(p, character.only = TRUE)){
    install.packages(p)
  }
  library(p, character.only = TRUE)
}
We import the social network data of the participants, which we will later save to an rds file. We also import the participant attributes to help us look for patterns.
We group the data by participantId pair and add a count column to gauge how strong the relationship between two participants is: the more often two participants interact, the stronger we consider their relationship.
countEdgeData <- edgeData %>%
  group_by(participantIdFrom, participantIdTo) %>%
  summarise(count = n()) %>%
  ungroup()
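To make the aggregation concrete, here is a minimal sketch on a made-up edge list. The `toyEdges` tibble and its values are illustrative and not taken from the challenge data. It assumes only that each row is one recorded interaction, and it uses dplyr's `count()` as a shorthand for the `group_by()`, `summarise(n())`, `ungroup()` sequence.

```r
library(dplyr)

# Hypothetical interaction log: one row per recorded interaction.
toyEdges <- tibble(
  participantIdFrom = c(1, 1, 1, 2, 2, 3),
  participantIdTo   = c(2, 2, 3, 1, 1, 1)
)

# count() groups by the given columns, tallies the rows in each group,
# and returns an ungrouped tibble with the tally stored in `count`.
toyCounts <- toyEdges %>%
  count(participantIdFrom, participantIdTo, name = "count")

toyCounts
```

Note that the pair (1, 2) and the pair (2, 1) are tallied separately, so the counts are directional.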
Since the social network data is very large, we save the aggregated result as an rds file and use this file going forward.
countEdgeData <- read_rds("data/rds/countEdgeData.rds")
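The save step itself is not shown. A minimal sketch of the round trip, assuming readr's `write_rds()` and `read_rds()`, with a temporary file and a toy tibble standing in for the real path and data:

```r
library(readr)
library(tibble)

# Toy stand-in for the aggregated edge table (values are made up).
toyCounts <- tibble(
  participantIdFrom = c(1, 2),
  participantIdTo   = c(2, 1),
  count             = c(2L, 2L)
)

# Write to a temporary .rds file; the tutorial's own file lives at
# "data/rds/countEdgeData.rds".
rdsPath <- tempfile(fileext = ".rds")
write_rds(toyCounts, rdsPath)

# Reading the file back restores the object, column types included.
restored <- read_rds(rdsPath)
all.equal(restored, toyCounts)
```

Unlike a CSV, an rds file preserves R column types, so no re-parsing is needed when the data is reloaded.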
Let us look at the distribution of interaction counts between participants. The histogram shows that the vast majority of participant pairs have counts below 100.
hist(countEdgeData$count)
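# Build the node table: unique participant ids taken from the "from" side
# of the edges, merged with the participant attributes.
NodeData <- countEdgeData %>%
  select(participantId = participantIdFrom) %>%
  distinct(participantId) %>%
  mutate(participantId = as.character(participantId)) %>%
  merge(participantData)
# Keep only strong ties (more than 250 interactions) and convert the ids
# to character so that they can be matched against the node ids.
countEdgeData <- countEdgeData %>%
  filter(count > 250) %>%
  mutate(participantIdFrom = as.character(participantIdFrom)) %>%
  mutate(participantIdTo = as.character(participantIdTo))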
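Because the nodes are taken from `participantIdFrom` only, it can be worth checking that every edge endpoint has a matching node before building the graph. A small sketch of such a check on toy data follows; the `toyNodes` and `toyEdges` objects are hypothetical.

```r
library(tibble)

toyNodes <- tibble(participantId = c("0", "1", "2"))
toyEdges <- tibble(
  participantIdFrom = c("0", "1", "2"),
  participantIdTo   = c("1", "2", "3"),   # "3" has no node record
  count             = c(300L, 260L, 410L)
)

# Collect every id used as an endpoint, then keep those that are
# missing from the node table.
missingIds <- setdiff(
  union(toyEdges$participantIdFrom, toyEdges$participantIdTo),
  toyNodes$participantId
)

missingIds
```

An empty result means the edge list and the node table are consistent. Any ids listed would need a node record, or their edges would need to be removed, before the graph is built.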
Let us now construct the network as a tidygraph tbl_graph object and review the number of nodes and edges it contains. The printout also shows that the node data is active.
social_graph <- tbl_graph(nodes = NodeData,
                          edges = countEdgeData,
                          directed = TRUE)
social_graph
# A tbl_graph: 963 nodes and 2896 edges
#
# A directed simple graph with 322 components
#
# Node Data: 963 x 8 (active)
participantId householdSize haveKids age educationLevel
<chr> <dbl> <lgl> <dbl> <chr>
1 0 3 TRUE 36 HighSchoolOrC~
2 1 3 TRUE 25 HighSchoolOrC~
3 10 3 TRUE 48 HighSchoolOrC~
4 100 2 FALSE 29 Low
5 1000 1 FALSE 56 Graduate
6 1001 1 FALSE 49 Graduate
# ... with 957 more rows, and 3 more variables: interestGroup <chr>,
# joviality <dbl>, agegroup <fct>
#
# Edge Data: 2,896 x 3
from to count
<int> <int> <int>
1 124 98 279
2 341 878 355
3 341 29 315
# ... with 2,893 more rows
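Since the node table is the active one, dplyr verbs apply to the nodes by default. A minimal sketch of switching to the edge table with tidygraph's `activate()`, using a toy graph (`toyGraph` and its values are made up for illustration):

```r
library(tidygraph)
library(dplyr)

toyGraph <- tbl_graph(
  nodes = tibble(participantId = c("0", "1", "2")),
  edges = tibble(from  = c(1L, 2L, 3L),
                 to    = c(2L, 3L, 1L),
                 count = c(279L, 355L, 315L)),
  directed = TRUE
)

# Make the edge table active, then sort edges from strongest to weakest tie.
strongestFirst <- toyGraph %>%
  activate(edges) %>%
  arrange(desc(count))

# as_tibble() extracts the currently active (edge) table for inspection.
edgeTable <- as_tibble(strongestFirst)
edgeTable
```

The same pattern works for `filter()` and `mutate()`, so edge-level and node-level steps can be chained in a single pipeline.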
We use the width of each edge to represent its weight and make the edges more transparent for clarity. We also colour the nodes by the education level of the participants and size them by age group.
set.seed(1234)
ggraph(social_graph,
       layout = "stress") +
  geom_edge_link(aes(width = count),
                 alpha = 0.2) +
  scale_edge_width(range = c(0.1, 5)) +
  geom_node_point(aes(colour = educationLevel, size = agegroup)) +
  theme_graph()
From the network graph above, we can see that participants with different education levels are spread more or less evenly across the network.
Similarly, the different age groups appear throughout the network.
We probably do not see a definite pattern in the network graph because we filtered out the edges with a weight of 250 or less. This has also broken the network apart, resulting in a large number of disconnected components (322 in the printout above).
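One way to probe this is to check how many nodes lose all their edges at a given threshold. The sketch below uses a toy graph with tidygraph's `node_is_isolated()` and `group_components()`. The `threshold` variable and all data values are illustrative assumptions, not results from the challenge data.

```r
library(tidygraph)
library(dplyr)

toyNodes <- tibble(participantId = as.character(0:4))
toyEdges <- tibble(
  from  = c(1L, 2L, 3L, 4L),
  to    = c(2L, 3L, 4L, 5L),
  count = c(300L, 120L, 260L, 40L)
)

threshold <- 250  # same cut-off as the filter used in this tutorial

filteredGraph <- tbl_graph(nodes = toyNodes, edges = toyEdges, directed = TRUE) %>%
  activate(edges) %>%
  filter(count > threshold) %>%          # drop weak ties
  activate(nodes) %>%
  mutate(isolated  = node_is_isolated(), # TRUE for nodes left without edges
         component = group_components(type = "weak"))

nodeTable <- as_tibble(filteredGraph)
nodeTable
```

Repeating this for several thresholds would show how quickly the network fragments as the cut-off rises, and applying `filter(!isolated)` to the nodes would drop the singletons before plotting.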