Take Home Exercise 02

A peer review for mutual learning and growth.

Rakendu Ramesh https://www.linkedin.com/in/rakendu-ramesh/ (Singapore Management University)https://www.linkedin.com/showcase/smumitb/
2022-05-01

Overview

In this Take Home exercise, I would like to review my esteemed peer Jiarui Cui’s Take Home Ex 01. With this review, I hope to learn from my peer and hope to shed some light on the highlights of her first assignment.

Getting Started

We will first load the required packages using the below code chunk

packages = c('tidyverse','ggdist','gghalves','ggthemes','hrbrthemes','ggridges','patchwork')
for(p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p, character.only = T)
}

Importing Data

The code chunk below imports Participants.csv from the data folder by using read_csv() function of readr package into R and save it as a tibble dataframe called participants_data.

participants_data <- read_csv("data/Participants.csv")
jobs <- read_csv('data/Jobs.csv')

Boxplot to understand the joviality by education level

Interestingly, a boxplot is used to understand the joviality of different groups based on education level. The mean point is also added which makes it easier to compare the mean joviality of different education level groups, It can be seen that the bachelors have a higher joviality. Let us try to represent this using a density plot.

I believe the overlapping plots makes comparison easier. We can see that the joviality is generally low among all education levels. Also it can be seen that more people with Low Education Level have low joviality where as the graduate group has a more equitable density at low and high joviality values.

P1 <- ggplot(participants_data,
       aes(x=joviality, colour = educationLevel))+
  geom_density(size =1)+
  labs(colour = "Education Level")+
  ylab("Density")+
  xlab("Joviality")+
  ggtitle("Distribution of Joviality by Education Levels")+
  theme(axis.title.y = element_text(angle=0, margin = margin(r=10)),
        axis.title = element_text(size=16),
        panel.grid.minor.y = element_blank(),
        panel.grid.minor.x = element_blank(),
        panel.grid.major.x = element_line(colour = "grey90"),
        panel.grid.major.y = element_line(colour = "grey90"),
        panel.background = element_rect(fill = "white"),
        axis.text = element_text(size =16),
        legend.title = element_text(size =16),
        legend.text = element_text(size = 16),
        plot.title = element_text(size =20,hjust = 0.5)
        )+
  scale_y_continuous(expand = c(0,0),limits = c(0,1.5))

P1

Disturbing trend of fewer people with higher education level having kids.

The stacked bar chart very clearly shows that there are fewer people with kids who are graduates and bachelors educated. Let us take a look at the same data compared to the total number of people at different education levels to see if we can draw any more inferences.

From the plot below, we can see that the proportion of participants with kids is relatively low for all education levels. Therefore, we do not have enough evidence to state that more participants with higher education do not have kids.

level_order <- c('Low','HighSchoolOrCollege','Bachelors','Graduate')
d <- participants_data
d_bg <- d[,-3]
P2 <- ggplot(d, aes(x = factor(educationLevel,levels = level_order))) + 
  geom_bar(data=d_bg, fill="grey", alpha=.5) +
  geom_bar(colour="black", fill = "light blue") + 
  coord_flip()+
  guides(fill = FALSE) + 
   theme_ipsum(axis_title_size = 16, grid ="X")+
  theme(axis.title.y=element_text(angle= 0, margin = margin(r=-70)),
        axis.title = element_text(size=16),
        axis.text = element_text(size =16))+
  xlab("Education\nLevel")+
  ylab("No. of participants")+
    facet_wrap(~ haveKids,
             labeller = labeller(haveKids = c("TRUE" = "Have Kids",
                                              "FALSE" = "Do not have kids")))+
  ggtitle("Study of relationship between education levels and kids")

P2

Joviality index by age group

The density plot for joviality index for different age groups clearly show that more people in the age group 18-30 have higher joviality index where as more people in th eage group 40 -60 have lower joviality. Having kids seem to have made a difference in the joviality of age group 18-40. The plot is clear. I would only recommend minor aesthetic changes for easy readability, like making the y axis title horizontal, removing the minor grid lines, increasing thickness of the density lines and adding yrs to the facet wrap labels. I would also add a title to all the visualizations.

participants_data$agegroup <- cut(participants_data$age, breaks = c(17,30,40,50,60), 
                             labels = c("18 - 30 yrs","30 - 40 yrs","40 - 50 yrs","50 - 60 yrs"))


P3 <- ggplot(data=participants_data, 
       aes(x =joviality,colour = haveKids)) +
  geom_density(size = 1)+
  ylab("Density")+
  xlab("Joviality")+
  labs(colour = "Have Kids")+
  theme(axis.title.y=element_text(angle =0, margin = margin(t=-50,r=10)),
        axis.title.x=element_text(margin = margin(t=10)),
        panel.grid.minor.y = element_blank(),
        panel.grid.minor.x = element_blank(),
        panel.grid.major.x = element_line(colour = "grey90"),
        panel.grid.major.y = element_line(colour = "grey90"),
        panel.background = element_rect(fill = "white"),
        axis.title = element_text(size=16),
        axis.text = element_text(size =16),
        plot.title = element_text(size =20,hjust = 0.5),
        legend.title = element_text(size =16),
        legend.text = element_text(size = 16))+
  facet_wrap(~ agegroup,nrow = 2)+
  annotate("segment", x=-Inf, xend=Inf, y=-Inf, yend=-Inf, colour = "grey50")+
  annotate("segment", x=-Inf, xend=-Inf, y=-Inf, yend=Inf, colour = "grey50")+
  ggtitle("Joviality of Age groups with and without kids")

P3

Joviality Index by Education Level

The same aesthetic changes can be applied to Jovilaity index of participants having kids and not by their education level.

level_order <- c('Low','HighSchoolOrCollege','Bachelors','Graduate')

participants_data$educationLevel <- factor(participants_data$educationLevel, levels = level_order)

ggplot(data=participants_data, 
       aes(x =joviality,colour = haveKids)) +
  geom_density(size = 1)+
  facet_wrap(~ educationLevel,nrow = 2,labeller = labeller(educationLevel = c("Low" = "Low",
                                                                              "HighSchoolOrCollege" = "High School or College",
                                                                              "Bachelors" = "Bachelors",
                                                                              "Graduate" = "Graduate")))+
  ylab("Density")+
  xlab("Joviality")+
  labs(colour='Have Kids')+
  theme(axis.title.y=element_text(angle =0, margin = margin(t=-50,r=10)),
        axis.title.x=element_text(margin = margin(t=10)),
        panel.grid.minor.y = element_blank(),
        panel.grid.minor.x = element_blank(),
        panel.grid.major.x = element_line(colour = "grey90"),
        panel.grid.major.y = element_line(colour = "grey90"),
        panel.background = element_rect(fill = "white"),
        axis.title = element_text(size=16),
        axis.text = element_text(size =16),
        plot.title = element_text(size =20,hjust = 0.5),
        legend.title = element_text(size =16),
        legend.text = element_text(size = 16)
        )+
  annotate("segment", x=-Inf, xend=Inf, y=-Inf, yend=-Inf, colour = "grey50")+
  annotate("segment", x=-Inf, xend=-Inf, y=-Inf, yend=Inf, colour = "grey50")+
  ggtitle("Joviality by Education Level with and without kids")

Hourly Rate by Education Requirement

This graph is a good addition to the demographic study of the city. Though it is not specific about the people, it does say about their living standards. We know from the above graphs that the majority of the city are High School or college educated. From the box plot below, we can see that their salary rates is the second lowest. I would only make aesthetic changes to this boxplot for better clarity and readability. I will also remove the Legend since it is the same as the axis labels,

jobs$educationRequirement <- factor(jobs$educationRequirement,
                                      levels = c("Low", "HighSchoolOrCollege",
                                                 "Bachelors", "Graduate"))
cbp1 <- c("#E69F00", "#56B4E9","#F0E442", "#CC79A7")

P4 <- ggplot(data=jobs, aes(y = hourlyRate,
                      x = educationRequirement,
                      fill = educationRequirement)) + 
  labs(fill = "Education Requirement") +
  geom_boxplot() +
  scale_fill_manual(values = cbp1) +
  ylim(0, 50) +
  stat_summary(geom = "point",fun="mean") +
  ylab("Hourly Rate ($)")+
  xlab("Education\nRequirement")+
  coord_flip()+
  theme(axis.title.y=element_text(angle =0, margin = margin(t=-50,r=-50)),
        axis.title.x=element_text(margin = margin(t=10)),
        panel.grid.minor.y = element_blank(),
        panel.grid.minor.x = element_blank(),
        panel.grid.major.x = element_line(colour = "grey90"),
        panel.grid.major.y = element_line(colour = "grey90"),
        panel.background = element_rect(fill = "white"),
        axis.text = element_text(size =16),
        axis.line = element_line(color="grey25", size = 0.02),
        axis.title = element_text(size=16),
        legend.position = "none",
        plot.title = element_text(size =20,hjust = 0.5))+
  ggtitle("Salary rate by Education Requirement")

P4

Conclusion

Finally, a conclusion will be nice, pointing out the highlights which help us to understand the demographic of the city and also combine the corresponding visualizations using patchwork.

patchwork <- (P1 + P4)/ P2 /P3

patchwork + plot_annotation(
  title = "Demographics of Engagement, Ohio, USA",
  subtitle = "The below plots reveal the Demographics of the city",
  theme = theme(plot.title = element_text(size = 22, face="bold"),
                plot.background = element_rect(fill = NA,
                                               colour = 'black',
                                               size = 1),
                plot.margin = margin(1,1,0.5,0.5,"cm"))
)

From the analysis and visualization of the Participants data, we can draw the following inferences about the demographic of Engagement, Ohio, USA.

—-***—-