Analyzing World Happiness: How happy is your country?
Ingrid Zoll, Christy Lei & Nat Rubin
This year, the International Day of Happiness was celebrated on March 20 with the theme “Happiness For All, Together” while (ironically) the world is coping with a stressful pandemic. But don’t let the bad news get on your nerves too much — we are about to analyze some happiness-related data to see where your country ranks in the world and get some clues as to why some countries are happier than others.
Specifically, we analyze and visualize the relationships between happiness scores, suicide rates, and access to mental health care across countries from 2015 to 2019.
Data Wrangling
library(knitr)
library(tidyverse)
library(ggplot2)
library(viridis)
library(ggthemes)
library(plotly)
library(countrycode)
library(gridExtra)
library(colorspace)
From Kaggle, we obtained happiness datasets for five years (2015-2019), including key variables such as happiness score, GDP, life expectancy, and generosity. Our suicide_death_rates dataset comes from Our World in Data and contains suicide death rates per 100,000 people (sdr) by country and year. Our mental_health_facilities dataset comes from the WHO and contains variables related to mental health access (e.g. number of mental hospitals per 100,000 population).
happiness_2015 <- read.csv("~/Desktop/Cumulus/Courses/Spring 2020/math241s20/Projects/MiniProject2/math241S20PostGrp4/2015.csv")
happiness_2016 <- read.csv("~/Desktop/Cumulus/Courses/Spring 2020/math241s20/Projects/MiniProject2/math241S20PostGrp4/2016.csv")
happiness_2017 <- read.csv("~/Desktop/Cumulus/Courses/Spring 2020/math241s20/Projects/MiniProject2/math241S20PostGrp4/2017.csv")
happiness_2018 <- read.csv("~/Desktop/Cumulus/Courses/Spring 2020/math241s20/Projects/MiniProject2/math241S20PostGrp4/2018.csv")
happiness_2019 <- read.csv("~/Desktop/Cumulus/Courses/Spring 2020/math241s20/Projects/MiniProject2/math241S20PostGrp4/2019.csv")
mental_health_facilities <- read.csv("~/Desktop/Cumulus/Courses/Spring 2020/math241s20/Projects/MiniProject2/math241S20PostGrp4/mental_health_facilities.csv")
suicide_death_rates <- read.csv("~/Desktop/Cumulus/Courses/Spring 2020/math241s20/Projects/MiniProject2/math241S20PostGrp4/suicide-death-rates.csv")
First, we needed to standardize the happiness data across years. We removed columns such as region and dystopia that were not useful for our analysis, and inserted NAs for variables that were not reported in every year. Then we gave the variables standardized names before combining the years into one happiness dataset.
lower_bound <- rep(NA, 158)
upper_bound <- rep(NA, 158)
happiness_2015 <- happiness_2015 %>%
mutate(year = 2015) %>%
select(1, 3, 4, 6:11, 13) %>%
mutate(lower_bound = lower_bound, upper_bound = upper_bound) %>%
rename(country = 1,
happiness_rank = 2,
happiness_score = 3,
gdp = 4,
family = 5,
life_expectancy = 6,
freedom = 7,
trust_corruption = 8,
generosity = 9,
year = 10,
lower_whisker = 11,
upper_whisker = 12)
happiness_2016 <- happiness_2016 %>%
mutate(year = 2016) %>%
select(1, 3, 4:12, 14) %>%
rename(country = 1,
happiness_rank = 2,
happiness_score = 3,
gdp = 6,
family = 7,
life_expectancy = 8,
freedom = 9,
trust_corruption = 10,
generosity = 11,
year = 12,
lower_whisker = 4,
upper_whisker = 5)
happiness_2017 <- happiness_2017 %>%
mutate(year = 2017) %>%
select(-12) %>%
rename(country = 1,
happiness_rank = 2,
happiness_score = 3,
gdp = 6,
family = 7,
life_expectancy = 8,
freedom = 9,
trust_corruption = 10,
generosity = 11,
year = 12,
lower_whisker = 4,
upper_whisker = 5)
lower_bound <- rep(NA, 156)
upper_bound <- rep(NA, 156)
happiness_2018 <- happiness_2018 %>%
mutate(year = 2018) %>%
mutate(lower_bound = lower_bound, upper_bound = upper_bound) %>%
rename(country = 2,
happiness_rank = 1,
happiness_score = 3,
gdp = 4,
family = 5,
life_expectancy = 6,
freedom = 7,
trust_corruption = 9,
generosity = 8,
year = 10,
lower_whisker = 11,
upper_whisker = 12)
happiness_2019 <- happiness_2019 %>%
mutate(year = 2019) %>%
mutate(lower_bound = lower_bound, upper_bound = upper_bound) %>%
rename(country = 2,
happiness_rank = 1,
happiness_score = 3,
gdp = 4,
family = 5,
life_expectancy = 6,
freedom = 7,
trust_corruption = 9,
generosity = 8,
year = 10,
lower_whisker = 11,
upper_whisker = 12)
happiness <- do.call("rbind", list(happiness_2015, happiness_2016, happiness_2017, happiness_2018, happiness_2019))
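The rename-then-stack pattern above can be sketched on a toy example. The two data frames and their column names are made up for illustration, and the sketch swaps in bind_rows() from dplyr for rbind(): it matches columns by name and fills a column that is missing from one year with NA, so the placeholder NA columns do not have to be built by hand.

```r
library(dplyr)

# Hypothetical yearly files whose column names differ (not the Kaggle data).
toy_2015 <- data.frame(Country = c("A", "B"),
                       Happiness.Score = c(7.1, 4.2))
toy_2016 <- data.frame(Country = c("A", "B"),
                       Score = c(7.0, 4.5),
                       Lower.CI = c(6.9, 4.3))

# Give each year the same variable names and tag it with its year.
toy_2015 <- toy_2015 %>%
  rename(country = Country, happiness_score = Happiness.Score) %>%
  mutate(year = 2015)
toy_2016 <- toy_2016 %>%
  rename(country = Country, happiness_score = Score,
         lower_whisker = Lower.CI) %>%
  mutate(year = 2016)

# Stack the years; lower_whisker is filled with NA for the 2015 rows.
toy_all <- bind_rows(toy_2015, toy_2016)
toy_all
```

Renaming by name rather than by position, as done here, also guards against a column order that changes from one year's file to the next.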
We then added suicide_death_rates and mental_health_facilities to the happiness dataset to create a new dataset, countries. Unfortunately, the mental_health_facilities data were collected in only one year, so those values are constant across years within each country.
suicide_death_rates <- suicide_death_rates %>%
  select(-2) %>%      # drop the second column
  rename(sdr = 3)     # the third remaining column holds the suicide death rate
countries <- happiness %>%
  left_join(mental_health_facilities, by = c("country" = "Country")) %>%
  # join on country and year together
  left_join(suicide_death_rates, by = c("country" = "Entity", "year" = "Year")) %>%
  select(-13)
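Because the three sources may spell country names differently, a left join can silently leave NAs for countries that fail to match. The sketch below shows one way to check for that, using made-up tables with fictional countries; anti_join() returns the rows of the first table that have no partner in the second.

```r
library(dplyr)

# Hypothetical tables: one country is spelled differently in each source.
toy_happiness <- data.frame(country = c("Freedonia", "Republic of Sylvania"),
                            year = c(2016, 2016),
                            happiness_score = c(6.0, 5.0))
toy_sdr <- data.frame(Entity = c("Freedonia", "Sylvania"),
                      Year = c(2016, 2016),
                      sdr = c(10, 12))

keys <- c("country" = "Entity", "year" = "Year")

# The left join keeps every happiness row; unmatched rows get sdr = NA.
joined <- left_join(toy_happiness, toy_sdr, by = keys)

# The anti join lists exactly the rows that found no match.
unmatched <- anti_join(toy_happiness, toy_sdr, by = keys)
unmatched
```

Any country listed in the unmatched table would need its name recoded in one of the sources before joining.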
Finally, using the countrycode library, we created a new variable, continent, with the name of the continent to which each country belongs.
countries$continent <- countrycode(sourcevar = countries[, "country"],
origin = "country.name",
destination = "continent")
So, how happy are countries overall?
The World Happiness Report calculates each country’s happiness score from survey data, in which citizens of each country are asked to rate their own life on a scale of 0 to 10, with 10 being the best possible life and 0 the worst. Their responses are weighted to generate a national score (see Table 1). Overall, people rate their lives as average to slightly above average in quality: the mean national score is 5.38. The wide gap between the minimum (2.69) and the maximum (7.77) shows that people in the happiest countries are much happier than those in the least happy countries.
happiness_summary <- countries %>%
summarize(mean(happiness_score), median(happiness_score),
sd(happiness_score), min(happiness_score), max(happiness_score))
options(knitr.table.format = "html")
kable(happiness_summary, digits = 2,
col.names = c("Mean", "Median",
"Standard Dev.", "Minimum", "Maximum"),
caption = "Table 1. Summary Statistics of National Happiness Scores")
Mean | Median | Standard Dev. | Minimum | Maximum |
---|---|---|---|---|
5.38 | 5.32 | 1.13 | 2.69 | 7.77 |
A First Look at the World Happiness Ranking
Then, we wanted to directly visualize the world happiness ranking using a diverging bar chart, which is great for comparing variations above and below an average value or illustrating the spread of negative and positive values in a dataset.
First, we created a new data frame, avg_happiness, that contains the average happiness score for every country from 2015 to 2019.
avg_happiness <- countries %>%
group_by(country) %>%
summarize(avg_happiness = mean(happiness_score)) %>%
as.data.frame()
Next, we created a new column, happiness_z, that contains the normalized happiness score (\(z = (x-\mu)/\sigma\)) for each country. The z-score shows how far a country's happiness score lies from the mean, measured in standard deviations. We also added another column, happiness_type, to indicate whether the country's normalized happiness score is above or below 0.
avg_happiness$happiness_z <- round((avg_happiness$avg_happiness-
mean(avg_happiness$avg_happiness))/sd(avg_happiness$avg_happiness), 2)
avg_happiness$happiness_type <- ifelse(avg_happiness$happiness_z < 0, "below", "above")
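As a quick check of the formula, here is a minimal sketch on four made-up scores (not values from the dataset). It computes the z-scores by hand and confirms that they agree with base R's scale().

```r
# Hypothetical average happiness scores for four countries.
scores <- c(4, 5, 6, 7)

# z = (x - mean) / sd, the same formula as above.
z <- (scores - mean(scores)) / sd(scores)

# Label each score relative to the mean (z = 0).
type <- ifelse(z < 0, "below", "above")

# scale() performs the same centering and scaling.
same_as_scale <- isTRUE(all.equal(z, as.numeric(scale(scores))))

data.frame(scores, z = round(z, 2), type)
```

By construction the z-scores have mean 0 and standard deviation 1, which is why 0 works as the dividing line between the two groups.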
Finally, we sorted the rows by the normalized happiness score and converted the country column to a factor so that the plot keeps the sorted order.
avg_happiness <- avg_happiness[order(avg_happiness$happiness_z), ]
avg_happiness$country <- factor(avg_happiness$country, levels = unique(avg_happiness$country))
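The factor conversion matters because ggplot2 orders a discrete axis by factor levels, and a plain character column falls back to alphabetical order. A small sketch with made-up values shows how the levels follow the sorted rows:

```r
# Hypothetical countries and z-scores, deliberately out of order.
toy <- data.frame(country = c("B", "C", "A"),
                  z = c(0.5, -1.2, 1.1),
                  stringsAsFactors = FALSE)

# Sort the rows by z, then fix the level order to the current row order.
toy <- toy[order(toy$z), ]
toy$country <- factor(toy$country, levels = unique(toy$country))

# The levels now run from lowest to highest z instead of alphabetically.
levels(toy$country)
```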
Now, let’s visualize the ranking using the diverging bars!
happy_plot <- ggplot(avg_happiness, aes(x = country, y = happiness_z, label = happiness_z)) +
geom_bar(stat = "identity", aes(fill = happiness_type)) +
scale_fill_manual(name="Happiness Score",
labels = c("Above Average", "Below Average"),
values = c("above"="paleVioletRed", "below"="thistle")) +
coord_flip() +
labs(title = "Average of World Happiness 2015-2019", y = "Normalized happiness score") +
theme_minimal() +
theme(legend.position = "bottom")
happy_plot
Change in happiness and suicide rate from 2015 to 2017
Are countries getting happier over the years? To answer this, we wanted to create a scatterplot to visualize (1) the change in world happiness and suicide death rate from 2015 to 2017, by continent and (2) the relationship between happiness score and suicide death rate. Since we don’t have access to information about suicide rate after 2017, we only included data from 2015 to 2017.
First, we selected data from 2015, 2016, and 2017 to calculate the mean happiness score and suicide death rate for every continent. Then, we plotted the scores of individual countries (small circles) together with the continent means (large circles) to visualize the relationship between happiness and suicide. We also used the function ggplotly() from the interactive graphing library plotly to add a slider that illustrates the change over the years.
happy_15_to_17 <- countries %>%
filter(is.na(sdr) == FALSE) %>%
filter(year %in% c(2015, 2016, 2017)) %>%
group_by(continent, year) %>%
summarise(happiness_score = mean(happiness_score),
sdr = mean(sdr))
suicide_happiness_plot <- countries %>%
filter(is.na(sdr) == FALSE) %>%
filter(year %in% c(2015, 2016, 2017)) %>%
ggplot(aes(x = happiness_score, y = sdr,
color = continent, frame = year)) +
geom_point(aes(label = country, alpha = 0.9, size = 2)) +
geom_point(data = happy_15_to_17, aes(size = 2.5, label = continent)) +
geom_smooth(inherit.aes = FALSE, aes(x = happiness_score, y = sdr),
method = "lm", se = FALSE, color = "black", size = 0.5) +
labs(x = "Happiness Score", y = "Suicide death rate per 100,000",
size = NULL, alpha = NULL, color = "Continent") +
theme_minimal() +
scale_color_manual(values = c("maroon", "salmon", "gold", "cornflowerblue", "darkSlateBlue"))
ggplotly(suicide_happiness_plot, tooltip = c("label", "x", "y"))
The plot shows little change in the mean happiness score and suicide rate of any continent, but within each continent individual countries vary widely and shift from year to year.
Africa consistently has the lowest mean happiness score along with a high mean suicide death rate. Europe, despite a relatively high mean happiness score, has the highest suicide death rate of any continent. Similarly, Oceania has the highest happiness score together with a high suicide rate. The Americas have a very high mean happiness score and a relatively low suicide death rate, and Asia has a moderately high mean happiness score with the lowest suicide death rate.
As for the general relationship between suicide and happiness, the nearly flat black regression line suggests little to no linear correlation.
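One way to put a number on a "flat line" is to compute the correlation coefficient and the regression slope directly. The sketch below uses made-up values, not our data; with the real data, the same calls could be applied to the happiness_score and sdr columns of countries.

```r
# Hypothetical happiness scores and suicide death rates for five countries.
toy <- data.frame(happiness_score = c(3, 4, 5, 6, 7),
                  sdr = c(10, 12, 9, 11, 10))

# Pearson correlation; use = "complete.obs" skips rows with missing values.
r <- cor(toy$happiness_score, toy$sdr, use = "complete.obs")

# Slope of the least-squares line, i.e. what geom_smooth(method = "lm") draws.
fit <- lm(sdr ~ happiness_score, data = toy)
slope <- unname(coef(fit)["happiness_score"])

c(correlation = r, slope = slope)
```

A correlation near 0 together with a small slope would support the reading of the plot, although neither rules out a nonlinear relationship.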
Mental health access, happiness, and suicide
How does mental health access factor into happiness and suicide? We used the indicator of mental health access with the fewest NAs, which is the number of mental health units in general hospitals per 100,000 population. Because most countries have few units while a few have many, we plot it on a log scale to better visualize the relationship.
The two plots reveal that mental health access appears to have a stronger relationship with happiness than with the suicide death rate. The confidence band around the trend line for the suicide rate shows that the two variables could easily be unrelated. The stronger relationship between mental health access and happiness is unsurprising: greater access to mental healthcare may increase happiness, or another factor, such as a country’s wealth, may drive both.
happiness_vs_health <- countries %>%
  filter(year == 2016) %>%
  ggplot(aes(x = Mental.health.units.in.general.hospitals..per.100.000.population.,
             y = happiness_score)) +
  # na.rm is a layer argument; ggplot() itself does not use it
  geom_point(color = "darkorchid2", na.rm = TRUE) +
  geom_smooth(method = "lm", color = "darkorchid4", na.rm = TRUE) +
  scale_x_log10() +
  theme_minimal() +
  labs(x = "(Log of) Mental Health Units in General Hospitals per 100,000",
       y = "Happiness Score (2016)",
       title = "Log of Mental Health access versus Happiness Score")
suicide_vs_health <- countries %>%
  filter(year == 2016) %>%
  ggplot(aes(x = Mental.health.units.in.general.hospitals..per.100.000.population.,
             y = sdr)) +
  geom_point(color = "royalblue2", na.rm = TRUE) +
  geom_smooth(method = "lm", color = "royalblue4", na.rm = TRUE) +
  scale_x_log10() +
  theme_minimal() +
  labs(x = "(Log of) Mental Health Units in General Hospitals per 100,000",
       y = "Suicide Death Rate (2016)",
       title = "Log of Mental Health access versus Suicide Death Rate")
happiness_vs_health
suicide_vs_health
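One caveat with a log axis is worth a quick sketch: log10(0) is -Inf, so a country reporting zero units cannot be placed on the axis, and ggplot2 drops such non-finite values from the plot. The numbers below are made up for illustration and are not taken from the WHO data.

```r
# Hypothetical units per 100,000 population, including one exact zero.
units <- c(0, 0.01, 0.05, 0.2, 3)

# The log transform spreads out the small values but sends 0 to -Inf.
log_units <- log10(units)
log_units

# Count how many observations would survive on a log-scaled axis.
n_plotted <- sum(is.finite(log_units))
n_dropped <- sum(!is.finite(log_units))
c(plotted = n_plotted, dropped = n_dropped)
```

Counting the dropped rows in this way shows how many countries a log-scale plot leaves out.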
What Contributes to Happiness?
What factors contribute to happiness? We wanted to explore the different contributors to the happiness score and how they vary between the five most happy countries and the five least happy countries. For this question, we’re only looking at the data from 2019.
First, we used filter() to select the five happiest and five least happy countries. Then we pivoted the 2019 dataset so that the individual measurements (e.g. family, freedom, economy) that contribute to the overall happiness measurement would be observations rather than variables. Finally, we made country name a factor so that the countries stay ordered by happiness rank when we graph the data.
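Before the full code, here is a minimal sketch of the pivoting step on a made-up table with two countries and two measurements (the values are illustrative only). Each measurement column becomes a row, identified by a new category column.

```r
library(tidyr)

# Hypothetical wide table: one row per country, one column per measurement.
toy_wide <- data.frame(country = c("A", "B"),
                       gdp = c(1.3, 0.3),
                       family = c(1.5, 0.6))

# Long format: one row per country-measurement pair.
toy_long <- pivot_longer(toy_wide,
                         cols = c(gdp, family),
                         names_to = "category",
                         values_to = "values")
toy_long
```

The long format is what lets a single ggplot2 call map category to the x-axis and country to the fill color.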
top_bottom_2019_happiness <- happiness_2019 %>%
filter(happiness_rank <= 5 | happiness_rank >= 152) %>%
pivot_longer(cols=c(gdp, family, life_expectancy, freedom, generosity, trust_corruption),
names_to = "category", values_to = "values") %>%
mutate(country = factor(country, levels=c("Finland", "Denmark", "Norway", "Iceland", "Netherlands",
"Rwanda", "Tanzania", "Afghanistan", "Central African Republic", "South Sudan")))
ggplot(top_bottom_2019_happiness, mapping = aes(x = category, y = values, fill = country)) +
  geom_col(position = "dodge", color = "black", size = 0.25) +
  # Apply the complete theme first; adding it last would overwrite the theme() tweaks below
  theme_minimal() +
  scale_x_discrete(labels = c("Social\n support", "Freedom to\n make life\n choices",
                              "GDP\n per capita", "Generosity", "Healthy life\n expectancy\n at birth",
                              "Perception\n of corruption")) +
  scale_fill_discrete_sequential(name = "Country\n (with rank)", palette = "ag_Sunset",
                                 labels = c("Finland (1)", "Denmark (2)", "Norway (3)", "Iceland (4)",
                                            "Netherlands (5)", "Rwanda (152)", "Tanzania (153)",
                                            "Afghanistan (154)", "Central African\n Republic (155)",
                                            "South Sudan (156)")) +
  theme(legend.text = element_text(size = 8.5),
        legend.direction = "vertical",
        legend.position = "right") +
  labs(title = "Contribution of 6 variables to happiness") +
  xlab("Measurement Category") + ylab("Importance score")
This graph, which is grouped by measurement category and color coded by country, examines the extent to which the following factors contribute to the overall happiness score for a country: social support, freedom to make life choices, GDP per capita, generosity, healthy life expectancy at birth, and perception of corruption.
This chart shows that social support, GDP, and healthy life expectancy seem to be the largest contributors to how happy a country is. Those three categories show the greatest difference between the happiest and least happy countries, whereas there is much less of a difference in freedom to make life choices, generosity, and perception of corruption. These data indicate that happy countries are more socially supportive, have a higher GDP, and have healthier populations than less happy countries.
If you’re interested in exploring happiness-related data further, more information on each of the factors that contribute to the happiness measurement can be found in the World Happiness Report (https://worldhappiness.report/ed/2019/changing-world-happiness/#technical-box-detailed-information-about-each-of-the-predictors-in-table).