With suicide and depression rates high around the world, more individuals and organisations are channelling their efforts into mental health, in a bid to understand the causes of suicide and depression and to reduce them.
The aim of this project is to analyse the global trend in suicide rates and to see which countries, age groups and genders have the highest suicide rates. This will help the WHO and other health agencies tackle the painful issue of suicide.
import math
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
%matplotlib inline
import plotly
import numpy as np
import plotly.graph_objs as go
import chart_studio.plotly as py
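import os
# Credentials are read from the environment instead of being hard-coded;
# the variable names PLOTLY_USERNAME and PLOTLY_API_KEY are assumptions.
py.sign_in(os.environ['PLOTLY_USERNAME'], os.environ['PLOTLY_API_KEY'])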
print(plotly.__version__)
# import the data
totalsuicide = pd.read_excel("suicide_total_deaths.xlsx")
totalsuicide.columns
totalsuicide.columns = ['country', 'yr1990', 'yr1991', 'yr1992', 'yr1993', 'yr1994', 'yr1995', 'yr1996', 'yr1997', 'yr1998',
'yr1999', 'yr2000','yr2001', 'yr2002', 'yr2003', 'yr2004', 'yr2005', 'yr2006', 'yr2007', 'yr2008',
'yr2009', 'yr2010', 'yr2011','yr2012', 'yr2013', 'yr2014', 'yr2015', 'yr2016']
totalsuicide.head()
# Mapping deaths from suicide globally in 2016
#set scale and color ranges
scale = [[0.0, 'rgb(88, 158, 92)'], [0.2, 'rgb(223,221,228)'],
[0.4, 'rgb(169,170,201)'], [0.6, 'rgb(139,135,181)'],
[0.8, 'rgb(158, 88, 142)'], [1.0, 'rgb(255,0,0)']]
#dataset to be graphed
data = [dict(type='choropleth',
colorscale=scale,
locations=totalsuicide['country'],
z=totalsuicide['yr2016'].astype(float),
locationmode='country names',
text=totalsuicide['country'],
hoverinfo='location+z',
marker=dict(line=dict(color='rgb(255,255,255)', width=2)),
colorbar=dict(title='Global suicide deaths in 2016'))]
#layout
layout = dict(title='Global suicide deaths in 2016 <br />(Hover for each country)',
geo=dict(scope='world',
projection=dict(type='equirectangular'),
showlakes=True,
lakecolor='rgb(95,145,237)'))
fig = dict(data=data, layout=layout)
py.iplot(fig, filename='DATS 6103 - Individual Project 3 - Ruth Akor')
From the map above, India recorded the highest number of suicide deaths in 2016.
In this section, I look at whether suicide deaths have been increasing or decreasing over time.
totalsuicide.set_index("country",drop=True,inplace=True)
totalsuicide
ax = totalsuicide.sum().plot(legend=True, figsize=(16,10), title='Trend in Global suicide deaths from 1990 to 2016')
ax.set_xlabel("Year")
ax.set_ylabel('suicide deaths')
plt.show()
Deaths from suicide declined from 2005 and started increasing again in 2013, but they are still not as high as they were in the 1990s.
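The direction of the trend can also be checked numerically rather than read off the chart. The sketch below is only an illustration: the yearly totals are made-up numbers, and `totals` stands in for something like `totalsuicide.sum()`. It computes the year-over-year change and lists the years in which the total rose.

```python
import pandas as pd

# Hypothetical yearly totals, for illustration only (not the project data).
totals = pd.Series(
    [100, 110, 105, 95, 98],
    index=["yr2010", "yr2011", "yr2012", "yr2013", "yr2014"],
)

# Turn labels such as 'yr2010' into integer years.
totals.index = totals.index.str.replace("yr", "", regex=False).astype(int)

# Year-over-year change; the first year has no predecessor, so it is NaN.
change = totals.diff()

# Years in which the total was higher than in the previous year.
rising_years = change[change > 0].index.tolist()
print(rising_years)
```

A run of consecutive years in `rising_years` would support a claim such as "increasing since 2013" more firmly than a visual reading of the line chart.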
#trying to convert from object to float
totalsuicide["yr2016"] = pd.to_numeric(totalsuicide.yr2016, errors='coerce')
def PiePlot(Year):
    # rank countries by suicide deaths in the chosen year, highest first
    df = totalsuicide[Year]
    result = df.sort_values(ascending=False)
    result = result.reset_index()
    result.index = result.index + 1
    # lump every country outside the top 10 into a single slice
    others = result[Year].iloc[10:].sum()
    top = result[:10].copy()
    top.loc[11] = ['All Other Countries', others]
    countryPlot = top[Year].plot.pie(subplots=True,
                                     autopct='%0.1f',
                                     fontsize=10,
                                     figsize=(10,10),
                                     legend=False,
                                     labels=top['country'],
                                     shadow=False,
                                     explode=(0.15,0.12,0,0,0,0,0,0,0,0,0),
                                     startangle=90)
    countryPlot[0].set_ylabel('')
PiePlot('yr2016')
plt.show()
From the map and pie chart, India has the highest number of suicide deaths. However, India and China have very large populations, so it is more informative to consider suicide rates relative to total population.
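To see why a ranking by raw deaths and a ranking by rate can disagree, here is a minimal sketch. The three countries and their figures are invented for illustration, and `rate_per_100k` is a hypothetical helper, not part of the project code. It assumes the rate is expressed as deaths per 100,000 people.

```python
import pandas as pd

# Hypothetical figures, for illustration only (not the project data).
toy = pd.DataFrame({
    "country": ["A", "B", "C"],
    "deaths": [200_000, 1_000, 50],
    "population": [1_000_000_000, 3_000_000, 500_000],
})


def rate_per_100k(deaths, population):
    """Return deaths per 100,000 people."""
    return deaths / population * 100_000


toy["rate"] = rate_per_100k(toy["deaths"], toy["population"])

# Rank the same countries two ways: by absolute deaths and by rate.
by_deaths = toy.sort_values("deaths", ascending=False)["country"].tolist()
by_rate = toy.sort_values("rate", ascending=False)["country"].tolist()
print(by_deaths)
print(by_rate)
```

In this toy data the most populous country leads on deaths, but a much smaller country leads on rate, which is the effect the rate-based map is meant to expose.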
suiciderate = pd.read_excel("suicidemortalityrate.xlsx")
suiciderate
suiciderate.columns = ['country', 'yr2000','yr2001', 'yr2002', 'yr2003', 'yr2004', 'yr2005', 'yr2006', 'yr2007', 'yr2008',
'yr2009', 'yr2010', 'yr2011','yr2012', 'yr2013', 'yr2014', 'yr2015', 'yr2016']
suiciderate.head()
#trying to convert from object to float
suiciderate["yr2016"] = pd.to_numeric(suiciderate.yr2016, errors='coerce')
# Mapping deaths from suicide rates globally in 2016
#set scale and color ranges
scale = [[0.0, 'rgb(29, 180, 240)'], [0.2, 'rgb(223,221,228)'],
[0.4, 'rgb(169,170,201)'], [0.6, 'rgb(139,135,181)'],
[0.8, 'rgb(236, 240, 29)'], [1.0, 'rgb(255,0,0)']]
#dataset to be graphed
data = [dict(type='choropleth',
colorscale=scale,
locations=suiciderate['country'],
z=suiciderate['yr2016'].astype(float),
locationmode='country names',
text=suiciderate['country'],
hoverinfo='location+z',
marker=dict(line=dict(color='rgb(255,255,255)', width=2)),
colorbar=dict(title='Global suicide rate in 2016'))]
#layout
layout = dict(title='Global suicide rate in 2016 <br />(Hover for each country)',
geo=dict(scope='world',
projection=dict(type='equirectangular'),
showlakes=True,
lakecolor='rgb(95,145,237)'))
fig = dict(data=data, layout=layout)
py.iplot(fig, filename='DATS 6103 - Individual Project 3 - Ruth Akor')
Once suicide rates relative to each country's population are considered, Lithuania, Russia and Guyana emerge as the countries with the highest suicide rates in 2016.
#Plotting the suicide rate for countries with top 10 highest suicide rates in 2016
top10suicide = suiciderate.nlargest(10, ['yr2016'])
top10suicide.set_index("country",drop=True,inplace=True)
top10suicide
top10suicide = top10suicide.loc[:,['yr2016']]
top10suicide
plot_top10suicide = top10suicide.plot(kind='bar',figsize=(20,10), color='purple', title='Countries with highest suicide rates in 2016')
plt.legend(loc='best', fontsize=10)
plt.show()
Using a CSV file of WHO statistics, I analyse suicide deaths by the following age groups: 5-14, 15-24, 25-34, 35-54, 55-74 and 75+ years.
who_stat = pd.read_csv("who_suicide_statistics.csv")
who_stat.head(10)
#renaming the age groups to remove 'years'
who_stat.loc[:, 'age'] = who_stat['age'].str.replace(' years','')
who_stat.loc[who_stat['age'] == '5-14', 'age'] = '05-14'
who_stat
#drop population column
who_stat = who_stat.drop(['population'], axis=1)
who_stat.head()
#subset all suicide deaths for age group 5 to 14
#numeric_only=True keeps only the numeric count column when summing by year
age5to14 = who_stat.loc[who_stat['age']== '05-14']
newage5to14 = age5to14.groupby(['year']).sum(numeric_only=True)
newage5to14.columns = ['age05-14']
newage5to14.head()
#subset all suicide deaths for age group 15 to 24
age15to24 = who_stat.loc[who_stat['age']== '15-24']
newage15to24 = age15to24.groupby(['year']).sum(numeric_only=True)
newage15to24.columns = ['age15-24']
newage15to24.head()
#subset all suicide deaths for age group 25 to 34
age25to34 = who_stat.loc[who_stat['age']== '25-34']
newage25to34 = age25to34.groupby(['year']).sum(numeric_only=True)
newage25to34.columns = ['age25-34']
newage25to34.head()
#subset all suicide deaths for age group 35 to 54
age35to54 = who_stat.loc[who_stat['age']== '35-54']
newage35to54 = age35to54.groupby(['year']).sum(numeric_only=True)
newage35to54.columns = ['age35-54']
newage35to54.head()
#subset all suicide deaths for age group 55 to 74
age55to74 = who_stat.loc[who_stat['age']== '55-74']
newage55to74 = age55to74.groupby(['year']).sum(numeric_only=True)
newage55to74.columns = ['age55-74']
newage55to74.head()
#subset all suicide deaths for age group 75+
age75plus = who_stat.loc[who_stat['age']== '75+']
newage75plus = age75plus.groupby(['year']).sum(numeric_only=True)
newage75plus.columns = ['age75plus']
newage75plus.head()
#join all age groups into one dataframe
data = newage5to14.join(newage15to24).join(newage25to34).join(newage35to54).join(newage55to74).join(newage75plus)
data.head()
agegroup_plot = data.plot.line(legend=True, figsize=(16,10), title='Suicide deaths by age group')
agegroup_plot.set_ylabel('suicide death')
plt.show()
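The per-age-group subsetting and joining above could also be expressed as a single pivot. The following is a sketch under assumptions, not the method used in this project: the toy rows are invented, and the count column name `suicides_no` is assumed because the notebook never names it.

```python
import pandas as pd

# Hypothetical rows shaped like the WHO file; 'suicides_no' is an assumed
# column name and the counts are made up for illustration.
toy = pd.DataFrame({
    "year": [2000, 2000, 2000, 2001, 2001],
    "age": ["05-14", "15-24", "15-24", "05-14", "15-24"],
    "suicides_no": [1, 10, 5, 2, 20],
})

# One row per year, one column per age group, values summed within each cell.
by_age = toy.pivot_table(
    index="year",
    columns="age",
    values="suicides_no",
    aggfunc="sum",
)
print(by_age)
```

The result has the same shape as the joined `data` frame (years as the index, one column per age group), so it can be passed to `.plot.line()` in the same way. Using `columns="sex"` instead would give the analogous table by sex.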
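The 35 to 54 age group has had the highest number of suicide deaths since 1979. One possible explanation is that many people in this bracket are at the peak of their careers, with families and many responsibilities, although the data here cannot confirm this. The bracket also spans 20 years of age, while the 15-24 and 25-34 brackets span only 10, which may contribute to its higher count.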
In this section, I consider suicide deaths by sex to see whether more men or more women die by suicide.
#subset all suicide deaths by gender
#numeric_only=True keeps only the numeric count column when summing by year
malesuicide = who_stat.loc[who_stat['sex']== 'male']
newmalesuicide = malesuicide.groupby(['year']).sum(numeric_only=True)
newmalesuicide.columns = ['male']
newmalesuicide.head()
femalesuicide = who_stat.loc[who_stat['sex']== 'female']
newfemalesuicide = femalesuicide.groupby(['year']).sum(numeric_only=True)
newfemalesuicide.columns = ['female']
newfemalesuicide.head()
#join all gender into one dataframe
genderdata = newmalesuicide.join(newfemalesuicide)
genderdata.head()
gender_plot = genderdata.plot(kind='bar', legend=True, figsize=(16,10), title='Suicide deaths by Sex')
gender_plot.set_ylabel('suicide death')
plt.show()
Across the countries in the data, more males than females die by suicide.
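To quantify the gap rather than compare bar heights by eye, one option is a male-to-female ratio per year. This is a minimal sketch with invented counts. The frame mirrors the `male` and `female` columns of `genderdata`, and the `male_to_female` column is a hypothetical addition.

```python
import pandas as pd

# Hypothetical yearly counts, for illustration only (not the project data).
genderdata_toy = pd.DataFrame(
    {"male": [300, 450], "female": [100, 150]},
    index=pd.Index([2000, 2001], name="year"),
)

# Ratio above 1 means more male than female suicide deaths in that year.
genderdata_toy["male_to_female"] = (
    genderdata_toy["male"] / genderdata_toy["female"]
)
print(genderdata_toy)
```

A ratio that stays above 1 in every year would back up the statement that male deaths are consistently higher.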
#import the data
suicidewomen = pd.read_excel("suicide_women_per_100000_people.xlsx")
suicidewomen
There is a lot of missing data in this Excel file on the suicide rate for women per 100,000 people, so I will take the average suicide rate for each country from 1950 to 2016 to see which countries have the highest suicide rates on average over those years.
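It is worth being explicit about how the row-wise average treats gaps. In pandas, `mean` skips missing values by default, so each country's average is taken over only the years it actually reports. The sketch below uses invented values, and the `n_years` column is a hypothetical addition that records how many years each average rests on.

```python
import numpy as np
import pandas as pd

# Hypothetical rates with gaps, for illustration only (not the project data).
toy = pd.DataFrame({
    "country": ["A", "B"],
    "1950": [10.0, np.nan],
    "1951": [np.nan, 4.0],
    "1952": [20.0, 6.0],
})

# Average only over the year columns; NaN values are skipped by default.
year_cols = toy.columns.drop("country")
toy["mean"] = toy[year_cols].mean(axis=1)

# Number of non-missing years behind each average.
toy["n_years"] = toy[year_cols].notna().sum(axis=1)
print(toy[["country", "mean", "n_years"]])
```

Keeping a count like `n_years` next to the mean helps flag countries whose average is based on very few observations.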
#create a column for the mean suicide rate for each country
suicidewomen['mean'] = suicidewomen.mean(axis=1, numeric_only=True)
suicidewomen.head()
#set index to country and sort by mean value
suicidewomen.set_index("country",drop=True,inplace=True)
sort_suicidewomen = suicidewomen.sort_values('mean', ascending=False)
sort_suicidewomen.head(10)
mean_suicide_female = sort_suicidewomen['mean']
topmeansuiciderate = mean_suicide_female.head(10)
topmeansuiciderate
# Pandas-style plotting of the top 10 countries
topmeansuiciderate.plot(kind='bar',
                        color='red',
                        legend=False,
                        width=0.8)
# Matplotlib styling of the output:
plt.ylabel("mean suicide rate")
plt.xlabel("country")
plt.title("Countries with the highest female suicide rates on average")
plt.gca().yaxis.grid(linestyle=':')
plt.show()
On average, Cuba has had the highest female suicide rate since 1950, followed by Hungary, Japan, Denmark and Sri Lanka.
#import the data
suicidemen = pd.read_excel("suicide_men_per_100000_people.xlsx")
suicidemen.head()
#create a column for the mean suicide rate for males for each country
suicidemen['mean'] = suicidemen.mean(axis=1, numeric_only=True)
suicidemen.head()
#set index to country and sort by mean value
suicidemen.set_index("country",drop=True,inplace=True)
sort_suicidemen = suicidemen.sort_values('mean', ascending=False)
sort_suicidemen.head(10)
mean_suicide_male = sort_suicidemen['mean']
topmeansuicide_men = mean_suicide_male.head(10)
topmeansuicide_men
plot_topmeansuicide = topmeansuicide_men.plot(kind='bar',figsize=(20,10), color='skyblue', title='Countries with highest male suicide rates on average')
plt.legend(loc='best', fontsize=10)
plt.show()
On average, Lithuania has had the highest male suicide rate from 1950 to 2016, followed by Russia, Belarus, Latvia and Hungary.
The analysis reveals several patterns. Using the latest available data (2016), India has the highest number of deaths from suicide, followed by China and Russia. However, when we consider the suicide rate, which measures suicide deaths relative to the size of the population, Lithuania becomes the country with the highest suicide rate, followed by Russia and Guyana.
By analysing the trend in global suicide deaths, I found that deaths rose sharply between 1990 and 1995, after which they declined, but they have been increasing again since 2013. A major limitation is that the latest available data is from 2016. Once 2017, 2018 and 2019 statistics become available, I expect suicide deaths to be higher, although that remains to be checked against the data.
The number of suicide deaths is highest among people aged 35 to 54, followed by those aged 55 to 74. This is possibly because people between 35 and 54 are in the middle of their lives with many responsibilities. This can be a tough period for those who feel they cannot find meaning in their life or that life cannot get better.
Through the lens of gender, suicide deaths have been consistently higher among males than among females throughout the years covered. The countries with the highest average male suicide rates are Lithuania, Russia and Belarus, in that order. The countries with the highest average female suicide rates are Cuba, Hungary, Japan and Denmark, in that order.
Another limitation of this study is the large number of missing values in the data, which is why average rates were used in the section on gender.
For future research, it would be helpful to study potential drivers of suicide such as alcohol consumption, drug abuse, unemployment, high debt and poverty. It would also be interesting to examine the relationship between mental health issues and suicide.
Suicide in the 21st Century - https://www.kaggle.com/szamil/suicide-in-the-twenty-first-century/
GapMinder - https://www.gapminder.org/data/