With persistent media coverage of police violence, white supremacist radicalism, and continued global conflict, the following project attempts to analyze global terrorism data from this century. Specifically, this endeavor attempts to visualize findings, uncover trends, and employ inferential statistics to possibly predict future terrorist behavior, which may be of value to law enforcement and intelligence communities.
Two of the authors of this project, Glen Joy and Arushi Tayal, are Criminology & Criminal Justice majors at the University of Maryland who have studied criminal activity and terrorism in their coursework. We hope that combining data science techniques and criminological research can help provide valuable insight into the data and findings explored. This notebook file is intended to walk readers through the data science pipeline and the steps we followed in exploring our data. The notebook is divided as follows:
The dataset we will use for this project is the Global Terrorism Database from the National Consortium for the Study of Terrorism and Responses to Terrorism (START) at the University of Maryland. It is worth noting one of the authors of this project, Arushi, was an intern for this organization. Each observation of the dataset represents a separate terrorism incident that occurred. Columns in the dataset are descriptors of that incident which include location, attack type, fatalities, etc.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from yellowbrick.regressor import ResidualsPlot
import folium
import json
import requests
import math
import random
import warnings
from matplotlib import cycler
import seaborn as sns
plt.style.use('ggplot')
warnings.filterwarnings("ignore")
The dataset is a CSV file, which we read in with Pandas as usual.
df = pd.read_csv('globalterrorismdb_0221dist.csv')
df.head()
| | eventid | iyear | imonth | iday | approxdate | extended | resolution | country | country_txt | region | ... | addnotes | scite1 | scite2 | scite3 | dbsource | INT_LOG | INT_IDEO | INT_MISC | INT_ANY | related |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 197000000001 | 1970 | 7 | 2 | NaN | 0 | NaN | 58 | Dominican Republic | 2 | ... | NaN | NaN | NaN | NaN | PGIS | 0 | 0 | 0 | 0 | NaN |
| 1 | 197000000002 | 1970 | 0 | 0 | NaN | 0 | NaN | 130 | Mexico | 1 | ... | NaN | NaN | NaN | NaN | PGIS | 0 | 1 | 1 | 1 | NaN |
| 2 | 197001000001 | 1970 | 1 | 0 | NaN | 0 | NaN | 160 | Philippines | 5 | ... | NaN | NaN | NaN | NaN | PGIS | -9 | -9 | 1 | 1 | NaN |
| 3 | 197001000002 | 1970 | 1 | 0 | NaN | 0 | NaN | 78 | Greece | 8 | ... | NaN | NaN | NaN | NaN | PGIS | -9 | -9 | 1 | 1 | NaN |
| 4 | 197001000003 | 1970 | 1 | 0 | NaN | 0 | NaN | 101 | Japan | 4 | ... | NaN | NaN | NaN | NaN | PGIS | -9 | -9 | 1 | 1 | NaN |
5 rows × 135 columns
Since this project is only interested in terrorism data since 2000, we will only take a look at observations from the year 2000 and onwards, thus dropping any prior observations.
# Keep only attacks from this century (2000 onwards)
df = df[df['iyear'] >= 2000]
df.head()
| | eventid | iyear | imonth | iday | approxdate | extended | resolution | country | country_txt | region | ... | addnotes | scite1 | scite2 | scite3 | dbsource | INT_LOG | INT_IDEO | INT_MISC | INT_ANY | related |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 69832 | 200001010001 | 2000 | 1 | 1 | NaN | 0 | NaN | 139 | Namibia | 11 | ... | NaN | “Namibia: UNITA Rebels Reportedly Abduct 20 Vi... | “Namibia: 4 Injured in Shootout; UNITA 'Bandit... | “Abducted Namibians Reportedly Still Held by U... | CETIS | 1 | 1 | 0 | 1 | 200001010001, 200001010002 |
| 69833 | 200001010002 | 2000 | 1 | 1 | NaN | 1 | NaN | 139 | Namibia | 11 | ... | NaN | “Namibia: UNITA Rebels Reportedly Abduct 20 Vi... | “Namibia: 4 Injured in Shootout; UNITA 'Bandit... | “Abducted Namibians Reportedly Still Held by U... | CETIS | 1 | 1 | 0 | 1 | 200001010001, 200001010002 |
| 69834 | 200001010003 | 2000 | 1 | 1 | NaN | 0 | NaN | 92 | India | 6 | ... | NaN | “Lashkar 'Suicide' Squad Attacks Army Camp in ... | NaN | NaN | CETIS | 1 | 1 | 0 | 1 | NaN |
| 69835 | 200001010004 | 2000 | 1 | 1 | NaN | 0 | NaN | 1003 | Kosovo | 9 | ... | NaN | “Kosovo: Romany Home Attacked, 1 Person Injure... | NaN | NaN | CETIS | -9 | -9 | 1 | 1 | NaN |
| 69836 | 200001010005 | 2000 | 1 | 1 | NaN | 0 | NaN | 182 | Somalia | 11 | ... | NaN | “Somalia: 'Over 6' Killed in Mogadishu Attack,... | NaN | NaN | CETIS | -9 | -9 | 0 | -9 | NaN |
5 rows × 135 columns
Now that we have loaded our dataset and removed observations that we are not interested in, we will begin some preliminary exploratory analysis. Let us first look at what types of terrorist attacks we are dealing with. The dataset describes the following types: 'Facility/Infrastructure Attack', 'Hijacking', 'Hostage Taking (Barricade Incident)', 'Armed Assault', 'Hostage Taking (Kidnapping)', 'Assassination', 'Unarmed Assault', 'Bombing/Explosion', 'Unknown'. Each incident in the dataset may have secondary and tertiary attack type descriptions associated with it. For example, an incident may primarily be a hijacking but also involve an armed assault. Let us first begin by looking at the primary descriptions of each incident.
attack_types = ['Facility/Infrastructure Attack', 'Hijacking', 'Hostage Taking (Barricade Incident)', 'Armed Assault', 'Hostage Taking (Kidnapping)', 'Assassination', 'Unarmed Assault', 'Bombing/Explosion', 'Unknown']
We will first retrieve the number of incidents of each type of terrorism and examine its distribution.
# Count incidents whose primary attack type matches each category
counts = []
for attack_type in attack_types:
    count = len(df[df['attacktype1_txt'] == attack_type])
    counts.append(count)
We will use these counts of each terrorism type to plot a pie chart which shows the share of each type for all incidents since 2000.
# Pull the Bombing/Explosion slice (index 7) slightly out of the pie
explode = (0, 0, 0, 0, 0, 0, 0, 0.1, 0)
plt.figure(figsize=(20,10))
plt.title("Share of Terrorism Attacks since 2000")
plt.pie(counts, labels=attack_types, explode=explode, autopct='%1.1f%%', pctdistance=1.1, labeldistance=1.3)
plt.show()
From the pie chart above of different attack types, we see that Bombings/Explosions appear to be the most frequently used attack type, making up 50.3% of incidents since 2000. Armed assault follows second making up 23.7% of all incidents since 2000. The least frequently used attack type was hijacking at 0.3%.
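The counts and percentages behind a chart like this can also be read off directly with `value_counts`. The sketch below uses a small made-up frame (`toy`) rather than the GTD itself; only the column name `attacktype1_txt` is taken from the dataset.

```python
import pandas as pd

# Toy stand-in for the GTD frame; the rows are invented for illustration.
toy = pd.DataFrame({
    "attacktype1_txt": [
        "Bombing/Explosion", "Bombing/Explosion", "Armed Assault",
        "Hijacking", "Bombing/Explosion", "Armed Assault",
    ]
})

# Number of incidents per primary attack type, most frequent first.
counts = toy["attacktype1_txt"].value_counts()

# The same counts expressed as a percentage of all incidents.
shares = (toy["attacktype1_txt"].value_counts(normalize=True) * 100).round(1)

print(counts)
print(shares)
```

Applied to `df`, the same two calls should reproduce the percentages printed on the pie chart, which makes it easy to quote exact figures in the text.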
The dataset also records whether each incident was "successful". Success is admittedly a coarse description of a terrorism incident. START defines success not by whether the perpetrators' broader goals were fulfilled, but by whether the attack itself was carried out: essentially the difference between an attempt (failure) and a completed attack (success). For example, if a bomb is intercepted before detonation, or a kidnapper is unable to seize the intended victim, the incident is considered a failure. Let's see the distribution of successful vs. failed attacks since 2000.
success = len(df[(df['success'] == 1)])
failed = len(df[(df['success'] == 0)])
both = []
both.append(success)
both.append(failed)
labels = ["Success", "Failed"]
plt.figure(figsize=(20,10))
plt.pie(both, labels=labels, autopct='%1.1f%%')
plt.show()
The large share of successful attacks is disheartening. Let's combine this data with our previous figure of the distribution of attack types and see the share of successes versus failures across all types. We can do this by creating a stacked bar chart with one bar per attack type.
# Count successful and failed incidents for each primary attack type
successes = []
failures = []
for attack_type in attack_types:
    is_type = df['attacktype1_txt'] == attack_type
    successes.append(len(df[is_type & (df['success'] == 1)]))
    failures.append(len(df[is_type & (df['success'] == 0)]))
plt.figure(figsize=(20,10))
plt.title("Distribution of Successful Attacks across Attack Types since 2000")
plt.ylabel("Number of Attacks")
plt.xlabel("Attack Type")
plt.bar(attack_types, successes, label="Success")
# Stack failures on top of successes instead of drawing over them
plt.bar(attack_types, failures, bottom=successes, label="Failed")
plt.legend()
plt.show()
From the stacked bar chart, we see that successes overwhelmingly outnumber failures for nearly every attack type, which is consistent with the pie chart above. The only exception is assassinations, where there are actually more failures than successes - very interesting.
Let's examine which terrorist groups are killing the most people.
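To compare attack types on an equal footing, it can help to look at success rates rather than raw counts. The following is a minimal sketch on invented rows (the `toy` frame is hypothetical; `attacktype1_txt` and `success` are the GTD column names used above).

```python
import pandas as pd

# Invented rows for illustration; success is coded 1 (carried out) or 0 (failed).
toy = pd.DataFrame({
    "attacktype1_txt": ["Bombing/Explosion", "Bombing/Explosion", "Assassination",
                        "Assassination", "Armed Assault", "Bombing/Explosion"],
    "success": [1, 1, 0, 1, 1, 0],
})

# Rows are attack types; column 0 holds failures and column 1 holds successes.
table = pd.crosstab(toy["attacktype1_txt"], toy["success"])

# Because success is 0/1, its mean within each group is the success rate.
success_rate = toy.groupby("attacktype1_txt")["success"].mean()

print(table)
print(success_rate.sort_values())
```

Sorting `success_rate` makes any attack type with an unusually low rate stand out immediately, without having to judge bar heights by eye.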
# Exclude incidents with no identified perpetrator group
known_groups = df[df.gname != "Unknown"]
# Total fatalities per group (missing nkill values are skipped), top 15
group_fatality_counts = known_groups.groupby('gname')['nkill'].sum()
group_fatality_counts = group_fatality_counts.sort_values(ascending=False).head(15)
plt.figure(figsize=(20,10))
plt.title("Fatalities Across Terrorist Groups since 2000")
plt.xlabel("Terrorist Group")
plt.ylabel("Fatalities")
ax = group_fatality_counts.plot.bar()
plt.xticks(rotation=90)
plt.show()
From the bar graph, we see that the Taliban leads the world in total fatalities since 2000, with ISIS following closely behind.
Now that we have a general idea of the types of attacks and their successes, let's get an idea of exactly where such attacks are occurring. Let's begin by simply plotting a sample of 500 incidents on a world map to see their spread. We can color each marker by its attack type.
terrorism_map = folium.Map(location = [50.193980, -6.905573], zoom_start = 2)
# Sample 500 incidents that have coordinates. Sampling from df itself avoids
# drawing index labels that were removed by the year filter.
located = df.dropna(subset=["latitude", "longitude"])
sample = located.sample(n=500)
# Marker color per primary attack type code; any other code is gray
type_colors = {1: "red", 2: "pink", 3: "black", 4: "lightblue",
               5: "orange", 6: "white", 7: "purple", 8: "pink"}
for idx, row in sample.iterrows():
    color = type_colors.get(row["attacktype1"], "gray")
    # Popups are rendered as HTML, so use <br> for line breaks
    popup = "<i>" + str(row["iyear"]) + "<br>" + str(row["attacktype1_txt"]) + "<br>in " + str(row["country_txt"]) + "</i>"
    folium.Marker([row["latitude"], row["longitude"]], popup = popup, icon = folium.Icon(color = color, icon = "info-sign"), tooltip = "View Details").add_to(terrorism_map)
terrorism_map
From the plotted sample alone, we can hypothesize that most terrorism incidents occur in North Africa, the Middle East, and South Asia. We can dig deeper and compare the toll of terrorism across regions. The dataset defines the following regions:
regions = ['Australasia & Oceania',
'Central America & Caribbean',
'Central Asia',
'East Asia',
'Eastern Europe',
'Middle East & North Africa',
'North America',
'South America',
'South Asia',
'Southeast Asia',
'Sub-Saharan Africa',
'Western Europe']
We can create a bar graph to visualize fatalities from terrorism across regions.
# Total fatalities per region (missing nkill values are skipped), largest first
region_fatality_counts = df.groupby('region_txt')['nkill'].sum()
region_fatality_counts = region_fatality_counts.sort_values(ascending=False)
plt.figure(figsize=(20,10))
plt.title("Number of People Killed in Terrorist Attacks by Region")
plt.xlabel("Regions")
plt.ylabel("Fatalities")
region_fatality_counts.plot.bar()
plt.xticks(rotation=90)
plt.show()
As expected, the Middle East & North Africa region has the highest number of terrorism fatalities, with South Asia second. From this bar graph, we see that four core regions account for the greatest share of terrorism fatalities: Middle East & North Africa, South Asia, Sub-Saharan Africa, and Southeast Asia. We will concentrate on these regions specifically in further analysis.
For the four core regions that we identified in our exploration above, let's see how terrorism has changed since 2000. We will graph yearly fatalities from terrorism in those regions over time as an overlaid line chart.
middle_east = df[df['region'] == 10]
middle_east_freq = middle_east.iyear
south_asia = df[df['region'] == 6]
south_asia_freq = south_asia.iyear
africa = df[df['region'] == 11]
africa_freq = africa.iyear
southeast_asia = df[df['region'] == 5]
southeast_asia_freq = southeast_asia.iyear
years = list(range(2000, 2020))

def yearly_kills(region_df):
    # Sum fatalities per year for 2000-2019; missing nkill values are skipped
    totals = region_df.groupby('iyear')['nkill'].sum()
    return totals.reindex(years, fill_value=0.0).tolist()

middle_east_kills = yearly_kills(middle_east)
south_asia_kills = yearly_kills(south_asia)
africa_kills = yearly_kills(africa)
southeast_asia_kills = yearly_kills(southeast_asia)
plt.figure(figsize=(20,10))
plt.plot(years, middle_east_kills, color='gray')
plt.plot(years, south_asia_kills, color='orange')
plt.plot(years, africa_kills, color='green')
plt.plot(years, southeast_asia_kills, color='pink')
plt.legend(handles=[mpatches.Patch(color='gray', label='Middle East & North Africa'),
mpatches.Patch(color='orange', label='South Asia'),
mpatches.Patch(color='green', label='Sub-Saharan Africa'),
mpatches.Patch(color='pink', label='Southeast Asia')])
plt.title('Trends of Fatalities from Terrorist Attacks by Region Over Time')
plt.xlabel('Year')
plt.ylabel('Number of Fatalities')
plt.xticks([2000, 2005, 2010, 2015, 2020])
plt.show()
South Asia, in contrast to the other regions, appears to show a steady increase in fatalities. Elsewhere, most visibly in the Middle East & North Africa, fatalities rose to a peak just before 2015 and then declined sharply. After that decline, fatalities in the Middle East & North Africa dipped below those in South Asia. How does this relate to global data?
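One way to check a reading like this against the numbers is to build a year-by-region table of fatalities. This is a sketch on invented rows (the `toy` frame is hypothetical); `iyear`, `region_txt`, and `nkill` are the GTD column names.

```python
import pandas as pd

# Invented rows for illustration; None marks an incident with unknown fatalities.
toy = pd.DataFrame({
    "iyear": [2000, 2000, 2001, 2001, 2001],
    "region_txt": ["South Asia", "South Asia", "South Asia",
                   "Southeast Asia", "Southeast Asia"],
    "nkill": [3.0, None, 5.0, 1.0, 2.0],
})

# One row per year, one column per region, summed fatalities in each cell.
# Year/region combinations with no incidents are filled with 0.
yearly = toy.pivot_table(index="iyear", columns="region_txt",
                         values="nkill", aggfunc="sum", fill_value=0)

print(yearly)
```

On the full dataset, `yearly.idxmax()` would give the peak year for each region, and `yearly.plot()` would draw all regions on one set of axes.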
# Total fatalities per year worldwide (missing nkill values are skipped)
global_kills = df.groupby('iyear')['nkill'].sum().reindex(years, fill_value=0.0).tolist()
plt.figure(figsize=(20,10))
plt.plot(years, global_kills, color='black')
plt.legend(handles=[mpatches.Patch(color='black', label='Global Fatalities from Terrorist Attacks')])
plt.title('Trends of Fatalities from Terrorist Attacks Globally from 2000-2019')
plt.xlabel('Year')
plt.ylabel('Number of Fatalities')
plt.xticks([2000, 2005, 2010, 2015, 2020])
plt.show()
Global terrorism data appears to follow the same pattern we saw in the Middle East and Sub-Saharan Africa. Let's see how the different attack types have changed over time.
plt.figure(figsize=(20,10))
plt.title('Trends of Terrorist Attacks by Type Over Time')
plt.xlabel('Year')
plt.ylabel('Number of Attacks')
plt.xticks([2000, 2005, 2010, 2015, 2020])
colors = ["blue", "orange", "green", "red", "purple", "brown", "pink", "gray", "olive"]
for attack_type, color in zip(attack_types, colors):
    # One semi-transparent histogram of incident years per attack type
    freq = df.loc[df['attacktype1_txt'] == attack_type, 'iyear']
    plt.hist(freq, bins=20, alpha=0.5, label=attack_type, color=color)
plt.legend()
plt.show()
As expected, the various attack types mirror the overall regional trends: every type rises until just before 2015 and then declines sharply. However, there appears to be a small increase in attacks classified as "Unknown". We will drop all other attack types to confirm.
plt.figure(figsize=(20,10))
plt.title('Trend of Attacks Classified as "Unknown" Over Time')
plt.xlabel('Year')
plt.ylabel('Number of Attacks')
plt.xticks([2000, 2005, 2010, 2015, 2020])
type_data = df[df['attacktype1_txt'] == "Unknown"]
freq = type_data.iyear
plt.hist(freq, bins=20, alpha=0.5, label="Unknown", color="olive")
plt.legend()
plt.show()
Our observation appears to be true - attacks classified as Unknown have been rising. START defines attacks classified as "Unknown" to be attacks where the type cannot be determined from the available information.
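A rising count can partly reflect a rising total number of recorded incidents, so it may be worth also checking the yearly share of "Unknown" attacks. The sketch below uses invented rows (the `toy` frame is hypothetical); the column names are the GTD's.

```python
import pandas as pd

# Invented rows for illustration only.
toy = pd.DataFrame({
    "iyear": [2000, 2000, 2000, 2000, 2001, 2001],
    "attacktype1_txt": ["Unknown", "Armed Assault", "Hijacking", "Armed Assault",
                        "Unknown", "Armed Assault"],
})

# True where the primary attack type could not be determined.
is_unknown = toy["attacktype1_txt"].eq("Unknown")

# Per-year count of Unknown attacks, and their share of that year's incidents.
unknown_counts = is_unknown.groupby(toy["iyear"]).sum()
unknown_share = is_unknown.groupby(toy["iyear"]).mean()

print(unknown_counts)
print(unknown_share)
```

In this toy example the count stays at one per year while the share doubles, which illustrates why the two measures can tell different stories.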
We can construct a correlation matrix between certain columns of interest to see whether any of them are significantly correlated with one another. We can then use columns with possible correlations to conduct hypothesis testing and train a machine learning model.
corr_df = df[['country', 'region', 'attacktype1', 'weaptype1', 'targsubtype1', 'nkill', 'nkillter', 'propextent', 'nhostkid', 'nhostkidus', 'nhours', 'ransomamt', 'ransompaid', 'nreleased', 'nwound', 'propvalue', 'nperps']]
plt.figure(figsize=(15,15))
sns.heatmap(corr_df.corr(), annot=True)
plt.show()
From the correlation matrix, there are a few observations to note. The columns with the highest correlation are the weapon type of the incident and the attack type (0.74). This makes sense, since many of the weapon descriptions are along the lines of "Grenade", "Sticky Bomb", "Pipe Bomb", etc., which would naturally go with an attack type such as "Bombing/Explosion". The next highest correlation is between the number of victims killed and the number wounded (0.69). As with the previous pair, this makes sense: if a high number of victims are killed, there will likely be a high number wounded as well. The third highest correlation is between the number of hostages/kidnapped and the number released (0.49), which again makes logical sense (the more hostages there are, the more there are to possibly release).
While the matrix may not have revealed any surprising pairs of highly correlated columns, we can still observe some interesting relationships between columns that we thought would be correlated but are not. For example, ransom paid and number of hostages released have a correlation of only 0.047 - somewhat frightening.
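One caveat worth checking before reading too much into weak correlations: to our understanding of the GTD codebook, several numeric fields use -9 or -99 to mean "unknown" rather than an actual quantity. If so, those codes would distort a Pearson correlation. The sketch below, on invented values (the `toy` frame is hypothetical), shows one way to mask them first; which columns actually carry these codes is an assumption that should be verified against the codebook.

```python
import numpy as np
import pandas as pd

# Invented values; -99 stands in for an "unknown" code, not a real count.
toy = pd.DataFrame({
    "nhostkid":  [2, 5, -99, 10, 1],
    "nreleased": [2, 4, 3, -99, 1],
})

# Assumption: -9 and -99 are placeholder codes, so treat them as missing.
cleaned = toy.replace([-9, -99], np.nan)

# corr() ignores missing values pairwise, so masked rows drop out.
raw_corr = toy.corr().loc["nhostkid", "nreleased"]
clean_corr = cleaned.corr().loc["nhostkid", "nreleased"]

print(raw_corr, clean_corr)
```

In this toy case the placeholder codes turn a strong positive relationship into a negative one, so it may be worth re-running the heatmap on a cleaned copy of `corr_df` to see whether the ransom figures change.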
We observed in our exploration that while terrorism fatalities in most regions have been decreasing since peaking around 2013, fatalities in South Asia appeared to be steadily increasing. From our line chart of regions, we hypothesized that this steady increase is linear. To determine whether it is, we will fit a linear regression model to the yearly totals for South Asia and then analyze the regression statistics to support or refute our claim. We will use a significance level of 0.05 as our threshold.
x = np.reshape(years,(-1,1))
y = south_asia_kills
regr = LinearRegression()
regr.fit(x, y)
plt.figure(figsize=(15,10))
plt.title("South Asian Terrorism Fatalities since 2000")
plt.ylabel("Fatalities")
plt.xlabel("Year")
plt.scatter(x, y,color='g')
plt.plot(x, regr.predict(x),color='k')
plt.xticks([2000, 2005, 2010, 2015, 2020])
plt.show()
At first glance, our regression model seems to be very good! Let's look at some of its statistics to see whether the linear regression we assumed is representative of the trend.
X2 = sm.add_constant(x)
est = sm.OLS(y, X2)
est2 = est.fit()
print(est2.summary())
OLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 0.905
Model: OLS Adj. R-squared: 0.900
Method: Least Squares F-statistic: 172.1
Date: Mon, 17 May 2021 Prob (F-statistic): 1.19e-10
Time: 14:55:05 Log-Likelihood: -165.60
No. Observations: 20 AIC: 335.2
Df Residuals: 18 BIC: 337.2
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const -1.024e+06 7.84e+04 -13.059 0.000 -1.19e+06 -8.59e+05
x1 511.7195 39.004 13.120 0.000 429.774 593.665
==============================================================================
Omnibus: 0.488 Durbin-Watson: 1.355
Prob(Omnibus): 0.784 Jarque-Bera (JB): 0.577
Skew: 0.139 Prob(JB): 0.749
Kurtosis: 2.215 Cond. No. 7.00e+05
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 7e+05. This might indicate that there are
strong multicollinearity or other numerical problems.
From the regression analysis above, it appears that our regression model fits the data very well. Our coefficient of determination (R^2) is a healthy 0.905. Furthermore, the p-value for the slope is reported as 0.000 (the F-test gives 1.19e-10), which is far below our selected alpha level of 0.05. Therefore, we have sufficient evidence to reject the null hypothesis that the slope is zero, that is, that fatalities from terrorism in South Asia show no linear trend over time.
It appears that our model fits the data well. This, however, is on all of the available data. Let's see how well our model performs when we train on 80% of the data and test on 20%. We will plot the residuals to evaluate its performance.
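As a check on how these figures relate, the test behind the reported p-value can be written out using the values in the summary table. This is a sketch of the standard simple-regression setup; the symbols $\beta_0$, $\beta_1$, and $\varepsilon_t$ are our own notation rather than anything printed by statsmodels.

$$
\text{fatalities}_t = \beta_0 + \beta_1\,\text{year}_t + \varepsilon_t,
\qquad H_0: \beta_1 = 0 \quad \text{vs.} \quad H_1: \beta_1 \neq 0
$$

$$
t = \frac{\hat{\beta}_1}{\mathrm{SE}(\hat{\beta}_1)} = \frac{511.7195}{39.004} \approx 13.12,
\qquad F = t^2 \approx 172.1
$$

With 18 residual degrees of freedom, a t-statistic this large corresponds to a p-value far below 0.05, and the estimated slope implies roughly 512 additional fatalities per year. Note that this tests for a nonzero linear trend; it does not by itself rule out curvature in the data.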
# Use a fresh, unfitted estimator so the model is trained on the training split only
visualizer = ResidualsPlot(LinearRegression(), size=(1000, 600))
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
visualizer.fit(X_train, y_train)
visualizer.score(X_test, y_test)
visualizer.show()
The residuals appear scattered on the residual plot which indicates a good fit for a linear regression and our model once again has healthy R^2 values for both training and testing. We have concluded our hypothesis testing and machine learning through our construction of an accurate model which can predict South Asian terrorist activity.
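With only 20 yearly observations, an 80/20 split leaves just four test points, so the test score can swing depending on which years are held out. One alternative is leave-one-out validation, sketched below with NumPy's `polyfit` in place of scikit-learn. The yearly totals here are synthetic (generated from an assumed linear trend plus noise), not GTD values.

```python
import numpy as np

# Synthetic yearly totals that grow roughly linearly; not GTD data.
years = np.arange(2000, 2020)
x = years - 2000                      # center the predictor for numerical stability
rng = np.random.default_rng(0)
kills = 500.0 * x + 1000.0 + rng.normal(0, 300, size=x.size)

abs_errors = []
for i in range(x.size):
    keep = np.arange(x.size) != i     # hold out year i, fit on the other 19
    slope, intercept = np.polyfit(x[keep], kills[keep], deg=1)
    prediction = slope * x[i] + intercept
    abs_errors.append(abs(prediction - kills[i]))

# Average error when predicting a year the model has not seen.
mean_abs_error = float(np.mean(abs_errors))
print(mean_abs_error)
```

Replacing `kills` with `south_asia_kills` would give an out-of-sample error in units of fatalities per year, which is arguably easier to interpret than an R^2 computed on four points.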
We chose each graph based on the data's relevance in understanding global trends of terrorism and factors we predicted would be important with our group's background in Criminology & Criminal Justice. We chose mostly bar graphs and pie charts to represent our data to easily compare factors such as attack types and perpetrator characteristics.
Our methodology in this project was mainly playing with the data, analyzing trends, and publishing the graphs we thought conveyed the most pertinent information. This approach has the benefit of potentially uncovering relationships in the data that previous researchers had not considered. However, this approach is much more time consuming because it relies on creating more graphs than you will end up using and spending time thinking of original relationships to capture. For example, we have never seen a correlation matrix between variables in our previous research and were interested to see if any variables in a terrorist attack were correlated. We ended up being surprised by relationships that didn't emerge, such as a lack of correlation between ransom amount and ransom paid.
In addition to ours, another approach we would encourage future researchers to pursue is to focus primarily on the body of research by renowned scholars such as Gary LaFree, Founding Director of the National Consortium for the Study of Terrorism and Responses to Terrorism (START), and Martha Crenshaw, professor of Political Science at Stanford University. Once you have identified an article or study that identifies a possible cause or relationship seen in terrorist attacks, try to recreate that finding in the data and attempt to tweak the hypothesis by going one step further in the analysis or introducing a new variable into the mix. This saves time in formulating a new relationship hypothesis, but may not reveal any new information and may be restricting as a researcher.
Our first graph confirms that the vast majority of terrorist attacks are either bombings or armed assaults, and that they often have high success rates. Since a GTD analysis was done in 2014 by LaFree, Dugan, and Miller in "Putting Terrorism in Context", bombings and armed assaults have both increased their percentage of terrorist attacks by approximately 1-3%.
These attacks are also highly successful because they are generally carried out by "lone wolf" actors, which have become increasingly common in the United States. As we can see more clearly from the bar graph breaking down success rate by attack type, not only are bombings frequent, but they are also largely successful. Individuals are increasingly able to create homemade explosives as well as acquire arms with little to no pushback. Assassinations are one large contributor to failed terrorist attacks, which is not surprising considering the amount of planning, skill, and luck required to break through the layers of security.
Our next set of graphs explores terrorist groups and regions, with radical Islamist terrorist groups taking the top spots for inflicting the most fatalities. Consequently, the most fatalities due to terrorism are seen primarily in the Middle East and Sub-Saharan Africa. However, our analysis revealed that South Asia also contributes a large share of fatalities caused by acts of terrorism, even though it is not nearly as widely discussed as Middle Eastern terrorism. As LaFree et al. discuss in their analysis of the GTD, the peak in terrorism we see in the mid-2000s is largely due to the rise of al Qa'ida and its affiliates. While global terrorism rates rise overall and follow the spikes driven by terrorism in the Middle East and Sub-Saharan Africa, South Asian and Southeast Asian terrorism actually appear to have progressed quite linearly upwards. We further analyze this linear relationship seen in South Asia later.
The next two graphs did not reveal as much information backed up by current terrorism research, other than the fact that they follow the general pattern of peaks and troughs we see in the other graphs. However, one type of terrorism did not decline with the others, and this was the group coded as Unknown. We encourage those interested in working with the GTD to focus on what makes an attack Unknown and possibly propose a new encoding to be added to the GTD to house part of that data.
Our correlation matrix revealed some new information that we have not previously encountered in our study of Criminology. At first, we were disappointed that so few variables revealed a significant correlation. However, we soon recognized the value in variables that we expected to be correlated but that actually weren't. We saw the effect of the catchphrase "We don't negotiate with terrorists" in this data in the fact that ransom amount was not correlated with ransom paid, and further that ransom paid was not correlated with the number of hostages released. We would propose future research on this topic to highlight the intricacies of hostage negotiations, especially how they function in a hostile terrorist environment. While we would expect that as ransom paid increases, the number of hostages released would also increase, that is sadly not the reality, with a correlation of only 0.047.
Going back to our discussion of a linear trend in South Asian terrorism-related fatalities, we noticed that the graph did not follow the peaks seen in the Middle East; instead, this data seemed to be fairly linear. We fit a linear regression line to this data and found a significant linear trend, with a high R^2 value of 0.905 and an extremely low p-value (less than 0.001). This suggests a steadily rising baseline of terrorism fatalities in South Asia, with global totals propelled higher by conflict and tensions in areas like the Middle East.
With this data analysis, we hope to have highlighted and updated known terrorism trends as well as provided food for thought of previously overlooked trends.
The following resources were used or assisted in the creation of this project and may be of interest to the reader.
[1] START at UMD - https://www.start.umd.edu/terrorism360
[2] GTD dataset - https://www.start.umd.edu/gtd/
[3] Putting Terrorism in Context: Lessons from the Global Terrorism Database by LaFree et al.