Project Proposal
Group Members: Ashley Tung, Rhea Nagori, Clara Kim, Tori Bengry, Isabella Demotto
Overview
The World Bank Group is one of the world's largest development institutions. It comprises five international organizations that aim to help developing countries by providing financial and technical assistance, and it publishes annual data on countries' political, macroeconomic, social, and environmental conditions. The dataset used in this project draws from the World Bank Group's database and is available on Kaggle. It contains 48 numerical features (columns) covering 268 countries and regions from 1960 to 2022. From these features, our research focuses on how a country's GDP and development status are influenced by its economic, environmental, social, and political characteristics. Because we are interested in the relationships between a country's economic activity and these other factors, rather than in trends over time, we filter the dataset to a single year (2018) to remove potential time-related biases and effects. This filtered data lets us address several questions about what drives countries' economies.
Data
Topic: World Bank World Development Indicators
https://www.kaggle.com/datasets/nicolasgonzalezmunoz/world-bank-world-development-indicators/data
Source: World Bank database
import pandas as pd

# Read the raw World Bank indicators data
df = pd.read_csv("world_bank_development_indicators.csv")
# Parse the date column so the year can be extracted
df['date'] = pd.to_datetime(df['date'])
# Columns used in the analysis (names, including their spellings, match the dataset)
columns = ['country', 'GDP_current_US', 'population', 'human_capital_index',
           'CO2_emisions', 'access_to_electricity%', 'agricultural_land%',
           'life_expectancy_at_birth', 'government_expenditure_on_education%',
           'inflation_annual%']
# Keep only the 2018 rows and the chosen columns
df = df.loc[df['date'].dt.year == 2018, columns]
df.head(5)
| | country | GDP_current_US | population | human_capital_index | CO2_emisions | access_to_electricity% | agricultural_land% | life_expectancy_at_birth | government_expenditure_on_education% | inflation_annual% |
|---|---|---|---|---|---|---|---|---|---|---|
| 58 | Afghanistan | 1.805322e+10 | 36686784.0 | 0.393489 | 10972.3800 | 93.430878 | 58.276988 | 63.081000 | NaN | 0.626149 |
| 122 | Africa Eastern and Southern | 1.012521e+12 | 649757148.0 | NaN | 598720.9575 | 43.028332 | 46.361118 | 63.365863 | 4.739750 | 4.720811 |
| 186 | Africa Western and Central | 7.681582e+11 | 442646825.0 | NaN | 210618.8900 | 51.212863 | 40.003345 | 57.189139 | 3.071543 | 1.784050 |
| 250 | Albania | 1.515642e+10 | 2866376.0 | 0.628666 | 5316.1000 | 100.000000 | 42.849672 | 79.184000 | 3.152945 | 2.028060 |
| 314 | Algeria | 1.749107e+11 | 41927007.0 | 0.531994 | 164534.1000 | 99.637741 | 17.356568 | 76.066000 | 6.324539 | 4.269990 |
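The preview shows two things worth checking before any modeling: the 2018 subset mixes individual countries with regional aggregates such as "Africa Eastern and Southern", and several indicators have missing values (NaN). A minimal sketch of a per-column missingness check follows. The helper name `missing_summary` is hypothetical, and the small frame is copied from four columns of the five preview rows above rather than loaded from the CSV, so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

def missing_summary(frame: pd.DataFrame) -> pd.Series:
    """Return the share of missing values in each column, highest first."""
    return frame.isna().mean().sort_values(ascending=False)

# Four columns of the five preview rows above, copied from the table
preview = pd.DataFrame({
    'country': ['Afghanistan', 'Africa Eastern and Southern',
                'Africa Western and Central', 'Albania', 'Algeria'],
    'GDP_current_US': [1.805322e10, 1.012521e12, 7.681582e11, 1.515642e10, 1.749107e11],
    'human_capital_index': [0.393489, np.nan, np.nan, 0.628666, 0.531994],
    'government_expenditure_on_education%': [np.nan, 4.739750, 3.071543, 3.152945, 6.324539],
})

print(missing_summary(preview))

# Most models need complete rows; one option is to drop rows missing any chosen column
complete = preview.dropna()
print(complete['country'].tolist())
```

In these five rows, dropping incomplete rows happens to remove both regional aggregates because they lack a Human Capital Index value. That is a coincidence of this preview, not a reliable way to separate regions from countries, so aggregates likely need to be filtered out explicitly.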
Question(s) of Interest
What factors influence a country's GDP, and how do economic, social, environmental, and political factors contribute to economic development?
Economic growth is a key indicator of a country's development, and gross domestic product (GDP) is one of the primary measures of a country's economic performance. However, GDP can be influenced by many factors beyond purely economic ones. In this project, we explore the relationship between GDP and a range of social, environmental, and political factors. By analyzing these relationships, we aim to answer the following questions:
How do inflation rates and government debt affect GDP?
How does education level (measured by the Human Capital Index) affect GDP?
Does life expectancy reflect economic prosperity?
What is the relationship between CO2 emissions and GDP?
What types of energy sources does each country use, and how does that influence economic growth?
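Several of these questions ask how one indicator moves with GDP. A simple descriptive first step, sketched here under our own assumptions, is Spearman's rank correlation, which is less sensitive than Pearson's to GDP's heavy right skew. The helper `spearman_with_gdp` is hypothetical; it computes Spearman's rho as the Pearson correlation of ranks so that only pandas is needed. The input is three columns copied from the five preview rows, which are far too few (and include two regional aggregates) to answer any of the questions; the point is only the mechanics.

```python
import pandas as pd

def spearman_with_gdp(frame: pd.DataFrame, features, target='GDP_current_US') -> pd.Series:
    """Spearman rank correlation of each feature with the target."""
    result = {}
    for feature in features:
        pair = frame[[feature, target]].dropna()  # keep complete pairs only
        ranks = pair.rank()                        # average ranks for ties
        # Pearson correlation of the ranks is Spearman's rho
        result[feature] = ranks[feature].corr(ranks[target])
    return pd.Series(result)

# Three columns of the five preview rows, copied from the table
preview = pd.DataFrame({
    'GDP_current_US': [1.805322e10, 1.012521e12, 7.681582e11, 1.515642e10, 1.749107e11],
    'population': [36686784.0, 649757148.0, 442646825.0, 2866376.0, 41927007.0],
    'life_expectancy_at_birth': [63.081, 63.365863, 57.189139, 79.184, 76.066],
})

rho = spearman_with_gdp(preview, ['population', 'life_expectancy_at_birth'])
print(rho)
```

On the full 2018 country-level data, the same function could be called with every indicator in the variable list below.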
Variables
To properly answer our research questions, we will use the following variables in our analysis:
GDP_current_US: Gross domestic product (GDP) in current US dollars. GDP measures the size of a country's economy as the total monetary value of the final goods and services produced within the country during the year.
population: Total population of a country/region.
human_capital_index: The Human Capital Index (HCI) measures the amount of human capital that a child born today can expect to acquire by age 18. It reflects the country's health and education systems and is reported on a scale from 0 to 1.
CO2_emisions: Carbon dioxide (CO2) emissions, measured in kilotons (kt), a unit of mass.
renewvable_energy_consumption%: Renewable energy consumption as a percentage of a country's total final energy consumption.
access_to_electricity%: The percentage of a country's population with access to electricity.
agricultural_land%: Agricultural land as a percentage of the country's or region's land area.
life_expectancy_at_birth: Life expectancy at birth in years.
government_expenditure_on_education%: Government expenditure on education as a percentage of GDP.
inflation_annual%: Annual inflation, measured as the percentage change in consumer prices.
Methods
To answer our research questions, we will use the following methods:
Linear regression to model GDP (GDP_current_US) as a function of variables such as human_capital_index, population, and access_to_electricity%, quantifying how strongly each factor is associated with economic performance.
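Logistic regression to classify countries as developing or developed from these indicators, revealing which combination of features distinguishes economic development. Because the selected columns do not include a development-status label, this first requires defining one.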
Decision tree regression to capture non-linear relationships between GDP and the other variables and to identify which variables most influence GDP.
K-nearest neighbors (KNN) regression to predict GDP from the similarity of countries across development indicators.
K-means clustering to group countries with similar development profiles and uncover patterns in economic development.
Feature selection to identify the factors that most strongly influence GDP.
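The three regression methods above (linear, decision tree, and KNN) can be compared on the same footing with k-fold cross-validation. The sketch below assumes scikit-learn, a log10 transform of GDP (our assumption, since GDP spans several orders of magnitude), and an illustrative feature list. `compare_regressors` and the hyperparameters (`max_depth=3`, `n_neighbors=5`) are hypothetical choices, not tuned values. Because the CSV is not bundled with this proposal, the demo runs on synthetic random numbers, not World Bank data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Illustrative feature list (an assumption, not the final model specification)
FEATURES = ['human_capital_index', 'access_to_electricity%',
            'life_expectancy_at_birth', 'inflation_annual%']

def compare_regressors(frame: pd.DataFrame, features=FEATURES, folds: int = 5) -> dict:
    """Mean cross-validated R^2 of three regressors predicting log10(GDP)."""
    data = frame.dropna(subset=features + ['GDP_current_US'])
    X = data[features]
    # Assumption: model log10(GDP) because raw GDP is heavily right-skewed
    y = np.log10(data['GDP_current_US'])
    models = {
        'linear': make_pipeline(StandardScaler(), LinearRegression()),
        'tree': DecisionTreeRegressor(max_depth=3, random_state=0),
        # KNN is distance-based, so features are standardized first
        'knn': make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5)),
    }
    cv = KFold(n_splits=folds, shuffle=True, random_state=0)
    return {name: cross_val_score(model, X, y, cv=cv, scoring='r2').mean()
            for name, model in models.items()}

# Synthetic stand-in data (random numbers, NOT World Bank values)
rng = np.random.default_rng(0)
n = 120
toy = pd.DataFrame({
    'human_capital_index': rng.uniform(0.3, 0.9, n),
    'access_to_electricity%': rng.uniform(10, 100, n),
    'life_expectancy_at_birth': rng.uniform(50, 85, n),
    'inflation_annual%': rng.uniform(0, 15, n),
})
# Hypothetical relationship used only to generate a target for the demo
toy['GDP_current_US'] = 10 ** (9 + 3 * toy['human_capital_index'] + rng.normal(0, 0.2, n))

scores = compare_regressors(toy)
print(scores)
```

Scaling sits inside each pipeline so that it is refit on every training fold rather than on the full data. A decision tree splits on thresholds and does not need scaling.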
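For the clustering step, K-means assigns points by Euclidean distance, so indicators on very different scales (population in the hundreds of millions versus an index between 0 and 1) should be standardized first. A minimal scikit-learn sketch, assuming two illustrative features and a hypothetical helper `cluster_countries`; the toy data are two made-up groups of synthetic rows, not World Bank values.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def cluster_countries(frame: pd.DataFrame, features, k: int = 3, seed: int = 0) -> pd.DataFrame:
    """Label each complete row with one of k K-means clusters on standardized features."""
    data = frame.dropna(subset=features).copy()
    # Standardize so columns with large ranges do not dominate the distance
    X = StandardScaler().fit_transform(data[features])
    model = KMeans(n_clusters=k, n_init=10, random_state=seed)
    data['cluster'] = model.fit_predict(X)
    return data

# Synthetic stand-in data: two made-up groups of 20 rows each (NOT World Bank values)
rng = np.random.default_rng(1)
low = pd.DataFrame({'life_expectancy_at_birth': rng.normal(60, 2, 20),
                    'access_to_electricity%': rng.normal(40, 5, 20)})
high = pd.DataFrame({'life_expectancy_at_birth': rng.normal(80, 2, 20),
                     'access_to_electricity%': rng.normal(99, 0.5, 20)})
toy = pd.concat([low, high], ignore_index=True)

clustered = cluster_countries(toy, ['life_expectancy_at_birth', 'access_to_electricity%'], k=2)
# Mean profile of each cluster, the kind of summary used to interpret the groups
print(clustered.groupby('cluster').mean())
```

With real data, the number of clusters `k` is a modeling choice that would need justification, for example by comparing inertia (`KMeans.inertia_`) or silhouette scores across several values.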
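Feature selection can be done in several ways. One simple option, shown here as a sketch, is a univariate filter with scikit-learn's `SelectKBest` and `f_regression`, which scores each indicator by its individual linear association with log GDP. It ignores correlations among predictors, so it complements rather than replaces the model-based methods listed above. `top_features`, `k=2`, the log10 transform, and the synthetic data (random numbers with a made-up dependence on two columns) are all assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_regression

def top_features(frame: pd.DataFrame, features, target='GDP_current_US', k: int = 2):
    """Score features by univariate F-statistic against log10(target) and keep the top k."""
    data = frame.dropna(subset=features + [target])
    X = data[features]
    # Assumption: log-transform GDP to reduce its skew, as in the regression sketch
    y = np.log10(data[target])
    selector = SelectKBest(score_func=f_regression, k=k).fit(X, y)
    scores = pd.Series(selector.scores_, index=features).sort_values(ascending=False)
    return list(X.columns[selector.get_support()]), scores

# Synthetic stand-in data (random numbers, NOT World Bank values)
rng = np.random.default_rng(2)
n = 100
toy = pd.DataFrame({
    'human_capital_index': rng.uniform(0.3, 0.9, n),
    'life_expectancy_at_birth': rng.uniform(50, 85, n),
    'inflation_annual%': rng.uniform(0, 15, n),
})
# Hypothetical target that depends only on the first two columns
toy['GDP_current_US'] = 10 ** (8 + 3 * toy['human_capital_index']
                               + 0.05 * toy['life_expectancy_at_birth']
                               + rng.normal(0, 0.1, n))

selected, scores = top_features(
    toy, ['human_capital_index', 'life_expectancy_at_birth', 'inflation_annual%'])
print(selected)
print(scores)
```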