Project Proposal

In [4]:

import pandas as pd 
import numpy as np

Moriah David, Abhinay Reddy, Krishna Maganti, Theo Evans

Description of the dataset:

The World Educational Dataset offers a global view of education and employment indicators, describing educational characteristics across countries and areas. We obtained it from Kaggle, where it was contributed by Nidula Elgiriyewithana. The underlying statistics come from the UNESCO Institute for Statistics, the UNICEF Global Database, and the UN Statistics Division Database.

Descriptions of the question(s):

What is the relationship between birth rate and the difference in secondary school completion between men and women? Can we predict a country’s birth rate from latitude, longitude, female literacy, female secondary school completion, unemployment, and gross university enrollment?

Descriptions of the variable(s):

Latitude: Latitude coordinates of the geographical location.
Longitude: Longitude coordinates of the geographical location.
Completion_Rate_Upper_Secondary_Male: Completion rate for upper secondary education among males.
Completion_Rate_Upper_Secondary_Female: Completion rate for upper secondary education among females.
Youth_15_24_Literacy_Rate_Male: Literacy rate among male youths aged 15-24.
Youth_15_24_Literacy_Rate_Female: Literacy rate among female youths aged 15-24.
Birth_Rate: Birth rate in the respective countries/areas.
Gross_Tertiary_Education_Enrollment: Gross enrollment in tertiary education.
Unemployment_Rate: Unemployment rate in the respective countries/areas.

Methods we will use for each question:

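Linear regression to model and visualize the relationship between male and female secondary school completion and birth rate: y = Birth_Rate, x1 = Completion_Rate_Upper_Secondary_Male, x2 = Completion_Rate_Upper_Secondary_Female.

Decision tree to predict birth rate from the 6 predictors named in the second question (latitude, longitude, female literacy, female secondary school completion, unemployment, and gross tertiary enrollment).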
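The two methods could be set up roughly as follows. This is a minimal sketch that assumes scikit-learn is available; the proposal does not name a modeling library, so `LinearRegression` and `DecisionTreeRegressor` are our choice for illustration. The arrays are randomly generated placeholders rather than values from the dataset, and `max_depth=3` is an arbitrary setting, not a tuned one.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Column names taken from the variable descriptions in this proposal.
regression_cols = [
    "Completion_Rate_Upper_Secondary_Male",    # x1
    "Completion_Rate_Upper_Secondary_Female",  # x2
]
tree_cols = [
    "Latitude ",  # trailing space matches the column name used when loading the CSV
    "Longitude",
    "Youth_15_24_Literacy_Rate_Female",
    "Completion_Rate_Upper_Secondary_Female",
    "Unemployment_Rate",
    "Gross_Tertiary_Education_Enrollment",
]
target_col = "Birth_Rate"

# Placeholder data (NOT from the dataset) so the sketch runs on its own.
# In the notebook these would be data[regression_cols], data[tree_cols],
# and data[target_col] after the CSV has been loaded.
rng = np.random.default_rng(0)
X_reg = rng.uniform(0, 100, size=(50, len(regression_cols)))
y_reg = 40.0 - 0.1 * X_reg[:, 0] - 0.2 * X_reg[:, 1]
X_tree = rng.uniform(0, 100, size=(50, len(tree_cols)))
y_tree = 40.0 - 0.3 * X_tree[:, 2]

# Method 1: linear regression, y = b0 + b1*x1 + b2*x2.
linear = LinearRegression().fit(X_reg, y_reg)
print("intercept:", linear.intercept_, "coefficients:", linear.coef_)

# Method 2: decision tree regressor on the six predictors.
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_tree, y_tree)
tree_pred = tree.predict(X_tree)
print("tree depth:", tree.get_depth())
```

Because the placeholder target is an exact linear function of the two inputs, the fitted coefficients simply reproduce the numbers used to generate it. With the real data, the signs and sizes of `linear.coef_` would be the quantities of interest.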

In [23]:

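# Load the dataset, reading the file with latin-1 encoding.
df = pd.read_csv("Global_Education.csv", encoding='latin-1')
# Keep only the columns used in this proposal.
# Note: the 'Latitude ' column name has a trailing space in the CSV header.
data = df[['Latitude ', 'Longitude', 'Completion_Rate_Upper_Secondary_Male', 'Completion_Rate_Upper_Secondary_Female', 'Youth_15_24_Literacy_Rate_Male', 'Youth_15_24_Literacy_Rate_Female', 'Birth_Rate', 'Gross_Tertiary_Education_Enrollment', 'Unemployment_Rate']]
data.head()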

Out[23]:

	Latitude	Longitude	Completion_Rate_Upper_Secondary_Male	Completion_Rate_Upper_Secondary_Female	Youth_15_24_Literacy_Rate_Male	Youth_15_24_Literacy_Rate_Female	Birth_Rate	Gross_Tertiary_Education_Enrollment	Unemployment_Rate
0	33.939110	67.709953	32	14	74	56	32.49	9.7	11.12
1	41.153332	20.168331	76	80	99	100	11.78	55.0	12.33
2	28.033886	1.659626	22	37	98	97	24.28	51.4	11.70
3	42.506285	1.521801	0	0	0	0	7.20	0.0	0.00
4	11.202692	17.873887	24	15	0	0	40.73	9.3	6.89
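
Rows 3 and 4 of the preview contain 0 in several rate columns. These may be unreported values rather than true zeros. The source does not confirm this, so treating them as missing is an assumption. The sketch below rebuilds three columns of the five preview rows, marks zero completion rates as missing, and computes the male minus female completion gap that the first question asks about. `Completion_Gap` is a hypothetical column name introduced only for illustration.

```python
import numpy as np
import pandas as pd

# The five rows printed by data.head(), restricted to the columns used here.
preview = pd.DataFrame({
    "Completion_Rate_Upper_Secondary_Male": [32, 76, 22, 0, 24],
    "Completion_Rate_Upper_Secondary_Female": [14, 80, 37, 0, 15],
    "Birth_Rate": [32.49, 11.78, 24.28, 7.20, 40.73],
})

rate_cols = [
    "Completion_Rate_Upper_Secondary_Male",
    "Completion_Rate_Upper_Secondary_Female",
]

# Assumption: a 0 in a completion-rate column means "not reported".
cleaned = preview.copy()
cleaned[rate_cols] = cleaned[rate_cols].replace(0, np.nan)

# Hypothetical helper column: male minus female completion rate.
cleaned["Completion_Gap"] = (
    cleaned["Completion_Rate_Upper_Secondary_Male"]
    - cleaned["Completion_Rate_Upper_Secondary_Female"]
)

# Keep only rows where both completion rates are present.
usable = cleaned.dropna(subset=rate_cols)
print(usable[["Completion_Gap", "Birth_Rate"]])
```

On the full dataset the same steps would apply to `data` directly. Whether zeros in the literacy, enrollment, and unemployment columns should be handled the same way is a separate decision to check against the dataset's documentation.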