In [4]:
import pandas as pd 
import numpy as np

Moriah David, Abhinay Reddy, Krishna Maganti, Theo Evans

Description of the dataset:

The World Educational Dataset gives a global view of education and employment indicators and describes educational characteristics across countries and areas. The dataset was obtained from Kaggle, where it was contributed by Nidula Elgiriyewithana. Its statistics were compiled from the UNESCO Institute for Statistics, the UNICEF Global Database, and the UN Statistics Division Database.

Descriptions of the question(s):

What is the relationship between birth rate and the difference in upper secondary school completion between men and women?
Can a country’s birth rate be inferred from its latitude, longitude, female youth literacy rate, female upper secondary completion rate, unemployment rate, and gross tertiary (university) enrollment?
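The "difference in secondary school completion between men and women" in the first question can be made concrete as a derived column. The sketch below is only illustrative: it rebuilds the first five rows shown in the data.head() output by hand, and the column name Completion_Gap is a hypothetical label that does not exist in the dataset.

```python
import pandas as pd

# First five rows of three columns, copied from the data.head() output.
sample = pd.DataFrame({
    "Completion_Rate_Upper_Secondary_Male": [32, 76, 22, 0, 24],
    "Completion_Rate_Upper_Secondary_Female": [14, 80, 37, 0, 15],
    "Birth_Rate": [32.49, 11.78, 24.28, 7.20, 40.73],
})

# Hypothetical derived column: male minus female completion rate.
# Positive values mean more men than women complete upper secondary school.
sample["Completion_Gap"] = (
    sample["Completion_Rate_Upper_Secondary_Male"]
    - sample["Completion_Rate_Upper_Secondary_Female"]
)

# Pearson correlation between the gap and birth rate on these five rows.
gap_birth_corr = sample["Completion_Gap"].corr(sample["Birth_Rate"])
print(sample[["Completion_Gap", "Birth_Rate"]])
print(gap_birth_corr)
```

Five rows are far too few to draw conclusions from; the same two statements would apply unchanged to the full table once it is loaded.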

Descriptions of the variable(s):

Latitude: Latitude coordinates of the geographical location.
Longitude: Longitude coordinates of the geographical location.
Completion_Rate_Upper_Secondary_Male: Completion rate for upper secondary education among males.
Completion_Rate_Upper_Secondary_Female: Completion rate for upper secondary education among females.
Youth_15_24_Literacy_Rate_Male: Literacy rate among male youths aged 15-24.
Youth_15_24_Literacy_Rate_Female: Literacy rate among female youths aged 15-24.
Birth_Rate: Birth rate in the respective countries/areas.
Gross_Tertiary_Education_Enrollment: Gross enrollment in tertiary education.
Unemployment_Rate: Unemployment rate in the respective countries/areas.

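Methods we will use for each question:

Linear regression to model and visualize the relationship between male and female upper secondary school completion and birth rate, with y = Birth_Rate, x1 = Completion_Rate_Upper_Secondary_Male, and x2 = Completion_Rate_Upper_Secondary_Female.
Decision tree to predict birth rate from six variables: Latitude, Longitude, Youth_15_24_Literacy_Rate_Female, Completion_Rate_Upper_Secondary_Female, Unemployment_Rate, and Gross_Tertiary_Education_Enrollment.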
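As a rough illustration of the two methods, the sketch below fits both models with scikit-learn. Several things here are assumptions rather than part of this notebook: the choice of scikit-learn, the max_depth and random_state settings, and the use of only the five rows shown in the data.head() output in place of the full CSV. It shows the shape of the workflow, not a meaningful fit.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# First five rows of the relevant columns, copied from the data.head() output.
sample = pd.DataFrame({
    "Latitude ": [33.939110, 41.153332, 28.033886, 42.506285, 11.202692],
    "Longitude": [67.709953, 20.168331, 1.659626, 1.521801, 17.873887],
    "Completion_Rate_Upper_Secondary_Male": [32, 76, 22, 0, 24],
    "Completion_Rate_Upper_Secondary_Female": [14, 80, 37, 0, 15],
    "Youth_15_24_Literacy_Rate_Female": [56, 100, 97, 0, 0],
    "Birth_Rate": [32.49, 11.78, 24.28, 7.20, 40.73],
    "Gross_Tertiary_Education_Enrollment": [9.7, 55.0, 51.4, 0.0, 9.3],
    "Unemployment_Rate": [11.12, 12.33, 11.70, 0.00, 6.89],
})
y = sample["Birth_Rate"]

# Question 1: linear regression with x1 = male and x2 = female completion.
reg_cols = [
    "Completion_Rate_Upper_Secondary_Male",
    "Completion_Rate_Upper_Secondary_Female",
]
linreg = LinearRegression().fit(sample[reg_cols], y)
print(dict(zip(reg_cols, linreg.coef_)), linreg.intercept_)

# Question 2: decision tree on the six predictors named in the question.
# max_depth=3 and random_state=0 are arbitrary illustrative settings.
tree_cols = [
    "Latitude ",  # trailing space kept to match the column name used below
    "Longitude",
    "Youth_15_24_Literacy_Rate_Female",
    "Completion_Rate_Upper_Secondary_Female",
    "Unemployment_Rate",
    "Gross_Tertiary_Education_Enrollment",
]
tree = DecisionTreeRegressor(max_depth=3, random_state=0)
tree.fit(sample[tree_cols], y)
tree_pred = tree.predict(sample[tree_cols])
print(tree_pred)
```

With the full dataset, a held-out test split (for example via sklearn.model_selection.train_test_split) would be needed before reading anything into the tree's predictions, since a tree evaluated on its own training rows will look better than it is.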

In [23]:
df = pd.read_csv("Global_Education.csv", encoding='latin-1')
# Keep only the columns used in the analysis. Note that the 'Latitude ' column name includes a trailing space.
data = df[['Latitude ', 'Longitude', 'Completion_Rate_Upper_Secondary_Male', 'Completion_Rate_Upper_Secondary_Female', 'Youth_15_24_Literacy_Rate_Male', 'Youth_15_24_Literacy_Rate_Female', 'Birth_Rate', 'Gross_Tertiary_Education_Enrollment', 'Unemployment_Rate']]
data.head()
Out[23]:
Latitude Longitude Completion_Rate_Upper_Secondary_Male Completion_Rate_Upper_Secondary_Female Youth_15_24_Literacy_Rate_Male Youth_15_24_Literacy_Rate_Female Birth_Rate Gross_Tertiary_Education_Enrollment Unemployment_Rate
0 33.939110 67.709953 32 14 74 56 32.49 9.7 11.12
1 41.153332 20.168331 76 80 99 100 11.78 55.0 12.33
2 28.033886 1.659626 22 37 98 97 24.28 51.4 11.70
3 42.506285 1.521801 0 0 0 0 7.20 0.0 0.00
4 11.202692 17.873887 24 15 0 0 40.73 9.3 6.89
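
In the output above, row 3 reports 0 for every indicator except Birth_Rate, and rows 3 and 4 report 0 for both literacy rates. If those zeros stand for unreported values rather than true zeros (an assumption; nothing in this notebook confirms how the dataset encodes missing data), one possible way to keep them out of a model is sketched below on a hand-copied subset of the rows shown.

```python
import numpy as np
import pandas as pd

# Literacy and birth-rate columns for the five rows shown in the output.
sample = pd.DataFrame({
    "Youth_15_24_Literacy_Rate_Male": [74, 99, 98, 0, 0],
    "Youth_15_24_Literacy_Rate_Female": [56, 100, 97, 0, 0],
    "Birth_Rate": [32.49, 11.78, 24.28, 7.20, 40.73],
})
rate_cols = [
    "Youth_15_24_Literacy_Rate_Male",
    "Youth_15_24_Literacy_Rate_Female",
]

# Assumption: a literacy rate of exactly 0 means "not reported".
cleaned = sample.copy()
cleaned[rate_cols] = cleaned[rate_cols].replace(0, np.nan)

# Drop rows that lack either literacy rate before modelling.
complete = cleaned.dropna(subset=rate_cols)
print(complete)
```

Whether to drop such rows, impute them, or keep the zeros depends on how the source data encodes missing values, which would need to be checked against the dataset's documentation.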