import pandas as pd
import numpy as np
Moriah David, Abhinay Reddy, Krishna Maganti, Theo Evans
Description of the dataset:
The World Educational Dataset gives a global view of education and employment indicators, with educational characteristics reported for each country or area. We obtained the dataset from Kaggle, where it was contributed by Nidula Elgiriyewithana. Its underlying statistics come from the UNESCO Institute for Statistics, the UNICEF Global Database, and the UN Statistics Division Database.
Descriptions of the question(s):
What is the relationship between birth rate and the difference in secondary school completion between men and women? Can a country's birth rate be inferred from latitude, longitude, female literacy, female secondary school completion, unemployment, and gross university enrollment?
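The "difference in secondary school completion between men and women" in the first question can be made concrete as a derived column. The sketch below is only an illustration: it uses the five rows printed by `data.head()` in this notebook, and the name `Completion_Gap` and the male-minus-female sign convention are our assumptions, not something the dataset defines.

```python
import pandas as pd

# The five rows shown by data.head(); far too few for any real conclusion.
sample = pd.DataFrame({
    "Completion_Rate_Upper_Secondary_Male": [32, 76, 22, 0, 24],
    "Completion_Rate_Upper_Secondary_Female": [14, 80, 37, 0, 15],
    "Birth_Rate": [32.49, 11.78, 24.28, 7.20, 40.73],
})

# Hypothetical derived column: positive means more males than females complete.
sample["Completion_Gap"] = (
    sample["Completion_Rate_Upper_Secondary_Male"]
    - sample["Completion_Rate_Upper_Secondary_Female"]
)

# Pearson correlation between the gap and birth rate on this tiny sample.
gap_birth_corr = sample["Completion_Gap"].corr(sample["Birth_Rate"])
print(sample[["Completion_Gap", "Birth_Rate"]])
print("correlation:", gap_birth_corr)
```

On the full dataset the same two lines would apply to `data` directly; the correlation from five rows should not be interpreted.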
Descriptions of the variable(s):
- Latitude: Latitude coordinates of the geographical location.
- Longitude: Longitude coordinates of the geographical location.
- Completion_Rate_Upper_Secondary_Male: Completion rate for upper secondary education among males.
- Completion_Rate_Upper_Secondary_Female: Completion rate for upper secondary education among females.
- Youth_15_24_Literacy_Rate_Male: Literacy rate among male youths aged 15-24.
- Youth_15_24_Literacy_Rate_Female: Literacy rate among female youths aged 15-24.
- Birth_Rate: Birth rate in the respective countries/areas.
- Gross_Tertiary_Education_Enrollment: Gross enrollment in tertiary education.
- Unemployment_Rate: Unemployment rate in the respective countries/areas.
Methods we will use for each question:
Linear regression to visualize the relationship between male and female secondary school completion and birth rate, with y = Birth_Rate, x1 = Completion_Rate_Upper_Secondary_Male, and x2 = Completion_Rate_Upper_Secondary_Female.
Decision tree to predict birth rate from the six predictors named in the second question.
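As a rough sketch of the regression setup, the model y = b0 + b1*x1 + b2*x2 can be fit by ordinary least squares with NumPy alone. The helper names `fit_linear` and `predict_linear` are hypothetical, and the notebook itself may well use a library such as scikit-learn or statsmodels instead. The data are the five rows printed by `data.head()`, so the coefficients are illustrative only.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares for y = b0 + b1*x1 + ...; returns [b0, b1, ...]."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict_linear(coef, X):
    """Evaluate the fitted model on new rows of X."""
    X = np.asarray(X, dtype=float)
    return coef[0] + X @ coef[1:]

# x1 = Completion_Rate_Upper_Secondary_Male, x2 = ..._Female (first five rows).
X_head = np.array([[32, 14], [76, 80], [22, 37], [0, 0], [24, 15]])
y_head = np.array([32.49, 11.78, 24.28, 7.20, 40.73])  # Birth_Rate

coef = fit_linear(X_head, y_head)
fitted = predict_linear(coef, X_head)
print("intercept, b1, b2:", coef)
print("fitted birth rates:", fitted)
```

With the full dataset, `X_head` and `y_head` would be replaced by the corresponding columns of `data`.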
# Load the dataset; the file is read with latin-1 encoding
df = pd.read_csv("Global_Education.csv", encoding='latin-1')
# Keep only the columns used in the analysis (the 'Latitude ' key includes a trailing space)
data = df[['Latitude ', 'Longitude', 'Completion_Rate_Upper_Secondary_Male', 'Completion_Rate_Upper_Secondary_Female', 'Youth_15_24_Literacy_Rate_Male', 'Youth_15_24_Literacy_Rate_Female', 'Birth_Rate', 'Gross_Tertiary_Education_Enrollment', 'Unemployment_Rate']]
data.head()
| | Latitude | Longitude | Completion_Rate_Upper_Secondary_Male | Completion_Rate_Upper_Secondary_Female | Youth_15_24_Literacy_Rate_Male | Youth_15_24_Literacy_Rate_Female | Birth_Rate | Gross_Tertiary_Education_Enrollment | Unemployment_Rate |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 33.939110 | 67.709953 | 32 | 14 | 74 | 56 | 32.49 | 9.7 | 11.12 |
| 1 | 41.153332 | 20.168331 | 76 | 80 | 99 | 100 | 11.78 | 55.0 | 12.33 |
| 2 | 28.033886 | 1.659626 | 22 | 37 | 98 | 97 | 24.28 | 51.4 | 11.70 |
| 3 | 42.506285 | 1.521801 | 0 | 0 | 0 | 0 | 7.20 | 0.0 | 0.00 |
| 4 | 11.202692 | 17.873887 | 24 | 15 | 0 | 0 | 40.73 | 9.3 | 6.89 |
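Rows 3 and 4 above show 0 for several rates, including youth literacy. One possible reading, which is our assumption and not something the dataset description confirms, is that 0 stands for a missing value. Under that assumption, a minimal sketch for flagging such entries before modeling might look like this, again using only the rows printed above:

```python
import numpy as np
import pandas as pd

# Rate columns from the five rows shown by data.head().
sample = pd.DataFrame({
    "Completion_Rate_Upper_Secondary_Male": [32, 76, 22, 0, 24],
    "Completion_Rate_Upper_Secondary_Female": [14, 80, 37, 0, 15],
    "Youth_15_24_Literacy_Rate_Male": [74, 99, 98, 0, 0],
    "Youth_15_24_Literacy_Rate_Female": [56, 100, 97, 0, 0],
    "Birth_Rate": [32.49, 11.78, 24.28, 7.20, 40.73],
})

# ASSUMPTION: a 0 in these columns means "not reported", not a true zero rate.
rate_cols = [c for c in sample.columns if c != "Birth_Rate"]
cleaned = sample.copy()
cleaned[rate_cols] = cleaned[rate_cols].replace(0, np.nan)

# Count flagged entries per row, then keep only rows with no missing values.
missing_per_row = cleaned[rate_cols].isna().sum(axis=1)
complete = cleaned.dropna()
print(missing_per_row)
print(complete)
```

Whether to drop or impute such rows is a modeling choice; dropping is shown here only because it is the simplest option.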