Jordyn Geller, David Giardino, Paulina Grekov, Kjorte Harra, and Sonya Melendez
Our group is researching student adaptivity levels in online education. We are interested in this topic because COVID-19 has led to a substantial increase in the use of online education across school levels. We want to uncover which types of students succeed in this schooling environment and why. Specifically, we are interested in predicting student adaptability from gender, age, and financial status.
First, we will produce descriptive graphs that allow us to visualize and assess the distributional characteristics of the dataset. Then, for our first research question, we will develop decision trees and logistic regression models to predict a student's adaptability to online education from relevant predictors such as gender, age, and financial condition. We will compare the predictive accuracy of these models to determine which one performs best in this context.
We retrieved our data from Kaggle (linked below). The dataset originates from a 2021 study in Bangladesh that investigates factors associated with student adaptability to online education (Suzan et al., 2021). The original study team collected the student data through online and offline surveys. The variable coding and some descriptive statistics are shown below.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from collections import Counter
from sklearn import tree
import requests
from io import StringIO
dat = pd.read_csv("students_adaptability_level_online_education.csv")
dat = dat[["Gender", "Age", "Financial Condition", "Adaptivity Level"]]
dat.head(5)
| | Gender | Age | Financial Condition | Adaptivity Level |
---|---|---|---|---|
0 | Boy | 21-25 | Mid | Moderate |
1 | Girl | 21-25 | Mid | Moderate |
2 | Girl | 16-20 | Mid | Moderate |
3 | Girl | 11-15 | Mid | Moderate |
4 | Girl | 16-20 | Poor | Low |
dat.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1205 entries, 0 to 1204
Data columns (total 4 columns):
 #   Column               Non-Null Count  Dtype
---  ------               --------------  -----
 0   Gender               1205 non-null   object
 1   Age                  1205 non-null   object
 2   Financial Condition  1205 non-null   object
 3   Adaptivity Level     1205 non-null   object
dtypes: object(4)
memory usage: 37.8+ KB
dat.describe()
| | Gender | Age | Financial Condition | Adaptivity Level |
---|---|---|---|---|
count | 1205 | 1205 | 1205 | 1205 |
unique | 2 | 6 | 3 | 3 |
top | Boy | 21-25 | Mid | Moderate |
freq | 663 | 374 | 878 | 625 |
unique_gender = dat.Gender.nunique()
most_freq_gender = dat.Gender.mode().values[0]
count_gender = dat.Gender.value_counts()
count_age = dat.Age.value_counts()
df_gender = pd.DataFrame(count_gender.items(), columns=['Gender', 'Count'])
df_age = pd.DataFrame(count_age.items(), columns=['Age', 'Count'])
df_age
| | Age | Count |
---|---|---|
0 | 21-25 | 374 |
1 | 11-15 | 353 |
2 | 16-20 | 278 |
3 | 1-5 | 81 |
4 | 26-30 | 68 |
5 | 6-10 | 51 |
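# Add one-hot indicator columns for gender (Boy, Girl); dtype=bool keeps the mapping below valid across pandas versions
df2 = dat.join(pd.get_dummies(dat.Gender, drop_first=False, dtype=bool))
# Relabel the Girl indicator so the plot legend reads Female/Male
df2['Girl'] = df2['Girl'].map({True: 'Female', False: 'Male'})
# Count students in each age group by gender and draw a stacked bar chart
grouped = df2.groupby(['Age', 'Girl']).size().unstack()
grouped.plot(kind='bar', stacked=True)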
plt.xlabel('Age Group')
plt.ylabel('Count')
plt.title('Distribution of Gender by Age')
plt.legend(title='Gender', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.show()
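One caveat with the chart above: because `Age` is stored as strings, `groupby` sorts the groups lexicographically, so `6-10` lands after `26-30` on the x-axis. The following is a minimal sketch of one way to restore the natural order, assuming the six bins listed in `df_age` are the only ones present. `AGE_ORDER` and `order_age_groups` are hypothetical names introduced here, and the toy rows are for illustration only, not taken from the dataset.

```python
import pandas as pd

# Age bins as they appear in the dataset, listed youngest to oldest (assumed complete).
AGE_ORDER = ["1-5", "6-10", "11-15", "16-20", "21-25", "26-30"]


def order_age_groups(df: pd.DataFrame, col: str = "Age") -> pd.DataFrame:
    """Return a copy of df whose age column is an ordered categorical."""
    out = df.copy()
    out[col] = pd.Categorical(out[col], categories=AGE_ORDER, ordered=True)
    return out


# Toy rows for illustration only (hypothetical, not from the dataset).
toy = pd.DataFrame({
    "Age": ["6-10", "21-25", "1-5", "11-15", "21-25"],
    "Gender": ["Girl", "Boy", "Boy", "Girl", "Girl"],
})

# Grouping on the ordered categorical sorts rows by category order, not alphabetically.
counts = (
    order_age_groups(toy)
    .groupby(["Age", "Gender"], observed=False)
    .size()
    .unstack()
)
print(counts.index.tolist())
```

Applying `order_age_groups(df2)` before the `groupby` call should make the stacked bar chart run from the youngest to the oldest group.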
freq_table = pd.crosstab(df2['Adaptivity Level'], 'Frequency')
freq_table
Adaptivity Level | Frequency |
---|---|
High | 100 |
Low | 480 |
Moderate | 625 |
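Because the three adaptivity levels are unevenly represented, it may help to keep a majority-class baseline in mind when judging model accuracy later. The short sketch below works that baseline out from the counts in the frequency table above; `class_counts` is a hypothetical name, and the numbers are copied from the table rather than recomputed from the file.

```python
# Class counts copied from the frequency table above.
class_counts = {"High": 100, "Low": 480, "Moderate": 625}

total = sum(class_counts.values())
majority_class = max(class_counts, key=class_counts.get)

# Accuracy of a model that always predicts the most common class.
baseline_accuracy = class_counts[majority_class] / total

print(f"{majority_class}: {baseline_accuracy:.3f}")  # Moderate: 0.519
```

A model that always predicts "Moderate" would be right about 52% of the time on these data, so a fitted model should clear that figure before its accuracy is considered informative.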
freq_table = pd.crosstab(df2['Financial Condition'], 'Frequency')
freq_table
Financial Condition | Frequency |
---|---|
Mid | 878 |
Poor | 242 |
Rich | 85 |
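The modeling plan described in the introduction (a decision tree and a logistic regression compared on predictive accuracy) could be sketched roughly as follows. This is a sketch under stated assumptions, not our final analysis: the function name `compare_models`, the 25% hold-out split, the tree depth of 4, and the one-hot encoding of all three predictors are illustrative choices that we have not yet settled on.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

PREDICTORS = ["Gender", "Age", "Financial Condition"]
TARGET = "Adaptivity Level"


def compare_models(df, test_size=0.25, seed=0):
    """Fit a decision tree and a logistic regression; return hold-out accuracy for each."""
    # One-hot encode the categorical predictors (assumed encoding).
    X = pd.get_dummies(df[PREDICTORS])
    y = df[TARGET]

    # Stratify so each adaptivity level keeps its share in both splits.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )

    models = {
        "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=seed),
        "logistic_regression": LogisticRegression(max_iter=1000),
    }

    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores
```

With the data loaded as above, `compare_models(dat)` would return a dictionary with one accuracy value per model, which can then be compared against each other and against a simple baseline.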
Suzan, M. H., Samrin, N. A., Biswas, A. A., & Pramanik, A. (2021, July). Students' Adaptability Level Prediction in Online Education using Machine Learning Approaches. In 2021 12th International Conference on Computing Communication and Networking Technologies (ICCCNT) (pp. 1-7). IEEE.
Data retrieved from: https://www.kaggle.com/datasets/mdmahmudulhasansuzan/students-adaptability-level-in-online-education