Data Set - Loan Default Dataset - https://www.kaggle.com/datasets/yasserh/loan-default-dataset
Questions
Which combination of features best predicts loan default?
- Testing an SVM and using permutation feature importance to understand which features are most important for predicting default
- Testing decision trees with different k parameters
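A minimal sketch of the two ideas above, assuming scikit-learn is available. It runs on synthetic data from `make_classification` rather than the loan data, so the feature names and scores are placeholders. It also reads the "k parameters" of the decision trees as a tree hyperparameter such as `max_depth`, which is an assumption and not something the plan states.

```python
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the loan data (assumption: the real features
# would first be cleaned and encoded numerically).
X, y = make_classification(
    n_samples=400, n_features=6, n_informative=3, n_redundant=0, random_state=0
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# SVMs are sensitive to feature scale, so standardize inside a pipeline.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
svm.fit(X_train, y_train)

# Permutation importance: shuffle one feature at a time on held-out data
# and record how much the accuracy drops.
result = permutation_importance(svm, X_test, y_test, n_repeats=10, random_state=0)
ranking = sorted(
    zip(feature_names, result.importances_mean), key=lambda pair: pair[1], reverse=True
)
for name, score in ranking:
    print(f"{name}: {score:.3f}")

# Decision trees at several depths (max_depth stands in for "k" here).
tree_scores = {}
for depth in (2, 4, 8):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    tree_scores[depth] = tree.score(X_test, y_test)
print(tree_scores)
```

Computing the importances on the held-out split rather than the training split keeps the ranking from rewarding features the model has merely memorized.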
Can interest rates be predicted from social factors (gender, religion, age)? Do higher interest rates lead to higher default rates? Would such a pattern suggest discrimination in how banks give out loans?
- kNN and decision tree models of gender versus interest rate
- SVM of social factors
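One way the kNN idea could look, as a sketch under stated assumptions: the column names `Gender`, `age`, and `rate_of_interest` are assumed and should be checked against the actual CSV, and the toy frame below is randomly generated, so its scores carry no meaning. Categorical social factors are one-hot encoded before the kNN regressor sees them.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Randomly generated toy data; column names are assumptions, not verified
# against Loan_Default.csv.
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame(
    {
        "Gender": rng.choice(["Male", "Female", "Joint"], size=n),
        "age": rng.choice(["25-34", "35-44", "45-54"], size=n),
        "rate_of_interest": rng.normal(4.0, 0.5, size=n),
    }
)
social_cols = ["Gender", "age"]

# One-hot encode the categorical social factors.
encoder = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), social_cols)]
)

# Compare a few neighbourhood sizes with 5-fold cross-validated R^2.
knn_scores = {}
for k in (3, 5, 15):
    model = make_pipeline(encoder, KNeighborsRegressor(n_neighbors=k))
    scores = cross_val_score(
        model, toy[social_cols], toy["rate_of_interest"], cv=5, scoring="r2"
    )
    knn_scores[k] = scores.mean()
print(knn_scores)
```

An R^2 near zero or below on the real data would indicate that these social factors alone do not predict the interest rate; a clearly positive value would be the starting point for the discrimination question, not an answer to it.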
Which variables are most important for a high loan payout?
- Variables we plan to use: gender, loan_type, loan_purpose, credit worthiness, credit open, property value, construction type, term, occupancy_type, secured_by, credit type, credit score, dtir1, income, security type, region, age, and co-applicant credit type, to predict loan_amount
Methods
- Using linear, lasso, and ridge regression models
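The comparison of the three regression methods can be sketched as follows, assuming scikit-learn. The data comes from `make_regression` as a stand-in for the encoded loan features and `loan_amount`, and the `alpha` values are arbitrary placeholders that would normally be tuned.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the encoded predictors and the loan_amount target.
X, y = make_regression(
    n_samples=300, n_features=8, n_informative=4, noise=10.0, random_state=0
)

# alpha=1.0 is an arbitrary placeholder; it would normally be tuned,
# for example with LassoCV or RidgeCV.
models = {
    "linear": LinearRegression(),
    "lasso": Lasso(alpha=1.0),
    "ridge": Ridge(alpha=1.0),
}

cv_r2 = {}
for name, model in models.items():
    # Standardize so the lasso and ridge penalties treat features equally.
    pipe = make_pipeline(StandardScaler(), model)
    cv_r2[name] = cross_val_score(pipe, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: mean CV R^2 = {cv_r2[name]:.3f}")
```

Lasso can shrink some coefficients exactly to zero, so inspecting its fitted `coef_` is one way to approach the question of which variables matter most.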
Code To Read Data
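# Install the Kaggle download helper (notebook shell command)
!pip install kagglehub
import kagglehub
import pandas as pd
# Download the dataset; kagglehub returns the local directory path
path = kagglehub.dataset_download("yasserh/loan-default-dataset")
dataset_path = f"{path}/Loan_Default.csv"
# Load the CSV into a DataFrame
data = pd.read_csv(dataset_path)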