Project Proposal
Group 28
Group Members:
- Mai Tah Lee mtlee2@wisc.edu
- Annie Purisch apurisch@wisc.edu
- Seth Mlodzik smlodzik@wisc.edu
- Tianxing Liu tliu398@wisc.edu
Write a one-page proposal (4 points) including a few lines of code to read data, descriptions of the question(s), variable(s), and methods you will use. Turn in a proposal.html (or .pdf), once per group.
Dataset link:
https://www.kaggle.com/datasets/uom190346a/mental-health-diagnosis-and-treatment-monitoring
Description of Dataset:
The dataset contains mental health diagnoses, treatment plans, and the outcomes of those plans. It also includes symptoms, medications, other types of treatment, and patient demographics. Notably, the dataset is synthetic and does not contain real patient data.
Research Questions
- Primary Question: Can a mental health patient's treatment outcome (improved, no change, or deteriorated) be predicted from their demographics, symptom severity, mood scores, and treatment types?
- Secondary Question: What factors contribute to adherence to treatment, and how does adherence influence treatment outcome?
Methods:
- Preprocessing: Missing data will be handled, categorical variables will be encoded, and numeric variables will be normalized.
- Data Analysis: We will analyze the relationships between treatment outcome and variables such as age, symptom severity, and treatment adherence.
- Modeling: Logistic regression and decision trees will be used to classify outcomes based on patient characteristics.
The main goal of the analysis is to identify key predictors of treatment success.
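The preprocessing and modeling steps above could be sketched roughly as follows with scikit-learn. This is only an illustration under stated assumptions: the helper `build_models`, the choice of predictor columns, the median and most-frequent imputation strategies, and the tree depth are not fixed by the proposal and may change once the data has been explored.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Column names come from the dataset; which ones to use as predictors
# is an assumption made for this sketch.
NUMERIC_COLS = ["Age", "Symptom Severity (1-10)", "Mood Score (1-10)"]
CATEGORICAL_COLS = ["Gender", "Diagnosis", "Medication", "Therapy Type"]
TARGET_COL = "Outcome"


def make_preprocessor():
    """Impute and scale numeric columns; impute and one-hot encode categorical ones."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    return ColumnTransformer([
        ("num", numeric, NUMERIC_COLS),
        ("cat", categorical, CATEGORICAL_COLS),
    ])


def build_models(max_depth=4):
    """Return one unfitted pipeline per classifier (hypothetical helper)."""
    return {
        "logistic_regression": Pipeline([
            ("prep", make_preprocessor()),
            ("clf", LogisticRegression(max_iter=1000)),
        ]),
        "decision_tree": Pipeline([
            ("prep", make_preprocessor()),
            ("clf", DecisionTreeClassifier(max_depth=max_depth, random_state=0)),
        ]),
    }
```

Each pipeline can then be fitted with `model.fit(df[NUMERIC_COLS + CATEGORICAL_COLS], df[TARGET_COL])`. Keeping the preprocessing inside the pipeline means the imputation and scaling are learned from the training rows only when the models are later cross-validated.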
In [1]:
import pandas as pd
df = pd.read_csv("mental_health_diagnosis_treatment_.csv")
df.head(3)
Out[1]:
| | Patient ID | Age | Gender | Diagnosis | Symptom Severity (1-10) | Mood Score (1-10) | Sleep Quality (1-10) | Physical Activity (hrs/week) | Medication | Therapy Type | Treatment Start Date | Treatment Duration (weeks) | Stress Level (1-10) | Outcome | Treatment Progress (1-10) | AI-Detected Emotional State | Adherence to Treatment (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 43 | Female | Major Depressive Disorder | 10 | 5 | 8 | 5 | Mood Stabilizers | Interpersonal Therapy | 2024-01-25 | 11 | 9 | Deteriorated | 7 | Anxious | 66 |
| 1 | 2 | 40 | Female | Major Depressive Disorder | 9 | 5 | 4 | 7 | Antipsychotics | Interpersonal Therapy | 2024-02-27 | 11 | 7 | No Change | 7 | Neutral | 78 |
| 2 | 3 | 55 | Female | Major Depressive Disorder | 6 | 3 | 4 | 3 | SSRIs | Mindfulness-Based Therapy | 2024-03-20 | 14 | 7 | Deteriorated | 5 | Happy | 62 |
In [2]:
# variables
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 17 columns):
 #   Column                        Non-Null Count  Dtype
---  ------                        --------------  -----
 0   Patient ID                    500 non-null    int64
 1   Age                           500 non-null    int64
 2   Gender                        500 non-null    object
 3   Diagnosis                     500 non-null    object
 4   Symptom Severity (1-10)       500 non-null    int64
 5   Mood Score (1-10)             500 non-null    int64
 6   Sleep Quality (1-10)          500 non-null    int64
 7   Physical Activity (hrs/week)  500 non-null    int64
 8   Medication                    500 non-null    object
 9   Therapy Type                  500 non-null    object
 10  Treatment Start Date          500 non-null    object
 11  Treatment Duration (weeks)    500 non-null    int64
 12  Stress Level (1-10)           500 non-null    int64
 13  Outcome                       500 non-null    object
 14  Treatment Progress (1-10)     500 non-null    int64
 15  AI-Detected Emotional State   500 non-null    object
 16  Adherence to Treatment (%)    500 non-null    int64
dtypes: int64(10), object(7)
memory usage: 66.5+ KB
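Given the column types listed above, a first look at how adherence and the other numeric variables relate to the outcome might be a per-class summary like the one below. The helper `summarize_by_outcome` and its default column selection are hypothetical, chosen only to illustrate the planned relationship analysis.

```python
import pandas as pd


def summarize_by_outcome(
    df,
    cols=("Age", "Symptom Severity (1-10)", "Adherence to Treatment (%)"),
):
    """Mean of each selected numeric column per Outcome class, plus class sizes.

    Hypothetical helper: the default columns are an assumption for this sketch.
    """
    grouped = df.groupby("Outcome")
    summary = grouped[list(cols)].mean()
    summary["n"] = grouped.size()  # number of patients in each outcome class
    return summary
```

Calling `summarize_by_outcome(df)` on the loaded data returns one row per outcome class, which also shows how balanced the three classes are before any model is trained.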