This project involves developing a predictive model to assess loan default risks. Using a dataset that includes various borrower characteristics, the model predicts whether a loan will default based on historical data.
- Python
- Pandas
- Scikit-learn
- NumPy
- Matplotlib/Seaborn (for visualizations)
- Jupyter Notebook (for development)
The dataset used for this project contains information about borrowers, including features like age, income, employment length, loan amount, interest rate, and loan status. It consists of 32,581 entries with various attributes relevant to credit risk.
- `person_age`: Age of the borrower
- `person_income`: Income of the borrower
- `person_emp_length`: Employment length (in years)
- `loan_amnt`: Amount of the loan
- `loan_int_rate`: Interest rate of the loan
- `loan_status`: Status of the loan (0 for non-default, 1 for default)
- Additional features related to borrower demographics and credit history
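
As a quick orientation to these columns, here is a minimal sketch of a first inspection pass. The rows are invented for illustration and are not taken from the project's dataset. Only the column names come from the list above.

```python
import numpy as np
import pandas as pd

# Tiny synthetic sample using the column names listed above.
# All values are made up for illustration.
df = pd.DataFrame({
    "person_age": [22, 35, 41, 29],
    "person_income": [35000, 82000, 54000, 47000],
    "person_emp_length": [1.0, 10.0, np.nan, 4.0],
    "loan_amnt": [5000, 12000, 8000, 15000],
    "loan_int_rate": [11.5, 7.9, np.nan, 14.2],
    "loan_status": [1, 0, 0, 1],
})

# Count missing values per column to see which features need imputation.
missing_counts = df.isna().sum()

# loan_status is coded 0/1, so its mean is the share of defaulted loans.
default_rate = df["loan_status"].mean()

print(missing_counts)
print(f"Default rate: {default_rate:.2%}")
```

Checking the default rate early is useful because a strongly imbalanced target makes plain accuracy harder to interpret.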
- Data Preprocessing: Clean and preprocess the data, handle missing values, and encode categorical variables.
- Train-Test Split: Divide the dataset into training and testing sets.
- Model Selection: Use logistic regression to model the likelihood of default.
- Hyperparameter Tuning: Optimize the model using techniques such as grid search.
- Evaluation: Evaluate the model's performance using accuracy, confusion matrix, and classification report.
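
The steps above could be sketched roughly as follows with scikit-learn. This is an illustrative outline under stated assumptions, not the repository's actual `main.py`: the data is randomly generated, `home_ownership` is a hypothetical categorical column standing in for the unnamed demographic features, and the imputation strategy, split ratio, and `C` grid are arbitrary choices.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data (assumption: the real project reads its own dataset).
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "person_age": rng.integers(21, 65, n),
    "person_income": np.clip(rng.normal(60000, 20000, n), 10000, None),
    "person_emp_length": rng.integers(0, 20, n).astype(float),
    "loan_amnt": rng.integers(1000, 30000, n),
    "loan_int_rate": rng.uniform(5, 20, n),
    "home_ownership": rng.choice(["RENT", "OWN", "MORTGAGE"], n),  # hypothetical column
})
# Make default more likely at higher interest rates so there is signal to learn.
p_default = 1 / (1 + np.exp(-(df["loan_int_rate"] - 12) / 2))
df["loan_status"] = (rng.uniform(size=n) < p_default).astype(int)
# Inject some missing values to exercise the imputation step.
df.loc[df.sample(frac=0.05, random_state=0).index, "person_emp_length"] = np.nan

numeric = ["person_age", "person_income", "person_emp_length", "loan_amnt", "loan_int_rate"]
categorical = ["home_ownership"]

# Data preprocessing: impute and scale numeric columns, one-hot encode categoricals.
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

# Model selection: logistic regression on top of the preprocessing step.
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Train-test split, stratified so both sets keep a similar default rate.
X = df.drop(columns="loan_status")
y = df["loan_status"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Hyperparameter tuning: grid search over the regularization strength C.
param_grid = {"clf__C": [0.01, 0.1, 1, 10]}
search = GridSearchCV(model, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

# Evaluation on the held-out test set.
y_pred = search.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print("Best params:", search.best_params_)
print(f"Accuracy: {accuracy:.4f}")
print(cm)
print(classification_report(y_test, y_pred, zero_division=0))
```

Wrapping preprocessing and the classifier in one `Pipeline` means the imputer and scaler are fitted only on the training folds during grid search, which avoids leaking test-set statistics into the model.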
- Accuracy: 86.49%
- Confusion Matrix:

| | Predicted: No (0) | Predicted: Yes (1) |
|---|---|---|
| Actual: No (0) | True Negatives | False Positives |
| Actual: Yes (1) | False Negatives | True Positives |

- Classification Report: Detailed performance metrics (precision, recall, F1-score).
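
To show how the metrics in a classification report relate to the four cells of the confusion matrix, here is a small sketch. The counts passed in are invented for illustration and are not this project's results. `metrics_from_confusion` is a hypothetical helper name.

```python
def metrics_from_confusion(tn, fp, fn, tp):
    """Derive accuracy, precision, recall, and F1 for the positive (default) class."""
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    # Precision: of the loans predicted to default, how many actually did.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the loans that actually defaulted, how many were caught.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Illustrative counts only (not taken from the project).
m = metrics_from_confusion(tn=80, fp=5, fn=10, tp=5)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts the accuracy is 0.85 while recall for the default class is only about 0.33, which is why precision and recall are worth reading alongside accuracy when defaults are the minority class.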
- Clone the repository: `git clone https://github.com/lakshyajoshii/credit_risk_model.git`
- Navigate to the project directory: `cd credit_risk_model`
- Run the main script: `python main.py`
This project is licensed under the MIT License. See the LICENSE file for details.