
Take full control of your personal brand and privacy by migrating away from the big tech platforms!
Oct 27, 2023

Create a personal knowledge base and share your knowledge with your peers.
Oct 26, 2023

Use popular tools such as Plotly, Mermaid, and data frames.
Oct 25, 2023

Embed videos, podcasts, code, LaTeX math, and even test students!
Oct 24, 2023

Easily manage your projects - create ideation mind maps, Gantt charts, todo lists, and more!
Oct 23, 2023

Competition: Third Sichuan University Financial Technology Modeling Competition Award: First Place **Awared by:**The Education Department of Sichuan Provincial This project, presented for the Third Sichuan University Financial Technology Modeling Competition, focuses on customer credit risk prediction and identification. It emphasizes constructing a stable, high-performing binary classification model for credit risk management based on financial data. Key Highlights Data Analysis and Preparation Data Overview: Combined datasets include 24,983 samples and 205 features. After preprocessing, 124 features remain (33 textual and ~33% date-related features). Data Cleaning: Missing values: Median imputation for continuous variables. Mode imputation for discrete variables. Filling with -99 for features with >95% missing. Encoding methods: Count encoding for categories with <10 values. WOE binning for categories with >10 values. Time features are extracted based on hours/minutes or intervals from the current day. Feature Selection and Engineering: Importance-ranked features selected via XGBoost. Featuretools used to generate new feature combinations. Model Building Architecture: Three-layered stacking framework: Layer 1: Base models include CatBoost, LightGBM, XGBoost, and Random Forest. Layer 2: Outputs from base models serve as inputs for four distinct sub-models. Layer 3: Final predictions are generated through normalized weighted voting. Model Optimization: 5-fold cross-validation and grid search are applied to optimize hyperparameters for base models. Evaluation Model performance is assessed using AUC (Area Under the Curve): Individual Models: CatBoost: 0.8486 LightGBM: 0.8476 XGBoost: 0.8464 Random Forest: 0.8423 Stacking and Voting: Stacking 1: 0.8523 Stacking 2: 0.8530 Voting: 0.8594 (Best Performance) Credit Rating System Structure: Customers are classified into 9 levels based on predicted risk, with clear distribution and distinguishable credit tiers. 9.77% of users belong to the top levels (8 and above), reflecting the modelβs discriminatory power. Implementation: Integrated with a web-based system using Docker and Vue.js for front-end services. Compared to FICO models, the system provides zones for: Quality customers. Value exploration. Overestimation. Risk elimination. Conclusions and Suggestions The project demonstrates strong modeling capabilities through effective stacking and feature engineering. The authors suggest refining the model for real-world applications and exploring the scalability of the approach for diverse datasets. This project presents a robust credit risk modeling framework with promising performance, practical implications, and room for further enhancement in financial technology applications.
Jan 1, 2023