Wenjie Lan 🌼

Master's Student in Statistics

About Me

Wenjie Lan is a Master of Science student in Statistics at Duke University. She holds a Bachelor of Science degree in Financial Technology from the Southwestern University of Finance and Economics. Wenjie’s academic and research interests include data privacy, probabilistic machine learning, and Bayesian statistics.

Check my online CV for the most recent update: https://www.overleaf.com/read/whhvmfkhzccj#33418f

During her master’s studies, Wenjie’s research focuses on differential privacy, probabilistic machine learning, and Bayesian statistics. Her summer 2025 research on differential privacy computation for the Gini Index is under the supervision of Prof. Jerome Reiter. Her ongoing 2025–2026 project examines the Hierarchical Conditional Diffusion model, under the guidance of Prof. Jerome Reiter and Prof. David Dunson.

During her undergraduate studies, Wenjie explored two main research tracks: (1) machine learning for risk assessment and prediction (credit risk prediction in ML competitions, 2022–2023; financial distress prediction using GCN+LSTM, summer 2022; systemic risk measurement using CoES, independent study 2023) and (2) applied methods for internship and project problems (bank position management optimization, internship project/patent 2023; high-frequency factor construction via signal processing, internship project 2024; privacy-preserving finance via federated learning, research assistant 2024).

Most importantly, she is grateful to her advisors, collaborators, and friends for their guidance and support.

Technical Skills:

  • Programming Languages: Python (Pandas, NumPy, Dask, Scikit-learn, PyTorch), R (dplyr, tidyr, tidymodels, sf, shiny), SQL, C/C++
  • Key Areas of Expertise: Probabilistic Machine Learning, Data Privacy, Bayesian Statistics
Interests
  • High-dimensional Data
  • Statistical Computing
  • Data Privacy
  • Quantitative Finance
Education
  • MS Statistics

    Duke University

  • BS Financial Technology

    Southwestern University of Finance and Economics

📚 My Research

My current research interests include data privacy, probabilistic machine learning, and Bayesian statistics. I apply a range of qualitative and quantitative methods to comprehensively investigate the role of science and technology in the economy.

Please reach out to collaborate 😃

Research
Projects
Customer Credit Risk Prediction and Identification

Competition: Third Sichuan University Financial Technology Modeling Competition
Award: First Place
Awarded by: The Education Department of Sichuan Province

This project, presented for the Third Sichuan University Financial Technology Modeling Competition, focuses on customer credit risk prediction and identification. It emphasizes constructing a stable, high-performing binary classification model for credit risk management based on financial data.

Key Highlights

Data Analysis and Preparation

  • Data Overview: The combined datasets include 24,983 samples and 205 features. After preprocessing, 124 features remain (33 textual and ~33% date-related features).
  • Data Cleaning, missing values: median imputation for continuous variables; mode imputation for discrete variables; filling with -99 for features with >95% missing.
  • Data Cleaning, encoding: count encoding for categories with <10 values; WOE binning for categories with >10 values. Time features are extracted based on hours/minutes or intervals from the current day.
  • Feature Selection and Engineering: Features are ranked by importance and selected via XGBoost. Featuretools is used to generate new feature combinations.

Model Building

  • Architecture: a three-layer stacking framework.
    • Layer 1: Base models include CatBoost, LightGBM, XGBoost, and Random Forest.
    • Layer 2: Outputs from the base models serve as inputs for four distinct sub-models.
    • Layer 3: Final predictions are generated through normalized weighted voting.
  • Model Optimization: 5-fold cross-validation and grid search are applied to optimize hyperparameters for the base models.

Evaluation

Model performance is assessed using AUC (Area Under the Curve):

  • Individual Models: CatBoost 0.8486; LightGBM 0.8476; XGBoost 0.8464; Random Forest 0.8423
  • Stacking and Voting: Stacking 1 0.8523; Stacking 2 0.8530; Voting 0.8594 (best performance)

Credit Rating System

  • Structure: Customers are classified into 9 levels based on predicted risk, with a clear distribution and distinguishable credit tiers. 9.77% of users belong to the top levels (8 and above), reflecting the model’s discriminatory power.
  • Implementation: Integrated with a web-based system using Docker and Vue.js for front-end services. Compared to FICO models, the system provides zones for quality customers, value exploration, overestimation, and risk elimination.

Conclusions and Suggestions

The project demonstrates strong modeling capabilities through effective stacking and feature engineering. The authors suggest refining the model for real-world applications and exploring the scalability of the approach for diverse datasets. Overall, the project presents a robust credit risk modeling framework with promising performance, practical implications, and room for further enhancement in financial technology applications.
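
The missing-value rules and the final voting layer described for this project can be illustrated with a short Python sketch. This is not the team's code: the function names, the use of `None` to mark missing entries, and the example weights are assumptions made for illustration, and the project description does not state how the voting weights were chosen.

```python
from statistics import median, mode

MISSING_FLAG = -99        # fill value named in the project description
SPARSE_THRESHOLD = 0.95   # features with more than 95% missing get the flag


def impute_column(values, kind):
    """Fill missing entries (None) in one feature column.

    kind is "continuous" (median fill) or "discrete" (mode fill);
    columns that are more than 95% missing are filled with -99 instead.
    """
    if not values:
        return []
    present = [v for v in values if v is not None]
    missing_share = 1 - len(present) / len(values)
    if missing_share > SPARSE_THRESHOLD or not present:
        fill = MISSING_FLAG
    elif kind == "continuous":
        fill = median(present)
    else:
        fill = mode(present)
    return [fill if v is None else v for v in values]


def weighted_vote(probabilities, weights):
    """Blend per-model probabilities with weights normalized to sum to 1.

    probabilities holds one list of per-sample scores for each model.
    The weights themselves are hypothetical inputs.
    """
    total = sum(weights)
    normalized = [w / total for w in weights]
    return [
        sum(w * p for w, p in zip(normalized, per_sample))
        for per_sample in zip(*probabilities)
    ]


# Usage with made-up numbers
print(impute_column([1.0, None, 3.0, 5.0], "continuous"))
print(weighted_vote([[0.2, 0.8], [0.4, 0.6]], [1, 3]))
```

Normalizing the weights keeps the blended score on the same 0 to 1 scale as the individual model outputs, so it can still be ranked and scored with AUC.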

2022CUMCM-B-Mathematics-Modelling

Competition: 2022CUMCM-B-Mathematics-Modelling
Award: National Second Place
Awarded by: China Society for Industrial and Applied Mathematics

This paper addresses the problem of passive localization and formation adjustment in UAV (Unmanned Aerial Vehicle) formation flying by developing models and innovative algorithms. The main contributions are summarized below.

  • Problem 1, Subquestion 1: The task is transformed into finding the intersection points of multiple arc trajectories. A universal multi-point trajectory localization model is established in polar coordinates. An Information Source Traceback Algorithm (Appendix 2) is developed to match signal sources with UAV identifiers by comparing ideal and actual angles within an error margin. This allows the corresponding trajectory equations to be solved, forming the final multi-point trajectory localization model (5.1.2) for UAV positioning.
  • Problem 1, Subquestion 2: The problem is reformulated as an optimal incremental signal source planning task. A 0-1 optimization model (5.2.1) is created and solved using a looped search structure. The Information Source Traceback Algorithm is improved (Appendix 3) to account for slight UAV positional deviations; it identifies unknown UAVs by approximating their azimuthal angles. The solution shows that adding only one UAV signal emitter, in addition to FY00 and FY01, achieves effective localization (5.2.3).
  • Problem 1, Subquestion 3: A greedy approach is used to construct an iterative optimization model for UAV positioning (5.3.1). Using fixed, unbiased signal sources (FY00 and FY01) as the coordinate system, other UAVs are iteratively selected for positioning and adjustment, with errors converging within three iterations (5.3.2). This results in an effective formation adjustment scheme.
  • Problem 2: Based on the traceback algorithm and localization model, a conical formation adjustment scheme is developed (5.4.1). Localization starts with three UAVs forming a central equilateral triangle and expands outward layer by layer. The approach is generalized to arbitrary formations (5.4.2), involving iterative adjustments using accurately positioned UAVs as references. Practical application is demonstrated with an example of adjusting UAVs in a “China”-shaped formation (5.4.3).
  • Robustness Testing: The robustness of the improved traceback algorithm is verified through a discussion of angle errors, ensuring reliability under slight positional deviations.

Conclusion

The paper emphasizes the Information Source Traceback Algorithm and the Multi-Point Trajectory Localization Model as its core methodologies. These are applied to real-world UAV localization and formation adjustment problems, offering optimized adjustment schemes and effective generalization to various scenarios.
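
The idea of locating a receiver at the intersection of arc trajectories can be sketched in a few lines of Python. This is an illustrative reconstruction based on the inscribed angle theorem, written in Cartesian rather than polar coordinates, and is not the paper's algorithm. The function names, the choice of three transmitters sharing one common emitter, and the use of a nominal position to pick the correct side of each chord are all assumptions.

```python
import math


def cross(o, p, q):
    """z-component of (p - o) x (q - o); positive when q lies left of o->p."""
    return (p[0] - o[0]) * (q[1] - o[1]) - (p[1] - o[1]) * (q[0] - o[0])


def viewing_angle(p, a, b):
    """Angle in radians under which point p sees the segment from a to b."""
    ux, uy = a[0] - p[0], a[1] - p[1]
    vx, vy = b[0] - p[0], b[1] - p[1]
    return math.atan2(abs(ux * vy - uy * vx), ux * vx + uy * vy)


def arc_center(a, b, alpha, side):
    """Center of the circle whose points on the given side of a->b
    see the chord ab under angle alpha (inscribed angle theorem)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    chord = math.hypot(dx, dy)
    offset = chord / (2 * math.tan(alpha))  # signed; negative if alpha > 90 deg
    nx, ny = -dy / chord * side, dx / chord * side
    return ((a[0] + b[0]) / 2 + offset * nx, (a[1] + b[1]) / 2 + offset * ny)


def reflect(p, c1, c2):
    """Mirror point p across the line through c1 and c2."""
    dx, dy = c2[0] - c1[0], c2[1] - c1[1]
    t = ((p[0] - c1[0]) * dx + (p[1] - c1[1]) * dy) / (dx * dx + dy * dy)
    fx, fy = c1[0] + t * dx, c1[1] + t * dy
    return (2 * fx - p[0], 2 * fy - p[1])


def locate(a, b, c, alpha_ab, alpha_ac, nominal):
    """Estimate the receiver position from two measured angles.

    Both circles pass through transmitter a, so their second intersection
    (the receiver) is the mirror image of a across the line of centers.
    The nominal position only decides on which side of each chord to look.
    """
    side_ab = 1 if cross(a, b, nominal) > 0 else -1
    side_ac = 1 if cross(a, c, nominal) > 0 else -1
    c1 = arc_center(a, b, alpha_ab, side_ab)
    c2 = arc_center(a, c, alpha_ac, side_ac)
    return reflect(a, c1, c2)


# Usage with hypothetical coordinates: one emitter at the origin, two on a
# circle of radius 100, and a receiver slightly off its nominal 40-degree slot.
a, b = (0.0, 0.0), (100.0, 0.0)
c = (100 * math.cos(math.radians(80)), 100 * math.sin(math.radians(80)))
nominal = (100 * math.cos(math.radians(40)), 100 * math.sin(math.radians(40)))
true_position = (78.0, 62.0)
estimate = locate(a, b, c,
                  viewing_angle(true_position, a, b),
                  viewing_angle(true_position, a, c),
                  nominal)
print(estimate)
```

The sketch assumes the receiver is not collinear with any pair of transmitters and that the two circles are distinct; otherwise the angle or the line of centers is undefined.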

A Synthetical Water Dispatching Model Give it or Give up

Competition: Mathematical Contest in Modelling
Award: Finalist (Top 2% among 27,205 teams)
Awarded by: The Consortium for Mathematics and Its Applications

Water Loss in Dams Due to Climate Change: A Mathematical Approach

Water loss in dams resulting from climate change has become a prominent problem in recent years, affecting daily life and production. Mathematical models are needed to help address this issue.

Problem 1

Problem 1 can be divided into three parts:

  • Service Area Coordination: Maps are rasterized, and service areas for the two dams are classified using a Comparative Optimization Algorithm.
  • Comprehensive Dispatching Model for Water:
    • Demand Side: An AIR Model is established to capture water demands, resulting in 11,858,569 m³ to be drawn from the Glen Canyon Dam and 40,978,282 m³ to be drawn from the Hoover Dam.
    • Supply Side: Water levels and water volumes are analyzed, with water-electricity generation fitted through Polynomial Interpolation, laying the foundation for subsequent analysis.
  • Dynamic Programming Model: Calculates the time until demands are unmet at fixed water levels. For the highest water level, the time is 495 days. Additional water as a function of time is derived (see Section 4.6).

To account for Mexico’s residual claims, a Water-Supply Corridor Model is proposed, balancing respect for rights and interests (see Section 4.7).

Problem 2

A Multi-Interest Tradeoff Model is developed using Goal Programming and Input-Output Theory, with economic benefits as the criteria. Four “players” with competing interests are identified. Results include:

  • 11,848,077 m³ drawn from the Glen Canyon Dam.
  • 39,125,274 m³ drawn from the Hoover Dam.
  • Reallocation results in increased water for industry and decreased water for agriculture (see Section 5.2, Table 5).

Problem 3

When supply cannot meet all water demand, specific approaches inspired by the NSGA-II Algorithm (a type of Genetic Algorithm) are recommended:

  • Reducing the scale of industries with low water-use efficiency and allocating more water to efficient industries.
  • Promoting technological innovation in industries with low water-use efficiency to improve resource utilization.

Conclusion

To ensure robustness, a sensitivity analysis is conducted, and a summary article containing findings and suggestions has been written for the Drought and Thirst Magazine.

Experience

  1. Quantitative Research Intern

    BlackWing Asset Management Co., Limited

    Responsibilities include:

    • Developed an automated stock data pipeline using Python to scrape stock codes and real-time data from the Shanghai Stock Exchange, and store it in an SQL database for continuous updates
    • Constructed high-frequency (tick-level) price-volume factors, achieving weekly annualized long-short Sharpe ratios >5.0 and long-only information ratios >3.0, with 30% of factors demonstrating monthly IC averages >5% and ICIR >0.65.
    • Drafted a comprehensive factor construction manual, compiled outlier values, and developed methodologies for factor refinement and optimization.
  2. Quantitative Research Intern, Financial Markets Department

    Chongqing Fumin Bank Co., Ltd.

    Responsibilities include:

    • Designed a system that generates real-time trading strategies based on funding gaps and bank transaction restrictions, using the Python DOcplex integer programming module to address liquidity risk and minimise costs for the bank, resulting in nearly 5 million RMB in annual savings (submitted as a patent).
    • Developed a pricing model for municipal bonds, bolstering trader negotiation capabilities.
    • Constructed corporate credit rating models based on machine learning models (XGBoost, etc.).
  3. Analyst Intern, Social Services and Food Research Group

    Orient Securities Co., Ltd.

    Responsibilities include:

    • Used the Wind database to write in-depth research reports on companies such as Jiumaojiu International Holdings Limited, and conducted expert interviews to analyze their competitiveness in the industry.
    • Created a grid-based city analysis model to gauge traffic and demographic metrics, aiding investors by estimating potential spaces for store count growth based on data collected through Python crawlers.
  4. President

    iOS Club (Certified by Apple Inc. and SWUFE)

    Responsibilities include:

    • Led the iOS Club (certified by Apple Inc. and SWUFE) with over 50 active members, organizing 10+ workshops and hackathons.
    • Secured partnerships with Apple Inc. and external organizations, enhancing resources and opportunities for members.
    • Mentored members who participated in 20+ tech competitions and showcased their work at regional tech events, contributing to the club’s recognition as a top student-led initiative at SWUFE.
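
For readers unfamiliar with the IC and ICIR figures quoted for the tick-level factors in the first internship entry, the sketch below shows one common way these are computed: IC as the per-period correlation between factor values and subsequent returns, and ICIR as the mean IC divided by its standard deviation. It is a generic illustration, not the firm's methodology. Pearson correlation is an assumption here (rank correlation is also widely used), and the function names and numbers are hypothetical.

```python
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def information_coefficients(factor_by_period, return_by_period):
    """One IC per period: correlation of factor values with next-period returns."""
    return [pearson(f, r) for f, r in zip(factor_by_period, return_by_period)]


def icir(ics):
    """Mean IC divided by the sample standard deviation of the ICs."""
    return mean(ics) / stdev(ics)


# Usage with made-up monthly ICs
print(icir([0.04, 0.06, 0.05]))
```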

Education

  1. MS Statistics

    Duke University

    GPA: 4.0/4.0

    Courses included:

    • Predictive Modeling (A); Statistics Programming (A); Theory of Inference (A)

    Courses (in progress):

    • Real Analysis 1; Bayesian Statistics Modeling; Numeric Analysis; Mixing Times for Markov Chains
  2. BS Financial Technology

    Southwestern University of Finance and Economics

    GPA: 4.1/5.0 (Top 5%)

    Courses included:

    • Math: Financial Stochastic Analysis; Multivariate Statistical Analysis; Probability and Mathematical Statistics; Discrete Mathematics; Optimization Theory; Advanced Algebra Ⅱ & Ⅰ; Mathematical Analysis Ⅱ & Ⅰ
    • Programming and Algorithms: Foundation of Programming (C/C++); Data Structures; Machine Learning; Deep Learning; Fintech; Financial Big Data Analysis; Big Data Processing; Database Theories and Application
    • Finance: Microeconomics; Macroeconomics; Political Economics; Accounting; Monetary Finance; Corporate Finance; Financial Econometrics I (linear regression) & Ⅱ (time series); Investments; Fixed Income Securities; Financial Risk Management; Financial Stochastic Analysis
Skills & Hobbies & Languages
Technical Skills
Python
R
Data Science
SQL
C
Hobbies
Reading
Kpop
Dog
Languages
English
Chinese
Japanese
Awards
National Scholarship 2022
Ministry of Education of China ∙ December 2022
University Second Honor Scholarship
SWUFE ∙ December 2021
University Second Honor Scholarship
SWUFE ∙ December 2023