
Personal Finance Health Predictor

Personal Project

Data Scientist / ML Engineer · Fall 2025 · Remote

Python · FastAPI · XGBoost · LightGBM · Scikit-learn

Overview

An end-to-end machine learning system covering the full data science lifecycle, from raw data exploration through deployment behind a REST API. The project comprises three predictive models for common fintech problems: credit risk assessment, fraud detection, and customer segmentation. It processes 285,000+ financial records across three datasets (German Credit, Lending Club, Credit Card Fraud) to deliver actionable insights and support automated decision-making.

Challenge

Financial services organizations face critical challenges in credit risk assessment, fraud detection, and customer segmentation. Traditional methods are inefficient, lack accuracy, and cannot scale to handle large volumes of transactions. There was a need for automated, data-driven solutions that could process hundreds of thousands of records, handle severely imbalanced datasets, and provide real-time predictions in production environments.

Solution

Developed an end-to-end ML platform with three specialized models: credit risk prediction using gradient-boosted ensembles (XGBoost, LightGBM), fraud detection combining anomaly detection (Isolation Forest) with SMOTE oversampling to counter class imbalance, and customer segmentation using multiple clustering algorithms. Built a FastAPI REST API with request validation, error handling, and automated tests. Implemented data engineering pipelines covering feature engineering, class-imbalance handling, and detection and resolution of data leakage.
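The core idea behind SMOTE can be illustrated with a short standard-library sketch. This is a simplified illustration, not the project's code and not the imbalanced-learn implementation listed in the stack: the function name `smote_like_oversample` and the toy data are hypothetical. It creates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours, and it oversamples only the training split so the held-out set contains real observations only.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest neighbours of `base` among the other minority samples.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data: two numeric features per transaction.
legit = [(float(i), float(i % 7)) for i in range(40)]
fraud = [(100.0 + i, 50.0 + 2 * i) for i in range(6)]

# Split first, so the held-out set is never touched by oversampling.
train_legit, test_legit = legit[:32], legit[32:]
train_fraud, test_fraud = fraud[:4], fraud[4:]

# Oversample the training minority up to the training majority size.
synthetic = smote_like_oversample(
    train_fraud, n_new=len(train_legit) - len(train_fraud)
)
balanced_fraud = train_fraud + synthetic
```

Oversampling before the split would let synthetic points derived from test rows reach the training set, which is one common source of the data leakage mentioned above.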

Key Features

Three production-ready ML models: Credit Risk, Fraud Detection, and Customer Segmentation

Comprehensive EDA with 26+ engineered features and intelligent feature scaling

Advanced handling of class imbalance using SMOTE and ensemble methods

Data leakage detection and resolution in real-world scenarios

RESTful API with FastAPI, Pydantic validation, and Swagger UI documentation

Comprehensive test suite with 10+ unit tests achieving full endpoint coverage

40+ publication-quality visualizations documenting model performance

Business impact analysis with actionable recommendations and ROI calculations
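Feature scaling and data leakage, both listed above, interact: scaling statistics learned from the full dataset leak information from the test split into training. The following is a minimal standard-library sketch of the leakage-safe pattern, assuming simple standardization. The helper names and the sample amounts are hypothetical, not taken from the project.

```python
from statistics import fmean, pstdev


def fit_standard_scaler(column):
    """Learn (mean, std) from training values only."""
    mean = fmean(column)
    std = pstdev(column) or 1.0  # fall back to 1.0 for constant columns
    return mean, std


def transform(column, mean, std):
    """Apply previously learned statistics to any split."""
    return [(x - mean) / std for x in column]


# Hypothetical transaction amounts.
train_amounts = [10.0, 20.0, 30.0, 40.0]
test_amounts = [25.0, 1000.0]

# Fit on the training split, then reuse the same statistics for the test split.
mean, std = fit_standard_scaler(train_amounts)
train_scaled = transform(train_amounts, mean, std)
test_scaled = transform(test_amounts, mean, std)
```

The extreme test value of 1000.0 does not influence `mean` or `std`, so the model sees test data exactly as it would see unseen production data.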

Results & Impact

1. Credit Risk: Achieved 70% ROC-AUC with XGBoost, identified top 15 risk indicators, estimated $500K+ annual cost savings

2. Fraud Detection: Achieved exceptional 97.7% ROC-AUC and 85.7% recall on severely imbalanced data (0.17% fraud rate), estimated $16K+ monthly fraud prevention savings

3. Customer Segmentation: Achieved 0.39 Silhouette Score, identified 4 distinct customer personas enabling 15-20% improvement in marketing campaign conversion rates

4. Processed 285,000+ financial records across three datasets with comprehensive EDA and feature engineering

5. Built production-ready FastAPI application with 3 prediction endpoints, comprehensive test suite, and interactive API documentation

6. Delivered combined systems with estimated ROI of 250%+ through operational efficiency gains
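The fraud results are reported as ROC-AUC and recall rather than accuracy, and a small sketch shows why that matters at a 0.17% fraud rate. The code below is an illustrative standard-library version of the three metrics, run on hypothetical toy labels rather than the project's data. In practice a library such as scikit-learn would likely compute them.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    """Share of actual positives that were flagged: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative,
    counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical sample: 10,000 transactions, 17 of them fraud (0.17%).
y_true = [1] * 17 + [0] * 9983

# A useless model that labels everything as legitimate.
all_legit_pred = [0] * len(y_true)
all_legit_scores = [0.0] * len(y_true)

naive_accuracy = accuracy(y_true, all_legit_pred)
naive_recall = recall(y_true, all_legit_pred)
naive_auc = roc_auc(y_true, all_legit_scores)

# A tiny ranking example: three of the four positive/negative pairs are ordered correctly.
small_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Here the do-nothing model reaches 99.83% accuracy while catching no fraud at all (recall 0.0, ROC-AUC 0.5), which is why recall and ROC-AUC are the more informative figures for this dataset.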

Technology Stack

Frontend

Jupyter Notebooks · Matplotlib · Seaborn · Plotly

Backend

Python 3.8+ · FastAPI · Uvicorn · Pydantic

Database

CSV · JSON · Pickle (Model Storage)

Tools

Git/GitHub · VS Code · Jupyter Lab · Pytest · Scikit-learn · XGBoost · LightGBM · Imbalanced-learn · Pandas · NumPy