Situation
This is the last part of the RevoBank analysis, in which we build a machine learning propensity model using logistic (logit) regression.
As a Data Analyst at RevoBank, I developed a propensity model to predict customers likely to adopt the new PayLater installment feature for a targeted promotion campaign.
Background
RevoBank launched a 3-month pilot for a PayLater installment feature on RevoShop transactions. The pilot demonstrated potential to increase credit card usage. To further boost activation, RevoBank proposed Project Contact - contacting customers to offer rewards for trying PayLater.
However, contacting all customers would be costly. RevoBank needed a targeted way to identify the highest potential customers worth contacting. By developing a propensity model using historical transaction data, I could predict each customer’s likelihood of activating PayLater if offered. This would enable RevoBank to only contact the most promising customers, saving costs while still driving sign-ups. My data-driven model provides the targeting capability needed to optimize Project Contact for maximum return.
Problem Definition
RevoBank needed a targeted way to identify customers likely to activate PayLater when contacted, which called for a propensity model.
Objective
The objectives of this analysis were to:
- Engineer features from historical transaction data, excluding future info
- Build a classification model to predict likelihood of PayLater activation
- Evaluate model accuracy in identifying high potential customers
- Rank customers by predicted activation probability
- Identify top customers to target for Project Contact promotion
- Enable data-backed optimization of outreach for higher ROI
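As a rough illustration of the first objective, the sketch below aggregates per-customer features using only transactions dated before the pilot started, so nothing from the pilot period leaks into the model inputs. The cutoff date, the record layout, and the feature names are all hypothetical; the actual dataset columns are described in the data dictionary.

```python
from collections import defaultdict
from datetime import date

# Hypothetical pilot start date; only transactions strictly before it are used.
PILOT_START = date(2023, 1, 1)

# Hypothetical transaction records: (customer_id, transaction_date, amount).
transactions = [
    ("C1", date(2022, 11, 5), 120.0),
    ("C1", date(2022, 12, 20), 80.0),
    ("C1", date(2023, 1, 15), 500.0),  # inside the pilot, so it must be excluded
    ("C2", date(2022, 10, 1), 40.0),
]

def build_features(rows, cutoff):
    """Aggregate per-customer features from rows dated before the cutoff."""
    totals = defaultdict(lambda: {"txn_count": 0, "total_amount": 0.0, "last_txn": None})
    for customer_id, txn_date, amount in rows:
        if txn_date >= cutoff:
            continue  # skip future information to avoid leakage
        f = totals[customer_id]
        f["txn_count"] += 1
        f["total_amount"] += amount
        if f["last_txn"] is None or txn_date > f["last_txn"]:
            f["last_txn"] = txn_date
    features = {}
    for customer_id, f in totals.items():
        features[customer_id] = {
            "txn_count": f["txn_count"],
            "total_amount": f["total_amount"],
            "avg_amount": f["total_amount"] / f["txn_count"],
            "days_since_last_txn": (cutoff - f["last_txn"]).days,
        }
    return features

features = build_features(transactions, PILOT_START)
```

Filtering on the cutoff before aggregating keeps every feature computable at the moment a contact decision would be made.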
Task & Action
Task | Action | Reason |
---|---|---|
Review data dictionary | Understood dataset contents, identified join keys | Prepared datasets for feature engineering |
Define model target variable | PayLater activation during pilot period as positive label | Aligned modeling goal to business objective |
Feature engineering | Created features from historical data, excluded future info | Input relevant info to model while avoiding leakage |
Create train/test split | Made separate datasets for training and evaluation | Assessed model generalizability |
Check correlations | Removed highly correlated variables | Reduced multicollinearity and the risk of overfitting |
Check class imbalance | Verified balance of positive/negative classes | Awareness of data distributions |
Build logit model | Created model predicting target variable using logit regression | Propensity modeling approach |
Evaluate model performance | Assessed accuracy, confusion matrix, decile analysis, ROC, and K-S test | Determined model suitability for goal |
Feature importance | Showed the importance of the selected features in the logit model | Helps RevoBank identify which factors matter most in predicting PayLater users |
Rank customers | Scored customers by predicted activation probability | Enabled selective targeting |
Apply benefit-cost analysis | Moved the top 30K customers from the ranked list into the BCA workspace | Estimated the benefit-cost ratio for the 30K potential customers to help RevoBank optimize its marketing acquisition cost |
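The last two steps in the table, ranking and benefit-cost analysis, can be sketched roughly as follows. This is a minimal illustration, not the BCA spreadsheet itself: the scores, the benefit per activation, and the cost per contact are made-up numbers, and the function names are hypothetical.

```python
def top_n_by_score(scores, n):
    """Return the n customer IDs with the highest predicted probability."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [customer_id for customer_id, _ in ranked[:n]]

def benefit_cost_ratio(probabilities, benefit_per_activation, cost_per_contact):
    """Expected benefit divided by total contact cost for the contacted group."""
    if not probabilities or cost_per_contact <= 0:
        raise ValueError("need at least one customer and a positive contact cost")
    expected_activations = sum(probabilities)
    total_cost = cost_per_contact * len(probabilities)
    return expected_activations * benefit_per_activation / total_cost

# Hypothetical predicted activation probabilities per customer.
scores = {"C1": 0.82, "C2": 0.15, "C3": 0.64, "C4": 0.40}

selected = top_n_by_score(scores, 2)  # the project used the top 30K instead
bcr = benefit_cost_ratio(
    [scores[c] for c in selected],
    benefit_per_activation=50.0,  # assumed value, for illustration only
    cost_per_contact=5.0,         # assumed value, for illustration only
)
```

A ratio above 1 means the expected benefit from the contacted group exceeds the cost of contacting it, under the assumed unit values.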
Result - Slide Deck
From this project, I obtained 72% accuracy, an ROC score of 61%-62%, and a K-S statistic of 48.33. The model still has considerable room for improvement, given the condition of the data. I have already improved the feature engineering process, so the overall model performance may be slightly better than the actual performance (key answer).
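For readers unfamiliar with the K-S statistic, the sketch below shows one common way to compute it for a binary classifier: the largest gap between the cumulative share of positives and the cumulative share of negatives when customers are sorted by predicted score. Reporting it on a 0-100 scale is an assumption about how the 48.33 figure is expressed, and the labels and scores are toy data, not project results.

```python
def ks_statistic(y_true, y_score):
    """Largest gap between cumulative positive and negative rates, scaled to 0-100."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes are required")
    # Walk through customers from highest to lowest predicted score.
    order = sorted(range(len(y_true)), key=lambda i: y_score[i], reverse=True)
    cum_pos = cum_neg = 0
    best = 0.0
    for rank, i in enumerate(order):
        if y_true[i] == 1:
            cum_pos += 1
        else:
            cum_neg += 1
        # Only measure the gap once a group of tied scores is complete.
        last_of_tie = rank + 1 == len(order) or y_score[order[rank + 1]] != y_score[i]
        if last_of_tie:
            best = max(best, abs(cum_pos / n_pos - cum_neg / n_neg))
    return 100.0 * best

# Toy labels (1 = activated PayLater) and predicted probabilities.
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
ks = ks_statistic(y_true, y_score)
```

A higher value means the scores separate activators from non-activators more cleanly; a model that ranks every activator above every non-activator reaches 100.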
Attachments
» Dataset Link « | » Data Dictionary Link « | » Pilot Data Link « | » BCA Spreadsheet Link « | » Colab Link «
I’m grateful you took the time to explore my portfolio site and view my work. Your interest means a lot.
Feel free to provide any feedback or ask questions - I welcome your thoughts. Thank you again for checking out my projects and considering my skills. I appreciate you taking the time to do so.