iGaming Retention Test - Part 4
Predicting Churn with Logistic Regression
Now that we've confirmed retention uplift, let's try to predict churn to understand which factors drive player loss.
Main Goals of the Day
- Train a Logistic Regression model to predict churn
- Use sessions, deposits, and feature usage as predictors
- Evaluate accuracy, recall, and precision
- Interpret feature coefficients
Step by Step
Step 1: Encoded features and the target (churn); see the encoding sketch below
Step 2: Split the dataset into train/test
Step 3: Fitted a logistic regression with sklearn
Step 4: Evaluated model performance metrics
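For Step 1, here is a minimal encoding sketch. It assumes feature_used arrives as True/False and churn as "Yes"/"No" flags; both raw types are illustrative assumptions, since the post only names the columns.

# df is the existing player-level DataFrame from the earlier parts of this series
# Assumed raw types (not stated in the post): feature_used is True/False, churn is "Yes"/"No"
df['feature_used'] = df['feature_used'].astype(int)
df['churn'] = df['churn'].map({'Yes': 1, 'No': 0})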
Insights
The model performs slightly above random:
AUC = 0.547, Accuracy = 54%, Recall = 52%.
Early churn models often struggle due to class imbalance; still, the model surfaces some interesting patterns.
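The imbalance point suggests one common mitigation, sketched here as an option rather than something from the original run: reweighting the classes when fitting.

from sklearn.linear_model import LogisticRegression
# class_weight='balanced' reweights classes inversely to their frequency,
# usually trading a little accuracy for better recall on the minority (churn) class.
# X_train / y_train come from the train/test split in the snippet below.
model = LogisticRegression(class_weight='balanced', max_iter=1000)
model.fit(X_train, y_train)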
Code Snippet
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
# Features and churn target
X = df[['sessions', 'deposits', 'feature_used']]
y = df['churn']
# Hold out a test set so the metrics are not computed on the training data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
# AUC on the held-out test set
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print("AUC:", auc)
Next Steps
- Add SQL-based aggregation to structure metrics and retention KPIs
- Enrich the dataset with session-level metrics before retraining