The Business Problem: The "Friction vs. Fraud" Trade-off
Our legacy authentication system was a blunt instrument. It operated on static, binary rules (e.g., If IP != Last IP → Force 2FA). This resulted in a high False Positive Rate (FPR), where legitimate users—especially travelers or those with dynamic IPs—were constantly nagged by 2FA challenges. This friction was a direct driver of user churn.
We needed a model that could surgically distinguish between "Latent Risk" and "New Context," allowing us to only spend our "friction budget" where it was actually needed.
The Architecture: Moving to Surgical Segmentation
We replaced the static rule engine with a dynamic risk scoring system. While deep learning was considered for its predictive power, we ultimately chose Decision Trees for three strategic reasons critical to Product Data Science:
- Auditability: In security, "Black Box" decisions are liabilities. We needed to explain to Customer Support exactly why a VIP user was blocked.
- Inference Speed: We needed sub-millisecond scoring during the login handshake without the overhead of heavy model serving infrastructure.
- Policy Generation: We didn't just need a probability; we needed clear cut-offs to define business logic.
LEGACY STATE (Static Logic)
User Login
⬇
Static Rule Check
⬇
GENERIC 2FA CHALLENGE
(Result: High friction for safe users, predictable patterns for fraudsters)

NEW STATE (Tree-Derived Logic)
User Login
⬇
Feature Engineering (Device Trust, Time-of-Day, Velocity)
⬇
Decision Tree Classifier
⬇
GRANULAR SEGMENTATION
(Result: "Passive Auth" (No 2FA) for 90% of sessions)Why "Old School" Won: Trees as Segmenters
Why "Old School" Won: Trees as Segmenters
The superpower of the Decision Tree in this context wasn't just classification; it was hyper-segmentation.
A logistic regression would have forced us to determine a single global probability threshold (e.g., Score > 0.7). This is often insufficient for global products. We found that risk factors like "Time of Day" were highly predictive for specific geos but irrelevant for others.
A regression struggles to "turn off" a variable for a specific subset of users without complex interaction terms. A tree, however, naturally isolates these sub-populations. It allowed us to automatically carve out "Safe Zones"—hyper-rectangles of logic where we could confidently suppress 2FA.
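One way to vet those Safe Zones is sketched below. It continues the previous sketch (clf and the holdout arrays are assumed, and the fraud-rate budget and session minimum are hypothetical): each leaf is kept only if its observed fraud rate on holdout traffic stays under a tight ceiling, and 2FA is suppressed only inside vetted leaves.

```python
# Sketch: translating tree leaves into "Safe Zones" where the 2FA challenge is suppressed.
# Assumes `clf` from the previous sketch plus holdout arrays X_val, y_val; the
# max_fraud_rate and min_sessions values are illustrative, not production numbers.
import numpy as np

def find_safe_leaves(clf, X_val, y_val, max_fraud_rate=0.001, min_sessions=500):
    """Leaf ids whose observed fraud rate on holdout traffic stays within the budget."""
    leaf_ids = clf.apply(X_val)              # which leaf each holdout session lands in
    safe = set()
    for leaf in np.unique(leaf_ids):
        mask = leaf_ids == leaf
        if mask.sum() >= min_sessions and y_val[mask].mean() <= max_fraud_rate:
            safe.add(int(leaf))
    return safe

# Login-time policy: challenge only when the session falls outside a vetted leaf.
# safe_leaves = find_safe_leaves(clf, X_holdout, y_holdout)
# needs_2fa   = clf.apply(row)[0] not in safe_leaves
```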
The Hidden Value: Data-Driven Definitions
The most underrated utility of this approach was using the model to settle business debates. Product and Engineering often argued over definitions: What is a "Trusted Device"? Is it 3 successful logins? 5?
Instead of guessing, we let the tree dictate the policy. We ran a tree predicting Future_Fraud_Event. The first split (e.g., Successful_Logins >= 4) gave us a mathematically justified threshold. We replaced "gut feel" rules with data-derived breakpoints.
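To make that concrete, here is a minimal version of the exercise, assuming scikit-learn; the column names and synthetic data are stand-ins for the real login history, with the target playing the role of Future_Fraud_Event.

```python
# Sketch: letting a shallow tree propose the "Trusted Device" cut-off instead of guessing it.
# Assumes scikit-learn; column names and the synthetic data are illustrative only.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

FEATURES = ["successful_logins", "device_age_days", "failed_attempts_7d"]

rng = np.random.default_rng(1)
X = rng.integers(0, 20, size=(5_000, len(FEATURES))).astype(float)
y = (rng.random(5_000) < 0.03).astype(int)          # stand-in for Future_Fraud_Event

tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=100).fit(X, y)

# Node 0 is the root; its split is the single most informative breakpoint in the data.
root_feature = FEATURES[tree.tree_.feature[0]]
root_threshold = tree.tree_.threshold[0]
print(f"First split: {root_feature} <= {root_threshold:.1f}")
# Which side of the split counts as "trusted" is read off the leaf fraud rates;
# the breakpoint then replaces the gut-feel rule in the policy engine.
```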
Impact & Optimization
- ➜ 13% Reduction in 2FA Challenges: We successfully identified low-risk segments that were previously flagged by static rules.
- ➜ Zero Increase in Fraud: The reduction in friction did not compromise security.
Going Deeper: RuleFit
To maximize performance without losing interpretability, we eventually upgraded to RuleFit. This allowed us to:
- Use a Random Forest to generate thousands of candidate rules (non-linear feature interactions).
- Use Lasso Regression to select only the sparse set of rules that actually mattered.
This gave us the accuracy of an ensemble method while retaining the "If-Then" auditability required for a security product.
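The sketch below approximates the RuleFit idea with plain scikit-learn: every leaf of a shallow Random Forest acts as a candidate "If-Then" rule, and an L1-penalised logistic regression (the classification analogue of Lasso) zeroes out all but a handful. A production system would use a dedicated RuleFit implementation; the dataset and hyperparameters here are placeholders.

```python
# Simplified RuleFit-style sketch using scikit-learn only: a Random Forest generates
# candidate rules (each leaf is a conjunction of splits), then an L1 penalty keeps
# only the sparse set that matters. Data and hyperparameters are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OneHotEncoder
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X = rng.random((20_000, 6))
y = (rng.random(20_000) < 0.02).astype(int)

# Step 1: grow shallow trees; every leaf is a candidate "If-Then" rule.
forest = RandomForestClassifier(n_estimators=50, max_depth=3, random_state=0).fit(X, y)
leaf_ids = forest.apply(X)                      # shape: (n_samples, n_trees)
rules = OneHotEncoder(handle_unknown="ignore").fit(leaf_ids)
R = rules.transform(leaf_ids)                   # binary rule-activation matrix

# Step 2: the L1 penalty zeroes out most rules, leaving an auditable shortlist.
selector = LogisticRegression(penalty="l1", solver="liblinear", C=0.05).fit(R, y)
n_kept = int(np.sum(selector.coef_ != 0))
print(f"{R.shape[1]} candidate rules -> {n_kept} kept after sparse selection")
```

The surviving rules read the same way as the original tree's branches, so the audit story for Support and Security stays intact even though an ensemble is doing the heavy lifting.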