Student Intelligence System

Diagnose. Plan. Improve.

Traceback analyzes learning signals to identify why each student is struggling, then generates a personalized recovery plan.

Total Students: 28,785 (OULAD dataset)
On Track: 13,134 (45.6% of students)
Need Intervention: 8,449 (Critical + High urgency)
Model Accuracy: 92.5% (XGBoost classifier)
Plans Generated: 28,785 (100% coverage)
[Charts: Root Cause Distribution (5 causes); Intervention Urgency; Avg Score by Root Cause; Avg Engagement by Final Result]
Project Documentation

How Traceback Works

A full machine learning pipeline built on the OULAD dataset to diagnose student learning problems and generate personalized interventions.

The Pipeline

Step 1: Data Cleaning & Merging

7 OULAD CSV files cleaned and merged into a master table of 28,785 students × 32 columns.
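
The merge in this step can be pictured as a left join on OULAD's student keys. The sketch below is a minimal plain-Python illustration, not the project's own code: the two row lists are made-up stand-ins for studentInfo.csv and studentVle.csv, and `merge_clicks` and `total_clicks` are hypothetical names. With pandas, the same step would typically be a groupby followed by a merge.

```python
from collections import defaultdict

# Made-up miniature rows standing in for studentInfo.csv and studentVle.csv.
student_info = [
    {"id_student": 1, "code_module": "AAA", "code_presentation": "2013J",
     "final_result": "Pass"},
    {"id_student": 2, "code_module": "AAA", "code_presentation": "2013J",
     "final_result": "Withdrawn"},
]
student_vle = [
    {"id_student": 1, "code_module": "AAA", "code_presentation": "2013J",
     "sum_click": 4},
    {"id_student": 1, "code_module": "AAA", "code_presentation": "2013J",
     "sum_click": 6},
]

# A student is identified per module and presentation, not by id alone.
KEY = ("id_student", "code_module", "code_presentation")


def key_of(row):
    """Build the composite join key for one row."""
    return tuple(row[k] for k in KEY)


def merge_clicks(info_rows, vle_rows):
    """Left-join total clicks onto the student rows (hypothetical helper)."""
    totals = defaultdict(int)
    for row in vle_rows:
        totals[key_of(row)] += row["sum_click"]
    # Students with no click records are kept and get 0 clicks.
    return [{**row, "total_clicks": totals.get(key_of(row), 0)}
            for row in info_rows]


master = merge_clicks(student_info, student_vle)
```

Keeping students with no click rows matters here, because a student with zero activity is exactly the case the later steps need to see.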

Step 2: Feature Engineering

10 signals normalized. 3 composite risk scores built: academic_risk, engagement_risk, persistence_risk.
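
As a rough illustration of this step, the sketch below min-max normalizes two raw signals and combines them into one composite score. The page does not state the actual formulas, so the choice of min-max scaling, the equal weights, the input signals, and the helper names `min_max` and `composite_risk` are all assumptions; only the name `engagement_risk` comes from the description above.

```python
def min_max(values):
    """Scale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to 0.0.
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def composite_risk(components, weights):
    """Weighted average of per-signal risks, each already in [0, 1]."""
    return sum(c * w for c, w in zip(components, weights)) / sum(weights)


# Made-up raw signals for three students.
total_clicks = [0, 50, 200]
active_days = [0, 10, 20]

clicks_norm = min_max(total_clicks)
active_days_norm = min_max(active_days)

# Higher engagement means lower risk, so each signal is inverted first.
# Equal weights are an assumption, not the project's setting.
engagement_risk = [
    composite_risk([1 - c, 1 - d], weights=[1, 1])
    for c, d in zip(clicks_norm, active_days_norm)
]
```

The same pattern would apply to academic_risk and persistence_risk with their own input signals.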

Step 3: Root Cause Classification

XGBoost classifier trained on 13 features. 92.5% accuracy across 5 root cause categories.
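
For readers checking what a multi-class accuracy figure means, here is a minimal sketch of how accuracy and per-class recall could be computed from predicted and true labels. The label lists are made up for illustration and are not Traceback's predictions; `accuracy` and `per_class_recall` are hypothetical helper names.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and equal in length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """For each true class, the share of its students labeled correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Made-up labels: one KNOWLEDGE_GAP student is misread as NEEDS_SUPPORT.
y_true = ["NO_ENGAGEMENT", "NO_ENGAGEMENT", "KNOWLEDGE_GAP", "KNOWLEDGE_GAP",
          "DECLINING", "EXAM_ANXIETY", "NEEDS_SUPPORT", "NEEDS_SUPPORT"]
y_pred = ["NO_ENGAGEMENT", "NO_ENGAGEMENT", "KNOWLEDGE_GAP", "NEEDS_SUPPORT",
          "DECLINING", "EXAM_ANXIETY", "NEEDS_SUPPORT", "NEEDS_SUPPORT"]

overall = accuracy(y_true, y_pred)
recall = per_class_recall(y_true, y_pred)
```

Per-class recall is worth reporting next to the headline number, since a single accuracy figure can hide a weak class when the five categories are unevenly sized.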

Step 4: Learning Plan Generation

Rule-based engine generates 4-step personalized plans using each student's real signal values.
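
One way such a rule-based engine could work is a table of four templates per root cause, filled in with the student's own signal values. The sketch below assumes that design; the template wording, the signal names, and `generate_plan` are invented for illustration and do not reproduce Traceback's actual rules.

```python
# Hypothetical templates: four steps per root cause, with placeholders
# that are filled from the student's signal values.
PLAN_TEMPLATES = {
    "NO_ENGAGEMENT": [
        "Log in on {target_days} days this week (you were active on {active_days}).",
        "Open the materials for the next assessment.",
        "Post one question in the course forum.",
        "Review progress with your tutor after one week.",
    ],
    "KNOWLEDGE_GAP": [
        "Revisit the topic behind your lowest score ({lowest_score}%).",
        "Work through two practice exercises on that topic.",
        "Retake the related quiz and aim for {target_score}%.",
        "Review progress with your tutor after one week.",
    ],
}


def generate_plan(root_cause, signals):
    """Return a 4-step plan for one student (hypothetical helper)."""
    templates = PLAN_TEMPLATES.get(root_cause)
    if templates is None:
        raise KeyError(f"no plan templates for root cause {root_cause!r}")
    return [f"Step {i}: {t.format(**signals)}"
            for i, t in enumerate(templates, start=1)]


plan = generate_plan("NO_ENGAGEMENT", {"target_days": 4, "active_days": 1})
```

Keeping the templates in data rather than in branching code makes each plan easy to audit, which suits a system whose output is shown directly to students.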

The 5 Root Causes

NO_ENGAGEMENT

Student barely interacts with the platform. An attendance problem, not a knowledge problem.

KNOWLEDGE_GAP

Engages with the platform but scores low despite the effort. A foundational concept is missing.

DECLINING

Was performing OK but scores are dropping. Something changed recently.

EXAM_ANXIETY

Good coursework scores but poor exam performance. Performance-under-pressure issue.

NEEDS_SUPPORT

Consistently close to, but below, the threshold. A small targeted push is needed.
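
To make the five definitions concrete, the sketch below restates them as simple predicates. It is only a reading aid: Traceback assigns root causes with a trained classifier, and every threshold here (the 0.1 engagement cutoff, the 10-point drop, the 5-point margin, the pass mark of 40) as well as the function and parameter names are assumptions.

```python
from typing import Optional


def label_root_cause(clicks_norm: float, avg_score: float, score_trend: float,
                     coursework_score: float, exam_score: float,
                     threshold: float = 40.0) -> Optional[str]:
    """Map one student's signals to a root cause, or None if on track.

    All cutoffs are illustrative assumptions, not the project's values.
    """
    if clicks_norm < 0.1:
        # Barely interacts with the platform.
        return "NO_ENGAGEMENT"
    if coursework_score >= threshold and exam_score < threshold:
        # Good coursework but poor exam performance.
        return "EXAM_ANXIETY"
    if score_trend < -10.0:
        # Scores have dropped noticeably from earlier work.
        return "DECLINING"
    if threshold - 5.0 <= avg_score < threshold:
        # Just below the threshold.
        return "NEEDS_SUPPORT"
    if avg_score < threshold:
        # Engaged but scoring well below the threshold.
        return "KNOWLEDGE_GAP"
    return None


cause = label_root_cause(clicks_norm=0.6, avg_score=50.0, score_trend=0.0,
                         coursework_score=65.0, exam_score=30.0)
```

The order of the checks matters: engagement is tested first, so a student who never logs in is not mislabeled as having a knowledge gap.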

Tech Stack

Built entirely in Python on Google Colab using open-source libraries and the OULAD dataset.

Python 3.12, pandas, numpy, scikit-learn, XGBoost, matplotlib, OULAD Dataset, Google Colab, HTML/CSS/JS

Key Results

CLASSIFIER ACCURACY: 92.5%
ENGAGEMENT SIGNAL: Distinction students average 28 clicks versus 3 for Withdrawn students, roughly a 9x difference.
TOP FEATURE: avg_score_norm accounts for 45.2% of the model's total feature importance. Score patterns outweigh engagement signals.
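
The engagement comparison above is a group average followed by a ratio. A minimal sketch of that calculation follows; the four rows are made up and chosen only so the group means match the reported 28 and 3 clicks, and `mean_by_group` and `avg_clicks` are hypothetical names.

```python
from collections import defaultdict


def mean_by_group(rows, group_key, value_key):
    """Average value_key within each distinct value of group_key."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        sums[row[group_key]] += row[value_key]
        counts[row[group_key]] += 1
    return {group: sums[group] / counts[group] for group in sums}


# Made-up rows, picked so the means equal the reported averages.
rows = [
    {"final_result": "Distinction", "avg_clicks": 30},
    {"final_result": "Distinction", "avg_clicks": 26},
    {"final_result": "Withdrawn", "avg_clicks": 2},
    {"final_result": "Withdrawn", "avg_clicks": 4},
]

means = mean_by_group(rows, "final_result", "avg_clicks")
ratio = means["Distinction"] / means["Withdrawn"]
```

With means of 28 and 3, the ratio is about 9.3, which the summary rounds to 9x.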