🎯 PDB Leaderboard

Precision-aware evaluation of LLM debugging capabilities

PDB-Single Leaderboard

 

# Model Precision ▼ Recall Pass@1

PDB-Wild Leaderboard

484 examples: 256 multi-line synthesized bugs (BigCodeBench + LiveCodeBench) and 228 real-world repository bugs (SWE-bench).

# Model Precision ▼ Recall Pass@1

Leaderboard Submission

Anonymous release for NeurIPS 2026 Datasets & Benchmarks review. Submission instructions and contact details will be added in the camera-ready version.