🎯 PDB Leaderboard
Precision-aware evaluation of LLM debugging capabilities
PDB-Single Leaderboard
| # | Model | Precision ▼ | Recall | Pass@1 |
|---|---|---|---|---|
PDB-Wild Leaderboard
484 examples: 256 synthesized multi-line bugs (from BigCodeBench and LiveCodeBench) and 228 real-world repository bugs (from SWE-bench).
| # | Model | Precision ▼ | Recall | Pass@1 |
|---|---|---|---|---|
Leaderboard Submission
Anonymous release for review at the NeurIPS 2026 Datasets & Benchmarks track. Submission instructions and contact details will be added in the camera-ready version.