Model Rankings
Quick Snapshot
| # | Model | RDAB Score | Tasks | Avg. Cost / Task ($) | Provider |
|---|---|---|---|---|---|
All Runs
| Task | Title | Difficulty | Category | Model | Correctness | Code Quality | Efficiency | Statistical Validity | RDAB Score | Tokens | Cost ($) | Score / $ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
Most LLMs get the right answer. RDAB checks whether they got there the right way.