SWE-Perf Performance Leaderboard

This leaderboard evaluates AI models on code performance optimization tasks. Each model is assessed on three key metrics:

  • Apply (%): Can the generated patch be applied cleanly to the codebase?
  • Correctness (%): Do all tests still pass after the patch is applied?
  • Performance (%): How much runtime is saved once the patch is applied?
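
The aggregation of these three metrics could be sketched as follows. This is a hypothetical illustration, not the official SWE-Perf harness: the `TaskResult` fields and the choice to count failed or incorrect patches as 0% runtime saved are assumptions.

```python
# Hypothetical sketch of aggregating per-task results into the three
# leaderboard metrics; field names and scoring details are assumptions.
from dataclasses import dataclass

@dataclass
class TaskResult:
    applied: bool          # patch applied cleanly
    tests_pass: bool       # all tests still pass after the patch
    base_runtime: float    # seconds before optimization
    new_runtime: float     # seconds after optimization

def leaderboard_metrics(results):
    n = len(results)
    apply_pct = 100 * sum(r.applied for r in results) / n
    correct_pct = 100 * sum(r.applied and r.tests_pass for r in results) / n
    # Performance: mean fraction of runtime saved, with failed or
    # incorrect patches contributing 0% (an assumed convention).
    gains = [
        max(0.0, (r.base_runtime - r.new_runtime) / r.base_runtime)
        if r.applied and r.tests_pass else 0.0
        for r in results
    ]
    perf_pct = 100 * sum(gains) / n
    return apply_pct, correct_pct, perf_pct

results = [
    TaskResult(True, True, 10.0, 6.0),    # 40% faster
    TaskResult(True, False, 10.0, 5.0),   # tests broke: no credit
    TaskResult(False, False, 10.0, 10.0), # patch did not apply
]
print(leaderboard_metrics(results))
```

Note that Correctness is gated on Apply (a patch that does not apply cannot pass tests), so the three percentages are non-increasing in this sketch.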

End-to-End: Models generate complete patches on their own, with no hints about which parts of the codebase to change.
Oracle: Models are given the specific files that need optimization.