SWE-Perf Performance Leaderboard

This leaderboard evaluates AI models on code performance optimization tasks. Each model is assessed on three key metrics:

  • Apply (%): Can the generated patch be applied cleanly to the codebase?
  • Correctness (%): Do all tests still pass after the patch is applied?
  • Performance (%): How much runtime is saved once the patch is applied?
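
The aggregation of these three metrics could be sketched as follows. This is a hypothetical illustration, not the official SWE-Perf harness: the `TaskResult` fields and the choice to count failed or incorrect patches as 0% runtime saved are assumptions.

```python
# Hypothetical sketch of aggregating per-task results into the three
# leaderboard metrics; field names and scoring details are assumptions.
from dataclasses import dataclass

@dataclass
class TaskResult:
    applied: bool          # patch applied cleanly
    tests_pass: bool       # all tests still pass after the patch
    base_runtime: float    # seconds before optimization
    new_runtime: float     # seconds after optimization

def leaderboard_metrics(results):
    n = len(results)
    apply_pct = 100 * sum(r.applied for r in results) / n
    correct_pct = 100 * sum(r.applied and r.tests_pass for r in results) / n
    # Performance: mean fraction of runtime saved, with failed or
    # incorrect patches contributing 0% (an assumed convention).
    gains = [
        max(0.0, (r.base_runtime - r.new_runtime) / r.base_runtime)
        if r.applied and r.tests_pass else 0.0
        for r in results
    ]
    perf_pct = 100 * sum(gains) / n
    return apply_pct, correct_pct, perf_pct

results = [
    TaskResult(True, True, 10.0, 6.0),    # 40% faster
    TaskResult(True, False, 10.0, 5.0),   # tests broke: no credit
    TaskResult(False, False, 10.0, 10.0), # patch did not apply
]
print(leaderboard_metrics(results))
```

Note that Correctness is gated on Apply (a patch that does not apply cannot pass tests), so the three percentages are non-increasing in this sketch.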

End-to-End: Models generate complete patches on their own, with no hints about which parts of the codebase to change.
Oracle: Models are given the specific files that need optimization.