How we test
Bought, used & scored by humans.
Every tool is paid for at full retail, out of our own pocket — no vendor demos, no sponsored access. Then real people run the same 240 prompts across writing, code, research, data and translation: the work you’d hand it day to day.
Two reviewers score every output blind, and the weighted result of their scores is what you see. Never a number a vendor gave us, never a verdict an AI wrote. We re-run the whole benchmark every quarter, because this leaderboard moves fast.
Read full methodology →
01
Paid at retail
Every tool bought as a normal user on the paid plan — no comped accounts, no vendor demos.
02
Run on real work
The same 240 prompts across writing, code, research, data and translation — tasks from actual workdays.
03
Scored blind by humans
Two reviewers grade each output side by side. No AI grading, no vendor-supplied numbers.
04
Re-run every quarter
The whole benchmark is rebuilt and rescored every 90 days — stale rankings get retired fast.