The Lab
We give the same task to multiple agentic AI tools, watch them work, and publish the full transcripts alongside our scoring. The agenticness score is derived from observed behavior, not vendor documentation.
Before you read any trial, read the methodology. The protocol is the part you should argue with — the trial of the day is downstream of it.
The Lab Methodology
Same task, same starter repo, same model versions, same scoring rubric, same observer. Plus the five-point public pact we make before any trial runs — including “we publish trials a paying vendor lost.”
Read the methodologyPublished trials
No trials published yet
Trial #1 — “Three coding agents, one OAuth task” — is in the lab. The methodology is locked first; the first trial ships after.
Get a one-line note when a trial publishes
One email per trial. No drip, no upsells, no roundup digests.