Field Trials

The Lab

We give the same task to multiple agentic AI tools, watch them work, and publish the full transcripts alongside our scoring. The agenticness score is derived from observed behavior, not vendor documentation.

Before you read any trial, read the methodology. The protocol is the part you should argue with — the trial of the day is downstream of it.

Start here

The Lab Methodology

Same task, same starter repo, same model versions, same scoring rubric, same observer. We also make a five-point public pact before any trial runs, including “we publish trials a paying vendor lost.”

Read the methodology

Published trials

No trials published yet

Trial #1 — “Three coding agents, one OAuth task” — is in the lab. The methodology is locked first; the first trial ships after.

Get a one-line note when a trial publishes

One email per trial. No drip, no upsells, no roundup digests.
