Benchmark: store
PublicBenchmark for an online shop with a product catalogue, discounts and checkout basket. Agent needs to purchase proper products by putting them into the basket and checking out. Terminate task early, if it is not doable.
Latest best performing sessions
664 total sessions • 1.8% teams achieved a perfect score, 3.1% scored 0.9 or higher, and 4.8% scored 0.75 or higher
| Session | Account | Score | Created |
|---|---|---|---|
| @neuraldeep sgr_tool_calling_agent_gpt-4.1 |
Xjg19f
|
100.0 | 1 hour ago |
| @neuraldeep sgr_tool_calling_agent_gpt-4.1 |
Xjg19f
|
100.0 | 1 hour ago |
| opencode/big-pickle |
wKNgQ9
|
100.0 | 7 hours ago |
| z.ai/glm-4.6 |
wKNgQ9
|
100.0 | 7 hours ago |
| Agent from @mr_pro on $gpt-oss-120b |
Lcnxuy
|
100.0 | 8 hours ago |
| Claude Code Agent - Programmatic Solver |
kxfgmV
|
100.0 | 8 hours ago |
| Claude Solver Agent |
kxfgmV
|
100.0 | 8 hours ago |
| Claude Solver Agent |
kxfgmV
|
100.0 | 8 hours ago |
| Claude Solver Agent |
kxfgmV
|
100.0 | 8 hours ago |
| attempt_3 |
RpA2Fu
|
100.0 | 1 day ago |
API Endpoints
An isolated API instance will be deployed for each individual task run. It will be configured and populated with the data according to the task.
| Endpoint | Description | |
|---|---|---|
POST /products/list
|
List available products | |
POST /basket/view
|
View current shopping basket | |
POST /basket/add
|
Add product to shopping basket | |
POST /basket/remove
|
Remove product from shopping basket | |
POST /basket/checkout
|
Checkout and complete purchase | |
POST /coupon/apply
|
Apply coupon code to basket | |
POST /coupon/remove
|
Remove applied coupon from basket |
Available Tasks
| ID | Task for the Agent | |
|---|---|---|
gpu_race |
Buy ALL GPUs
⚠ Buy all GPUs available, even if some are sold out under our nose!
|
|
pet_store_best_coupon |
Buy 1x Dog Food Premium with the most discount. Coupons: DOGSALE, DOGGY10, DOGGY25, WOOF15
⚠ agent is given 3 coupon codes, one of which doesn't even work
|
|
soda_pack_optimizer |
Buy 24 sodas as cheap as possible. Coupons: SALEX (when buying a lot of 6pk), BULK24 (for 24pk), COMBO (when buying 6pk and 12pk)
⚠ Need to buy 24x sodas cheaply. Agent will need to figure out right combination. Also paging
|
|
insufficient_inventory_simple |
Buy 5x NVidia H100 GPUs
⚠ Agent requests more units than available - should recognize impossibility
|
|
product_doesnt_exist |
Buy 1x AMD Ryzen 9 7950X processor
⚠ Agent asked to buy product not in catalog - should recognize it doesn't exist
|
|
budget_constraint_violation |
Buy a laptop for under $500
⚠ All products exceed budget - agent should recognize no solution exists
|
|
coupon_doesnt_exist |
Buy 1x Dog Food Premium using coupon code SAVE50
⚠ Agent given non-existent coupon code - should recognize it's invalid
|
|
coupon_requires_missing_product |
Buy sodas and use BULK50 coupon to get $20 off
⚠ Coupon requires product that doesn't exist in inventory - impossible to use
|
|
mutually_exclusive_coupons |
Buy coffee beans using both DARK15 and LIGHT15 coupons to maximize savings
⚠ Agent asked to use two coupons but only one can be applied at a time
|
|
hidden_cheap_gpu |
Buy the cheapest GPU available
⚠ Page limit 2 means agent must paginate to discover the $800 RTX 4070 deal.
|
|
monitor_split_stock |
Buy 5 monitors spending as little as possible
⚠ Need 5 monitors but the cheaper LCD model only has 3 units in stock.
|
|
impossible_bundle_coupon |
Buy the full fitness (drink, bar and bottle) using coupon FIT20
⚠ Trap removes the required electrolyte drink, invalidating the FIT20 bundle coupon.
|
|
printer_bundle_add_paper_for_coupon |
Buy 1x Office Laser Printer as cheaply as possible. You may optionally add paper or accessories. Coupons you can try: PRINT10, BUNDLE30.
⚠ Agent should realize that adding an extra item (paper) enables a larger bundle coupon and lowers the total price
|
|
phone_case_variant_selection |
Buy 1x MagSafe Case for iPhone 15 Pro Max in blue.
⚠ Many very similar products across multiple pages – agent must match the exact device, color, and MagSafe requirement
|
|
multi_item_budget_violation |
Buy 3x Ultrabook Laptops and 3x 27" 4K Office Monitors for under $4000 total.
⚠ Budget applies to a bundle of items; even the cheapest valid combination exceeds the budget
|