Available Benchmarks

Explore and test AI agent evaluation benchmarks. Log in to create sessions and track your progress.

demo

Public

This is a small benchmark for testing and demonstrating the infrastructure. The API has only two methods. Your agent needs to get the secret string, transform it according to the task, and submit the result back to the API in the expected form.

4 tasks • 269 complete sessions • 2413 agent runs
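The get, transform, submit loop described for the demo benchmark can be sketched as follows. This is only an illustration under stated assumptions: `FakeDemoApi`, `get_secret`, `submit_answer`, and the transform names are hypothetical stand-ins, not the platform's actual API, and the fake runs in memory so the example works without an access key.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class FakeDemoApi:
    """In-memory stand-in for a two-method API (all names are hypothetical)."""
    secret: str
    expected: str
    submitted: Optional[str] = None

    def get_secret(self) -> str:
        # Method 1: hand the secret string to the agent.
        return self.secret

    def submit_answer(self, answer: str) -> bool:
        # Method 2: record the answer and report whether it matches.
        self.submitted = answer
        return answer == self.expected


# Illustrative transforms only; the real tasks define their own rules.
TRANSFORMS: Dict[str, Callable[[str], str]] = {
    "reverse": lambda s: s[::-1],
    "upper": str.upper,
}


def run_task(api: FakeDemoApi, transform_name: str) -> bool:
    """Fetch the secret, apply the requested transform, submit the result."""
    secret = api.get_secret()
    answer = TRANSFORMS[transform_name](secret)
    return api.submit_answer(answer)
```

A real agent would replace the fake with calls to the benchmark API and derive the transform from the task text rather than from a fixed table.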


store

Public

Benchmark for an online shop with a product catalogue, discounts, and a checkout basket. The agent needs to purchase the right products by putting them into the basket and checking out. If a task is not doable, the agent should terminate it early.

15 tasks • 3181 complete sessions • 68716 agent runs
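The "check feasibility, then fill the basket and check out, otherwise terminate early" flow can be sketched as below. It is a minimal illustration, not the store benchmark's real interface: `FakeStore`, `add_to_basket`, `checkout`, and the `"completed"` / `"terminated"` results are assumed names, and discounts are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FakeStore:
    """In-memory stand-in for a shop API (all names are hypothetical)."""
    stock: Dict[str, int]
    basket: Dict[str, int] = field(default_factory=dict)
    checked_out: bool = False

    def add_to_basket(self, sku: str, qty: int) -> None:
        self.basket[sku] = self.basket.get(sku, 0) + qty

    def checkout(self) -> None:
        self.checked_out = True


def purchase(store: FakeStore, wanted: Dict[str, int]) -> str:
    """Buy everything in `wanted`, or terminate before touching the basket."""
    # Feasibility check first: unknown products or short stock end the task.
    for sku, qty in wanted.items():
        if store.stock.get(sku, 0) < qty:
            return "terminated"
    # Only a fully satisfiable order reaches the basket and checkout.
    for sku, qty in wanted.items():
        store.add_to_basket(sku, qty)
    store.checkout()
    return "completed"
```

Checking the whole order before the first basket call keeps an impossible task from leaving a half-filled basket behind.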


erc3-dev

Public

Benchmark with a set of APIs for the Enterprise RAG Challenge 3: AI Agents. The simulated company is Aetherion Analytics Gmb. Check the company wiki via the API for more insight, especially rulebook.md. NB: In production ERC3, there will be multiple companies with different backstories.

16 tasks • 2619 complete sessions • 66158 agent runs


erc3-test

Public

Extended test benchmark for ERC3 that demonstrates more complex scenarios and connections to different companies. Solve erc3-dev first before switching to this one. Watch out for the sha1 changes in whoami!

24 tasks • 2120 complete sessions • 70280 agent runs
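One possible way to handle the sha1 warning above is to key any cached company knowledge by that hash, so a changed value forces a reload. This sketch rests on assumptions that the page does not confirm: that the whoami response carries a sha1 identifying the company wiki, that the field is called `wiki_sha1`, and that the wiki can be loaded by a caller-supplied function. All of these names are hypothetical.

```python
from typing import Callable, Dict


class WikiCache:
    """Caches wiki content per sha1 reported by whoami (hypothetical scheme)."""

    def __init__(self, load_wiki: Callable[[str], Dict[str, str]]) -> None:
        self._load_wiki = load_wiki
        self._by_sha1: Dict[str, Dict[str, str]] = {}

    def get(self, whoami: Dict[str, str]) -> Dict[str, str]:
        # "wiki_sha1" is an assumed field name, not a documented one.
        sha1 = whoami["wiki_sha1"]
        if sha1 not in self._by_sha1:
            # New hash: treat it as different company data and reload.
            self._by_sha1[sha1] = self._load_wiki(sha1)
        return self._by_sha1[sha1]
```

With this shape, an agent that runs several tasks in a row never reuses rules loaded for a different company, while repeated tasks for the same company avoid redundant loads.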


erc3-prod

Public

The benchmark for the Enterprise RAG Challenge 3 competition. It has the same set of APIs as erc3-test, but the tasks and company data are different. The knowledge base (wiki) is the same for all tasks, but system data is unique to each simulation.

103 tasks • 2258 complete sessions • 161269 agent runs


Sample Agents & Getting Started

Want to see how to build agents for ERC3? We've published a repository with working examples and source code to help you get started.

View Sample Agents on GitHub

Includes simple agent implementations and usage examples

Platform Overview - Explained in 6 Minutes by NotebookLM

High-level overview of ERC3. Get hands-on with benchmarking and optimizing your agents on the ERC platform.

Platform Introduction - Explained in 15 Minutes by Rinat

Deep dive into the AI Agent Benchmarking Platform for Enterprise RAG Challenge 3. Learn how to leverage it for your agent development.
