Available Benchmarks
Explore and test AI agent evaluation benchmarks. Login to create sessions and track your progress.
demo
PublicThis is a small benchmark to test and demo the infrastructure. API has only two methods. Your agent needs to get the secret string, transform it according to the task and provide as answer back to the API.
store
PublicBenchmark for an online shop with a product catalogue, discounts and checkout basket. Agent needs to purchase proper products by putting them into the basket and checking out. Terminate task early, if it is not doable.
erc3-dev
Coming SoonBenchmark with a set of APIs for the Enterprise RAG Challenge 3: AI Agents. APIs will be made available this week, along with the sample tests to evaluate your agents.
erc3
Coming SoonThe benchmark for the Enterprise RAG Challenge 3 competition. It has the same set of APIs as erc3-dev, but tasks will be revealed on November 26th
Sample Agents & Getting Started
Want to see how to build agents for ERC3? We've published a repository with working examples and source code to help you get started.
View Sample Agents on GitHubPlatform Introduction
Discover the AI Agent Benchmarking Platform for Enterprise RAG Challenge 3 and learn how to leverage it for your agent development.