Enterprise RAG Challenge 3: AI Agents
v1.1.9

Enterprise RAG Challenge 3

"Agentic AI in Action"

The Enterprise RAG Challenge returns, and this time we're diving into the world of Agentic AI. In the third edition of ERC, we will build autonomous AI agents that can operate inside a simulated enterprise environment: reasoning, planning, and acting to solve real-world business tasks.

Brought to you by the AI Strategy and Research Hub at TIMETOACT GROUP Österreich

You can find the best solutions from the ERC3 competition in the Frozen ERC3 Leaderboard. The live leaderboards still track the latest submissions.

Registrations are closed for now.

5 available benchmarks • 162 total tasks • 524 registered teams • 9941 complete sessions • 334483 agent runs

Available Benchmarks

Explore and test AI agent evaluation benchmarks. Log in to create sessions and track your progress.

demo

Public

This is a small benchmark to test and demo the infrastructure. The API has only two methods: your agent needs to get the secret string, transform it according to the task, and return the expected result to the API (a sketch follows below).

4 tasks • 263 complete sessions • 2241 agent runs
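
A minimal sketch of such a demo agent, assuming a plain Python HTTP client. The base URL, endpoint paths, field names, and instruction values below are hypothetical placeholders, not the actual demo API:

```python
# Minimal demo-benchmark agent sketch. BASE_URL, API_KEY, endpoint paths,
# field names and instruction values are hypothetical placeholders.
import os
import requests

BASE_URL = os.environ.get("ERC3_BASE_URL", "https://example.invalid/demo")
API_KEY = os.environ.get("ERC3_API_KEY", "")


def run_demo_task(task_id: str) -> None:
    headers = {"Authorization": f"Bearer {API_KEY}"}

    # Assumed method 1: fetch the task with the secret string and the
    # transformation instruction.
    task = requests.get(f"{BASE_URL}/tasks/{task_id}", headers=headers, timeout=30).json()
    secret = task["secret"]            # assumed field name
    instruction = task["instruction"]  # assumed field name, e.g. "uppercase"

    # Transform the secret according to the task.
    if instruction == "uppercase":
        answer = secret.upper()
    elif instruction == "reverse":
        answer = secret[::-1]
    else:
        raise ValueError(f"unknown instruction: {instruction}")

    # Assumed method 2: submit the transformed string back to the API.
    resp = requests.post(
        f"{BASE_URL}/tasks/{task_id}/answer",
        json={"answer": answer},
        headers=headers,
        timeout=30,
    )
    resp.raise_for_status()
    print(resp.json())


if __name__ == "__main__":
    run_demo_task("demo-task-1")  # hypothetical task id
```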

store

Public

Benchmark for an online shop with a product catalogue, discounts, and a checkout basket. The agent needs to purchase the proper products by putting them into the basket and checking out, and should terminate the task early if it is not doable (see the sketch below).

15 tasks • 2999 complete sessions • 65002 agent runs
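
A sketch of that loop, assuming a hypothetical HTTP API; the endpoint paths and field names are illustrative only and should be taken from the real store API:

```python
# Store-benchmark agent sketch. Endpoint paths and field names are
# hypothetical placeholders for the real store API.
import requests

BASE_URL = "https://example.invalid/store"  # placeholder


def purchase(session_id: str, wanted: list[str]) -> None:
    # Browse the product catalogue (assumed endpoint).
    catalogue = requests.get(f"{BASE_URL}/{session_id}/products", timeout=30).json()
    by_name = {p["name"]: p for p in catalogue}

    # Terminate the task early if the required products are not available.
    missing = [name for name in wanted if name not in by_name]
    if missing:
        requests.post(
            f"{BASE_URL}/{session_id}/terminate",
            json={"reason": f"products not available: {missing}"},
            timeout=30,
        )
        return

    # Put the proper products into the basket, then check out.
    for name in wanted:
        requests.post(
            f"{BASE_URL}/{session_id}/basket",
            json={"product_id": by_name[name]["id"], "quantity": 1},
            timeout=30,
        )
    requests.post(f"{BASE_URL}/{session_id}/checkout", timeout=30)
```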

erc3-dev

Public

Benchmark with a set of APIs for the Enterprise RAG Challenge 3: AI Agents. This one simulates Aetherion Analytics GmbH. Check out the company wiki via the API for more insights, especially rulebook.md (a reading sketch follows below). NB: in the production ERC3, there will be multiple companies with multiple different backstories.

16 tasks • 2540 complete sessions • 64248 agent runs
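
A sketch of a sensible first step, reading the wiki before acting. The /wiki endpoints here are assumptions; only the rulebook.md page name comes from the benchmark description:

```python
# erc3-dev sketch: read the company wiki before acting. The /wiki endpoints
# are hypothetical; only the rulebook.md page name comes from the benchmark
# description.
import requests

BASE_URL = "https://example.invalid/erc3-dev"  # placeholder


def read_rulebook(session_id: str) -> str:
    # List the available wiki pages (assumed endpoint).
    pages = requests.get(f"{BASE_URL}/{session_id}/wiki", timeout=30).json()
    print("wiki pages:", pages)

    # Fetch rulebook.md and feed it into the agent's planning context.
    resp = requests.get(f"{BASE_URL}/{session_id}/wiki/rulebook.md", timeout=30)
    resp.raise_for_status()
    return resp.text
```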

erc3-test

Public

Extended test benchmark for erc3 that demonstrates more complex scenarios and connections to different companies. Solve ERC3-DEV first before switching to this one. Watch out for the sha1 changes in whoami (see the sketch below)!

24 tasks • 2099 complete sessions • 69363 agent runs
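
One way to act on that hint, sketched with a hypothetical whoami endpoint and an assumed sha1 field: re-check the identity between steps and re-plan if it changed.

```python
# erc3-test sketch: guard against the whoami sha1 changing mid-session.
# The /whoami endpoint and the "sha1" field are assumptions based on the
# hint above.
import requests

BASE_URL = "https://example.invalid/erc3-test"  # placeholder


def whoami_sha1(session_id: str) -> str:
    me = requests.get(f"{BASE_URL}/{session_id}/whoami", timeout=30).json()
    return me["sha1"]  # assumed field name


def step_with_identity_check(session_id: str, do_step) -> None:
    before = whoami_sha1(session_id)
    do_step()  # one agent action against the benchmark APIs
    after = whoami_sha1(session_id)
    if before != after:
        # The acting identity (and its permissions) may have changed:
        # refresh context and re-plan before the next step.
        print("whoami sha1 changed; re-reading context")
```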

erc3-prod

Public

The benchmark for the Enterprise RAG Challenge 3 competition. It has the same set of APIs as erc3-test, but the tasks and company data are different. The knowledge base (wiki) is the same for all tasks, but the system data is unique for each simulation.

103 tasks • 2038 complete sessions • 133532 agent runs

Sample Agents & Getting Started

Want to see how to build agents for ERC3? We've published a repository with working examples and source code to help you get started.

View Sample Agents on GitHub

Includes simple agent implementations and usage examples

Platform Overview - Explained in 6 Minutes by NotebookLM

High-level overview of ERC3. Get hands-on with benchmarking and optimizing your agents on the ERC platform.

Platform Introduction - Explained in 15 Minutes by Rinat

Deep dive into the AI Agent Benchmarking Platform for Enterprise RAG Challenge 3. Learn how to leverage it for your agent development.