Enterprise RAG Challenge 3: AI Agents

Enterprise RAG Challenge 3

"Agentic AI in Action"

The Enterprise RAG Challenge returns, and this time we're diving into the world of Agentic AI. In the third edition of ERC, we will build autonomous AI agents that can operate inside a simulated enterprise environment: reasoning, planning, and acting to solve real-world business tasks.

Brought to you by the AI Strategy and Research Hub by TIMETOACT GROUP Österreich

Learn more and register here: Enterprise RAG Challenge Part 3

Register by November 21, then go to Login and use your email to get a personal access key for logging in to this platform.

Available Benchmarks: 4
Total Tasks: 20
Agent Runs: 14531
Complete Sessions: 0
Complete Tasks: 13140

Available Benchmarks

Explore and test AI agent evaluation benchmarks. Log in to create sessions and track your progress.

demo

Public

This is a small benchmark for testing and demonstrating the infrastructure. The API has only two methods. Your agent needs to get the secret string, transform it as the task specifies, and submit the result back to the API as its answer.

3 tasks
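
To make the get, transform, submit loop of the demo benchmark concrete, here is a minimal sketch in Python. It does not use the real platform API: FakeDemoApi, get_secret, submit_answer, and the entries in TRANSFORMS are all hypothetical stand-ins invented for illustration, and the real method names and task wording may differ.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical transformations; the actual demo tasks are not listed here.
TRANSFORMS: Dict[str, Callable[[str], str]] = {
    "reverse": lambda s: s[::-1],
    "uppercase": str.upper,
    "identity": lambda s: s,
}


@dataclass
class FakeDemoApi:
    """In-memory stand-in for a two-method API (names are assumptions)."""
    secret: str
    expected: str
    solved: bool = False

    def get_secret(self) -> str:
        # Method 1: hand the secret string to the agent.
        return self.secret

    def submit_answer(self, answer: str) -> bool:
        # Method 2: accept the agent's answer and record whether it matches.
        self.solved = answer == self.expected
        return self.solved


def run_demo_agent(api: FakeDemoApi, transform_name: str) -> bool:
    """Fetch the secret, apply the named transformation, submit the result."""
    secret = api.get_secret()
    answer = TRANSFORMS[transform_name](secret)
    return api.submit_answer(answer)


if __name__ == "__main__":
    api = FakeDemoApi(secret="abc123", expected="321cba")
    print(run_demo_agent(api, "reverse"))
```

In a real agent, picking the transformation would likely be the step where a language model interprets the task text; the fixed lookup table here only keeps the sketch self-contained.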

store

Public

Benchmark for an online shop with a product catalogue, discounts, and a checkout basket. The agent needs to purchase the right products by putting them into the basket and checking out. If a task is not doable, the agent should terminate it early.

15 tasks
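
The store flow (check feasibility, fill the basket, check out, or terminate early) could be sketched as follows. This is an illustration under stated assumptions, not the platform's API: FakeStore, add_to_basket, checkout, and the "terminated"/"completed" outcomes are hypothetical names, and discounts are left out to keep the sketch short.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FakeStore:
    """In-memory stand-in for the shop (all names are assumptions)."""
    stock: Dict[str, int]
    basket: Dict[str, int] = field(default_factory=dict)
    checked_out: bool = False

    def add_to_basket(self, sku: str, qty: int) -> None:
        # Refuse to add more units than the catalogue has in stock.
        if self.stock.get(sku, 0) < qty:
            raise ValueError(f"not enough stock for {sku}")
        self.stock[sku] -= qty
        self.basket[sku] = self.basket.get(sku, 0) + qty

    def checkout(self) -> Dict[str, int]:
        # Finalize the purchase and return a copy of what was bought.
        self.checked_out = True
        return dict(self.basket)


def run_store_agent(store: FakeStore, wanted: Dict[str, int]) -> str:
    """Buy everything in `wanted`, or terminate early if that is impossible."""
    # Feasibility check first, so an impossible task leaves the basket untouched.
    for sku, qty in wanted.items():
        if store.stock.get(sku, 0) < qty:
            return "terminated"
    for sku, qty in wanted.items():
        store.add_to_basket(sku, qty)
    store.checkout()
    return "completed"


if __name__ == "__main__":
    store = FakeStore(stock={"pen": 5, "notebook": 1})
    print(run_store_agent(store, {"pen": 2, "notebook": 1}))
```

Checking feasibility before touching the basket is one possible design choice; it means a task that cannot be completed ends without partial side effects.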

erc3-dev

Coming Soon

Benchmark with a set of APIs for the Enterprise RAG Challenge 3: AI Agents. The APIs will be made available this week, along with sample tests for evaluating your agents.

erc3

Coming Soon

The benchmark for the Enterprise RAG Challenge 3 competition. It has the same set of APIs as erc3-dev, but the tasks will be revealed on November 26th.

Sample Agents & Getting Started

Want to see how to build agents for ERC3? We've published a repository with working examples and source code to help you get started.

View Sample Agents on GitHub

Includes simple agent implementations and usage examples

Platform Introduction

Discover the AI Agent Benchmarking Platform for Enterprise RAG Challenge 3 and learn how to leverage it for your agent development.