Enterprise RAG Challenge 3: AI Agents

Version 1.1.5

December 4, 2025

Improvements

Leaderboards now fold repeated submissions that have the same score, so more distinct submissions can surface.
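
The folding behaviour described above can be sketched as follows. This is a minimal illustration, not the leaderboard's actual code. It assumes that "repeat submissions" means submissions from the same team with exactly equal scores, and that the folded entry keeps the earliest timestamp. `Submission`, `LeaderboardEntry` and `fold_submissions` are hypothetical names.

```python
from dataclasses import dataclass
from itertools import groupby
from operator import attrgetter


@dataclass(frozen=True)
class Submission:
    team: str
    score: float
    submitted_at: int  # hypothetical timestamp, e.g. epoch seconds


@dataclass
class LeaderboardEntry:
    team: str
    score: float
    first_submitted_at: int
    repeats: int  # number of submissions folded into this entry


def fold_submissions(subs: list[Submission]) -> list[LeaderboardEntry]:
    """Fold submissions from the same team that share a score into one entry."""
    # Sort so that equal (team, score) pairs are adjacent, earliest first
    ordered = sorted(subs, key=attrgetter("team", "score", "submitted_at"))
    entries = []
    for (team, score), group in groupby(ordered, key=attrgetter("team", "score")):
        members = list(group)
        entries.append(LeaderboardEntry(team, score, members[0].submitted_at, len(members)))
    # Best score first; ties go to whoever submitted first
    entries.sort(key=lambda e: (-e.score, e.first_submitted_at))
    return entries
```

With the submissions (alpha, 0.8), (alpha, 0.8), (beta, 0.7) and (alpha, 0.9), the sketch produces three entries instead of four, and alpha's 0.8 entry records two repeats.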

Version 1.1.4

December 3, 2025

Improvements

New version of the Python SDK: 1.1.4. You can now start anonymous tasks without spinning up a session:

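```python
import textwrap
from erc3 import ERC3

core = ERC3()

# Start an anonymous task on the erc3-test benchmark (no session required)
task = core.start_new_task("erc3-test", "name_a_project")

# Do something with the task, e.g. query the simulated system
client = core.get_erc_client(task)
print(client.who_am_i().wiki_sha1)

# Complete the task and print the evaluation, if one is returned
done = core.complete_task(task)
if done.eval:
    logs = textwrap.indent(done.eval.logs, "  ")
    print(f"Score: {done.eval.score}\n{logs}")
```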

Version 1.1.3

December 2, 2025

Improvements

  • Limited event history to 100 events per simulation (99% of AI agents finish with fewer than 15 events)
  • LLM telemetry without any token usage is now reported as broken, and the error message explains the expected fields
  • Performance improvements to speed up simulations
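
A per-simulation event cap like the one above could look roughly like this. It is a sketch under one assumption the notes do not settle: that events past the limit are rejected with an error rather than silently dropping older ones. `SimulationHistory` and `EventLimitExceeded` are hypothetical names.

```python
MAX_EVENTS = 100  # cap stated in the release notes


class EventLimitExceeded(Exception):
    """Raised when a simulation tries to record more than its event cap."""


class SimulationHistory:
    """Hypothetical per-simulation event log with a hard upper bound."""

    def __init__(self, max_events: int = MAX_EVENTS) -> None:
        self.max_events = max_events
        self.events: list[dict] = []

    def append(self, event: dict) -> None:
        # Reject (rather than evict) once the cap is reached: assumed behaviour
        if len(self.events) >= self.max_events:
            raise EventLimitExceeded(
                f"event history is limited to {self.max_events} events per simulation"
            )
        self.events.append(event)
```

Rejecting keeps the recorded history complete up to the cap, which matters if evaluations replay it. Since almost all agents stay far below 100 events, the limit mainly guards against runaway loops.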
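
The telemetry check can be illustrated with a small validator. It flags a usage report with zero total tokens as broken and lists what each expected field means in the error message. The field names `prompt_tokens` and `completion_tokens` are assumptions borrowed from common LLM usage formats, not confirmed ERC3 telemetry fields, and `validate_llm_telemetry` is a hypothetical helper.

```python
# Hypothetical telemetry fields and their meaning, used in the error message
REQUIRED_USAGE_FIELDS = {
    "prompt_tokens": "tokens sent to the model in the request",
    "completion_tokens": "tokens generated by the model in the response",
}


def validate_llm_telemetry(usage: dict) -> None:
    """Raise ValueError if the report contains no token usage at all."""
    # Missing or None values count as zero
    total = sum(int(usage.get(field) or 0) for field in REQUIRED_USAGE_FIELDS)
    if total <= 0:
        details = "; ".join(f"{name}: {desc}" for name, desc in REQUIRED_USAGE_FIELDS.items())
        raise ValueError(
            f"broken LLM telemetry: no token usage reported. Expected fields: {details}"
        )
```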

Version 1.1.2

December 1, 2025

Improvements

  • Event histories are now cached with an LRU cache in the simulation state machine to reduce CPU load.
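
One way to apply an LRU cache to event histories in Python is `functools.lru_cache`. The sketch below assumes histories are rebuilt from an append-only raw event store (the hypothetical `RAW_EVENTS`), and it includes the event count in the cache key. A new event therefore produces a new key, so a stale history is never served, and old entries age out of the LRU on their own. The `.upper()` call merely stands in for whatever expensive parsing or replay the real state machine performs.

```python
from functools import lru_cache

# Hypothetical append-only store: simulation id -> raw events
RAW_EVENTS: dict[str, list[str]] = {}


@lru_cache(maxsize=256)
def event_history(sim_id: str, event_count: int) -> tuple[str, ...]:
    """Rebuild the first event_count events of a simulation (cached)."""
    # Stand-in for costly parsing/replay; returns an immutable tuple
    return tuple(e.upper() for e in RAW_EVENTS[sim_id][:event_count])


def current_history(sim_id: str) -> tuple[str, ...]:
    # Keying on the current length invalidates naturally when events are added
    return event_history(sim_id, len(RAW_EVENTS[sim_id]))
```

Returning a tuple keeps callers from mutating a cached value that other callers share.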

Bug Fixes

  • Fixed the /projects/search API to return the correct next offset in the ERC3-DEV and ERC3-TEST benchmarks. Thanks to @nick for the bug report!
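
The notes do not describe what was wrong with the offset. As a general reference, here is a minimal sketch of offset pagination where the next offset equals the current offset plus the number of items actually returned, and `None` signals the last page. The `None` sentinel, `Page` and `search_projects` are assumptions for illustration, not the ERC3 API's actual response shape.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    items: list[str]
    next_offset: Optional[int]  # assumed convention: None when no results remain


def search_projects(projects: list[str], query: str, offset: int = 0, limit: int = 10) -> Page:
    """Hypothetical case-insensitive substring search with offset pagination."""
    matches = [p for p in projects if query.lower() in p.lower()]
    items = matches[offset:offset + limit]
    end = offset + len(items)
    # Only advertise a next offset if more matches exist beyond this page
    return Page(items, end if end < len(matches) else None)


def fetch_all(projects: list[str], query: str, limit: int = 10) -> list[str]:
    """Client loop: follow next_offset until the server says there is no more."""
    results: list[str] = []
    offset: Optional[int] = 0
    while offset is not None:
        page = search_projects(projects, query, offset, limit)
        results.extend(page.items)
        offset = page.next_offset
    return results
```

An agent paging through results relies on this contract: a next offset that repeats or skips would either loop forever or silently miss projects.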

Version 1.1.1

December 1, 2025

Improvements

  • API delay can now be configured per benchmark. It is enabled only for ERC3-TEST; ERC3-DEV, DEMO and STORE have no delay. Thanks to @AigizK for the feedback.
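
A per-benchmark delay can be expressed as a lookup table consulted before each simulated API call. In this sketch, "erc3-test" matches the benchmark name used in the SDK example, while "erc3-dev", "demo" and "store" are hypothetical identifiers. The 0.3 s value is an assumption carried over from the 300 ms delay introduced in 1.1.0. `simulated_api_call` is a hypothetical wrapper.

```python
import time
from typing import Any, Callable

# Hypothetical per-benchmark delays in seconds; only ERC3-TEST is delayed
API_DELAY_SECONDS = {
    "erc3-test": 0.3,  # assumed from the 300 ms delay added in 1.1.0
    "erc3-dev": 0.0,
    "demo": 0.0,
    "store": 0.0,
}


def simulated_api_call(benchmark: str, handler: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Run a simulated-system handler, sleeping first if the benchmark is delayed."""
    delay = API_DELAY_SECONDS.get(benchmark, 0.0)  # unknown benchmarks: no delay
    if delay > 0:
        time.sleep(delay)
    return handler(*args, **kwargs)
```

Keeping the delay in configuration rather than in each handler lets development benchmarks run fast while the test benchmark keeps realistic latency.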

Bug Fixes

  • Fixed ERC3-TEST evals: add_time_entry_lead_v2 and add_time_entry_lead_v3 (no longer expect events) and nordic_followup_customers (more precise task requirement). Thanks to @xsirni for reporting these!

Version 1.1.0

December 1, 2025

Features

  • Added ERC3-TEST benchmark with more complex cases
  • Added release notes capability
  • Added a version button in the header that is highlighted for 3 days after a release
  • Dedicated release notes page with markdown rendering
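
The 3-day highlight rule for the version button reduces to a date comparison. This sketch assumes the window covers the release day plus the following two days, so the button is no longer highlighted once three full days have passed. `should_highlight_version` is a hypothetical name.

```python
from datetime import date, timedelta

HIGHLIGHT_WINDOW = timedelta(days=3)


def should_highlight_version(release_date: date, today: date) -> bool:
    """True while the latest release is less than 3 days old (assumed boundary)."""
    age = today - release_date
    return timedelta(0) <= age < HIGHLIGHT_WINDOW
```

For a release on December 4, 2025, the button would be highlighted from December 4 through December 6 and not on December 7.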

Improvements

  • Enhanced navigation and user experience
  • Better visibility of platform updates
  • Statistics now show total number of teams registered
  • API calls to simulated systems now have a 300 ms delay, which is more realistic
  • Benchmarks with available sample agents now link to these samples

Bug Fixes

  • Fixed an edge case in copying the seed slice that sometimes caused the broken_system test to return answers. Thanks to @mr_pro for reporting it.
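
The release notes do not state the root cause of the seed-slice bug. For reference only, here is a generic illustration of one common Python pitfall of this kind. A slice such as `seed[:]` copies the outer list but shares the nested objects, so a simulation that mutates its "copy" also changes the seed that later simulations start from. `copy.deepcopy` avoids this. The seed data and function names are hypothetical.

```python
import copy


def shallow_slice(seed: list[dict]) -> list[dict]:
    # New list object, but the same dict objects as the seed
    return seed[:]


def deep_slice(seed: list[dict]) -> list[dict]:
    # Independent copies of every nested object
    return copy.deepcopy(seed)
```

Mutating `shallow_slice(seed)[0]["status"]` also changes `seed[0]["status"]`, whereas the same mutation on `deep_slice(seed)` leaves the seed intact.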