Corral

A framework for the science of agents and agents for science

Corral provides utilities that support research into agent methodologies and simplify the creation, deployment, and evaluation of scientific agents and environments.


How Corral Works

A microservice architecture ensuring flexibility, scalability, and robust isolation.

Environments

The "world" the agent interacts with. An environment defines the task space and the available tools, and provides observable feedback. Examples range from chemistry labs to HPC clusters.
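The idea of an environment as a task, a set of tools, and observable feedback can be sketched in a few lines of plain Python. This is an illustration only and does not use Corral's API: `ToyEnvironment`, `step`, and `lookup_molar_mass` are hypothetical names invented for this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ToyEnvironment:
    """Hypothetical environment: a task description plus named tools."""
    task: str
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def step(self, tool_name: str, argument: str) -> str:
        """Run one tool call and return the observation the agent would see."""
        if tool_name not in self.tools:
            # Unknown tools still produce feedback instead of raising.
            return f"error: unknown tool '{tool_name}'"
        return self.tools[tool_name](argument)


# Illustrative tool backed by a tiny hard-coded table (assumption, not real data access).
MOLAR_MASS = {"H2O": 18.015, "CO2": 44.009}


def lookup_molar_mass(formula: str) -> str:
    if formula not in MOLAR_MASS:
        return f"error: no entry for '{formula}'"
    return f"{MOLAR_MASS[formula]} g/mol"


env = ToyEnvironment(
    task="Report the molar mass of water.",
    tools={"lookup_molar_mass": lookup_molar_mass},
)

print(env.step("lookup_molar_mass", "H2O"))  # 18.015 g/mol
print(env.step("titrate", "HCl"))            # error: unknown tool 'titrate'
```

Returning errors as observations, rather than raising, keeps every tool call observable to the agent, which is one plausible reading of "observable feedback".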

Agents

Modular entities for perception and decision-making. Built with LLMs using scaffolds like ReAct, ToolCalling, LLMPlanner, and Reflection.
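As a rough illustration of what a ReAct-style scaffold does, the sketch below alternates between a reasoning step, a tool call, and an observation until the agent decides to finish. It is not Corral's implementation: `react_loop` and `scripted_policy` are hypothetical names, and a hard-coded function stands in for the LLM.

```python
from typing import Callable, Dict, List, Tuple

# Each step is (thought, action, argument); the action "finish" ends the episode.
Step = Tuple[str, str, str]


def scripted_policy(history: List[str]) -> Step:
    """Hypothetical stand-in for an LLM call, hard-coded for illustration."""
    if not history:
        return ("I need the boiling point first.", "lookup", "ethanol")
    return ("I have what I need.", "finish", history[-1])


def react_loop(
    policy: Callable[[List[str]], Step],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> str:
    """Alternate reasoning and acting until the policy finishes or the budget runs out."""
    history: List[str] = []
    for _ in range(max_steps):
        thought, action, argument = policy(history)
        if action == "finish":
            return argument
        # Act, then feed the observation back into the next reasoning step.
        history.append(tools[action](argument))
    return "no answer within step budget"


# Toy tool with a single illustrative entry.
tools = {"lookup": lambda name: {"ethanol": "78.37 C"}.get(name, "unknown")}
answer = react_loop(scripted_policy, tools)
print(answer)  # 78.37 C
```

Swapping `scripted_policy` for a function that calls a real model would leave the loop unchanged, which is the sense in which a scaffold is modular.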

Tasks

Define problems for agents to solve, each with a scoring function for evaluation. Chain tasks into TaskGroups for complex multi-stage challenges.
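A minimal sketch of the task-plus-scoring-function idea, assuming an exact-match scorer and a group that scores its tasks in order. `ToyTask`, `ToyTaskGroup`, and `exact_match` are hypothetical names for illustration and are not Corral's `TaskGroup` API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ToyTask:
    """Hypothetical task: a prompt plus a function that scores an answer."""
    prompt: str
    score: Callable[[str], float]


@dataclass
class ToyTaskGroup:
    """Hypothetical group: an ordered chain of tasks scored stage by stage."""
    tasks: List[ToyTask]

    def evaluate(self, answers: List[str]) -> List[float]:
        if len(answers) != len(self.tasks):
            raise ValueError("need exactly one answer per task")
        return [task.score(answer) for task, answer in zip(self.tasks, answers)]


def exact_match(expected: str) -> Callable[[str], float]:
    """Build a scorer that returns 1.0 on a case-insensitive exact match, else 0.0."""
    def score(answer: str) -> float:
        return 1.0 if answer.strip().lower() == expected.lower() else 0.0
    return score


group = ToyTaskGroup([
    ToyTask("Which element has atomic number 6?", exact_match("carbon")),
    ToyTask("Give its chemical symbol.", exact_match("C")),
])
scores = group.evaluate(["Carbon", "Co"])
print(scores)  # [1.0, 0.0]
```

Keeping the scorer as a plain callable means a stage can use exact match, a numeric tolerance, or any other rule without changing the group.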

Decoupled Architecture

Corral separates agents from environments via a client-server design with REST API communication.

CorralServer
Hosts and manages environments and exposes the interaction interface via CorralRouter.
CorralRunner
Executes agents, orchestrates their lifecycle, feeds them observations, and relays their actions.
Figure: Corral architecture diagram showing CorralRunner (with the Agent) communicating via REST API with CorralServer (with CorralRouter and the Environment).
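The separation described above can be sketched without any networking: the runner side below only ever sees JSON strings, so the in-process call could be replaced by an HTTP request without touching the agent loop. Everything here is an assumption made for illustration, including the class `ToyServer`, the function `run_episode`, and the `/reset` and `/step` paths, which are not taken from Corral's actual REST interface.

```python
import json
from typing import Callable, Dict


class ToyServer:
    """Hypothetical server side: owns environment state and routes requests."""

    def __init__(self) -> None:
        self.counter = 0
        # Hypothetical route table, standing in for a router component.
        self.routes: Dict[str, Callable[[dict], dict]] = {
            "/reset": self._reset,
            "/step": self._step,
        }

    def _reset(self, payload: dict) -> dict:
        self.counter = 0
        return {"observation": self.counter, "done": False}

    def _step(self, payload: dict) -> dict:
        self.counter += int(payload["increment"])
        return {"observation": self.counter, "done": self.counter >= 3}

    def handle(self, path: str, body: str) -> str:
        """Take a JSON request body and return a JSON response body."""
        if path not in self.routes:
            return json.dumps({"error": f"no route {path}"})
        return json.dumps(self.routes[path](json.loads(body)))


def run_episode(send: Callable[[str, str], str], max_steps: int = 10) -> int:
    """Runner side: talks to the environment only through `send` and JSON."""
    reply = json.loads(send("/reset", "{}"))
    steps = 0
    while not reply["done"] and steps < max_steps:
        # A trivial fixed action stands in for an agent's decision.
        reply = json.loads(send("/step", json.dumps({"increment": 1})))
        steps += 1
    return steps


server = ToyServer()
steps_taken = run_episode(server.handle)
print(steps_taken)  # 3
```

Because `run_episode` depends only on a `send(path, body)` callable, the environment can live in another process or on another machine, which is the isolation benefit a client-server split is meant to provide.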

Environments

Pre-built scientific environments spanning chemistry, physics, materials science, and more.

Foundational Principles

Cite this work

If you use Corral in your research, please consider citing:

@article{ríos-garcía2026ai,
  title   = {AI scientists produce results without reasoning scientifically},
  author  = {Martiño Ríos-García and Nawaf Alampara and Chandan Gupta and Indrajeet Mandal and Sajid Mannan and Ali Asghar Aghajani and N. M. Anoop Krishnan and Kevin Maik Jablonka},
  year    = {2026},
  journal = {arXiv preprint arXiv:2604.18805}
}

Ready to benchmark?

Start evaluating AI agents on scientific tasks in minutes.
