Technical descriptions of Corral's components and APIs.#

REST API Endpoints#

Task Management#

GET /tasks Returns: list[str] - Available task IDs

GET /tasks/{task_id}/prompt Returns: {"prompt": str} - Task prompt without tools

GET /tasks/{task_id}/guide?verbosity={level} Parameters: verbosity (str, optional) Returns: {"prompt": str} - Complete task guide

GET /tasks/{task_id}/tools?verbosity={level} Parameters: verbosity (str, optional) Returns: {"tools": list[dict]} - Structured tool definitions

GET /tasks/{task_id}/tools/guide?verbosity={level} Parameters: verbosity (str, optional) Returns: {"prompt": str} - Tools guide only

Task Execution#

POST /tasks/{task_id}/tools/execute Body: {"tool_name": str, "arguments": dict} Returns: {"result": ToolCall} - Tool execution result

POST /tasks/{task_id}/submit Body: {"answer": str} Returns: TrialCompletionResponse - Score and trial data

POST /tasks/{task_id}/surrender Returns: TrialCompletionResponse - Surrender confirmation

POST /tasks/{task_id}/configure Returns: {"status": str} - Configuration status

State Queries#

GET /tasks/{task_id}/status Returns: Current task state dictionary

GET /tasks/{task_id}/trials/{trial_id} Returns: {"trial_state": dict} - Specific trial state

GET /tasks/{task_id}/last_score Returns: {"score": float, "trial_id": str} - Most recent score

GET /dependency_chain Returns: {"dependency_chain": bool} - Chain support flag


Class: CorralRouter#

Interface for agent-environment communication.

Constructor:

interface = CorralRouter(
    base_url="http://localhost:8000",
    default_verbosity="brief",
)

Parameters:

  • base_url (str): URL of the Corral server

  • default_verbosity (str | None): Default tool verbosity level

Methods:

get_available_tasks() -> list[str]#

Returns list of available task IDs.

Example:

interface = CorralRouter(base_url="http://localhost:8000")
tasks = interface.get_available_tasks()
# Returns: ["task_1", "task_2", "task_3"]

get_task_prompt(task_id: str) -> str | list[dict]#

Returns task prompt without tool descriptions.

Parameters:

  • task_id (str): Task identifier

Returns: Task description string or list of message dicts

get_task_guide(task_id: str, verbosity: str | None = None) -> str#

Returns complete task guide including tools.

Parameters:

  • task_id (str): Task identifier

  • verbosity (str | None): Tool description verbosity level

Returns: Complete task guide with tool descriptions

get_tools_guide(task_id: str, verbosity: str | None = None) -> str#

Returns tools guide only.

Parameters:

  • task_id (str): Task identifier

  • verbosity (str | None): Tool description verbosity level

Returns: Tool descriptions

get_available_tools_for_task(task_id: str, verbosity: str | None = None) -> dict#

Returns structured tool information.

Parameters:

  • task_id (str): Task identifier

  • verbosity (str | None): Tool description verbosity level

Returns: Dictionary with tool definitions

execute_tool(task_id: str, tool_name: str, arguments: dict) -> ToolResponse#

Executes a tool in the environment.

Parameters:

  • task_id (str): Task identifier

  • tool_name (str): Name of tool to execute

  • arguments (dict): Tool arguments

Returns: ToolResponse object with success, result, and error fields

Example:

response = interface.execute_tool(
    "task_1", "calculator", {"operation": "+", "a": 10, "b": 5}
)
print(response.success)  # True
print(response.result)  # 15

submit_answer(task_id: str, answer: str) -> TaskTrialResult#

Submits final answer for scoring.

Parameters:

  • task_id (str): Task identifier

  • answer (str): Final answer

Returns: TaskTrialResult with score and state

surrender_task(task_id: str) -> TaskTrialResult#

Surrenders the current task without submitting an answer.

Parameters:

  • task_id (str): Task identifier

Returns: TaskTrialResult with surrendered=True and score=0.0

get_task_status(task_id: str) -> dict#

Returns current task state.

Parameters:

  • task_id (str): Task identifier

Returns: Dictionary with current state including score, trial_id, etc.

supports_dependency_chain() -> bool#

Checks whether the environment supports task chaining.

Returns: Boolean indicating chain support


Environment Class#

Abstract Class: Environment#

Base class for all Corral environments.

Constructor:

Environment(
    task_id,
    base_work_dir="tmp",
    fs_manager=None,
)

Parameters:

  • task_id (str): Unique task identifier

  • base_work_dir (str): Base directory for trial workspaces

  • fs_manager (FSManager | None): Optional file system manager

Attributes:

  • task_id (str): Task identifier

  • tools (dict[str, Tool]): Available tools

  • state (TaskState): Current trial state

  • trial_counter (int): Trial number counter

  • current_work_dir (str): Current trial workspace

Abstract Methods (must implement):

get_task_prompt() -> str | list[dict]#

Returns task description for agent.

score() -> float#

Evaluates the agent's solution and returns a score (0.0 to 1.0).

Concrete Methods:

add_tool(tool: Tool) -> None#

Registers a tool with the environment.

Parameters:

  • tool (Tool): Tool to add

call_tool(tool_name: str, arguments: dict) -> ToolCall#

Executes a tool and records the call.

Parameters:

  • tool_name (str): Tool name

  • arguments (dict): Tool arguments

Returns: ToolCall object with result and status

submit_answer(answer: str) -> float#

Submits answer and computes score.

Parameters:

  • answer (str): Final answer

Returns: Score value

surrender() -> float#

Marks task as surrendered.

Returns: Current score (usually 0.0)

reset_state() -> str#

Resets environment for new trial.

Returns: New trial ID

get_current_work_dir() -> str#

Returns current trial workspace path.

configure_additional_apps() -> str#

Optional: Configure external services.

Returns: Configuration status message
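
To illustrate the abstract contract, here is a self-contained toy that mirrors the `Environment` interface described above. It is a minimal stand-in, not the real base class, and `ExactMatchEnv` with its prompt and answer are invented for illustration:

```python
from abc import ABC, abstractmethod

class ToyEnvironment(ABC):
    """Minimal stand-in mirroring the Environment contract above."""

    def __init__(self, task_id: str):
        self.task_id = task_id
        self.tools = {}
        self.trial_counter = 0
        self._answer = None

    @abstractmethod
    def get_task_prompt(self) -> str: ...

    @abstractmethod
    def score(self) -> float: ...

    def submit_answer(self, answer: str) -> float:
        # Record the answer, then delegate scoring to the subclass.
        self._answer = answer
        return self.score()

    def surrender(self) -> float:
        return 0.0

    def reset_state(self) -> str:
        # New trial: bump the counter and hand back a fresh trial ID.
        self.trial_counter += 1
        return f"{self.task_id}_trial_{self.trial_counter}"

class ExactMatchEnv(ToyEnvironment):
    """Scores 1.0 for an exact answer match, 0.0 otherwise."""

    def get_task_prompt(self) -> str:
        return "What is 10 + 5?"

    def score(self) -> float:
        return 1.0 if self._answer == "15" else 0.0
```

A real subclass would additionally register tools via `add_tool()` and manage its trial workspace.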


BaseAgent Class#

Abstract Class: BaseAgent#

Base class for all Corral agents.

Constructor:

BaseAgent(
    model="openai/gpt-4o",
    max_iterations=10,
    api_endpoint=None,
    system_prompt=None,
    user_prompt=None,
    extractor_prompt=None,
    surrender_prompt=None,
    temperature=0.7,
    hooks=None,
    **kwargs,
)

Parameters:

  • model (str): LLM model identifier

  • max_iterations (int): Maximum reasoning iterations

  • api_endpoint (str | None): Optional API endpoint

  • system_prompt (str | None): System prompt template

  • user_prompt (str | None): User prompt template

  • extractor_prompt (str | None): Answer extraction prompt

  • surrender_prompt (str | None): Surrender instructions

  • temperature (float): LLM sampling temperature

  • hooks (AgentHooks | None): Lifecycle hooks

  • **kwargs: Additional LLM parameters

Attributes:

  • model (str): Model name

  • messages (list): Conversation history

  • token_usage (dict): Token usage statistics

  • hooks (AgentHooks): Registered hooks

Abstract Methods:

run(interface, task_id, history=None, task_prompt=None, examples=None, **kwargs) -> str#

Main agent reasoning loop. Must be implemented by subclasses.

Parameters:

  • interface (CorralRouter): Router interface

  • task_id (str): Task to solve

  • history (list | None): Conversation history

  • task_prompt (str | None): Override task prompt

  • examples (list | None): Few-shot examples

  • **kwargs: Additional parameters

Returns: Final answer string

Concrete Methods:

get_llm_response(tools: list[dict] | None = None) -> Any#

Calls LLM with current messages.

Parameters:

  • tools (list[dict] | None): Optional function definitions

Returns: LLM response object

get_total_token_usage() -> dict[str, int]#

Returns cumulative token usage.

Returns: Dictionary with prompt_tokens, completion_tokens, total_tokens

reset_token_usage() -> None#

Resets token counters.
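
The shape of a `run()` implementation can be sketched with stubbed interface and LLM objects (everything below is a toy of ours, mirroring the contract above, not the real `BaseAgent`):

```python
# Illustrative only: a toy agent loop against stubbed objects.
class StubInterface:
    """Stand-in for CorralRouter with just the calls the loop needs."""

    def get_task_guide(self, task_id):
        return f"Guide for {task_id}"

    def submit_answer(self, task_id, answer):
        return {"score": 1.0 if answer == "15" else 0.0}

class EchoAgent:
    """Toy agent: one 'LLM call' (a plain callable), then submit."""

    def __init__(self, llm, max_iterations=10):
        self.llm = llm
        self.max_iterations = max_iterations
        self.messages = []

    def run(self, interface, task_id):
        # Seed the conversation with the task guide.
        self.messages.append({"role": "user",
                              "content": interface.get_task_guide(task_id)})
        answer = ""
        for _ in range(self.max_iterations):
            answer = self.llm(self.messages)  # stands in for get_llm_response()
            self.messages.append({"role": "assistant", "content": answer})
            if answer:  # a real agent would loop on tool calls instead
                break
        interface.submit_answer(task_id, answer)
        return answer
```

A real subclass would call `get_llm_response()`, dispatch tool calls through the interface, and respect `max_iterations` in the same way.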


Built-in Agent Types#

ReActAgent#

Implements Reasoning and Acting framework.

Additional Parameters:

  • All BaseAgent parameters

Response Format:

<thought>reasoning here</thought>
<action>tool_name</action>
<action_input>{"arg": "value"}</action_input>
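
The response format above can be parsed with a single regular expression; the helper below is a sketch of ours, not part of Corral:

```python
import json
import re

# Hypothetical helper: extract (thought, action, arguments) from a ReAct reply.
PATTERN = re.compile(
    r"<thought>(?P<thought>.*?)</thought>\s*"
    r"<action>(?P<action>.*?)</action>\s*"
    r"<action_input>(?P<args>.*?)</action_input>",
    re.DOTALL,
)

def parse_react(text: str) -> tuple[str, str, dict]:
    """Split a ReAct reply into its thought, action name, and JSON arguments."""
    m = PATTERN.search(text)
    if m is None:
        raise ValueError("response does not match the ReAct format")
    return m["thought"].strip(), m["action"].strip(), json.loads(m["args"])
```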

ToolCallingAgent#

Uses native LLM function calling.

Additional Parameters:

  • All BaseAgent parameters

Note: Automatically converts Corral tools to OpenAI function format.

LLMPlanner#

Hierarchical planning agent.

Additional Parameters:

  • planner_model (str): Model for high-level planning

  • executor_model (str): Model for low-level execution

ReflectionAgent#

Self-reflective agent.

Additional Parameters:

  • max_reflections (int): Maximum reflection cycles


ToolVerbosity Enum#

Tool description verbosity levels.

class ToolVerbosity(Enum):
    MINIMAL = "minimal"  # Name + basic description
    BRIEF = "brief"  # + [BRIEF] sections
    DETAILED = "detailed"  # + [DETAILED] sections
    PROCEDURAL = "procedural"  # + [PROCEDURAL] sections
    CONTEXTUAL = "contextual"  # + [CONTEXTUAL] sections
    WORKFLOW = "workflow"  # + [WORKFLOW_INTEGRATION]
    SYNTACTICAL = "syntactical"  # + [SYNTACTICAL] sections
    COMPREHENSIVE = "comprehensive"  # + [RAISES], [LIMITATIONS], [EXAMPLES]
    FULL = "full"  # Complete docstring

HookPoint Enum#

Agent lifecycle hook points.

class HookPoint(Enum):
    BEFORE_TASK = "before_task"  # Before task starts
    AFTER_TASK = "after_task"  # After task completes
    BEFORE_ITERATION = "before_iteration"  # Before each LLM call
    AFTER_ITERATION = "after_iteration"  # After tools execute

TaskDefinition Dataclass#

Defines a task within a TaskGroup.

Fields:

  • name (str): Task name

  • description (str): Task description

  • tools (list[str]): Required tool names

  • scoring_fn (Callable): Scoring function

  • submission_format (dict[str, str]): Expected answer format

  • scoring_inputs (dict): Additional scoring parameters

  • input_from_tasks (list[str]): Task dependencies

  • initial_input (dict): Initial inputs

Methods:

  • has_dependencies() -> bool: Returns True if task has dependencies


TaskGroup Dataclass#

Container for related tasks.

Fields:

  • group_id (str): Group identifier

  • tasks (dict[str, TaskDefinition]): Task definitions

  • results (dict[str, Any]): Stored results

  • scores (dict[str, float]): Stored scores

  • chained_tasks (bool): Whether tasks are chained

Methods:

  • get_task_input(task_id: str) -> dict: Get inputs for task

  • store_result(task_id: str, result: dict, score: float) -> None: Store result

  • get_ordered_tasks() -> list[str]: Get tasks in dependency order

  • check_dependencies_satisfied(task_id: str) -> bool: Check if dependencies met
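
Ordering tasks by their `input_from_tasks` dependencies amounts to a topological sort. The sketch below shows what `get_ordered_tasks()` must accomplish; it is our illustration, not the library's implementation:

```python
def order_tasks(deps: dict[str, list[str]]) -> list[str]:
    """Topologically order task IDs given {task: [tasks it depends on]}.

    Raises ValueError on a dependency cycle.
    """
    ordered: list[str] = []
    visiting: set[str] = set()
    done: set[str] = set()

    def visit(task: str) -> None:
        if task in done:
            return
        if task in visiting:
            raise ValueError(f"dependency cycle involving {task!r}")
        visiting.add(task)
        for dep in deps.get(task, []):
            visit(dep)  # dependencies are emitted before their dependents
        visiting.discard(task)
        done.add(task)
        ordered.append(task)

    for task in deps:
        visit(task)
    return ordered
```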


BenchmarkResult Dataclass#

Results from benchmark execution.

Fields:

  • task_results (dict[str, TaskTrialResults]): Results per task

  • k (list[int]): k values for pass@k metrics

  • total_duration (float | None): Total benchmark duration

  • verbosity (str | None): Tool verbosity used

Properties:

  • all_task_ids (list[str]): All task IDs

  • total_tasks (int): Number of tasks

  • all_results (list[TaskTrialResult]): Flat list of all trials

Methods:

  • average_score() -> float: Mean score across all trials

  • task_average_score(task_id: str) -> float: Mean score for task

  • pass_at_k(k: int) -> float: Pass@k metric

  • best_pass_k() -> tuple[int, float]: Best pass@k value

  • overall_average_duration() -> float | None: Mean trial duration

  • total_token_usage() -> dict[str, int]: Total tokens used

  • total_tool_execution_duration() -> float: Total time in tools
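
The pass@k semantics are not spelled out here. A common choice is the unbiased estimator from the HumanEval paper: for n trials containing c successes, pass@k = 1 - C(n-c, k)/C(n, k). Whether `pass_at_k()` uses exactly this formula is an assumption:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for n trials containing c successes."""
    if n - c < k:
        return 1.0  # every size-k sample contains at least one success
    # Probability that a random size-k sample misses all c successes,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)
```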


TaskTrialResult Dataclass#

Result of a single trial.

Fields: - task_id (str): Task identifier

  • trial_id (str): Trial identifier

  • score (float): Trial score

  • state (dict): Final state

  • tool_statistics (dict): Tool usage stats

  • duration (float | None): Trial duration in seconds

  • token_usage (dict | None): Token usage

  • error_message (str | None): Error if failed

  • surrendered (bool): Whether surrendered

Properties:

  • success (bool): True if score > 0, no error, not surrendered