TaskEvalContext

The evaluation context passed to evaluators. It contains all the data needed to validate task completion.

Overview

TaskEvalContext is provided to evaluators during task evaluation. It contains the task definition, agent response, network trace, and configuration needed for validation.

Attributes

Name                Type
task                WebArenaVerifiedTask
agent_response_raw  Any | TransformedAgentResponse | None
network_trace       NetworkTrace
config              WebArenaVerifiedConfig

Source code in src/webarena_verified/types/eval.py
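from typing import Any

from pydantic import BaseModel, ConfigDict

# Imports of the project types (WebArenaVerifiedTask, etc.) are omitted from this excerpt.
class TaskEvalContext(BaseModel):
    task: WebArenaVerifiedTask
    agent_response_raw: Any | TransformedAgentResponse | None = None
    network_trace: NetworkTrace
    config: WebArenaVerifiedConfig

    # Instances are immutable, and unknown fields are rejected.
    model_config = ConfigDict(frozen=True, extra="forbid", arbitrary_types_allowed=True)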
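
The model configuration above marks the context as frozen and forbids extra fields. The sketch below illustrates what those two settings mean for calling code, using a standard-library dataclass as an analogue so that it runs without the project installed. FrozenContextSketch and its simplified field types are hypothetical stand-ins, not the real class, and pydantic raises its own error types rather than the exceptions shown here.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Any


@dataclass(frozen=True)
class FrozenContextSketch:
    """Hypothetical stdlib stand-in; NOT the real pydantic TaskEvalContext."""

    task: str
    network_trace: tuple
    config: dict
    agent_response_raw: Any = None  # mirrors the optional field's None default


ctx = FrozenContextSketch(task="task-1", network_trace=(), config={})

# Reassigning a field on a frozen instance is rejected.
try:
    ctx.task = "task-2"
    reassigned = True
except FrozenInstanceError:
    reassigned = False

# Unknown fields are rejected at construction time,
# similar in spirit to extra="forbid".
try:
    FrozenContextSketch(task="task-1", network_trace=(), config={}, notes="x")
    accepted_extra = True
except TypeError:
    accepted_extra = False
```

The practical consequence for evaluators is that the context should be treated as read-only input: an evaluator that needs derived data should build its own local values instead of writing back to the context.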

Field Descriptions

Field               Type                                       Description
task                WebArenaVerifiedTask                       The task being evaluated, including expected values and evaluator configurations.
agent_response_raw  Any | TransformedAgentResponse | None      The raw agent response (string, dict, or parsed JSON); defaults to None. Used by AgentResponseEvaluator.
network_trace       NetworkTrace                               Captured network events from the agent's execution. Used by NetworkEventEvaluator.
config              WebArenaVerifiedConfig                     Framework configuration, including site URLs and settings.

Usage in Evaluators

Different evaluators access different fields from the context: AgentResponseEvaluator reads agent_response_raw, and NetworkEventEvaluator reads network_trace.
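
As a rough illustration of that division of labor, the sketch below defines two checks that each read only the fields they need. Everything here is an assumption made for the example: ContextSketch is a stand-in that merely reuses the four field names, check_agent_response and check_network_events are hypothetical functions rather than the framework's evaluator classes, and the dict keys and example URL are invented.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ContextSketch:
    """Hypothetical stand-in exposing the same four field names as TaskEvalContext."""

    task: dict
    network_trace: list
    config: dict
    agent_response_raw: Any = None


def check_agent_response(ctx: ContextSketch) -> bool:
    """Hypothetical response check: reads only task and agent_response_raw."""
    if ctx.agent_response_raw is None:
        return False  # the agent produced no response to compare
    return ctx.agent_response_raw == ctx.task["expected_response"]


def check_network_events(ctx: ContextSketch) -> bool:
    """Hypothetical network check: reads only task and network_trace."""
    expected_url = ctx.task["expected_url"]
    return any(event.get("url") == expected_url for event in ctx.network_trace)


ctx = ContextSketch(
    task={"expected_response": "42", "expected_url": "http://shop.example/cart"},
    network_trace=[{"method": "POST", "url": "http://shop.example/cart"}],
    config={},
    agent_response_raw="42",
)
response_ok = check_agent_response(ctx)
network_ok = check_network_events(ctx)
```

Passing one shared context object keeps every check's signature identical, so a response-based check and a network-based check can be run over the same task without either needing to know which fields the other uses.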

See Also