
WebArena-Verified


ServiceNow/webarena-verified

TaskEvalContext

The evaluation context passed to each evaluator, containing all the data needed to validate task completion.

Overview

TaskEvalContext is provided to evaluators during task evaluation. It contains the task definition, agent response, network trace, and configuration needed for validation.

Attributes

Name	Type
`task`	`WebArenaVerifiedTask`
`agent_response_raw`	`Any | TransformedAgentResponse | None`
`network_trace`	`NetworkTrace`
`config`	`WebArenaVerifiedConfig`

Source code in src/webarena_verified/types/eval.py

class TaskEvalContext(BaseModel):
    task: WebArenaVerifiedTask
    agent_response_raw: Any | TransformedAgentResponse | None = None
    network_trace: NetworkTrace
    config: WebArenaVerifiedConfig

    model_config = ConfigDict(frozen=True, extra="forbid", arbitrary_types_allowed=True)
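Because `model_config` sets `frozen=True` and `extra="forbid"`, an evaluator cannot mutate the context it receives, and constructing a context with an unexpected keyword fails. The sketch below imitates that contract with a standard-library frozen dataclass so it runs without the package installed. `ContextStandIn` and its plain dict/list fields are hypothetical stand-ins, not the real `TaskEvalContext`. A dataclass raises `FrozenInstanceError` and `TypeError` in the places where Pydantic v2 raises `ValidationError`.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Any, Optional


# Hypothetical stand-in for TaskEvalContext: plain dicts/lists replace the real
# WebArenaVerifiedTask, NetworkTrace, and WebArenaVerifiedConfig types.
# Dataclasses require defaulted fields last, so agent_response_raw moves to the end.
@dataclass(frozen=True)
class ContextStandIn:
    task: dict
    network_trace: list
    config: dict
    agent_response_raw: Optional[Any] = None  # mirrors the `= None` default


ctx = ContextStandIn(task={"id": 1}, network_trace=[], config={})

# frozen=True analogue: assigning to a field after construction is rejected.
try:
    ctx.agent_response_raw = "changed"
    mutation_blocked = False
except FrozenInstanceError:
    mutation_blocked = True

# extra="forbid" analogue: an unknown keyword is rejected at construction time.
try:
    ContextStandIn(task={}, network_trace=[], config={}, unexpected=True)
    extra_rejected = False
except TypeError:
    extra_rejected = True

print(mutation_blocked, extra_rejected)  # True True
```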


Field Descriptions

Field	Type	Description
`task`	`WebArenaVerifiedTask`	The task being evaluated, including its expected values and evaluator configurations.
`agent_response_raw`	`Any | TransformedAgentResponse | None`	The raw agent response (a string, dict, or parsed JSON) or a `TransformedAgentResponse`; defaults to `None`. Used by AgentResponseEvaluator.
`network_trace`	`NetworkTrace`	Network events captured during the agent's execution. Used by NetworkEventEvaluator.
`config`	`WebArenaVerifiedConfig`	Framework configuration, including site URLs and other settings.
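Since `agent_response_raw` can arrive in several shapes, code that consumes it may first need to normalize the plain-data cases. The helper below is a minimal, hypothetical illustration: `normalize_raw_response` is not part of WebArena-Verified, and the real AgentResponseEvaluator may handle these cases differently. The helper makes these choices:

- `None` is kept as-is.
- Strings are parsed as JSON when possible.
- Any other object, including a `TransformedAgentResponse`, is returned unchanged.

```python
import json
from typing import Any, Optional


def normalize_raw_response(raw: Any) -> Optional[Any]:
    """Hypothetical helper: turn a raw agent response into plain Python data.

    - None stays None (the field's default when no response is present).
    - A string is parsed as JSON when possible, otherwise kept as stripped text.
    - Anything else (e.g. an already-parsed dict) is returned unchanged.
    """
    if raw is None:
        return None
    if isinstance(raw, str):
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            return raw.strip()
    return raw


print(normalize_raw_response('{"answer": "42"}'))  # {'answer': '42'}
print(normalize_raw_response("plain text "))       # plain text
print(normalize_raw_response(None))                # None
```

Note that a string such as `"42"` is valid JSON and therefore becomes the integer `42`. Comparison code built on a helper like this should account for that.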

Usage in Evaluators

Different evaluators access different fields from the context:

AgentResponseEvaluator: Validates the structured response by accessing agent_response_raw
NetworkEventEvaluator: Validates network traffic by accessing network_trace
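The field split described above can be sketched with two independent checks that each read only their own field of a shared, read-only context. Several parts of this sketch are hypothetical and do not come from WebArena-Verified:

- `ContextStandIn`, `check_agent_response`, and `check_network_events`.
- The `expected_answer`, `expected_url`, and `url` keys.

Real evaluators return the structured output described in Evaluation Results rather than a bare boolean.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass(frozen=True)
class ContextStandIn:
    """Hypothetical stand-in for TaskEvalContext (plain types, not the real models)."""
    task: Dict[str, Any]
    network_trace: List[Dict[str, Any]]
    config: Dict[str, Any]
    agent_response_raw: Optional[Any] = None


def check_agent_response(ctx: ContextStandIn) -> bool:
    # Reads only agent_response_raw, like AgentResponseEvaluator.
    return ctx.agent_response_raw == ctx.task.get("expected_answer")


def check_network_events(ctx: ContextStandIn) -> bool:
    # Reads only network_trace, like NetworkEventEvaluator.
    expected_url = ctx.task.get("expected_url")
    return any(event.get("url") == expected_url for event in ctx.network_trace)


EVALUATORS: Dict[str, Callable[[ContextStandIn], bool]] = {
    "AgentResponseEvaluator": check_agent_response,
    "NetworkEventEvaluator": check_network_events,
}

ctx = ContextStandIn(
    task={"expected_answer": "42", "expected_url": "http://example.test/orders"},
    network_trace=[{"url": "http://example.test/orders", "method": "GET"}],
    config={},
    agent_response_raw="42",
)

# Every evaluator receives the same immutable context object.
results = {name: fn(ctx) for name, fn in EVALUATORS.items()}
print(results)  # {'AgentResponseEvaluator': True, 'NetworkEventEvaluator': True}
```

Passing one frozen object to every evaluator means no evaluator can alter what another one sees.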

See Also

Evaluation Results - Understanding evaluator output format
WebArenaVerifiedTask - Task structure and definition