# TaskEvalContext
The evaluation context passed to evaluators containing all data needed to validate task completion.
## Overview
TaskEvalContext is provided to evaluators during task evaluation. It contains the task definition, agent response, network trace, and configuration needed for validation.
## Attributes

| Name | Type |
|---|---|
| `task` | `WebArenaVerifiedTask` |
| `agent_response_raw` | `Any \| TransformedAgentResponse \| None` |
| `network_trace` | `NetworkTrace` |
| `config` | `WebArenaVerifiedConfig` |

Defined in `src/webarena_verified/types/eval.py`.
## Field Descriptions

| Field | Type | Description |
|---|---|---|
| `task` | `WebArenaVerifiedTask` | The task being evaluated, including its expected values and evaluator configurations. |
| `agent_response_raw` | `Any \| TransformedAgentResponse \| None` | The raw agent response (string, dict, or parsed JSON). Used by `AgentResponseEvaluator`. |
| `network_trace` | `NetworkTrace` | Network events captured during the agent's execution. Used by `NetworkEventEvaluator`. |
| `config` | `WebArenaVerifiedConfig` | Framework configuration, including site URLs and other settings. |
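To make the shape of the context concrete, here is a minimal sketch that mirrors the four documented attributes with plain dataclasses. It is not the library's definition: the placeholder classes and their fields (`task_id`, `expected`, `events`, `site_urls`) are hypothetical stand-ins for the real `WebArenaVerifiedTask`, `NetworkTrace`, and `WebArenaVerifiedConfig` types, and whether the real class is a dataclass, a Pydantic model, or something else is an assumption left open here.

```python
from dataclasses import dataclass, field
from typing import Any


# Hypothetical placeholders: the real WebArenaVerifiedTask, NetworkTrace and
# WebArenaVerifiedConfig come from webarena_verified and carry other fields.
@dataclass
class WebArenaVerifiedTask:
    task_id: int
    expected: dict[str, Any] = field(default_factory=dict)


@dataclass
class NetworkTrace:
    events: list[dict[str, Any]] = field(default_factory=list)


@dataclass
class WebArenaVerifiedConfig:
    site_urls: dict[str, str] = field(default_factory=dict)


@dataclass
class TaskEvalContext:
    """Illustrative mirror of the four documented attributes (not the real class)."""

    task: WebArenaVerifiedTask
    # Documented type: Any | TransformedAgentResponse | None
    agent_response_raw: Any
    network_trace: NetworkTrace
    config: WebArenaVerifiedConfig


# Build one context by hand, the way a test harness might.
ctx = TaskEvalContext(
    task=WebArenaVerifiedTask(task_id=1, expected={"answer": "42"}),
    agent_response_raw='{"answer": "42"}',
    network_trace=NetworkTrace(events=[{"url": "http://localhost/", "method": "GET"}]),
    config=WebArenaVerifiedConfig(site_urls={"shopping": "http://localhost:7770"}),
)
```

Keeping all four inputs in a single object lets every evaluator share one call signature while each reads only the fields it needs.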
## Usage in Evaluators

Different evaluators read different fields from the context:

- `AgentResponseEvaluator`: validates the structured response by reading `agent_response_raw`.
- `NetworkEventEvaluator`: validates network traffic by reading `network_trace`.
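The split between the two evaluators can be illustrated with a small sketch. The functions `agent_response_check` and `network_event_check` are hypothetical and are not the API of `AgentResponseEvaluator` or `NetworkEventEvaluator`. The simplified `Task` (with `expected_response` and `expected_url_substring`) and `NetworkTrace` (a list of URLs) are assumptions made for illustration only. Each check compares one context field against an expected value taken from `task`.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class Task:  # hypothetical stand-in for WebArenaVerifiedTask
    expected_response: dict[str, Any]
    expected_url_substring: str


@dataclass
class NetworkTrace:  # hypothetical stand-in: just the requested URLs
    urls: list[str]


@dataclass
class TaskEvalContext:  # simplified stand-in for the real context
    task: Task
    agent_response_raw: Any
    network_trace: NetworkTrace
    config: dict[str, Any]


def agent_response_check(ctx: TaskEvalContext) -> bool:
    """Reads agent_response_raw (plus the task's expected value) only."""
    raw = ctx.agent_response_raw
    if raw is None:
        return False
    # Accept either a JSON string or an already-parsed dict.
    if isinstance(raw, str):
        try:
            raw = json.loads(raw)
        except json.JSONDecodeError:
            return False
    return raw == ctx.task.expected_response


def network_event_check(ctx: TaskEvalContext) -> bool:
    """Reads network_trace (plus the task's expected value) only."""
    return any(ctx.task.expected_url_substring in url for url in ctx.network_trace.urls)


ctx = TaskEvalContext(
    task=Task(expected_response={"status": "SUCCESS"}, expected_url_substring="/checkout"),
    agent_response_raw='{"status": "SUCCESS"}',
    network_trace=NetworkTrace(urls=["http://shop.local/checkout/done"]),
    config={},
)
```

Here both `agent_response_check(ctx)` and `network_event_check(ctx)` return `True`. Swapping `agent_response_raw` for `None` makes only the response check fail, because the network check never reads that field.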
## See Also
- Evaluation Results - Understanding evaluator output format
- WebArenaVerifiedTask - Task structure and definition