Subset Manager¶
WebArena-Verified provides commands to manage task subsets - curated collections of tasks for focused evaluation.
List Available Subsets¶
View all predefined subsets in the repository:
This lists all subset files in assets/dataset/subsets/, including:
webarena-verified-hard- Difficulty-prioritized 258-task subset (see Hard Subset)
Export a Subset¶
Export task definitions from a subset to a standalone JSON file:
# By name
docker run --rm \
-v ./:/output \
ghcr.io/servicenow/webarena-verified:latest \
subset-export \
--name webarena-verified-hard \
--output /output/webarena-verified-hard.json
# By path
docker run --rm \
-v ./:/data \
ghcr.io/servicenow/webarena-verified:latest \
subset-export \
--path /data/assets/dataset/subsets/custom-subset.json \
--output /data/custom-tasks.json
The exported file contains complete task definitions for all tasks in the subset, which you can use for evaluation or analysis.
Create a Custom Subset¶
Create a new subset from a custom task list:
This creates a new subset file in assets/dataset/subsets/ with:
- Task IDs extracted from the source file
- Optional description
- Computed checksum for integrity verification