Environments
WebArena-Verified uses Docker containers to provide an isolated, reproducible environment for each website in the benchmark. To shorten reset time and reduce storage requirements, we provide recipes for creating slim images: optimized versions of the original environments that keep the exact content and functionality.
Size Improvements
Slim images are significantly smaller than their original counterparts:
| Environment | Original Size | Slim Size | Reduction |
|---|---|---|---|
| Shopping Admin | 19.9 GB | 4.98 GB | ~75% smaller |
| Shopping | 117 GB | 17.8 GB | ~85% smaller |
| Reddit | 107 GB | 19 GB | ~82% smaller |
| GitLab | 155 GB | 34 GB | ~78% smaller |
Benefits:
- Smaller storage and memory footprint
- HTTP header-based authentication bypassing UI login
- All functionality preserved
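The Reduction column can be recomputed from the two size columns. The sketch below does that arithmetic for a few rows; the function name `reduction_percent` and the `SIZES_GB` mapping are illustrative and not part of WebArena-Verified.

```python
# Sizes in GB for a few rows, copied from the table above.
SIZES_GB = {
    "Shopping Admin": (19.9, 4.98),
    "Shopping": (117, 17.8),
    "GitLab": (155, 34),
}


def reduction_percent(original_gb: float, slim_gb: float) -> float:
    """Return how much smaller the slim image is, as a percentage."""
    return (1 - slim_gb / original_gb) * 100


if __name__ == "__main__":
    for name, (original, slim) in SIZES_GB.items():
        print(f"{name}: {reduction_percent(original, slim):.0f}% smaller")
```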
Shopping Admin
1. Create the slim image
2. Run the container with auto-initialization
docker run -d --name admin-slim -p 7780:80 \
-e MAGENTO_BASE_URL=http://localhost:7780 \
shopping_admin_final_0719:slim
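When the container is started from a script, it may help to derive the published port and `MAGENTO_BASE_URL` from a single value so the two cannot drift apart. The following is a minimal sketch under that assumption; `admin_run_command` is a hypothetical helper, and it only builds the argument list rather than starting anything.

```python
import shlex


def admin_run_command(port: int = 7780, name: str = "admin-slim") -> list[str]:
    """Build the `docker run` argument list for the slim Shopping Admin image."""
    base_url = f"http://localhost:{port}"
    return [
        "docker", "run", "-d",
        "--name", name,
        "-p", f"{port}:80",
        "-e", f"MAGENTO_BASE_URL={base_url}",
        "shopping_admin_final_0719:slim",
    ]


if __name__ == "__main__":
    # Print the command for inspection; pass the list to subprocess.run to execute it.
    print(shlex.join(admin_run_command()))
```

With the defaults, the printed command matches the one shown above; passing a different `port` changes both the `-p` mapping and the base URL together.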
Access the admin (auto-login via header in Playwright)
import asyncio

from playwright.async_api import async_playwright


async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        context = await browser.new_context()
        # Send the auto-login header with every request from this context
        await context.set_extra_http_headers({
            "X-M2-Admin-Auto-Login": "admin:admin1234"
        })
        page = await context.new_page()
        await page.goto("http://localhost:7780/admin")
        # The page is now logged in as admin
        await browser.close()


asyncio.run(main())
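The same header can also be sent from a plain HTTP client, which is a quick way to check the auto-login path without launching a browser. The sketch below uses only the Python standard library; `build_admin_request` and `check_admin` are hypothetical helper names, and what counts as a successful response is not specified on this page, so the result is returned for manual inspection.

```python
import urllib.request

AUTO_LOGIN_HEADER = "X-M2-Admin-Auto-Login"


def build_admin_request(base_url: str = "http://localhost:7780",
                        credentials: str = "admin:admin1234") -> urllib.request.Request:
    """Create a GET request for /admin that carries the auto-login header."""
    return urllib.request.Request(
        f"{base_url}/admin",
        headers={AUTO_LOGIN_HEADER: credentials},
    )


def check_admin(base_url: str = "http://localhost:7780") -> tuple[int, str]:
    """Send the request and return (status code, final URL after redirects)."""
    with urllib.request.urlopen(build_admin_request(base_url), timeout=10) as resp:
        return resp.status, resp.geturl()
```

Calling `check_admin()` requires the container to be running. A final URL that points at a login page would suggest the header was ignored, though that interpretation is an assumption.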
Manual Initialization
If you want to manually initialize the environment (e.g., after stopping the container), run:
# Start container
docker run -d --name admin-slim -p 7780:80 shopping_admin_final_0719:slim
# Initialize Magento
docker exec admin-slim magento-init http://localhost:7780
Troubleshooting
Auto-login not working:
- Test with curl:
- Check module is enabled (should show "Module is enabled"):
- Check DI is compiled (should contain compiled classes):
- Check logs (look for "Auto-login successful" or error messages):
- Recompile if needed:
Container not initializing:
- Check container logs: docker logs admin-slim
- Verify MAGENTO_BASE_URL is set correctly
- Wait ~30 seconds for initialization to complete
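Rather than sleeping for a fixed time, a script can poll the site until it answers. The sketch below is one way to do that with the standard library; `http_probe` and `wait_until_ready` are hypothetical helpers, and treating any response below HTTP 500 as "ready" is an assumption rather than a documented readiness signal.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str) -> bool:
    """Return True if the URL answers with an HTTP status below 500."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as err:
        # 4xx responses still mean the web server is up.
        return err.code < 500
    except (urllib.error.URLError, OSError):
        # Connection refused, reset, or timed out: not ready yet.
        return False


def wait_until_ready(url: str, timeout_s: float = 60.0, interval_s: float = 2.0,
                     probe: Callable[[str], bool] = http_probe) -> bool:
    """Poll `probe(url)` until it succeeds or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For example, `wait_until_ready("http://localhost:7780/admin")` returns `True` as soon as the site responds, and `False` if it is still unreachable after a minute. A longer `timeout_s` would suit containers with a slower first start.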
Database reset not working:
- Ensure archive exists: docker exec admin-slim ls -lh /var/backups/mysql/data.tar.gz
- Check services status: docker exec admin-slim supervisorctl status
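The services check can be automated by parsing the output of `supervisorctl status`, which prints one process per line with the name in the first column and the state in the second. The sketch below is illustrative only: `not_running` and `container_status` are hypothetical helpers, and the process names used to try it out are made up.

```python
import subprocess


def not_running(status_output: str) -> list[str]:
    """Return names of processes whose state column is not RUNNING."""
    failed = []
    for line in status_output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[1] != "RUNNING":
            failed.append(parts[0])
    return failed


def container_status(container: str = "admin-slim") -> str:
    """Run `supervisorctl status` inside the container and return its stdout."""
    # No check=True: supervisorctl exits non-zero when a process is not running,
    # and the output is still what we want to inspect in that case.
    result = subprocess.run(
        ["docker", "exec", container, "supervisorctl", "status"],
        capture_output=True, text=True,
    )
    return result.stdout
```

With the container running, `not_running(container_status())` returns an empty list when every supervised process is up.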
Reddit
1. Create the slim image
Note: The script reuses existing data if available:
- PostgreSQL archive (~1.6 GB)
- Optimized submission images (~6 GB)

This saves ~30-60 minutes on subsequent runs.
2. Run the container
The container is self-contained (no volume mounts needed) and auto-initializes on first start (~2-3 minutes).
Manual Initialization
Manual initialization is not typically needed because the container auto-initializes, but it can be run if required:
# Start container
docker run -d --name reddit-slim -p 9999:80 postmill-populated-exposed-withimg:slim
# Check initialization status
docker exec reddit-slim postmill-init
Troubleshooting
Container not initializing:
- Check container logs: docker logs reddit-slim
- Wait ~2-3 minutes for initial data extraction
- Check initialization status: docker exec reddit-slim cat /run/postmill.env
Database reset not working:
- Ensure archives exist:
docker exec reddit-slim ls -lh /var/backups/pgsql/data.tar.gz
docker exec reddit-slim ls -lh /var/backups/images/submission_images.tar.gz
- Check services status: docker exec reddit-slim supervisorctl status
- Verify database integrity: docker exec reddit-slim postmill-init (shows validation output)
Patches not applied:
- Check rate limits removed (should return 0):
- Check HTTP client configured (should show the configuration line):